Tracing and using data lineage for pipeline processing in Astro-WISE

Experimental Astronomy - Volume 35 - Pages 131-155 - 2011
Johnson Mwebaze (1), Danny Boxhoorn (1), Edwin A. Valentijn (1)
(1) Kapteyn Astronomical Institute, University of Groningen, AV Groningen, The Netherlands

Abstract

Most workflow systems that support data provenance focus primarily on tracing the lineage of data. Data provenance by data lineage provides the derivation history of data, including information about the services and input data that contributed to the creation of a data product. We show that tracing lineage by means of full backward chaining not only enables users to share, discover and reuse the data, but also supports scientific data processing through storage, retrieval and (re)processing of digitized scientific data. In this paper, we present Astro-WISE, a distributed system for processing, analyzing and disseminating wide-field imaging astronomical data. We show how Astro-WISE traces the lineage of data and how it facilitates data processing, retrieval, storage and archiving. In particular, we show how it solves issues related to the changing data items typical of the scientific environment, such as physical changes in calibrations, our insight into these changes, and improved methods for deriving results.
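The idea of lineage tracing with backward chaining can be illustrated with a small sketch. This is not the Astro-WISE API: the `Product` class, the `lineage` and `make` functions, and the product names (`bias`, `flat`, `raw_frame`, `reduced_frame`, `source_list`) are all hypothetical, chosen only to show the principle. Each product records which inputs, and which versions of them, it was derived from. A request for a target walks backward through those inputs and remakes only what has become stale.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Product:
    """Hypothetical data product that remembers what it was derived from."""
    name: str
    process: str = "ingest"                    # step that created the product
    inputs: List["Product"] = field(default_factory=list)
    version: int = 1                           # bumped each time it is remade
    built_from: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Record the input versions that were used at creation time.
        self.built_from = {p.name: p.version for p in self.inputs}


def lineage(product: Product) -> List[str]:
    """Walk backward through the inputs and list the derivation history."""
    lines = [f"{product.name} v{product.version} <- {product.process}"]
    for parent in product.inputs:
        lines.extend("  " + line for line in lineage(parent))
    return lines


def make(product: Product) -> bool:
    """Backward chaining: update the inputs first, then remake the product
    only if an input version differs from the one recorded in its lineage.
    Returns True if the product itself was remade."""
    for parent in product.inputs:
        make(parent)
    current = {p.name: p.version for p in product.inputs}
    if current == product.built_from:
        return False                           # lineage still valid, reuse it
    product.built_from = current               # stand-in for real reprocessing
    product.version += 1
    return True


# Assumed example chain: calibrations and a raw frame feed a reduced frame,
# which in turn feeds a source list.
bias = Product("bias")
flat = Product("flat")
raw = Product("raw_frame")
reduced = Product("reduced_frame", "reduce", [raw, bias, flat])
catalog = Product("source_list", "extract", [reduced])

print("\n".join(lineage(catalog)))

flat.version += 1                              # a calibration changes
remade = make(catalog)                         # only affected products are remade
```

In this sketch a changed calibration propagates forward only when a dependent product is requested, and products whose recorded lineage is still valid are reused rather than recomputed.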
