A conceptual framework for composing and managing scientific data lineage

R. Bose¹
¹ Donald Bren School of Environmental Science and Management, University of California, Santa Barbara, CA, USA

Abstract

Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. However, the lack of a definitive data model for lineage, and the poor fit between current data management tools and scientific software, effectively prevent researchers from determining, preserving, or providing the lineage of the data products they use and create. Based on a comprehensive review of lineage-related research and previous prototype systems, a conceptual framework is presented to help identify and assess basic lineage system components. Within this framework, a direction is outlined for future work on general methods for composing and managing lineage for scientific data.

Keywords

#History #Yarn #Environmental management #Documentation #Data models #Software tools #Software prototyping #Prototypes #Assembly #Pipelines
