Chimera: a virtual data system for representing, querying, and automating data derivation
Proceedings 14th International Conference on Scientific and Statistical Database Management, pp. 37-46
Abstract
Much scientific data is not obtained from measurements but is instead derived from other data by applying computational procedures. We hypothesize that explicitly representing these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data"). To explore this idea, we have developed the Chimera virtual data system, which combines a virtual data catalog, for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database. We couple the Chimera system with distributed "data grid" services to enable on-demand execution of computation schedules constructed from database queries. We have applied this system to two challenge problems, with promising results: the reconstruction of simulated collision event data from a high-energy physics experiment, and the search for galactic clusters in digital sky survey data.
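The idea of a catalog that records derivation procedures, and uses them both to answer provenance questions and to plan on-demand generation, can be illustrated with a minimal Python sketch. This is not Chimera's virtual data language or its catalog schema; the class names (`Derivation`, `VirtualDataCatalog`), the methods, and the dataset names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Derivation:
    """One recorded application of a procedure (hypothetical schema)."""
    transformation: str  # name of the computational procedure
    inputs: tuple        # logical names of the datasets it reads
    output: str          # logical name of the dataset it produces


@dataclass
class VirtualDataCatalog:
    """Toy catalog: maps each derived dataset to the derivation producing it."""
    derivations: dict = field(default_factory=dict)  # output name -> Derivation
    materialized: set = field(default_factory=set)   # datasets that already exist

    def add(self, derivation):
        self.derivations[derivation.output] = derivation

    def provenance(self, name):
        """All derivations that `name` depends on, upstream steps first."""
        return self._walk(name, skip_materialized=False)

    def plan(self, name):
        """Only the derivations that must run to materialize `name`."""
        return self._walk(name, skip_materialized=True)

    def _walk(self, name, skip_materialized):
        steps, seen = [], set()

        def visit(dataset):
            if dataset in seen:
                return
            if skip_materialized and dataset in self.materialized:
                return
            seen.add(dataset)
            derivation = self.derivations.get(dataset)
            if derivation is None:
                if skip_materialized:
                    # Neither stored nor derivable: the request cannot be met.
                    raise KeyError(f"no data and no derivation for {dataset!r}")
                return  # a measured (non-derived) dataset ends the lineage
            for parent in derivation.inputs:
                visit(parent)
            steps.append(derivation)  # post-order, so inputs come first

        visit(name)
        return steps


# Hypothetical two-step pipeline, loosely modeled on event reconstruction.
catalog = VirtualDataCatalog()
catalog.materialized.add("raw_events")
catalog.add(Derivation("simulate", ("raw_events",), "sim_events"))
catalog.add(Derivation("reconstruct", ("sim_events",), "reco_events"))

full_plan = [d.transformation for d in catalog.plan("reco_events")]

# Once the intermediate dataset exists, only the last step remains to run,
# while the recorded provenance is unchanged.
catalog.materialized.add("sim_events")
short_plan = [d.transformation for d in catalog.plan("reco_events")]
lineage = [d.transformation for d in catalog.provenance("reco_events")]
```

In this sketch a request for a dataset that is not yet materialized yields an ordered list of steps, which stands in for the computation schedule that a real system would hand to grid services for execution.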
Keywords
#Data systems #Computer applications #Documentation #Distributed computing #Grid computing #Processor scheduling #Distributed databases #Computational modeling #Discrete event simulation #Physics