Logical data expiration

D. Toman¹
¹Department of Computer Science, University of Waterloo, Waterloo, Ont., Canada

Abstract

Summary form only given. Data expiration is an essential component of data warehousing solutions: whenever large amounts of data are repeatedly collected over a period of time, it is essential to have a clear approach to identifying the parts of the data that are no longer needed, and a policy for disposing of and/or archiving those parts. Such policies are necessary even if storage could always be added to accommodate an ever-growing collection, because the growing amount of data must be examined during querying, which degrades query performance over time. Approaches to data expiration range from ad hoc administrative policies or regulations to sophisticated techniques based on data analysis. They have, however, one thing in common: intuitively, they all try to identify the parts of the data collection that will not be needed in the future. The key to deciding whether a piece of information will be needed in the future lies in identifying what queries can be asked over the collection and how the collection can evolve from its current state. The techniques proposed in the literature differ in how they identify the parts of the data that are no longer needed. The author formalizes the notion of data expiration in terms of how the data is used to answer queries. He surveys existing approaches to the problem in a unified framework and discusses their features and limits, as well as the limits of data-expiration-based techniques in general. Particular focus is on comparing the performance of various data expiration methods.
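The idea that expiration depends on the admissible queries and on how the data can evolve can be illustrated with a minimal sketch. It is not the author's formalization. The event schema, the single sliding-window query `window_max`, the helper `expire`, and the assumption that the history only grows by appending events with larger timestamps are all hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical schema: (timestamp, value). Assumption: the history evolves
# only by appending events whose timestamps are larger than all earlier ones.
Event = Tuple[int, float]


def window_max(history: List[Event], now: int, window: int) -> Optional[float]:
    """The only admissible query: max value with now - window < t <= now."""
    values = [v for t, v in history if now - window < t <= now]
    return max(values) if values else None


def expire(history: List[Event], now: int, window: int) -> List[Event]:
    """Keep only events that can still fall inside a current or future window."""
    return [(t, v) for t, v in history if t > now - window]


history = [(1, 5.0), (2, 9.0), (6, 3.0), (7, 4.0)]
now, window = 7, 3
residual = expire(history, now, window)

# The query answer is the same on the full and the expired history,
# both now and after the history is extended with a later event.
assert window_max(history, now, window) == window_max(residual, now, window)
later = (8, 1.0)
assert window_max(history + [later], 8, window) == window_max(residual + [later], 8, window)
```

Under these assumptions the events at timestamps 1 and 2 can be discarded because no current or future window can reach them. With a different query class, for example a maximum over the whole history, the same events could not be dropped outright, which is why the set of admissible queries matters.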

Keywords

#Logic #History #Database systems #Computer science #Electronic mail #Warehousing #Data analysis #Aging #Spatial databases #Upper bound
