Access Support Tree and TextArray: a data structure for XML document storage and retrieval

D. Scheffner¹, J.-C. Freytag¹
¹Department of Computer Science, Humboldt University of Berlin, Berlin, Germany

Abstract

The characteristics of XML documents require new ways of storing and querying such documents. Queries on both textual content and structural aspects must be supported efficiently. For this reason, we examined existing work on both document storage approaches and models for querying documents to derive requirements that are essential for the storage of XML documents. As a result of our study, we designed the Access Support Tree and TextArray (AST/TA) data structure. The key idea of the AST/TA data structure is the separation of the (logical) structure of a document from its "visible" text content. The latter is represented as a single contiguous string. At the same time, the AST/TA data structure integrates the two parts tightly to guarantee consistent changes. We introduce the AST/TA data structure formally by its abstraction, namely the AST/TA model, and compare the requirements of our AST/TA approach with those found in the current literature. Finally, we describe the advantages of the AST/TA model based on the AST/TA design principles.
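
To make the separation of structure and text more concrete, the following is a minimal Python sketch. It is not the formal AST/TA model from the paper: the names `Node`, `Document`, `content`, and `insert_text` are hypothetical, and the use of half-open character offsets `[start, end)` to link tree nodes to the text string is an assumption made for illustration. The sketch shows only the general idea that the tree holds structure, the text is one contiguous string, and an edit to the text updates the tree in the same step so both stay consistent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Structural node (hypothetical): covers text[start:end], holds no text itself."""
    tag: str
    start: int  # offset of the first covered character
    end: int    # offset one past the last covered character
    children: List["Node"] = field(default_factory=list)


class Document:
    """Pairs a structure tree with one contiguous text string (illustrative only)."""

    def __init__(self, text: str, root: Node) -> None:
        self.text = text
        self.root = root

    def content(self, node: Node) -> str:
        """Return the visible text covered by a node."""
        return self.text[node.start:node.end]

    def insert_text(self, pos: int, s: str) -> None:
        """Insert s at offset pos and shift node offsets in the same operation."""
        self.text = self.text[:pos] + s + self.text[pos:]
        self._shift(self.root, pos, len(s))

    def _shift(self, node: Node, pos: int, delta: int) -> None:
        # Assumed boundary policy: text inserted exactly at a node's start
        # lands before that node; text inserted exactly at its end lands after it.
        if node.start >= pos:
            node.start += delta
        if node.end > pos:
            node.end += delta
        for child in node.children:
            self._shift(child, pos, delta)


def build_example() -> Document:
    """Structure for <doc><greeting>Hello</greeting> <target>world</target></doc>."""
    greeting = Node("greeting", 0, 5)
    target = Node("target", 6, 11)
    root = Node("doc", 0, 11, [greeting, target])
    return Document("Hello world", root)
```

With this sketch, a text-only query (for example a substring search) can run directly on `Document.text` without touching the tree, while a structural query walks the `Node` objects and reads text only through offsets. Under the stated boundary policy, inserting at the very start or end of the root falls outside the root's range, so a fuller design would need an explicit rule for those cases.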

Keywords

#Tree data structures #XML #Information retrieval #Data structures #Content based retrieval #Database languages #Computer science #Internet #Search engines #Merging
