Big Data Management: What to Keep from the Past to Face Future Challenges?
Abstract
The emergence of new hardware architectures and the continuous production of data open new challenges for data management. It is no longer pertinent to reason with respect to a predefined set of resources (i.e., computing, storage and main memory). Instead, it is necessary to design data processing algorithms and processes considering unlimited resources via the “pay-as-you-go” model. According to this model, resource provisioning must weigh the economic cost of the processes against the use and parallel exploitation of available computing resources. In consequence, new methodologies, algorithms and tools for querying, deploying and programming data management functions have to be provided in scalable and elastic architectures that can cope with the characteristics of Big Data-aware systems (intelligent systems, decision making, virtual environments, smart cities, drug personalization). These functions must respect QoS properties (e.g., security, reliability, fault tolerance, dynamic evolution and adaptability) and behavior properties (e.g., transactional execution) according to application requirements. Mature and novel system architectures propose models and mechanisms for adding these properties to new efficient data management and processing functions delivered as services. This paper gives an overview of the different architectures in which efficient data management functions can be delivered for addressing Big Data processing challenges.