Big Data Management: What to Keep from the Past to Face Future Challenges?

Data Science and Engineering - Volume 2 - Pages 328-345 - 2017
G. Vargas-Solar1, J. L. Zechinelli-Martini2, J. A. Espinosa-Oviedo3
1Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, LAFMIA, Grenoble, France
2Fundación Universidad de las Américas, Puebla, Puebla, Mexico
3Barcelona Supercomputing Center, LAFMIA, Barcelona, Spain

Abstract

The emergence of new hardware architectures and the continuous production of data open new challenges for data management. It is no longer pertinent to reason with respect to a predefined set of resources (i.e., computing, storage and main memory). Instead, data processing algorithms and processes must be designed assuming unlimited resources obtained through the “pay-as-you-go” model. Under this model, resource provisioning must weigh the economic cost of a process against the use and parallel exploitation of the available computing resources. Consequently, new methodologies, algorithms and tools for querying, deploying and programming data management functions have to be provided in scalable and elastic architectures that can cope with the characteristics of Big Data-aware systems (intelligent systems, decision making, virtual environments, smart cities, drug personalization). These functions must respect QoS properties (e.g., security, reliability, fault tolerance, dynamic evolution and adaptability) and behavior properties (e.g., transactional execution) according to application requirements. Both mature and novel system architectures propose models and mechanisms for adding these properties to new, efficient data management and processing functions delivered as services. This paper gives an overview of the different architectures in which efficient data management functions can be delivered to address Big Data processing challenges.
