Data management in cloud environments: NoSQL and NewSQL data stores

Katarina Grolinger¹, Wilson A. Higashino², Abhinav Tiwari¹, Miriam A. M. Capretz¹
¹ Department of Electrical and Computer Engineering, Faculty of Engineering, Western University, London, Canada N6A 5B9
² Department of Electrical and Computer Engineering, Faculty of Engineering, Western University, London, Canada N6A 5B9 and Instituto de Computação, Universidade Estadual de Campinas, Camp ...

Abstract

Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and face challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objective of: (1) providing a perspective on the field, (2) providing guidance to practitioners and researchers in choosing the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared with a focus on data models, querying, scaling, and security-related capabilities. Features driving the ability to scale read requests, write requests, and data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed, and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages.
