Xử lý giao dịch trong bộ nhớ: Những cân nhắc về hiệu quả và khả năng mở rộng

Knowledge and Information Systems - Tập 61 - Trang 1209-1240 - 2019
Huiqi Hu1, Xuan Zhou1, Tao Zhu1, Weining Qian1, Aoying Zhou1
1School of Data Science and Engineering, East China Normal University, Shanghai, China

Tóm tắt

Các hệ thống OLTP truyền thống lưu trữ trên đĩa chủ yếu được thiết kế cho các máy tính có bộ nhớ tương đối nhỏ. Được thúc đẩy bởi sự tiến bộ của phần cứng, các hệ thống OLTP cần được thiết kế lại cho môi trường bộ nhớ lớn hơn và đa lõi. So với các hệ thống lưu trữ trên đĩa, các hệ thống lưu trữ trong bộ nhớ có những lợi thế về hiệu suất đáng kể, cả về thông lượng giao dịch và độ trễ truy vấn. Hiệu suất của chúng không còn bị giới hạn bởi các thao tác I/O trên đĩa. Thay vào đó, hiệu quả và khả năng mở rộng trên các CPU đa lõi trở nên quan trọng hơn. Trong tài liệu này, chúng tôi khảo sát và tóm tắt một loạt các cân nhắc về thiết kế và triển khai có thể ảnh hưởng đến hiệu quả hoặc khả năng mở rộng của một hệ thống OLTP lưu trữ trong bộ nhớ. Những cân nhắc này liên quan đến hầu hết các thành phần chính của cơ sở dữ liệu, bao gồm kiểm soát đồng truy cập, ghi log, lập chỉ mục và biên dịch giao dịch. Đối với từng thành phần, chúng tôi cung cấp một số phân tích sâu dựa trên các nghiên cứu gần đây. Khảo sát này cũng nhằm cung cấp một số hướng dẫn cho việc thiết kế hoặc triển khai các hệ thống OLTP lưu trữ trong bộ nhớ hiệu suất cao.

Từ khóa

#OLTP #hệ thống trong bộ nhớ #hiệu suất #khả năng mở rộng #kiểm soát đồng truy cập #ghi log #lập chỉ mục #biên dịch giao dịch.

Tài liệu tham khảo

Mohan C, Haderle D, Lindsay B, Pirahesh H, Schwarz P (1992) Aries: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. TODS 17(1):94–162 Alibaba single day record. https://techcrunch.com/2017/11/11/alibaba-smashes-its-singles-day-record/. Accessed 2018 Färber F, Cha SK, Primsch J, Bornhövd C, Sigg S, Lehner W (2012) SAP HANA database: data management for modern business applications. SIGMOD Rec 40(4):45–51 Lee J, Kwon YS, Färber F, Muehle M, Lee C, Bensberg C, Lee JY, Lee AH, Lehner W (2013) SAP HANA distributed in-memory database system: transaction, session, and metadata management. In: ICDE. IEEE, pp 1165–1173 Stonebraker M, Weisberg A (2013) The VoltDB main memory DBMS. IEEE Data Eng Bull 36(2):21–27 Kemper A, Neumann T (2011) Hyper: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: ICDE. IEEE, pp 195–206 Faerber F, Kemper A, Larson P-Å, Levandoski J, Neumann T, Pavlo A et al (2017) Main memory database systems. Found Trends Databases 8(1–2):1–130 Zhang H, Chen G, Ooi BC, Tan K-L, Zhang M (2015) In-memory big data management and processing: a survey. IEEE Trans Knowl Data Eng 27(7):1920–1948 Kung H-T, Robinson JT (1981) On optimistic methods for concurrency control. TODS 6(2):213–226 Tu S, Zheng W, Kohler E, Liskov B, Madden S (2013) Speedy transactions in multicore in-memory databases. In: SOSP. ACM, pp 18–32 Diaconu C, Freedman C, Ismert E, Larson P-A, Mittal P, Stonecipher R, Verma N, Zwilling M (2013) Hekaton: SQL server’s memory-optimized OLTP engine. In: SIGMOD. ACM, pp 1243–1254 Larson P-Å, Blanas S, Diaconu C, Freedman C, Patel JM, Zwilling M (2011) High-performance concurrency control mechanisms for main-memory databases. Proc VLDB Endow 5(4):298–309 Yu X, Pavlo A, Sanchez D, Devadas S (2016) TicToc: time traveling optimistic concurrency control. In: SIGMOD, vol 8, pp 209–220 Wu Y, Chan C-Y, Tan K-L (2016) Transaction healing: scaling optimistic concurrency control on multicores. In: SIGMOD. ACM, pp 1689–1704 Neumann T, Mühlbauer T, Kemper A (2015) Fast serializable multi-version concurrency control for main-memory database systems. In: SIGMOD. ACM, pp 677–689 Loesing S, Pilman M, Etter T, Kossmann D (2015) On the design and scalability of distributed shared-data databases. In: SIGMOD. ACM, pp 663–676 Jordan J, Banerjee J, Batman R (1981) Precision locks. In: SIGMOD. ACM, pp 143–147 Zheng W, Tu S, Kohler E, Liskov B (2014) Fast databases with fast durability and recovery through multicore parallelism. In: OSDI, pp 465–477 Eswaran KP, Gray JN, Lorie RA, Traiger IL (1976) The notions of consistency and predicate locks in a database system. Commun ACM 19(11):624–633 Johnson R, Pandis I, Ailamaki A (2009) Improving OLTP scalability using speculative lock inheritance. Proc VLDB Endow 2(1):479–489 Jung H, Han H, Fekete A, Heiser G, Yeom HY (2014) A scalable lock manager for multicores. TODS 39(4):29 Harizopoulos S, Abadi DJ, Madden S, Stonebraker M (2008) OLTP through the looking glass, and what we found there. In: SIGMOD. ACM, pp 981–992 Ren K, Thomson A, Abadi DJ (2012) Lightweight locking for main memory database systems. In: VLDB. vol 6, pp 145–156 Pandis I, Johnson R, Hardavellas N, Ailamaki A (2010) Data-oriented transaction execution. Proc VLDB Endow 3(1–2):928–939 Xie C, Su C, Littley C, Alvisi L, Kapritsos M, Wang Y (2015) High-performance acid via modular concurrency control. In: SOSP. ACM, pp 279–294 Narula N, Cutler C, Kohler E, Morris R (2014) Phase reconciliation for contended in-memory transactions. In: OSDI, pp 511–524 Shasha D, Llirbat F, Simon E, Valduriez P (1995) Transaction chopping: algorithms and performance studies. TODS 20(3):325–363 Yu X, Bezerra G, Pavlo A, Devadas S, Stonebraker M (2014) Staring into the abyss: an evaluation of concurrency control with one thousand cores. Proc VLDB Endow 8(3):209–220 Agrawal R, Carey MJ, Livny M (1987) Concurrency control performance modeling: alternatives and implications. TODS 12(4):609–654 Thomasian A (1993) Two-phase locking performance and its thrashing behavior. TODS 18(4):579–625 Stonebraker M, Madden S, Abadi DJ, Harizopoulos S, Hachem N, Helland P (2007) The end of an architectural era: (it’s time for a complete rewrite). In: VLDB. VLDB Endowment, pp 1150–1160 Kallman R, Kimura H, Natkins J, Pavlo A, Rasin A, Zdonik S, Jones EP, Madden S, Stonebraker M, Zhang Y et al (2008) H-store: a high-performance, distributed main memory transaction processing system. VLDB 1(2):1496–1499 Faleiro JM, Thomson A, Abadi DJ (2014) Lazy evaluation of transactions in database systems. In: SIGMOD. ACM, pp 15–26 Thomson A, Abadi DJ (2010) The case for determinism in database systems. Proc VLDB Endow 3(1–2):70–80 Thomson A, Diamond T, Weng S-C, Ren K, Shao P, Abadi DJ (2012) CalvIn: fast distributed transactions for partitioned database systems. In: SIGMOD, pp 1–12 Jones EP, Abadi DJ, Madden S (2010) Low overhead concurrency control for partitioned main memory databases. In: SIGMOD. ACM, pp 603–614 Pavlo A, Jones EP, Zdonik S (2011) On predictive modeling for optimizing transaction execution in parallel OLTP systems. Proc VLDB Endow 5(2):85–96 Wu Y, Arulraj J, Lin J, Xian R, Pavlo A (2017) An empirical evaluation of in-memory multi-version concurrency control. Proc VLDB Endow 10(7):781–792 Weikum G, Vossen G (2001) Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery. Elsevier, Amsterdam Diaconu C, Freedman C, Ismert E, Larson P-Å, Mittal P et al (2013) Hekaton: SQL server’s memory-optimized OLTP engine. In: SIGMOD, pp 1243–1254 Berenson H, Bernstein P, Gray J, Melton J, O’Neil E, O’Neil P (1995) A critique of ANSI SQL isolation levels. SIGMOD Rec 24:1–10 Fekete A, Liarokapis D, O’Neil E, O’Neil P, Shasha D (2005) Making snapshot isolation serializable. TODS 30(2):492–528 Jorwekar S, Fekete A, Ramamritham K, Sudarshan S (2007) Automating the detection of snapshot isolation anomalies. In: VLDB, pp 1263–1274 Cahill MJ, Röhm U, Fekete AD (2009) Serializable isolation for snapshot databases. DMoNH 34(4):20 Revilak S, O’Neil P, O’Neil E (2011) Precisely serializable snapshot isolation (PSSI). In: ICDE. IEEE, pp 482–493 Ports DR, Grittner K (2012) Serializable snapshot isolation in PostgreSQL. Proc VLDB Endow 5(12):1850–1861 Wang T, Johnson R, Fekete A, Pandis I (2015) The serial safety net: efficient concurrency control on modern hardware. In: DMoNH. ACM, p 8 Adya A, Liskov BH (1999) Weak consistency: a generalized theory and optimistic implementations for distributed transactions. Doctoral dissertation, Massachusetts Institute of Technology Kim K, Wang T, Johnson R, Pandis I (2016) ERMIA: fast memory-optimized database system for heterogeneous workloads. In: SIGMOD. ACM, pp 1675–1687 Jung H, Han H, Fekete A, Röhm U, Yeom HY (2013) Performance of serializable snapshot isolation on multicore servers. In: DASFAA (2), Lecture Notes in Computer Science. Springer, pp 416–430 Han H, Park S, Jung H, Fekete A, Rohm U, Yeom HY (2014) Scalable serializable snapshot isolation for multicore systems. In: ICDE Malviya N, Weisberg A, Madden S, Stonebraker M (2014) Rethinking main memory OLTP recovery. In: ICDE. IEEE, pp 604–615 Yao C, Agrawal D, Chen G, Ooi BC, Wu S (2016) Adaptive logging: optimizing logging and recovery costs in distributed in-memory databases. In: SIGMOD. ACM, pp 1119–1134 Johnson R, Pandis I, Stoica R, Athanassoulis M, Ailamaki A (2010) Aether: a scalable approach to logging. Proc VLDB Endow 3(1–2):681–692 Wang T, Johnson R (2014) Scalable logging through emerging non-volatile memory. Proc VLDB Endow 7(10):865–876 Helland P, Sammer H, Lyon J, Carr R, Garrett P, Reuter A (1989) Group commit timers and high volume transaction systems. In: High performance transaction systems. Springer, pp 301–329 Jung H, Han H, Kang S (2017) Scalable database logging for multicores. PVLDB 11(2):135–148 Fang R, Hsiao H-I, He B, Mohan C, Wang Y (2011) High performance database logging using storage class memory. In: ICDE. IEEE, pp 1221–1231 Huang J, Schwan K, Qureshi MK (2014) NVRAM-aware logging in transaction systems. Proc VLDB Endow 8(4):389–400 NVM (2018) https://en.wikipedia.org/wiki/Non-volatile_memory. Accessed 2018 Wu Y, Guo W, Chan C, Tan K (2017) Fast failure recovery for main-memory DBMSs on multicores. In: SIGMOD, pp 267–281 Arulraj J, Perron M, Pavlo A (2016) Write-behind logging. Proc VLDB Endow 10(4):337–348 Comer D (1979) The ubiquitous b-tree. ACM Comput Surv 11(2):121–137 Rao J, Ross KA (2000) Making b+-trees cache conscious in main memory. SIGMOD Rec 29:475–486 Rao J, Ross KA (1999) Cache conscious indexing for decision-support in main memory. In: VLDB, pp 78–89 Schlegel B, Gemulla R, Lehner W (2009) K-ary search on modern processors. In: Proceedings of the fifth international workshop on data management on new hardware, DaMoN ’09. ACM, New York, pp 52–60 Kim C, Chhugani J, Satish N, Sedlar E, Nguyen AD, Kaldewey T, Lee VW, Brandt SA, Dubey P (2010) Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In: SIGMOD, ACM, pp 339–350 Mao Y, Kohler E, Morris RT (2012) Cache craftiness for fast multicore key-value storage. In: EuroSys. ACM, pp 183–196 Kraska T, Beutel A, Chi EH, Dean J, Polyzotis N (2018) The case for learned index structures. In: SIGMOD, pp 489–504 Leis V, Kemper A, Neumann T (2013) The adaptive radix tree: artful indexing for main-memory databases. In: ICDE. IEEE, pp 38–49 Tree R. https://en.wikipedia.org/wiki/trie Böhm M, Schlegel B, Volk PB, Fischer U, Habich D, Lehner W (2011) Efficient in-memory indexing with generalized prefix trees. In: BTW, Germany, pp 227–246 Zhang H, Andersen DG, Pavlo A, Kaminsky M, Ma L, Shen R (2016) Reducing the storage overhead of main-memory OLTP databases with hybrid indexes. In: SIGMOD. ACM, pp 1567–1581 Wang Z, Pavlo A, Lim H, Leis V, Zhang H, Kaminsky M, Andersen DG (2018) Building a Bw-tree takes more than just buzz words. In: SIGMOD conference. ACM, pp 473–488 Binna R, Zangerle E, Pichl M, Specht G, Leis V (2018) HOT: a height optimized Trie index for main-memory database systems. In: SIGMOD, pp 521–534 Zhang H, Lim H, Leis V, Andersen DG, Kaminsky M, Keeton K, Pavlo A (2018) Surf: practical range query filtering with fast succinct tries. In: SIGMOD, pp 323–336 Jacobson G (1989) Space-efficient static trees and graphs. In: 30th annual symposium on foundations of computer science, Research Triangle Park, NC, USA, 30 Oct–1 Nov 1989, pp 549–554 Stoica R, Ailamaki A (2013) Enabling efficient OS paging for main-memory OLTP databases. In: DaMoN, p 7 Funke F, Kemper A, Neumann T (2012) Compacting transactional data in hybrid OLTP&OLAP databases. Proc VLDB Endow 5(11):1424–1435 Eldawy A, Levandoski J, Larson P-Å (2014) Trekking through siberia: managing cold data in a memory-optimized database. Proc VLDB Endow 7(11):931–942 Levandoski JJ, Larson P-Å, Stoica R (2013) Identifying hot and cold data in main-memory databases. In: ICDE. IEEE, pp 26–37 Cache replacement. https://en.wikipedia.org/wiki/cache-replacement-policies. Accessed 2018 DeBrabant J, Pavlo A, Tu S, Stonebraker M, Zdonik S (2013) Anti-caching: a new approach to database management system architecture. Proc VLDB Endow 6(14):1942–1953 Lang H, Mühlbauer T, Funke F, Boncz PA, Neumann T, Kemper A (2016) Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD. ACM, pp 311–326 Levandoski JJ, Lomet DB, Sengupta S (2013) The Bw-tree: a b-tree for new hardware platforms. In: ICDE. IEEE, pp 302–313 Graefe G (2010) A survey of b-tree locking techniques. TODS 35(3):16 Lehman PL et al (1981) Efficient locking for concurrent operations on b-trees. TODS 6(4):650–670 Cha SK, Hwang S, Kim K, Kwon K (2001) Cache-conscious concurrency control of main-memory indexes on shared-memory multiprocessor systems. In: VLDB, vol 1, pp 181–190 Sewall J, Chhugani J, Kim C, Satish N, Dubey P (2011) PALM: parallel architecture-friendly latch-free modifications to b+ trees on many-core processors. Proc VLDB Endow 4(11):795–806 Neumann T (2011) Efficiently compiling efficient query plans for modern hardware. Proc VLDB Endow 4(9):539–550 Graefe G (1994) Volcano: an extensible and parallel query evaluation system. IEEE Trans Knowl Data Eng 6(1):120–135 Yan C, Cheung A (2016) Leveraging lock contention to improve OLTP application performance. Proc VLDB Endow 9(5):444–455 Wang Z, Mu S, Cui Y, Yi H, Chen H, Li J (2016) Scaling multicore databases via constrained parallel execution. In: SIGMOD. ACM, pp 1643–1658 Wang Z, Mu S, Cui Y, Yi H, Chen H, Li J (2016) Scaling multicore databases via constrained parallel execution. In: SIGMOD, ACM. New York, NY, pp 1643–1658 Mu S, Cui Y, Zhang Y, Lloyd W, Li J (2014) Extracting more concurrency from distributed transactions. In: OSDI, pp 479–494 Curino C, Jones E, Zhang Y, Madden S (2010) Schism: a workload-driven approach to database replication and partitioning. Proc VLDB Endow 3(1–2):48–57 Pavlo A, Curino C, Zdonik S (2012) Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In: SIGMOD. ACM, pp 61–72 Taft R, Mansour E, Serafini M, Duggan J, Elmore AJ, Aboulnaga A, Pavlo A, Stonebraker M (2014) E-store: fine-grained elastic partitioning for distributed transaction processing systems. Proc VLDB Endow 8(3):245–256 Two-phase commit protocol. https://en.wikipedia.org/wiki/two-phase-commit-protocol. Accessed 2018 Lin Q, Chang P, Chen G, Ooi BC, Tan K-L, Wang Z (2016) Towards a non-2pc transaction management in distributed database systems. In: SIGMOD Ren K, Thomson A, Abadi DJ (2014) An evaluation of the advantages and disadvantages of deterministic database systems. Proc VLDB Endow 7(10):821–832 Ailamaki A, DeWitt DJ, Hill MD, Wood DA (1999) DBMSs on a modern processor: where does time go? In: VLDB, pp 266–277 Sirin U, Tözün P, Porobic D, Ailamaki A (2016) Micro-architectural analysis of in-memory OLTP. In: SIGMOD, Vol. 215922 Tözün P, Gold B, Ailamaki A (2013) OLTP in wonderland: where do cache misses come from in major OLTP components?. In: DaMoN. ACM, p 8 Miller JE, Kasture H, Kurian G, Gruenwald C, Beckmann N, Celio C, Eastep J, Agarwal A (2010) Graphite: a distributed parallel simulator for multicores. In: HPCA-16. IEEE, pp 1–12 Salomie T-I, Subasu IE, Giceva J, Alonso G (2011) Database engines on multicores, why parallelize when you can distribute? In: EuroSys. ACM, pp 17–30 Appuswamy R, Anadiotis A, Porobic D, Iman M, Ailamaki A (2017) Analyzing the impact of system architecture on the scalability of OLTP engines for high-contention workloads. Proc VLDB Endow 11(2):121–134 Mühlbauer T, Rödiger W, Reiser A, Kemper A, Neumann T (2013) ScyPer: elastic OLAP throughput on transactional data. In: DanaC. ACM, pp 11–15 Lahiri T, Neimat M-A, Folkman S (2013) Oracle TimesTen: an in-memory database for enterprise applications. IEEE Data Eng Bull 36(2):6–13 Lindström J, Raatikka V, Ruuth J, Soini P, Vakkila K (2013) IBM solidDB: in-memory database optimized for extreme speed and availability. IEEE Data Eng Bull 36(2):14–20 MemSQL Inc., MemSQL. http://www.memsql.com. Accessed 2018 Freedman C, Ismert E, Larson P-Å (2014) Compilation in the Microsoft SQL Server Hekaton engine. IEEE Data Eng Bull 37(1):22–30 Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO. IEEE, pp 75–86 Fredkin E (1960) Trie memory. CACM 3(9):490–499 Wolski A, Raatikka V (2006) Performance measurement and tuning of hot-standby databases. In: International service availability symposium. Springer, pp 149–161 Chan C-Y, Ioannidis YE (1998) Bitmap index design and evaluation. SIGMOD Rec 27:355–366 Bernstein PA, Hadzilacos V, Goodman N (1987) Concurrency control and recovery in database systems. Addison-Wesley, Boston Altibase. Altibase administrator’s manual release Pavlo A, Angulo G, Arulraj J, Lin H, Lin J, Ma L, Menon P, Mowry TC, Perron M, Quah I, Santurkar S, Tomasic A, Toor S, Aken DV, Wang Z, Wu Y, Xian R, Zhang T (2017) Self-driving database management systems. In: CIDR. www.cidrdb.org Peloton. https://github.com/cmu-db/peloton. Accessed 2018 Sikka V, Färber F, Lehner W, Cha SK, Peh T, Bornhövd C (2012) Efficient transaction processing in SAP HANA database: the end of a column store myth. In: SIGMOD. ACM, pp 731–742 Wang T, Kimura H (2016) Mostly-optimistic concurrency control for highly contended dynamic workloads on a thousand cores. Proc VLDB Endw 10(2):49–60 Pavlo A (2017) What are we doing with our lives? Nobody cares about our concurrency control research. In: SIGMOD, p 3 Zhu T, Wang D, Hu H, Qian W, Wang X, Zhou A (2018) Interactive transaction processing for in-memory database system. In: DASFAA, part II, pp 228–246 Gray J et al (1996) The dangers of replication and a solution. SIGMOD Rec 25(2):173–182 Decandia G et al (2007) Dynamo: Amazon’s highly available key-value store. In: SOSP Cassandra website. http://cassandra.apache.org/. Accessed 2018 Bailis P, Davidson A, Fekete A et al (2013) Highly available transactions: virtues and limitations. Proc VLDB Endow 7(3):181–192 Lamport L (1998) The part-time parliament. TOCS 16(2):133–169 Lamport L (2001) Paxos made simple. ACM SIGACT News 32(4):18–25 Baker J, Bond C, Corbett JC et al (2011) Megastore: providing scalable, highly available storage for interactive services. In: CIDR, pp 223–234 Corbett JC, Jeffrey D et al (2013) Spanner: Googles globally distributed database. TOCS 31(3):8 Rao J, Shekita EJ, Tata S (2011) Using paxos to build a scalable, consistent, and highly available datastore. In: VLDB, pp 243–254