Compressing bitmap indexes for faster search operations

Kesheng Wu1, E.J. Otoo2, A. Shoshani2
1Lawrence Berkeley National Laboratory, Berkeley, CA, USA
2Lawrence Berkeley National Laboratory, Berkeley, CA USA

Tóm tắt

We study the effects of compression on bitmap indexes. The main operations on the bitmaps during query processing are bitwise logical operations. Using the general purpose compression schemes the logical operations on the compressed bitmaps are much slower than on the uncompressed bitmaps. Specialized compression schemes, like the byte-aligned bitmap code (BBC), are usually faster in performing logical operations than the general purpose schemes, but in many cases they are still orders of magnitude slower than the uncompressed scheme. To make the compressed bitmap indexes operate more efficiently, we designed a CPU-friendly scheme which we refer to as the word-aligned hybrid code (WAH). Tests on both synthetic and real application data show that the new scheme significantly outperforms well-known compression schemes at a modest increase in storage space. Compared to BBC, WAH performs logical operations about 12 times faster and uses only 60% more space. Compared to the uncompressed scheme, in most test cases WAH is faster while still using less space. We further verified with additional tests that the improvement in logical operation speed translates to similar improvement in query processing speed.

Từ khóa

#Character generation #Data warehouses #Laboratories #Query processing #Logic testing #Contracts #Scientific computing #Performance analysis #Indexing #Databases

Tài liệu tham khảo

10.1145/280277.280279 ishikawa, 1993, Evalution of signature files as set access facilities in OODBs, Proc of 1993 SIGMOD Conf, 247, 10.1145/170035.170076 johnson, 1999, Performance measurements of compressed bitmap indices, Proceedings of VLDB'99, 278 jürgens, 1999, Tree based indexes vs. bitmap indexes - a performance study, Proceedings of DMDW'99 10.1145/354756.354819 lee, 1995, Efficient signature file methods for text retrieval, IEEE Transactions on Knowledge and Data Engineering, 7 gailly, 1998, zlib 1 1 3 manual markl, 2000, Processing relational OLAP queries with UB-trees and multidimensional hierarchical clustering, Proc DMDW 2000 10.1145/133160.133210 o'neil, 1987, Model 204 architecture and performance, Proc 2nd International Workshop on High Performance Transaction Systems, 40 wu, 1998, Encoded bitmap indexing for data warehouses, Proc of IEEE ICDE 1998, 220 bernardo, 2000, Access coordination of tertiary storage for high energy physics applications, IEEE Symposium on Mass Storage Systems, 105 kesheng, 2001, Technical Report LBNL/PUB-3161 antoshenkov, 1996, Query processing and optimization in ORACLE RDB, The VLDB Journal, 5, 229, 10.1007/s007780050026 10.1145/304182.304201 10.1145/276304.276336 10.1145/356770.356776 10.1145/248603.248616 antoshenkov, 1994, Byte-aligned bitmap compression, Technical Report 10.1007/3-540-60584-3_30 amer-yahia, 2000, Optimizing queries on compressed bitmaps, Proceedings of VLDB 2000, 329 10.1145/253260.253268 patterson, 1996, Computer Architecture: A Quantitative Approach 10.1145/211990.212001 stockinger, 2000, Improving the performance of high-energy physics analysis through bitmap indices, Proceedings of DEXA 2000 10.1109/SSDM.1999.787637 wu, 1996, Technical Report RC 20449 wong, 1985, Bit transposed files, Proceedings of VLDB, 85, 448