An efficient multidimensional $L_{\infty }$ wavelet method and its application to approximate query processing

Springer Science and Business Media LLC - Tập 24 - Trang 105-133 - 2020
Xueyan Guo1,2, Tongliang Li2, Xiaoyun Li2, Huanyu Zhao2, Suzhen Wang3, Chaoyi Pang2,4
1School of Mathematical Sciences, Nankai University, Tianjin, China
2Institute of Applied Mathematics, Hebei Academy of Sciences, Shijiazhuang, China
3Hebei University of Economics and Business, Shijiazhuang, China
4School of Computer and Data Engineering, NingboTech University, Ningbo, China

Tóm tắt

Approximate query processing (AQP) has been an effective approach for real-time and online query processing for today’s query systems. It provides approximate but fast query results to users. In wavelet based AQP, queries are executed against the wavelet synopsis which is a lossy, compressed representation of the original data returned by a specific wavelet method. Wavelet synopsis optimized for $L_{\infty }$ -norm error can guarantee approximate error of each individual element, thus it can provide error guaranteed query results for many queries. However, most algorithms for building one dimensional $L_{\infty }$ synopsis are of super linear complexity, which makes the extension to their multidimensional case challengeable. In this paper, we propose an efficient multidimensional wavelet method towards constructing $L_{\infty }$ synopsis and we apply it to AQP. The proposed wavelet method can bound the approximate error of each individual element and it has linear time complexity. It can also provide fast AQP. These good properties are all verified theoretically. Extensive experiments on both synthetic and real-life datasets are presented to show its effectiveness and efficiency for data compression and AQP.

Tài liệu tham khảo

Ajtai, M., Feldman, V., Hassidim, A., Nelson, J.: Sorting and selection with imprecise comparisons. ACM Trans. Algor. 12(2), 1–19 (2016) Cormode, G., Garofalakis, M., Haas, P.J., Jermaine, C.: Synopses for massive data: samples, histograms, wavelets, sketches. Foundations and Trends in Databases 4(1-3), 1–294 (2012) Chakrabarti, K., Garofalakis, M., Rastogi, R., Shim, K.: Approximate query processing using wavelets. The VLDB Journal 10(2-3), 199–223 (2001) Garofalakis, M., Gibbons, P.B.: Wavelet synopses with error guarantees. In: ACM SIGMOD International Conference on Management of Data, pp 476–487 (2002) Garofalakis, M., Gibbons, P.B.: Probabilistic wavelet synopses. ACM Transactions on Database Systems 29(1), 43–90 (2004) Garofalakis, M., Kumar, A.: Wavelet synopses for general error metrics. Acm Transactions on Database Systems 30(4), 888–928 (2005) Guha, S.: Space efficiency in synopsis construction algorithms. In: International Conference on Very Large Data Bases, pp 409–420 (2005) Guha, S., Harb, B.: Wavelet synopsis for data streams: minimizing non-euclidean error. In: Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp 88–97 (2005) Guha, S.: On the space-time of optimal, approximate and streaming algorithms for synopsis construction problems. The VLDB Journal 17(6), 1509–1535 (2008) Guo, X., Zhao, H., Li, X., Li, T., Dai, M.: Eeg signal analysis based on fixed-value shift compression algorithm. In: International Conference on Natural Computation, pp 959–963 (2015) Han, X., Li, J., Gao, H.: Efficiently processing (p, ε)-approximate join aggregation on massive data. Inform. Sci. 278, 773–792 (2014) Jawerth, B., Sweldens, W.: An overview of wavelet based multiresolution analyses. Siam Review 36(3), 377–412 (1994) Jiang, Y., Pang, C., Zhang, H.L., Wang, J., Li, T., Zhang, Q., He, J.: Finding the minimum number of elements with sum above a threshold. Inform. Sci. 238(7), 205–211 (2013) Karras, P., Sacharidis, D., Mamoulis, N.: Exploiting duality in summarization with deterministic guarantees. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 380–389 (2007) Karras, P., Mamoulis, N.: One-pass wavelet synopses for maximum-error metrics. In: International Conference on Very Large Data Bases, pp 421–432 (2005) Liu, J., Yan, D.L.: Answering approximate queries over xml data. IEEE Trans. Fuzzy Syst. 24(2), 288–305 (2016) Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press (1999) Matias, Y., Vitter, J.S., Wang, M.: Wavelet-based histograms for selectivity estimation. ACM SIGMOD Record 27(2), 448–459 (1998) Matias, Y., Vitter, J.S., Wang, M.: Dynamic maintenance of wavelet-based histograms. In: International Conference on Very Large Data Bases, pp 101–110 (2000) Muthukrishnan, S.: Subquadratic algorithms for workload-aware haar wavelet synopses. In: International Conference on Foundations of Software Technology and Theoretical Computer Science, pp 285–296 (2005) Pang, C., Zhang, Q., Zhou, X., Hansen, D., Wang, S., Maeder, A.: Computing unrestricted synopses under maximum error bound. Algorithmica 65(1), 1–42 (2013) Stollnitz, E.J., Derose, T.D., David, H.: Salesin wavelets for computer graphics: theory and applications (1996) Pang, C., Zhang, Q., Hansen, D., Maeder, A.: Unrestricted wavelet synopses under maximum error bound. In: International Conference on Extending Database Technology, pp 732–743 (2009) Vitter, J.S., Wang, M.: Approximate computation of multidimensional aggregates of sparse data using wavelets. Acm Sigmod Record 28(2), 193–204 (1999) Zhang, Q., Pang, C., Hansen, D.: On multidimensional wavelet synopses for maximum error bounds. In: International Conference on Database Systems for Advanced Applications, pp 646–661 (2009) Wang, Y., Xia, Y., Fang, Q., Xu, X.: Aqp++: a hybrid approximate query processing framework for generalized aggregation queries. J. Comput. Sci. 26, 419–431 (2018)