Communication Steps for Parallel Query Processing
Tóm tắt
We study the problem of computing conjunctive queries over large databases on parallel architectures without shared storage. Using the structure of such a query
When the data are free of skew, we obtain essentially tight upper and lower bounds for one round algorithms, and we show how the bounds degrade when there is skew in the data. In the case of skewed data, we show how to improve the algorithms when approximate degrees of the (necessarily small number of) heavy-hitter elements are available, obtaining essentially optimal algorithms for queries such as skewed simple joins and skewed triangle join queries.
For queries that we identify as
Our upper bounds are given by simple structured algorithms using MapReduce. Our one-round lower bounds are proved in a very general model, which we call the
Từ khóa
Tài liệu tham khảo
Foto N. Afrati , Manas R. Joglekar , Christopher Ré , Semih Salihoglu , and Jeffrey D. Ullman . 2017. GYM: A multiround distributed join algorithm . In Proceedings of the 20th International Conference on Database Theory (ICDT’17) . 4:1--4:18. DOI:http://dx.doi.org/10.4230/LIPIcs.ICDT. 2017 .4 10.4230/LIPIcs.ICDT.2017.4 Foto N. Afrati, Manas R. Joglekar, Christopher Ré, Semih Salihoglu, and Jeffrey D. Ullman. 2017. GYM: A multiround distributed join algorithm. In Proceedings of the 20th International Conference on Database Theory (ICDT’17). 4:1--4:18. DOI:http://dx.doi.org/10.4230/LIPIcs.ICDT.2017.4
F. N. Afrati A. D. Sarma S. Salihoglu and J. D. Ullman. 2012. Upper and lower bounds on the cost of a map-reduce computation. CoRR abs/1206.4377 (2012). F. N. Afrati A. D. Sarma S. Salihoglu and J. D. Ullman. 2012. Upper and lower bounds on the cost of a map-reduce computation. CoRR abs/1206.4377 (2012).
Johann Brault-Baron . 2016. Hypergraph acyclicity revisited. ACM Comput. Surv. 49, 3 ( 2016 ), 54:1--54:26. http://doi.acm.org/10.1145/2983573 Johann Brault-Baron. 2016. Hypergraph acyclicity revisited. ACM Comput. Surv. 49, 3 (2016), 54:1--54:26. http://doi.acm.org/10.1145/2983573
S. Chaudhuri. 2012. What next?: A half-dozen data management research goals for big data and the cloud. In PODS. 1--4. S. Chaudhuri. 2012. What next?: A half-dozen data management research goals for big data and the cloud. In PODS. 1--4.
J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In OSDI. 137--150. J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In OSDI. 137--150.
EMC Corporation. Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field. Retrieved from http://www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf. EMC Corporation. Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field. Retrieved from http://www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf.
A. Gál and P. Gopalan. 2007. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. In FOCS. 294--304. A. Gál and P. Gopalan. 2007. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. In FOCS. 294--304.
Sudipto Guha and Zhiyi Huang . 2009. Revisiting the direct sum theorem and space lower bounds in random order streams . In ICALP (LNCS) , Vol. 5555 . Springer , 513--524. Sudipto Guha and Zhiyi Huang. 2009. Revisiting the direct sum theorem and space lower bounds in random order streams. In ICALP (LNCS), Vol. 5555. Springer, 513--524.
Paraschos Koutris , Paul Beame , and Dan Suciu . 2016 . Worst-case optimal algorithms for parallel query processing . In Proceedings of the 19th International Conference on Database Theory (ICDT’16) . 8:1--8:18. DOI:http://dx.doi.org/10.4230/LIPIcs.ICDT.2016.8 10.4230/LIPIcs.ICDT.2016.8 Paraschos Koutris, Paul Beame, and Dan Suciu. 2016. Worst-case optimal algorithms for parallel query processing. In Proceedings of the 19th International Conference on Database Theory (ICDT’16). 8:1--8:18. DOI:http://dx.doi.org/10.4230/LIPIcs.ICDT.2016.8
H. Q. Ngo E. Porat C. Ré and A. Rudra. 2012. Worst-case optimal join algorithms: (Extended abstract). In PODS. 37--48. H. Q. Ngo E. Porat C. Ré and A. Rudra. 2012. Worst-case optimal join algorithms: (Extended abstract). In PODS. 37--48.
Spark. Apache Spark. http://spark.apache.org/. Spark. Apache Spark. http://spark.apache.org/.
Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW. 607--614. Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW. 607--614.