SOFA: An extensible logical optimizer for UDF-heavy data flows
References
J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, in: Proceedings of the Symposium on Operating Systems Design and Implementation, 2004, pp. 137–150.
Sakr, 2011, A survey of large scale data management approaches in cloud environments, IEEE Commun. Surv. Tutor., 13, 311, 10.1109/SURV.2011.032211.00087
Beyer, 2011, JAQL, Proc. VLDB Endow., 4, 1272, 10.14778/3402755.3402761
A. Heise, A. Rheinländer, M. Leich, U. Leser, F. Naumann, Meteor/Sopremo: an extensible query language and operator model, in: Proceedings of the Int. Workshop on End-to-End Management of Big Data (BigData) in conjunction with VLDB, 2012.
C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig Latin: a not-so-foreign language for data processing, in: Proceedings of the International Conference on Management of Data, 2008, pp. 1099–1110.
M.J. Cafarella, C. Ré, Manimal: relational optimization for data-intensive programs, in: Proceedings of the ACM SIGMOD Workshop on the Web and Databases, 2010, pp. 10:1–10:6.
Hueske, 2012, Opening the black boxes in data flow optimization, Proc. VLDB Endow., 5, 1256, 10.14778/2350229.2350244
S. Wu, F. Li, S. Mehrotra, B.C. Ooi, Query optimization for massively parallel data processing, in: Proceedings of International Symposium on Cloud Computing, 2011, pp. 12:1–12:13.
Alexandrov, 2014, The Stratosphere platform for big data analytics, VLDB J., 1
M.T. Roth, P.M. Schwarz, Don't scrap it, wrap it! A wrapper architecture for legacy data sources, in: Proceedings of the International Conference on Very Large Databases, 1997, pp. 266–275.
Graefe, 1994, Volcano—an extensible and parallel query evaluation system, IEEE Trans. Knowl. Data Eng., 6, 120, 10.1109/69.273032
M.A. Hernández, S.J. Stolfo, The merge/purge problem for large databases, in: SIGMOD Conference, 1995, pp. 127–138.
Dantsin, 2001, Complexity and expressive power of logic programming, ACM Comput. Surv., 33, 374, 10.1145/502807.502810
Graefe, 1995, The Cascades framework for query optimization, IEEE Data Eng. Bull., 18, 19
L.M. Haas, J.C. Freytag, G.M. Lohman, H. Pirahesh, Extensible query processing in Starburst, in: Proceedings of the International Conference on Management of Data, 1989, pp. 377–388.
H. Pirahesh, J.M. Hellerstein, W. Hasan, Extensible/rule based query rewrite optimization in Starburst, in: Proceedings of the International Conference on Management of Data, 1992, pp. 39–48.
S. Chaudhuri, K. Shim, Query optimization in the presence of foreign functions, in: Proceedings of the International Conference on Very Large Databases, 1993, pp. 529–542.
Chaudhuri, 1999, Optimization of queries with user-defined predicates, ACM Trans. Database Syst., 24, 177, 10.1145/320248.320249
J.M. Hellerstein, M. Stonebraker, Predicate migration: optimizing queries with expensive predicates, in: Proceedings of the International Conference on Management of Data, 1993, pp. 267–276.
U. Srivastava, K. Munagala, J. Widom, R. Motwani, Query optimization over web services, in: Proceedings of the International Conference on Very Large Databases, 2006, pp. 355–366.
Ogasawara, 2011, An algebraic approach for data-centric scientific workflows, Proc. VLDB Endow., 4, 1328, 10.14778/3402755.3402766
A. Simitsis, P. Vassiliadis, T.K. Sellis, Optimizing ETL processes in data warehouses, in: Proceedings of the International Conference on Data Engineering, 2005, pp. 564–575.
Lim, 2012, Stubby, Proc. VLDB Endow., 5, 1196, 10.14778/2350229.2350239
L. Fegaras, C. Li, U. Gupta, An optimization framework for map-reduce queries, in: Proceedings of the International Conference on Extending Database Technology, 2012, pp. 26–37.
J. Zhang, H. Zhou, R. Chen, X. Fan, Z. Guo, H. Lin, J.Y. Li, W. Lin, J. Zhou, L. Zhou, Optimizing data shuffling in data-parallel computation by understanding user-defined functions, in: Proceedings of the USENIX Conference on Networked Systems Design and Implementation, 2012, p. 22.
Thusoo, 2009, Hive, Proc. VLDB Endow., 2, 1626, 10.14778/1687553.1687609
Y. Sagiv, Optimizing datalog programs, in: Proceedings of the Symposium on Principles of Database Systems, 1987, pp. 349–362.
Heise, 2012, Integrating open government data with Stratosphere for more transparency, Web Semant., 14, 45, 10.1016/j.websem.2012.02.002