Efficient set containment join
Tóm tắt
Từ khóa
Tài liệu tham khảo
http://liu.cs.uic.edu/download/data/
http://www.cim.mcgill.ca/~dudek/206/Logs/AOL-user-ct-collection
http://www.informatik.uni-freiburg.de/~cziegler/BX/
http://dai-labor.de/IRML/datasets
http://www.discogs.com/
http://www.cs.cmu.edu/~enron
http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html
http://konect.uni-koblenz.de/networks/lkml_person-thread
http://socialnetworks.mpi-sws.org/data-imc2007.html
http://www.clearbits.net/torrents/1881-dec-2011
http://vi.sualize.us/
http://wiki.dbpedia.org/Downloads
Afrati, F.N., Sarma, A.D., Menestrina, D., Parameswaran, A., Ullman, J.D.: Fuzzy joins using mapreduce. In: ICDE, pp. 498–509 (2012)
Agrawal, P., Arasu, A., Kaushik, R.: On indexing error-tolerant set containment. In: SIGMOD, pp. 927–938 (2010)
Arasu, A., Ganti, V., Kaushik, R.: Efficient exact set-similarity joins. In: VLDB, pp. 918–929 (2006)
Baeza-Yates, R., Salinger, A.: A fast set intersection algorithm for sorted sequences. In: CPM (2004)
Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. In: WWW, pp. 131–140 (2007)
Bouros, P., Mamoulis, N., Ge, S., Terrovitis, M.: Set containment join revisited. In: Knowledge and Information Systems, pp. 1–28 (2015)
Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE (2006)
Deng, D., Li, G., Hao, S., Wang, J., Feng, J.: Massjoin: a mapreduce-based method for scalable string similarity joins. In: ICDE, pp. 340–351 (2014)
Deng, D., Li, G., Wen, H., Feng, J.: An efficient partition based method for exact set similarity joins. In: VLDB, pp. 360–371 (2015)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD, pp. 1–12 (2000)
Helmer, S., Moerkotte, G.: Evaluation of main memory join algorithms for joins with set comparison predicates. In: VLDB, pp. 386–395 (1997)
Hmedeh, Z., Kourdounakis, H., Christophides, V., Du Mouza, C., Scholl, M., Travers., N.: Subscription indexes for web syndication systems. In: EDBT, pp. 312–323 (2012)
Hu, X., Tao, Y., Yi, K.: Output-optimal parallel algorithms for similarity joins. In: PODS, pp. 79–90 (2017)
Jampani, R., Pudi, V.: Using prefix-trees for efficiently computing set joins. In: DASFAA, pp. 761–772 (2005)
Kunkel, A., Rheinländer, A., Schiefer, C., Helmer, S., Bouros, P., Leser, U.: Piejoin: towards parallel set containment joins. In: SSDBM, p. 11 (2016)
Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: SIGKDD, pp. 497–506 (2009)
Li, C., Lu, J., Lu, Y.: Efficient merging and filtering algorithms for approximate string searches. In: ICDE, pp. 257–266 (2008)
Luo, Y., Fletcher, G.H., Hidders, J., De Bra, P.: Efficient and scalable trie-based algorithms for computing set containment relations. In: ICDE, pp. 303–314 (2015)
Mamoulis, N.: Efficient processing of joins on set-valued attributes. In: SIGMOD, pp. 157–168 (2003)
Mann, W., Augsten, N., Bouros, P.: An empirical evaluation of set similarity join techniques. In: VLDB, pp. 636–647 (2016)
Melnik, S., Garcia-Molina, H.: Divide-and-conquer algorithm for computing set containment joins. In: EDBT, pp. 427–444 (2002)
Melnik, S., Garcia Molina, H.: Adaptive algorithms for set containment joins. TODS 28(1), 56–99 (2003)
Metwally, A., Faloutsos, C.: V-smart-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors. In: VLDB, pp. 704–715 (2012)
Ramasamy, K., Patel, J.M., Naughton, J.F., Kaushik, R.: Set containment joins: the good, the bad and the ugly. In: VLDB, pp. 351–362 (2000)
Sun, J., Shang, Z., Li, G., Dend, D., Bao, Z.: Dima: a distributed in-memory similarity-based query processing system. In: VLDB, pp. 1925–1928 (2017)
Terrovitis, M., Bouros, P., Vassiliadis, P., Sellis, T., Mamoulis, N.: Efficient answering of set containment queries for skewed item distributions. In: EDBT, pp. 225–236 (2011)
Terrovitis, M., Passas, S., Vassiliadis, P., Sellis, T.: A combination of trie-trees and inverted files for the indexing of set-valued attributes. In: CIKM, pp. 728–737 (2006)
Vernica, R., Carey, M.J., Li, C.: Efficient parallel set-similarity joins using mapreduce. In: SIGMOD, pp. 495–506 (2010)
Wang, J., Feng, J., Li, G.: Trie-join: Efficient trie-based string similarity joins with edit-distance constraints. In: VLDB, pp. 1219–1230 (2010)
Wang, J., Li, G., Feng, J.: Can we beat the prefix filtering? An adaptive framework for similarity join and search. In: SIGMOD, pp. 85–96 (2012)
Wang, X., Qin, L., Lin, X., Zhang, Y., Chang, L.: Leveraging set relations in exact set similarity join. In: VLDB, pp. 925–936 (2017)
Xiao, C., Wang, W., Lin, X., Yu, J.X.: Efficient similarity joins for near duplicate detection. In: WWW, pp. 131–140 (2008)
Yan, T.W., García-Molina, H.: Index structures for selective dissemination of information under the boolean model. TODS 19(2), 332–364 (1994)