Accelerating Progressive Set Similarity Join with the CPU-GPU Architecture

Big Data Research - Volume 26 - Page 100267 - 2021
Lining Yu¹, Tiezheng Nie¹, Derong Shen¹, Yue Kou¹
¹ College of Computer Science and Technology, Northeastern University, China

References

Chaudhuri, 2006, A primitive operator for similarity joins in data cleaning, 5
Bayardo, 2007, Scaling up all pairs similarity search, 131
Xiao, 2011, Efficient similarity joins for near-duplicate detection, TODS, 36, 15, 10.1145/2000824.2000825
Arasu, 2006, Efficient exact set-similarity joins, 918
Mann, 2014, PEL: position-enhanced length filter for set similarity joins, 89
Mann, 2016, An empirical evaluation of set similarity join techniques, Proc. VLDB Endow., 9, 636, 10.14778/2947618.2947620
Vernica, 2010, Efficient parallel set-similarity joins using MapReduce, 6
Deng, 2018, Overlap set similarity joins with theoretical guarantees, 905
Wang, 2012, Can we beat the prefix filtering?: an adaptive framework for similarity join and search, 85
Wang, 2019, Leveraging set relations in exact and dynamic set similarity join, VLDB J., 28, 267, 10.1007/s00778-018-0529-2
Ma, 2019, Similarity histogram estimation based top-k similarity join algorithm on high-dimensional data, vol. 11817
Zhou, 2018, A generic inverted index framework for similarity search on the GPU
Sandes, 2017
Li, 2018, A GPU accelerated update efficient index for kNN queries in road networks
Kruliš, 2015, Optimizing sorting and top-k selection steps in permutation based indexing on GPUs
Wang, 2017
Gowanlock, 2016, Distance threshold similarity searches: efficient trajectory indexing on the GPU, IEEE Trans. Parallel Distrib. Syst., 27, 2533, 10.1109/TPDS.2015.2500896
Papenbrock, 2015, Progressive duplicate detection, IEEE Trans. Knowl. Data Eng., 27, 1316, 10.1109/TKDE.2014.2359666
Whang, 2013, Pay-as-you-go entity resolution, IEEE Trans. Knowl. Data Eng., 25, 1111, 10.1109/TKDE.2012.43
Simonini, 2019, Schema-agnostic progressive entity resolution, IEEE Trans. Knowl. Data Eng., 31, 1208, 10.1109/TKDE.2018.2852763
Cai, 2020, Target-aware holistic influence maximization in spatial social networks, IEEE Trans. Knowl. Data Eng., early access, 10.1109/TKDE.2020.3003047
Hernández, 1995, The merge/purge problem for large databases, 127
Bloom, 1970, Space/time tradeoffs in hash coding with allowable errors, Commun. ACM, 13, 422, 10.1145/362686.362692
Christen, 2012, A survey of indexing techniques for scalable record linkage and deduplication, IEEE Trans. Knowl. Data Eng., 24, 1537, 10.1109/TKDE.2011.127
Nvidia, 2017
Yu, 2020, An approach for progressive set similarity join with GPU accelerating, 155
Zhao, 2021, Deep Attributed Network Representation Learning of Complex Coupling and Interaction, Knowl.-Based Syst., 212
Wang, 2020, Distributed Pregel-Based Provenance-Aware Regular Path Query Processing on RDF Knowledge Graphs, World Wide Web J., 23, 1465, 10.1007/s11280-019-00739-0