Mixed Hierarchical Networks for Deep Entity Matching

Springer Science and Business Media LLC - 2021

Chenchen Sun¹, Derong Shen²

¹School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China

²School of Computer Science and Engineering, Northeastern University, Shenyang, China

References

Elmagarmid A K, Ipeirotis P G, Verykios V S. Duplicate record detection: A survey. IEEE Trans. Knowledge and Data Engineering, 2007, 19(1): 1-16. https://doi.org/10.1109/TKDE.2007.250581.

Christophides V, Efthymiou V, Palpanas T, Papadakis G, Stefanidis K. An overview of end-to-end entity resolution for big data. ACM Computing Surveys, 2021, 53(6): Article No. 127. https://doi.org/10.1145/3418896.

Papadakis G, Ioannou E, Palpanas T. Entity resolution: Past, present and yet-to-come. In Proc. the 23rd International Conference on Extending Database Technology, March 30–April 2, 2020, pp.647-650. https://doi.org/10.5441/002/edbt.2020.85.

Hernández M A, Stolfo S J. The merge/purge problem for large databases. ACM SIGMOD Record, 1995, 24(2): 127-138. https://doi.org/10.1145/568271.223807.

Singh R, Meduri V, Elmagarmid A, Madden S, Papotti P, Quiané-Ruiz J A, Solar-Lezama A, Tang N. Generating concise entity matching rules. In Proc. the 2017 ACM International Conference on Management of Data, May 2017, pp.1635-1638. https://doi.org/10.1145/3035918.3058739.

Fellegi I P, Sunter A B. A theory for record linkage. Journal of the American Statistical Association, 1969, 64(328): 1183-1210. https://doi.org/10.1080/01621459.1969.10501049.

Konda P, Das S, Suganthan G P et al. Magellan: Toward building entity matching management systems. Proceedings of the VLDB Endowment, 2016, 9(12): 1197-1208. https://doi.org/10.14778/2994509.2994535.

Ebraheem M, Thirumuruganathan S, Joty S, Ouzzani M, Tang N. Distributed representations of tuples for entity resolution. Proceedings of the VLDB Endowment, 2018, 11(11): 1454-1467. https://doi.org/10.14778/3236187.3236198.

Mudgal S, Li H, Rekatsinas T, Doan A, Park Y, Krishnan G, Deep R, Arcaute E, Raghavendra V. Deep learning for entity matching: A design space exploration. In Proc. the 2018 International Conference on Management of Data, May 2018, pp.19-34. https://doi.org/10.1145/3183713.3196926.

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444. https://doi.org/10.1038/nature14539.

Fu C, Han X, Sun L, Chen B, Zhang W, Wu S, Kong H. End-to-end multi-perspective matching for entity resolution. In Proc. the 28th International Joint Conference on Artificial Intelligence, August 2019, pp.4961-4967. https://doi.org/10.24963/ijcai.2019/689.

Zhang D, Nie Y, Wu S, Shen Y, Tan K L. Multi-context attention for entity matching. In Proc. the Web Conference 2020, April 2020, pp.2634-2640. https://doi.org/10.1145/3366423.3380017.

Nie H, Han X, He B, Sun L, Chen B, Zhang W, Wu S, Kong H. Deep sequence-to-sequence entity matching for heterogeneous entity resolution. In Proc. the 28th ACM International Conference on Information and Knowledge Management, November 2019, pp.629-638. https://doi.org/10.1145/3357384.3358018.

Fu C, Han X, He J, Sun L. Hierarchical matching network for heterogeneous entity resolution. In Proc. the 29th International Joint Conference on Artificial Intelligence, July 2020, pp.3665-3671. https://doi.org/10.24963/ijcai.2020/507.

Efthymiou V, Papadakis G, Papastefanatos G, Stefanidis K, Palpanas T. Parallel meta-blocking for scaling entity resolution over big heterogeneous data. Information Systems, 2017, 65: 137-157. https://doi.org/10.1016/j.is.2016.12.001.

Araújo T B, Pires C E, Mestre D G, Nóbrega T P, Nascimento D C, Stefanidis K. A noise tolerant and schema-agnostic blocking technique for entity resolution. In Proc. the 34th ACM/SIGAPP Symposium on Applied Computing, April 2019, pp.422-430. https://doi.org/10.1145/3297280.3299730.

Li Y, Li J, Suhara Y, Doan A, Tan W C. Deep entity matching with pre-trained language models. Proceedings of the VLDB Endowment, 2020, 14(1): 50-60. https://doi.org/10.14778/3421424.3421431.

Brunner U, Stockinger K. Entity matching with transformer architectures—A step forward in data integration. In Proc. the 23rd International Conference on Extending Database Technology, March 30–April 2, 2020, pp.463-473. https://doi.org/10.5441/002/edbt.2020.58.

Thirumuruganathan S, Parambath S P, Ouzzani M, Tang N, Joty S R. Reuse and adaptation for entity resolution through transfer learning. arXiv:1809.11084, 2018. http://arxiv.org/abs/1809.11084, April 2021.

Kasai J, Qian K, Gurajada S, Li Y, Popa L. Low-resource deep entity resolution with transfer and active learning. In Proc. the 57th Conference of the Association for Computational Linguistics, July 2019, pp.5851-5861. https://doi.org/10.18653/v1/P19-1586.

Zhao C, He Y. Auto-EM: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning. In Proc. the 2019 World Wide Web Conference, May 2019, pp.2413-2424. https://doi.org/10.1145/3308558.3313578.

Ganin Y, Lempitsky V. Unsupervised domain adaptation by backpropagation. In Proc. the 32nd International Conference on Machine Learning, July 2015, pp.1180-1189.

Sun C, Shen D. Entity resolution with hybrid attention-based networks. In Proc. the 26th International Conference on Database Systems for Advanced Applications, April 2021, pp.558-565. https://doi.org/10.1007/978-3-030-73197-7_3.

Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E. Hierarchical attention networks for document classification. In Proc. the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2016, pp.1480-1489. https://doi.org/10.18653/v1/N16-1174.

Jiang J Y, Zhang M, Li C, Bendersky M, Golbandi N, Najork M. Semantic text matching for long-form documents. In Proc. the 2019 World Wide Web Conference, May 2019, pp.795-806. https://doi.org/10.1145/3308558.3313707.

Hu D. An introductory survey on attention mechanisms in NLP problems. In Proc. the 2019 Intelligent Systems Conference, September 2019, pp.432-448. https://doi.org/10.1007/978-3-030-29513-4_31.

Mikolov T, Sutskever I, Chen K, Corrado G S, Dean J. Distributed representations of words and phrases and their compositionality. In Proc. the 26th International Conference on Neural Information Processing Systems, December 2013, pp.3111-3119.

Pennington J, Socher R, Manning C D. GloVe: Global vectors for word representation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, October 2014, pp.1532-1543. https://doi.org/10.3115/v1/D14-1162.

Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146. https://doi.org/10.1162/tacl_a_00051.

Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, October 2014, pp.1724-1734. https://doi.org/10.3115/v1/D14-1179.

Lin Z, Feng M, Santos C N, Yu M, Xiang B, Zhou B, Bengio Y. A structured self-attentive sentence embedding. In Proc. the 2017 International Conference on Learning Representations, April 2017.

Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In Proc. the 2015 International Conference on Learning Representations, May 2015.

Tang M, Cai J, Zhuo H. Multi-matching network for multiple choice reading comprehension. In Proc. the 33rd AAAI Conference on Artificial Intelligence, January 27–February 1, 2019, pp.7088-7095. https://doi.org/10.1609/aaai.v33i01.33017088.

Saito K, Watanabe K, Ushiku Y, Harada T. Maximum classifier discrepancy for unsupervised domain adaptation. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2018, pp.3723-3732. https://doi.org/10.1109/CVPR.2018.00392.

Wang J, Li G, Yu J X, Feng J. Entity matching: How similar is similar. Proceedings of the VLDB Endowment, 2010, 4(10): 622-633. https://doi.org/10.14778/2021017.2021020.
