Incremental Density-Based Clustering on Multicore Processors

IEEE Transactions on Pattern Analysis and Machine Intelligence - Tập 44 Số 3 - Trang 1338-1356 - 2022

Son T. Mai¹, Jon Jacobsen², Sihem Amer-Yahia³, Ivor Spence¹, Nhat-Phuong Tran¹, Ira Assent², Quoc Viet Hung Nguyen⁴

¹Queen’s University Belfast, Belfast, United Kingdom

²Aarhus University, Aarhus, Denmark

³University of Grenoble Alpes, Saint-Martin-d'Hères, France

⁴Griffith University, Brisbane, Queensland, Australia

Tóm tắt

The density-based clustering algorithm is a fundamental data clustering technique with many real-world applications. However, when the database is frequently changed, how to effectively update clustering results rather than reclustering from scratch remains a challenging task. In this work, we introduce IncAnyDBC, a unique parallel incremental data clustering approach to deal with this problem. First, IncAnyDBC can process changes in bulks rather than batches like state-of-the-art methods for reducing update overheads. Second, it keeps an underlying cluster structure called the object node graph during the clustering process and uses it as a basis for incrementally updating clusters wrt. inserted or deleted objects in the database by propagating changes around affected nodes only. In additional, IncAnyDBC actively and iteratively examines the graph and chooses only a small set of most meaningful objects to produce exact clustering results of DBSCAN or to approximate results under arbitrary time constraints. This makes it more efficient than other existing methods. Third, by processing objects in blocks, IncAnyDBC can be efficiently parallelized on multicore CPUs, thus creating a work-efficient method. It runs much faster than existing techniques using one thread while still scaling well with multiple threads. Experiments are conducted on various large real datasets for demonstrating the performance of IncAnyDBC.

Từ khóa

#Density-based clustering #anytime clustering #incremental clustering #active clustering #multicore CPUs

Tài liệu tham khảo

Aggarwal, 2014, Data Clustering: Algorithms and Applications, 10.1201/b17320 Han, 2012, Data Mining: Concepts and Techniques Ester, Incremental clustering for mining in a data warehousing environment, Proc. 24rd Int. Conf. Very Large Data Bases, 323 10.1007/978-3-319-56608-5_50 10.1145/2505515.2507837 10.1145/3035918.3064050 10.5555/3001460.3001507 10.1145/1247480.1247546 10.1109/ICDM.2012.95 10.1016/j.pmcj.2009.07.005 10.1145/3083897 10.1145/2723372.2737792 Gunawan, 2013, A faster algorithm for DBSCAN 10.1145/2939672.2939750 Frank, 2019, UCI machine learning repository 10.1111/j.1365-2966.2006.11287.x 10.1007/s10618-018-0562-1 10.1145/3068335 10.1007/s10115-016-1004-2 10.1109/SC.2012.9 Tan, 2005, Introduction to Data Mining, (First Edition) 10.1145/2834892.2834894 10.1145/258533.258657 Ackerman, Incremental clustering: The case for extra clusters, Proc. 27th Int. Conf. Neural Inf. Process. Syst., 307 10.1007/978-3-540-24741-8_8 10.1109/ICDE.2017.94 10.1109/TKDE.2018.2828086 10.1137/1.9781611972764.29 10.1145/1552303.1552307 10.1007/978-3-319-55705-2_10 10.1007/978-3-319-44039-2_3 10.1109/ICDM.2004.10082 Böhm, Efficient anytime density-based clustering, Proc. SIAM Int. Conf. Data Mining, 112 10.1007/s10115-014-0797-0 10.1109/DASFAA.2001.916361 10.1109/ICISIP.2004.1287631 10.1007/3-540-36175-8_56 10.1007/978-3-642-37456-2_14 10.1145/2733381 10.1007/11731139_22 10.14778/3021924.3021932 10.1109/SC.2014.51 10.1145/1645953.1646038 10.1109/ICDE.1998.655795 10.1109/ICPADS.2011.83 10.1109/CLOUD.2012.42 10.1145/2503210.2503262 10.1016/j.procs.2013.05.200 10.1145/3132847.3133112 10.1145/3183713.3196887 10.1007/978-3-540-30116-5_23 10.1109/BigData.2018.8622258

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA