MapReduce-based adaptive random forest algorithm for multi-label classification

Neural Computing and Applications - Volume 31 - Pages 8239-8252 - 2018

Qinghua Wu¹, Haihui Wang¹, Xuesong Yan²,³, Xiaobo Liu⁴

¹Faculty of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, China

²School of Computer Science, China University of Geosciences, Wuhan, China

³State Key Lab of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, China

⁴School of Automation, China University of Geosciences, Wuhan, China

Abstract

Because data characteristics are complex, scholars have proposed multi-label learning in data mining to address the problem of extracting knowledge in the era of big data. The complexity of modern data structures means that traditional single-label learning methods can no longer meet the needs of technological development, and the importance of multi-label learning is becoming increasingly evident. The random forest (RF) algorithm is regarded as one of the best classification algorithms. In this study, the traditional decision tree algorithm was improved, and the traditional RF method was converted into an adaptive RF (ARF) method for multi-label classification. Experiments verified the effectiveness of the proposed method. The RF method may not be able to classify massive data in a short time, but Hadoop, developed by Apache, is well suited to data-intensive tasks. On this basis, we modified the MapReduce programming model to suit the proposed ARF method. The method was implemented on a cloud platform, and experiments verified the time efficiency of the parallel model.
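The abstract does not describe how the ARF trees are built or how the MapReduce job is organized, so the following is only a minimal sketch of the general idea, under several assumptions: each "tree" is a depth-1 stump with a randomly chosen feature (a stand-in, not the paper's improved decision tree), the map and reduce steps are plain Python functions rather than Hadoop jobs, and all names (`train_stump`, `map_partition`, `reduce_trees`, `predict`) and the toy data are hypothetical.

```python
import random
from statistics import median

def label_means(rows):
    """Per-label fraction of positive labels over (x, y) rows."""
    n_labels = len(rows[0][1])
    return [sum(y[j] for _, y in rows) / len(rows) for j in range(n_labels)]

def train_stump(rows, rng):
    """Depth-1 multi-label tree: random feature, median split, label-mean leaves."""
    f = rng.randrange(len(rows[0][0]))
    t = median(x[f] for x, _ in rows)
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    # An empty side falls back to the statistics of the whole sample.
    return f, t, label_means(left or rows), label_means(right or rows)

def map_partition(partition, n_trees, seed):
    """Map step: emit ("forest", tree) pairs grown on bootstrap samples."""
    rng = random.Random(seed)
    for _ in range(n_trees):
        boot = [rng.choice(partition) for _ in partition]
        yield "forest", train_stump(boot, rng)

def reduce_trees(pairs):
    """Reduce step: merge all trees emitted under the same key into one list."""
    merged = {}
    for key, tree in pairs:
        merged.setdefault(key, []).append(tree)
    return merged

def predict(forest, x):
    """Average each label's leaf score over all trees; threshold at 0.5."""
    leaves = [left if x[f] <= t else right for f, t, left, right in forest]
    return [int(sum(col) / len(leaves) >= 0.5) for col in zip(*leaves)]

# Toy data (assumed): label 0 is on when x[0] > 0.5, label 1 when x[1] > 0.5.
data = [([i / 10 + 0.05, j / 10 + 0.05], [int(i >= 5), int(j >= 5)])
        for i in range(10) for j in range(10)]
partitions = [data[k::4] for k in range(4)]  # stand-in for distributed input splits

pairs = [kv for k, part in enumerate(partitions)
         for kv in map_partition(part, n_trees=10, seed=k)]
forest = reduce_trees(pairs)["forest"]
print(len(forest), predict(forest, [0.95, 0.05]))
```

Because each map call only sees its own partition, tree construction can in principle run in parallel across workers, and the reduce step only has to collect the finished trees. Each tree predicts all labels at once, so a single forest handles the multi-label output without training one classifier per label.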

Keywords

#multi-label classification #random forest algorithm #machine learning #MapReduce #big data
