Nội dung được dịch bởi AI, chỉ mang tính chất tham khảo

Thuật toán làm mát giả lập dựa trên trục xoay để xác định các phân tách chéo cho việc tạo ra cây quyết định

Computational Statistics - 2024

Ferdinand Bollwein¹

¹Institute of Mathematics, Clausthal University of Technology, Erzstraße 1, 38678, Clausthal-Zellerfeld, Germany

Tóm tắt

Tóm tắtChúng tôi mô tả một thuật toán làm mát giả lập mới nhằm tính toán các phân tách chéo gần tối ưu trong bối cảnh xây dựng cây quyết định. Thuật toán này có thể được hiểu như là một cuộc hành trình trên các tế bào của một sắp xếp siêu phẳng được xác định bởi các quan sát trong tập dữ liệu huấn luyện. Các tế bào của sắp xếp siêu phẳng này tương ứng với các tập con của các phân tách chéo mà chia nhỏ không gian đặc trưng theo cùng một cách, và các đỉnh của sắp xếp này tiết lộ nhiều giải pháp lân cận. Chúng tôi sử dụng một chiến lược xoay trục để lặp qua các đỉnh và khám phá khu vực lân cận này. Việc nhúng tìm kiếm khu vực lân cận vào khung làm mát giả lập cho phép thoát khỏi các cực địa phương và tăng khả năng tìm kiếm giải pháp tối ưu toàn cục. Để khắc phục các vấn đề liên quan đến đồng nhất, chúng tôi dựa vào một sơ đồ xoay trục từ điển. Kết quả thí nghiệm của chúng tôi cho thấy phương pháp của chúng tôi phù hợp với việc tạo ra các cây quyết định nhỏ và chính xác, đồng thời có khả năng vượt trội hơn so với các thuật toán tạo ra cây quyết định đơn biến và chéo hiện có. Hơn nữa, các cây quyết định chéo thu được bằng phương pháp này cạnh tranh với các mô hình dự đoán phổ biến khác.

Từ khóa

Tài liệu tham khảo

Avis D (2000) A revised implementation of the reverse search vertex enumeration algorithm. In: Kalai G, Ziegler GM (eds) Polytopes—combinatorics and Computation. Birkhäuser Basel, Basel, pp 177–198. https://doi.org/10.1007/978-3-0348-8438-9_9

Avis D, Fukuda K (1992) A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discret Comput Geom 8(3):295–313. https://doi.org/10.1007/BF02293050

Bertsimas D, Dunn J (2017) Optimal classification trees. Mach Learn 106(7):1039–1082. https://doi.org/10.1007/s10994-017-5633-9

Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2020) Sparsity in optimal randomized classification trees. Eur J Oper Res 284(1):255–272. https://doi.org/10.1016/j.ejor.2019.12.002

Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2021) Optimal randomized classification trees. Comp Oper Res 132:105281. https://doi.org/10.1016/j.cor.2021.105281

Bollwein F, Westphal S (2022) Oblique decision tree induction by cross-entropy optimization based on the von Mises–Fisher distribution. Comput Stat. https://doi.org/10.1007/s00180-022-01195-7

Bradley PS, Mangasarian OL (1998) Feature selection via concave minimization and support vector machines. In: Proceedings of the 15th international conference on machine learning, pp 82–90. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ICML ’98

Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman and Hall/CRC, London. https://doi.org/10.1201/9781315139470

Cantú-Paz E, Kamath C (2003) Inducing oblique decision trees with evolutionary algorithms. IEEE Trans Evol Comput 7(1):54–68. https://doi.org/10.1109/TEVC.2002.806857

Dantzig G, Orden A, Wolfe P (1955) The generalized simplex method for minimizing a linear form under linear inequality restraints. Pac J Math 5(2):183–195. https://doi.org/10.2140/pjm.1955.5.183

Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

Dunn JW (2018) Optimal trees for prediction and prescription. PhD thesis, Massachusetts Institute of Technology

Edelsbrunner H (2012) Algorithms in combinatorial geometry, vol 10. Springer, Berlin

Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x

Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701. https://doi.org/10.1080/01621459.1937.10503522

Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86–92. https://doi.org/10.1214/aoms/1177731944

Gendreau M, Potvin JY et al (2010) Handbook of metaheuristics, vol 2. Springer, Berlin

Grötschel M, Lovász L, Schrijver A (2012) Geometric algorithms and combinatorial optimization, vol 2. Springer, Berlin

Gurobi Optimization, LLC (2022) Gurobi Optimizer Reference Manual. https://www.gurobi.com, Accesed 4 Feb 2022

Heath DG (1993) A geometric framework for machine learning. PhD thesis, Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA

Heath D, Kasif S, Salzberg S (1993) Induction of oblique decision trees. In: Proceedings of the 13th international joint conference on artificial intelligence, pp 1002–1007. Morgan Kaufmann Publishers

Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70

López-Chau A, Cervantes J, López-García L, Lamont FG (2013) Fisher’s decision tree. Expert Syst Appl 40(16):6283–6291. https://doi.org/10.1016/j.eswa.2013.05.044

Manwani N, Sastry P (2011) Geometric decision tree. IEEE Trans Syst Man Cybern Part B (Cybern) 42(1):181–192

Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.1063/1.1699114

Murthy SK, Kasif S, Salzberg S, Beigel R (1993) OC1: a randomized algorithm for building oblique decision trees. In: Proceedings of AAAI, pp 322–327. Citeseer

Murthy SK, Kasif S, Salzberg S (1994) A system for induction of oblique decision trees. J Artif Intell Res 2:1–32. https://doi.org/10.1613/jair.63

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

Truong AKY (2009) Fast growing and interpretable oblique trees via logistic regression models. PhD thesis, Oxford University, Oxford, United Kingdom

Wickramarachchi D, Robertson B, Reale M, Price C, Brown J (2016) Hhcart: an oblique decision tree. Comput Stat Data Anal 96:12–23. https://doi.org/10.1016/j.csda.2015.11.006

Wickramarachchi D, Robertson B, Reale M, Price C, Brown J (2019) A reflected feature space for cart. Aust N Z J Stat 61(3):380–391. https://doi.org/10.1111/anzs.12275

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA