Distributed adaptive nearest neighbor classifier: algorithm and theory
Abstract
When data are of extraordinarily large size or are physically stored in different locations, the distributed nearest neighbor (NN) classifier is an attractive tool for classification. We propose a novel distributed adaptive NN classifier in which the number of nearest neighbors is a tuning parameter stochastically chosen by a data-driven criterion. We further propose an early stopping rule for the search over the tuning parameter, which not only speeds up the computation but also improves the finite-sample performance of the proposed algorithm. The convergence rate of the excess risk of the distributed adaptive NN classifier is investigated under various sub-sample size compositions. In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate. The effectiveness of the proposed approach is demonstrated through simulation studies as well as an empirical application to a real-world dataset.
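To make the algorithmic ideas above concrete, the following is a minimal Python sketch of distributed adaptive NN prediction, assuming binary labels in {0, 1} and majority-vote aggregation across machines. The margin-based early-stopping rule (grow k until the local k-NN posterior estimate is separated from 1/2 by more than c * sqrt(log n / k)) is an illustrative Lepski-type criterion, not the paper's exact data-driven criterion; the constant c and all function names are hypothetical.

```python
import numpy as np

def local_adaptive_nn_predict(X, y, x0, c=1.0, k_max=None):
    """Predict the label of query x0 from one sub-sample (X, y), y in {0, 1}.

    The number of neighbors k grows until a margin-based early-stopping
    rule fires: stop as soon as the k-NN average of the labels is
    separated from 1/2 by more than c * sqrt(log(n) / k).  This rule and
    the constant c are illustrative assumptions, not the paper's exact
    criterion.
    """
    n = len(y)
    k_max = k_max or n
    # Sort the labels by the distance of their covariates to the query.
    order = np.argsort(np.linalg.norm(X - x0, axis=1))
    y_sorted = y[order]
    for k in range(1, k_max + 1):
        p_hat = y_sorted[:k].mean()                   # k-NN posterior estimate
        if abs(p_hat - 0.5) > c * np.sqrt(np.log(n) / k):
            return int(p_hat > 0.5)                   # early stop: margin is clear
    return int(y_sorted[:k_max].mean() > 0.5)         # fallback: use k_max neighbors

def distributed_adaptive_nn_predict(subsamples, x0, c=1.0):
    """Majority vote over the local adaptive NN predictions of the machines."""
    votes = [local_adaptive_nn_predict(X, y, x0, c=c) for X, y in subsamples]
    return int(np.mean(votes) > 0.5)

# Toy usage: split an i.i.d. sample across s = 5 machines.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5000, 2))
y = (X[:, 0] + 0.1 * rng.standard_normal(5000) > 0).astype(int)
subsamples = [(X[i::5], y[i::5]) for i in range(5)]
print(distributed_adaptive_nn_predict(subsamples, np.zeros(2)))
```

In this sketch each machine stops at its own locally chosen k, so the effective number of neighbors adapts to the query point and to the sub-sample, mirroring the stochastic, data-driven choice of the tuning parameter described in the abstract.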