Distributed adaptive nearest neighbor classifier: algorithm and theory
Abstract
When data are of extraordinarily large size or are physically stored in different locations, the distributed nearest neighbor (NN) classifier is an attractive tool for classification. We propose a novel distributed adaptive NN classifier in which the number of nearest neighbors is a tuning parameter stochastically chosen by a data-driven criterion. We further propose an early stopping rule for the search over the tuning parameter, which not only speeds up the computation but also improves the finite-sample performance of the proposed algorithm. The convergence rate of the excess risk of the distributed adaptive NN classifier is investigated under various sub-sample size compositions. In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate. The effectiveness of the proposed approach is demonstrated through simulation studies as well as an empirical application to a real-world dataset.
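To make the algorithmic ideas above concrete, the following is a minimal Python sketch of distributed adaptive NN prediction, assuming binary labels in {0, 1} and majority-vote aggregation across machines. The margin-based early-stopping rule (grow k until the local k-NN posterior estimate is separated from 1/2 by more than c * sqrt(log n / k)) is an illustrative Lepski-type criterion, not the paper's exact data-driven criterion; the constant c and all function names are hypothetical.

```python
import numpy as np

def local_adaptive_nn_predict(X, y, x0, c=1.0, k_max=None):
    """Predict the label of query x0 from one sub-sample (X, y), y in {0, 1}.

    The number of neighbors k grows until a margin-based early-stopping
    rule fires: stop as soon as the k-NN average of the labels is
    separated from 1/2 by more than c * sqrt(log(n) / k).  This rule and
    the constant c are illustrative assumptions, not the paper's exact
    criterion.
    """
    n = len(y)
    k_max = k_max or n
    # Sort the labels by the distance of their covariates to the query.
    order = np.argsort(np.linalg.norm(X - x0, axis=1))
    y_sorted = y[order]
    for k in range(1, k_max + 1):
        p_hat = y_sorted[:k].mean()                   # k-NN posterior estimate
        if abs(p_hat - 0.5) > c * np.sqrt(np.log(n) / k):
            return int(p_hat > 0.5)                   # early stop: margin is clear
    return int(y_sorted[:k_max].mean() > 0.5)         # fallback: use k_max neighbors

def distributed_adaptive_nn_predict(subsamples, x0, c=1.0):
    """Majority vote over the local adaptive NN predictions of the machines."""
    votes = [local_adaptive_nn_predict(X, y, x0, c=c) for X, y in subsamples]
    return int(np.mean(votes) > 0.5)

# Toy usage: split an i.i.d. sample across s = 5 machines.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5000, 2))
y = (X[:, 0] + 0.1 * rng.standard_normal(5000) > 0).astype(int)
subsamples = [(X[i::5], y[i::5]) for i in range(5)]
print(distributed_adaptive_nn_predict(subsamples, np.zeros(2)))
```

In this sketch each machine stops at its own locally chosen k, so the effective number of neighbors adapts to the query point and to the sub-sample, mirroring the stochastic, data-driven choice of the tuning parameter described in the abstract.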