Consistency of test-based method for selection of variables in high-dimensional two-group discriminant analysis

Japanese Journal of Statistics and Data Science - Volume 2 - Pages 155-171 - 2019
Yasunori Fujikoshi1, Tetsuro Sakurai2
1Department of Mathematics, Graduate School of Science, Hiroshima University, Higashi Hiroshima, Japan
2School of General and Management Studies, Suwa University of Science, Chino, Japan

Abstract

This paper is concerned with the selection of variables in two-group discriminant analysis with a common covariance matrix. We propose a test-based method (TM) that draws on the significance of each variable. Sufficient conditions for the test-based method to be consistent are provided when both the dimension and the sample size are large. For the case in which the dimension is larger than the sample size, a ridge-type method is proposed. Our results and the tendencies therein are explored numerically through a Monte Carlo simulation. It is pointed out that our selection method can be applied to high-dimensional data.
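The abstract does not state the test statistic, so the following is only a minimal sketch of what a test-based selection rule of this kind could look like. It assumes the classical likelihood-ratio-type statistic for the "no additional information" hypothesis on a single variable, built from the sample Mahalanobis distances with and without that variable, and it keeps a variable when the statistic exceeds a threshold `d`. The function names, the exact form of the statistic, and the choice of `d` are illustrative assumptions, not taken from the paper. The sketch needs the dimension to be smaller than n1 + n2 - 2 and does not cover the ridge-type variant.

```python
import numpy as np


def mahalanobis_sq(x1, x2, cols):
    """Squared sample Mahalanobis distance between two groups on the given columns."""
    a, b = x1[:, cols], x2[:, cols]
    n1, n2 = a.shape[0], b.shape[0]
    ra = a - a.mean(axis=0)
    rb = b - b.mean(axis=0)
    # Pooled covariance estimate with n1 + n2 - 2 degrees of freedom.
    pooled = (ra.T @ ra + rb.T @ rb) / (n1 + n2 - 2)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(diff @ np.linalg.solve(pooled, diff))


def select_variables(x1, x2, d):
    """Keep variable i when its statistic exceeds the threshold d (hypothetical rule)."""
    n1, p = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2
    g2 = n1 * n2 / n
    all_cols = list(range(p))
    d_full = mahalanobis_sq(x1, x2, all_cols)
    stats = np.empty(p)
    for i in all_cols:
        rest = [j for j in all_cols if j != i]
        d_rest = mahalanobis_sq(x1, x2, rest) if rest else 0.0
        # The distance cannot decrease when a variable is added; clip rounding noise.
        gain = max(d_full - d_rest, 0.0)
        # Assumed likelihood-ratio-type statistic for redundancy of variable i.
        stats[i] = n * np.log1p(g2 * gain / (n - 2 + g2 * d_rest))
    selected = [i for i in all_cols if stats[i] > d]
    return selected, stats


# Usage on simulated data: only the first two variables separate the groups.
rng = np.random.default_rng(0)
shift = np.array([2.0, 2.0, 0.0, 0.0, 0.0, 0.0])
x1 = rng.normal(size=(100, 6)) + shift
x2 = rng.normal(size=(100, 6))
selected, stats = select_variables(x1, x2, d=np.sqrt(200))  # arbitrary threshold
print(selected)
print(np.round(stats, 2))
```

A larger `d` makes the rule more conservative. How `d` should grow with the sample size and the dimension for the rule to be consistent is the subject of the paper's sufficient conditions and is not reproduced here.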
