The minimum regularized covariance determinant estimator
Abstract
The minimum covariance determinant (MCD) approach estimates the location and scatter matrix using the subset of given size with the lowest sample covariance determinant. Its main drawback is that it cannot be applied when the dimension exceeds the subset size, because the sample covariance matrix of every subset is then singular. We propose the minimum regularized covariance determinant (MRCD) approach, which differs from the MCD in that the scatter matrix is a convex combination of a target matrix and the sample covariance matrix of the subset. A data-driven procedure sets the weight of the target matrix, so that regularization is used only when needed. The MRCD estimator is defined in any dimension, is well-conditioned by construction, and preserves the good robustness properties of the MCD. We prove that so-called concentration steps can be performed to reduce the MRCD objective function, and we exploit this fact to construct a fast algorithm. We verify the accuracy and robustness of the MRCD estimator in a simulation study and illustrate its practical use for outlier detection and regression analysis on real-life high-dimensional data sets in chemistry and criminology.
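To make the concentration-step idea concrete, here is a minimal pure-Python sketch. It rests on simplifying assumptions that are ours, not the paper's. The target matrix T and the weight rho are fixed inputs. The data-driven choice of rho, any consistency factor, robust standardization, and the multiple starting subsets of a full algorithm are all omitted. The subset covariance uses divisor h. For a subset H of size h, the sketch forms K(H) = rho*T + (1 - rho)*S(H). It then keeps the h points with the smallest Mahalanobis distances under the subset mean and K(H), and repeats until the subset stops changing. All function names (`regularized_cov`, `c_step`, `mrcd_sketch`, and so on) are hypothetical.

```python
# Hypothetical sketch of MRCD-style C-steps: rho and the target T are fixed
# inputs, there is no consistency factor or standardization, and this is not
# the authors' implementation.
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean(X: Sequence[Sequence[float]], H: Sequence[int]) -> List[float]:
    """Coordinate-wise mean of the rows of X indexed by H."""
    return [sum(X[i][j] for i in H) / len(H) for j in range(len(X[0]))]


def subset_cov(X: Sequence[Sequence[float]], H: Sequence[int], m: List[float]) -> Matrix:
    """Sample covariance of the rows in H around m (divisor h; singular whenever h <= p)."""
    p = len(m)
    S = [[0.0] * p for _ in range(p)]
    for i in H:
        d = [X[i][j] - m[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b] / len(H)
    return S


def det_and_inverse(A: Matrix) -> Tuple[float, Matrix]:
    """Determinant and inverse via Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(A)]
    det = 1.0
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot = M[col][col]
        det *= pivot
        M[col] = [v / pivot for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return det, [row[p:] for row in M]


def regularized_cov(X, H, T: Matrix, rho: float) -> Matrix:
    """K(H) = rho * T + (1 - rho) * S(H); positive definite if T is and rho > 0."""
    S = subset_cov(X, H, mean(X, H))
    p = len(T)
    return [[rho * T[a][b] + (1 - rho) * S[a][b] for b in range(p)] for a in range(p)]


def c_step(X, H: List[int], T: Matrix, rho: float) -> List[int]:
    """Return the h indices with the smallest Mahalanobis distances under (mean(H), K(H))."""
    m = mean(X, H)
    _, K_inv = det_and_inverse(regularized_cov(X, H, T, rho))
    p = len(m)

    def dist(i: int) -> float:
        d = [X[i][j] - m[j] for j in range(p)]
        return sum(d[a] * K_inv[a][b] * d[b] for a in range(p) for b in range(p))

    return sorted(sorted(range(len(X)), key=dist)[: len(H)])


def mrcd_sketch(X, H0: Sequence[int], T: Matrix, rho: float, max_iter: int = 100):
    """Apply C-steps from one starting subset until it stops changing."""
    H = sorted(H0)
    history = [det_and_inverse(regularized_cov(X, H, T, rho))[0]]
    for _ in range(max_iter):
        H_new = c_step(X, H, T, rho)
        if H_new == H:
            break
        H = H_new
        history.append(det_and_inverse(regularized_cov(X, H, T, rho))[0])
    return H, history


# Toy example: p = 4 variables but subsets of only h = 3 points, so the plain
# subset covariance is singular while the regularized one is not.
X = [
    [0.1, 0.2, -0.1, 0.0],
    [-0.2, 0.1, 0.0, 0.1],
    [0.0, -0.1, 0.2, -0.2],
    [0.2, 0.0, -0.2, 0.1],
    [-0.1, -0.2, 0.1, 0.0],
    [8.0, 9.0, 7.5, 8.5],  # an obvious outlier
]
identity = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]
H, history = mrcd_sketch(X, [0, 1, 5], identity, rho=0.5)
print(H, history)
```

In this toy example every three-point subset in four dimensions has det S(H) = 0, so the unregularized MCD criterion cannot tell subsets apart. By contrast, det K(H) stays positive, and the iteration drops the outlying sixth row even though it starts from a subset that contains it. The recorded determinants should not increase from one step to the next, which matches the property the abstract says the authors prove for concentration steps.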