NOVELIST estimator of large correlation and covariance matrices and their inverses

TEST, Volume 28, Pages 694-727, 2018
Na Huang¹, Piotr Fryzlewicz¹
¹Department of Statistics, London School of Economics, London, UK

Abstract

We propose a “NOVEL Integration of the Sample and Thresholded covariance” (NOVELIST) estimator of large covariance (correlation) and precision matrices. The NOVELIST estimator performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation and computational efficiency. In addition, applying it requires no eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension $$p$$ and the sample size $$n$$ satisfy $$\log p/n \rightarrow 0$$, and an improved rate when $$p/n \rightarrow 0$$. In empirical comparisons with several popular estimators, the NOVELIST estimator performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes. Real-data applications are presented.
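The shrinkage described above can be written for the correlation matrix as $$\hat{R}_N = (1-\delta)\hat{R} + \delta\, T(\hat{R}, \lambda)$$. Here $$\hat{R}$$ is the sample correlation, $$T(\cdot, \lambda)$$ thresholds its off-diagonal entries at level $$\lambda$$, and $$\delta$$ is the shrinkage weight.

The following is a minimal NumPy sketch of that idea. It is not the authors' implementation, and it rests on several assumptions that this abstract does not specify:
- the thresholding rule is soft thresholding;
- $$\lambda$$ and $$\delta$$ are fixed and supplied by the user, not chosen by a data-driven procedure;
- the covariance estimate is obtained by rescaling $$\hat{R}_N$$ with the sample standard deviations.

The function names `soft_threshold`, `novelist_correlation` and `novelist_covariance` are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Elementwise soft thresholding: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def novelist_correlation(X, lam, delta):
    """Sketch of a NOVELIST-style correlation estimate (assumes soft thresholding).

    X has shape (n, p): rows are observations, columns are variables.
    Returns (1 - delta) * R + delta * T(R, lam), where R is the sample
    correlation and T thresholds off-diagonal entries only.
    """
    R = np.corrcoef(X, rowvar=False)    # p x p sample correlation (non-sparse, possibly low rank)
    T = soft_threshold(R, lam)          # sparse component
    np.fill_diagonal(T, 1.0)            # thresholding is not applied to the unit diagonal
    return (1.0 - delta) * R + delta * T


def novelist_covariance(X, lam, delta):
    """Covariance sketch: rescale the correlation estimate by sample standard deviations."""
    sd = np.std(X, axis=0, ddof=1)
    return novelist_correlation(X, lam, delta) * np.outer(sd, sd)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 100))          # n = 50 < p = 100: sample covariance is singular
    S_N = novelist_covariance(X, lam=0.5, delta=0.5)
    Omega_N = np.linalg.inv(S_N)                # precision-matrix estimate by direct inversion
    print("smallest eigenvalue of S_N:", np.linalg.eigvalsh(S_N).min())
```

Note that `delta = 0` returns the sample correlation unchanged. With `delta = 1`, the estimate is the thresholded matrix alone. With a very large `lam`, every off-diagonal entry is thresholded to zero, so the estimate reduces to $$(1-\delta)\hat{R} + \delta I$$, whose eigenvalues are all at least $$\delta$$. This illustrates why adding the thresholded component can make the estimate invertible even when $$n < p$$.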
