Covariance Estimation: The GLM and Regularization Perspectives

Statistical Science - Volume 26, Number 3 - 2011
Mohsen Pourahmadi
Department of Statistics, Texas A&M University, College Station, TX, USA

Abstract

Keywords


References

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. <i>J. Roy. Statist. Soc. Ser. B</i> <b>58</b> 267–288.

Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. <i>Biometrika</i> <b>73</b> 13–22.

Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. <i>Biometrika</i> <b>86</b> 677–690.

Pourahmadi, M. (2000). Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. <i>Biometrika</i> <b>87</b> 425–435.

Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. <i>J. Mach. Learn. Res.</i> <b>9</b> 485–516.

Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. <i>Ann. Statist.</i> <b>37</b> 4254–4278.

Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. <i>Ann. Statist.</i> <b>34</b> 1436–1462.

Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. <i>Electron. J. Stat.</i> <b>2</b> 494–515.

Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. <i>Biostatistics</i> <b>9</b> 432–441.

Bickel, P. J. and Levina, E. (2008a). Regularized estimation of large covariance matrices. <i>Ann. Statist.</i> <b>36</b> 199–227.

Bickel, P. J. and Levina, E. (2008b). Covariance regularization by thresholding. <i>Ann. Statist.</i> <b>36</b> 2577–2604.

Huang, J. Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. <i>Biometrika</i> <b>93</b> 85–98.

Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. <i>J. Comput. Graph. Statist.</i> <b>15</b> 265–286.

Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. <i>J. Multivariate Anal.</i> <b>88</b> 365–411.

Peng, J., Wang, P., Zhou, N. and Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. <i>J. Amer. Statist. Assoc.</i> <b>104</b> 735–746.

Warton, D. I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. <i>J. Amer. Statist. Assoc.</i> <b>103</b> 340–349.

Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. <i>Biometrika</i> <b>94</b> 19–35.

Barnard, J., McCulloch, R. and Meng, X.-L. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. <i>Statist. Sinica</i> <b>10</b> 1281–1311.

Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. <i>J. Amer. Statist. Assoc.</i> <b>104</b> 682–693.

Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. <i>Ann. Statist.</i> <b>38</b> 2118–2144.

Rajaratnam, B., Massam, H. and Carvalho, C. M. (2008). Flexible covariance estimation in graphical Gaussian models. <i>Ann. Statist.</i> <b>36</b> 2818–2849.

Fan, J., Huang, T. and Li, R. (2007). Analysis of longitudinal data with semiparametric estimation of covariance function. <i>J. Amer. Statist. Assoc.</i> <b>102</b> 632–641.

Furrer, R. and Bengtsson, T. (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. <i>J. Multivariate Anal.</i> <b>98</b> 227–255.

Leonard, T. and Hsu, J. S. J. (1992). Bayesian inference for a covariance matrix. <i>Ann. Statist.</i> <b>20</b> 1669–1696.

Liechty, J. C., Liechty, M. W. and Müller, P. (2004). Bayesian correlation estimation. <i>Biometrika</i> <b>91</b> 1–14.

Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. <i>Ann. Statist.</i> <b>29</b> 295–327.

Boik, R. J. (2002). Spectral models for covariance matrices. <i>Biometrika</i> <b>89</b> 159–182.

Chiu, T. Y. M., Leonard, T. and Tsui, K.-W. (1996). The matrix-logarithmic covariance model. <i>J. Amer. Statist. Assoc.</i> <b>91</b> 198–210.

Daniels, M. J. and Kass, R. E. (1999). Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. <i>J. Amer. Statist. Assoc.</i> <b>94</b> 1254–1263.

Daniels, M. J. and Pourahmadi, M. (2009). Modeling covariance matrices via partial autocorrelations. <i>J. Multivariate Anal.</i> <b>100</b> 2352–2363.

Smith, M. and Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. <i>J. Amer. Statist. Assoc.</i> <b>97</b> 1141–1153.

Rothman, A. J., Levina, E. and Zhu, J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. <i>Biometrika</i> <b>97</b> 539–550.

Bondell, H. D., Krishna, A. and Ghosh, S. K. (2010). Joint variable selection for fixed and random effects in linear mixed-effects models. <i>Biometrics</i> <b>66</b> 1069–1077.

Chen, Z. and Dunson, D. B. (2003). Random effects selection in linear mixed models. <i>Biometrics</i> <b>59</b> 762–769.

Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). <i>Ann. Statist.</i> <b>32</b> 407–499.

Wong, F., Carter, C. K. and Kohn, R. (2003). Efficient estimation of covariance selection models. <i>Biometrika</i> <b>90</b> 809–830.

Daniels, M. J. and Kass, R. E. (2001). Shrinkage estimators for covariance matrices. <i>Biometrics</i> <b>57</b> 1173–1184.

Daniels, M. J. and Pourahmadi, M. (2002). Bayesian analysis of covariance matrices and dynamic models for longitudinal data. <i>Biometrika</i> <b>89</b> 553–566.

Witten, D. M. and Tibshirani, R. (2009). Covariance-regularized regression and classification for high-dimensional problems. <i>J. Roy. Statist. Soc. Ser. B</i> <b>71</b> 615–636.

Wright, S. (1934). The method of path coefficients. <i>Ann. Math. Statist.</i> <b>5</b> 161–215.

Joe, H. (2006). Generating random correlation matrices based on partial correlations. <i>J. Multivariate Anal.</i> <b>97</b> 2177–2189.

Liu, X. and Daniels, M. J. (2006). A new algorithm for simulating a correlation matrix based on parameter expansion and reparameterization. <i>J. Comput. Graph. Statist.</i> <b>15</b> 897–914.

Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. <i>Statist. Sinica</i> <b>20</b> 101–148.

Haff, L. R. (1991). The variational form of certain Bayes estimators. <i>Ann. Statist.</i> <b>19</b> 1163–1190.

Wu, W. B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. <i>Biometrika</i> <b>90</b> 831–844.

Dey, D. K. and Srinivasan, C. (1985). Estimation of a covariance matrix under Stein’s loss. <i>Ann. Statist.</i> <b>13</b> 1581–1591.

Haff, L. R. (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. <i>Ann. Statist.</i> <b>8</b> 586–597.

Rothman, A. J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. <i>J. Amer. Statist. Assoc.</i> <b>104</b> 177–186.

McMurry, T. L. and Politis, D. N. (2010). Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. <i>J. Time Series Anal.</i> <b>31</b> 471–482.

Wu, W. B. and Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. <i>Statist. Sinica</i> <b>19</b> 1755–1768.

Wagaman, A. S. and Levina, E. (2009). Discovering sparse covariance structures with the Isomap. <i>J. Comput. Graph. Statist.</i> <b>18</b> 551–572.

Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. <i>Ann. Statist.</i> <b>1</b> 135–141.

Anderson, T. W. (2003). <i>An Introduction to Multivariate Statistical Analysis</i>, 3rd ed. Wiley, Hoboken, NJ.

Pourahmadi, M. (2001). <i>Foundations of Time Series Analysis and Prediction Theory</i>. Wiley, New York.

Searle, S. R., Casella, G. and McCulloch, C. E. (1992). <i>Variance Components</i>. Wiley, New York.

Fitzmaurice, G., Davidian, M., Verbeke, G. and Molenberghs, G., eds. (2009). <i>Longitudinal Data Analysis</i>. CRC Press, Boca Raton, FL.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). <i>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</i>, 2nd ed. Springer, New York.

Carroll, R. J. and Ruppert, D. (1988). <i>Transformation and Weighting in Regression</i>. Chapman & Hall, New York.

Bilmes, J. A. (2000). Factored sparse inverse covariance matrices. In <i>IEEE International Conference on Acoustics, Speech and Signal Processing</i> (<i>Istanbul</i>, <i>Turkey</i>) <b>2</b> II1009–II1012.

Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994). <i>Time Series Analysis: Forecasting and Control</i>, 3rd ed. Prentice Hall, Englewood Cliffs, NJ.

Brown, P. J., Le, N. D. and Zidek, J. V. (1994). Inference for a covariance matrix. In <i>Aspects of Uncertainty</i> (P. R. Freeman and A. F. M. Smith, eds.) 77–92. Wiley, Chichester.

Cressie, N. A. C. (1993). <i>Statistics for Spatial Data</i>, rev. ed. Wiley, New York.

Diggle, P., Liang, K. Y., Zeger, S. L. and Heagerty, P. J. (2002). <i>Analysis of Longitudinal Data</i>, 2nd ed. Clarendon Press, Oxford.

Flury, B. (1988). <i>Common Principal Components and Related Multivariate Models</i>. Wiley, New York.

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Applications of the lasso and grouped lasso to the estimation of sparse graphical models. Technical report, Stanford Univ.

Hoff, P. D. and Niu, X. (2009). A covariance regression model. Technical report, Univ. Washington.

Lin, S. P. and Perlman, M. D. (1985). A Monte Carlo comparison of four estimators of a covariance matrix. In <i>Multivariate Analysis VI (Pittsburgh, PA, 1983)</i> 411–429. North-Holland, Amsterdam.

Pourahmadi, M. (2007b). Simultaneous modeling of covariance matrices: GLM, Bayesian and nonparametric perspective. In <i>Correlated Data Modelling 2004</i> (D. Gregori et al., eds.) 41–64. FrancoAngeli, Milan, Italy.

Rocha, G. V., Zhao, P. and Yu, B. (2008). A path following algorithm for sparse pseudo-likelihood inverse covariance estimation (splice). Technical Report 759, Dept. Statistics, Univ. California, Berkeley.

Stein, C. (1975). Estimation of a covariance matrix. In <i>Rietz Lecture. 39th Annual Meeting IMS. Atlanta, Georgia</i>.

James, W. and Stein, C. (1961). Estimation with quadratic loss. In <i>Proc. 4th Berkeley Sympos. Math. Statist. Probab.</i> <b>I</b> 361–379. Univ. California Press, Berkeley.

Bartlett, M. S. (1933). On the theory of statistical regression. <i>Proc. Roy. Soc. Edinburgh</i> <b>53</b> 260–283.

Bickel, P. J. and Levina, E. (2004). Some theory of Fisher’s linear discriminant function, ‘naive Bayes,’ and some alternatives when there are many more variables than observations. <i>Bernoulli</i> <b>10</b> 989–1010.

Cannon, M. J., Warner, L., Taddei, J. A. and Kleinbaum, D. G. (2001). What can go wrong when you assume that correlated data are independent: An illustration from the evaluation of a childhood health intervention in Brazil. <i>Statist. Med.</i> <b>20</b> 1461–1467.

Carroll, R. J. (2003). Variances are not always nuisance parameters. <i>Biometrics</i> <b>59</b> 211–220.

Chang, C. and Tsay, R. S. (2010). Estimation of covariance matrix via the sparse Cholesky factor with lasso. <i>J. Statist. Plann. Inference</i> <b>140</b> 3858–3873.

Daniels, M. J. (2005). A class of shrinkage priors for the dependence structure in longitudinal data. <i>J. Statist. Plann. Inference</i> <b>127</b> 119–130.

Daniels, M. J. and Hogan, J. W. (2008). <i>Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Monographs on Statistics and Applied Probability</i> <b>109</b>. Chapman & Hall/CRC, Boca Raton, FL.

Dégerine, S. and Lambert-Lacroix, S. (2003). Partial autocorrelation function of a nonstationary time series. <i>J. Multivariate Anal.</i> <b>89</b> 135–147.

Dempster, A. (1972). Covariance selection models. <i>Biometrics</i> <b>28</b> 157–175.

Eaves, D. and Chang, T. (1992). Priors for ordered conditional variance and vector partial correlation. <i>J. Multivariate Anal.</i> <b>41</b> 43–55.

El Karoui, N. (2008a). Operator norm consistent estimation of large-dimensional sparse covariance matrices. <i>Ann. Statist.</i> <b>36</b> 2717–2756.

El Karoui, N. (2008b). Spectrum estimation for large dimensional covariance matrices using random matrix theory. <i>Ann. Statist.</i> <b>36</b> 2757–2790.

Fan, J., Feng, Y. and Wu, Y. (2009). Network exploration via the adaptive LASSO and SCAD penalties. <i>Ann. Appl. Statist.</i> <b>3</b> 521–541.

Gabriel, K. R. (1962). Ante-dependence analysis of an ordered set of variables. <i>Ann. Math. Statist.</i> <b>33</b> 201–212.

Garthwaite, P. H. and Al-Awadhi, S. A. (2001). Non-conjugate prior distribution assessment for multivariate normal sampling. <i>J. R. Stat. Soc. Ser. B Stat. Methodol.</i> <b>63</b> 95–110.

Golub, G. H. and Van Loan, C. F. (1989). <i>Matrix Computations</i>, 2nd ed. <i>Johns Hopkins Series in the Mathematical Sciences</i> <b>3</b>. Johns Hopkins Univ. Press, Baltimore, MD.

Hoff, P. D. (2009). A hierarchical eigenmodel for pooled covariance estimation. <i>J. Roy. Statist. Soc. Ser. B</i> <b>71</b> 971–992.

Huang, J. Z., Liu, L. and Liu, N. (2007). Estimation of large covariance matrices of longitudinal data with basis function approximations. <i>J. Comput. Graph. Statist.</i> <b>16</b> 189–209.

Jiang, G., Sarkar, S. K. and Hsuan, F. (1999). A likelihood ratio test and its modifications for the homogeneity of the covariance matrices of dependent multivariate normals. <i>J. Statist. Plann. Inference</i> <b>81</b> 95–111.

Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. <i>Technometrics</i> <b>22</b> 389–395.

Jones, M. C. (1987). Randomly choosing parameters from the stationarity and invertibility region of autoregressive-moving average models. <i>J. Roy. Statist. Soc. Ser. C</i> <b>36</b> 134–138.

Jong, J.-C. and Kotz, S. (1999). On a relation between principal components and regression analysis. <i>Amer. Statist.</i> <b>53</b> 349–351.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. <i>Trans. Amer. Soc. Mech. Eng., J. Basic Engineering</i> <b>82</b> 35–45.

Kaufman, C. G., Schervish, M. J. and Nychka, D. W. (2008). Covariance tapering for likelihood-based estimation in large data sets. <i>J. Amer. Statist. Assoc.</i> <b>103</b> 145–155.

Kurowicka, D. and Cooke, R. (2003). A parameterization of positive definite matrices in terms of partial correlation vines. <i>Linear Algebra Appl.</i> <b>372</b> 225–251.

Ledoit, O., Santa-Clara, P. and Wolf, M. (2003). Flexible multivariate GARCH modeling with an application to international stock markets. <i>Rev. Econom. Statist.</i> <b>85</b> 735–747.

Leng, C., Zhang, W. and Pan, J. (2010). Semiparametric mean-covariance regression analysis for longitudinal data. <i>J. Amer. Statist. Assoc.</i> <b>105</b> 181–193.

LeSage, J. P. and Pace, R. K. (2007). A matrix exponential spatial specification. <i>J. Econometrics</i> <b>140</b> 190–214.

Leung, P. L. and Muirhead, R. J. (1987). Estimation of parameter matrices and eigenvalues in MANOVA and canonical correlation analysis. <i>Ann. Statist.</i> <b>15</b> 1651–1666.

Levina, E., Rothman, A. and Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. <i>Ann. Appl. Statist.</i> <b>2</b> 245–263.

Lin, T. I. (2011). A Bayesian inference in joint modelling of location and scale parameters of the <i>t</i> distribution for longitudinal data. <i>J. Statist. Plann. Inference</i> <b>141</b> 1543–1553.

Lin, T.-I. and Wang, Y.-J. (2009). A robust approach to joint modeling of mean and scale covariance for longitudinal data. <i>J. Statist. Plann. Inference</i> <b>139</b> 3013–3026.

Liu, C. (1993). Bartlett’s decomposition of the posterior distribution of the covariance for normal monotone ignorable missing data. <i>J. Multivariate Anal.</i> <b>46</b> 198–206.

Pan, J. and MacKenzie, G. (2003). On modelling mean-covariance structures in longitudinal studies. <i>Biometrika</i> <b>90</b> 239–244.

Pinheiro, J. C. and Bates, D. M. (1996). Unconstrained parameterizations for variance–covariance matrices. <i>Stat. Comput.</i> <b>6</b> 289–296.

Pourahmadi, M. (2007a). Cholesky decompositions and estimation of a multivariate normal covariance matrix: Parameter orthogonality. <i>Biometrika</i> <b>94</b> 1006–1013.

Pourahmadi, M. and Daniels, M. J. (2002). Dynamic conditionally linear mixed models for longitudinal data. <i>Biometrics</i> <b>58</b> 225–231.

Quenouille, M. H. (1949). Approximate tests of correlation in time-series. <i>J. Roy. Statist. Soc. Ser. B</i> <b>11</b> 68–84.

Ramsey, F. L. (1974). Characterization of the partial autocorrelation function. <i>Ann. Statist.</i> <b>2</b> 1296–1301.

Roy, J. (1958). Step-down procedure in multivariate analysis. <i>Ann. Math. Statist.</i> <b>29</b> 1177–1187.

Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In <i>Proc. Third Berkeley Symp. Math. Statist. Probab.</i> <b>I</b> 197–206. Univ. California Press, Berkeley.

Szatrowski, T. H. (1980). Necessary and sufficient conditions for explicit solutions in the multivariate normal estimation problem for patterned means and covariances. <i>Ann. Statist.</i> <b>8</b> 802–810.

Wermuth, N. (1980). Linear recursive equations, covariance selection, and path analysis. <i>J. Amer. Statist. Assoc.</i> <b>75</b> 963–972.

Wold, H. O. A. (1960). A generalization of causal chain models. <i>Econometrica</i> <b>28</b> 443–463.

Yang, R.-Y. and Berger, J. O. (1994). Estimation of a covariance matrix using the reference prior. <i>Ann. Statist.</i> <b>22</b> 1195–1211.

Yuan, M. and Huang, J. Z. (2009). Regularized parameter estimation of high dimensional <i>t</i> distribution. <i>J. Statist. Plann. Inference</i> <b>139</b> 2284–2292.

Yule, G. U. (1907). On the theory of correlation for any number of variables, treated by a new system of notation. <i>Roy. Soc. Proc.</i> <b>79</b> 85–96.

Yule, G. U. (1927). On a method of investigating periodicities in disturbed series with special reference to Wolfer’s sunspot numbers. <i>Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci.</i> <b>226</b> 267–298.

Zimmerman, D. L. (2000). Viewing the correlation structure of longitudinal data through a PRISM. <i>Amer. Statist.</i> <b>54</b> 310–318.

Zimmerman, D. L. and Núñez-Antón, V. (2001). Parametric modelling of growth curve data: An overview (with discussion). <i>Test</i> <b>10</b> 1–73.

Zimmerman, D. L. and Núñez-Antón, V. A. (2010). <i>Antedependence Models for Longitudinal Data. Monographs on Statistics and Applied Probability</i> <b>112</b>. CRC Press, Boca Raton, FL.