A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
Abstract
Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics. Clearly, the widely used standard covariance and correlation estimators are ill-suited for this purpose. As a statistically efficient and computationally fast alternative, we propose a novel shrinkage covariance estimator that exploits the Ledoit-Wolf (2003) lemma for analytic calculation of the optimal shrinkage intensity. Subsequently, we apply this improved covariance estimator (which has guaranteed minimum mean squared error, is well-conditioned, and is always positive definite even for small sample sizes) to the problem of inferring large-scale gene association networks. We show that it performs very favorably compared to competing approaches, both in simulations and in application to real expression data.
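To make the idea of shrinkage with an analytically computed intensity concrete, the following is a minimal sketch, not the estimator as specified in the paper. It assumes one particular shrinkage target (a diagonal matrix that keeps the sample variances and sets all covariances to zero) and estimates the intensity as the summed estimated variances of the off-diagonal sample covariances divided by the summed squared off-diagonal sample covariances, clipped to [0, 1]. The function name `shrink_cov` and the toy data are hypothetical and chosen only for illustration.

```python
import random


def shrink_cov(X):
    """Shrink the sample covariance of X toward its own diagonal.

    X is a list of n samples, each a list of p values.
    Returns (S_star, lam): the shrunken covariance matrix (list of lists)
    and the estimated shrinkage intensity lam in [0, 1].
    Assumption: diagonal target with unequal variances (illustrative choice).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]

    S = [[0.0] * p for _ in range(p)]
    num = 0.0  # sum over i != j of the estimated variance of s_ij
    den = 0.0  # sum over i != j of s_ij squared
    for i in range(p):
        for j in range(p):
            # Per-sample products whose mean gives the covariance entry.
            w = [r[i] * r[j] for r in Xc]
            w_bar = sum(w) / n
            S[i][j] = n / (n - 1) * w_bar  # unbiased sample covariance
            if i != j:
                num += n / (n - 1) ** 3 * sum((wk - w_bar) ** 2 for wk in w)
                den += S[i][j] ** 2

    # Clip the intensity to [0, 1]; if there is nothing to shrink, use 0.
    lam = 0.0 if den == 0.0 else max(0.0, min(1.0, num / den))

    # Variances are kept as they are; covariances are scaled by (1 - lam).
    S_star = [
        [S[i][j] if i == j else (1.0 - lam) * S[i][j] for j in range(p)]
        for i in range(p)
    ]
    return S_star, lam


if __name__ == "__main__":
    random.seed(0)
    # Toy "small n, large p" setting: 4 samples, 6 variables.
    data = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]
    S_star, lam = shrink_cov(data)
    print("estimated shrinkage intensity:", lam)
```

With few samples relative to the number of variables, the off-diagonal sample covariances are noisy, so the estimated intensity tends to be large and the result moves toward the diagonal target. Whenever the intensity is strictly positive and all sample variances are positive, the result is a convex combination of a positive definite diagonal matrix and a positive semidefinite sample covariance, and is therefore positive definite.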
References
Magwene, 2004, Estimating genomic coexpression networks using first-order conditional independence, Genome Biology, 5, 10.1186/gb-2004-5-12-r100
Tibshirani, 2002, Diagnosis of multiple cancer types by shrunken centroids of gene expression, Proc Natl Acad Sci USA, 99, 6567, 10.1073/pnas.082099299
Efron, 1977, Stein's paradox in statistics, Sci Am, 236, 119, 10.1038/scientificamerican0577-119
Greenland, 2000, Principles of multilevel modelling, Int J Epidemiol, 29, 158
Leung, 1998, Estimation of the scale matrix and its eigenvalues in the Wishart and the multivariate F distributions, Ann Inst Statist Math, 50, 523, 10.1023/A:1003529529228
Efron, 2004, Large-scale simultaneous hypothesis testing: the choice of a null hypothesis, J Amer Statist Assoc, 99, 96, 10.1198/016214504000000089
Hoerl, 1970, Ridge regression: applications to nonorthogonal problems, Technometrics, 12, 69, 10.1080/00401706.1970.10488635
Hoerl, 1970, Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55, 10.1080/00401706.1970.10488634
Morris, 1983, Parametric empirical Bayes inference: theory and applications, J Amer Statist Assoc, 78, 47, 10.1080/01621459.1983.10477920
Ledoit, 2003, Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, J Empir Finance, 10, 603
Efron, 1975, Data analysis using Stein's estimator and its generalizations, J Amer Statist Assoc, 70, 311, 10.1080/01621459.1975.10479864
Efron, 1975, Biased versus unbiased estimation, Adv Math, 16, 259
Butte, 2000, Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks, Proc Natl Acad Sci USA, 97, 12182, 10.1073/pnas.220392197
Wille, 2004, Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana, Genome Biology, 5, 10.1186/gb-2004-5-11-r92
Eisen, 1998, Cluster analysis and display of genome - wide expression patterns, Proc Natl Acad Sci USA, 95, 14863, 10.1073/pnas.95.25.14863
Smyth, 2004, Linear models and empirical Bayes methods for assessing differential expression in microarray experiments, Statist Appl Genet Mol Biol, 3, 3
Cui, 2005, Improved statistical tests for differential gene expression by shrinking variance components estimates, Biostatistics, 6, 59, 10.1093/biostatistics/kxh018
Cox, 2004, A note on pseudolikelihood from marginal densities, Biometrika, 91, 729, 10.1093/biomet/91.3.729