Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis

Annals of Applied Statistics - Volume 7, Issue 1 - 2013
Jun Chen, Hongzhe Li
Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6021, USA

References

Peng, J., Zhu, J., Bergamaschi, A., Han, W., Noh, D.-Y., Pollack, J. R. and Wang, P. (2010). Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. <i>Ann. Appl. Stat.</i> <b>4</b> 53–77.

Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. <i>Ann. Statist.</i> <b>37</b> 3468–3497.

Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate $\beta$-distribution, and correlations among proportions. <i>Biometrika</i> <b>49</b> 65–82.

Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. <i>J. R. Stat. Soc. Ser. B Stat. Methodol.</i> <b>70</b> 53–71.

Aitchison, J. (1982). The statistical analysis of compositional data. <i>J. R. Stat. Soc. Ser. B Stat. Methodol.</i> <b>44</b> 139–177.

Bäckhed, F., Ley, R. E., Sonnenburg, J. L., Peterson, D. A. and Gordon, J. I. (2005). Host-bacterial mutualism in the human intestine. <i>Science</i> <b>307</b> 1915–1920.

Barry, S. and Welsh, A. (2002). Generalized additive modelling and zero inflated count data. <i>Ecological Modelling</i> <b>157</b> 179–188.

Benson, A. K., Kelly, S. A., Legge, R., Ma, F., Low, S. J., Kim, J., Zhang, M., Oh, P. L., Nehrenberg, D., Hua, K. et al. (2010). Individuality in gut microbiota composition is a complex polygenic trait shaped by multiple environmental and host genetic factors. <i>Proc. Natl. Acad. Sci. USA</i> <b>107</b> 18933–18938.

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Peña, A. G., Goodrich, J. K., Gordon, J. I. et al. (2010). QIIME allows analysis of high-throughput community sequencing data. <i>Nature Methods</i> <b>7</b> 335–336.

Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Preprint. Available at arXiv:1001.0736.

Lee, A. H., Wang, K., Scott, J. A., Yau, K. K. W. and McLachlan, G. J. (2006). Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros. <i>Stat. Methods Med. Res.</i> <b>15</b> 47–61.

Matsen, F. A., Kodner, R. B. and Armbrust, E. V. (2010). pplacer: Linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. <i>BMC Bioinformatics</i> <b>11</b> 538.

McArdle, B. H. (2001). Fitting multivariate models to community data: A comment on distance-based redundancy analysis. <i>Ecology</i> <b>82</b> 290–297.

Moghimbeigi, A., Eshraghian, M. R., Mohammad, K. and McArdle, B. (2008). Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros. <i>J. Appl. Stat.</i> <b>35</b> 1193–1202.

Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B., Lesniewski, R. A., Oakley, B. B., Parks, D. H., Robinson, C. J. et al. (2009). Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. <i>Applied and Environmental Microbiology</i> <b>75</b> 7537–7541.

Sokol, H., Pigneur, B., Watterlot, L., Lakhdari, O., Bermúdez-Humarán, L. G., Gratadoux, J. J., Blugeon, S., Bridonneau, C., Furet, J. P., Corthier, G. et al. (2008). Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients. <i>Proc. Natl. Acad. Sci. USA</i> <b>105</b> 16731–16736.

Tseng, P. and Yun, S. (2008). A coordinate gradient descent method for nonsmooth separable minimization. <i>Math. Program.</i> <b>117</b> 387–423.

Virgin, H. W. and Todd, J. A. (2011). Metagenomics and personalized medicine. <i>Cell</i> <b>147</b> 44–56.

Wu, G. D., Chen, J., Hoffmann, C., Bittinger, K., Chen, Y. Y., Keilbaugh, S. A., Bewtra, M., Knights, D., Walters, W. A., Knight, R. et al. (2011). Linking long-term dietary patterns with gut microbial enterotypes. <i>Science</i> <b>334</b> 105–108.

Zhang, H. H., Liu, Y., Wu, Y. and Zhu, J. (2008). Variable selection for the multicategory SVM via adaptive sup-norm regularization. <i>Electron. J. Stat.</i> <b>2</b> 149–167.

Bach, F. R. (2008). Bolasso: Model consistent Lasso estimation through the bootstrap. In <i>ICML’08: Proceedings of the 25th International Conference on Machine Learning</i> 33–40. ACM, New York.

Legendre, P. and Legendre, L. (2002). <i>Numerical Ecology</i>, 2nd ed. Elsevier, Amsterdam.