Adaptive gPCA: A method for structured dimensionality reduction with applications to microbiome data
Abstract
Keywords
References
Allen, G. I., Grosenick, L. and Taylor, J. (2014). A generalized least-square matrix decomposition. <i>J. Amer. Statist. Assoc.</i> <b>109</b> 145–159.
Li, C. and Li, H. (2008). Network-constrained regularization and variable selection for analysis of genomic data. <i>Bioinformatics</i> <b>24</b> 1175–1182.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. <i>J. R. Stat. Soc. Ser. B. Stat. Methodol.</i> <b>67</b> 91–108.
Paradis, E., Claude, J. and Strimmer, K. (2004). APE: Analyses of phylogenetics and evolution in R language. <i>Bioinformatics</i> <b>20</b> 289–290.
Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso. <i>Biostatistics</i> <b>9</b> 18–29.
Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. <i>J. Amer. Statist. Assoc.</i> <b>104</b> 682–693.
Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. <i>Biostatistics</i> <b>10</b> 515–534.
Chen, J., Bittinger, K., Charlson, E. S., Hoffmann, C., Lewis, J., Wu, G. D., Collman, R. G., Bushman, F. D. and Li, H. (2012). Associating microbiome composition with environmental covariates using generalized UniFrac distances. <i>Bioinformatics</i> <b>28</b> 2106–2113.
Pavoine, S., Dufour, A.-B. and Chessel, D. (2004). From dissimilarities among species to dissimilarities among communities: A double principal coordinate analysis. <i>J. Theoret. Biol.</i> <b>228</b> 523–537.
Purdom, E. (2011). Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree. <i>Ann. Appl. Stat.</i> <b>5</b> 2326–2358.
Dethlefsen, L. and Relman, D. A. (2011). Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. <i>Proc. Natl. Acad. Sci. USA</i> <b>108</b> 4554–4561.
Rinaldo, A. (2009). Properties and refinements of the fused lasso. <i>Ann. Statist.</i> <b>37</b> 2922–2952.
Callahan, B. J., Sankaran, K., Fukuyama, J. A., McMurdie, P. J. and Holmes, S. P. (2016). Bioconductor workflow for microbiome data analysis: From raw reads to community analyses. <i>F1000Res.</i> <b>5</b> 1492.
Caussinus, H. (1986). Models and uses of principal component analysis. <i>Multidimensional Data Analysis</i> <b>86</b> 149–170.
Chang, Q., Luan, Y. and Sun, F. (2011). Variance adjusted weighted UniFrac: A powerful beta diversity measure for comparing communities based on phylogeny. <i>BMC Bioinform.</i> <b>12</b> 1.
Cohan, F. M. (2002). What are bacterial species? <i>Annual Reviews in Microbiology</i> <b>56</b> 457–487.
Doolittle, W. F. and Papke, R. T. (2006). Genomics and the bacterial species problem. <i>Genome Biol.</i> <b>7</b> 1.
Dray, S., Pavoine, S. and Aguirre de Cárcer, D. (2015). Considering external information to improve the phylogenetic comparison of microbial communities: A new approach based on constrained double principal coordinates analysis (cDPCoA). <i>Molecular Ecology Resources</i> <b>15</b> 242–249.
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. <i>Bioinformatics</i> <b>26</b> 2460–2461.
Escoufier, Y. (1973). Le traitement des variables vectorielles. <i>Biometrics</i> <b>29</b> 751–760.
Fernandes, A. D., Reid, J. N., Macklaim, J. M., McMurrough, T. A., Edgell, D. R. and Gloor, G. B. (2014). Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. <i>Microbiome</i> <b>2</b> 15.
Filzmoser, P., Hron, K. and Reimann, C. (2009). Principal component analysis for compositional data with outliers. <i>Environmetrics</i> <b>20</b> 621–632.
Fukuyama, J. (2019). Supplement to “Adaptive gPCA: A method for structured dimensionality reduction with applications to microbiome data.” <a href="DOI:10.1214/18-AOAS1227SUPP">DOI:10.1214/18-AOAS1227SUPP</a>.
Holmes, S. (2008). Multivariate data analysis: The French way. In <i>Probability and Statistics: Essays in Honor of David A. Freedman</i>. <i>Inst. Math. Stat. (IMS) Collect.</i> <b>2</b> 219–233. IMS, Beachwood, OH.
Love, M. I., Huber, W. and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. <i>Genome Biol.</i> <b>15</b> 550.
Lozupone, C. and Knight, R. (2005). UniFrac: A new phylogenetic method for comparing microbial communities. <i>Applied and Environmental Microbiology</i> <b>71</b> 8228–8235.
Lozupone, C. A., Hamady, M., Kelley, S. T. and Knight, R. (2007). Quantitative and qualitative $\beta$ diversity measures lead to different insights into factors that structure microbial communities. <i>Applied and Environmental Microbiology</i> <b>73</b> 1576–1585.
McMurdie, P. J. and Holmes, S. (2014). Waste not, want not: Why rarefying microbiome data is inadmissible. <i>PLoS Comput. Biol.</i> <b>10</b> e1003531.
Penrose, R. (1955). A generalized inverse for matrices. <i>Proc. Camb. Philos. Soc.</i> <b>51</b> 406–413.
Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J. and Glöckner, F. O. (2013). The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. <i>Nucleic Acids Res.</i> <b>41</b> D590–D596.
Randolph, T. W., Zhao, S., Copeland, W., Hullar, M. and Shojaie, A. (2018). Kernel-penalized regression for analysis of microbiome data. <i>Ann. Appl. Stat.</i> <b>12</b> 540–566.
Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E. and Vert, J.-P. (2007). Classification of microarray data using gene networks. <i>BMC Bioinform.</i> <b>8</b> 35.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. <i>Proc. Natl. Acad. Sci. USA</i> <b>102</b> 15545–15550.
R Core Team (2017). <i>R: A Language and Environment for Statistical Computing</i>. R Foundation for Statistical Computing, Vienna, Austria.
Brenner, D. J., Staley, J. T. and Krieg, N. R. (2005). Classification of procaryotic organisms and the concept of bacterial speciation. In <i>Bergey’s Manual of Systematic Bacteriology</i> 27–32. Springer, Berlin.
Chang, W., Cheng, J., Allaire, J., Xie, Y. and McPherson, J. (2016). shiny: Web Application Framework for R. R package version 0.13.2.
Kondor, R. I. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In <i>Proceedings of the 19th International Conference on Machine Learning</i> 315–322.
