Adaptive gPCA: A method for structured dimensionality reduction with applications to microbiome data

Annals of Applied Statistics, Volume 13, Issue 2, 2019

Julia Fukuyama¹

¹Indiana University

References

Allen, G. I., Grosenick, L. and Taylor, J. (2014). A generalized least-square matrix decomposition. J. Amer. Statist. Assoc. 109 145–159.

Li, C. and Li, H. (2008). Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24 1175–1182.

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91–108.

Paradis, E., Claude, J. and Strimmer, K. (2004). APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20 289–290.

Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9 18–29.

Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.

Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.

Chen, J., Bittinger, K., Charlson, E. S., Hoffmann, C., Lewis, J., Wu, G. D., Collman, R. G., Bushman, F. D. and Li, H. (2012). Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics 28 2106–2113.

Pavoine, S., Dufour, A.-B. and Chessel, D. (2004). From dissimilarities among species to dissimilarities among communities: A double principal coordinate analysis. J. Theoret. Biol. 228 523–537.

Purdom, E. (2011). Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree. Ann. Appl. Stat. 5 2326–2358.

Dethlefsen, L. and Relman, D. A. (2011). Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proc. Natl. Acad. Sci. USA 108 4554–4561.

Rinaldo, A. (2009). Properties and refinements of the fused lasso. Ann. Statist. 37 2922–2952.

Callahan, B. J., Sankaran, K., Fukuyama, J. A., McMurdie, P. J. and Holmes, S. P. (2016). Bioconductor workflow for microbiome data analysis: From raw reads to community analyses. F1000Res 5 1492.

Caussinus, H. (1986). Models and uses of principal component analysis. Multidimensional Data Analysis 86 149–170.

Chang, Q., Luan, Y. and Sun, F. (2011). Variance adjusted weighted UniFrac: A powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinform. 12 1.

Cohan, F. M. (2002). What are bacterial species? Annual Review of Microbiology 56 457–487.

Doolittle, W. F. and Papke, R. T. (2006). Genomics and the bacterial species problem. Genome Biol. 7 1.

Dray, S., Pavoine, S. and Aguirre de Cárcer, D. (2015). Considering external information to improve the phylogenetic comparison of microbial communities: A new approach based on constrained double principal coordinates analysis (cDPCoA). Molecular Ecology Resources 15 242–249.

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26 2460–2461.

Escoufier, Y. (1973). Le traitement des variables vectorielles. Biometrics 29 751–760.

Fernandes, A. D., Reid, J. N., Macklaim, J. M., McMurrough, T. A., Edgell, D. R. and Gloor, G. B. (2014). Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2 15.

Filzmoser, P., Hron, K. and Reimann, C. (2009). Principal component analysis for compositional data with outliers. Environmetrics 20 621–632.

Fukuyama, J. (2019). Supplement to “Adaptive gPCA: A method for structured dimensionality reduction with applications to microbiome data.” DOI:10.1214/18-AOAS1227SUPP.

Holmes, S. (2008). Multivariate data analysis: The French way. In Probability and Statistics: Essays in Honor of David A. Freedman. Inst. Math. Stat. (IMS) Collect. 2 219–233. IMS, Beachwood, OH.

Love, M. I., Huber, W. and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15 550.

Lozupone, C. and Knight, R. (2005). UniFrac: A new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology 71 8228–8235.

Lozupone, C. A., Hamady, M., Kelley, S. T. and Knight, R. (2007). Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial communities. Applied and Environmental Microbiology 73 1576–1585.

McMurdie, P. J. and Holmes, S. (2014). Waste not, want not: Why rarefying microbiome data is inadmissible. PLoS Comput. Biol. 10 e1003531.

Penrose, R. (1955). A generalized inverse for matrices. Proc. Camb. Philos. Soc. 51 406–413.

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J. and Glöckner, F. O. (2013). The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 41 D590–D596.

Randolph, T. W., Zhao, S., Copeland, W., Hullar, M. and Shojaie, A. (2018). Kernel-penalized regression for analysis of microbiome data. Ann. Appl. Stat. 12 540–566.

Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E. and Vert, J.-P. (2007). Classification of microarray data using gene networks. BMC Bioinform. 8 35.

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102 15545–15550.

R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Brenner, D. J., Staley, J. T. and Krieg, N. R. (2005). Classification of procaryotic organisms and the concept of bacterial speciation. In Bergey’s Manual of Systematic Bacteriology 27–32. Springer, Berlin.

Chang, W., Cheng, J., Allaire, J., Xie, Y. and McPherson, J. (2016). shiny: Web Application Framework for R. R package version 0.13.2.

Kondor, R. I. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th International Conference on Machine Learning 315–322.

Matsen, F. A. and Evans, S. N. (2013). Edge principal components and squash clustering: Using the special structure of phylogenetic placement data for sample comparison. PLoS ONE.
