Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates
Abstract
The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that ordinary correlation analysis cannot identify. However, LSA as originally developed does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval.
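To make the idea of a local, time-delayed association concrete, the following is a minimal sketch of a local similarity score computed by dynamic programming. It assumes the commonly described LSA recurrence, in which runs of same-signed products are accumulated along diagonals that lie within a maximum delay; the function name `local_similarity` and the parameter `max_delay` are illustrative and are not taken from the eLSA software. The inputs are assumed to be already normalized.

```python
import numpy as np

def local_similarity(x, y, max_delay):
    """Signed local similarity score of two equal-length, normalized series.

    Hypothetical helper for illustration only. pos[i, j] holds the best
    positively associated run ending at x[i-1], y[j-1]; neg[i, j] holds the
    best negatively associated run. Only cells with |i - j| <= max_delay
    are filled, which limits the allowed time shift.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    pos = np.zeros((n + 1, n + 1))
    neg = np.zeros((n + 1, n + 1))
    for i in range(1, n + 1):
        first = max(1, i - max_delay)
        last = min(n, i + max_delay)
        for j in range(first, last + 1):
            prod = x[i - 1] * y[j - 1]
            # Extend the run on the same diagonal, or restart at zero.
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)
    best_pos, best_neg = pos.max(), neg.max()
    sign = 1.0 if best_pos >= best_neg else -1.0
    return sign * max(best_pos, best_neg) / n

# y repeats the leading pattern of x one step later.
x = [1, 2, 0, 0]
y = [0, 1, 2, 0]
print(local_similarity(x, y, max_delay=0))  # 0.5
print(local_similarity(x, y, max_delay=1))  # 1.25
```

In this toy example the score more than doubles once a delay of one time step is allowed, which is the kind of shifted pattern that a correlation computed on aligned time points would understate.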
We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed that eLSA can capture subinterval and time-delayed associations. We implemented the eLSA technique in an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified.
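As a rough illustration of how replicates can yield a confidence interval for the LS score, the sketch below averages the replicates at each time point, z-scores the result, and builds a percentile bootstrap interval by resampling replicates with replacement at every time point. These are assumptions made for the example: the names `ls_score`, `zscore`, and `bootstrap_ls_interval` are hypothetical, simple averaging is only one possible way to summarize replicates, and z-scoring stands in for whatever normalization the eLSA pipeline applies.

```python
import numpy as np

def ls_score(x, y, max_delay):
    """Compact signed LS score: best same-signed run on any allowed diagonal."""
    n = len(x)
    best = 0.0
    for d in range(-max_delay, max_delay + 1):
        prods = np.array([x[i] * y[i + d] for i in range(n) if 0 <= i + d < n])
        for sign in (1.0, -1.0):
            run = 0.0
            for p in sign * prods:
                run = max(0.0, run + p)
                if run > abs(best):
                    best = sign * run
    return best / n

def zscore(v):
    """Center and scale a series (assumed stand-in for the normalization step)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def bootstrap_ls_interval(xr, yr, max_delay, n_boot=200, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for the LS score.

    xr, yr: arrays of shape (replicates, time points).
    """
    rng = np.random.default_rng(seed)
    xr = np.asarray(xr, dtype=float)
    yr = np.asarray(yr, dtype=float)

    def summarize(reps):
        # Assumed summarizing choice: average the replicates per time point.
        return zscore(reps.mean(axis=0))

    point = ls_score(summarize(xr), summarize(yr), max_delay)
    scores = []
    for _ in range(n_boot):
        # Draw replicate indices with replacement, independently per time point.
        xi = rng.integers(0, xr.shape[0], size=xr.shape)
        yi = rng.integers(0, yr.shape[0], size=yr.shape)
        xb = np.take_along_axis(xr, xi, axis=0)
        yb = np.take_along_axis(yr, yi, axis=0)
        scores.append(ls_score(summarize(xb), summarize(yb), max_delay))
    low, high = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return point, low, high

# Synthetic data: y follows x with a delay of two time points, five replicates each.
rng = np.random.default_rng(42)
t = np.arange(20)
xr = np.sin(t / 3.0) + 0.1 * rng.standard_normal((5, 20))
yr = np.sin((t - 2) / 3.0) + 0.1 * rng.standard_normal((5, 20))
point, low, high = bootstrap_ls_interval(xr, yr, max_delay=3)
print(point, low, high)
```

The width of the resulting interval reflects how much the score moves when different replicates are drawn, which is information a single unreplicated series cannot provide.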
The extended LSA technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond those found by ordinary correlation analysis. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software streamlines the analysis and is freely available from the eLSA homepage.