16S rRNA Gene High-Throughput Sequencing Data Mining of Microbial Diversity and Interactions

Springer Science and Business Media LLC - Volume 99 - Pages 4119-4129 - 2015
Feng Ju1, Tong Zhang1
1Environmental Biotechnology Lab, The University of Hong Kong, Hong Kong, China

Abstract

Microorganisms are ubiquitous, which raises continuing public concern about their pathogenicity and the threats they pose to the human environment, alongside interest in their potential benefits for bioengineering. The development and wide application of environmental biotechnology, for example in bioenergy production, wastewater treatment, bioremediation, and drinking water disinfection, has delivered both environmental and economic benefits. Notably, the widespread use of microscopic and molecular techniques since the 1990s has allowed engineers to look into the microbiology inside the "black box" of engineered microbial communities in biotechnological processes, providing guidance for process design and optimization. More recently, revolutionary advances in DNA sequencing technologies, together with rapidly falling costs, have been changing the conventional approaches of microbiology and ecology research and have opened the era of next-generation sequencing (NGS). The main research burden has now shifted from traditional laboratory experiments to the analysis of very large, information-rich NGS datasets, which is computationally expensive and bioinformatically challenging. This study discusses state-of-the-art bioinformatics and statistical analyses of 16S ribosomal RNA (rRNA) gene high-throughput sequencing data from prevalent NGS platforms, with the aim of promoting their application in exploring the diversity of functional and pathogenic microorganisms, as well as their interactions, in biotechnological processes.

Keywords

#microorganisms #environmental biotechnology #DNA sequencing #16S rRNA #microbial interactions
