A systematic search for discriminating sites in the 16S ribosomal RNA gene

Springer Science and Business Media LLC - Volume 4 - Pages 1-9 - 2014
Hilde Vinje¹, Trygve Almøy¹, Kristian Hovde Liland¹˒², Lars Snipen¹
¹ Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, Ås, Norway
² Nofima AS, Ås, Norway

Abstract

The 16S rRNA gene is by far the most common genomic marker used for prokaryotic classification, and it has been used extensively in metagenomic studies in recent years. Along the 16S gene, some regions vary more than others across the kingdom of bacteria. Nine variable regions have been identified, flanked by more conserved parts of the sequence. It has been stated that the discriminatory power of the 16S marker lies in these variable regions. In the present study we examined this more closely, using a supervised learning method to search systematically for sites that contribute to correct classification at either the phylum or the genus level. When classifying phyla, the site selection algorithm located 50 discriminative sites. These were scattered over most of the alignments, and only around half of them were located in the variable regions. The selected sites did, however, have an entropy significantly larger than expected, meaning that they are sites of large variation. We found that the discriminative sites typically have a large entropy compared to their closest neighbours along the alignments. When classifying genera, the site selection algorithm needed around 80% of the sites in the 16S gene before the classification error reached a minimum. This means that all variation, in both variable and conserved regions, is needed in order to separate genera. Our findings do not support the statement that the discriminative power of the 16S gene is located only in the variable regions. Variable regions are important, but just as many discriminative sites are found in the more conserved parts. The discriminative power is typically found in sites of large variation located inside shorter regions of higher conservation.
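The abstract compares the entropy of a site with that of its closest neighbours along the alignment. As a rough illustration only, the following Python sketch computes Shannon entropy per alignment column and the difference between one site and the mean of its neighbours. It is not the authors' implementation: the function names, the window size, the treatment of gap characters as ordinary symbols, and the toy alignment are all assumptions made for this example.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(aligned_seqs):
    """Entropy of every column in a list of equal-length aligned sequences."""
    return [site_entropy(col) for col in zip(*aligned_seqs)]


def entropy_vs_neighbours(entropies, site, window=2):
    """Entropy at `site` minus the mean entropy of its neighbours.

    `window` (hypothetical parameter) is how many positions on each side
    count as neighbours; positions outside the alignment are ignored.
    """
    lo = max(0, site - window)
    hi = min(len(entropies), site + window + 1)
    neighbours = [entropies[i] for i in range(lo, hi) if i != site]
    if not neighbours:  # single-column alignment: nothing to compare with
        return entropies[site]
    return entropies[site] - sum(neighbours) / len(neighbours)


# Made-up toy alignment: only the middle column varies.
toy = ["ACGTA", "ACCTA", "ACATA", "ACTTA"]
ent = alignment_entropies(toy)
contrast = entropy_vs_neighbours(ent, site=2)
```

In this toy case the middle column holds four equally frequent symbols while its neighbours are fully conserved, which is the pattern the abstract describes: a site of large variation inside a short conserved stretch.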
