Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy

Applied and Environmental Microbiology - Volume 73, Issue 16 - Pages 5261-5267 - 2007
Qiong Wang1,2, George M. Garrity3,4, James M. Tiedje3,4, James R. Cole1,2
1 Center for Microbial Ecology and Department of Microbiology and Molecular Genetics
2 Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824
3 Center for Microbial Ecology
4 Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824

Abstract

The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (≥95%) and high accuracy (98%). In addition to being tested with the corpus of 5,014 type strain sequences from Bergey's outline, the RDP Classifier was tested with a corpus of 23,095 rRNA sequences as assigned by the NCBI into their alternative higher-order taxonomy. The results from leave-one-out testing on both corpora show that the overall accuracies at all levels of confidence for near-full-length and 400-base segments were 89% or above down to the genus level, and the majority of the classification errors appear to be due to anomalies in the current taxonomies. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene, with segments around the V2 and V4 variable regions giving the lowest error rates. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences. Another related tool, RDP Library Compare, was developed to facilitate microbial-community comparison based on 16S rRNA gene sequence libraries. It combines the RDP Classifier with a statistical test to flag taxa differentially represented between samples. The RDP Classifier and RDP Library Compare are available online at http://rdp.cme.msu.edu/.
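
The abstract does not spell out the classifier's internals. As a rough illustration of how a naïve Bayesian classifier can assign a sequence to a genus and attach a confidence estimate, the Python sketch below scores each genus by the overlapping words (k-mers) that a query shares with that genus's training sequences, then repeats the assignment on random word subsets to estimate confidence. The word size of 8, the smoothing formulas, the 100 bootstrap trials, the one-eighth subsample, and all function names are assumptions of this sketch. They are not taken from the abstract, and the code is not the RDP Classifier's own implementation.

```python
import math
import random
from collections import Counter, defaultdict

K = 8  # word (k-mer) size; an assumption of this sketch


def words(seq, k=K):
    """Return the set of distinct overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(labeled_seqs, k=K):
    """Build word counts from (genus, sequence) pairs (hypothetical helper)."""
    word_count = Counter()             # sequences containing each word, overall
    genus_size = Counter()             # number of training sequences per genus
    genus_word = defaultdict(Counter)  # sequences containing each word, per genus
    for genus, seq in labeled_seqs:
        ws = words(seq, k)
        word_count.update(ws)
        genus_size[genus] += 1
        genus_word[genus].update(ws)
    return {"n": len(labeled_seqs), "k": k, "word_count": word_count,
            "genus_size": genus_size, "genus_word": genus_word}


def log_score(model, genus, ws):
    """Sum of log P(word | genus), smoothed by a corpus-wide word prior."""
    n = model["n"]
    m_total = model["genus_size"][genus]
    counts = model["genus_word"][genus]
    total = 0.0
    for w in ws:
        prior = (model["word_count"][w] + 0.5) / (n + 1)  # never zero
        total += math.log((counts[w] + prior) / (m_total + 1))
    return total


def classify(model, seq, trials=100, rng=None):
    """Return (best genus, fraction of bootstrap trials agreeing with it)."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    ws = sorted(words(seq, model["k"]))
    if not ws:
        raise ValueError("sequence is shorter than the word size")
    genera = list(model["genus_size"])
    best = max(genera, key=lambda g: log_score(model, g, ws))
    subset_size = max(1, len(ws) // 8)
    agree = 0
    for _ in range(trials):
        sample = rng.choices(ws, k=subset_size)  # sampled with replacement
        if max(genera, key=lambda g: log_score(model, g, sample)) == best:
            agree += 1
    return best, agree / trials
```

Calling `train` on a list of `(genus, sequence)` pairs and then `classify(model, query)` returns the top-scoring genus and a confidence between 0 and 1. Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on long sequences.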
