UCHIME improves sensitivity and speed of chimera detection

Bioinformatics - Volume 27, Issue 16 - Pages 2194-2200 - 2011
Robert C. Edgar1, Brian J. Haas2, José C. Clemente3, Christopher Quince4, Rob Knight3
1Tiburon, CA, USA, 2Genome Sequencing and Analysis Program, The Broad Institute, Cambridge, MA 02142, 3Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA and 4School of Engineering, University of Glasgow, Glasgow G12 8LT, UK

Abstract

Motivation: Chimeric DNA sequences often form during polymerase chain reaction amplification, especially when sequencing single regions (e.g. 16S rRNA or fungal Internal Transcribed Spacer) to assess diversity or compare populations. Undetected chimeras may be misinterpreted as novel species, causing inflated estimates of diversity and spurious inferences of differences between populations. Detection and removal of chimeras is therefore of critical importance in such experiments.

Results: We describe UCHIME, a new program that detects chimeric sequences with two or more segments. UCHIME either uses a database of chimera-free sequences or detects chimeras de novo by exploiting abundance data. UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences. In testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus. UCHIME is >100× faster than Perseus and >1000× faster than ChimeraSlayer.

Contact: [email protected]

Availability: Source, binaries and data: http://drive5.com/uchime.

Supplementary information: Supplementary data are available at Bioinformatics online.
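The idea of a chimera built from two parent segments can be illustrated with a toy example. The sketch below is not UCHIME's scoring scheme; it is a minimal illustration that assumes the query and two candidate parents are already aligned to the same length, and it simply asks whether a two-segment model (one parent on the left of a crossover, the other on the right) matches the query better than either parent alone. The function names and the `min_gain` threshold are hypothetical, chosen only for this example, and abundance data (used in de novo mode) is not modelled.

```python
from typing import Tuple


def best_two_parent_model(query: str, left: str, right: str) -> Tuple[int, int, int]:
    """Toy two-segment model: left[:k] + right[k:] compared with the query.

    Assumes all three sequences are pre-aligned and of equal length.
    Returns (crossover k, matches of the best two-segment model,
    matches of the better single parent).
    """
    n = len(query)
    if len(left) != n or len(right) != n:
        raise ValueError("sequences must be aligned to the same length")

    match_left = [q == p for q, p in zip(query, left)]
    match_right = [q == p for q, p in zip(query, right)]
    total_left = sum(match_left)
    total_right = sum(match_right)

    best_k, best_score = 0, -1
    left_matches = 0             # matches of `left` in columns [0, k)
    right_matches = total_right  # matches of `right` in columns [k, n)
    for k in range(1, n):        # crossover strictly inside the sequence
        left_matches += match_left[k - 1]
        right_matches -= match_right[k - 1]
        score = left_matches + right_matches
        if score > best_score:
            best_k, best_score = k, score

    return best_k, best_score, max(total_left, total_right)


def looks_chimeric(query: str, parent_a: str, parent_b: str, min_gain: int = 3) -> bool:
    """Flag the query if a two-segment model beats the best single parent
    by at least `min_gain` matching columns (an arbitrary illustrative cutoff).
    Both segment orders (A then B, B then A) are tried.
    """
    for left, right in ((parent_a, parent_b), (parent_b, parent_a)):
        _, model_matches, single_matches = best_two_parent_model(query, left, right)
        if model_matches - single_matches >= min_gain:
            return True
    return False
```

For instance, `looks_chimeric("AAAAACCCCC", "AAAAAAAAAA", "CCCCCCCCCC")` flags the query, because joining the first half of one parent to the second half of the other explains every column, whereas each parent alone explains only half of them. A real detector additionally has to find candidate parents in a database or among more abundant reads, handle gaps and sequencing noise, and use a calibrated score in place of a raw match count.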
