CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Rachid Ounit1, Steve Wanamaker2, Timothy J. Close2, Stefano Lonardi1
1Department of Computer Science & Engineering, University of California, 900 University Avenue, Riverside, CA 92521, USA
2Department of Plant & Botanic Sciences, University of California, 900 University Avenue, Riverside, CA 92521, USA

Abstract

Keywords


References

Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004; 304(5667):66–74.

Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al. Structure, function and diversity of the healthy human microbiome. Nature. 2012; 486(7402):207–14.

The Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012; 486(7402):215–21.

Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res. 2007; 17(3):377–86.

Brady A, Salzberg S. PhymmBL expanded: confidence scores, custom databases, parallelization and more. Nat Methods. 2011; 8(5):367.

Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics. 2011; 12(Suppl 2):4.

Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods. 2012; 9(8):811–4.

Rosen GL, Reichenberger ER, Rosenfeld AM. NBC: the naive bayes classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics. 2011; 27(1):127–9.

Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, et al. Taxonomic metagenome sequence assignment with structured output models. Nat Methods. 2011; 8(3):191–2.

Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE. Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics. 2013; 29(18):2253–60.

Wood D, Salzberg S. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014; 15(3):46.

Bazinet AL, Cummings MP. A comparative evaluation of sequence classification programs. BMC Bioinf. 2012; 13(1):92.

Koslicki D, Foucart S, Rosen G. WGSQuikr: Fast whole-genome shotgun metagenomic classification. PloS one. 2014; 9(3):91784.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990; 215(3):403–10.

Kent WJ. BLAT: the BLAST-like alignment tool. Genome Res. 2002; 12(4):656–64.

International Barley Genome Sequencing Consortium. A physical, genetic and functional sequence assembly of the barley genome. Nature. 2012; 491(7426):711–6.

Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, et al. GenBank. Nucleic Acids Res. 2012:1195.

Vinga S, Almeida J. Alignment-free sequence comparison: a review. Bioinformatics. 2003; 19(4):513–23.

Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods. 2007; 4(6):495–500.

Magoc T, Pabinger S, Canzar S, Liu X, Su Q, Puiu D, et al. GAGE-B: an evaluation of genome assemblers for bacterial organisms. Bioinformatics. 2013; 29(14):1718–25.

Said HS, Suda W, Nakagome S, Chinen H, Oshima K, Kim S, et al. Dysbiosis of salivary microbiota in inflammatory bowel disease and its association with oral immunological biomarkers. DNA Res. 2013:037.

Antonio MA, Hawes SE, Hillier SL. The identification of vaginal Lactobacillus species and the demographic and microbiologic characteristics of women colonized by these species. J Infectious Diseases. 1999; 180(6):1950–6.

Hyman RW, Fukushima M, Diamond L, Kumm J, Giudice LC, Davis RW. Microbes on the human vaginal epithelium. Proc Nat Acad Sci. 2005; 102(22):7952–7.

Doležel J, Vrána J, Šafář J, Bartoš J, Kubaláková M, Šimková H. Chromosomes in the flow to simplify genome analysis. Funct Integr Genomics. 2012; 12(3):397–416.

Lonardi S, Duma D, Alpert M, Cordero F, Beccuti M, Bhat PR, et al. Combinatorial pooling enables selective sequencing of the barley gene space. PLoS Comput Biol. 2013; 9(4):1003010.

Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012; 1(1):18.

Close TJ, Wanamaker S, Roose ML, Lyon M. HarvEST. Methods Mol Biol. 2006; 406:161–77.

Close TJ, Bhat PR, Lonardi S, Wu Y, Rostoks N, Ramsay L, et al. Development and implementation of high-throughput SNP genotyping in barley. BMC Genomics. 2009; 10(1):582.

Mascher M, Muehlbauer GJ, Rokhsar DS, Chapman J, Schmutz J, Barry K, et al. Anchoring and ordering NGS contig assemblies by population sequencing (Popseq). Plant J. 2013; 76(4):718–27. doi:10.1111/tpj.12319.

Tu Q, He Z, Zhou J. Strain/species identification in metagenomes using genome-specific markers. Nucleic Acids Res. 2014; 42(8):67.

Zhang Z, Schwartz S, Wagner L, Miller W. A greedy algorithm for aligning DNA sequences. J Comput Biol. 2000; 7(1-2):203–14.