Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION™ sequencing

Minh Duc Cao¹, Devika Ganesamoorthy¹, Alysha G. Elliott¹, Huihui Zhang¹, Matthew A. Cooper¹, Lachlan Coin¹,²
¹Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, St Lucia, Brisbane, QLD 4072, Australia
²Department of Genomics of Common Disease, Imperial College London, London W12 0NN, UK

Abstract

Keywords


References

Boyd SD. Diagnostic applications of high-throughput DNA sequencing. Annu Rev Pathol. 2013; 8:381–410. doi: 10.1146/annurev-pathol-020712-164026.

Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The next-generation sequencing revolution and its impact on genomics. Cell. 2013; 155(1):27–38. doi: 10.1016/j.cell.2013.09.006.

Gaber MM, Zaslavsky A, Krishnaswamy S. Mining data streams: a review. ACM SIGMOD Record. 2005; 34(2):18–26. doi: 10.1145/1083784.1083789.

Muthukrishnan S. Data Streams: Algorithms and Applications. Foundations Trends Theor Comput Sci. 2005; 1(2):117–236.

Kasianowicz JJ, Brandin E, Branton D, Deamer DW. Characterization of individual polynucleotide molecules using a membrane channel. Proc Natl Acad Sci USA. 1996; 93(24):13770–3. doi: 10.1073/pnas.93.24.13770.

Branton D, Deamer DW, Marziali A, Bayley H, Benner SA, Butler T, Di Ventra M, Garaj S, Hibbs A, Huang X, Jovanovich SB, Krstic PS, Lindsay S, Ling XS, Mastrangelo CH, Meller A, Oliver JS, Pershin YV, Ramsey JM, Riehn R, Soni GV, Tabard-Cossa V, Wanunu M, Wiggin M, Schloss JA. The potential and challenges of nanopore sequencing. Nat Biotechnol. 2008; 26(10):1146–53. doi: 10.1038/nbt.1495.

Stoddart D, Heron AJ, Mikhailova E, Maglia G, Bayley H. Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore. Proc Natl Acad Sci USA. 2009; 106(19):7702–7. doi: 10.1073/pnas.0901054106.

Quick J, Ashton P, Calus S, Chatt C, Gossain S, Hawker J, Nair S, Neal K, Nye K, Peters T, De Pinna E, Robinson E, Struthers K, Webber M, Catto A, Dallman TJ, Hawkey P, Loman NJ. Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella. Genome Biol. 2015; 16(1):114. doi: 10.1186/s13059-015-0677-2.

Greninger AL, Naccache SN, Federman S, Yu G, Mbala P, Bres V, Stryke D, Bouquet J, Somasekar S, Linnen JM, Dodd R, Mulembakani P, Schneider BS, Muyembe-Tamfum JJ, Stramer SL, Chiu CY. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome Med. 2015; 7(1):99. doi: 10.1186/s13073-015-0220-9.

Cao MD, Ganesamoorthy D, Cooper MA, Coin LJM. Realtime analysis and visualization of MinION sequencing data with npReader. Bioinformatics. 2016; 32(5):764–6. doi: 10.1093/bioinformatics/btv658.

Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013. arXiv:1303.3997.

Quick J, Quinlan AR, Loman NJ. A Reference Bacterial Genome Dataset Generated on the MinION Portable Single-molecule Nanopore Sequencer. GigaScience. 2014; 3(1):22. doi: 10.1186/2047-217X-3-22.

Ashton PM, Nair S, Dallman T, Rubino S, Rabsch W, Mwaigwisya S, Wain J, O’Grady J. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat Biotechnol. 2015; 33(3):296–300. doi: 10.1038/nbt.3103.

Kilianski A, Haas JL, Corriveau EJ, Liem AT, Willis KL, Kadavy DR, Rosenzweig CN, Minot SS. Bacterial and viral identification and differentiation by amplicon sequencing on the MinION nanopore sequencer. GigaScience. 2015; 4(1). doi: 10.1186/s13742-015-0051-z.

Jain M, Fiddes IT, Miga KH, Olsen HE, Paten B, Akeson M. Improved data analysis for the MinION nanopore sequencer. Nat Methods. 2015; 12(4):351–6. doi: 10.1038/nmeth.3290.

Diancourt L, Passet V, Verhoef J, Grimont PAD, Brisse S. Multilocus Sequence Typing of Klebsiella pneumoniae Nosocomial Isolates. J Clin Microbiol. 2005; 43(8):4178–82. doi: 10.1128/JCM.43.8.4178-4182.2005.

Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV. Identification of Acquired Antimicrobial Resistance Genes. J Antimicrob Chemother. 2012; 67(11):2640–4. doi: 10.1093/jac/dks261.

Allison L, Wallace CS, Yee CN. When is a string like a string? In: Artificial Intelligence and Mathematics. Ft. Lauderdale, FL; 1990.

Poznik GD, Henn BM, Yee MC, Sliwerska E, Euskirchen GM, Lin AA, Snyder M, Quintana-Murci L, Kidd JM, Underhill PA, Bustamante CD. Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females. Science. 2013; 341(6145):562–5. doi: 10.1126/science.1237619.

Juul S, Izquierdo F, Hurst A, Dai X, Wright A, Kulesha E, Pettett R, Turner DJ. What’s in my pot? Real-time species identification on the MinION. bioRxiv. 2015. doi: 10.1101/030742.

Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods. 2012; 9(8):811–4. doi: 10.1038/nmeth.2066.

Judge K, Harris SR, Reuter S, Parkhill J, Peacock SJ. Early insights into the potential of the Oxford Nanopore MinION for the detection of antimicrobial resistance genes. J Antimicrob Chemother. 2015; 70(10):2775–8. doi: 10.1093/jac/dkv206.

Dunne WM, Westblade LF, Ford B. Next-generation and whole-genome sequencing in the diagnostic clinical microbiology laboratory. Eur J Clin Microbiol Infect Dis. 2012; 31(8):1719–26. doi: 10.1007/s10096-012-1641-7.

Fricke WF, Rasko DA. Bacterial genome sequencing in the clinic: bioinformatic challenges and solutions. Nat Rev Genet. 2014; 15(1):49–55. doi: 10.1038/nrg3624.

Maiden MC, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA, Feavers IM, Achtman M, Spratt BG. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci USA. 1998; 95(6):3140–5. doi: 10.1073/pnas.95.6.3140.

Cody AJ, McCarthy ND, Jansen van Rensburg M, Isinkaye T, Bentley SD, Parkhill J, Dingle KE, Bowler ICJW, Jolley KA, Maiden MCJ. Real-Time Genomic Epidemiological Evaluation of Human Campylobacter Isolates by Use of Whole-Genome Multilocus Sequence Typing. J Clin Microbiol. 2013; 51(8):2526–34. doi: 10.1128/JCM.00066-13.

Inouye M, Dashnow H, Raven LA, Schultz MB, Pope BJ, Tomita T, Zobel J, Holt KE. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Med. 2014; 6(11):90. doi: 10.1186/s13073-014-0090-6.

Cao MD, Nguyen SH, Ganesamoorthy D, Elliott A, Cooper M, Coin LJM. Scaffolding and Completing Genome Assemblies in Real-time with Nanopore Sequencing. bioRxiv. 2016. doi: 10.1101/054783.

David M, Dursi LJ, Yao D, Boutros PC, Simpson JT. Nanocall: An Open Source Basecaller for Oxford Nanopore Sequencing Data. bioRxiv. 2016. doi: 10.1101/046086.

Boža V, Brejová B, Vinař T. DeepNano: Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads. arXiv. 2016. arXiv:1603.09195.

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014; 30(15):2114–20. doi: 10.1093/bioinformatics/btu170.

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J Comput Biol. 2012; 19(5):455–77. doi: 10.1089/cmb.2012.0021.

Larsen MV, Cosentino S, Rasmussen S, Friis C, Hasman H, Marvig RL, Jelsbak L, Sicheritz-Pontén T, Ussery DW, Aarestrup FM, Lund O. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria. J Clin Microbiol. 2012; 50(4):1355–61. doi: 10.1128/JCM.06094-11.

Sison CP, Glaz J. Simultaneous Confidence Intervals and Sample Size Determination for Multinomial Proportions. J Am Stat Assoc. 1995; 90(429):366. doi: 10.2307/2291162.

Lassmann T, Frings O, Sonnhammer ELL. Kalign2: High-performance Multiple Alignment of Protein and Nucleotide Sequences Allowing External Features. Nucleic Acids Res. 2009; 37(3):858–65. doi: 10.1093/nar/gkn1006.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. J Mol Biol. 1990; 215(3):403–10. doi: 10.1016/S0022-2836(05)80360-2.

Gusfield D, Balasubramanian K, Naor D. Parametric Optimization of Sequence Alignment. Algorithmica. 1994; 12(4):312–26. doi: 10.1007/BF01185430.

Frith M, Hamada M, Horton P. Parameters for Accurate Genome Alignment. BMC Bioinformatics. 2010; 11(1):80. doi: 10.1186/1471-2105-11-80.

Cao MD, Dix TI, Allison L. A genome alignment algorithm based on compression. BMC Bioinformatics. 2010; 11(1):599. doi: 10.1186/1471-2105-11-599.

Allison L, Wallace CS, Yee CN. Finite-state models in the alignment of macromolecules. J Mol Evol. 1992; 35(1):77–89. doi: 10.1007/BF00160262.

Solomonoff R. A Formal Theory of Inductive Inference, Parts I and II. Inform Control. 1964; 7(1):1–22 and 7(2):224–54.

Wallace CS, Boulton DM. An Information Measure for Classification. Comput J. 1968; 11(2):185–94.

Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA. 1992; 89(22):10915–9.

Cao MD, Dix TI, Allison L, Mears C. A simple statistical algorithm for biological sequence compression. In: Data Compression Conference. Utah: IEEE; 2007. p. 43–52. doi: 10.1109/DCC.2007.7.

Cao MD, Dix TI, Allison L. A biological compression model and its applications. In: Arabnia HR, Tran Q-N, editors. Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology. New York: Springer; 2011. p. 657–66. doi: 10.1007/978-1-4419-7046-6_67.

Cao MD, Dix TI, Allison L. Computing substitution matrices for genomic comparative analysis. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho T-B, editors. Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science. Berlin Heidelberg: Springer; 2009. p. 647–55. doi: 10.1007/978-3-642-01307-2_64.

Cao MD. Java package for sequence analysis. 2015. https://github.com/mdcao/japsa

Cao MD, Ganesamoorthy D, Elliott A, Zhang H, Cooper M, Coin L. Support data for “Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION sequencing”. GigaScience Database. 2016. doi: 10.5524/100206.

Elliott AG, Ganesamoorthy D, Coin L, Cooper MA, Cao MD. Complete genome sequence of Klebsiella quasipneumoniae subsp. similipneumoniae strain ATCC 700603. Genome Announc. 2016; 4(3):e00438-16. doi: 10.1128/genomeA.00438-16.