ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs

Microbiome - 2021
Tatiana Dvorkina1, Anton Bankevich2, Alexeï Sorokin3, Fan Yang4,5, Boahemaa Adu-Oppong6,4, Ryan J. Williams5, Keith H. Turner5, Pavel A. Pevzner2
1Center for Algorithmic Biotechnology [Saint Petersburg] (7-9, Universitetskaya nab., St. Petersburg, 199034 - Russia)
2UC San Diego - University of California [San Diego] (UCSD, 9500 Gilman Dr., La Jolla, CA 92093 USA - United States)
3MICALIS - MICrobiologie de l'ALImentation au Service de la Santé (Domaine de Vilvert 78352 JOUY-EN-JOSAS CEDEX - France)
4Bayer Cropscience (France)
5Ascus Biosciences (San Diego - United States)
6Thermo Fisher Scientific Inc. (168 Third Avenue, Waltham, MA USA 02451 - United States)

Abstract

Background: Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especially on large plasmids in strains of the ubiquitous soil bacterium Bacillus thuringiensis (Bt). Since such plasmids often encode multiple similar IPGs, their assemblies are typically fragmented and many IPGs are scattered across multiple contigs. As a result, existing gene prediction tools, which analyze individual contigs, typically predict partial rather than complete IPGs, making it difficult to conduct downstream IPG engineering efforts in agricultural genomics.

Methods: Although it is difficult to assemble an IPG into a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding a single IPG.

Results: We describe ORFograph, a pipeline for predicting IPGs in assembly graphs, benchmark it on (meta)genomic datasets, and discover nearly a hundred novel IPGs. This work shows that graph-aware gene prediction tools enable the discovery of a greater diversity of IPGs from (meta)genomes.

Conclusions: We demonstrated that analysis of assembly graphs reveals novel candidate IPGs. ORFograph identified both already known genes "hidden" in assembly graphs and potential novel IPGs that evaded existing tools for IPG identification. As ORFograph is fast, one could imagine a pipeline that processes many (meta)genomic assembly graphs to identify, for phenotypic testing, novel IPGs that were previously inaccessible to traditional gene-finding methods. While here we demonstrated the results of ORFograph only for IPGs, the proposed approach can be generalized to any class of genes.
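The idea of recovering a complete gene from several contigs can be illustrated with a toy example. The sketch below is not ORFograph's algorithm; it is a minimal illustration under strong simplifying assumptions: graph segments abut without the k-mer overlaps of a real assembly graph, only the forward strand is scanned, and all names (`SEGMENTS`, `LINKS`, `find_orfs`, `graph_orfs`) are hypothetical. It enumerates short paths through the graph and reports open reading frames (ATG to stop codon) that begin in the first segment of a path and end in the last one.

```python
from typing import Dict, Iterator, List, Tuple

STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_len: int = 9) -> List[Tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs, ATG through stop codon inclusive."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return orfs

def enumerate_paths(links: Dict[str, List[str]], start: str,
                    max_segments: int) -> Iterator[List[str]]:
    """Yield all simple paths from `start` with at most `max_segments` segments."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        yield path
        if len(path) < max_segments:
            for nxt in links.get(path[-1], []):
                if nxt not in path:
                    stack.append(path + [nxt])

def graph_orfs(segments: Dict[str, str], links: Dict[str, List[str]],
               max_segments: int = 3, min_len: int = 9) -> Dict[str, List[str]]:
    """Map each ORF sequence to the shortest path of segments that spells it."""
    found: Dict[str, List[str]] = {}
    for start in segments:
        for path in enumerate_paths(links, start, max_segments):
            seq = "".join(segments[s] for s in path)
            first_len = len(segments[path[0]])
            last_start = len(seq) - len(segments[path[-1]])
            for s, e in find_orfs(seq, min_len):
                # Keep the ORF only if it starts in the first segment and
                # ends in the last one, so it is reported once per minimal path.
                if s < first_len and e > last_start:
                    found.setdefault(seq[s:e], path)
    return found

# Toy graph (assumed data): segment A branches into B or C, both rejoin at D.
SEGMENTS = {"A": "GGATGAAACC", "B": "GTTTGGCA", "C": "GTACGGCA", "D": "TTTAAGC"}
LINKS = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

if __name__ == "__main__":
    for orf, path in sorted(graph_orfs(SEGMENTS, LINKS).items()):
        print("->".join(path), orf)
```

In this toy graph no single segment contains a complete ORF, which mirrors the fragmented-contig situation described in the Background. Two variants of the same gene appear only when the paths A, B, D and A, C, D are read through. A realistic implementation would also need to handle segment overlaps, reverse complements, and the combinatorial growth of paths in repeat-rich graphs.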
