MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

Springer Science and Business Media LLC - Volume 7 - Pages 1-8 - 2015
James G Jeffryes [1,2], Ricardo L Colasanti [2], Mona Elbadawi-Sidhu [3], Tobias Kind [3], Thomas D Niehaus [4], Linda J Broadbelt [1], Andrew D Hanson [4], Oliver Fiehn [3,5], Keith E J Tyo [1], Christopher S Henry [2]
[1] Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
[2] Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA
[3] West Coast Metabolomics Center, University of California, Davis, USA
[4] Horticultural Sciences Department, University of Florida, Gainesville, USA
[5] Biochemistry Department, King Abdulaziz University, Jeddah, Saudi Arabia

Abstract

In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features remains a bottleneck: much of the collected information goes uninterpreted, and new metabolites are seldom discovered in untargeted studies. Addressing these challenges requires new approaches that consider compounds beyond those available in curated biochemistry databases.
Here we present Metabolic In silico Network Expansions (MINEs), which extend known metabolite databases to include molecules that have not been observed but are likely to occur, given known metabolites and common biochemical reactions. We use an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE), together with expert-curated reaction rules based on the Enzyme Commission classification system, to propose the novel chemical structures and reactions that make up MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the resulting MINE contains over 571,000 compounds, 93% of which are not present in the PubChem database. Nevertheless, these MINE compounds have, on average, higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases proposed annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (median 46 vs. 1715 candidates). Applying MINEs to LC–MS accurate mass data made it possible to predict the identity of an unknown peak with confidence. MINE databases are freely accessible for non-commercial use via user-friendly web tools at http://minedatabase.mcs.anl.gov and via developer-friendly APIs.
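BNICE builds a MINE by repeatedly applying generalized reaction rules to known compounds and adding the predicted products as new nodes in the network. The sketch below mimics that breadth-first, generation-by-generation expansion at the level of molecular formulas only. The three rules, their element deltas and all function names are hypothetical simplifications for illustration: the actual BNICE rules are EC-based structural operators that must match a reaction centre in the molecule, which a formula-level check cannot capture. The point is only to show how the candidate space grows with each generation.

```python
from typing import Dict, List, Optional, Set, Tuple

# A formula is a sorted tuple of (element, count) pairs so it can be stored in a set.
Formula = Tuple[Tuple[str, int], ...]

# HYPOTHETICAL formula-level rules (element deltas only). Real BNICE rules are
# structural operators that must match a reaction centre in the molecule.
RULES: Dict[str, Dict[str, int]] = {
    "hydroxylation": {"O": 1},              # R-H -> R-OH
    "decarboxylation": {"C": -1, "O": -2},  # R-COOH -> R-H + CO2
    "methylation": {"C": 1, "H": 2},        # R-H -> R-CH3
}


def to_formula(counts: Dict[str, int]) -> Formula:
    """Normalise element counts into a hashable formula, dropping zero counts."""
    return tuple(sorted((e, n) for e, n in counts.items() if n != 0))


def formula_str(formula: Formula) -> str:
    """Render e.g. (('C', 2), ('H', 4), ('O', 1)) as 'C2H4O'."""
    return "".join(f"{e}{n if n > 1 else ''}" for e, n in formula)


def apply_rule(formula: Formula, delta: Dict[str, int]) -> Optional[Formula]:
    """Apply an element delta; return None if any element count would go negative."""
    counts = dict(formula)
    for element, change in delta.items():
        counts[element] = counts.get(element, 0) + change
    if any(n < 0 for n in counts.values()):
        return None  # rule not applicable: the compound lacks the required atoms
    return to_formula(counts)


def expand(seeds: Set[Formula], rules: Dict[str, Dict[str, int]], generations: int):
    """Breadth-first expansion: each generation applies every rule to the newest compounds."""
    known: Set[Formula] = set(seeds)
    frontier: Set[Formula] = set(seeds)
    edges: List[Tuple[Formula, str, Formula]] = []  # (substrate, rule, product)
    for _ in range(generations):
        new: Set[Formula] = set()
        for compound in frontier:
            for rule_name, delta in rules.items():
                product = apply_rule(compound, delta)
                if product is None:
                    continue
                edges.append((compound, rule_name, product))
                if product not in known:
                    new.add(product)
        known |= new
        frontier = new  # only newly predicted compounds are expanded next round
    return known, edges


pyruvate = to_formula({"C": 3, "H": 4, "O": 3})
known, edges = expand({pyruvate}, RULES, generations=2)
print(sorted(formula_str(f) for f in known))
```

With pyruvate as the only seed, two generations already yield nine distinct formulas; repeated over thousands of KEGG seeds and many structural rules, this combinatorial growth is why the expanded database is so much larger than its source.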
MINEs improve metabolomics peak identification compared with general chemical databases, whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on earlier in silico-generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
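At its simplest, accurate-mass peak identification compares an observed m/z with the theoretical m/z of each database candidate under an assumed adduct, keeping candidates within a parts-per-million (ppm) tolerance. The sketch below illustrates that lookup. The function names, the two singly charged adducts, the 5 ppm default and the four-compound candidate list are illustrative assumptions; they are not the interface of the MINE web tools or APIs. It also shows why the number of returned candidates matters: isomers such as glucose and fructose share a formula and cannot be separated by accurate mass alone.

```python
import re
from typing import Dict, List, Tuple

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.007276466812  # proton mass, Da

# m/z offset from the neutral monoisotopic mass M for singly charged ions (assumed adducts).
ADDUCTS: Dict[str, float] = {"[M+H]+": PROTON, "[M-H]-": -PROTON}


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a flat formula such as 'C6H12O6' (no brackets)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def annotate(mz: float, candidates: Dict[str, str], adduct: str = "[M+H]+",
             tol_ppm: float = 5.0) -> List[Tuple[str, str, float]]:
    """Return (name, formula, ppm error) for candidates whose ion m/z lies within tol_ppm."""
    hits = []
    for name, formula in candidates.items():
        expected_mz = monoisotopic_mass(formula) + ADDUCTS[adduct]
        ppm = (mz - expected_mz) / expected_mz * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((name, formula, round(ppm, 2)))
    return sorted(hits, key=lambda hit: abs(hit[2]))  # closest mass match first


# Toy candidate list; a real search would span a full compound database.
candidates = {"glucose": "C6H12O6", "fructose": "C6H12O6",
              "citric acid": "C6H8O7", "caffeine": "C8H10N4O2"}
print(annotate(181.0707, candidates))                   # both C6H12O6 isomers match
print(annotate(191.0197, candidates, adduct="[M-H]-"))  # citric acid only
```

Ranking by absolute ppm error is only one simple choice; distinguishing isomers that survive the mass filter would require further evidence such as fragmentation spectra or retention time.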
