Predicting novel substrates for enzymes with minimal experimental effort with active learning

Metabolic Engineering - Volume 44 - Pages 171-181 - 2017
Dante A. Pertusi1, Matthew E. Moura1, James G. Jeffryes1,2, Siddhant Prabhu1, Bradley Walters Biggs1, Keith E.J. Tyo1
1Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, United States
2Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, United States
