A comprehensive comparison of molecular feature representations for use in predictive modeling

Computers in Biology and Medicine, Volume 130, Page 104197, 2021
Tomaž Stepišnik1,2, Blaž Škrlj1,2, Jörg Wicker3, Dragi Kocev1
1Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
2Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
3The University of Auckland, School of Computer Science, Auckland, New Zealand
