Transformer-CNN: Swiss knife for QSAR modeling and interpretation
Abstract
We present SMILES embeddings derived from the internal encoder state of a Transformer [1] model trained to canonicalize SMILES as a Seq2Seq problem. Applying a CharNN [2] architecture to these embeddings yields higher-quality, interpretable QSAR/QSPR models on diverse benchmark datasets covering both regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prediction is based on an internal consensus. Because both the augmentation and the transfer learning are based on embeddings, the method provides good results even for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available on
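The consensus idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the consensus is a plain arithmetic mean over the predictions for several SMILES strings of the same molecule; the abstract does not state the aggregation rule, so this is an assumption. The names `consensus_predict` and `toy_model` are hypothetical, and the SMILES variants are written by hand here, whereas in practice they would come from a cheminformatics toolkit.

```python
from statistics import mean
from typing import Callable, Iterable, List


def consensus_predict(
    smiles_variants: Iterable[str],
    predict_one: Callable[[str], float],
) -> float:
    """Average a model's predictions over several SMILES of one molecule.

    Assumption: the consensus is an unweighted mean of per-variant predictions.
    """
    predictions: List[float] = [predict_one(s) for s in smiles_variants]
    if not predictions:
        raise ValueError("need at least one SMILES variant")
    return mean(predictions)


# Three valid SMILES spellings of ethanol, written by hand for illustration.
ethanol_variants = ["CCO", "OCC", "C(O)C"]


def toy_model(smiles: str) -> float:
    # Hypothetical stand-in for a trained model. String length is not a
    # meaningful property; it only makes the output depend on the variant.
    return float(len(smiles))


consensus = consensus_predict(ethanol_variants, toy_model)
print(consensus)
```

Because a real model may return slightly different values for different spellings of the same molecule, averaging over the variants gives a single prediction per compound, and the spread of the individual values could serve as a rough indication of how stable that prediction is.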
Keywords
References
Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. Paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. arXiv:1706.03762
Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. arXiv e-prints. arXiv:1509.01626
Sushko I, Novotarskyi S, Körner R et al (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25:533–554. https://doi.org/10.1007/s10822-011-9440-2
Mauri A, Consonni V, Pavan M, Todeschini R (2006) Dragon software: an easy approach to molecular descriptor calculations. Match 56:237–248
Baskin I, Varnek A (2008) Fragment descriptors in SAR/QSAR/QSPR studies, molecular similarity analysis and in virtual screening. Chemoinformatics approaches to virtual screening. Royal Society of Chemistry, Cambridge, pp 1–43
Eklund M, Norinder U, Boyer S, Carlsson L (2014) Choosing feature selection and learning algorithms in QSAR. J Chem Inf Model 54:837–843. https://doi.org/10.1021/ci400573c
Baskin II, Winkler D, Tetko IV (2016) A renaissance of neural networks in drug discovery. Expert Opin Drug Discov 11:785–795. https://doi.org/10.1080/17460441.2016.1201262
Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J, et al (2015) Convolutional networks on graphs for learning molecular fingerprints. arXiv e-prints. arXiv:1509.09292
Coley CW, Barzilay R, Green WH et al (2017) Convolutional embedding of attributed molecular graphs for physical property prediction. J Chem Inf Model 57:1757–1772. https://doi.org/10.1021/acs.jcim.6b00601
Gómez-Bombarelli R, Wei JN, Duvenaud D et al (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Sci 4:268–276. https://doi.org/10.1021/acscentsci.7b00572
Kimber TB, Engelke S, Tetko IV, et al (2018) Synergy effect between convolutional neural networks and the multiplicity of smiles for improvement of molecular prediction. arXiv e-prints. arXiv:1812.04439
Gilmer J, Schoenholz SS, Riley PF, et al (2017) Neural message passing for quantum chemistry. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70. arXiv:1704.01212
Shang C, Liu Q, Chen K-S, et al (2018) Edge attention-based multi-relational graph convolutional networks. arXiv e-prints. arXiv:1802.04944
Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36. https://doi.org/10.1021/ci00057a005
O’Boyle NM, Banck M, James CA et al (2011) Open Babel: an open chemical toolbox. J Cheminform 3:33. https://doi.org/10.1186/1758-2946-3-33
Vidal D, Thormann M, Pons M (2005) LINGO, an efficient holographic text based method to calculate biophysical properties and intermolecular similarities. J Chem Inf Model 45:386–393. https://doi.org/10.1021/ci0496797
Zhang X, LeCun Y (2015) Text understanding from scratch. arXiv e-prints. arXiv:1502.01710
Goh GB, Hodas NO, Siegel C, Vishnu A (2017) SMILES2Vec: an interpretable general-purpose deep neural network for predicting chemical properties. arXiv e-prints. arXiv:1712.02034
Jastrzębski S, Leśniak D, Czarnecki WM (2016) Learning to SMILE(S). arXiv e-prints. arXiv:1602.06289
Goh GB, Siegel C, Vishnu A, Hodas NO (2017) Using rule-based labels for weak supervised learning: a chemnet for transferable chemical property prediction. arXiv e-prints. arXiv:1712.02734
Zheng S, Yan X, Yang Y, Xu J (2019) Identifying structure-property relationships through SMILES syntax analysis with self-attention mechanism. J Chem Inf Model 59:914–923. https://doi.org/10.1021/acs.jcim.8b00803
Kiela D, Bottou L (2014) Learning image embeddings using convolutional neural networks for improved multi-modal semantics. In: Proceedings of the 2014 Conference on empirical methods in natural language processing (EMNLP). pp 36–45
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507. https://doi.org/10.1126/science.1127647
Heller S, McNaught A, Stein S et al (2013) InChI - the worldwide chemical structure identifier standard. J Cheminform 5:7. https://doi.org/10.1186/1758-2946-5-7
Winter R, Montanari F, Noé F, Clevert D-A (2019) Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem Sci 10:1692–1701. https://doi.org/10.1039/c8sc04175j
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Schwaller P et al (2019) Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5:1572–1583. https://doi.org/10.1021/acscentsci.9b00576
Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Tetko IV, Theis F, Karpov P, Kůrková V (eds) 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Part V, Workshop and Special Sessions. Springer
Samek W, Müller K-R (2019) Towards explainable artificial intelligence. In: Samek W, Montavon G, Vedaldi A, et al. (eds) Explainable AI: interpreting, explaining and visualizing deep learning. Springer International Publishing, Cham, pp 5–22
Montavon G, Binder A, Lapuschkin S et al (2019) Layer-wise relevance propagation: an overview. In: Samek W, Montavon G, Vedaldi A, et al. (eds) Explainable AI: interpreting, explaining and visualizing deep learning. Springer International Publishing, Cham, pp 193–209
Tetko IV, Villa AE, Livingstone DJ (1996) Neural network studies. 2. Variable selection. J Chem Inf Comput Sci 36:794–803. https://doi.org/10.1021/ci950204c
Gaulton A, Bellis LJ, Bento AP et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:D1100–D1107. https://doi.org/10.1093/nar/gkr777
Segler MHS, Kogej T, Tyrchan C, Waller MP (2017) Generating focussed molecule libraries for drug discovery with recurrent neural networks
Gupta A, Müller AT, Huisman BJH et al (2018) Generative recurrent networks for de novo drug design. Mol Inform 37:1700111
Rush A (2018) The annotated transformer. In: Proceedings of workshop for NLP open source software (NLP-OSS). pp 52–60
Abadi M, Barham P, Chen J, et al (2016) TensorFlow: a system for large-scale machine learning
Landrum G. RDKit: open-source cheminformatics. https://www.rdkit.org
Ramsundar B, Eastman P, Walters P, Pande V (2019) Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O’Reilly Media Inc, Sebastopol
Srivastava N, Hinton G, Krizhevsky A et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
Srivastava RK, Greff K, Schmidhuber J (2015) Highway Networks. Paper presented at the Deep Learning Workshop, International Conference on Machine Learning, Lille, France. arXiv:1505.00387
Tetko IV, Karpov P, Bruno E et al (2019) Augmentation is what you need! In: Tetko IV, Kůrková V, Karpov P, Theis F (eds) Artificial neural networks and machine learning–ICANN 2019: workshop and special sessions. 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings. Springer International Publishing, Cham, pp 831–835
Draper NR, Smith H (2014) Applied regression analysis. Wiley, New York
Tetko IV, Sushko Y, Novotarskyi S et al (2014) How accurately can we predict the melting points of drug-like compounds? J Chem Inf Model 54:3320–3329. https://doi.org/10.1021/ci5005288
Wu Z, Ramsundar B, Feinberg EN et al (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9:513–530
Brandmaier S, Sahlin U, Tetko IV, Öberg T (2012) PLS-optimal: a stepwise d-optimal design based on latent variables. J Chem Inf Model 52:975–983
Sushko I, Novotarskyi S, Körner R et al (2010) Applicability domains for classification problems: benchmarking of distance to models for ames mutagenicity set. J Chem Inf Model 50:2094–2111
Tetko IV, Tanchuk VY, Kasheva TN, Villa AEP (2001) Estimation of aqueous solubility of chemical compounds using e-state indices. J Chem Inf Comput Sci 41:1488–1493
Huuskonen JJ, Livingstone DJ, Tetko IV (2000) Neural network modeling for estimation of partition coefficient based on atom-type electrotopological state indices. J Chem Inf Comput Sci 40:947–955
Suzuki K, Nakajima H, Saito Y et al (2000) Janus kinase 3 (Jak3) is essential for common cytokine receptor γ chain (γc)-dependent signaling: comparative analysis of γc, Jak3, and γc and Jak3 double-deficient mice. Int Immunol 12:123–132
Sutherland JJ, Weaver DF (2004) Three-dimensional quantitative structure-activity and structure-selectivity relationships of dihydrofolate reductase inhibitors. J Comput Aided Mol Des 18:309–331
Vorberg S, Tetko IV (2014) Modeling the biodegradability of chemical compounds using the online chemical modeling environment (OCHEM). Mol Inform 33:73–85. https://doi.org/10.1002/minf.201300030
Novotarskyi S, Abdelaziz A, Sushko Y et al (2016) ToxCast EPA in vitro to in vivo challenge: insight into the rank-I model. Chem Res Toxicol 29:768–775. https://doi.org/10.1021/acs.chemrestox.5b00481
Rybacka A, Rudén C, Tetko IV, Andersson PL (2015) Identifying potential endocrine disruptors among industrial chemicals and their metabolites – development and evaluation of in silico tools. Chemosphere 139:372–378
Chang C-C, Lin C-J (2011) LIBSVM: A library for support vector machines. ACM Transact Int Syst Technol 2:27. https://doi.org/10.1145/1961189.1961199
Tetko IV (2002) Associative neural network. Neural Process Lett 16:187–199. https://doi.org/10.1023/A:1019903710291
Sosnin S, Karlov D, Tetko IV, Fedorov MV (2019) Comparative study of multitask toxicity modeling on a broad chemical space. J Chem Inf Model 59:1062–1072. https://doi.org/10.1021/acs.jcim.8b00685
Arras L, Montavon G, Müller K-R, Samek W (2017) Explaining recurrent neural network predictions in sentiment analysis. Proceedings of the 8th workshop on computational approaches to subjectivity, sentiment and social media analysis
Plošnik A, Vračko M, Dolenc MS (2016) Mutagenic and carcinogenic structural alerts and their mechanisms of action. Arh Hig Rada Toksikol 67:169–182. https://doi.org/10.1515/aiht-2016-67-2801
Xia Z, Karpov P, Popowicz G, Tetko IV (2019) Focused library generator: case of Mdmx inhibitors. J Comput Aided Mol Des. https://doi.org/10.1007/s10822-019-00242-8