Transformer-CNN: Swiss knife for QSAR modeling and interpretation

Pavel Karpov1, Guillaume Godin2, Igor V. Tetko1
1Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
2Firmenich International SA, Digital Lab, Geneva, Lausanne, Switzerland

Abstract

We present SMILES embeddings derived from the internal encoder state of a Transformer [1] model trained to canonicalize SMILES as a Seq2Seq problem. Applying a CharNN [2] architecture to these embeddings yields higher-quality, interpretable QSAR/QSPR models on diverse benchmark datasets covering both regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prediction is based on an internal consensus. Because both the augmentation and the transfer learning rely on the embeddings, the method gives good results even for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available at https://github.com/bigchem/transformer-cnn. The repository also contains a standalone program for QSAR prediction that calculates individual atom contributions, thereby interpreting the model's result. The OCHEM [3] environment (https://ochem.eu) hosts an online implementation of the proposed method.
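
The idea of a consensus over augmented SMILES at inference time can be illustrated with a minimal sketch. It is not the authors' implementation: the function `consensus_predict`, the stand-in `toy_model`, and the hand-written list of equivalent SMILES strings are hypothetical names introduced here for illustration, and a simple arithmetic mean is assumed as the consensus rule.

```python
from statistics import mean
from typing import Callable, Sequence


def consensus_predict(smiles_variants: Sequence[str],
                      predict_one: Callable[[str], float]) -> float:
    """Average a model's predictions over several SMILES of one molecule."""
    if not smiles_variants:
        raise ValueError("need at least one SMILES string")
    return mean(predict_one(s) for s in smiles_variants)


# Three valid SMILES strings for ethanol (same molecule, different atom order).
# In practice such variants would be generated by a cheminformatics toolkit.
ethanol_variants = ["CCO", "OCC", "C(O)C"]


def toy_model(smiles: str) -> float:
    # Hypothetical stand-in for a trained QSAR model: its output depends on
    # the string form, as the output of a sequence-based model may.
    return 1.0 + 0.1 * smiles.index("O")


if __name__ == "__main__":
    # Per-variant predictions differ; the consensus is their mean.
    print([toy_model(s) for s in ethanol_variants])
    print(consensus_predict(ethanol_variants, toy_model))
```

Averaging over several string forms of the same molecule reduces the dependence of the final value on any single SMILES representation, which is the sense in which the prediction is a consensus.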

References

Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. Paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. arXiv:1706.03762

Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. arXiv e-prints. arXiv:1509.01626

Sushko I, Novotarskyi S, Körner R et al (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25:533–554. https://doi.org/10.1007/s10822-011-9440-2

Mauri A, Consonni V, Pavan M, Todeschini R (2006) Dragon software: an easy approach to molecular descriptor calculations. Match 56:237–248

Baskin I, Varnek A (2008) Fragment descriptors in SAR/QSAR/QSPR studies, molecular similarity analysis and in virtual screening. Chemoinformatics approaches to virtual screening. Royal Society of Chemistry, Cambridge, pp 1–43

Eklund M, Norinder U, Boyer S, Carlsson L (2014) Choosing feature selection and learning algorithms in QSAR. J Chem Inf Model 54:837–843. https://doi.org/10.1021/ci400573c

Baskin II, Winkler D, Tetko IV (2016) A renaissance of neural networks in drug discovery. Expert Opin Drug Discov 11:785–795. https://doi.org/10.1080/17460441.2016.1201262

Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J, et al (2015) Convolutional networks on graphs for learning molecular fingerprints. arXiv e-prints. arXiv:1509.09292

Coley CW, Barzilay R, Green WH et al (2017) Convolutional embedding of attributed molecular graphs for physical property prediction. J Chem Inf Model 57:1757–1772. https://doi.org/10.1021/acs.jcim.6b00601

Gómez-Bombarelli R, Wei JN, Duvenaud D et al (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Sci 4:268–276. https://doi.org/10.1021/acscentsci.7b00572

Kimber TB, Engelke S, Tetko IV, et al (2018) Synergy effect between convolutional neural networks and the multiplicity of smiles for improvement of molecular prediction. arXiv e-prints. arXiv:1812.04439

Gilmer J, Schoenholz SS, Riley PF, et al (2017) Neural message passing for quantum chemistry. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70. arXiv:1704.01212

Shang C, Liu Q, Chen K-S, et al (2018) Edge attention-based multi-relational graph convolutional networks. arXiv e-prints. arXiv:1802.04944

Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36. https://doi.org/10.1021/ci00057a005

O’Boyle NM, Banck M, James CA et al (2011) Open babel: an open chemical toolbox. J Cheminform 3:33. https://doi.org/10.1186/1758-2946-3-33

Vidal D, Thormann M, Pons M (2005) LINGO, an efficient holographic text based method to calculate biophysical properties and intermolecular similarities. J Chem Inf Model 45:386–393. https://doi.org/10.1021/ci0496797

Zhang X, LeCun Y (2015) Text understanding from scratch. arXiv e-prints. arXiv:1502.01710

Goh GB, Hodas NO, Siegel C, Vishnu A (2017) SMILES2Vec: an interpretable general-purpose deep neural network for predicting chemical properties. arXiv e-prints. arXiv:1712.02034

Jastrzębski S, Leśniak D, Czarnecki WM (2016) Learning to SMILE(S). arXiv e-prints. arXiv:1602.06289

Goh GB, Siegel C, Vishnu A, Hodas NO (2017) Using rule-based labels for weak supervised learning: a chemnet for transferable chemical property prediction. arXiv e-prints. arXiv:1712.02734

Zheng S, Yan X, Yang Y, Xu J (2019) Identifying structure-property relationships through SMILES syntax analysis with self-attention mechanism. J Chem Inf Model 59:914–923. https://doi.org/10.1021/acs.jcim.8b00803

Tetko IV, Karpov P, Bruno E, Kimber TB, Godin G (2019) Augmentation is what you need! In: Tetko IV, Karpov P, Kůrková V (eds) 28th International Conference on Artificial Neural Networks, Munich, Germany, 2019 Sep 17, Proceedings, Part V, Workshop and Special Sessions. Springer, Cham, pp 831–835

Kiela D, Bottou L (2014) Learning image embeddings using convolutional neural networks for improved multi-modal semantics. In: Proceedings of the 2014 Conference on empirical methods in natural language processing (EMNLP). pp 36–45

Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. EMNLP

Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507. https://doi.org/10.1126/science.1127647

Heller S, McNaught A, Stein S et al (2013) InChI - the worldwide chemical structure identifier standard. J Cheminform 5:7. https://doi.org/10.1186/1758-2946-5-7

Winter R, Montanari F, Noé F, Clevert D-A (2019) Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem Sci 10:1692–1701. https://doi.org/10.1039/c8sc04175j

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

Schwaller P et al (2019) Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5:1572–1583. https://doi.org/10.1021/acscentsci.9b00576

Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Tetko IV, Theis F, Karpov P, Kůrková V (eds) 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Part V, Workshop and Special Sessions. Springer

Samek W, Müller K-R (2019) Towards explainable artificial intelligence. In: Samek W, Montavon G, Vedaldi A, et al. (eds) Explainable AI: interpreting, explaining and visualizing deep learning. Springer International Publishing, Cham, pp 5–22

Montavon G, Binder A, Lapuschkin S et al (2019) Layer-wise relevance propagation: an overview. In: Samek W, Montavon G, Vedaldi A, et al. (eds) Explainable AI: interpreting, explaining and visualizing deep learning. Springer International Publishing, Cham, pp 193–209

Tetko IV, Villa AE, Livingstone DJ (1996) Neural network studies. 2. Variable selection. J Chem Inf Comput Sci 36:794–803. https://doi.org/10.1021/ci950204c

Gaulton A, Bellis LJ, Bento AP et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:D1100–D1107. https://doi.org/10.1093/nar/gkr777

Segler MHS, Kogej T, Tyrchan C, Waller MP (2017) Generating focussed molecule libraries for drug discovery with recurrent neural networks

Gupta A, Müller AT, Huisman BJH et al (2018) Generative recurrent networks for de novo drug design. Mol Inform 37:1700111

Rush A (2018) The annotated transformer. In: Proceedings of workshop for NLP open source software (NLP-OSS). pp 52–60

Abadi M, Barham P, Chen J, et al (2016) TensorFlow: a system for large-scale machine learning

Landrum G. RDKit: open-source cheminformatics. https://www.rdkit.org

Ramsundar B, Eastman P, Walters P, Pande V (2019) Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O’Reilly Media Inc, Sebastopol

Srivastava N, Hinton G, Krizhevsky A et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958

Srivastava RK, Greff K, Schmidhuber J (2015) Highway Networks. Paper presented at the Deep Learning Workshop, International Conference on Machine Learning, Lille, France. arXiv:1505.00387

Tetko IV, Karpov P, Bruno E, et al (2019) Augmentation Is What You Need!: 28th International Conference on artificial neural networks, Munich, Germany, September 17–19, 2019, Proceedings. In: Tetko IV, Kůrková V, Karpov P, Theis F (eds) Artificial neural networks and machine learning–ICANN 2019: workshop and special sessions. Springer International Publishing, Cham, pp 831–835

Draper NR, Smith H (2014) Applied regression analysis. Wiley, New York

Tetko IV, Sushko Y, Novotarskyi S et al (2014) How accurately can we predict the melting points of drug-like compounds? J Chem Inf Model 54:3320–3329. https://doi.org/10.1021/ci5005288

Wu Z, Ramsundar B, Feinberg EN et al (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9:513–530

Brandmaier S, Sahlin U, Tetko IV, Öberg T (2012) PLS-optimal: a stepwise d-optimal design based on latent variables. J Chem Inf Model 52:975–983

Sushko I, Novotarskyi S, Körner R et al (2010) Applicability domains for classification problems: benchmarking of distance to models for ames mutagenicity set. J Chem Inf Model 50:2094–2111

Tetko IV, Tanchuk VY, Kasheva TN, Villa AEP (2001) Estimation of aqueous solubility of chemical compounds using e-state indices. J Chem Inf Comput Sci 41:1488–1493

Huuskonen JJ, Livingstone DJ, Tetko IV (2000) Neural network modeling for estimation of partition coefficient based on atom-type electrotopological state indices. J Chem Inf Comput Sci 40:947–955

Suzuki K, Nakajima H, Saito Y et al (2000) Janus kinase 3 (Jak3) is essential for common cytokine receptor γ chain (γc)-dependent signaling: comparative analysis of γc, Jak3, and γc and Jak3 double-deficient mice. Int Immunol 12:123–132

Sutherland JJ, Weaver DF (2004) Three-dimensional quantitative structure-activity and structure-selectivity relationships of dihydrofolate reductase inhibitors. J Comput Aided Mol Des 18:309–331

Vorberg S, Tetko IV (2014) Modeling the biodegradability of chemical compounds using the online chemical modeling environment (OCHEM). Mol Inform 33:73–85. https://doi.org/10.1002/minf.201300030

Novotarskyi S, Abdelaziz A, Sushko Y et al (2016) ToxCast EPA in vitro to in vivo challenge: insight into the rank-I model. Chem Res Toxicol 29:768–775. https://doi.org/10.1021/acs.chemrestox.5b00481

Rybacka A, Rudén C, Tetko IV, Andersson PL (2015) Identifying potential endocrine disruptors among industrial chemicals and their metabolites – development and evaluation of in silico tools. Chemosphere 139:372–378

Xia Z, Karpov P, Popowicz G, Tetko IV (2019) Focused library generator: case of Mdmx inhibitors. J Comput Aided Mol Des 1:1

Chang C-C, Lin C-J (2011) LIBSVM: A library for support vector machines. ACM Transact Int Syst Technol 2:27. https://doi.org/10.1145/1961189.1961199

Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324

Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. arXiv [cs.LG]

Tetko IV (2002) Associative neural network. Neural Process Lett 16:187–199. https://doi.org/10.1023/A:1019903710291

Sosnin S, Karlov D, Tetko IV, Fedorov MV (2019) Comparative study of multitask toxicity modeling on a broad chemical space. J Chem Inf Model 59:1062–1072. https://doi.org/10.1021/acs.jcim.8b00685

Arras L, Montavon G, Müller K-R, Samek W (2017) Explaining recurrent neural network predictions in sentiment analysis. Proceedings of the 8th workshop on computational approaches to subjectivity, sentiment and social media analysis

Plošnik A, Vračko M, Dolenc MS (2016) Mutagenic and carcinogenic structural alerts and their mechanisms of action. Arh Hig Rada Toksikol 67:169–182. https://doi.org/10.1515/aiht-2016-67-2801

Xia Z, Karpov P, Popowicz G, Tetko IV (2019) Focused library generator: case of Mdmx inhibitors. J Comput Aided Mol Des. https://doi.org/10.1007/s10822-019-00242-8

Huuskonen J (2000) Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. J Chem Inf Comput Sci 40:773–777. https://doi.org/10.1021/ci9901338