Multi-PLI: interpretable multi‐task deep learning model for unifying protein–ligand interaction datasets

Springer Science and Business Media LLC - Volume 13 - Pages 1-14 - 2021
Fan Hu¹, Jiaxin Jiang¹, Dongqi Wang¹, Muchun Zhu¹, Peng Yin¹
¹Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

Abstract

The assessment of protein–ligand interactions is critical at the early stage of drug discovery, and computational approaches that predict such interactions efficiently facilitate drug development. Recently, deep learning methods, including structure- and sequence-based models, have achieved impressive performance on several datasets. However, their application still suffers from two problems: limited generalizability due to insufficient data, especially for structure-based models, and heterogeneity across datasets, which use different label measurements and cover different proteins. Here, we present Multi-PLI, an interpretable multi-task model for evaluating protein–ligand interactions. By unifying different datasets, the model runs classification (binding or not) and regression (binding affinity) tasks concurrently. It outperforms traditional docking and machine learning methods on both binary classification and regression tasks, and it achieves results competitive with some structure-based deep learning methods, even when trained on a set of the same size. Furthermore, combined with the proposed occlusion algorithm, the model can identify the amino acids of a protein that are crucial for binding, thus providing a biological interpretation.
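The abstract does not spell out the training objective or the occlusion procedure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (i) that every sample in the unified dataset carries a binary binding label, an affinity label, or both, and that a missing label is simply masked out of a joint loss (binary cross-entropy for classification plus a weighted mean squared error for regression), and (ii) that occlusion means sliding a window of masked residues along the protein sequence and recording how much the predicted score drops. The names `masked_multitask_loss`, `occlusion_scores`, `alpha`, `window`, `mask_char` and the lysine-counting toy predictor are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence


def masked_multitask_loss(
    p_bind: Sequence[float],
    y_bind: Sequence[Optional[int]],
    aff_pred: Sequence[float],
    y_aff: Sequence[Optional[float]],
    alpha: float = 1.0,
    eps: float = 1e-7,
) -> float:
    """Hypothetical joint loss: BCE over samples that have a binding label plus
    alpha * MSE over samples that have an affinity label.
    A label of None means the sample's source dataset does not provide it."""
    bce_terms = []
    for p, y in zip(p_bind, y_bind):
        if y is None:
            continue  # no binary label for this sample: skip the classification term
        p = min(max(p, eps), 1.0 - eps)  # clip probabilities to avoid log(0)
        bce_terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    # Squared error only where an affinity label exists
    se_terms = [(pred - y) ** 2 for pred, y in zip(aff_pred, y_aff) if y is not None]
    cls_loss = sum(bce_terms) / len(bce_terms) if bce_terms else 0.0
    reg_loss = sum(se_terms) / len(se_terms) if se_terms else 0.0
    return cls_loss + alpha * reg_loss


def occlusion_scores(
    predict: Callable[[str], float], seq: str, window: int = 3, mask_char: str = "X"
) -> List[float]:
    """Hypothetical sliding-window occlusion: mask `window` consecutive residues,
    record the drop in the predicted score, and average the drop over every
    window that covers each residue. Larger values suggest more important residues."""
    base = predict(seq)
    totals = [0.0] * len(seq)
    counts = [0] * len(seq)
    for start in range(len(seq) - window + 1):
        occluded = seq[:start] + mask_char * window + seq[start + window:]
        drop = base - predict(occluded)
        for i in range(start, start + window):
            totals[i] += drop
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    def toy_predict(s: str) -> float:
        # Toy stand-in for a trained model: fraction of lysine (K) residues
        return s.count("K") / len(s)

    print(occlusion_scores(toy_predict, "AAKAA", window=1))
    print(masked_multitask_loss([0.9, 0.1, 0.3], [1, 0, None], [0.0, 0.0, 5.0], [None, None, 6.0]))
```

Masking missing labels lets samples from a classification-only dataset and an affinity-only dataset share one network and one loss; `alpha` (an assumed hyperparameter) balances the two tasks. In the toy run, occluding the single lysine produces the only drop in score, so that residue is ranked as most important.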
