Deep Learning-Based Modeling of Drug–Target Interaction Prediction Incorporating Binding Site Information of Proteins

Sofia D’Souza1, K. V. Prema2, S. Balaji3, Ronak Shah1
1Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, India
2Department of Computer Science and Engineering, Manipal Academy of Higher Education, Bengaluru, India
3Department of Biotechnology, Manipal Academy of Higher Education, Manipal, India

Abstract

Chemogenomics, also known as proteochemometrics, covers a range of computational methods for predicting interactions between related drugs and targets on large-scale data. It is used in the early stages of drug discovery to predict off-target effects of therapeutic candidates on proteins. This study aims to predict unknown ligand–target interactions in a computationally efficient manner, using one-dimensional SMILES strings as inputs for ligands and binding site residues as inputs for proteins. We first formulate a deep learning CNN model that takes one-dimensional SMILES for drugs and motif-rich binding pocket subsequences of proteins as inputs. We then evaluate the proposed deep learning model trained on expert-based features and compare it against shallow feature-based machine learning methods. On the mean squared error (MSE) and area under the precision-recall curve (AUPR) metrics, the proposed method performed better than or similarly to the shallow methods. Additionally, we show that our deep learning model, DeepPS, is computationally more efficient than the deep learning model trained on full-length raw protein sequences. We conclude that integrating structural information of proteins is a beneficial research approach for modeling drug–target interaction prediction on large datasets, offering more interpretability, high throughput, and broad applicability.
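As a rough illustration of the input representation described above, the sketch below integer-encodes a SMILES string and a binding pocket subsequence into fixed-length vectors of the kind a one-dimensional CNN could consume. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `build_vocab` and `encode`, the maximum lengths, and the example pocket string are all hypothetical.

```python
from typing import Dict, List


def build_vocab(strings: List[str]) -> Dict[str, int]:
    """Assign each distinct character an integer label; 0 is reserved for padding."""
    chars = sorted({ch for s in strings for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(seq: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Label-encode a string, then truncate or zero-pad it to max_len.

    Characters missing from the vocabulary also map to 0 in this sketch.
    """
    ids = [vocab.get(ch, 0) for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Illustrative inputs only (assumptions, not data from the study).
smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
pocket = "GKGSFGKV"               # made-up residue string

smiles_vocab = build_vocab([smiles])
residue_vocab = build_vocab(["ACDEFGHIKLMNPQRSTVWY"])  # 20 standard amino acids

# max_len values are arbitrary choices for this example.
x_drug = encode(smiles, smiles_vocab, max_len=32)
x_pocket = encode(pocket, residue_vocab, max_len=16)
```

Encoding only the pocket residues keeps the protein input far shorter than a full-length sequence, which is consistent with the efficiency argument made in the abstract.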
