Deep Learning-Based Modeling of Drug–Target Interaction Prediction Incorporating Binding Site Information of Proteins
Abstract
Chemogenomics, also known as proteochemometrics, covers a range of computational methods for predicting interactions between related drugs and targets on large-scale data. It is used in the early stages of drug discovery to predict the off-target effects of therapeutic candidates on proteins. This study aims to predict unknown ligand–target interactions in a computationally efficient manner, using one-dimensional SMILES strings as inputs for ligands and binding-site residues as inputs for proteins. We first formulate a deep learning convolutional neural network (CNN) model that takes one-dimensional SMILES for drugs and motif-rich binding-pocket subsequences of proteins as inputs. We then evaluate the proposed model, trained on these expert-based features, and compare it against shallow feature-based machine learning methods. The proposed method achieved better or comparable performance relative to the shallow methods on the MSE (mean squared error) and AUPR (area under the precision-recall curve) metrics. Additionally, we show that our deep learning model, DeepPS, is computationally more efficient than a deep learning model trained on full-length raw protein sequences. We conclude that integrating structural information of proteins into drug–target interaction models for large datasets is a promising research direction, offering greater interpretability, high throughput, and broad applicability.
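The abstract does not state how the SMILES strings and binding-pocket subsequences are converted into network inputs. A common choice for 1D CNNs, assumed here, is character-level integer (label) encoding padded to a fixed length, which an embedding layer and convolutional filters can then consume. This is a minimal sketch under that assumption, not the DeepPS implementation: the vocabularies `SMILES_CHARS` and `AMINO_ACIDS`, the helpers `build_vocab`, `encode`, and `encode_pair`, and the default maximum lengths are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabularies (assumption): a real pipeline would derive them
# from the training set. This character-level scheme splits multi-character
# atoms such as "Cl" or "Br" into single characters.
SMILES_CHARS = "#%()+-./0123456789=@ABCFGHIKLMNOPRSTVZ[\\]abcdegilnorstu"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWXY"  # 20 standard residues plus X (unknown)

PAD_ID = 0  # reserved id used to right-pad short sequences


def build_vocab(alphabet: str) -> Dict[str, int]:
    """Map each symbol to a positive integer id; 0 is kept for padding."""
    return {symbol: i + 1 for i, symbol in enumerate(alphabet)}


SMILES_VOCAB = build_vocab(SMILES_CHARS)
POCKET_VOCAB = build_vocab(AMINO_ACIDS)


def encode(seq: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate or right-pad `seq` to exactly `max_len` integer ids.

    Symbols missing from `vocab` raise KeyError instead of being silently
    dropped, so vocabulary gaps surface early.
    """
    ids = [vocab[symbol] for symbol in seq[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def encode_pair(smiles: str, pocket: str,
                max_smiles_len: int = 100,
                max_pocket_len: int = 85) -> Tuple[List[int], List[int]]:
    """Encode one drug-target pair; the default lengths are illustrative only."""
    return (encode(smiles, SMILES_VOCAB, max_smiles_len),
            encode(pocket, POCKET_VOCAB, max_pocket_len))
```

For example, `encode_pair("CC(=O)Oc1ccccc1C(=O)O", "KVLGEGAFG")` returns two integer lists of length 100 and 85. In a two-branch CNN (again an assumption about the architecture), each list would pass through its own embedding and Conv1D layers before the branches are concatenated for affinity regression. Encoding only the pocket subsequence rather than the full-length protein keeps the protein input short, which is consistent with the efficiency argument in the abstract.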