DeepStack-DTIs: Predicting Drug–Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier

Yan Zhang1, Zhiwen Jiang2, Cheng Chen3, Qinqin Wei2, Hanming Gu2, Bin Yu4
1College of Mechanical and Electrical Engineering, Qingdao University of Science and Technology, Qingdao, China
2College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, China
3School of Computer Science and Technology, Shandong University, Qingdao, China
4Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
