Exploration and augmentation of pharmacological space via adversarial auto-encoder model for facilitating kinase-centric drug development

Springer Science and Business Media LLC - Tập 13 - Trang 1-15 - 2021
Xinyu Bai1,2, Yuxin Yin1,2,3
1Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
2Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, People’s Republic of China
3Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, China

Tóm tắt

Predicting compound–protein interactions (CPIs) is of great importance for drug discovery and repositioning, yet still challenging mainly due to the sparse nature of CPI matrixes, resulting in poor generalization performance. Hence, unlike typical CPI prediction models focused on representation learning or model selection, we propose a deep neural network-based strategy, PCM-AAE, that re-explores and augments the pharmacological space of kinase inhibitors by introducing the adversarial auto-encoder model (AAE) to improve the generalization of the prediction model. To complete the data space, we constructed Ensemble of PCM-AAE (EPA), an ensemble model that quickly and accurately yields quantitative predictions of binding affinity between any human kinase and inhibitor. In rigorous internal validation, EPA showed excellent performance, consistently outperforming the model trained with the imbalanced set, especially for targets with relatively fewer training data points. Improved prediction accuracy of EPA for external datasets enhances its generalization ability, making it possible to gracefully handle previously unseen kinases and inhibitors. EPA showed promising potential when directly applied to virtual screening and off-target prediction, exhibiting its practicality in hit prediction. Our strategy is expected to facilitate kinase-centric drug development, as well as to solve more challenging prediction problems with insufficient data points.

Tài liệu tham khảo

Keiser M, Setola V, Irwin J et al (2009) Predicting new molecular targets for known drugs. Nature 462:175–181 Scannell JW, Bosley J (2016) When quality beats quantity: decision theory, drug discovery, and the reproducibility crisis. PLoS ONE 11:e0147215 Zhang G, Xing J, Wang Y et al (2018) Discovery of novel inhibitors of indoleamine 2,3-dioxygenase 1 through structure-based virtual screening. Front Pharmacol 9:277 Dong L, Shen S, Chen W et al (2019) Discovery of novel inhibitors targeting human O-GlcNAcase: docking-based virtual screening, biological evaluation, structural modification, and molecular dynamics simulation. J Chem Inf Model 59:4374–4382 Scior T, Bender A, Tresadern G et al (2012) Recognizing pitfalls in virtual screening: a critical review. J Chem Inf Model 52:867–881 LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444 Jumper J, Evans R, Pritzel A et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589 Genheden S, Thakkar A et al (2020) AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J Cheminform 12:1–9 Tong X, Liu X, Tan X et al (2021) Generative models for De Novo drug design. J Med Chem 64:14011–14027 Wen M, Zhang Z, Niu S et al (2017) Deep-learning-based drug–target interaction prediction. J Proteome Res 16:1401–1409 Lee I, Keum J, Nam H (2019) DeepConv-DTI: prediction of drug–target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol 15:e1007129 Liu G, Singha M, Pu L et al (2021) GraphDTI: a robust deep learning predictor of drug–target interactions from multiple heterogeneous data. J Cheminform 13:1–17 Öztürk H, Özgür A, Ozkirimli E (2018) DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 34:i821–i829 Nguyen T, Le H, Quinn TP et al (2021) GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics 37:1140–1147 Kramer C, Kalliokoski T, Gedeck P et al (2012) The experimental uncertainty of heterogeneous public K i data. J Med Chem 55:5165–5173 Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5:221–232 Torgo L, Ribeiro R (2009) Precision and recall for regression. In: Paper presented at international conference on discovery science, Porto, Portugal, 3–5 October 2009 Torgo L, Ribeiro RP, Pfahringer B et al (2013) Smote for regression. In: Paper presented at Portuguese conference on artificial intelligence, Azores, Portugal, 9–12 September 2013 Sundar V, Colwell L (2020) Using single protein/ligand binding models to predict active ligands for unseen proteins. bioRxiv https://doi.org/10.1101/2020.08.02.233155. Sixt L, Wild B, Landgraf T (2018) Rendergan: generating realistic labeled data. Front Robot AI 5:66 Lim SK, Loo Y, Tran N-T et al (2018) Doping: generative data augmentation for unsupervised anomaly detection with GAN. In: Paper presented at international conference on data mining (ICDM), Singapore, 17–20 November 2018 Zhu X, Liu Y, Li J et al (2018) Emotion classification with data augmentation using generative adversarial networks. In: Paper presented at Pacific-Asia conference on knowledge discovery and data mining, Melbourne, VIC, Australia, 3–6 June 2018 Christmann-Franck S, van Westen GJ, Papadatos G et al (2016) Unprecedently large-scale kinase inhibitor set enabling the accurate prediction of compound–kinase activities: a way toward selective promiscuity by design? J Chem Inf Model 56:1654–1675 Gaulton A, Bellis LJ, Bento AP et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:D1100–D1107 Drewry DH, Willson TM, Zuercher WJ (2014) Seeding collaborations to advance kinase science with the GSK published kinase inhibitor set (PKIS). Curr Top Med Chem 14:340–342 Metz JT, Johnson EF, Soni NB et al (2011) Navigating the kinome. Nat Chem Biol 7:200–202 Davis MI, Hunt JP, Herrgard S et al (2011) Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol 29:1046–1051 Jaeger S, Fulle S, Turk S (2018) Mol2vec: unsupervised machine learning approach with chemical intuition. J Chem Inf Model 58:27–35 Sterling T, Irwin JJ (2015) ZINC 15–ligand discovery for everyone. J Chem Inf Model 55:2324–2337 Asgari E, Mofrad MR (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 10:e0141287 Boutet E, Lieberherr D, Tognolli M et al (2007) Uniprotkb/swiss-prot. Plant bioinformatics. Springer, New York, pp 89–112 Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Paper presented at proceedings of the 3rd international conference on learning representations, San Diego, CA, USA, May 7–9, 2015 Goodfellow I, Pouget-Abadie J, Mirza M et al (2014) Generative adversarial nets. In: Paper presented at advances in neural information processing systems, Montreal, Quebec, Canada, 8–13 December 2014 Makhzani A, Shlens J, Jaitly N et al (2015) Adversarial autoencoders. arXiv preprint arXiv:1511.05644 Abadi M, Barham P, Chen J et al (2016) Tensorflow: a system for large-scale machine learning. In: Paper presented at 12th symposium on operating systems design and implementation, Savannah, GA, USA, 2–4 November 2016 Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemometr Intell Lab Syst 2:37–52 Linderman GC, Rachh M, Hoskins JG et al (2017) Efficient algorithms for t-distributed stochastic neighborhood embedding. arXiv preprint arXiv:1712.09005 Zhou H, Wang F, Tao P (2018) t-Distributed stochastic neighbor embedding method with the least information loss for macromolecular simulations. J Chem Theory Comput 14:5499–5510 Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830 Park Y, Marcotte EM (2012) Flaws in evaluation schemes for pair-input computational predictions. Nat Methods 9:1134–1136 Sorgenfrei FA, Fulle S, Merget B (2018) Kinome-wide profiling prediction of small molecules. ChemMedChem 13:495–499 Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Paper presented at proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, 13–17 August 2016 Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507 Karaman MW, Herrgard S, Treiber DK et al (2008) A quantitative analysis of kinase inhibitor selectivity. Nat Biotechnol 26:127–132 Li X, Li Z, Wu X et al (2019) Deep learning enhancing kinome-wide polypharmacology profiling: model construction and experiment validation. J Med Chem 63:8723–8737 Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49:6789–6801 Chaput L, Martinez-Sanz J, Saettel N et al (2016) Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance. J Cheminform 8:1–17 Che T, Li Y, Jacob AP et al (2017) Mode regularized generative adversarial networks. In: Paper presented at 5th international conference on learning representations, Toulon, France, 24–26 April 2017 Srivastava A, Valkov L, Russell C et al (2017) Veegan: reducing mode collapse in gans using implicit variational learning. In: Paper presented at proceedings of the 31st international conference on neural information processing systems, Long Beach, CA, USA, 4–9 December 2017 Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: Paper presented at international conference on machine learning, Sydney, NSW, Australia, 6–11 August 2017 Abusitta A, Wahab OA, Fung BC (2021) VirtualGAN: reducing mode collapse in generative adversarial networks using virtual mapping. In: Paper presented at 2021 international joint conference on neural networks (IJCNN), Shenzhen, China, July 18–22 2021 Du S, You S, Li X et al (2020) Agree to disagree: adaptive ensemble knowledge distillation in gradient space. In: Paper presented at advances in neural information processing systems, 6–12 December 2020