Exploration and augmentation of pharmacological space via adversarial auto-encoder model for facilitating kinase-centric drug development

Springer Science and Business Media LLC - Volume 13 - Pages 1-15 - 2021
Xinyu Bai1,2, Yuxin Yin1,2,3
1Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
2Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, People’s Republic of China
3Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, China

Abstract

Predicting compound–protein interactions (CPIs) is of great importance for drug discovery and repositioning, yet it remains challenging, mainly because CPI matrices are sparse, which leads to poor generalization performance. Hence, unlike typical CPI prediction models that focus on representation learning or model selection, we propose a deep neural network-based strategy, PCM-AAE, that re-explores and augments the pharmacological space of kinase inhibitors by introducing the adversarial auto-encoder (AAE) model to improve the generalization of the prediction model. To complete the data space, we constructed Ensemble of PCM-AAE (EPA), an ensemble model that quickly and accurately yields quantitative predictions of binding affinity between any human kinase and inhibitor. In rigorous internal validation, EPA showed excellent performance, consistently outperforming the model trained with the imbalanced set, especially for targets with relatively few training data points. The improved prediction accuracy of EPA on external datasets demonstrates its generalization ability, making it possible to handle previously unseen kinases and inhibitors. EPA showed promising potential when directly applied to virtual screening and off-target prediction, demonstrating its practicality in hit prediction. Our strategy is expected to facilitate kinase-centric drug development, as well as to solve more challenging prediction problems with insufficient data points.
