Learning protein-ligand binding affinity with atomic environment vectors

Rocco Meli1, Andrew Anighoro2, Michael J. Bodkin2, Garrett M. Morris3, Philip C. Biggin1
1Department of Biochemistry, University of Oxford, Oxford, UK
2Evotec (UK) Ltd, Abingdon, UK
3Department of Statistics, University of Oxford, Oxford, UK

Tóm tắt

Abstract

Scoring functions for the prediction of protein-ligand binding affinity have seen renewed interest in recent years when novel machine learning and deep learning methods started to consistently outperform classical scoring functions. Here we explore the use of atomic environment vectors (AEVs) and feed-forward neural networks, the building blocks of several neural network potentials, for the prediction of protein-ligand binding affinity. The AEV-based scoring function, which we term AEScore, is shown to perform as well or better than other state-of-the-art scoring functions on binding affinity prediction, with an RMSE of 1.22 pKunits and a Pearson’s correlation coefficient of 0.83 for the CASF-2016 benchmark. However, AEScore does not perform as well in docking and virtual screening tasks, for which it has not been explicitly trained. Therefore, we show that the model can be combined with the classical scoring function AutoDock Vina in the context of$$\Delta$$Δ-learning, where corrections to the AutoDock Vina scoring function are learned instead of the protein-ligand binding affinity itself. Combined with AutoDock Vina,$$\Delta$$Δ-AEScore has an RMSE of 1.32 pKunits and a Pearson’s correlation coefficient of 0.80 on the CASF-2016 benchmark, while retaining the docking and screening power of the underlying classical scoring function.

Từ khóa


Tài liệu tham khảo

Aldeghi M, Heifetz A, Bodkin MJ, Knapp S, Biggin PC (2016) Accurate calculation of the absolute free energy of binding for drug molecules. Chem Sci 7(1):207–218. https://doi.org/10.1039/c5sc02678d

Aldeghi M, Bluck JP, Biggin PC (2018) Absolute alchemical free energy calculations for ligand binding: a beginner’s guide. In: Gore M, Jagtap UB (eds) Methods in molecular biology. Springer, New York, NY, pp 199–232

Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 3(11):935–949. https://doi.org/10.1038/nrd1549

Boyles F, Deane CM, Morris GM (2019) Learning from the ligand: using ligand-based features to improve binding affinity prediction. Bioinformatics 36(3):758–764. https://doi.org/10.1093/bioinformatics/btz665

Liu J, Wang R (2015) Classification of current scoring functions. J Chem Inf Model 55(3):475–482. https://doi.org/10.1021/ci500731a

Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking 1 1Edited by F. E. Cohen. J Mol Biol 267(3):727–748. https://doi.org/10.1006/jmbi.1996.0897

Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL (2004) Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J Med Chem 47(7):1750–1759. https://doi.org/10.1021/jm030644s

Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, Olson AJ (2009) AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 30(16):2785–2791. https://doi.org/10.1002/jcc.21256

Ravindranath PA, Forli S, Goodsell DS, Olson AJ, Sanner MF (2015) AutoDockFR: advances in protein-ligand docking with explicitly specified binding site flexibility. PLoS Comput Biol 11(12):1004586. https://doi.org/10.1371/journal.pcbi.1004586

Trott O, Olson AJ (2009) AutoDock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31(2):455–461. https://doi.org/10.1002/jcc.21334

Ballester PJ, Mitchell JBO (2010) A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics 26(9):1169–1175. https://doi.org/10.1093/bioinformatics/btq112

Ain QU, Aleksandrova A, Roessler FD, Ballester PJ (2015) Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. WIREs Comput Mol Sci 5(6):405–424. https://doi.org/10.1002/wcms.1225

Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz’min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2014) QSAR modeling: Where have you been? Where are you going to? J Med Chem 57(12):4977–5010. https://doi.org/10.1021/jm4004285

Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtalolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A (2020) QSAR without borders. Chem Soc Rev 49(11):3525–3564. https://doi.org/10.1039/d0cs00098a

Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ eds. Advances in neural information processing systems. Curran Associates, Inc., Red Hook, vol 25, pp 1097–1105.

Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386

Graves A, Mohamed A-R, Hinton G (2013) Speech recognition with deep recurrent neural networks. arXiv:1303.5778 [cs]

Hinton G, Deng L, Yu D, Dahl G, Mohamed A-R, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 29(6):82–97. https://doi.org/10.1109/msp.2012.2205597

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539

Ragoza M, Hochuli J, Idrobo E, Sunseri J, Koes DR (2017) Protein-ligand scoring with convolutional neural networks. J Chem Inf Model 57(4):942–957. https://doi.org/10.1021/acs.jcim.6b00740

McNutt AT, Francoeur P, Aggarwal R, Masuda T, Meli R, Ragoza M, Sunseri J, Koes DR (2021) Gnina 1.0: molecular docking with deep learning. J Cheminform 13(1):43. https://doi.org/10.1186/s13321-021-00522-2

Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P (2018) Development and evaluation of a deep learning model for protein-ligand binding affinity prediction. Bioinformatics 34(21):3666–3674. https://doi.org/10.1093/bioinformatics/bty374

Jiménez J, Škalič M, Martínez-Rosell G, De Fabritiis G (2018) KDEEP: protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J Chem Inf Model 58(2):287–296. https://doi.org/10.1021/acs.jcim.7b00650

Cang Z, Mu L, Wei G-W (2018) Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Comput Biol 14(1):1005929. https://doi.org/10.1371/journal.pcbi.1005929

Hassan-Harrirou H, Zhang C, Lemmin T (2020) RosENet: improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. J Chem Inf Model 60(6):2791–2802. https://doi.org/10.1021/acs.jcim.0c00075

Feinberg EN, Sur D, Wu Z, Husic BE, Mai H, Li Y, Sun S, Yang J, Ramsundar B, Pande VS (2018) PotentialNet for molecular property prediction. ACS Cent Sci 4(11):1520–1530. https://doi.org/10.1021/acscentsci.8b00507

Behler J, Parrinello M (2007) Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys Rev Lett 98(14):146401. https://doi.org/10.1103/physrevlett.98.146401

Smith JS, Isayev O, Roitberg AE (2017) ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem Sci 8(4):3192–3203. https://doi.org/10.1039/c6sc05720a

Bartók AP, Payne MC, Kondor R, Csányi G (2010) Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons. Phys Rev Lett 104(13):136403. https://doi.org/10.1103/physrevlett.104.136403

Bartók AP, Kondor R, Csányi G (2013) On representing chemical environments. Phys. Rev. B 87:184115. https://doi.org/10.1103/PhysRevB.87.184115

Bartók AP, De S, Poelking C, Bernstein N, Kermode JR, Csányi G, Ceriotti M (2017) Machine learning unifies the modeling of materials and molecules. Sci Adv 3(12):1701816. https://doi.org/10.1126/sciadv.1701816

Behler J (2011) Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations. Phys Chem Chem Phys 13(40):17930. https://doi.org/10.1039/c1cp21668f

Smith JS, Nebgen B, Lubbers N, Isayev O, Roitberg AE (2018) Less is more: Sampling chemical space with active learning. J. Chem. Phys. 148(24):241733. https://doi.org/10.1063/1.5023802

Gao X, Ramezanghorbani F, Isayev O, Smith JS, Roitberg AE (2020) TorchANI: a free and open source PyTorch-based deep learning implementation of the ANI neural network potentials. J Chem Inf Model 60(7):3408–3415. https://doi.org/10.1021/acs.jcim.0c00451

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: An imperative style high-performance deep learning library, In: Wallach H, Larochelle H, Beygelzimer A, d’ Alché-Buc F, Fox E, Garnett R, (eds) Advances in neural information processing systems. Curran Associates, Inc., vol 32, pp. 8024–8035. http://papers.neurips.cc/paper/9015-ytorch-an-imperative-style-high-performance-deep-learning-library.pdf

Smith JS, Nebgen BT, Zubatyuk R, Lubbers N, Devereux C, Barros K, Tretiak S, Isayev O, Roitberg AE (2019) Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat Commun 10(1):2903. https://doi.org/10.1038/s41467-019-10827-4

Li Y, Liu Z, Li J, Han L, Liu J, Zhao Z, Wang R (2014) Comparative assessment of scoring functions on an updated benchmark: 1. compilation of the test set. J Chem Inf Model 54(6):1700–1716. https://doi.org/10.1021/ci500080q

Liu Z, Su M, Han L, Liu J, Yang Q, Li Y, Wang R (2017) Forging the basis for developing protein-ligand interaction scoring functions. Acc Chem Res 50(2):302–309. https://doi.org/10.1021/acs.accounts.6b00491

Li Y, Han L, Liu Z, Wang R (2014) Comparative assessment of scoring functions on an updated benchmark: 2. evaluation methods and general results. J. Chem. Inf. Model. 54(6):1717–1736. https://doi.org/10.1021/ci500081m (PMID: 24708446)

Su M, Yang Q, Du Y, Feng G, Liu Z, Li Y, Wang R (2018) Comparative assessment of scoring functions: the CASF-2016 update. J Chem Inf Model 59(2):895–913. https://doi.org/10.1021/acs.jcim.8b00545

Su M, Feng G, Liu Z, Li Y, Wang R (2020) Tapping on the black box: how is the scoring power of a machine-learning scoring function dependent on the training set? J Chem Inf Model 60(3):1122–1136. https://doi.org/10.1021/acs.jcim.9b00714

O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open babel: an open chemical toolbox. JCheminform 3(1):33. https://doi.org/10.1186/1758-2946-3-33

Michaud-Agrawal N, Denning EJ, Woolf TB, Beckstein O (2011) MDAnalysis: a toolkit for the analysis of molecular dynamics simulations. J Comput Chem 32(10):2319–2327. https://doi.org/10.1002/jcc.21787

Gowers RJ, Linke M, Barnoud J, Reddy TJE, Melo MN, Seyler SL, Domański J, Dotson, D.L., Buchoux, S., Kenney, I.M., Beckstein, O (2016) MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. In: Sebastian Benthall, Scott Rostrup (eds.) Proceedings of the 15th Python in science conference, pp. 98–105 . https://doi.org/10.25080/Majora-629e541a-00e

O’Boyle NM, Morley C, Hutchison GR (2008) Pybel: A python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2(1):5. https://doi.org/10.1186/1752-153x-2-5

Banci L (2003) Molecular dynamics simulations of metalloproteins. Curr Opin Chem Biol 7(4):524. https://doi.org/10.1016/s1367-5931(03)00087-5

Çınaroğlu SS, Timuçin E (2019) Comparative assessment of seven docking programs on a nonredundant metalloprotein subset of the PDBbind refined. J Chem Inf Model 59(9):3846–3859. https://doi.org/10.1021/acs.jcim.9b00346

Ramakrishnan R, Dral PO, Rupp M, von Lilienfeld OA (2015) Big data meets quantum chemistry approximations: the $$\delta$$-machine learning approach. J Chem Theory Comput 11(5):2087–2096. https://doi.org/10.1021/acs.jctc.5b00099

Wang C, Zhang Y (2016) Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J Comput Chem 38(3):169–177. https://doi.org/10.1002/jcc.24667

Lu J, Hou X, Wang C, Zhang Y (2019) Incorporating explicit water molecules and ligand conformation stability in machine-learning scoring functions. J Chem Inf Model 59(11):4540–4549. https://doi.org/10.1021/acs.jcim.9b00645

Ericksen SS, Wu H, Zhang H, Michael LA, Newton MA, Hoffmann FM, Wildman SA (2017) Machine learning consensus scoring improves performance across targets in structure-based virtual screening. J Chem Inf Model 57(7):1579–1590. https://doi.org/10.1021/acs.jcim.7b00153

Francoeur PG, Masuda T, Sunseri J, Jia A, Iovanisci RB, Snyder I, Koes DR (2020) Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J Chem Inf Model 60(9):4200–4215

Van Rossum G, Drake FL (2009) Python 3 reference manual. CreateSpace, Scotts Valley, CA

O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open babel: an open chemical toolbox. J Cheminform 3(1):33. https://doi.org/10.1186/1758-2946-3-33

Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, del Río JF, Wiebe M, Peterson P, Gérard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C, Oliphant TE (2020) Array programming with NumPy. Nature 585(7825):357–362. https://doi.org/10.1038/s41586-020-2649-2

Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P, Vijaykumar A, Bardelli AP, Rothberg A, Hilboll A, Kloeckner A, Scopatz A, Lee A, Rokem A, Woods CN, Fulton C, Masson C, Häggström C, Fitzgerald C, Nicholson DA, Hagen DR, Pasechnik DV, Olivetti E, Martin E, Wieser E, Silva F, Lenders F, Wilhelm F, Young G, Price GA, Ingold G-L, Allen GE, Lee GR, Audren H, Probst I, Dietrich JP, Silterra J, Webber JT, Slavič J, Nothman J, Buchner J, Kulick J, Schönberger JL, de Miranda Cardoso J, Reimer J, Harrington J, Rodríguez JLC, Nunez-Iglesias J, Kuczynski J, Tritz K, Thoma M, Newville M, Kümmerer M, Bolingbroke M, Tartre M, Pak M, Smith NJ, Nowaczyk N, Shebanov N, Pavlyk O, Brodtkorb PA, Lee P, McGibbon RT, Feldbauer R, Lewis S, Tygier S, Sievert S, Vigna S, Peterson S, More S, Pudlik T, Oshima T, Pingel TJ, Robitaille TP, Spura T, Jones TR, Cera T, Leslie T, Zito T, Krauss T, Upadhyay U, Halchenko YO, Vázquez-Baeza Y, Contributors S (2020) SciPy 1.0: Fundamental algorithms for scientific computing in python. Nat Methods 17(3):261–272. https://doi.org/10.1038/s41592-019-0686-2

McKinney W (2010) Data Structures for Statistical Computing in Python. In: van der Walt S, Millman J (eds.) Proceedings of the 9th Python in science conference, pp. 56–61. https://doi.org/10.25080/Majora-92bf1922-00a

Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9(3):90–95. https://doi.org/10.1109/mcse.2007.55

Waskom M (2020) The seaborn development team: Mwaskom/seaborn. Zenodo. https://doi.org/10.5281/zenodo.592845

Varoquaux G, Buitinck L, Louppe G, Grisel O, Pedregosa F, Mueller A (2015) Scikit-learn. GetMobile: Mobile Comp and Comm 19(1):29–33. https://doi.org/10.1145/2786984.2786995

Krekel H, Oliveira B, Pfannschmidt R, Bruynooghe F, Laugher B, Bruhin F (2004) pytest 6.0. https://github.com/pytest-dev/pytest

Kingma DP, Ba J (2014) ADAM: a method for stochastic optimization. arXiv:1412.6980

Ross GA, Morris GM, Biggin PC (2013) One size does not fit all: the limits of structure-based models in drug discovery. J Chem Theory Comput 9(9):4266–4274. https://doi.org/10.1021/ct4004228

Pearlman DA, Charifson PS (2001) Are free energy calculations useful in practice? a comparison with rapid scoring functions for comparison with rapid scoring functions the p38 MAP kinase protein system$$\dagger$$. J Med Chem 44(21):3417–3423. https://doi.org/10.1021/jm0100279

Kwon Y, Shin W-H, Ko J, Lee J (2020) AK-Score: accurate protein-ligand binding affinity prediction using the ensemble of 3D-convolutional neural network. Int J Mol Sci. https://doi.org/10.26434/chemrxiv.12015045.v1

Nguyen DD, Wei G-W (2019) AGL-score: algebraic graph learning score for protein-ligand binding scoring, ranking, docking, and screening. J Chem Inf Model 59(7):3291–3304. https://doi.org/10.1021/acs.jcim.9b00334

Cheng T, Li X, Li Y, Liu Z, Wang R (2009) Comparative assessment of scoring functions on a diverse test set. J Chem Inf Model 49(4):1079–1093. https://doi.org/10.1021/ci9000053

Sieg J, Flachsenberg F, Rarey M (2019) In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J Chem Inf Model 59(3):947–961. https://doi.org/10.1021/acs.jcim.8b00712

Wallach I, Heifets A (2018) Most ligand-based classification benchmarks reward memorization rather than generalization. J Chem Inf Model 58(5):916–932. https://doi.org/10.1021/acs.jcim.7b00403

Chen L, Cruz A, Ramsey S, Dickson CJ, Duca JS, Hornak V, Koes DR, Kurtzman T (2019) Hidden bias in the DUD-e dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS One 14(8):0220113. https://doi.org/10.1371/journal.pone.0220113

Musil F, Grisafi A, Bartók AP, Ortner C, Csányi G, Ceriotti M (2021) Physics-inspired structural representations for molecules and materials . arXiv:2101.04673

McCorkindale W, Poelking C, Lee AA (2020) Investigating 3D Atomic Environments for Enhanced QSAR. arXiv:2010.12857

Hochuli J, Helbling A, Skaist T, Ragoza M, Koes DR (2018) Visualizing convolutional neural network protein-ligand scoring. J Mol Graph Model 84:96–108. https://doi.org/10.1016/j.jmgm.2018.06.005

Ragoza M, Turner L, Koes DR (2017) Ligand pose optimization with atomic grid-based convolutional neural networks . arXiv:1710.07400

Durrant JD, McCammon JA (2011) NNScore 2.0: a neural-network Receptor-Ligand scoring function. J Chem Inf Model 51(11):2897–2903. https://doi.org/10.1021/ci2003889

Zhu F, Zhang X, Allen JE, Jones D, Lightstone FC (2020) Binding affinity prediction by pairwise function based on neural network. J Chem Inf Model 60(6):2766–2772. https://doi.org/10.1021/acs.jcim.0c00026

Afifi K, Al-Sadek AF (2018) Improving classical scoring functions using random forest: the non-additivity of free energy terms’ contributions in binding. Chem Biol Drug Des 92(2):1429–1434. https://doi.org/10.1111/cbdd.13206

Li Y, Rezaei MA, Li C, Li X, Wu D (2019) DeepAtom: a Framework for protein-ligand binding affinity prediction . arXiv:1912.00318

PDBbind-CN Database. http://pdbbind-cn.org/

Koes DR, Baumgartner MP, Camacho CJ (2013) Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J Chem Inf Model 53(8):1893–1904. https://doi.org/10.1021/ci300604z

Scantlebury J, Brown N, Von Delft F, Deane CM (2020) Data set augmentation allows deep learning-based virtual screening to better generalize to unseen target classes and highlight important binding interactions. J Chem Inf Model 60(8):3722–3730. https://doi.org/10.1021/acs.jcim.0c00263

Genheden S, Ryde U (2015) The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin Drug Discov 10(5):449–461. https://doi.org/10.1517/17460441.2015.1032936

Cole DJ, Mones L, Csányi G (2020) A machine learning based intramolecular potential for a flexible organic molecule. Faraday Discuss. https://doi.org/10.1039/d0fd00028k

Lahey S-LJ, Rowley CN (2020) Simulating protein-ligand binding with neural network potentials. Chem Sci 11(9):2362–2368. https://doi.org/10.1039/c9sc06017k

Rufa DA, Bruce Macdonald HE, Fass J, Wieder M, Grinaway PB, Roitberg AE, Isayev O, Chodera JD (2020) Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning/molecular mechanics potentials. bioRxiv. https://doi.org/10.1101/2020.07.29.227959