Pocket Crafter: a 3D generative modeling based workflow for the rapid generation of hit molecules in drug discovery

Lingling Shen1, Jian Fang1, Lulu Liu1, Fei Yang1, Jeremy L. Jenkins1, Peter S. Kutchukian1, He Wang1
1Novartis Biomedical Research, Cambridge, USA

Abstract

We present Pocket Crafter, a user-friendly molecular generative pipeline designed to facilitate hit finding in the drug discovery process. The workflow uses Pocket2Mol, a three-dimensional (3D) generative modeling method, for the de novo design of molecules that fit the spatial constraints of a target protein structure. The generated molecules then pass through filters for physicochemical properties and drug-likeness, followed by structure–activity relationship analysis and clustering, to yield the top virtual hit scaffolds. In our WDR5 case study, a targeted search of the Novartis archived library based on these virtual scaffolds yielded a focused set of 2029 compounds. We then profiled these compounds experimentally and identified a novel chemical scaffold series that was active in biochemical and biophysical assays. Pocket Crafter thus prototypes an effective end-to-end workflow based on 3D generative chemistry for exploring new chemical scaffolds, a promising approach to hit identification in early drug discovery.

Hit identification is a time-consuming and costly step in the drug discovery process. We developed Pocket Crafter, a molecular generative pipeline, to speed up this step. The workflow uses the 3D generative modeling method Pocket2Mol for the de novo design of molecules within the spatial context of the target, applies filters for physicochemical properties and drug-likeness to produce top virtual hits, and then uses structure–activity relationship analysis and clustering to output a focused set of hit compounds, which led to successful hit finding in our demonstration case.
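The post-generation stages described above (property and drug-likeness filtering, then grouping into scaffolds and ranking them) can be illustrated with a minimal sketch. This is not the Pocket Crafter implementation: the `Candidate` fields, the threshold values, the use of exact scaffold matching as a stand-in for clustering, and the ranking rule are all assumptions made for illustration, and the descriptor values are invented placeholders that would in practice be computed by a cheminformatics toolkit.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Candidate:
    """A generated molecule with precomputed descriptors (hypothetical values)."""
    smiles: str
    scaffold: str     # scaffold SMILES, assumed to be computed elsewhere
    mol_weight: float
    logp: float
    qed: float        # drug-likeness score in [0, 1]
    sa_score: float   # synthetic accessibility, 1 (easy) to 10 (hard)


def passes_filters(c, max_mw=500.0, max_logp=5.0, min_qed=0.5, max_sa=4.0):
    """Property and drug-likeness filter; thresholds are illustrative assumptions."""
    return (c.mol_weight <= max_mw and c.logp <= max_logp
            and c.qed >= min_qed and c.sa_score <= max_sa)


def top_scaffolds(candidates, n_top=2):
    """Filter candidates, group survivors by scaffold, and rank the scaffolds.

    Ranking is by member count, then mean QED (an assumed rule, not the paper's).
    Returns a list of (scaffold, members) tuples, best first.
    """
    groups = defaultdict(list)
    for c in candidates:
        if passes_filters(c):
            groups[c.scaffold].append(c)
    ranked = sorted(
        groups.items(),
        key=lambda kv: (len(kv[1]), mean(m.qed for m in kv[1])),
        reverse=True,
    )
    return ranked[:n_top]


# Invented demo data: two benzene-scaffold molecules pass, one pyridine-scaffold
# molecule passes, one fails on molecular weight, one fails on QED.
demo = [
    Candidate("Cc1ccccc1", "c1ccccc1", 300.0, 2.0, 0.70, 2.5),
    Candidate("CCc1ccccc1", "c1ccccc1", 320.0, 2.5, 0.60, 3.0),
    Candidate("Cc1ccncc1", "c1ccncc1", 280.0, 1.5, 0.80, 2.0),
    Candidate("CCCc1ccncc1", "c1ccncc1", 650.0, 3.0, 0.55, 3.0),
    Candidate("CC1CCCCC1", "C1CCCCC1", 250.0, 1.0, 0.30, 2.0),
]
result = top_scaffolds(demo, n_top=2)
for scaffold, members in result:
    print(scaffold, len(members))
```

In a real workflow the grouping step would more likely use a similarity-based clustering method, and the selected scaffolds would then drive a substructure or similarity search against a compound library, as in the WDR5 case study.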
