Pocket Crafter: a 3D generative modeling based workflow for the rapid generation of hit molecules in drug discovery
Abstract
We present Pocket Crafter, a user-friendly molecular generative pipeline designed to facilitate hit finding in the drug discovery process. The workflow uses Pocket2Mol, a three-dimensional (3D) generative modeling method, for the de novo design of molecules in the spatial context of the targeted protein structure. The generated molecules are then filtered on physicochemical properties and drug-likeness, subjected to structure–activity relationship analysis, and clustered to yield the top virtual hit scaffolds. In our WDR5 case study, a targeted search of the Novartis archived library based on these virtual scaffolds yielded a focused set of 2029 compounds. We then profiled these compounds experimentally and identified a novel chemical scaffold series that showed activity in biochemical and biophysical assays. Pocket Crafter thus prototypes an effective end-to-end workflow, based on 3D generative chemistry, for exploring new chemical scaffolds, and it represents a promising approach to hit identification in early drug discovery.

Hit identification is a time-consuming and costly step in the drug discovery process. We developed Pocket Crafter, a molecular generative pipeline that can speed up this step. The workflow uses the 3D generative modeling method Pocket2Mol for the de novo design of molecules in the spatial context of the target, applies filters for physicochemical properties and drug-likeness to select the top virtual hits, and then uses structure–activity relationship analysis and clustering to output a focused set of hit compounds. This approach led to successful hit finding in our case study.
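The post-generation steps described above (property and drug-likeness filtering, then grouping into scaffolds) can be illustrated with a minimal Python sketch. It is not the Pocket Crafter implementation. The `Candidate` fields, the threshold values, and the toy descriptor values are all hypothetical. The descriptors are assumed to have been computed upstream with a cheminformatics toolkit, and grouping by identical scaffold string is a simplification standing in for the clustering step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A generated molecule with precomputed descriptors (hypothetical fields)."""
    smiles: str
    scaffold: str    # scaffold SMILES, assumed to be computed upstream
    qed: float       # drug-likeness estimate in [0, 1]
    sa_score: float  # synthetic accessibility; lower means easier to make
    logp: float      # predicted lipophilicity


def passes_filters(c, min_qed=0.5, max_sa=4.0, logp_range=(-1.0, 5.0)):
    """Apply illustrative thresholds; these are not the paper's values."""
    low, high = logp_range
    return c.qed >= min_qed and c.sa_score <= max_sa and low <= c.logp <= high


def top_scaffolds(candidates, n=2):
    """Group the molecules that pass the filters by scaffold, rank by group size."""
    groups = defaultdict(list)
    for c in candidates:
        if passes_filters(c):
            groups[c.scaffold].append(c)
    # Larger groups first; ties are broken alphabetically for determinism.
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return ranked[:n]


# Toy input with made-up descriptor values, for illustration only.
generated = [
    Candidate("NCc1ccccc1", "c1ccccc1", 0.62, 1.5, 1.1),
    Candidate("OCc1ccccc1", "c1ccccc1", 0.58, 1.4, 1.1),
    Candidate("CCc1ccncc1", "c1ccncc1", 0.55, 1.8, 1.6),
    Candidate("CCCCCCCCCCCCc1ccccc1", "c1ccccc1", 0.30, 1.6, 7.0),  # fails QED and logP
    Candidate("Nc1ccncc1", "c1ccncc1", 0.45, 1.7, 0.3),             # fails QED
]

top = top_scaffolds(generated)
for scaffold, members in top:
    print(scaffold, [m.smiles for m in members])
```

In a real setting, the ranked scaffolds would then serve as queries for a targeted library search, as in the WDR5 case study.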