A machine learning regression model for the screening and design of potential SARS-CoV-2 protease inhibitors
Abstract
The widespread infection caused by the 2019 novel coronavirus (SARS-CoV-2) has prompted a global search for antiviral agents. Drug discovery is the first step in developing commercially viable pharmaceutical products against novel diseases. To accelerate the screening and drug discovery workflow for potential SARS-CoV-2 protease inhibitors, this work presents a machine learning model that predicts the binding free energies of compounds to the SARS-CoV-2 main protease. The optimized multiple linear regression model, which was trained and tested on 226 natural compounds, demonstrates reliable prediction performance (test r² = 0.81, test RMSE = 0.43) while requiring only five topological descriptors. The externally validated model can help conserve available resources by limiting biological assays to compounds for which the model predicts favorable binding. Highly infectious diseases will remain a threat to human health and development, which is why computational tools that support a rapid response are important.
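The abstract does not give implementation details, so the following is only a minimal sketch of the general technique it names: a multiple linear regression on five descriptors, scored by r² and RMSE on a held-out test set. Everything specific here is an assumption made for illustration: the data are synthetic random numbers rather than the paper's compounds, the coefficients in `true_coef` are hypothetical, the 80/20 split is arbitrary, and the helper names (`fit_mlr`, `predict`, `r2_and_rmse`) are not taken from the source.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations.

    Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, X):
    """Apply the fitted linear model to each descriptor row."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))


# Synthetic stand-in data (NOT the paper's): 226 "compounds", 5 descriptors.
rng = random.Random(0)
true_coef = [-6.0, 0.8, -0.5, 0.3, 1.1, -0.7]  # hypothetical values
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(226)]
y = [v + rng.gauss(0, 0.4) for v in predict(true_coef, X)]

# Hold out the last ~20% of rows for testing (split ratio is an assumption;
# the rows are already in random order because they were randomly generated).
n_train = int(0.8 * len(X))
coef = fit_mlr(X[:n_train], y[:n_train])
r2_test, rmse_test = r2_and_rmse(y[n_train:], predict(coef, X[n_train:]))
print(f"r2_test={r2_test:.2f}  rmse_test={rmse_test:.2f}")
```

In a screening setting, `predict` would be applied to descriptor rows of untested compounds, and only those with the most favorable predicted values would be passed on to biological assays. With real data the descriptors would come from a cheminformatics toolkit rather than a random number generator, and a library regression routine would normally replace the hand-written solver.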
