A machine learning regression model for the screening and design of potential SARS-CoV-2 protease inhibitors

Gabriela Ilona B. Janairo1, Derrick Ethelbhert C. Yu1, Jose Isagani B. Janairo2
1Chemistry Department, De La Salle University, Manila, Philippines
2Biology Department, De La Salle University, Manila, Philippines

Abstract

The widespread infection caused by the 2019 novel coronavirus (SARS-CoV-2) has prompted a global search for antiviral agents. Drug discovery is the first step in developing commercially viable pharmaceutical products against novel diseases. To accelerate the screening and drug discovery workflow for potential SARS-CoV-2 protease inhibitors, we present a machine learning model that predicts the binding free energies of compounds to the SARS-CoV-2 main protease. The optimized multiple linear regression model, which was trained and tested on 226 natural compounds, demonstrates reliable prediction performance (test r² = 0.81, test RMSE = 0.43) while requiring only five topological descriptors. The externally validated model can help conserve available resources by limiting biological assays to compounds for which the model predicts favorable binding. Because highly infectious diseases will continue to emerge and threaten human health and development, computational tools that enable a rapid response are essential.
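The workflow summarized above (fit a multiple linear regression on five descriptors, then report r² and RMSE on a held-out test set) can be illustrated with a minimal sketch. This is not the authors' code or data: the descriptor values, coefficients, noise level, random seed, and 80/20 split below are all assumptions chosen for illustration, and the synthetic "binding free energies" only mimic the shape of the problem (226 compounds, five predictors).

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, X):
    """Predicted response for each row of X."""
    return intercept + X @ coefs


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (assumption): 226 compounds, 5 descriptors.
rng = np.random.default_rng(0)
n_compounds, n_descriptors = 226, 5
X = rng.normal(size=(n_compounds, n_descriptors))
true_coefs = np.array([-0.8, 0.5, -0.3, 0.2, 0.4])  # hypothetical values
y = -7.0 + X @ true_coefs + rng.normal(scale=0.4, size=n_compounds)

# Random 80/20 train/test split (the split ratio is an assumption).
idx = rng.permutation(n_compounds)
n_train = int(0.8 * n_compounds)
train, test = idx[:n_train], idx[n_train:]

intercept, coefs = fit_mlr(X[train], y[train])
y_pred = predict(intercept, coefs, X[test])
r2_test = r2_score(y[test], y_pred)
rmse_test = rmse(y[test], y_pred)

print(f"test r2 = {r2_test:.2f}, test RMSE = {rmse_test:.2f}")
```

In a real screening setting, `X` would hold computed topological descriptors and `y` the reference binding free energies; compounds with the most favorable predicted values would then be prioritized for biological assays.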
