In silico structural and functional characterization of hypothetical proteins from Monkeypox virus

Kajal Gupta¹
¹Department of Biochemistry, Daulat Ram College, University of Delhi, Delhi, India

Abstract

Monkeypox virus is a double-stranded DNA virus that causes the zoonotic disease monkeypox. The disease has spread from Central and West Africa to Europe and North America and has caused serious outbreaks in several countries around the world. The complete genome of Monkeypox virus Zaire-96-I-16 has been sequenced. This strain contains 191 protein-coding genes, including 30 hypothetical proteins whose structure and function are still unknown. It is therefore important to annotate these hypothetical proteins functionally and structurally in order to identify novel drug and vaccine targets. The purpose of this study was to characterize the 30 hypothetical proteins using bioinformatics tools, through determination of physicochemical properties, subcellular localization, function prediction, functional domain prediction, structure prediction, structure validation, structural analysis, and ligand binding site prediction. Of the 30 proteins analyzed, 3 (Q8V547, Q8V4S4, Q8V4Q4) could be confidently assigned a structure and function. Q8V547 is predicted to be an apoptosis regulator that promotes viral replication in the infected host cell. Q8V4S4 is predicted to be a nuclease responsible for viral evasion in the host. Q8V4Q4 is predicted to prevent host NF-kappa-B activation in response to pro-inflammatory cytokines such as TNF alpha or interleukin 1 beta. These three proteins are thus predicted to function as an apoptosis regulator, a nuclease, and an inhibitor of NF-kappa-B activation. The functional and structural annotation of these proteins can be used in docking studies with potential leads to discover novel drugs and vaccines against monkeypox. In vivo research can then be carried out to establish the full potential of the annotated proteins.
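As an illustration of the physicochemical characterization step mentioned above, the following is a minimal sketch of how two sequence-derived properties can be computed: amino acid composition and the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale. It is not the pipeline used in the study, which relied on web-based bioinformatics tools. The function names and the example sequence are hypothetical and chosen only for demonstration.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence."""
    seq = sequence.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def composition(sequence: str) -> dict:
    """Return the percentage of each residue present in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT one of the proteins from the study.
    toy_sequence = "MKTLLVAGAIRDE"
    print(f"GRAVY: {gravy(toy_sequence):.3f}")
    print(composition(toy_sequence))
```

A positive GRAVY value suggests an overall hydrophobic protein and a negative value a hydrophilic one. In practice, the sequences would be retrieved by accession (for example Q8V547) from a protein database before analysis.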
