FRESCo: finding regions of excess synonymous constraint in diverse viruses
Abstract
The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding. We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures. FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.
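To make the idea of reduced synonymous variation concrete, the sketch below scores each codon column of an in-frame alignment by how many of the available synonymous codons are actually observed, then averages those scores over a sliding window. This is a deliberately crude counting proxy and not FRESCo's phylogenetic codon-model test: it ignores the tree, mutation biases, and statistical significance. The function names `synonymous_diversity` and `windowed_mean`, the normalization, and the toy alignment are illustrative assumptions that do not come from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Number of codons encoding each amino acid (1 for Met and Trp, up to 6 for Leu, Ser, Arg).
DEGENERACY = Counter(CODON_TABLE.values())


def synonymous_diversity(alignment):
    """Per-codon-column fraction of alternative synonymous codons observed.

    For each column, take the most common amino acid and compute
    (distinct codons used for it - 1) / (codons available for it - 1).
    Columns with no usable codons, or whose amino acid has a single codon,
    score None because they carry no synonymous information.
    """
    n_codons = min(len(seq) for seq in alignment) // 3
    scores = []
    for i in range(n_codons):
        codons = [seq[3 * i:3 * i + 3].upper() for seq in alignment]
        codons = [c for c in codons if c in CODON_TABLE]  # skip gaps and ambiguity codes
        if not codons:
            scores.append(None)
            continue
        major_aa = Counter(CODON_TABLE[c] for c in codons).most_common(1)[0][0]
        if DEGENERACY[major_aa] < 2:
            scores.append(None)
            continue
        used = {c for c in codons if CODON_TABLE[c] == major_aa}
        scores.append((len(used) - 1) / (DEGENERACY[major_aa] - 1))
    return scores


def windowed_mean(scores, window):
    """Mean score in each sliding window, ignoring uninformative (None) columns."""
    means = []
    for start in range(len(scores) - window + 1):
        values = [v for v in scores[start:start + window] if v is not None]
        means.append(sum(values) / len(values) if values else None)
    return means


# Toy alignment (hypothetical): Met, a variable Ala column, a fixed Ala column.
toy_alignment = ["ATGGCTGCA", "ATGGCCGCA", "ATGGCGGCA"]
column_scores = synonymous_diversity(toy_alignment)
window_scores = windowed_mean(column_scores, window=2)
```

In a deep alignment, windows whose mean falls well below that of the rest of the gene would be candidates for closer inspection. A model-based method such as the one described above is still needed to separate genuine constraint from low overall divergence or shared ancestry.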