The orientation of transcription factor binding site motifs in gene promoter regions: does it matter?

Springer Science and Business Media LLC - Volume 17 - Pages 1-21 - 2016
Monika Lis¹, Dirk Walther¹
¹Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany

Abstract

Gene expression is to a large degree regulated by the specific binding of protein transcription factors to cis-regulatory transcription factor binding sites in gene promoter regions. Despite the identification of hundreds of binding site sequence motifs, the question of whether motif orientation matters for the expression regulation of the respective downstream genes appears surprisingly underinvestigated. We pursued a statistical approach, probing 293 reported non-palindromic transcription factor binding site motifs and ten core promoter motifs in Arabidopsis thaliana for evidence of any relevance of motif orientation, based on mapping statistics and on effects on the co-regulation of gene expression of the respective downstream genes. Although positional intervals closer to the transcription start site (TSS) showed increased frequencies of motifs exhibiting orientation preference, a corresponding effect on gene expression regulation, as would be evidenced by increased co-expression of genes harboring the favored orientation in their upstream sequence, could not be established. Furthermore, we identified an intrinsic orientational asymmetry of sequence regions close to the TSS as the likely source of the identified motif orientation preferences. By contrast, motif presence irrespective of orientation was found to be associated with pronounced effects on gene expression co-regulation, validating the pursued approach. Inspecting motif pairs revealed statistically preferred orientational arrangements, but no consistent arrangement-dependent effect on gene expression regulation was evident.
Our results suggest that, for the motifs considered here, either no specific orientation renders them functional across all their instances, with orientational requirements instead depending on additional gene-locus-specific factors, or the binding orientation of transcription factors is generally not relevant and only the event of binding itself matters.
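
The mapping-statistics part of such an analysis can be illustrated with a minimal sketch. It assumes exact string matching of a motif and its reverse complement in upstream sequences given 5'→3' on the strand of the downstream gene, followed by an exact two-sided binomial test against equal orientation frequencies. The function names, the toy sequences, and the choice of test are illustrative assumptions and are not taken from the study itself.

```python
from math import comb

# Translation table for complementing a DNA string (uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(seq, motif):
    """Count exact, possibly overlapping, matches of motif in seq."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def orientation_counts(upstream_seqs, motif):
    """Count forward and reverse-complement hits across all sequences.

    Hypothetical helper: sequences are assumed to be oriented relative
    to the downstream gene, so 'forward' means the motif as written.
    """
    rc = reverse_complement(motif)
    if rc == motif:
        raise ValueError("palindromic motif: orientations are indistinguishable")
    forward = sum(count_occurrences(s, motif) for s in upstream_seqs)
    reverse = sum(count_occurrences(s, rc) for s in upstream_seqs)
    return forward, reverse


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are at most as likely
    as the observed count k out of n trials.
    """
    if n == 0:
        return 1.0
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in probs if q <= threshold))


if __name__ == "__main__":
    # Toy upstream sequences, invented purely for illustration.
    seqs = ["AATGACGTT", "CCTGACGAA", "GGCGTCATT"]
    fwd, rev = orientation_counts(seqs, "TGACG")
    p_value = binomial_two_sided_p(fwd, fwd + rev)
    print(f"forward={fwd} reverse={rev} p={p_value:.3f}")
```

With several hundred motifs tested in this way, the resulting p-values would need a multiple-testing correction before any orientation preference is called significant. As the abstract notes, a significant preference near the TSS may still reflect strand asymmetry of the surrounding sequence rather than a functional requirement.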
