Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes

BMC Bioinformatics - Volume 3 - Pages 1-10 - 2002
Michele Caselle (1), Ferdinando Di Cunto (2), Paolo Provero (1)
(1) Dipartimento di Fisica Teorica, Università di Torino, and INFN, Sezione di Torino, Torino, Italy
(2) Dipartimento di Genetica, Biologia e Biochimica, Università di Torino, Torino, Italy

Abstract

Gene regulation in eukaryotes is mainly effected through transcription factors binding to rather short recognition motifs generally located upstream of the coding region. We present a novel computational method to identify regulatory elements in the upstream region of eukaryotic genes. The genes are grouped in sets sharing an overrepresented short motif in their upstream sequence. For each set, the average expression level from a microarray experiment is determined: if this level is significantly higher or lower than the average taken over the whole genome, then the overrepresented motif shared by the genes in the set is likely to play a role in their regulation. The method was tested by applying it to the genome of Saccharomyces cerevisiae, using the publicly available results of a DNA microarray experiment in which expression levels for virtually all the genes were measured during the diauxic shift from fermentation to respiration. Several known motifs were correctly identified, and a new candidate regulatory sequence was determined. We have described and successfully tested a simple computational method to identify upstream motifs relevant to gene regulation in eukaryotes by studying the statistical correlation between overrepresented upstream motifs and gene expression levels.
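The two steps described in the abstract (grouping genes by an overrepresented upstream motif, then comparing the set's mean expression with the genome-wide mean) can be illustrated with a minimal Python sketch. The details are assumptions made for illustration, not the authors' implementation: overrepresentation is judged here with a binomial tail probability against the motif's frequency across all upstream sequences, and significance of the set's mean expression is summarised as a z-score. The function names, the `p_cutoff` parameter, and the dictionary-based inputs are all hypothetical.

```python
import math
from statistics import mean, pstdev


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def binomial_tail(n, trials, p):
    """Return P(X >= n) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(n, trials + 1)
    )


def motif_gene_set(upstream, motif, p_cutoff=0.01):
    """Genes whose upstream sequence contains the motif more often than
    expected from its frequency over all upstream sequences.
    Assumption: a binomial model with one trial per start position."""
    k = len(motif)
    positions = {g: max(len(s) - k + 1, 0) for g, s in upstream.items()}
    counts = {g: count_overlapping(s, motif) for g, s in upstream.items()}
    total_positions = sum(positions.values())
    if total_positions == 0:
        return set()
    freq = sum(counts.values()) / total_positions  # background frequency
    return {
        g
        for g in upstream
        if counts[g] > 0
        and binomial_tail(counts[g], positions[g], freq) < p_cutoff
    }


def set_z_score(expression, gene_set):
    """z-score of the set's mean expression against the genome-wide mean.
    Assumption: the standard error is sigma / sqrt(set size)."""
    values = [expression[g] for g in gene_set if g in expression]
    mu = mean(expression.values())
    sigma = pstdev(expression.values())
    return (mean(values) - mu) / (sigma / math.sqrt(len(values)))
```

In this sketch, `upstream` maps gene names to upstream sequences and `expression` maps gene names to a single expression value (for example a log ratio at one time point). A motif whose gene set yields a z-score of large absolute value would be a candidate regulatory element; any real analysis would also need to account for the number of motifs tested.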
