Prediction of co-regulated genes in Bacillus subtilis on the basis of upstream elements conserved across three closely related species

Genome Biology - Volume 2 - Pages 1-12 - 2001
Goro Terai (1, 2), Toshihisa Takagi (1), Kenta Nakai (1)
(1) Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
(2) INTEC Web and Genome Informatics Corp., Tokyo, Japan

Abstract

Identification of co-regulated genes is essential for elucidating transcriptional regulatory networks and the functions of uncharacterized genes. Although co-regulated genes should share at least one common sequence element, it is generally difficult to identify them from the presence of this element alone, because the signal is easily obscured by noise. To overcome this problem, we used sequence conservation across three closely related species: Bacillus subtilis, B. halodurans and B. stearothermophilus. Even though these species have a limited number of clearly orthologous genes, we obtained 1,884 phylogenetically conserved elements from the upstream intergenic regions of 1,568 B. subtilis genes. Similarity between these elements was used to cluster the genes; no other a priori knowledge of genes or elements was used. We identified some genes known or suggested to be regulated by a common transcription factor, as well as genes regulated by a common attenuation effector. We confirmed that our method generates relatively few false positives in clusters with higher scores, and that general elements such as the -35/-10 boxes and the Shine-Dalgarno sequence are not major obstacles. Moreover, we identified some plausible additional members of known groups of co-regulated genes. Thus, our approach is promising for exploring potentially co-regulated genes.
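As an illustration of the general idea only (find elements conserved upstream of orthologous genes, then group genes by the similarity of those elements with no prior motif knowledge), the following is a minimal Python sketch, not the authors' implementation. It rests on simplifications that are assumptions, not details given in the abstract: conserved elements are approximated as exact k-mers shared by all three upstream sequences; element similarity is a Smith-Waterman local alignment score with arbitrary toy parameters (match=2, mismatch=-1, gap=-2); gene similarity is the best score over all element pairs; and genes are grouped by average-linkage (UPGMA-style) agglomerative clustering with a hypothetical stopping threshold. All function names, parameters and sequences are hypothetical.

```python
from itertools import combinations

# Hypothetical sketch: toy parameters and simplifications, not the paper's pipeline.


def conserved_kmers(upstreams, k):
    """Return k-mers present in every upstream sequence (exact matches only).

    Assumption: a crude stand-in for 'phylogenetically conserved elements'.
    """
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in upstreams]
    return set.intersection(*kmer_sets)


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def gene_similarity(elems_a, elems_b):
    """Best alignment score over all pairs of the two genes' elements."""
    return max(sw_score(x, y) for x in elems_a for y in elems_b)


def average_linkage(elements, threshold):
    """Average-linkage clustering of genes on element similarity.

    Repeatedly merges the two clusters with the highest average pairwise
    similarity and stops once that average falls below `threshold`.
    """
    names = list(elements)
    sim = {}
    for a, b in combinations(names, 2):
        sim[(a, b)] = sim[(b, a)] = gene_similarity(elements[a], elements[b])

    def avg(c1, c2):
        return sum(sim[(x, y)] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best_pair, best_val = None, float("-inf")
        for i, j in combinations(range(len(clusters)), 2):
            v = avg(clusters[i], clusters[j])
            if v > best_val:
                best_pair, best_val = (i, j), v
        if best_val < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Toy upstream sequences for one gene in three species (hypothetical data).
print(sorted(conserved_kmers(["AATTGACAT", "CCTTGACAG", "GTTGACATT"], 5)))
# → ['TGACA', 'TTGAC']

# Toy elements: AT-rich vs GC-rich, so the two groups share no bases.
elements = {
    "geneA": ["ATATTTAAATAT", "TTAATA"],
    "geneB": ["ATATTTAAATTT"],
    "geneC": ["GCGGCCGCGGCG"],
    "geneD": ["GCGGCCGCGGGG"],
}
print(average_linkage(elements, threshold=15))
# → [['geneA', 'geneB'], ['geneC', 'geneD']]
```

With this toy input the two AT-rich genes and the two GC-rich genes end up in separate clusters: within each group the elements align almost end to end (score of at least 20), while the cross-group similarity is 0 because no bases match, which is below the threshold.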
