Evaluation of thresholds for the detection of binding sites for regulatory proteins in Escherichia coliK12 DNA

Genome Biology - Tập 3 - Trang 1-16 - 2002
Esperanza Benítez-Bellón1, Gabriel Moreno-Hagelsieb1, Julio Collado-Vides1
1Program of Computational Genomics, CIFN, UNAM, A.P. 565-A, Cuernavaca, Mexico

Tóm tắt

Sites in DNA that bind regulatory proteins can be detected computationally in various ways. Pattern discovery methods analyze collections of genes suspected to be co-regulated on the evidence, for example, of clustering of transcriptome data. Pattern searching methods use sequences with known binding sites to find other genes regulated by a given protein. Such computational methods are important strategies in the discovery and elaboration of regulatory networks and can provide the experimental biologist with a precise prediction of a binding site or identify a gene as a member of a set of co-regulated genes (a regulon). As more variations on such methods are published, however, thorough evaluation is necessary, as performance may differ depending on the conditions of use. Detailed evaluation also helps to improve and understand the behavior of the different methods and computational strategies. We used a collection of 86 regulons from Escherichia coli as datasets to evaluate two methods for pattern discovery and pattern searching: dyad analysis/dyad sweeping using the program Dyad-analysis, and multiple alignment using the programs Consensus/Patser. Clearly defined statistical parameters are used to evaluate the two methods in different situations. We placed particular emphasis on minimizing the rate of false positives. As a general rule, sensors obtained from experimentally reported binding sites in DNA frequently locate true sites as the highest-scoring sequences within a given upstream region, especially using Consensus/Patser. Pattern discovery is still an unsolved problem, although in the cases where Dyad-analysis finds significant dyads (around 50%), these frequently correspond to true binding sites. With more robust methods, regulatory predictions could help identify the function of unknown genes.

Tài liệu tham khảo

Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE, Hieter P, Vogelstein B, Kinzler KW: Characterization of the yeast transcriptome. Cell. 1997, 88: 243-251. Hertz GZ, Hartzell GW, Stormo GD: Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci. 1990, 6: 81-92. Lawrence CE, Reilly AA: An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins. 1990, 7: 41-51. Waterman MS, Arratia R, Galas DJ: Pattern recognition in several sequences: consensus and alignment. Bull Math Biol. 1984, 46: 515-527. Wolfertstetter F, Frech K, Herrmann G, Werner T: Identification of functional elements in unaligned nucleic acid sequences by a novel tuple search algorithm. Comput Appl Biosci. 1996, 12: 71-80. van Helden J, Andre B, Collado-Vides J: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol. 1998, 281: 827-842. 10.1006/jmbi.1998.1947. Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol. 2000, 296: 1205-1214. 10.1006/jmbi.2000.3519. Brazma A, Jonassen I, Vilo J, Ukkonen E: Predicting gene regulatory elements in silico on a genomic scale. Genome Res. 1998, 8: 1202-1215. Crowley EM: A Bayesian method for finding regulatory segments in DNA. Biopolymers. 2001, 58: 165-174. 10.1002/1097-0282(200102)58:2<165::AID-BIP50>3.0.CO;2-O. Huerta AM, Salgado H, Thieffry D, Collado-Vides J: RegulonDB: a database on transcriptional regulation in Escherichia coli. Nucleic Acids Res. 1998, 26: 55-59. 10.1093/nar/26.1.55. Salgado H, Santos-Zavaleta A, Gama-Castro S, Millan-Zarate D, Diaz-Peredo E, Sanchez-Solano F, Perez-Rueda E, Bonavides-Martinez C, Collado-Vides J: RegulonDB (version 3.2): transcriptional regulation and operon in Escherichia coli K12. Nucleic Acids Res. 2001, 29: 72-74. 10.1093/nar/29.1.72. van Helden J, Rios AF, Collado-Vides J: Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 2000, 28: 1808-1818. 10.1093/nar/28.8.1808. Hertz GZ, Stormo GD: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999, 15: 563-577. 10.1093/bioinformatics/15.7.563. Gralla JD, Collado-Vides J: Organization and function of transcription regulatory elements. In Cellular and Molecular Biology: Escherichia coli and Salmonella. Edited by Neidhardt FC, Curtiss III R, Ingraham J, Lin ECC, Low KB, Magasanik B, Reznikoff W, Schaechter M, Umbarger HE, Riley M. Washington, DC: American Society for Microbiology,. Edited by: Neidhardt FC, Curtiss III R, Ingraham J, Lin ECC, Low KB, Magasanik B, Reznikoff W, Schaechter M, Umbarger HE, Riley M. 1996, Washington, DC: American Society for Microbiology,, 1232-1245. Fernandez De Henestrosa AR, Ogi T, Aoyagi S, Chafin D, Hayes JJ, Ohmori H, Woodgate R: Identification of additional genes belonging to the LexA regulon in Escherichia coli. Mol Microbiol. 2000, 35: 1560-1572. 10.1046/j.1365-2958.2000.01826.x. Regulatory sequence analysis tools. [http://embnet.cifn.unam.mx/~jvanheld/rsa-tools/] Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, et al: The complete genome sequence of Escherichia coli K-12. Science. 1997, 277: 1453-1474. 10.1126/science.277.5331.1453. Serres MH, Riley M: MultiFun, a multifunctional classification scheme for Escherichia coli K-12 gene products. Microb Comp Genomics. 2000, 5: 205-222.