Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

Genome Research - Volume 15, Issue 8 - Pages 1034-1050 - 2005
Adam Siepel1, Gill Bejerano, Jakob Skou Pedersen2, Angie S. Hinrichs, Minmei Hou, Kate R. Rosenbloom, Hiram Clawson, John Spieth, LaDeana W. Hillier, Stephen Richards, George M. Weinstock, Richard K. Wilson, Richard A. Gibbs, W. James Kent, Webb Miller, David Haussler
1Center for Biomolecular Science and Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA.
2Department of Molecular Medicine (MOMA), Department of Clinical Medicine, Health, Aarhus University

Abstract

We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%–8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%–53%), Caenorhabditis elegans (18%–37%), and Saccharomyces cerevisiae (47%–68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3′ UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
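The prediction step described above, segmenting an alignment into conserved and non-conserved regions with a two-state HMM, can be illustrated with a toy Viterbi decoder. This is a minimal sketch and not the phastCons implementation: it assumes the per-column log-likelihoods under each state have already been computed (in a phylo-HMM they would come from the phylogenetic models), and the function names and the transition probabilities `p_enter` and `p_leave` are hypothetical values chosen for illustration rather than parameters fitted by maximum likelihood.

```python
import math


def viterbi_two_state(loglik_cons, loglik_neut, p_enter=0.01, p_leave=0.1):
    """Most likely state path of a toy two-state HMM (True = conserved).

    loglik_cons[i] / loglik_neut[i]: log-likelihood of alignment column i
    under the conserved / non-conserved state (assumed precomputed).
    p_enter / p_leave: hypothetical probabilities of moving into / out of
    the conserved state between adjacent columns.
    """
    n = len(loglik_cons)
    if n == 0:
        return []
    # State 0 = non-conserved, state 1 = conserved.
    log_t = [
        [math.log(1 - p_enter), math.log(p_enter)],
        [math.log(p_leave), math.log(1 - p_leave)],
    ]
    emit = list(zip(loglik_neut, loglik_cons))
    # Start from the stationary distribution of the two-state chain.
    pi_cons = p_enter / (p_enter + p_leave)
    score = [math.log(1 - pi_cons) + emit[0][0], math.log(pi_cons) + emit[0][1]]
    back = []  # back[i - 1][s] = best predecessor of state s at column i
    for i in range(1, n):
        new_score, ptr = [0.0, 0.0], [0, 0]
        for s in (0, 1):
            cand = [score[r] + log_t[r][s] for r in (0, 1)]
            ptr[s] = 0 if cand[0] >= cand[1] else 1
            new_score[s] = cand[ptr[s]] + emit[i][s]
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [s == 1 for s in path]


def conserved_intervals(path):
    """Collapse a boolean state path into half-open (start, end) intervals."""
    intervals, start = [], None
    for i, is_cons in enumerate(path):
        if is_cons and start is None:
            start = i
        elif not is_cons and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(path)))
    return intervals
```

In this sketch the balance between `p_enter` and `p_leave` controls how much evidence is needed to open an element and how long elements tend to be, which is loosely the kind of quantity the calibration constraints mentioned in the abstract would govern.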
