The relationship of protein conservation and sequence length

David J Lipman1, Alexander Souvorov1, Eugene V Koonin1, Anna R Panchenko1, Tatiana A Tatusova1
1National Center for Biotechnology Information, National Institutes of Health, Bethesda, USA

Tóm tắt

In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.

Từ khóa


Tài liệu tham khảo

Knight RD, Freeland SJ, Landweber LF: A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2001, 2: RESEARCH0010-

Galperin MY, Tatusov RL, Koonin EV: Comparing microbila genomes: how the gene set determines the lifestyle. In: Organization of the Prokaryotic Genome. Edited by: Charlebois RL. 1999, Washington, DC: ASM Press

Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier G, Barbe V, Peyretaillade E, Brottier P, Wincker P, Delbac F, El Alaoui H, Peyret P, Saurin W, Gouy M, Weissenbach J, Vivares CP: Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature. 2001, 414: 450-453. 10.1038/35106579.

Singer GA, Hickey DA: Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol. 2000, 17: 1581-1588.

Wheelan SJ, Marchler-Bauer A, Bryant SH: Domain size distributions can predict domain boundaries. Bioinformatics. 2000, 16: 613-618. 10.1093/bioinformatics/16.7.613.

Madej T, Gibrat JF, Bryant SH: Threading a database of protein cores. Proteins. 1995, 23: 356-369.

Das S, Yu L, Gaitatzes C, Rogers R, Freeman J, Bienkowska J, Adams RM, Smith TF, Lindelien J: Biology's new Rosetta stone. Nature. 1997, 385: 29-30. 10.1038/385029a0.

Skovgaard M, Jensen LJ, Brunak S, Ussery D, Krogh A: On the total number of genes and their length distribution in complete microbial genomes. Trends Genet. 2001, 17: 425-428. 10.1016/S0168-9525(01)02372-1.

Lawrence JG, Ochman H: Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol. 1997, 44: 383-397.

Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561.

Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12: 962-968. 10.1101/gr.87702. Article published online before print in May 2002.

Hartl FU, Hayer-Hartl M: Molecular chaperones in the cytosol: from nascent chain to folded protein. Science. 2002, 295: 1852-1858. 10.1126/science.1068408.

Akashi H, Gojobori T: Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci U S A. 2002, 99: 3695-3700. 10.1073/pnas.062526999.

Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA: Selection for short introns in highly expressed genes. Nat Genet. 2002, 31: 415-418.

Petrov DA: Evolution of genome size: new approaches to an old problem. Trends Genet. 2001, 17: 23-28. 10.1016/S0168-9525(00)02157-0.

Halliday JA, Glickman BW: Mechanisms of spontaneous mutation in DNA repair-proficient Escherichia coli. Mutat Res. 1991, 250: 55-71. 10.1016/0027-5107(91)90162-H.

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.

Matsuo Y, Bryant SH: Identification of homologous core structures. Proteins. 1999, 35: 70-79. 10.1002/(SICI)1097-0134(19990401)35:1<70::AID-PROT7>3.3.CO;2-0.