Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms

Genes, Vol. 2, No. 4, pp. 869-911
Arshan Nasir¹, Aisha Naeem², Muhammad Jawad Khan², Horacio D. Lopez Nicora³, Gustavo Caetano‐Anollés¹
¹Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
²Mammalian NutriPhysioGenomics Laboratory, Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA
³Plant Pathology Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA

Abstract

The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are a direct consequence of their structure, and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is devoted to functions related to metabolic processes, but there are significant differences in the usage of domains for regulatory and extra-cellular processes both within and between superkingdoms. Our results support the hypothesis that the proteomes of superkingdom Eukarya evolved via genome expansion mechanisms directed towards innovating new domain architectures for regulatory and extra-/intracellular process functions, needed, for example, to maintain the integrity of multicellular structures or to interact with biotic and abiotic environmental factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of the microbial superkingdoms Archaea and Bacteria retained fewer domains and maintained simpler, smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. Finally, we identify a few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes, and Guillardia theta. These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism, and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum did not differ from that of the rest of Bacteria, failing to support claims that they represent a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns, suggesting an ancestral evolutionary link between these groups.
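The comparison summarized above relies on tallying, for each proteome, what fraction of its domain repertoire falls into each general functional category. The Python sketch below illustrates only that bookkeeping step and rests on several assumptions. The category list is modeled on the groupings named in this abstract and is not an exact annotation scheme. Each annotated domain is assumed to arrive with a single category label, which is a hypothetical input format. The rule `flag_information_heavy` is a hypothetical helper that marks proteomes devoting more domains to Information than to Metabolism, echoing the outlier pattern described above. The counts are made-up toy values, not data from the study, and this is not the authors' pipeline.

```python
from collections import Counter

# Assumed general functional categories (illustrative only; modeled on the
# groupings named in the abstract, not an exact annotation scheme).
CATEGORIES = (
    "Metabolism",
    "Information",
    "Regulation",
    "Intra-cellular processes",
    "Extra-cellular processes",
    "Other",
)


def functional_distribution(domain_categories):
    """Return the fraction of a proteome's domains in each category.

    `domain_categories` is an iterable with one category label per
    annotated domain (a hypothetical input format).
    """
    counts = Counter(domain_categories)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("proteome has no annotated domains")
    # Fractions sum to 1 because every label is one of CATEGORIES.
    return {cat: counts[cat] / total for cat in CATEGORIES}


def flag_information_heavy(distribution):
    """Hypothetical outlier rule: more domains on Information than Metabolism."""
    return distribution["Information"] > distribution["Metabolism"]


if __name__ == "__main__":
    # Made-up toy proteomes; NOT real data from the study.
    toy_proteomes = {
        "free_living_example": ["Metabolism"] * 50 + ["Information"] * 20
        + ["Regulation"] * 15 + ["Intra-cellular processes"] * 10 + ["Other"] * 5,
        "reduced_genome_example": ["Metabolism"] * 15 + ["Information"] * 30
        + ["Intra-cellular processes"] * 5,
    }
    for name, domains in toy_proteomes.items():
        dist = functional_distribution(domains)
        print(name, round(dist["Metabolism"], 2), round(dist["Information"], 2),
              flag_information_heavy(dist))
```

With these toy inputs, the first proteome spends half of its domains on Metabolism and is not flagged. The second spends 60% on Information and is flagged. Raising an error on unknown labels keeps the fractions summing to one instead of silently dropping domains.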
