Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms
Abstract
The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are the direct consequence of their structure, and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is spent on functions related to metabolic processes, but there are significant differences in the usage of domains for regulatory and extra-cellular processes both within and between superkingdoms. Our results support the hypothesis that the proteomes of the superkingdom Eukarya evolved via genome expansion mechanisms directed towards innovating new domain architectures for regulatory and extra/intracellular process functions, needed, for example, to maintain the integrity of multicellular structure or to interact with biotic and abiotic environmental factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of the microbial superkingdoms Archaea and Bacteria retained fewer domains and maintained simpler and smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. Finally, we identify a few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes and Guillardia theta. These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism, and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum was no different from that of the rest of Bacteria, failing to support claims that they represent a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns, suggesting an ancestral evolutionary link between these groups.
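The abstract's central measurement is the share of a proteome's domain repertoire devoted to each molecular-function category. It also relies on the outlier pattern of spending more domains on information than on metabolism. Both can be illustrated with the short sketch below. It is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that each proteome has already been reduced to a set of domain superfamily identifiers (for example, from HMM-based structural assignment), that a lookup table maps each superfamily to one coarse function category, and that superfamilies are counted once each (presence/absence). The category list, the helpers `functional_distribution` and `is_information_biased`, and the `fsf_*` identifiers are all hypothetical.

```python
from collections import Counter

# Hypothetical coarse categories mirroring the functions named in the abstract;
# superfamilies absent from the lookup table fall into "Other".
CATEGORIES = ("Metabolism", "Information", "Regulation",
              "Intra-cellular processes", "Extra-cellular processes", "Other")


def functional_distribution(proteome_fsfs, fsf_to_category):
    """Return the fraction of a proteome's distinct superfamilies in each category.

    proteome_fsfs: iterable of superfamily identifiers detected in one proteome.
    fsf_to_category: dict mapping superfamily identifier -> a name in CATEGORIES.
    Each superfamily is counted once (presence/absence), an assumption of this sketch.
    """
    fsfs = set(proteome_fsfs)
    counts = Counter(fsf_to_category.get(f, "Other") for f in fsfs)
    total = len(fsfs)
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}


def is_information_biased(distribution):
    """Flag a proteome that devotes a larger share of its superfamilies to
    Information (e.g. translation, transcription) than to Metabolism."""
    return distribution["Information"] > distribution["Metabolism"]


# Toy, made-up data: a "typical" proteome and a reduced, information-heavy one.
mapping = {"fsf_A": "Metabolism", "fsf_B": "Metabolism",
           "fsf_C": "Information", "fsf_D": "Regulation",
           "fsf_E": "Information"}
typical = functional_distribution(["fsf_A", "fsf_B", "fsf_C", "fsf_D"], mapping)
reduced = functional_distribution(["fsf_A", "fsf_C", "fsf_E"], mapping)
print(typical["Metabolism"], is_information_biased(typical))  # 0.5 False
print(is_information_biased(reduced))  # True
```

Counting each superfamily once emphasizes the breadth of the repertoire. Weighting by the number of domain occurrences would emphasize abundance instead, and that choice can change which categories appear to dominate a proteome.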