‘Unknown’ proteins and ‘orphan’ enzymes: the missing half of the engineering parts list – and how to find it

Biochemical Journal - Volume 425, Issue 1 - Pages 1-11 - 2010
Andrew D. Hanson1, Anne Pribat1, Jeffrey C. Waller1, Valérie de Crécy‐Lagard2
1Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, U.S.A.
2Microbiology and Cell Science Department, University of Florida, Gainesville, FL 32611, U.S.A.

Abstract

Like other forms of engineering, metabolic engineering requires knowledge of the components (the ‘parts list’) of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell's parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of ‘unknown’ proteins and ‘orphan’ enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the ‘missing parts list’ problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life's machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.
