Review: Prediction of in Vivo Fates of Proteins in the Era of Genomics and Proteomics

Journal of Structural Biology - Volume 134 - Pages 103-116 - 2001
Kenta Nakai
Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo, 108-8639, Japan
