Locating proteins in the cell using TargetP, SignalP and related tools

Nature Protocols - Volume 2, Issue 4 - Pages 953-971 - 2007
Olof Emanuelsson1, Søren Brunak2, Gunnar von Heijne3, Henrik Nielsen2
1Stockholm Bioinformatics Center, Albanova, Stockholm University, Stockholm, Sweden
2Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, Lyngby, Denmark
3Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm University, Stockholm, Sweden

References

Burns, N. et al. Large-scale analysis of gene expression, protein localization, and gene disruption in Saccharomyces cerevisiae. Genes Dev. 8, 1087–1105 (1994).

Chalfie, M., Tu, Y., Euskirchen, G., Ward, W.W. & Prasher, D.C. Green fluorescent protein as a marker for gene expression. Science 263, 802–805 (1994).

Sawin, K.E. & Nurse, P. Identification of fission yeast nuclear markers using random polypeptide fusions with green fluorescent protein. Proc. Natl. Acad. Sci. USA 93, 15146–15151 (1996).

Kumar, A. et al. Subcellular localization of the yeast proteome. Genes Dev. 16, 707–719 (2002).

Simpson, J.C., Wellenreuther, R., Poustka, A., Pepperkok, R. & Wiemann, S. Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Rep. 1, 287–292 (2000).

Shevchenko, A. et al. Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. USA 93, 14440–14445 (1996).

Peltier, J.-B. et al. Proteomics of the chloroplast: systematic identification and targeting analysis of lumenal and peripheral thylakoid proteins. Plant Cell 12, 319–341 (2000).

Yates, J.R., Gilchrist, A., Howell, K.E. & Bergeron, J.J. Proteomics of organelles and large cellular structures. Nat. Rev. Mol. Cell Biol. 6, 702–714 (2005).

Andersen, J.S. et al. Directed proteomic analysis of the human nucleolus. Curr. Biol. 12, 1–11 (2002).

Andersen, J.S. et al. Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426, 570–574 (2003).

Foster, L.J. et al. A mammalian organelle map by protein correlation profiling. Cell 125, 187–199 (2006).

Andersen, J.S. et al. Nucleolar proteome dynamics. Nature 433, 77–83 (2005).

Agaton, C. et al. Affinity proteomics for systematic protein profiling of chromosome 21 gene products in human tissues. Mol. Cell. Proteomics 2, 405–414 (2003).

Uhlen, M. et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteomics 4, 1920–1932 (2005).

Hinsby, A.M. et al. A wiring of the human nucleolus. Mol. Cell 22, 285–295 (2006).

von Heijne, G. The signal peptide. J. Membr. Biol. 115, 195–201 (1990).

Pugsley, A.P., Francetic, O., Possot, O.M., Sauvonnet, N. & Hardie, K.R. Recent progress and future directions in studies of the main terminal branch of the general secretory pathway in Gram-negative bacteria—a review. Gene 192, 13–19 (1997).

van Vliet, C., Thomas, E.C., Merino-Trigo, A., Teasdale, R.D. & Gleeson, P.A. Intracellular sorting and transport of proteins. Prog. Biophys. Mol. Biol. 83, 1–45 (2003).

Bendtsen, J.D., Jensen, L.J., Blom, N., von Heijne, G. & Brunak, S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 17, 349–356 (2004).

Binnewies, T.T. et al. Genome update: protein secretion systems in 225 bacterial genomes. Microbiology 151, 1013–1016 (2005).

Henderson, I.R., Navarro-Garcia, F., Desvaux, M., Fernandez, R.C. & Ala'Aldeen, D. Type V protein secretion pathway: the autotransporter story. Microbiol. Mol. Biol. Rev. 68, 692–744 (2004).

Ghosh, P. Process of protein transport by the type III secretion system. Microbiol. Mol. Biol. Rev. 68, 771–795 (2004).

Bendtsen, J.D., Kiemer, L., Fausbøll, A. & Brunak, S. Non-classical protein secretion in bacteria. BMC Microbiol. 5, 58 (2005).

Schatz, G. & Dobberstein, B. Common principles of protein translocation across membranes. Science 271, 1519–1526 (1996).

von Heijne, G., Steppuhn, J. & Herrmann, R.G. Domain structure of mitochondrial and chloroplast targeting peptides. Eur. J. Biochem. 180, 535–545 (1989).

Emanuelsson, O., Nielsen, H. & von Heijne, G. ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8, 978–984 (1999).

Bruce, B.D. Chloroplast transit peptides: structure, function and evolution. Trends Cell Biol. 10, 440–447 (2000).

Emanuelsson, O., von Heijne, G. & Schneider, G. Analysis and prediction of mitochondrial targeting peptides. Methods Cell Biol. 65, 175–187 (2001).

Schneider, G. et al. Feature-extraction from endopeptidase cleavage sites in mitochondrial targeting peptides. Proteins 30, 49–60 (1998).

Kalousek, F., Hendrick, J.P. & Rosenberg, L.E. Two mitochondrial matrix proteases act sequentially in the processing of mammalian matrix enzymes. Proc. Natl. Acad. Sci. USA 85, 7536–7540 (1988).

Isaya, G. & Kalousek, F. Mitochondrial intermediate peptidase. in Signal Peptidases (ed. von Heijne, G.) 87–103 (R.G. Landes Company, Austin, 1994).

Abe, Y. et al. Structural basis of presequence recognition by the mitochondrial protein import receptor Tom20. Cell 100, 551–560 (2000).

Taylor, A.B. et al. Crystal structures of mitochondrial processing peptidase reveal the mode for specific cleavage of import signal sequences. Structure 9, 615–625 (2001).

Bonen, L. & Doolittle, W.F. On the prokaryotic nature of red algal chloroplasts. Proc. Natl. Acad. Sci. USA 72, 2310–2314 (1975).

Moreira, D., Guyader, H.L. & Philippe, H. The origin of red algae and the evolution of chloroplasts. Nature 405, 69–72 (2000).

Robinson, C., Hynds, P.J., Robinson, D. & Mant, A. Multiple pathways for the targeting of thylakoid proteins in chloroplasts. Plant Mol. Biol. 38, 209–221 (1998).

Shackleton, J.B. & Robinson, C. Transport of proteins into chloroplasts. The thylakoidal processing peptidase is a signal-type peptidase with stringent substrate requirements at the −3 and −1 positions. J. Biol. Chem. 266, 12152–12156 (1991).

Robinson, C. & Bolhuis, A. Protein targeting by the twin-arginine translocation pathway. Nat. Rev. Mol. Cell Biol. 2, 350–356 (2001).

Chabregas, S.M. et al. Dual targeting properties of the N-terminal signal sequence of Arabidopsis thaliana THI1 protein to mitochondria and chloroplasts. Plant Mol. Biol. 46, 639–650 (2001).

Small, I., Wintz, H., Akashi, K. & Mireau, H. Two birds with one stone: genes that encode products targeted to two or more compartments. Plant Mol. Biol. 38, 265–277 (1998).

Zhang, X.P. & Glaser, E. Interaction of plant mitochondrial and chloroplast signal peptides with the Hsp70 molecular chaperone. Trends Plant Sci. 7, 14–21 (2002).

Kleffmann, T. et al. The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions. Curr. Biol. 14, 354–362 (2004).

Villarejo, A. et al. Evidence for a protein transported through the secretory pathway en route to the higher plant chloroplast. Nat. Cell Biol. 7, 1224–1231 (2005).

Drawid, A. & Gerstein, M. A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome. J. Mol. Biol. 301, 1059–1075 (2000).

Marcotte, E.M., Xenarios, I., van Der Bliek, A.M. & Eisenberg, D. Localizing proteins in the cell from their phylogenetic profiles. Proc. Natl. Acad. Sci. USA 97, 12115–12120 (2000).

Nair, R. & Rost, B. Inferring sub-cellular localization through automated lexical analysis. Bioinformatics 18, S78–S86 (2002).

Chou, K.C. & Shen, H.B. Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. Biochem. Biophys. Res. Commun. 347, 150–157 (2006).

Chou, K.C. & Shen, H.B. Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic k-nearest neighbor classifiers. J. Proteome Res. 5, 1888–1897 (2006).

Chou, K.C. & Shen, H.B. Large-scale plant protein subcellular location prediction. J. Cell. Biochem. 100, 665–678 (2007).

Mott, R., Schultz, J., Bork, P. & Ponting, C.P. Predicting protein cellular localization using a domain projection method. Genome Res. 12, 1168–1174 (2002).

Scott, M., Thomas, D. & Hallett, M. Predicting subcellular localization via protein motif co-occurrence. Genome Res. 14, 1957–1966 (2004).

Nair, R. & Rost, B. Sequence conserved for subcellular localization. Protein Sci. 11, 2836–2847 (2002).

McGinnis, S. & Madden, T.L. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32, W20–W25 (2004).

Yu, C.S., Chen, Y.C., Lu, C.H. & Hwang, J.K. Prediction of protein subcellular localization. Proteins 64, 643–651 (2006).

von Heijne, G. A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. 14, 4683–4690 (1986).

Nakai, K. & Kanehisa, M. A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14, 897–911 (1992).

Baldi, P. & Brunak, S. Bioinformatics: The Machine Learning Approach (MIT Press, Cambridge, MA, USA, 1998).

Durbin, R.M., Eddy, S.R., Krogh, A. & Mitchison, G. Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, Cambridge, U.K. 1998).

Vapnik, V. The Nature of Statistical Learning Theory (Springer, NY, USA, 1995).

Nielsen, H., Brunak, S., Engelbrecht, J. & von Heijne, G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6 (1997).

Nielsen, H. & Krogh, A. Prediction of signal peptides and signal anchors by a hidden Markov model. in Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (eds. Glasgow, J. et al.) 122–130 (AAAI Press, Menlo Park, CA, USA, 1998).

Bendtsen, J.D., Nielsen, H., von Heijne, G. & Brunak, S. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340, 783–795 (2004).

Claros, M.G. & Vincens, P. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur. J. Biochem. 241, 779–786 (1996).

Emanuelsson, O., Nielsen, H., Brunak, S. & von Heijne, G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300, 1005–1016 (2000).

Andrade, M.A., O'Donoghue, S.I. & Rost, B. Adaptation of protein surfaces to subcellular location. J. Mol. Biol. 276, 517–528 (1998).

Nakashima, H. & Nishikawa, K. Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. J. Mol. Biol. 238, 54–61 (1994).

Reinhardt, A. & Hubbard, T. Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res. 26, 2230–2236 (1998).

Cedano, J., Aloy, P., Pérez-Pons, J.A. & Querol, E. Relation between amino acid composition and cellular location of proteins. J. Mol. Biol. 266, 594–600 (1997).

Chou, K.-C. & Elrod, D.W. Using discriminant function for prediction of subcellular location of prokaryotic proteins. Biochem. Biophys. Res. Commun. 252, 63–68 (1998).

Chou, K.-C. & Elrod, D.W. Protein subcellular location prediction. Protein Eng. 12, 107–118 (1999).

Hua, S. & Sun, Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17, 721–728 (2001).

Park, K.-J. & Kanehisa, M. Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs. Bioinformatics 19, 1656–1663 (2003).

Bhasin, M. & Raghava, G.P.S. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res. 32, W414–W419 (2004).

Chou, K.C. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins 43, 246–255 (2001).

Cui, Q., Jiang, T., Liu, B. & Ma, S. Esub8: a novel tool to predict protein subcellular localizations in eukaryotic organisms. BMC Bioinformatics 5, 66 (2004).

Pierleoni, A., Martelli, P.L., Fariselli, P. & Casadio, R. BaCelLo: a balanced subcellular localization predictor. Bioinformatics 22, e408–e416 (2006).

Linding, R. et al. Protein disorder prediction: implications for structural proteomics. Structure 11, 1453–1459 (2003).

Nakai, K. & Kanehisa, M. Expert system for predicting protein localization sites in Gram-negative bacteria. Proteins 11, 95–110 (1991).

Horton, P. & Nakai, K. Better prediction of protein cellular localization sites with the k nearest neighbors classifier. in Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology (eds. Gaasterland, T. et al.) 147–152 (AAAI Press, Menlo Park, CA, USA, 1997).

Bairoch, A. et al. The universal protein resource (UniProt). Nucleic Acids Res. 33, D154–D159 (2005).

Brunak, S. Doing sequence analysis by inspecting the order in which neural networks learn. in Computation of Biomolecular Structures—Achievements, Problems and Perspectives (eds. Soumpasis, D.M. & Jovin, T.M.) 43–54 (Springer-Verlag, Berlin, 1993).

Hobohm, U., Scharf, M., Schneider, R. & Sander, C. Selection of representative protein data sets. Protein Sci. 1, 409–417 (1992).

Xie, D., Li, A., Wang, M., Fan, Z. & Feng, H. LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST. Nucleic Acids Res. 33, W105–W110 (2005).

von Heijne, G. Transcending the impenetrable: how proteins come to terms with membranes. Biochim. Biophys. Acta 947, 307–333 (1988).

Peltier, J.-B. et al. Central functions of the lumenal and peripheral thylakoid proteome of Arabidopsis determined by experimentation and genome-wide prediction. Plant Cell 14, 211–236 (2002).

Berks, B.C. A common export pathway for proteins binding complex redox cofactors? Mol. Microbiol. 22, 393–404 (1996).

Cristóbal, S., de Gier, J.-W., Nielsen, H. & von Heijne, G. Competition between Sec- and TAT-dependent protein translocation in Escherichia coli. EMBO J. 18, 2982–2990 (1999).

Bendtsen, J.D., Nielsen, H., Widdick, D., Palmer, T. & Brunak, S. Prediction of twin-arginine signal peptides. BMC Bioinformatics 6, 167 (2005).

Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E.L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).

Möller, S., Croning, M.D. & Apweiler, R. Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17, 646–653 (2001).

Chen, C.P., Kernytsky, A. & Rost, B. Transmembrane helix predictions revisited. Protein Sci. 11, 2774–2791 (2002).

Cuthbertson, J.M., Doyle, D.A. & Sansom, M.S. Transmembrane helix prediction: a comparative evaluation and analysis. Protein Eng. Des. Sel. 18, 295–308 (2005).

Sadovskaya, N.S., Sutormin, R.A. & Gelfand, M.S. Recognition of transmembrane segments in proteins: review and consistency-based benchmarking of internet servers. J. Bioinform. Comput. Biol. 4, 1033–1056 (2006).

Schulz, G. β-barrel membrane proteins. Curr. Opin. Struct. Biol. 10, 443–447 (2000).

Jacoboni, I., Martelli, P.L., Fariselli, P., de Pinto, V. & Casadio, R. Prediction of the transmembrane regions of β-barrel membrane proteins with a neural network-based predictor. Protein Sci. 10, 779–787 (2001).

Martelli, P.L., Fariselli, P., Krogh, A. & Casadio, R. A sequence-profile-based HMM for predicting and discriminating β-barrel membrane proteins. Bioinformatics 18, S46–S53 (2002).

Eisenhaber, F. et al. Prediction of lipid posttranslational modifications and localization signals from protein sequences: big-Π, NMT and PTS1. Nucleic Acids Res. 31, 3631–3634 (2003).

Bologna, G., Yvon, C., Duvaud, S. & Veuthey, A.-L. N-Terminal myristoylation predictions by ensembles of neural networks. Proteomics 4, 1626–1632 (2004).

Juncker, A.S. et al. Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci. 12, 1652–1662 (2003).

von Heijne, G. The structure of signal peptides from bacterial lipoproteins. Protein Eng. 2, 531–534 (1989).

Hulo, N. et al. Recent improvements to the PROSITE database. Nucleic Acids Res. 32, D134–D137 (2004).

Yuan, Z. & Teasdale, R.D. Prediction of Golgi type II membrane proteins based on their transmembrane domains. Bioinformatics 18, 1109–1115 (2002).

Cokol, M., Nair, R. & Rost, B. Finding nuclear localization signals. EMBO Rep. 1, 411–415 (2000).

Heddad, A., Brameier, M. & MacCallum, R.M. Evolving regular expression-based sequence classifiers for protein nuclear localisation. in Applications of Evolutionary Computing, EvoWorkshops2004: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoMUSART, EvoSTOC, vol. 3005 of LNCS (eds. Raidl, G.R. et al.) 31–40 (Springer-Verlag, Berlin, Germany, 2004).

Zhao, L.-J. & Padmanabhan, R. Nuclear transport of adenovirus DNA polymerase is facilitated by interaction with preterminal protein. Cell 55, 1005–1015 (1988).

Pemberton, L.F. & Paschal, B.M. Mechanisms of receptor-mediated nuclear import and nuclear export. Traffic 6, 187–198 (2005).

la Cour, T. et al. Analysis and prediction of leucine-rich nuclear export signals. Protein Eng. Des. Sel. 17, 527–536 (2004).

Olivier, L.M. & Krisans, S.K. Peroxisomal protein targeting and identification of peroxisomal targeting signals in cholesterol biosynthetic enzymes. Biochim. Biophys. Acta 1529, 89–102 (2000).

Emanuelsson, O., Elofsson, A., von Heijne, G. & Cristóbal, S. In silico prediction of the peroxisomal proteome in fungi, plants and animals. J. Mol. Biol. 330, 443–456 (2003).

Neuberger, G., Maurer-Stroh, S., Eisenhaber, B., Hartig, A. & Eisenhaber, F. Prediction of peroxisomal targeting signal 1 containing proteins from amino acid sequence. J. Mol. Biol. 328, 581–592 (2003).

Pedersen, A.G. & Nielsen, H. Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis. in Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology (eds. Gaasterland, T. et al.) 226–233 (AAAI Press, Menlo Park, CA, USA, 1997).

Duckert, P., Brunak, S. & Blom, N. Prediction of proprotein convertase cleavage sites. Protein Eng. Des. Sel. 17, 107–112 (2004).

Käll, L., Krogh, A. & Sonnhammer, E.L.L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338, 1027–1036 (2004).

Baldi, P., Brunak, S., Chauvin, Y., Andersen, C.A. & Nielsen, H. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16, 412–424 (2000).

Chou, K.-C. & Cai, Y.-D. Predicting protein localization in budding yeast. Bioinformatics 21, 944–950 (2005).

Lee, K., Kim, D.-W., Na, D., Lee, K.H. & Lee, D. PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res. 34, 4655–4666 (2006).

Sprenger, J., Fink, J.L. & Teasdale, R.D. Evaluation and comparison of mammalian subcellular localization prediction methods. BMC Bioinformatics 7, S3 (2006).

Guda, C. pTARGET: a web server for predicting protein subcellular localization. Nucleic Acids Res. 34, W210–W213 (2006).

Nielsen, H., Brunak, S. & von Heijne, G. Machine learning approaches to the prediction of signal peptides and other protein sorting signals. Protein Eng. 12, 3–9 (1999).

Menne, K., Hermjakob, H. & Apweiler, R. A comparison of signal sequence prediction methods using a test set of signal peptides. Bioinformatics 16, 741–742 (2000).

Klee, E.W. & Ellis, L.B. Evaluating eukaryotic secreted protein prediction. BMC Bioinformatics 6, 256 (2005).

Hiller, K., Grote, A., Scheer, M., Munch, R. & Jahn, D. PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res. 32, W375–W379 (2004).

Horton, P., Park, K.-J., Obayashi, T. & Nakai, K. Protein subcellular localization prediction with WoLF PSORT. in Proceedings of the 4th Annual Asia Pacific Bioinformatics Conference APBC06 39–48 (Taipei, Taiwan, 2006).

Gardy, J.L. et al. PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 21, 617–623 (2005).

Bannai, H., Tamada, Y., Maruyama, O., Nakai, K. & Miyano, S. Extensive feature detection of N-terminal protein sorting signals. Bioinformatics 18, 298–305 (2002).

Nair, R. & Rost, B. Mimicking cellular sorting improves prediction of subcellular localization. J. Mol. Biol. 348, 85–100 (2005).

Hawkins, J. & Bodén, M. Detecting and sorting targeting peptides with neural networks and support vector machines. J. Bioinform. Comput. Biol. 4, 1–18 (2006).

Baldi, P., Brunak, S., Frasconi, P., Soda, G. & Pollastri, G. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 15, 937–946 (1999).

Pollack, J.B. The induction of dynamical recognizers. Mach. Learn. 7, 227 (1991).

Wakabayashi, M., Hawkins, J., Maetschke, S. & Bodén, M. Exploiting sequence dependencies in the prediction of peroxisomal proteins. in Intelligent Data Engineering and Automated Learning, vol. 3578 of LNCS (eds. Gallagher, M., Hogan, J. & Maire, F.) 454–461 (Springer-Verlag, Berlin, Germany, 2005).

Lu, Z. et al. Predicting subcellular localization of proteins using machine-learned classifiers. Bioinformatics 20, 547–556 (2004).

Höglund, A., Dönnes, P., Blum, T., Adolph, H.-W. & Kohlbacher, O. MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics 22, 1158–1165 (2006).