Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms

Nature Protocols - Volume 3, Issue 2 - Pages 153-162 - 2008
Kuo‐Chen Chou¹, Hong‐Bin Shen²
¹ Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, California 92130, USA
² Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA

References

Ehrlich, J.S., Hansen, M.D. & Nelson, W.J. Spatio-temporal regulation of Rac1 localization and lamellipodia dynamics during epithelial cell–cell adhesion. Dev. Cell 3, 259–270 (2002).

Glory, E. & Murphy, R.F. Automated subcellular location determination and high-throughput microscopy. Dev. Cell 12, 7–16 (2007).

Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res. 25, 31–36 (1997).

Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).

Hill, D.P., Blake, J.A., Richardson, J.E. & Ringwald, M. Extension and integration of the gene ontology (GO): combining GO vocabularies with external vocabularies. Genome Res. 12, 1982–1991 (2002).

Chou, K.C. & Shen, H.B. Review: recent progresses in protein subcellular location prediction. Anal. Biochem. 370, 1–16 (2007).

Chou, K.C. Review: structural bioinformatics and its impact to biomedical science. Curr. Med. Chem. 11, 2105–2134 (2004).

Lubec, G., Afjehi-Sadat, L., Yang, J.W. & John, J.P. Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog. Neurobiol. 77, 90–127 (2005).

Nakai, K. & Kanehisa, M. A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14, 897–911 (1992).

Nakashima, H. & Nishikawa, K. Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. J. Mol. Biol. 238, 54–61 (1994).

Cedano, J., Aloy, P., Pérez-Pons, J.A. & Querol, E. Relation between amino acid composition and cellular location of proteins. J. Mol. Biol. 266, 594–600 (1997).

Nakai, K. & Horton, P. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci. 24, 34–36 (1999).

Reinhardt, A. & Hubbard, T. Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res. 26, 2230–2236 (1998).

Chou, K.C. & Elrod, D.W. Protein subcellular location prediction. Protein Eng. 12, 107–118 (1999).

Yuan, Z. Prediction of protein subcellular locations using Markov chain models. FEBS Lett. 451, 23–26 (1999).

Nakai, K. Protein sorting signals and prediction of subcellular localization. Adv. Protein Chem. 54, 277–344 (2000).

Murphy, R.F., Boland, M.V. & Velliste, M. Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 251–259 (2000).

Emanuelsson, O., Nielsen, H., Brunak, S. & von Heijne, G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300, 1005–1016 (2000).

Feng, Z.P. Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition. Biopolymers 58, 491–499 (2001).

Hua, S. & Sun, Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17, 721–728 (2001).

Feng, Z.P. & Zhang, C.T. Prediction of the subcellular location of prokaryotic proteins based on the hydrophobicity index of amino acids. Int. J. Biol. Macromol. 28, 255–261 (2001).

Feng, Z.P. An overview on predicting the subcellular location of a protein. In Silico Biol. 2, 291–303 (2002).

Chou, K.C. & Cai, Y.D. Using functional domain composition and support vector machines for prediction of protein subcellular location. J. Biol. Chem. 277, 45765–45769 (2002).

Zhou, G.P. & Doctor, K. Subcellular location prediction of apoptosis proteins. Proteins 50, 44–48 (2003).

Pan, Y.X. et al. Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach. J. Protein Chem. 22, 395–402 (2003).

Park, K.J. & Kanehisa, M. Prediction of protein subcellular locations by support vector machines using compositions of amino acid and amino acid pairs. Bioinformatics 19, 1656–1663 (2003).

Gardy, J.L. et al. PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res. 31, 3613–3617 (2003).

Huang, Y. & Li, Y. Prediction of protein subcellular locations using fuzzy k-NN method. Bioinformatics 20, 21–28 (2004).

Xiao, X. et al. Using complexity measure factor to predict protein subcellular location. Amino Acids 28, 57–61 (2005).

Lei, Z. & Dai, Y. An SVM-based system for predicting protein subnuclear localizations. BMC Bioinformatics 6, 291 (2005).

Garg, A., Bhasin, M. & Raghava, G.P. Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J. Biol. Chem. 280, 14427–14432 (2005).

Matsuda, S. et al. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804–2813 (2005).

Gao, Q.B., Wang, Z.Z., Yan, C. & Du, Y.H. Prediction of protein subcellular location using a combined feature of sequence. FEBS Lett. 579, 3444–3448 (2005).

Xiao, X., Shao, S.H., Ding, Y.S., Huang, Z.D. & Chou, K.C. Using cellular automata images and pseudo amino acid composition to predict protein subcellular location. Amino Acids 30, 49–54 (2006).

Chou, K.C. & Shen, H.B. Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. Biochem. Biophys. Res. Commun. 347, 150–157 (2006).

Guo, J., Lin, Y. & Liu, X. GNBSL: a new integrative system to predict the subcellular location for Gram-negative bacteria proteins. Proteomics 6, 5099–5105 (2006).

Hoglund, A., Donnes, P., Blum, T., Adolph, H.W. & Kohlbacher, O. MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics 22, 1158–1165 (2006).

Lee, K., Kim, D.W., Na, D., Lee, K.H. & Lee, D. PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res. 34, 4655–4666 (2006).

Zhang, Z.H., Wang, Z.H., Zhang, Z.R. & Wang, Y.X. A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett. 580, 6169–6174 (2006).

Chou, K.C. & Shen, H.B. Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers. J. Proteome Res. 5, 1888–1897 (2006).

Pierleoni, A., Martelli, P.L., Fariselli, P. & Casadio, R. BaCelLo: a balanced subcellular localization predictor. Bioinformatics 22, e408–e416 (2006).

Shi, J.Y., Zhang, S.W., Pan, Q., Cheng, Y.-M. & Xie, J. Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids 33, 69–74 (2007).

Emanuelsson, O., Brunak, S., von Heijne, G. & Nielsen, H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2, 953–971 (2007).

Shen, H.B. & Chou, K.C. Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers 85, 233–240 (2007).

Chen, Y.L. & Li, Q.Z. Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo amino acid composition. J. Theor. Biol. 248, 377–381 (2007).

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

Nair, R. & Rost, B. Sequence conserved for subcellular localization. Protein Sci. 11, 2836–2847 (2002).

Chou, K.C. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins 43, 246–255 (2001); erratum 44, 60 (2001).

Chou, K.C. & Shen, H.B. Predicting protein subcellular location by fusing multiple classifiers. J. Cell. Biochem. 99, 517–527 (2006).

Chen, C., Zhou, X., Tian, Y., Zou, X. & Cai, P. Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. Anal. Biochem. 357, 116–121 (2006).

Chen, C., Tian, Y.X., Zou, X.Y., Cai, P.X. & Mo, J.Y. Using pseudo-amino acid composition and support vector machine to predict protein structural class. J. Theor. Biol. 243, 444–448 (2006).

Zhang, S.W., Pan, Q., Zhang, H.C., Shao, Z.C. & Shi, J.Y. Prediction protein homo-oligomer types by pseudo amino acid composition: approached with an improved feature extraction and naive Bayes feature fusion. Amino Acids 30, 461–468 (2006).

Du, P. & Li, Y. Prediction of protein submitochondrial locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics 7, 518 (2006).

Mondal, S., Bhavna, R., Mohan Babu, R. & Ramakumar, S. Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification. J. Theor. Biol. 243, 252–260 (2006).

Lin, H. & Li, Q.Z. Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochem. Biophys. Res. Commun. 354, 548–551 (2007).

Lin, H. & Li, Q.Z. Using pseudo amino acid composition to predict protein structural class: approached by incorporating 400 dipeptide components. J. Comput. Chem. 28, 1463–1466 (2007).

Pu, X., Guo, J., Leung, H. & Lin, Y. Prediction of membrane protein types from sequences and position-specific scoring matrices. J. Theor. Biol. 247, 259–265 (2007).

Kurgan, L.A., Stach, W. & Ruan, J. Novel scales based on hydrophobicity indices for secondary protein structure. J. Theor. Biol. 248, 354–366 (2007).

Zhou, X.B., Chen, C., Li, Z.C. & Zou, X.Y. Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J. Theor. Biol. 248, 546–551 (2007).

Mundra, P., Kumar, M., Kumar, K.K., Jayaraman, V.K. & Kulkarni, B.D. Using pseudo amino acid composition to predict protein subnuclear localization: approached with PSSM. Pattern Recogn. Lett. 28, 1610–1615 (2007).

Shen, H.B. & Chou, K.C. PseAAC: a flexible web-server for generating various kinds of protein pseudo amino acid composition. Anal. Biochem. doi: 10.1016/j.ab.2007.10.012 (2007).

Chou, K.C. & Shen, H.B. Large-scale plant protein subcellular location prediction. J. Cell. Biochem. 100, 665–678 (2007).

Vapnik, V. Statistical Learning Theory (Wiley-Interscience, New York, 1998).

Bendtsen, J.D., Nielsen, H., von Heijne, G. & Brunak, S. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340, 783–795 (2004).

Hiller, K., Grote, A., Scheer, M., Munch, R. & Jahn, D. PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res. 32, W375–W379 (2004).

Chou, K.C. & Shen, H.B. Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. Biochem. Biophys. Res. Commun. 357, 633–640 (2007).

Shen, H.B. & Chou, K.C. Signal-3L: a 3-layer approach for predicting signal peptide. Biochem. Biophys. Res. Commun. 363, 297–303 (2007).

Regev-Rudzki, N. & Pines, O. Eclipsed distribution: a phenomenon of dual targeting of protein and its significance. Bioessays 29, 772–782 (2007).

Lubec, G. & Afjehi-Sadat, L. Limitations and pitfalls in protein identification by mass spectrometry. Chem. Rev. 107, 3568–3584 (2007).

Jahandideh, S., Abdolmaleki, P., Jahandideh, M. & Asadabadi, E.B. Novel two-stage hybrid neural discriminant model for predicting proteins structural classes. Biophys. Chem. 128, 87–93 (2007).

Afjehi-Sadat, L. et al. Structural and functional analysis of hypothetical proteins in mouse hippocampus from two-dimensional gel electrophoresis. J. Proteome Res. 6, 711–723 (2007).

Diao, Y. et al. Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel–Ziv complexity. Amino Acids, doi: 10.1007/s00726-007-0550-z (2007).

Chen, Y.L. & Li, Q.Z. Prediction of the subcellular location of apoptosis proteins. J. Theor. Biol. 245, 775–783 (2007).

Ho, V.S.M. & Ng, T.Z. Chitinase-like proteins with antifungal activity from emperor banana fruits. Protein Pept. Lett. 14, 828–831 (2007).

Chou, K.C. & Cai, Y.D. Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem. Biophys. Res. Commun. 320, 1236–1239 (2004).

Apweiler, R. et al. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29, 37–40 (2001).

Chou, K.C. & Cai, Y.D. Predicting protein structural class by functional domain composition. Biochem. Biophys. Res. Commun. 321, 1007–1009 (2004); corrigendum 329, 1362 (2005).

Cover, T.M. & Hart, P.E. Nearest neighbour pattern classification. IEEE Trans. Inf. Theor. IT 13, 21–27 (1967).

Denoeux, T. A k-nearest neighbor classification rule based on Dempster–Shafer theory. IEEE Trans. Syst. Man Cybern. 25, 804–813 (1995).

Zouhal, L.M. & Denoeux, T. An evidence-theoretic k-NN rule with parameter optimization. IEEE Trans. Syst. Man Cybern. 28, 263–271 (1998).

Chou, K.C. & Shen, H.B. Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. J. Proteome Res. 6, 1728–1734 (2007).

Shen, H.B. & Chou, K.C. Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem. Biophys. Res. Commun. 355, 1006–1011 (2007).

Shen, H.B. & Chou, K.C. Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins. Protein Eng. Des. Sel. 20, 39–46 (2007).

Chou, K.C. & Shen, H.B. Large-scale predictions of Gram-negative bacterial protein subcellular locations. J. Proteome Res. 5, 3420–3428 (2006).

Becker, H.F., Motorin, Y., Planta, R.J. & Grosjean, H. The yeast gene YNL292w encodes a pseudouridine synthase (Pus4) catalyzing the formation of psi55 in both mitochondrial and cytoplasmic tRNAs. Nucleic Acids Res. 25, 4493–4499 (1997).

Geier, C., von Figura, K. & Pohlmann, R. Structure of the human lysosomal acid phosphatase gene. Eur. J. Biochem. 183, 611–616 (1989).

Jorgensen, R. Plant genomes. Plant Cell 18, 1099 (2006).

Jackson, S., Rounsley, S. & Purugganan, M. Comparative sequencing of plant genomes: choices to make. Plant Cell 18, 1100–1104 (2006).

Chou, K.C. & Zhang, C.T. Review: Prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol. 30, 275–349 (1995).

Zhou, G.P. An intriguing controversy over protein structural class prediction. J. Protein Chem. 17, 729–738 (1998).

Cao, Y. et al. Prediction of protein structural class with rough sets. BMC Bioinformatics 7, 20 (2006).

Gao, Q.B. & Wang, Z.Z. Classification of G-protein coupled receptors at four levels. Protein Eng. Des. Sel. 19, 511–516 (2006).

Kedarisetti, K.D., Kurgan, L.A. & Dick, S. Classifier ensembles for protein structural class prediction with varying homology. Biochem. Biophys. Res. Commun. 348, 981–988 (2006).

Zhou, G.P. & Cai, Y.D. Predicting protease types by hybridizing gene ontology and pseudo amino acid composition. Proteins 63, 681–684 (2006).

Chou, K.C. & Shen, H.B. MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochem. Biophys. Res. Commun. 360, 339–345 (2007).

Shen, H.B. & Chou, K.C. EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. Biochem. Biophys. Res. Commun. 364, 53–59 (2007).