GPCR-MPredictor: multi-level prediction of G protein-coupled receptors using genetic ensemble

Muhammad Naveed¹, Asif Ullah Khan¹
¹Department of Computer and Information Science, Pakistan Institute of Engineering and Applied Sciences, Nilore, Pakistan

Abstract

G protein-coupled receptors (GPCRs) are transmembrane proteins that transduce signals from extracellular ligands to intracellular G proteins. Automatic classification of GPCRs can provide important information for the development of novel drugs in the pharmaceutical industry. In this paper, we propose an evolutionary approach, GPCR-MPredictor, which combines individual classifiers for predicting GPCRs. GPCR-MPredictor is a web predictor that can efficiently predict GPCRs at five levels. The first level determines whether a protein sequence is a GPCR or a non-GPCR. If the sequence is predicted to be a GPCR, it is further classified at the family, subfamily, sub-subfamily, and subtype levels. Our aim in this work is to analyze the discriminative power of different feature extraction and classification strategies for GPCR prediction, and then to use an evolutionary ensemble approach to enhance prediction performance. Features are extracted using the amino acid composition, pseudo amino acid composition, and dipeptide composition of protein sequences. Different classification approaches, namely k-nearest neighbor (KNN), support vector machine (SVM), probabilistic neural network (PNN), J48, AdaBoost, and Naive Bayes, have been used to classify GPCRs. The proposed hierarchical genetic algorithm (GA)-based ensemble classifier exploits the prediction results of SVM, KNN, PNN, and J48 at each level. On the first dataset, the GA-based ensemble yields accuracies of 99.75, 92.45, 87.80, 83.57, and 96.17% at the five levels. We further perform predictions on a dataset of 8,000 GPCRs at the family, subfamily, and sub-subfamily levels, and on two other datasets of 365 and 167 GPCRs at the second and fourth levels, respectively. In comparison with existing methods, the results demonstrate the effectiveness of the proposed GPCR-MPredictor in classifying GPCR families. It is accessible at http://111.68.99.218/gpcr-mpredictor/ .
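To illustrate two of the sequence-derived features named in the abstract, the following minimal Python sketch computes the amino acid composition (20 values) and the dipeptide composition (400 values) of a protein sequence. It is an illustration under stated assumptions, not the authors' implementation: the function names, the alphabetical ordering of the 20 standard residues, and the choice to drop non-standard residue codes before counting are all hypothetical. Pseudo amino acid composition is omitted because it depends on physicochemical parameters that the abstract does not specify.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order.
# This ordering is an assumption made for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(sequence):
    """Upper-case the sequence and drop any non-standard residue codes.

    Dropping such residues is a simplification assumed here; note that it
    makes the neighbours of a dropped residue adjacent to each other.
    """
    return "".join(aa for aa in sequence.upper() if aa in AMINO_ACIDS)


def amino_acid_composition(sequence):
    """Return the 20 residue frequencies, ordered as in AMINO_ACIDS."""
    seq = _clean(sequence)
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return the 400 frequencies of adjacent residue pairs (AA, AC, ..., YY)."""
    seq = _clean(sequence)
    if len(seq) < 2:
        raise ValueError("need at least two standard amino acids")
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    # Slide a window of width two over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    toy = "AACD"  # toy sequence for demonstration, not a real GPCR
    print(amino_acid_composition(toy)[:3])
    print(len(dipeptide_composition(toy)))
```

Concatenating such vectors gives a fixed-length representation of a variable-length sequence, which is what classifiers such as KNN or SVM require as input.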
