An Efficient Approach for Prediction of Nuclear Receptor and Their Subfamilies Based on Fuzzy k-Nearest Neighbor with Maximum Relevance Minimum Redundancy
Proceedings of the National Academy of Sciences, India Section A: Physical Sciences - Volume 88 - Pages 129-136 - 2016
Abstract
Efficient classification of nuclear receptors and their subfamilies plays an important role in the detection of diseases such as diabetes, cancer, and inflammatory diseases, and in related drug design and discovery. Few methods for this task have been reported in the literature, and their performance and efficacy are not yet at the desired level. To address this, we propose a fuzzy k-nearest neighbor (fuzzy kNN) classifier combined with minimum redundancy maximum relevance (mRMR) feature selection for classifying nuclear receptors and their eight subfamilies. The mRMR algorithm is used to select the optimal feature subset; with the fuzzy kNN classifier, the highest accuracy and Matthews correlation coefficient (MCC) are obtained with 150 of the 753 features. The performance of the fuzzy kNN classifier depends on two parameters, the number of nearest neighbors (k) and the fuzzy coefficient (m); the highest accuracy and MCC are obtained at k = 7 and m = 1.25. With the optimal number of features, k, and m, tenfold cross-validation gives overall accuracies of 100% and 91.7% and MCC values of 1.00 and 0.89 for the prediction of nuclear receptor families and subfamilies, respectively. These results indicate that the proposed approach is very competitive with other standard methods available in the literature.
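To make the roles of k and m concrete, the following is a minimal sketch of a fuzzy kNN decision rule in the style of Keller et al. (1985), not the authors' implementation. It assumes crisp class labels for the training samples, Euclidean distance, and small hypothetical two-dimensional feature vectors; the function name `fuzzy_knn_predict`, the `eps` guard, and the toy labels are illustrative only.

```python
import math
from collections import defaultdict


def fuzzy_knn_predict(train_X, train_y, x, k=7, m=1.25, eps=1e-9):
    """Return (predicted_label, memberships) for one query vector x.

    Assumes crisp training labels; m must be greater than 1.
    """
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(row, x), label) for row, label in zip(train_X, train_y)]
    # Keep the k nearest neighbors.
    neighbors = sorted(dists, key=lambda pair: pair[0])[:k]
    # Each neighbor votes with weight 1 / d^(2 / (m - 1)).
    # A smaller m makes close neighbors dominate more strongly.
    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    total = 0.0
    for d, label in neighbors:
        w = 1.0 / (max(d, eps) ** exponent)  # eps guards against d == 0
        weights[label] += w
        total += w
    # Normalize so the class memberships sum to 1.
    memberships = {label: w / total for label, w in weights.items()}
    predicted = max(memberships, key=memberships.get)
    return predicted, memberships


# Hypothetical toy data: two well-separated classes.
train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
train_y = ["NR", "NR", "NR", "non-NR", "non-NR", "non-NR"]

label, memberships = fuzzy_knn_predict(train_X, train_y, (0.2, 0.1), k=3, m=1.25)
print(label, memberships)
```

Unlike plain kNN, the classifier returns a membership value per class rather than only a majority vote, so a low maximum membership can flag an uncertain prediction. In the setting described above, the feature vectors would be the 150 mRMR-selected features rather than the toy points used here.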