An Efficient Approach for Prediction of Nuclear Receptor and Their Subfamilies Based on Fuzzy k-Nearest Neighbor with Maximum Relevance Minimum Redundancy

Proceedings of the National Academy of Sciences, India Section A: Physical Sciences, Vol. 88, pp. 129-136, 2016

Arvind Kumar Tiwari¹, Rajeev Srivastava¹

¹Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India

Abstract

Efficient classification of nuclear receptors and their subfamilies plays an important role in the detection of diseases such as diabetes, cancer, and inflammatory disorders, and in related drug design and discovery. To date, only a few methods for this task have been reported in the literature, and their performance and efficacy fall short of the desired level. To address this, we propose a fuzzy k-nearest neighbor (fuzzy kNN) classifier combined with minimum redundancy maximum relevance (mRMR) feature selection for classifying nuclear receptors and their eight subfamilies. The mRMR algorithm is used to select the optimal feature subset, and the highest accuracy and Matthews correlation coefficient (MCC) are obtained with 150 of the 753 features using the fuzzy kNN classifier. The performance of the fuzzy kNN classifier depends on two parameters, the number of nearest neighbors (k) and the fuzzy coefficient (m); the highest accuracy and MCC are obtained at k = 7 and m = 1.25. With the optimal number of features, k, and m, tenfold cross-validation yields overall accuracies of 100% and 91.7% and MCC values of 1.00 and 0.89 for the prediction of nuclear receptor families and subfamilies, respectively. These results indicate that the proposed approach is competitive with other standard methods in the literature for classifying nuclear receptors and their eight subfamilies.
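The abstract combines two standard components: greedy mRMR feature selection (Peng, Long and Ding) and the fuzzy kNN classifier (Keller, Gray and Givens), whose class memberships weight each of the k nearest neighbors by its distance raised to the power -2/(m-1). The sketch below illustrates both under stated assumptions; it is not the authors' implementation. Assumptions: mRMR uses the difference (relevance minus mean redundancy) criterion on features that have already been discretized, fuzzy kNN uses Euclidean distance with crisp 0/1 memberships for training samples, and the function names (`mutual_information`, `mrmr_select`, `fuzzy_knn_memberships`, `fuzzy_knn_predict`) and the toy data are hypothetical. The defaults k = 7 and m = 1.25 simply mirror the optimum reported above.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[u] * count_b[v]))
        for (u, v), c in joint.items()
    )


def mrmr_select(columns: List[Sequence[Hashable]], labels: Sequence[Hashable],
                n_select: int) -> List[int]:
    """Greedy mRMR (difference form): maximise relevance I(f; c) minus the
    mean redundancy I(f; s) over already selected features s.
    Assumption: every feature column has already been discretised."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < min(n_select, len(columns)):
        best, best_score = -1, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected[:n_select]


def fuzzy_knn_memberships(train_X: List[Sequence[float]],
                          train_y: Sequence[Hashable], x: Sequence[float],
                          k: int = 7, m: float = 1.25) -> Dict[Hashable, float]:
    """Fuzzy kNN class memberships of x in the style of Keller et al.,
    assuming crisp (0/1) memberships for the training samples."""
    if m <= 1.0:
        raise ValueError("fuzzy coefficient m must be greater than 1")
    # The k training samples nearest to x (Euclidean distance).
    neighbours = sorted(((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)),
                        key=lambda pair: pair[0])[:k]
    # Exact matches would get infinite weight: share membership among them.
    exact = [label for d, label in neighbours if d == 0.0]
    if exact:
        return {label: exact.count(label) / len(exact) for label in set(exact)}
    # Weight each neighbour by ||x - x_j||^(-2/(m-1)). Dividing every distance
    # by the smallest one leaves the normalised memberships unchanged but
    # avoids float overflow when m is close to 1.
    exponent = 2.0 / (m - 1.0)
    d_min = neighbours[0][0]
    scores: Dict[Hashable, float] = defaultdict(float)
    for d, label in neighbours:
        scores[label] += (d_min / d) ** exponent
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def fuzzy_knn_predict(train_X, train_y, x, k=7, m=1.25):
    """Assign x to the class with the highest fuzzy membership."""
    memberships = fuzzy_knn_memberships(train_X, train_y, x, k, m)
    return max(memberships, key=memberships.get)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated classes in 2-D.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
    y = ["A", "A", "A", "B", "B", "B"]
    print(fuzzy_knn_predict(X, y, [0.15, 0.15], k=3))  # prints A
```

In the setting the abstract describes, one would rank the 753 descriptors with an mRMR step such as `mrmr_select`, keep the top 150, and classify with k = 7 and m = 1.25. Smaller values of m make the weights favour the closest neighbours more sharply: m = 1.25 gives an exponent of 2/(1.25 - 1) = 8.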
