A Novel Human Diabetes Biomarker Recognition Approach Using Fuzzy Rough Multigranulation Nearest Neighbour Classifier Model

Swarup Kr Ghosh (1), Anupam Ghosh (2)
(1) Department of Computer Science and Engineering, Sister Nivedita University, Kolkata, India
(2) Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India

Abstract

Selecting gene identifiers from microarray databases is challenging because a microarray contains a large number of gene attributes for only a few samples. This article proposes a novel fuzzy-rough set-based method for selecting gene expression features, using a fuzzy-rough reduct in a multi-granular space, for human diabetes patients. First, the fuzzy multi-granular gain is computed from the expression datasets via fuzzy entropy, which reduces the dimension of the database. Thereafter, features are selected from the microarray using the fuzzy-rough reduct and information gain with respect to their expression patterns. To reduce the computational cost, a decision-making scheme is designed using a rough approximation of a fuzzy concept within a multi-granulation framework. Finally, a modified fuzzy-rough nearest neighbour classifier (FRNNC) is used to recognize associations among the genes whose expression patterns differ significantly between the control state and the diabetic state. Five standard diabetic microarray datasets are used to quantify the efficiency of the designed FRNNC model. The results are validated with the F-measure using the NCBI diabetes gene expression database, and the model outperforms existing methods.
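The classification step can be illustrated with a minimal sketch of a plain fuzzy-rough nearest neighbour rule. This is not the modified FRNNC proposed in the article, whose details the abstract does not give. The sketch assumes a similarity relation averaged over range-normalised attributes, the Kleene-Dienes implicator, and the minimum t-norm; the function names `fuzzy_similarity` and `frnn_predict` and the toy expression values are hypothetical.

```python
import numpy as np

def fuzzy_similarity(X, y, ranges):
    """Assumed similarity: mean over attributes of 1 - |x_a - y_a| / range_a."""
    diffs = np.abs(X - y) / ranges
    return np.clip(1.0 - diffs, 0.0, 1.0).mean(axis=1)

def frnn_predict(X_train, labels, y, k=None):
    """Assign y to the class with the highest mean of its fuzzy-rough
    lower and upper approximation memberships (illustrative only)."""
    X_train = np.asarray(X_train, dtype=float)
    labels = np.asarray(labels)
    y = np.asarray(y, dtype=float)

    # Normalise each attribute by its observed range; guard constant columns.
    ranges = X_train.max(axis=0) - X_train.min(axis=0)
    ranges[ranges == 0] = 1.0
    sim = fuzzy_similarity(X_train, y, ranges)

    # Optionally restrict the computation to the k most similar samples.
    if k is not None:
        idx = np.argsort(-sim)[:k]
        sim, labels = sim[idx], labels[idx]

    best_class, best_score = None, -1.0
    for c in np.unique(labels):
        in_c = labels == c
        # Lower approximation with the Kleene-Dienes implicator and crisp
        # classes: only samples outside class c can lower the membership.
        lower = np.min(1.0 - sim[~in_c]) if np.any(~in_c) else 1.0
        # Upper approximation with the minimum t-norm: the closest
        # sample inside class c determines the membership.
        upper = np.max(sim[in_c])
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical two-gene expression profiles for four samples.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = ["control", "control", "diabetic", "diabetic"]
print(frnn_predict(X, labels, [0.85, 0.85]))
```

Averaging the lower and upper memberships is one common decision rule for fuzzy-rough nearest neighbour classification; other similarity relations, implicators, or t-norms would change the scores without changing the overall structure.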
