A survey of dimension reduction and classification methods for RNA-Seq data on malaria vector

Micheal Olaolu Arowolo1, Marion O. Adebiyi1, Charity Aremu1, Ayodele Ariyo Adebiyi1
1Landmark University, Omu-Aran, Nigeria

Abstract

Researchers have recently generated unique sets of genetic sequence data, and genetic exploration increasingly relies on integrated machine-learning analysis and the virtual combination of adaptive data to solve classification problems. Detecting diseases and infections at an early stage is a key concern and a major challenge for researchers in machine-learning classification and bioinformatics. Which genes contribute to disease remains a matter of considerable debate among researchers. This study reviews a range of work on dimensionality reduction techniques, which reduce the feature set so that data can be grouped effectively with less computational processing time, and on classification methods that have contributed to advances in the RNA-Sequencing approach.
