Is iterative feature selection technique efficient enough? A comparative performance analysis of RFECV feature selection technique in ransomware classification using SHAP

Rawshan Ara Mowri¹, Madhuri Siddula¹, Kaushik Roy¹
¹ Department of Computer Science, North Carolina A&T State University, Greensboro, USA

Abstract

Early ransomware detection is a central concern in cybersecurity. Feature selection is critical in this context because it improves detection accuracy, mitigates overfitting, and reduces training time by eliminating irrelevant and redundant data. However, iterative feature selection techniques choose the best-performing subset of features through an iterative process, which leaves a chance that a crucial feature is not selected, and the number of selected features may not be optimal or the most suitable for a given problem. This study therefore compares the performance of an iterative feature selection technique, Recursive Feature Elimination with Cross-Validation (RFECV), across six supervised Machine Learning (ML) models to evaluate its efficiency in classifying ransomware using Application Programming Interface (API) call and network traffic features. The study employs an Explainable Artificial Intelligence (XAI) framework, SHapley Additive exPlanations (SHAP), to derive the crucial features when RFECV is not integrated with the ML models. These features are then compared with the features that RFECV selects when it is integrated. Results show that the ML models achieve better classification accuracy without RFECV on two datasets. In addition, RFECV falls short of selecting impactful features, leading to more false alarms. Moreover, it cannot rank the features by importance, which reduces its overall efficiency in ransomware classification. This study thus underscores the importance of integrating explainability techniques to identify critical features, rather than relying solely on iterative feature selection methods, to enhance the resilience of ransomware detection systems.
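To make the ranking idea concrete, the following is a minimal sketch of the quantity SHAP is built on: exact Shapley values, computed by enumerating every feature coalition, and then averaged in absolute value to rank features. It is not the authors' pipeline and does not use the SHAP library, which relies on faster approximations. The linear scorer, its weights, the baseline values, and the three feature names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy setup: names, weights, and baseline are illustrative
# assumptions, not values taken from the study.
FEATURES = ["api_call_count", "dns_queries", "packet_size"]
WEIGHTS = [2.0, -1.0, 0.5]
BASELINE = [1.0, 4.0, 10.0]  # assumed background (mean) feature values


def model(x):
    """Toy linear score standing in for a trained classifier's output."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def coalition_value(x, subset):
    """Model output when only features in `subset` keep their real values;
    every other feature is replaced by its baseline value."""
    mixed = [x[i] if i in subset else BASELINE[i] for i in range(len(x))]
    return model(mixed)


def shapley_values(x):
    """Exact Shapley values by enumerating all coalitions (O(2^n) cost)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = coalition_value(x, s | {i}) - coalition_value(x, s)
                phi[i] += weight * gain
    return phi


def rank_features(samples):
    """Rank features by mean absolute Shapley value across the samples."""
    totals = [0.0] * len(FEATURES)
    for x in samples:
        for i, p in enumerate(shapley_values(x)):
            totals[i] += abs(p)
    means = [t / len(samples) for t in totals]
    order = sorted(range(len(FEATURES)), key=lambda i: means[i], reverse=True)
    return [(FEATURES[i], means[i]) for i in order]


samples = [[3.0, 4.0, 12.0], [1.0, 7.0, 8.0]]
ranking = rank_features(samples)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Unlike a selected/not-selected mask, this yields an ordered list with a magnitude per feature, which is the kind of importance ranking the abstract says RFECV lacks. Exact enumeration is only practical for a handful of features; real datasets need the approximations that SHAP provides.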
