Short- versus long-term performance of detection models for obfuscated MSOffice-embedded malware

Silviu Viţel1,2, Marilena Lupaşcu1,2, Dragoş Teodor Gavriluţ1,2, Henri Luchian1
1Faculty of Computer Science, “Al.I.Cuza” University, Iaşi, Romania
2Bitdefender Labs, Iaşi, Romania

Abstract

This paper analyzes the efficiency of several machine learning models (artificial neural networks, random forest, decision tree, AdaBoost and XGBoost) against the evolution of VBA-based (Visual Basic for Applications) malware over a long period of time (1995–2021). The file set used in our research is comprehensive: approximately 1.9 million files, of which 944,595 are malicious and the rest are benign. This allowed us to gain insights into the resilience of these models against the diversity and evolution of file features that reflect obfuscation techniques in VBA-based malware. In studying the detection of VBA-based malware, we focus on characteristics of both the classifiers, namely proactivity (short-term detection efficiency against future malware) and endurance (long-term detection robustness), and the detection-relevant file features, namely feature perishability (the dynamics of feature relevance). As a prerequisite of the study, we also describe in some detail various obfuscation techniques used by the malware under investigation during the last decade.
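The distinction between proactivity and endurance can be illustrated with a time-split evaluation: a model is trained on files up to a cutoff year, and its detection rate on malicious files is then measured per year after the cutoff. The sketch below is only an illustration under stated assumptions, not the authors' method: the function names, the one-year `horizon` separating "short-term" from "long-term", the averaging of per-year rates, and the toy predictions are all hypothetical.

```python
from collections import defaultdict


def detection_rate_by_offset(samples, train_cutoff_year):
    """Detection rate on malicious files, grouped by years after the cutoff.

    samples: iterable of (year, is_malicious, predicted_malicious) tuples.
    Files from the training period and benign files are skipped.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, is_malicious, predicted in samples:
        if not is_malicious or year <= train_cutoff_year:
            continue
        offset = year - train_cutoff_year
        totals[offset] += 1
        hits[offset] += int(predicted)
    return {off: hits[off] / totals[off] for off in sorted(totals)}


def proactivity(rates, horizon=1):
    """Assumed definition: mean detection rate within `horizon` years."""
    near = [r for off, r in rates.items() if off <= horizon]
    return sum(near) / len(near) if near else None


def endurance(rates, horizon=1):
    """Assumed definition: mean detection rate beyond `horizon` years."""
    far = [r for off, r in rates.items() if off > horizon]
    return sum(far) / len(far) if far else None


# Toy data (hypothetical): a model trained on files up to 2015.
samples = [
    (2015, True, True),    # training period, ignored
    (2016, False, False),  # benign, ignored
    (2016, True, True), (2016, True, True),
    (2016, True, True), (2016, True, False),
    (2018, True, True), (2018, True, False),
    (2020, True, True), (2020, True, False),
    (2020, True, False), (2020, True, False),
]

rates = detection_rate_by_offset(samples, train_cutoff_year=2015)
print(rates)
print("proactivity:", proactivity(rates))
print("endurance:", endurance(rates))
```

In this toy setup the detection rate decays as the test files move further from the training cutoff. This is the kind of behavior the two notions are meant to separate, although the paper's own metrics may be defined differently.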
