Graph embedding as a new approach for unknown malware detection
Tóm tắt
Malware is any type of computer program which is developed to harm computers, networks, and information. Noticeable growth of malware development has made computer and network security a significant and challenging area in recent years. There is an intensive competition between malwares and antiviruses. Malware authors make every effort to develop new harmful codes using various programming tricks and exploits which are unseen for detection techniques. On the other hand, antivirus developers upgrade their methods and algorithms to recognize unknown malware. Therefore, an accurate and rapid detection method is an irrefutable demand in computer security area. This paper proposes a new malware detection method based on the OpCodes within an executable file. Proposed method generates a graph of operational codes (OpCode) within an executable file and then embeds this graph into eigenspace using “Power Iteration” method. This will help us represent an executable file as a linear combination of eigenvectors proportionate to their eigenvalues, which is beneficial to train machine learning classifiers such as k-nearest neighbor (KNN) and support vector machine (SVM). The main advantages of our proposed method are high detection rate despite utilizing simple classifiers like KNN, acceptable computational complexity even in large scale datasets against rival methods, and low false positive rate.
Tài liệu tham khảo
Elhadi, A.A.E., Maarof, M.A., Osman, A.H.: Information Assurance and Security Research Group, Faculty of Computer Science and Infor.: Malware detection based on hybrid signature behaviour application programming interface call graph. Am. J. Appl. Sci. 9(3), 283–288 (2012)
Bazrafshan, Z., Hashemi, H., Mehdi, S., Fard, H., Hamzeh, A., Fard, S.M.H., Hamzeh, A.: A survey on heuristic malware detection techniques. Inf. Knowl. Technol. (IKT) 2, 113–120 (2013)
Santos, I., Brezo, F., Ugarte-Pedrero, X., Bringas, P.G.P.: Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf. Sci. (Ny) 231, 64–82 (2013)
Vinod, P., Jaipur, R., Laxmi, V., Gaur, M.: Survey on malware detection methods. In: Proceedings of the 3rd Hackers’ Workshop on computer and internet security (IITKHACK’09), pp. 74–79 (2009)
Bilar, D.: Opcodes as predictor for malware. Int. J. Electron. Secur. Digit. 1(2), 156–168 (2007)
Uppal, D., Sinha, R., Mehra, V., Jain, V.: Malware detection and classification based on extraction of API sequences. In: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2337–342. IEEE (2014)
Uppal, D., Sinha, R.: Exploring behavioral aspects of API calls for malware identification and categorization. In: Networks (CICN), vol. 2014 (2014)
Sundarkumar, G., Ravi, V.: Malware detection via API calls, topic models and machine learning. In: (CASE), 2015 IEEE (2015)
Fan, C.I., Hsiao, H.W., Chou, C.H., Tseng, Y.F.: Malware detection systems based on API log data mining. In: 2015 IEEE 39th Annual Computer Software and Application Conference (COMPSAC), pp. 255–60. IEEE (2015)
Alam, S., Traore, I., Sogukpinar, I.: Annotated control flow graph for metamorphic malware detection. Comput. J. (2014)
Cesare S., Xiang Y., Zhou W.: Control flow-based malware variant detection. In: IEEE Transactions on Dependable and Secure Computing, pp. 307–317. IEEE (2014)
Wüchner, T., Ochoa, M., Pretschner, A.: Malware detection with quantitative data flow graphs. In: Proc. 9th ACM Symp. Information, Comput. Commun. Secur. - ASIA CCS ’14, pp. 271–282 (2014)
Wüchner, T., Ochoa, M., Pretschner, A.: Robust and effective malware detection through quantitative data flow graph metrics. (2015). arXiv:1502.01609
Abou-assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)
Canfora, G., Lorenzo, A.: Effectiveness of opcode ngrams for detection of multi family android malware. In: 10th International Conference on Availability, Reliability and Security (ARES), pp. 333–340. IEEE (2015)
Santos, I., Sanz, B., Laorden, C., Brezo, F., Bringas, P.G.: Opcode-sequence-based semi-supervised unknown malware detection. Comput. Intell. Secur. Inf. Syst., pp. 50–57 (2011)
Santos, I., Brezo, F., Nieves, J., Penya, Y.Y.K., Sanz, B., Laorden, C., Bringas, P.G.: Idea: Opcode-sequence-based malware detection. Eng. Secur. Softw. Syst., pp. 35–43 (2010)
Santos, I., Brezo, F., Sanz, B., Laorden, C., Bringas, P.P.G.: Using opcode sequences in single-class learning to detect unknown malware. IET Inf. Secur. 5(4), 220 (2011)
Santos, I., Laorden, C., Bringas, P.G.P.: Collective classification for unknown malware detection. SECRYPT, pp. 251–256 (2011)
Anderson, B., Quist, D., Neil, J., Storlie, C., Lane, T.: Graph-based malware detection using dynamic analysis. J. Comput. Virol. 7(4), 247–258 (2011)
Jalote, P., Jalote, P., Jalote, P.: An integrated approach to software engineering. Springer, NewYork (2005)
Mccabe, T.J.: A complexity measure. Softw. Eng. IEEE Trans. 4, 308–320 (1976)
Wilhelm, R., Engblom, J., Ermedahl, A.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., vol. V, pp. 1–47 (2008)
Allen, F.E.: Control flow analysis. ACM SIGPLAN Not. 5(7), 1–19 (1970)
Zhao, Z.: A virus detection scheme based on features of Control Flow Graph. In: 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), pp. 943–947 (2011)
Mitchell, T.M.: Machine learning and data mining over the past. vol. 42, no. 11 (1999)
Breiman, L.: Bagging predictors. Mach. Learn. 140, 123–140 (1996)
Breu, F., Guggenbichler, S., Wollmann, J.: Random forests. Vasa, pp. 1–35 (2008)
Tesauro, G.J., Kephart, J.O., Sorkin, G.B.: Neural networks for computer virus recognition. IEEE Expert 11(4), 5–6 (1996)
Arnold, W., Tesauro, G., Heights, Y.: Automatically generated Win32 heuristic virus detection. In: Proc. 2000 Int. virus Bull. Conf., no. September, pp. 51–60 (2000)
Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. COMPSAC 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)
Ye, Y., Li, T., Huang, K., Jiang, Q., Chen, Y.: Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list. J. Intell. Inf. 35(1), 1–20 (2010)
Peng, H., Long, F., Ding C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., Elovici, Y.: Detecting unknown malicious code by applying classification techniques on OpCode patterns. Secur. Inf. 1(1), 1 (2012)
Runwal, N., Low, R.M., Stamp, M.: Opcode graph similarity and metamorphic detection. J. Comput. Virol. 8(1–2), 37–52 (2012)
Shanmugam, G., Low, R.M., Stamp M.: Simple substitution distance and metamorphic detection. J. Comput. Virol. Hack. Tech. 9(3), 159–170 (2013)
Toderici, A.H., Stamp, M.: Chi-squared distance and metamorphic virus detection. J. Comput. Virol. Hacking Tech. 9(1), 1–14 (2012)
Filiol, E., Josse, S.: A statistical model for undecidable viral detection. J. Comput. Virol. 3(2), 65–74 (2007)
Dhavare, A., Low, R.M., Stamp, M.: Efficient cryptanalysis of homophonic substitution ciphers. Cryptologia 37(3), 250–281 (2013)
Jidigam, R.K., Austin, T.H., Stamp, M.: Singular value decomposition and metamorphic detection. J. Comput. Virol. Hack. Tech. 11(4), 203–216 (2015)
Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. Wiley (2001)
Riesen, K., Bunke, H.: Graph classification and clustering based on vector space embedding. World Scientific, Singapore (2010)
Kandel, A., Bunke, H., Last, M.: Applied Graph Theory in Computer Vision and Pattern Recognition. Brain Cogn. 52, 262 (2007)
Chung, F.R.K.: Spectral Graph Theory, vol. 30. AMS Bookstore (1999)
Hancock, E.R.: Structural graph matching using the EM algorithm and singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1120–1136 (2001)
Wilson, R.C., Hancock, E.R.: Levenshtein distance for graph spectral features. In: ICPR (2), no. C, pp. 489–492 (2004)
Robles-Kelly, A., Hancock, E.R.: A Riemannian approach to graph embedding. Pattern Recognit. 40(3), 1042–1056 (2007)
Umeyama, S.: An eigendecomposition approach to weighted graph matching problems. Pattern Anal. Mach. Intell. IEEE 10(5), 695–703 (1988)
Luo, B., Wilson, R.C., Hancock, E.R., Wilson, R.C.: Spectral embedding of graphs. Pattern Recognit. 36(10), 2213–2230 (2003)
Lin, F., Cohen, W.W.: Power iteration clustering. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 655–662 (2010)
Harrington, P.: Machine Learning in Action, vol. 37, no. 3 (2012)
Eskandari, M., Khorshidpour, Z., Hashemi, S.: HDM-analyser: a hybrid analysis approach based on data mining techniques for malware detection. J. Comput. Virol. Hacking Tech. 9(2), 77–93 (2013)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy esti-mation and model selection. In: Proceedings of the 1995 International Joint Conference on Artificial Intelligence, vol. 14, no. 2, pp. 1137–1145 (1995)