Applications of deep learning for mobile malware detection: A systematic literature review

Neural Computing and Applications - Volume 34 - Pages 1007-1032 - 2021
Cagatay Catal¹, Görkem Giray², Bedir Tekinerdogan³
¹Department of Computer Science and Engineering, Qatar University, Doha, Qatar
²İzmir, Turkey
³Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands

Abstract

Researchers have proposed new techniques to detect and counter different types of malware, and deep learning algorithms play an important role among them. Although many studies have developed deep learning-based approaches for mobile malware detection, this body of work has not yet been reviewed in detail. This paper aims to identify, evaluate, and synthesize the published articles that apply deep learning techniques to mobile malware detection. We conducted a Systematic Literature Review and selected 40 journal articles for in-depth analysis. The review presents and categorizes these articles by machine learning type, data source, deep learning algorithm, evaluation parameters and approaches, feature selection technique, dataset, and deep learning implementation platform. It also highlights the challenges, proposed solutions, and future research directions for using deep learning in mobile malware detection. The results show that Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN) are the most widely used deep learning algorithms. API calls, permissions, and system calls are the most dominant features. Keras and TensorFlow are the most popular platforms. Drebin and VirusShare are the most widely used datasets. Supervised learning and static features are the most preferred machine learning type and data source, respectively.
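
The review finds that supervised learning over static features, most often API calls and permissions, is the dominant setup. As a minimal sketch, assuming a hypothetical `kind:name` feature-string format and a tiny hand-picked vocabulary (neither is taken from the reviewed studies), the snippet below shows one common way such features are turned into the fixed-length binary vectors that a supervised DNN takes as input. The names `VOCABULARY`, `build_index`, and `encode` are illustrative only and are not part of any library.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical feature vocabulary. The "kind:name" string format is an
# assumption for illustration; real extractors build much larger vocabularies
# from the app manifest (permissions) and the bytecode (API calls).
VOCABULARY: List[str] = [
    "permission:INTERNET",
    "permission:READ_CONTACTS",
    "permission:SEND_SMS",
    "api:SmsManager.sendTextMessage",
    "api:TelephonyManager.getDeviceId",
]


def build_index(vocabulary: Sequence[str]) -> Dict[str, int]:
    """Assign each feature string a fixed column in the output vector."""
    return {feature: column for column, feature in enumerate(vocabulary)}


def encode(app_features: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Return a binary vector with 1 where the app exhibits a known feature.

    Features missing from the vocabulary are ignored, so every app maps to a
    vector of the same length, which is what a DNN's input layer expects.
    """
    vector = [0] * len(index)
    for feature in app_features:
        column = index.get(feature)
        if column is not None:
            vector[column] = 1
    return vector


if __name__ == "__main__":
    index = build_index(VOCABULARY)
    # Hypothetical static features extracted from one app; the last one is
    # not in the vocabulary and is therefore dropped.
    sample_app = {
        "permission:INTERNET",
        "permission:SEND_SMS",
        "api:SmsManager.sendTextMessage",
        "api:Unlisted.call",
    }
    print(encode(sample_app, index))  # prints [1, 0, 1, 1, 0]
```

In a real pipeline these vectors, paired with labels (malware samples from collections such as Drebin or VirusShare, plus benign apps), would be fed to a model built with a framework like Keras or TensorFlow. A binary bag-of-features encoding is used here because it gives every app the same input width regardless of how many features it exhibits.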

Keywords

#deep learning #malware detection #systematic literature review #machine learning algorithms #features
