Discovering optimal features using static analysis and a genetic search based method for Android malware detection

Ahmad Firdaus¹, Nor Badrul Anuar¹, Ahmad Karim², Mohd Faizal Ab Razak³
¹Department of Computer System and Technology, University of Malaya, Kuala Lumpur, 50603, Malaysia
²Department of Information Technology, Bahauddin Zakariya University, Multan, Pakistan
³Faculty of Computer System & Software Engineering, University Malaysia Pahang, Gambang, 26300, Malaysia

Abstract

Keywords

