A survey of cloud-based network intrusion detection analysis

Nathan Keegan¹, Soo-Yeon Ji², Aastha Chaudhary¹, Claude Concolato¹, Byunggu Yu¹, Dong Hyun Jeong¹
¹ Department of Computer Science and Information Technology, University of the District of Columbia, 4200 Connecticut Avenue NW, Washington, DC 20008, USA
² Department of Computer Science, Bowie State University, 14000 Jericho Park Road, Bowie, MD 20715, USA

Abstract

As network traffic grows and attacks become more prevalent and complex, we must find new ways to enhance intrusion detection systems (IDSes). Recently, researchers have begun to harness both machine learning and cloud computing to better identify threats and reduce computation time. This paper explores current research at the intersection of these two fields by examining cloud-based network intrusion detection approaches that use machine learning algorithms (MLAs). Specifically, we consider clustering and classification MLAs, their applicability to modern intrusion detection, and feature selection algorithms, highlighting prominent implementations from recent research. We offer a current overview of this growing body of research, covering successes, challenges, and future directions for MLA usage in cloud-based network intrusion detection.
