Active learning approach to label network traffic datasets

Journal of Information Security and Applications - Volume 49 - Page 102388 - 2019
Jorge L. Guerra Torres1, Carlos A. Catania2, Eduardo Veas3
1Institute for Information Technology and Communications, National University of Cuyo, Mendoza, Argentina
2LABSIN, School of Engineering, National University of Cuyo, Mendoza, Argentina
3Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria

References

Catania, 2012, Automatic network intrusion detection: current techniques and open issues, Comput Electr Eng, 7, 1063
Bhuyan, 2015, Towards generating real-life datasets for network intrusion detection, Int J Netw Secur, 17, 683
Sommer, 2010, Outside the closed world: on using machine learning for network intrusion detection, 305
Sebastian G. Stratosphere research laboratory. 2015. https://stratosphereips.org/, [Online; accessed Jun-2018].
University of California I. Knowledge discovery in databases DARPA archive. 1999. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html/ [Online; accessed September-2016].
DEFCON Hacking Conference - capture the flag archive. 2011. https://www.defcon.org/html/links/dc-ctf.html, [Online; accessed April-2018].
Center for applied internet data analysis. 1997. University of California, San Diego, http://www.caida.org/ [Online; accessed April-2019].
Mukkavilli S.K., Shetty S., Hong L. Generation of Labelled Datasets to Quantify the Impact of Security Threats to Cloud Data Centers 2016; (April): 172–184. http://www.scirp.org/journal/PaperInformation.aspx?paperID=65482. doi:10.4236/jis.2016.73013.
Görnitz N., Kloft M., Rieck K., Brefeld U. Active learning for network intrusion detection 2009. doi:10.1145/1654988.1655002.
Aparicio-Navarro, 2014, Automatic dataset labelling and feature selection for intrusion detection systems, Proceedings of the IEEE military communications conference MILCOM, 46
Beaugnon, 2017, ILAB: an interactive labelling strategy for intrusion detection, 120, 10.1007/978-3-319-66332-6_6
Soule, 2008, Webclass: adding rigor to manual labeling of traffic anomalies, Comput Commun Rev, 38, 35, 10.1145/1341431.1341437
Pius Owoh, 2018, Automatic annotation of unlabeled data from smartphone-based motion and location sensors, Sensors (Switzerland), 18, 10.3390/s18072134
Lemay, 2016, Providing SCADA network data sets for intrusion detection research
Sperotto, 2009, A labeled data set for flow-based intrusion detection, 5843 LNCS, 39
Pelleg, 2004, Active learning for anomaly and rare-category detection, Adv Neural Inf Process Syst, 18, 1073
Guerra, 2017, Visual exploration of network hostile behavior, 51
Shneiderman, 2003, The eyes have it: A Task by data type taxonomy for information visualizations, Craft Inf Vis, 364
Kodinariya, 2013, Review on determining number of cluster in K-Means clustering, Int J Adv Res Comput Sci Manag Stud, 1, 90
Malware capture facility project. 2013. Czech Technical University, https://mcfp.weebly.com/ [Online; accessed May-2019].
Lewis, 1994, A sequential algorithm for training text classifiers, 3
Staheli, 2014, Visualization evaluation for cyber security, 49
Garcia, 2014
The CTU-13 dataset. 2011. Stratosphere Project, https://www.stratosphereips.org/datasets-ctu13/ [Online; accessed Jun-2018].
The CTU-19 dataset, botnet kelihos tdptu02.exe. 2013a. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-3/ [Online; accessed Jun-2018].
The CTU-19 Dataset, Botnet 39UvZmv.exe. 2013b. Stratosphere Project, https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-1/ [Online; accessed Jun-2018].
The CTU-19 Dataset, Normal Datasets. 2013c. Stratosphere Project, https://www.stratosphereips.org/datasets-normal/ [Online; accessed Jun-2018].
Sáez, 2016, Evaluating the classifier behavior with noisy data considering performance and robustness: the equalized loss of accuracy measure, Neurocomputing, 176, 26, 10.1016/j.neucom.2014.11.086
Ruiz-Gazen, 2007, Storms prediction: Logistic regression vs random forest for unbalanced data, Case Stud Bus Ind Gov Stat, 1, 91
Liu, 2013, Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: application to the recognition of orange beverage and chinese vinegar, Sens Actuators B Chem, 177, 970, 10.1016/j.snb.2012.11.071
Breiman, 2001, Random forests, Mach Learn, 45, 5, 10.1023/A:1010933404324
Kuncheva, 2014, 10.1002/9781118914564
Avazpour I., Pitakrat T., Grunske L., Grundy J. Recommendation systems in software engineering 2014. doi:10.1007/978-3-642-45135-5.
Collins, 2002, Logistic regression, AdaBoost and Bregman distances, Mach Learn, 48, 253, 10.1023/A:1013912006537