Faulty use of the CIC-IDS 2017 dataset in information security research

Rohit Dube¹
¹Networking and Advanced Security Business Group, Palo Alto, USA

Abstract

The summarized traffic-flow version of the Canadian Institute for Cybersecurity Intrusion Detection Evaluation dataset (CIC-IDS 2017), created at the University of New Brunswick in 2017, is popular in the information security data science research community. Researchers typically use the summarized data to develop supervised machine learning models and to test the classification performance of these models. In this paper, we examine whether the summarized data are adequate for high-performance classification. We show that machine learning models developed over the summarized data are unlikely to have practical value. Finally, we postulate that researchers may be more likely to create a useful system if they use raw (non-summarized) data.
