How Robust Can a Machine Learning Approach Be for Classifying Encrypted VoIP?

Journal of Network and Systems Management - Volume 23 - Pages 830-869 - 2014
Riyad Alshammari1, A. Nur Zincir-Heywood2
1College of Public Health and Health Informatics, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Kingdom of Saudi Arabia
2Faculty of Computer Science, Dalhousie University, Halifax, Canada

Abstract

The classification of encrypted network traffic is an important issue for network management and security tasks, including quality of service and firewall enforcement. Traffic classification becomes more challenging because traditional techniques, such as port numbers or Deep Packet Inspection, are ineffective against Peer-to-Peer Voice over Internet Protocol (VoIP) applications, which use non-standard ports and encryption. Moreover, traffic classification is a particularly challenging application domain for machine learning (ML). Solutions should ideally be both simple, and therefore efficient to deploy, and accurate. Recent advances in ML provide the opportunity to decompose the original problem into a subset of classifiers with non-overlapping behaviors, in effect providing further insight into the problem domain and increasing the throughput of solutions. In this work, we investigate the robustness of an ML approach for classifying encrypted traffic, not only across different network traffic but also against evasion attacks. Our ML-based approach employs only statistical network traffic flow features, without using Internet Protocol addresses, source/destination ports, or payload information, to unveil encrypted VoIP applications in network traffic. By robust signatures we mean that the signatures learned by training on one network remain valid when they are applied to traffic coming from entirely different locations, networks, and time periods, and also hold up against evasion attacks. The results on different network traces, as well as on the evasion of a Skype classifier, demonstrate that the performance of the signatures is very promising, which implies that statistical information based on the network layer, combined with ML, can achieve high classification accuracy and produce robust signatures.
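To make the idea of "statistical network traffic flow features" concrete, the following is a minimal Python sketch of how per-flow statistics might be computed from packet sizes and timestamps alone. The function name `flow_features`, the packet tuple layout, and the particular statistics chosen (counts, byte totals, and min/mean/max/standard deviation of packet length and inter-arrival time per direction) are illustrative assumptions, not the feature set used by the authors. Note that no IP addresses, ports, or payload bytes appear in the input.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Compute simple per-flow statistics (hypothetical feature set).

    packets: list of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is "fwd" or "bwd". Addresses, ports, and payload
    are deliberately absent from the input.
    """
    features = {}
    for direction in ("fwd", "bwd"):
        sizes = [size for _, size, d in packets if d == direction]
        times = sorted(t for t, _, d in packets if d == direction)
        # Inter-arrival times between consecutive packets in one direction.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]

        features[f"{direction}_pkts"] = len(sizes)
        features[f"{direction}_bytes"] = sum(sizes)
        for name, values in (("len", sizes), ("iat", gaps)):
            # Fall back to 0.0 when a direction has too few packets.
            features[f"{direction}_{name}_min"] = min(values) if values else 0.0
            features[f"{direction}_{name}_mean"] = mean(values) if values else 0.0
            features[f"{direction}_{name}_max"] = max(values) if values else 0.0
            features[f"{direction}_{name}_std"] = pstdev(values) if values else 0.0

    timestamps = [t for t, _, _ in packets]
    features["duration"] = max(timestamps) - min(timestamps) if timestamps else 0.0
    return features


if __name__ == "__main__":
    # Made-up packets for illustration only.
    sample = [(0.0, 100, "fwd"), (0.5, 200, "bwd"), (1.0, 300, "fwd"), (2.0, 100, "bwd")]
    for key, value in flow_features(sample).items():
        print(key, value)
```

A feature vector of this kind could then be passed to any supervised classifier. Because it carries no addressing or payload information, the same representation can in principle be computed on traffic from a different network than the one used for training.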
