How Robust Can a Machine Learning Approach Be for Classifying Encrypted VoIP?

Journal of Network and Systems Management - Volume 23 - Pages 830-869 - 2014

Riyad Alshammari¹, A. Nur Zincir-Heywood²

¹College of Public Health and Health Informatics, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Kingdom of Saudi Arabia

²Faculty of Computer Science, Dalhousie University, Halifax, Canada

Abstract

The classification of encrypted network traffic represents an important issue for network management and security tasks including quality of service, firewall enforcement, and security. Traffic classification becomes more challenging since the traditional techniques, such as port numbers or Deep Packet Inspection, are ineffective against Peer-to-Peer Voice over Internet Protocol (VoIP) applications, which use non-standard ports and encryption. Moreover, traffic classification also represents a particularly challenging application domain for machine learning (ML). Solutions should ideally be both simple, and therefore efficient to deploy, and accurate. Recent advances in ML provide the opportunity to decompose the original problem into a subset of classifiers with non-overlapping behaviors, in effect providing further insight into the problem domain and increasing the throughput of solutions. In this work, we investigate the robustness of an ML approach to classifying encrypted traffic, not only on different network traffic but also against evasion attacks. Our ML-based approach employs only statistical network traffic flow features, without using Internet Protocol addresses, source/destination ports, or payload information, to unveil encrypted VoIP applications in network traffic. By robust signatures we mean that the signatures learned by training on one network remain valid when they are applied to traffic coming from totally different locations, networks, and time periods, and also against evasion attacks. The results on different network traces, as well as on the evasion of a Skype classifier, demonstrate that the performance of the signatures is very promising, which implies that statistical information based on the network layer, used with ML, can achieve high classification accuracy and produce robust signatures.
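To make the idea of "statistical network traffic flow features" concrete, the following is a minimal sketch of how one flow might be summarised from packet sizes, timing, and direction alone. It is an illustration under stated assumptions, not the feature set or tooling used in the paper: the `Packet` type, the `flow_features` function, and the particular statistics chosen are all hypothetical. Note that no IP address, port number, or payload byte is ever read.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:
    """Hypothetical packet record: no addresses, ports, or payload."""
    timestamp: float  # arrival time in seconds
    length: int       # packet size in bytes
    forward: bool     # True if sent by the side that opened the flow


def flow_features(packets):
    """Summarise one flow using only sizes, timing, and direction."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    lengths = [p.length for p in pkts]
    # Inter-arrival times between consecutive packets of the flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    forward_count = sum(1 for p in pkts if p.forward)
    return {
        "duration": pkts[-1].timestamp - pkts[0].timestamp,
        "total_packets": len(pkts),
        "forward_packets": forward_count,
        "backward_packets": len(pkts) - forward_count,
        "mean_length": mean(lengths),
        "std_length": pstdev(lengths),
        "mean_iat": mean(gaps) if gaps else 0.0,
        "std_iat": pstdev(gaps) if gaps else 0.0,
    }


if __name__ == "__main__":
    demo = [
        Packet(0.0, 100, True),
        Packet(0.5, 200, False),
        Packet(1.0, 300, True),
    ]
    print(flow_features(demo))
```

A vector of this kind, computed per flow, is what a classifier would be trained on. Because it carries no endpoint identifiers, the same representation can in principle be computed for traffic from a different network or time period.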
