Deciphering malware’s use of TLS (without decryption)

Springer Science and Business Media LLC - Volume 14 - Pages 195-211 - 2017
Blake Anderson, Subharthi Paul, David McGrew
Cisco Systems, Inc., San Jose, USA

Abstract

The use of TLS by malware poses new challenges to network threat detection because traditional pattern-matching techniques can no longer be applied to its messages. However, TLS also introduces a complex set of observable data features that allow many inferences to be made about both the client and the server. We show that these features can be used to detect and understand malware communication while preserving the privacy of benign uses of encryption. These data features also allow for accurate malware family attribution of network communication, even when restricted to a single encrypted flow. To demonstrate this, we performed a detailed study of how TLS is used by malware and enterprise applications. We provide a general analysis of millions of TLS-encrypted flows, and a targeted study of 18 malware families composed of thousands of unique malware samples and tens of thousands of malicious TLS flows. Importantly, we identify and account for the bias introduced by the use of a malware sandbox. We show that the performance of a malware classifier is correlated with a malware family’s use of TLS, i.e., malware families that actively evolve their use of cryptography are more difficult to classify. We conclude that malware’s usage of TLS is distinct in an enterprise setting, and that these differences can be effectively used in rules and machine learning classifiers.
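To make the idea of "observable data features" concrete, the following is a minimal sketch of how unencrypted handshake fields might be turned into a feature vector for a rule or classifier. The abstract does not list the features used, so the choice of fields here (offered cipher suites and advertised extensions from the ClientHello) is an assumption, and the `ClientHello` container, the two small vocabularies, and `featurize` are hypothetical names introduced only for illustration. The numeric code points are the standard IANA-registered values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClientHello:
    """Hypothetical container for fields visible in an unencrypted ClientHello."""
    cipher_suites: Sequence[int]  # offered cipher suite code points
    extensions: Sequence[int]     # advertised extension type code points


# Illustrative vocabularies (an assumption); a real study would enumerate
# every value observed in the data set.
CIPHER_VOCAB = [
    0x0004,  # TLS_RSA_WITH_RC4_128_MD5
    0x0005,  # TLS_RSA_WITH_RC4_128_SHA
    0x002F,  # TLS_RSA_WITH_AES_128_CBC_SHA
    0x0035,  # TLS_RSA_WITH_AES_256_CBC_SHA
    0xC02B,  # TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    0xC02F,  # TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
]
EXTENSION_VOCAB = [
    0x0000,  # server_name
    0x000A,  # supported_groups
    0x000B,  # ec_point_formats
    0x000D,  # signature_algorithms
    0x0017,  # extended_master_secret
]


def featurize(hello: ClientHello) -> List[int]:
    """Return a binary vector: 1 if the client offered the value, else 0."""
    offered = set(hello.cipher_suites)
    advertised = set(hello.extensions)
    cipher_bits = [int(code in offered) for code in CIPHER_VOCAB]
    extension_bits = [int(code in advertised) for code in EXTENSION_VOCAB]
    return cipher_bits + extension_bits


# Example: a client offering two older RSA suites and only two extensions.
example = ClientHello(cipher_suites=[0x0005, 0x002F], extensions=[0x0000, 0x000D])
vector = featurize(example)
```

A vector like this could serve as input to a rule or to a classifier such as logistic regression. Because it is built only from handshake metadata, no application payload has to be decrypted.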
