Recognition of Tor malware and onion services

Jesper Bergman¹, Oliver B. Popov¹
¹Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden

Abstract

The transformation of contemporary societies through digital technologies has had a profound effect on all human activities, including illegal, unlawful, and criminal ones. Moreover, the affordances of anonymity-creating techniques such as the Tor protocol, which are beneficial for preserving civil liberties, appear to be highly profitable for various types of miscreants whose crimes range from human trafficking, arms trading, and child pornography to selling controlled substances and racketeering. Tor and similar technologies are the foundation of a vast, often mysterious, sometimes anecdotal, and occasionally dangerous space termed the Dark Web. Because it draws on the features that make the Internet a uniquely generative knowledge agglomeration, with no borders and permeating different jurisdictions, the Dark Web is a source of perpetual challenges for both national and international law enforcement agencies. The anonymity granted to the wrong people increases the complexity and the cost of identifying both the crimes and the criminals, a problem often exacerbated by a lack of proper human resources. Technologies such as machine learning and artificial intelligence come to the rescue through automation, intensive data harvesting, and analysis built into various types of web crawlers that explore and identify dark markets and the people behind them. Effective and efficient crawling requires a pool of dark sites, or onion URLs. This study presents a way to build a crawling mechanism by extracting onion URLs from malicious executables: the executables are run in a sandbox environment, and the resulting log file is then analysed using machine learning algorithms. By discerning between malware that uses the Tor network and malware that does not, we were able to classify Tor-using malware with an accuracy of 91% using a logistic regression algorithm.
The initial results suggest that this machine learning approach can be used to identify new malicious servers on the Tor network. Embedding this kind of mechanism into the crawler may also induce predictability, and thus efficiency, in recognising dark market activities and, consequently, in bringing about their closure.
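
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the sandbox log is available as JSON, and the field names, sample addresses, and the helper names `iter_strings` and `extract_onions` are hypothetical. The pattern assumes that onion hostnames consist of 16 (version 2) or 56 (version 3) base32 characters followed by `.onion`.

```python
import json
import re

# Assumption: v2 onion hostnames carry 16 base32 characters, v3 hostnames 56.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")

# Hypothetical sandbox log; the field names and addresses are made up.
SAMPLE_LOG = json.dumps({
    "network": {
        "dns": [{"request": "abcdefghijklmnop.onion"}],
        "http": [{"uri": "http://" + "b2" * 28 + ".onion/gate.php"}],
    },
    "strings": ["example.com", "notbase32-0189.onion"],
})


def iter_strings(node):
    """Yield every string (keys included) inside a parsed JSON value."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def extract_onions(log_json):
    """Return the sorted, de-duplicated onion hostnames found in a JSON log."""
    found = set()
    for text in iter_strings(json.loads(log_json)):
        # Hostnames are case-insensitive, so normalise before matching.
        found.update(ONION_RE.findall(text.lower()))
    return sorted(found)


if __name__ == "__main__":
    for host in extract_onions(SAMPLE_LOG):
        print(host)
```

Hostnames collected this way could seed the crawler's pool of onion URLs. Whether a log yields any hostname at all is one possible input to a Tor versus non-Tor classifier, although the abstract does not state which features the logistic regression model used.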
