Malware phylogeny generation using permutations of code

Springer Science and Business Media LLC - Volume 1 - Pages 13-23 - 2005
Md. Enamul Karim1, Andrew Walenstein1, Arun Lakhotia1, Laxmi Parida2
1Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, USA
2IBM T. J. Watson Research Center, Yorktown Heights, USA

Abstract

Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to be helpful for analyzing new malware and for establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not preserve the order of the code. We describe methods for constructing phylogeny models that use features called n-perms to match possibly permuted code. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens and those of unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens and assist in reconciling malware naming inconsistencies.
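
The contrast between n-grams and n-perms can be illustrated with a minimal sketch. It assumes that an n-perm is the multiset of tokens in a window of n consecutive tokens with their order ignored (represented here as a sorted tuple), and it uses cosine similarity over feature counts as one possible vector similarity measure. The function names and the toy opcode sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Count every window of n consecutive tokens, order preserved."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def nperms(tokens, n):
    """Count every window of n consecutive tokens, order inside the window ignored.

    Assumption: sorting the window is used as the canonical form of an n-perm.
    """
    return Counter(tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical opcode sequences: the variant swaps the first two instructions.
original = ["push", "mov", "call", "pop", "ret"]
permuted = ["mov", "push", "call", "pop", "ret"]

sim_ngram = cosine(ngrams(original, 2), ngrams(permuted, 2))
sim_nperm = cosine(nperms(original, 2), nperms(permuted, 2))
```

On this toy input the swapped pair still matches as a 2-perm but not as a 2-gram, so `sim_nperm` comes out higher than `sim_ngram`. That is the kind of separation the abstract describes, although real specimens and the measures used in the study may behave differently.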
