Malware phylogeny generation using permutations of code
Abstract
Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to be helpful for analyzing new malware and for establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not preserve the order of code. We describe methods for constructing phylogeny models that use features called n-perms to match possibly permuted code. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens and those of unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens and assist in reconciling malware naming inconsistencies.
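The difference between the two feature types can be illustrated with a minimal sketch. It assumes that an n-perm can be modeled as an n-gram whose internal order is ignored (represented here as a sorted tuple) and that similarity is the cosine of the feature-count vectors; the opcode sequences and the helper names `ngrams`, `nperms`, and `cosine` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Count every window of n consecutive tokens, order preserved."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def nperms(tokens, n):
    """Count every window of n consecutive tokens, ignoring order inside
    the window (assumption: an n-perm is modeled as a sorted n-gram)."""
    return Counter(tuple(sorted(tokens[i:i + n]))
                   for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * \
        sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical opcode sequences: the second swaps adjacent instructions.
original = ["push", "mov", "call", "add", "pop", "ret"]
permuted = ["mov", "push", "call", "pop", "add", "ret"]

gram_score = cosine(ngrams(original, 3), ngrams(permuted, 3))
perm_score = cosine(nperms(original, 3), nperms(permuted, 3))
print(gram_score, perm_score)
```

In this toy case the two sequences share no 3-grams, so the n-gram score is zero, while three of the four 3-perms still match. This mirrors, on a very small scale, the kind of separation the experiment measures.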