Growing up, I read every book my library had to offer by Jules Verne and Isaac Asimov. These and many other like-minded authors inspired me to think far beyond "what is" into "what might be". I was so excited about the ideas proposed by some of these early science fiction writers that I seriously considered post-graduate study in Artificial Intelligence (A.I.) before ultimately accepting a job as a Pentester. Yeah, I sold out early - so what. Still, those initial seeds have stuck with me throughout my life and have continued to influence my eventual career in computing and security. Imagination and idea creation are still invaluable tools, and I credit them directly to my world being expanded while reading under the covers with a flashlight as a boy.
"That's right. When I was your age, television was called books."
--The Princess Bride (1987)
Now, ████ years later as a Security Researcher with Trustwave SpiderLabs, I get to spend a percentage of my time working on pet[1] research projects. I am in the fortunate position of being able to come full circle and seriously explore some of those exciting buds of ideas I once only daydreamed about while flipping through the pages of whichever Philip K. Dick novel I had just discovered.
The purpose of this post is threefold. First, I want to publicly announce the project I am about to work on, because I am excited about it and because doing so is the first step toward being accountable for its progress. Second, I freely admit I am no expert in this advanced field, but I hope to document my progress here on this blog, both the happy successes AND the inevitable failures. Lastly, I hope that posting my progress here will start a discussion with others who are also interested in this topic, so that we can share ideas and brainstorm new approaches in a way that benefits the whole community.
So that all sounds pretty good, right? But what am I talking about? What is the project actually about already?! OK, you've read this far, so let me explain... no, there is too much, let me summarize here as best I can.
High Level Goals:
- To categorize, define, and otherwise classify potential malware based on the data and meta-data collected from existing automated forms of dynamic and static analysis tools.
- To apply the results to the gigabytes of malware that we currently process daily, in a way that gives us clearer and deeper insight into the malware itself.
- To collaborate and foster in-depth technical discussions on the common problems that we share as a security community.
- Meaningfully parse the output from tooled sandbox and static analysis tool execution
- Create a categorization scheme from scratch, leverage an existing one, or use some hybrid of the two
- Study, develop and apply ML/NLP algorithms to categorize malware
- Review and refine processes and algorithms
- Rinse and Repeat
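To make the parse-then-categorize steps above a little more concrete, here is a minimal sketch in Python. It is an illustration only, not the project's actual design: the report format (a JSON object with an `api_calls` list), the category labels, and the tiny labeled examples are all made up for the example, and a simple nearest-neighbor comparison stands in for whatever ML/NLP technique ends up being used.

```python
import json
import math
from collections import Counter
from typing import Dict, List


def extract_features(report_json: str) -> Counter:
    """Turn one report into a bag of lowercase API-call tokens.

    Assumes a hypothetical report layout: {"api_calls": ["Name", ...]}.
    """
    report = json.loads(report_json)
    return Counter(call.lower() for call in report.get("api_calls", []))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(sample: Counter, labeled: Dict[str, List[Counter]]) -> str:
    """Return the label of the most similar labeled example (1-nearest-neighbor).

    Falls back to "unknown" when nothing overlaps with the sample at all.
    """
    best_label, best_score = "unknown", 0.0
    for label, examples in labeled.items():
        for example in examples:
            score = cosine(sample, example)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


# Made-up training data; real categories and features would come from
# an actual categorization scheme and real analysis reports.
LABELED = {
    "keylogger": [Counter({"setwindowshookexa": 1, "getasynckeystate": 5, "writefile": 2})],
    "downloader": [Counter({"urldownloadtofilea": 1, "winexec": 1, "writefile": 1})],
}

sample_report = '{"api_calls": ["GetAsyncKeyState", "GetAsyncKeyState", "WriteFile"]}'
print(classify(extract_features(sample_report), LABELED))
```

The "review and refine" step would then amount to swapping in better features (strings, imports, network behavior) and a better model, while keeping the same parse, featurize, categorize shape.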
Make sense? I want to use some form of AI/ML/NLP to help make sense of the crazy amounts of malware report output we produce every day. I know this is not a new idea by any stretch of my previously mentioned overactive imagination. I was lucky enough to attend REcon last year in Montreal, where several talks discussed similar goals, whether through automatically analyzing call graphs of disassembled binaries or through applying machine learning at an abstract level in the computer security world. I left the conference very excited, with my head swimming with ideas.
So I know some very smart people are already looking into these very sorts of issues. Clearly, my first task will be to tackle the large amount of reading and discussion it will take to catch up on the current state of work in this field. Maybe someone is already working on this EXACT sort of thing, and that could take me 95% of the way to achieving my goals (please, oh please!), or maybe there is an existing Open Source project out there that is established and well on its way?
Maybe there are some obvious tools and/or resources that I have not listed here. What are they? Leave a comment below or hit me up on Twitter. Whatever the answers are, I am looking forward to this upcoming research quite a bit and I hope a few of you might be interested as well.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'"
--Isaac Asimov
Anyone come across anything "that's funny" lately?
[1] Obvious caveat statement about the "pet project" work being relevant to our goals goes here.