Throughout Trustwave SpiderLabs’ many forensic investigations, we often stumble upon malicious samples that have been ‘packed’. This technique can be unfamiliar to the aspiring malware reverser or digital forensic investigator, so I thought it would be fun to use this opportunity to talk about portable executable (PE) packers at a high level. If you already know what PE packers are and how they work, you’re more than welcome to continue reading, but you may not learn anything new. Think of this as a 101 blog post.
So what are PE packers? How do they work? How can you defeat them? I’m going to do my best to answer these questions.
In essence, packers are tools that are used to compress a PE file. This primarily allows the person running the tool to reduce the size of the file. As an added benefit, since the file is compressed, it will also typically thwart many reverse engineers from analyzing the code statically (without running it). Packers were originally created for a legitimate purpose: decreasing the overall file size. Later on, malware authors realized the benefits previously mentioned and began to utilize them as well. Of course, just compressing a PE file won’t really work on its own. The compressed data is no longer a valid executable, so if you try to run it, it will fail to load, and you’ll end up having a horrible day.
In order to address this, packers will wrap this compressed data inside a working PE file structure. When run, the packed file will decompress the legitimate PE file data into memory, and execute it.
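To make the idea concrete, here is a minimal sketch of the pack/unpack round trip. It is only an illustration: it uses Python's zlib as a stand-in for whatever compression a real packer uses, the `pack` and `unpack` names are my own, and the "program" is just a made-up byte string rather than a real PE file.

```python
import zlib

def pack(payload: bytes) -> bytes:
    """Compress the payload. A real packer would also wrap the result
    in a working PE file that carries a small decompression stub."""
    return zlib.compress(payload, 9)

def unpack(packed: bytes) -> bytes:
    """What the stub does at run time: restore the original bytes in memory."""
    return zlib.decompress(packed)

# Stand-in for the bytes of a program (hypothetical content).
original = b"http://example.com/update " * 50
packed = pack(original)

print(len(original), "->", len(packed))            # size before and after packing
print(b"http://example.com/update" in packed)      # is the string still visible on disk?
print(unpack(packed) == original)                  # does the stub recover the original?
```

The second print is the part that frustrates static analysis: strings and code that were readable in the original are no longer visible in the packed bytes until something decompresses them.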
Before we get into the specifics of how this takes place at a technical level, it’s important to understand some of the fundamentals of how a PE file is structured.
PE File Structure
Whether you know it or not, it’s a near certainty that you’ve encountered PE files before. If you’ve used Microsoft Windows, you’ve used PE files. It’s a file format developed by Microsoft that is used for a large number of file types. The most common ones that you’re likely to recognize are .exe and .dll files (executables and dynamic-link libraries respectively). PE files are made of two parts: the header and the sections. The header contains details about the PE file itself, while the sections contain the contents of the executable.
The header is broken up into a number of different parts: the DOS header, the PE header, the optional header, and the section table. There can be any number of sections within a PE file, and they can be named anything the author would like. However, there must almost always be sections containing the actual code of the file, the import table, and the data used by the code. It ends up looking something like this:
The optional header in particular is of special importance when we’re talking about (un)packing. Not to diminish the DOS or PE header importance, but they simply don’t offer anything we’re interested in for the scope of this blog post. All you really need to know about those headers is that they simply contain information about the PE file structure itself.
The optional header can contain a wealth of information, such as the size of code on disk, the checksum of the file, the size of the stack and heap, etc. It also contains the address of the entry point, or where code execution will begin. This becomes important later as the packer modifies this value, and it will fall on us to find the location of the original entry point, or OEP, of the packed file. Once this is identified, we can dump the memory of the executable, reconstruct the imports (mentioned more in a second), and voilà, we’re good to go.
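If you want to see where the entry point lives without a GUI tool, the following is a rough sketch that pulls it straight out of the raw bytes. The function name `read_entry_point` is my own, and the offsets are the ones from the documented PE32 and PE32+ optional header layouts; treat it as an illustration rather than a robust parser.

```python
import struct

def read_entry_point(pe: bytes) -> dict:
    """Return the entry point RVA and image base from raw PE bytes."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature in DOS header")
    # e_lfanew (offset 0x3C in the DOS header) points at the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte file header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", pe, opt)
    (entry_rva,) = struct.unpack_from("<I", pe, opt + 16)  # AddressOfEntryPoint
    if magic == 0x10B:      # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", pe, opt + 28)
    elif magic == 0x20B:    # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", pe, opt + 24)
    else:
        raise ValueError("unexpected optional header magic: %#x" % magic)
    return {
        "entry_rva": entry_rva,
        "image_base": image_base,
        "entry_va": image_base + entry_rva,  # address if loaded at its preferred base
    }

# Example use (path is hypothetical):
# with open("sample.exe", "rb") as f:
#     print(read_entry_point(f.read()))
```

Running this against a file before and after packing is a quick way to confirm that the packer has moved the entry point.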
Data directories are contained within the optional header. Data directories contain tables that contain information about resources, imports, exports, debug information, and thread local storage, to name a few. If you’re looking for more in-depth information on the data directories, I highly recommend you take a look at this great MSDN article on the topic. The import information in particular is of special interest to us. Before we jump into it, let’s talk about imports for a second.
So to explain imports, let’s think of a hypothetical situation. We have five executables that all share some piece of code. Now, it doesn’t really make a ton of sense for all of these executables to store this code by themselves. Instead, it makes a lot more sense to break out this code into a separate library and simply have each executable load this library at runtime. This is essentially what the import table is: a list of libraries and their associated functions that an executable wishes to load at runtime. In a packed file, the original table is replaced, and it is regenerated when the file is executed. This means that if we wish to unpack a binary that has been packed, we’ll have to reconstruct this information in order for the unpacked PE file to be valid and to work as expected. With the optional header covered, we can move on to the section table.
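As a small aside, here is a sketch of how you might find where the import information lives by reading the import entry out of the data directories. The helper name `read_import_directory` is my own invention, and the offsets assume the standard PE32 and PE32+ optional header layouts.

```python
import struct

IMPORT_DIRECTORY_INDEX = 1  # entry 0 is the export table, entry 1 the import table

def read_import_directory(pe: bytes) -> tuple:
    """Return (rva, size) of the import directory from raw PE bytes."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # start of the optional header
    (magic,) = struct.unpack_from("<H", pe, opt)
    if magic == 0x10B:       # PE32: data directories start at offset 96
        dirs = opt + 96
    elif magic == 0x20B:     # PE32+: data directories start at offset 112
        dirs = opt + 112
    else:
        raise ValueError("unexpected optional header magic: %#x" % magic)
    # NumberOfRvaAndSizes sits immediately before the directory array.
    (count,) = struct.unpack_from("<I", pe, dirs - 4)
    if count <= IMPORT_DIRECTORY_INDEX:
        raise ValueError("file has no import directory entry")
    # Each directory entry is two 32-bit values: RVA and size.
    rva, size = struct.unpack_from("<II", pe, dirs + 8 * IMPORT_DIRECTORY_INDEX)
    return rva, size
```

Comparing the result for an original file and its packed copy should show the import directory pointing somewhere quite different, which is one reason the imports need rebuilding after a dump.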
The Sections Table outlines all sections that are present within the PE file. This includes the name of the section, the location of the section, size of the section both in the file and in virtual memory, as well as any flags associated with that section. Sections make up the bulk of the PE file itself, so it is important to have this table of information on hand.
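The fields described above can be read with a few lines of code. This is a minimal sketch, assuming the standard 40-byte section header layout; `read_sections` and the dictionary keys are names I made up for the example.

```python
import struct

# One section table entry: Name, VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics (40 bytes total).
SECTION_FORMAT = "<8sIIIIIIHHI"
SECTION_SIZE = struct.calcsize(SECTION_FORMAT)

def read_sections(pe: bytes) -> list:
    """Return one dict per section table entry in the raw PE bytes."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_header = e_lfanew + 4
    (num_sections,) = struct.unpack_from("<H", pe, file_header + 2)
    (opt_size,) = struct.unpack_from("<H", pe, file_header + 16)
    # The section table starts right after the optional header.
    table = file_header + 20 + opt_size
    sections = []
    for i in range(num_sections):
        fields = struct.unpack_from(SECTION_FORMAT, pe, table + SECTION_SIZE * i)
        sections.append({
            "name": fields[0].rstrip(b"\0").decode("ascii", "replace"),
            "virtual_size": fields[1],     # size in memory
            "virtual_address": fields[2],  # RVA where the section is mapped
            "raw_size": fields[3],         # size in the file
            "raw_offset": fields[4],       # location in the file
            "flags": fields[9],            # characteristics (read/write/execute, etc.)
        })
    return sections
```

A section whose size in memory is much larger than its size in the file is worth a second look, since that is where unpacked data has room to land.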
As mentioned earlier, PE files typically contain, at the very least, a section for code, a section for data, and a section for imports. The import section contains the actual addresses for all functions needed by the PE file. These addresses are populated at runtime, as every Windows system may be different. As such, it’s possible that a function may not be located at the same memory location between Windows versions. Populating the import table with these addresses at runtime allows the same PE file to run on multiple Microsoft Windows machines.
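Here is a toy simulation of that idea. Nothing in it touches a real loader: the two "systems" are plain dictionaries, the addresses are made-up values, and `resolve_imports` is a hypothetical name. It only shows why the file stores names and lets the addresses be filled in at load time.

```python
# Made-up export addresses for two hypothetical Windows builds.
SYSTEM_A = {
    ("kernel32.dll", "CreateFileW"): 0x7C801A28,
    ("user32.dll", "MessageBoxW"): 0x7E4507EA,
}
SYSTEM_B = {
    ("kernel32.dll", "CreateFileW"): 0x76A1F5C0,
    ("user32.dll", "MessageBoxW"): 0x75D2FD1E,
}

def resolve_imports(import_names, system_exports):
    """Fill in an address for each (library, function) pair, the way a
    loader populates the import addresses when the file is run."""
    addresses = []
    for library, function in import_names:
        key = (library, function)
        if key not in system_exports:
            raise LookupError("%s!%s not found" % key)
        addresses.append(system_exports[key])
    return addresses

# The file on disk only stores the names it needs.
imports = [("kernel32.dll", "CreateFileW"), ("user32.dll", "MessageBoxW")]

for label, system in (("A", SYSTEM_A), ("B", SYSTEM_B)):
    resolved = resolve_imports(imports, system)
    print(label, [hex(address) for address in resolved])
```

The same list of names resolves to different addresses on each system, which is why a memory dump taken on one machine needs its import table rebuilt from names rather than keeping the raw addresses.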
Digging Into An Example
Now that we’ve got a decent grasp of the PE file structure, let’s use what we’ve learned to manually unpack an actual file. For this example I’m going to use calc.exe packed with MPRESS. MPRESS is a popular packer that has been around for a while now. It supports a large number of file types and works on all current versions of Microsoft Windows. So before we pack our file, let’s take a quick look at what calc.exe looks like in its original state.
Using one of my favorite tools, CFFExplorer, we can easily view various pieces of information about the PE file structure, including, but not limited to, various headers, section information, import information, and any embedded resource entries. I’ve specifically shown the Optional Header and the various sections contained in calc.exe in the screenshots below.
Now let’s pack the sample with MPRESS and see how the file has changed. As you can see in the following screenshots, MPRESS has modified the sections present in the PE file. The “.text” and “.data” sections have been replaced with “.MPRESS1” and “.MPRESS2”. The “.MPRESS1” section contains the compressed data of the original calc.exe file, and the “.MPRESS2” section contains a number of functions used to decompress this data and reconstruct the import table.
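If you have a list of section names in hand, spotting this change takes one line. The sketch below is just that: `mpress_sections` is a name I made up, and apart from the ".text", ".data", ".MPRESS1" and ".MPRESS2" names mentioned above, the section lists are assumed for the sake of the example. Keep in mind that section names are easy to change, so a match is a hint rather than proof.

```python
def mpress_sections(section_names):
    """Return the section names that carry the default MPRESS prefix."""
    return [name for name in section_names if name.startswith(".MPRESS")]

# Assumed section lists for illustration only.
before = [".text", ".data", ".rsrc", ".reloc"]
after = [".MPRESS1", ".MPRESS2", ".rsrc"]

print(mpress_sections(before))
print(mpress_sections(after))
```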
You might also notice in the above screenshot that the entry point of code execution (AddressOfEntryPoint) has changed. The first step in unpacking this sample manually is to identify the original entry point (OEP). Let’s start unpacking this sample dynamically.
For this example, I’m going to switch between IDA Pro and OllyDbg. You’re welcome to use whatever tools you wish, but my personal style just leads me to switching between these tools often, so I’m going to use them both here. I find that IDA does a better job of visualizing what is happening, but OllyDbg tends to have more features and better results in a dynamic analysis environment.
One of the first things we see while debugging the sample is a call to this rather complicated function. This function is in actuality decompressing the compressed data found in ‘.MPRESS1’. It uses an interesting technique called ‘in-place decompression’ to accomplish this. That is to say, MPRESS is able to decompress the data without creating a new section of memory and dumping the decompressed data to it. Instead, it simply overwrites the compressed data with the decompressed data.
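To give a feel for how in-place decompression can work, here is a toy sketch. To be clear, this is not the MPRESS algorithm: I have swapped in a trivial run-length encoding, and all of the function names are my own. The compressed data sits at the end of a single buffer, output is written from the start, and the layout is chosen so the write position never passes the read position.

```python
def rle_compress(data: bytes) -> bytes:
    """Toy run-length encoding: a sequence of (count, value) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def build_packed_buffer(data: bytes):
    """Place the compressed data at the end of one buffer, leaving just
    enough room in front so output never overtakes unread input."""
    comp = rle_compress(data)
    produced = consumed = gap = 0
    for k in range(0, len(comp), 2):
        consumed += 2
        produced += comp[k]
        gap = max(gap, produced - consumed)
    return bytearray(gap) + bytearray(comp), gap  # compressed data starts at 'gap'

def decompress_in_place(buf: bytearray, comp_start: int) -> int:
    """Decode the pairs starting at comp_start, writing output from offset 0
    of the same buffer. Returns the number of decompressed bytes."""
    read, write = comp_start, 0
    while read < len(buf):
        count, value = buf[read], buf[read + 1]
        read += 2
        if write + count > read:
            raise ValueError("output would overwrite unread input")
        buf[write:write + count] = bytes([value]) * count
        write += count
    return write

data = b"A" * 8 + b"BC"
buf, start = build_packed_buffer(data)
length = decompress_in_place(buf, start)
print(bytes(buf[:length]) == data)
```

The design point is that only one buffer is ever allocated. The cost is the bookkeeping needed to guarantee that the decompressed output never tramples compressed bytes that have not been read yet.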
Once decompression completes, we then see code execution pass to a series of loops, which reconstruct the import table. This can be seen below where I demonstrate a before and after of the unpacking process:
Eventually, we hit the original entry point of the (at this point) unpacked calc.exe. It is here we want to dump the process’ memory to disk. For this task, I prefer a plugin for OllyDbg called OllyDump. Not only will this allow you to dump the process, but it also has the ability to, among other things, rebuild the import table (remember the packing process destroys the original).
Like many things in this industry, there are many ways to approach this. If you’re not a big fan of OllyDbg, a nice alternative to the plugin I just mentioned is a tool called ChimpRec. The tool’s main page has a full list of features, but essentially it allows you to dump a currently running process and rebuild its imports. I’m sure there are many others out there that will accomplish the same thing. These two utilities are simply my personal favorites.
And at this point that’s really all there is to it for unpacking a basic packed binary. I haven’t touched on the more advanced functionality present in other packers, namely anti-reversing techniques. It is my hope to discuss some of these techniques in future blog posts.