Dissect overview

Read everything you need to know about Dissect in this blog post.

25 October 2022

Author: Erik Schamper, Senior Security Analyst Fox-IT

Dissect is a toolset that enables us to effortlessly perform incident response on thousands of systems and to focus on providing better results to our clients. With Dissect, an analyst no longer has to be concerned with how to access investigation data and can instead spend more time on actual analysis. Exotic systems or environments have little impact on efficiency, as Dissect can generally be adapted to such scenarios quickly and easily. For any new platform we support, it gives access to the full suite of forensic capabilities Fox-IT has developed over the years, while letting analysts keep working with a familiar toolset. These are crucial factors that increase effectiveness in incident response.


In two blog posts we will shed some light on our proprietary incident response technology, dubbed “Dissect”. Besides improving our processes and use of threat intelligence, we saw an opportunity to bring our technical incident response capabilities to the next level. With Dissect, we are no longer limited to a single data format or platform, and our analysts no longer need to be concerned about how to access investigation data. They can focus more on performing actual analyses.

How did we get there? Read on!

Incident response challenges

While all our engagements are covered by strict non-disclosure agreements, sometimes they do see the light of day, including the fact that we are involved. These are often our more challenging and complex engagements, such as the Belgacom and DigiNotar cases and, more recently, the University of Maastricht case. They have one thing in common: a large and complex infrastructure that needs careful examination for often complex indicators of compromise (IOCs), while the investigators (us) need to remain undetected by sophisticated threat actors. These cases, and of course many more, often require us to go beyond the capabilities of the analysis tools commonly used.

Over the years, we’ve encountered situations like these many times, and they expose three kinds of limitations: technical, process and scaling limitations. Let’s look at each of those limitations first, and then discuss how we have overcome them.

Technical limitations

There’s no shortage of parsers and tools for most of the common Digital Forensics and Incident Response (DFIR) tasks. They all work fine most of the time, but sometimes you need to go beyond the capabilities of what is publicly available, either because we’re investigating a particularly advanced adversary, or because we have specific automation, performance or scaling requirements. In general, the reasons for creating new tooling can be summarized as one of the following:

• Missing functionality, either not implemented or abstracted away;
• Difficult or impossible to automate, for example GUI or “oneshot” applications and scripts;
• Poor performance.

For these reasons, we’ve had analysts writing their own implementations of parsers or tools in Python over the years. Often, we achieved better results with our own parsers and tools than with those that were publicly available. We could more easily fix bugs, add features or improve the parser in other ways. Our own parsers were consistently faster, too.

Process limitations

In DFIR, there is no shortage of tools that do approximately the same thing. However, the usage and output format can often differ wildly. Or one tool may have been recently updated to include some additional artefacts, whereas another may not. Besides making it more difficult to collaborate, the choice of tools could also lead to one analyst having different findings than another. This is detrimental to the level of quality we strive for.

There’s also the larger analysis methodology at play. If your analysts use a large set of tools and methods, it’s very hard to streamline your analysis pipeline, whether through automation or through the creation of new, overlapping tools.

Scaling limitations

Over the years, as we started doing more and larger engagements, we started to run into the previously mentioned limitations more often. Our pipeline at the time was optimized for in-depth analysis of a handful of hosts: turning all the evidence on a host inside out, usually taking several days. However, we started to run into scenarios where we wanted to analyze tens, hundreds or even thousands of hosts. Because of this, we wanted to be able to triage hosts: to get an initial impression of the state of a host within a few hours of acquisition. We couldn’t do it, so we had to look for alternatives.

Around 2016 we started to experiment with various endpoint solutions. They worked for a few engagements but were ultimately found to be too limiting in their capabilities when used on their own. Endpoint solutions work fine when all the hosts you’re investigating are turned on and running a supported operating system, but that’s not always the case.

Endpoint solutions are generally very good at “live monitoring” from the point of installation. Sometimes, however, you want to be able to analyze a “snapshot in time” of a specific host. Since an endpoint agent is generally installed after the initial compromise has happened, it’s crucial to have another way to analyze historical traces. This was usually achieved using full disk images, especially back then. The idea of a small “forensic package” wasn’t that widespread yet. Nowadays, many endpoint solutions have a variation of this concept where, instead of a full disk image, only relevant files or artefacts are collected. This saves a lot of time and space in the acquisition process.

One other major issue that we started to run into was that the analysis tools, scripts and parsers we had previously written were incompatible with all these different solutions. APIs or capabilities weren’t available, and someone would have to spend time getting to know the platform and its quirks to make a tool compatible. That’s not an ideal situation when we can only leverage an analysis capability in one situation (e.g. a full disk image) and not in another (e.g. an endpoint agent).

Over time, as our arsenal of in-house parsers grew, we learned that there were many benefits to writing our own analysis tools. We especially liked the flexibility it gave us: because we were in complete control of the entire analysis chain, we suddenly had many more analysis capabilities.
We wanted all our analysts to be able to enjoy the flexibility of these tools.

In short: we needed the proverbial one tool to rule them all.


Starting the development of Dissect

It all began with a bunch of Python scripts in someone’s home directory. We started using the name “Dissect” as the namespace when we started storing them more centrally. As we were writing more parsers, the whole toolset collectively started to be known as “Dissect”. However, each project was still very much a separate parser, and the tools using those parsers were still tailored to specific investigations.

It wasn’t until late 2017 that a traineeship project was initiated to change this. The way it was described at the time was “The Sleuth Kit but for all investigation data”. Could we develop an abstraction layer or framework, like The Sleuth Kit provides for filesystems, but for almost everything on a host? It wouldn’t matter what the source of the data was, because we would develop abstraction layers to universally access this information across any source data.

The principles of Dissect

We already had a catalog of capable and flexible parsers; now we needed to work on creating an overarching framework that could fully utilize all these capabilities. An important consideration was that we should try to solve as many of the issues we had encountered as possible. One of the main goals was that the framework should be scalable to many hosts, not just a handful. It should also be flexible and easy to expand with new or different implementations of various parsers. If we encounter a previously unsupported disk format or filesystem, we want to be able to easily add support for it, without having to make many changes to the rest of the framework.

If you think about it, there are a lot of components that most investigation material has in common:

  • Disks/containers: e.g. raw disk, EWF evidence files or virtual disks
  • Partitions/volumes
  • Filesystems: or anything that can be interpreted as one, e.g. a ZIP file or a directory full of files
  • Operating system: hostname, network interfaces, users
  • Application data

That’s a lot of opportunities for abstraction! These components can also be nicely layered on top of each other. A partition or volume is generally part of a disk, and usually contains a filesystem. But there’s generally no hard relation from a filesystem back to a volume or disk container. So, for example, we can safely swap a raw disk with an EWF file or a virtual disk, and it wouldn’t make a difference for the volume or filesystem layer. We can also leave layers out completely. For example, a ZIP file can be interpreted as a standalone filesystem and doesn’t need an underlying partition or disk.
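To make this layering concrete, here is a minimal sketch that walks down these layers, assuming the open-source dissect.target package (the disk image filename is hypothetical):

```python
# A minimal sketch, assuming the open-source dissect.target package; the disk
# image filename is hypothetical.
from dissect.target import Target

target = Target.open("evidence.vmdk")

# Walk down the layers: disk containers -> volumes -> filesystems.
for disk in target.disks:
    print("disk:", disk)

for volume in target.volumes:
    print("volume:", volume)

for fs in target.filesystems:
    print("filesystem:", fs)

# Swapping "evidence.vmdk" for a raw or EWF image would leave this code
# untouched: the volume and filesystem layers never see which container
# type sits underneath.
```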

When you look even further, and think about the steps you must take before you’re even able to start your analysis on a host, you also realize that this preparation consists of a lot of repeated, boring and time-consuming steps:

1. Mount or open the container file (e.g. a raw disk, EWF evidence file or virtual disk) with some third-party tool, if applicable;
2. Mount or open the filesystems with some other third-party tool;
3. Extract artefact files from the filesystem;
4. Run a collection of different third-party tools on the extracted artefact files, each producing its own output format, e.g. JSON, CSV or plain text.

Could we also improve this workflow, and make it scalable at the same time?
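To give away a little of where this ends up: with the framework described below, those four steps collapse into a few lines of Python. A hedged sketch, assuming the open-source dissect.target package (the evidence filename is hypothetical and the plugin name is illustrative):

```python
# A hedged sketch, assuming the open-source dissect.target package.
# "evidence.E01" is a hypothetical EWF evidence file.
from dissect.target import Target

# Steps 1-3: container, volume and filesystem handling happen automatically.
target = Target.open("evidence.E01")

# Step 4: an analysis plugin runs directly against the reconstructed system and
# yields uniform records instead of tool-specific output formats. The plugin
# name is illustrative; the available plugins may differ.
for record in target.runkeys():
    print(record)
```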

Designing Dissect

The final important consideration is that all these ideas must result in a concise and simple-to-use API. One goal we set out to achieve from the start is that programming with this framework should be as easy as programming with Python’s standard library. Anyone with Python experience should be able to get started.

We took all of these considerations and worked on an initial version of what would internally become known as “dissect.target”. At its core, it consists of several abstraction layers:

  • Containers;
  • Volumes;
  • Filesystems;
  • Operating systems: operating-system-specific abstraction layers, such as a registry layer for Windows;
  • Analysis plugins.

As mentioned earlier, each of these abstraction layers is incredibly flexible and can operate completely independently of the others. For example, we can create a tool that solely uses the container abstraction layer, or one that solely uses the filesystem abstraction layer. This also means we can swap each layer for any other compatible layer, easily replacing third-party implementations with our own or introducing completely new ones. As long as the specific APIs required for that layer are implemented, it can be swapped in and out effortlessly.
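For example, adding support for a new filesystem format only means implementing that one layer. The sketch below is purely illustrative: the import path, base class and method names are assumptions made for this example, not the verbatim dissect.target API.

```python
# Illustrative sketch only: the import path, base class and method names are
# assumptions, not the verbatim dissect.target API.
from dissect.target.filesystem import Filesystem  # assumed import path


class AcmeFilesystem(Filesystem):
    """Hypothetical filesystem implementation for a fictional 'Acme' format."""

    __type__ = "acmefs"  # assumed registration attribute

    @staticmethod
    def detect(fh):
        # Inspect the first bytes of the volume to decide whether this
        # implementation applies; the framework tries each registered
        # filesystem implementation in turn.
        return fh.read(8) == b"ACMEFS\x00\x01"

    def get(self, path):
        # Return an entry object for the given path. Because the rest of the
        # framework only talks to this generic interface, no other layer has
        # to change when a new filesystem is added.
        raise NotImplementedError
```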

On top of the basic abstraction layers are the operating system implementations. These are responsible for making sure an operating system is properly loaded. This includes things like virtually mounting filesystems to the correct drive letter, or virtually unpacking an appliance operating system to its “live” state (e.g. ESXi or network appliance disk images). The operating system layer is also responsible for parsing some basic common information, such as the hostname, operating system version, network interfaces and users.
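In practice, this means the same handful of properties is available on every target, regardless of the underlying operating system. A short sketch, again assuming the open-source dissect.target package (the image name is hypothetical and property names may differ slightly):

```python
# A short sketch, assuming the open-source dissect.target package; the image
# name is hypothetical and property names may differ slightly.
from dissect.target import Target

target = Target.open("webserver.vmdk")

# Basic system information parsed by the operating system layer.
print(target.os)        # e.g. "windows" or "linux"
print(target.hostname)
print(target.version)
print(target.ips)       # addresses of the network interfaces

for user in target.users():
    print(user.name)
```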

Finally, we have our analysis plugins. These can include OS-specific plugins, like Windows event logs or Linux bash history, or more generic plugins, like browser history or filesystem timelining. An important detail is that, by default, we only target the “known locations” of artefacts. That means that we don’t try to parse every file on a disk, but instead only look for data in the known or configured locations.
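Adding support for a new artefact means writing such a plugin. The rough sketch below shows the general shape of one, assuming the plugin API of the open-source dissect.target package; the helper names follow that package but may differ in detail, and the Acme artefact itself is fictional.

```python
# A rough sketch of an analysis plugin, assuming the plugin API of the
# open-source dissect.target package; helper names may differ in detail and
# the Acme artefact is fictional.
from dissect.target.helpers.record import TargetRecordDescriptor
from dissect.target.plugin import Plugin, export

AcmeLogRecord = TargetRecordDescriptor(
    "application/log/acme",
    [
        ("string", "message"),
    ],
)


class AcmeLogPlugin(Plugin):
    """Parse the fictional Acme application log from its known location."""

    LOG_PATH = "/var/log/acme/acme.log"

    def check_compatible(self):
        # Only offer this plugin when the artefact is present on the target.
        if not self.target.fs.path(self.LOG_PATH).exists():
            raise Exception("No Acme log found")

    @export(record=AcmeLogRecord)
    def acmelog(self):
        # Yield one uniform record per log line, like any other plugin.
        fh = self.target.fs.path(self.LOG_PATH).open()  # assumed binary mode
        for line in fh:
            yield AcmeLogRecord(
                message=line.decode(errors="surrogateescape").strip(),
                _target=self.target,
            )
```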

All these layers combined allow us to interact with “targets” in any way we want, where a “target” is any type of source data from which we can reconstruct the state of a system. This includes a full disk image, a handful of registry hive files copied from a Windows host, or a remote filesystem and registry API provided by an endpoint solution.
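Concretely, the same entry point accepts very different kinds of source data. A small sketch, assuming the open-source dissect.target package (all paths are hypothetical):

```python
# A small sketch, assuming the open-source dissect.target package; all paths
# are hypothetical.
from dissect.target import Target

# A full virtual disk image of a host...
target = Target.open("images/dc01.vmdk")

# ...or a directory of files copied off a host, such as registry hives.
target = Target.open("collections/workstation42/")

# Either way, the same properties and plugins are available afterwards.
print(target.hostname)
```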

The benefits

The most important benefit of this layered approach is that it doesn’t have any impact on our analysis plugins. We want a “write once, run anywhere” approach to our analysis capabilities. A parser for, say, Windows run keys doesn’t need to concern itself with how it gets access to “a registry”; the details of how to do that are abstracted away by the underlying layers. Analysis tools written on top of these parsers enjoy this benefit, too.
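For illustration, such a run-key check could look roughly like the sketch below, assuming the open-source dissect.target package (the registry helper methods and value attributes shown may differ in detail):

```python
# A rough sketch, assuming the open-source dissect.target package; the registry
# helper methods and value attributes may differ in detail.
from dissect.target import Target

target = Target.open("evidence.vmdk")  # hypothetical image name

RUN_KEY = "HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run"

# The parser only asks the registry abstraction for a key; whether the hives
# came from a disk image, loose hive files or an endpoint agent is irrelevant.
for key in target.registry.keys(RUN_KEY):
    for value in key.values():
        print(key.path, value.name, value.value)
```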

The result: our tools are no longer limited to a single data format or platform, but instead are data format agnostic.
