The two driving forces in incident response

Incident handling has become more complex and time pressure has increased

25 October 2022

Author: Erik de Jong, Global Lead for Incident Response, Fox-IT

At Fox-IT, we’ve had an incident response practice for about 20 years. It started as a traditional forensics team dealing mainly with employee fraud and other insider threats and has morphed into a full-blown practice spanning the globe that deals with everything from business email compromise to Iranian spies. While the essence of our work has stayed the same over the years, much of the context has changed dramatically. What were these changes and how have we responded to them?

The essence of incident response

After all these years, incident response is essentially still:

1. An organization experiences pain. Usually unexpected, and sometimes excruciating. It involves insiders, spies, criminals, or generic unknown miscreants.
2. The organization calls a trusted third party to take the pain away – pronto. Oh, and by the way, they have many questions.

That’s it. Incident response still is that simple. Except of course that it isn’t. “Pronto” is doing a lot of work in that sentence. Under the hood, incident response is a complex machine, and many of its parts have changed over time. As a response provider, we need to change with it.

The two long-term trends that we’ll delve into in this blog are that
(1) incident handling has become more complex, and
(2) time pressure has increased (while the questions have largely remained the same).

1. Incident handling has become more complex

Incident handling has become more complex as a result of changes on the attackers’ side as well as on the defenders’ side.

On the attackers’ side, much more often than before, the cases that we handle are a matter of total ownage. That is, the threat actor has achieved the highest level of privilege in an infrastructure and, as a result, can do as they please. They can steal any data, stop any system, eavesdrop on any conversation. You may be the one paying the bills, but you’re no longer in control. We used to get calls for single compromised web servers, but that hardly happens anymore.

A more extensive breach requires a more complex investigation. In the most direct way, it means having to investigate hundreds or thousands of systems as opposed to one or a handful. But not only that: the investigation usually needs to reconstruct a longer timeline with more events that are also more diverse. Rather than simply carrying out actions on a single system, an attacker that aims to take over a complete infrastructure will move laterally, collect a multitude of credentials, and touch and explore multiple systems; they may deploy additional tools for connection, persistence and exfiltration, and they may have more incentive to fly under the radar and cover their tracks.
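
To make the timeline problem a little more concrete, here is a minimal sketch – hypothetical field names and toy data, not Fox-IT tooling – of the consolidation step such an investigation leans on: events from many hosts and many evidence sources are normalised into a common shape and merged into a single chronological timeline. Every additional compromised system adds to the pile that has to pass through this step before any story can be told.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical, heavily simplified event record. A real investigation normalises
# many artefact types (logons, service installs, scheduled tasks, proxy hits,
# cloud audit entries, ...) from many sources into a common shape like this.
@dataclass
class Event:
    timestamp: datetime   # stored in UTC so ordering across sources stays sane
    host: str
    source: str           # e.g. "windows-eventlog", "proxy", "m365-audit"
    description: str

def build_timeline(per_host_events: dict[str, list[Event]]) -> list[Event]:
    """Merge per-host event lists into one chronologically ordered timeline."""
    merged = [event for events in per_host_events.values() for event in events]
    return sorted(merged, key=lambda e: e.timestamp)

# Toy example with two hosts and two events; a real engagement involves
# hundreds or thousands of hosts and orders of magnitude more events.
timeline = build_timeline({
    "WEB03": [Event(datetime(2022, 9, 28, 23, 5, tzinfo=timezone.utc),
                    "WEB03", "webserver-access-log", "webshell upload")],
    "DC01": [Event(datetime(2022, 10, 1, 2, 14, tzinfo=timezone.utc),
                   "DC01", "windows-eventlog", "suspicious admin logon")],
})
for event in timeline:
    print(event.timestamp.isoformat(), event.host, event.source, event.description)
```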

On the defenders’ side, infrastructures have changed enormously. We come from a past where almost everyone had a 100% on-premises infrastructure. For purposes of investigation, such an infrastructure is straightforward – there are no or hardly any third parties to consider. Now, most infrastructures that we investigate are hybrid: a combination of on-premises and cloud. Interestingly, while many small businesses have moved to the cloud completely, many larger organizations (the bulk of our customers) have moved parts of their business to the cloud (email and conferencing are common) but also maintain sizable chunks on-premises.

Add it all up and the incident responder’s job becomes especially challenging: a hybrid infrastructure that comprises a patchwork of smaller networks of constituent organizations with varying levels of trust among them, victim of an attacker who has roamed around freely for days, weeks or months.

2. Time pressure has increased

The questions that a victim has after a successful attack have largely stayed the same over time:

  • What happened?
  • When did this happen?
  • What is the scope of compromise?
  • How did the attacker “get in”?
  • Which information was accessed, modified, exfiltrated or otherwise affected?
  • Which systems were accessed (and am I totally hosed)?
  • Who is behind the attack?
  • How do I clean this up asap and get back to business as usual?
  • Less pressing during the incident but most important for the longer term: what is the laundry list of improvements that I should make in order to reduce the chances of this happening again?

When all operations have halted or are about to be halted, not only is there an extreme urgency to make the pain go away, it is also crucial for the victim to understand exactly how their operations and their information were or will be affected. And for that, the other questions need to be answered.

There are also other reasons for the increased interest in answering those crucial questions. The two most important are probably the increased prevalence of insurance and of regulations with regard to data breaches. In the case of insurance, there is a need to understand the nature and details of an attack, because these will influence whether, and to what extent, an incident is covered under the terms of the policy.

The data breach regulation landscape is a worldwide patchwork. The European GDPR appears to be one of the strictest and most uniform in terms of geographical coverage, and it has some effect outside of the EU as well. An organization that fell victim to an attack may need to notify regulators or disclose the attack and its effects. Disclosure laws generally mandate a time frame (sometimes vaguely worded) for notification and disclosure. And because this window is typically only a few days – under the GDPR, notification to the supervisory authority is due within 72 hours of becoming aware of a breach, where feasible – it puts pressure on the victim to find answers to those questions.

Adding it all up

The crucial question for Fox-IT, as an incident response provider, continues to be: how can we scale our incident response service in such a way that we increase our speed while we maintain or even increase the accuracy of our investigations? Increasing our speed and accuracy sound like obligatory declarations that you could find on any bland strategy slide, but they are worth examining a little more closely.

Increasing our speed is necessary because we have more work that needs to be done in less time. But it is also desirable because it allows us to help our customers more effectively, and because it allows us to help more customers, full stop.

Increasing our accuracy may sound strange at first. Aren’t we accurate already? What does increasing accuracy even mean? After all, if you’re not fully accurate, aren’t you simply… “not accurate”? Well, no.

Incident response is crime scene investigation and firefighting

Incident response, while frequently and accurately portrayed as firefighting, is also crime scene investigation. To find answers to the victim’s questions, incident handlers gather evidence and form and try to disprove hypotheses, in order to finally arrive at a story that’s as close to reality as possible. More complex attacks in more complex infrastructures mean more – much, much more – potential evidence to investigate.

Since we cannot investigate everything, we have to make choices and accept that our story is not as close to reality as it could have been in theory. That it’s not 100% accurate. We won’t know everything, but the victim will be in good enough shape to move on. With the increasing complexity of attacks and infrastructures, though, our accuracy will decline if we don’t actively work to improve ourselves and keep pushing the boundaries of what is generally accepted to be “reasonable” to investigate.

So increased speed and accuracy are crucial. And that makes sense: they are the flip sides of the long-term trends of increased complexity and time pressure.