This is the final post in a series about the broken Internet. In the first, we looked at SIEM. Last week, we explored the value of NetFlow analysis. This week, we close with an overview of incident response.
When evaluating risk, I like to use the following formula as a reference:
Probability of occurrence, broken into threats x vulnerabilities, helps us determine how likely it is that a specific threat will reach our information resources. Business impact is a measure of the negative effects if a threat is able to exploit a vulnerability. The product of probability of occurrence and business impact is mitigated by the reasonable and appropriate use of administrative, technical, and physical controls. One such control is a documented and practiced incident response plan.
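To make the relationship concrete, here is a minimal numeric sketch. The rating scales, the function name risk_score, and the choice to model controls as a divisor are all assumptions made for illustration; the text above only says that controls mitigate the product of probability of occurrence and business impact.

```python
def risk_score(threats: float, vulnerabilities: float,
               business_impact: float, controls: float) -> float:
    """Relative risk from hypothetical ratings.

    Probability of occurrence is threats x vulnerabilities. Dividing by
    controls is an assumed way to model mitigation, not a standard.
    """
    if controls <= 0:
        raise ValueError("controls must be positive")
    probability_of_occurrence = threats * vulnerabilities
    return probability_of_occurrence * business_impact / controls


# Same threat, vulnerability, and impact ratings; only the control
# strength changes (for example, after adopting a practiced response plan).
weaker_controls = risk_score(threats=4, vulnerabilities=3, business_impact=5, controls=2)
stronger_controls = risk_score(threats=4, vulnerabilities=3, business_impact=5, controls=4)
print(weaker_controls, stronger_controls)  # 30.0 15.0
```

The absolute numbers mean little on their own. The point is the comparison: with everything else unchanged, stronger controls lower the relative score.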
The purpose of incident response is to mitigate business impact when we detect an exploited vulnerability. The steps in this process are shown in the following graphic. Following the detection of an incident (using SIEM, NetFlow, or some other monitoring control), the first step is to contain it before it can spread or cause more business impact. Containment is easier in a segmented network; segments under attack are quickly segregated from the rest of the network and isolated from external attackers.
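As a rough sketch of why segmentation helps containment, the snippet below maps hosts flagged by monitoring to the segments that would need to be isolated. The segment names, address ranges, and the functions segment_for and containment_plan are hypothetical; a real environment would take this data from its own network inventory and apply the isolation through its firewalls or switches.

```python
import ipaddress

# Hypothetical segment map; names and ranges are made up for illustration.
SEGMENTS = {
    "finance": ipaddress.ip_network("10.10.0.0/24"),
    "engineering": ipaddress.ip_network("10.20.0.0/24"),
    "dmz": ipaddress.ip_network("192.0.2.0/24"),
}


def segment_for(host: str):
    """Return the name of the segment containing host, or None if unknown."""
    addr = ipaddress.ip_address(host)
    for name, network in SEGMENTS.items():
        if addr in network:
            return name
    return None


def containment_plan(compromised_hosts):
    """Return the sorted list of segments to isolate for the flagged hosts."""
    to_isolate = set()
    for host in compromised_hosts:
        name = segment_for(host)
        if name is not None:
            to_isolate.add(name)
    return sorted(to_isolate)


print(containment_plan(["10.10.0.15", "10.10.0.99", "192.0.2.7"]))
# ['dmz', 'finance']
```

In this example, the engineering segment stays online because none of the flagged hosts belong to it. In a flat network, the only equivalent action would be to isolate everything.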
Following containment, we assess the nature of the attack. Skipping this step can result in incorrectly identifying the threat, the threat agent, the attack vector, or the target. Getting any of these wrong makes the remaining steps less effective.
Once we understand the who, what, when, where, how, and why of an attack, we can eradicate it. Eradication often takes the form of applying a patch, running updated anti-malware, or reconfiguring systems or the network. When we’re certain the threat agent is neutralized, we recover all business processes.
Business process restoration requires a documented and up-to-date business continuity/disaster recovery plan. Some incidents might require server rebuilds. Business impact increases with the time required to restore business operation. Without the right documentation, the restoration time can easily exceed the maximum tolerable downtime: the time a process can be down without causing irreparable harm to the business.
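The comparison between restoration time and maximum tolerable downtime can be sketched as a simple check. The process names and durations below are invented for illustration; real figures would come from a business impact analysis and from tested recovery procedures.

```python
from datetime import timedelta


def exceeds_mtd(estimated_restore: timedelta,
                max_tolerable_downtime: timedelta) -> bool:
    """True when the estimated restoration time is longer than the MTD."""
    return estimated_restore > max_tolerable_downtime


# Hypothetical processes: (estimated restoration time, maximum tolerable downtime)
processes = {
    "order entry": (timedelta(hours=6), timedelta(hours=4)),
    "payroll": (timedelta(hours=12), timedelta(hours=72)),
}

at_risk = [name for name, (restore, mtd) in processes.items()
           if exceeds_mtd(restore, mtd)]
print(at_risk)  # ['order entry']
```

Any process that lands in the at-risk list is a candidate for better recovery documentation, faster rebuild procedures, or additional redundancy.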
Finally, we perform root cause analysis. This involves two assessments. The first determines what was supposed to happen during incident response, what actually happened, and how we can improve. The second targets the attack itself. We must understand what broken control or process allowed the threat agent to get as far as it did into our network. Both assessments result in an action plan for remediation and improvement.
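The sequence described in this post (detect, contain, assess, eradicate, recover, analyze root cause) can be captured in a small tracker that refuses to skip a step. This is only a sketch: the Phase and Incident names are hypothetical, and a real response team would record far more than the order of the phases.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Incident response phases in the order described in the text."""
    DETECT = 1
    CONTAIN = 2
    ASSESS = 3
    ERADICATE = 4
    RECOVER = 5
    ROOT_CAUSE_ANALYSIS = 6


class Incident:
    """Tracks which phases are complete and enforces their order."""

    def __init__(self, name: str):
        self.name = name
        self.completed = []

    @property
    def closed(self) -> bool:
        return len(self.completed) == len(Phase)

    def complete(self, phase: Phase) -> None:
        if self.closed:
            raise ValueError("all phases are already complete")
        expected = Phase(len(self.completed) + 1)
        if phase != expected:
            raise ValueError(f"expected {expected.name}, got {phase.name}")
        self.completed.append(phase)


incident = Incident("example incident")
incident.complete(Phase.DETECT)
incident.complete(Phase.CONTAIN)
try:
    # Skipping the assessment step is rejected.
    incident.complete(Phase.ERADICATE)
except ValueError as err:
    print(err)  # expected ASSESS, got ERADICATE
```

Enforcing the order in the tracker mirrors the warning above: moving to eradication without first assessing the attack makes every later step less reliable.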
The Internet is broken. We must assume that one or more devices on our network are compromised. Can you detect anomalous behavior and react to it effectively when the inevitable attack happens?