In the business of security, linking performance metrics to strategy has become an accepted best practice. If strategy is the blueprint for building a security operations center (SOC), metrics are the raw materials. But there is a catch: a security organization can easily lose sight of its strategy and instead focus strictly on the metrics that are meant to represent it.
A recent SANS survey found that 77% of security operations centers provide metrics to gauge the status and effectiveness of SOC capabilities, a 50% increase in SOC metrics programs over the past five years. However, 33% of respondents reported dissatisfaction with their metrics.
Why are some metrics good and others not? All metrics are inherently imperfect at some level. In security, as in business, a metric is usually meant to capture some underlying, intangible goal, and it almost always does so less well than you hoped. Performance management systems are full of metrics that are flawed proxies for what you actually care about. That gap quickly becomes a problem, because there are many ways to boost a score while displeasing the very stakeholders the metric was supposed to serve. Tying financial incentives to a metric is usually a mistake: often it only intensifies the focus on the number rather than the goal.
Though it’s easy to fall into these metrics traps, security organizations can take steps to avoid them. For instance, involving the people who’ll implement a strategy in its formulation makes them more likely to grasp it and less likely to replace it with a metric. Using multiple yardsticks also helps, because it highlights the fact that no single metric captures the strategy.
Effective metrics programs leverage data you already have access to; the mechanics of measurement are the easy part. The hard part is setting appropriately placed expectations that are tied to security strategy. You also need a quality-control mechanism to guard against Green/Yellow/Red ratings that persist as mental anchors long after the underlying situation has changed. And remember that metrics shouldn’t always be service-level objectives (SLOs).
Let’s take a closer look at how to effectively measure and report metrics for a typical SOC.
Data feed health
Your instrumentation monitors your data, assets, and users, so you first need to know how well that instrumentation itself is working. Start by measuring which monitoring points are down. But a feed being up doesn’t mean all is well: there can be delays in receipt, dropped events, and other temporary or permanent anomalies. Measure these too, and review them regularly.
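As a minimal sketch of that first measure, assuming you can pull the newest event timestamp per feed from your SIEM or log platform (the feed names and thresholds below are illustrative), flagging silent or lagging feeds might look like this:

```python
from datetime import datetime, timedelta, timezone

# Maximum silence tolerated per feed before it gets flagged.
# Illustrative feed names and values; tune to your environment.
THRESHOLDS = {
    "firewall-syslog": timedelta(minutes=5),
    "edr-telemetry": timedelta(minutes=15),
    "dns-logs": timedelta(hours=1),
}

def stale_feeds(last_seen: dict[str, datetime]) -> list[str]:
    """Given the newest event timestamp per feed (queried from your
    SIEM), return the feeds whose silence exceeds their threshold."""
    now = datetime.now(timezone.utc)
    never = datetime.min.replace(tzinfo=timezone.utc)
    alerts = []
    for feed, max_lag in THRESHOLDS.items():
        lag = now - last_seen.get(feed, never)
        if lag > max_lag:
            alerts.append(f"{feed}: no events for {lag}")
    return alerts
```

Reviewing this output on a schedule, rather than only when an investigation stalls, is what turns feed health into a metric instead of a surprise.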
Coverage
For your coverage measurements, tracking the absolute number and percentage of coverage per compute environment, enclave, or domain is a worthwhile place to start. As you get more granular, down to the network, OS, and application layers, you gain insight into what is and isn’t working in your approach. Mapping your alerts and detections to the MITRE ATT&CK framework is an ideal way to build a holistic view of whether your current approach will guard against the various tactics threat actors use against you.
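As an illustrative sketch, assuming you maintain a mapping of detection rules to ATT&CK technique IDs (the rule names and technique sets below are hypothetical and trimmed for brevity), coverage per tactic reduces to simple set arithmetic:

```python
# Hypothetical mapping: detection rule -> ATT&CK technique IDs it covers.
RULE_TECHNIQUES = {
    "rule_psexec_lateral": ["T1021.002"],
    "rule_cred_dump": ["T1003"],
    "rule_phish_attach": ["T1566.001"],
}

# Techniques you care about, grouped by tactic (trimmed for illustration).
TACTIC_TECHNIQUES = {
    "initial-access": {"T1566.001", "T1190"},
    "credential-access": {"T1003", "T1110"},
    "lateral-movement": {"T1021.002", "T1550"},
}

def coverage_by_tactic() -> dict[str, float]:
    """Percentage of tracked techniques per tactic with at least one rule."""
    covered = {t for techs in RULE_TECHNIQUES.values() for t in techs}
    return {
        tactic: 100 * len(techs & covered) / len(techs)
        for tactic, techs in TACTIC_TECHNIQUES.items()
    }

print(coverage_by_tactic())  # e.g. {'initial-access': 50.0, ...}
```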
Coverage is always a moving target. There will always be more stones to turn over, another environment to cover, another customer to serve. Don’t shoot for 100%, because there is no spike-the-football moment with coverage. Instead, focus on the percentage of systems “managed”: assets are inventoried, tied to a user and/or business unit, configuration-checked, and risk-assessed. That way, your SOC knows what it is monitoring and can more clearly identify the rogue entities in the environment.
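Expressed as a number, “managed” might be computed like this minimal sketch, where the asset fields are assumptions standing in for whatever your inventory actually records:

```python
# Hypothetical asset records; an asset counts as "managed" only when it is
# inventoried, tied to an owner, config-checked, and risk-assessed.
assets = [
    {"owner": "finance", "config_checked": True, "risk_assessed": True},
    {"owner": None, "config_checked": True, "risk_assessed": False},
    {"owner": "it-ops", "config_checked": False, "risk_assessed": True},
]

def is_managed(asset: dict) -> bool:
    return bool(asset["owner"]) and asset["config_checked"] and asset["risk_assessed"]

managed_pct = 100 * sum(map(is_managed, assets)) / len(assets)
print(f"{managed_pct:.1f}% of inventoried assets are managed")
```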
Scanning & sweeping
At the basic level, you are probably scanning on-premises and cloud assets for vulnerabilities. Measure the number and percentage of known bugs, as well as how long it took to compile vulnerability and risk status during your last critical headline-CVE fire drill. As you progress, start measuring the time it takes to sweep for a given vulnerability or indicator of compromise (IOC) and compile the results, broken out across workstations versus servers. Drill down further to insights specific to a given domain or identity plane, then zero in on everything internet-facing.
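A rough sketch of timing an IOC sweep and breaking the results down by asset class might look like the following; sweep_host is a hypothetical stand-in for whatever EDR or query API you actually call:

```python
import time

def sweep_host(host: str, ioc: str) -> bool:
    """Placeholder for your EDR/query API call; returns True on a hit."""
    return False

def timed_sweep(hosts_by_class: dict[str, list[str]], ioc: str) -> dict:
    """Sweep for one IOC and record host count, hits, and elapsed
    time per asset class (e.g. workstations versus servers)."""
    results = {}
    for asset_class, hosts in hosts_by_class.items():
        start = time.monotonic()
        hits = [h for h in hosts if sweep_host(h, ioc)]
        results[asset_class] = {
            "hosts": len(hosts),
            "hits": len(hits),
            "seconds": round(time.monotonic() - start, 2),
        }
    return results

print(timed_sweep({"workstations": ["ws1", "ws2"], "servers": ["srv1"]},
                  "bad-hash-abc123"))
```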
You should eventually have an accurate count and percentage of the assets you can’t or don’t cover, and be able to answer two questions: “How fruitful is our scanning?” and “How effective is our patching?”
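A back-of-the-envelope way to turn those two questions into numbers (all field names and figures below are illustrative):

```python
# Illustrative scan summary for one environment.
scan = {
    "assets_total": 1200,          # assets you believe exist
    "assets_scanned": 1080,        # assets the scanner actually reached
    "vulns_found": 4300,           # findings from the last scan cycle
    "vulns_remediated_30d": 3010,  # findings closed within 30 days
}

scan_coverage = 100 * scan["assets_scanned"] / scan["assets_total"]
patch_rate = 100 * scan["vulns_remediated_30d"] / scan["vulns_found"]

print(f"Scan coverage: {scan_coverage:.1f}% "
      f"({scan['assets_total'] - scan['assets_scanned']} assets unreached)")
print(f"30-day remediation rate: {patch_rate:.1f}%")
```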
Analytics & analyst performance
Next, you need insight into how well the instrumentation is working for you, or better yet, how well you are using it. Here again, it’s appropriate to tie your efforts to MITRE ATT&CK. Be thorough in your coverage, documentation, and standards of output. All the triage effort in the world is useless if something is missed, or worse, found but never acted upon.
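One hedged starting point for measuring analytics and analysts together is a per-technique true-positive rate computed from closed cases; the record fields below are assumptions about what your case management system exports:

```python
from collections import Counter

# Hypothetical closed-case records: each alert maps to an ATT&CK
# technique and a disposition assigned at triage.
cases = [
    {"technique": "T1003", "disposition": "true_positive"},
    {"technique": "T1003", "disposition": "false_positive"},
    {"technique": "T1566.001", "disposition": "true_positive"},
]

def precision_by_technique(cases: list[dict]) -> dict[str, float]:
    """Fraction of alerts per ATT&CK technique triaged as true positive."""
    totals, true_pos = Counter(), Counter()
    for case in cases:
        totals[case["technique"]] += 1
        if case["disposition"] == "true_positive":
            true_pos[case["technique"]] += 1
    return {t: true_pos[t] / totals[t] for t in totals}

print(precision_by_technique(cases))  # {'T1003': 0.5, 'T1566.001': 1.0}
```

Low precision on a technique points at a noisy analytic; consistently slow or inconsistent dispositions point at process or training gaps. Either way, the numbers only matter if they feed back into tuning.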