Key metrics to evaluate EDR performance

Want to assess the performance of an EDR, but not sure what to measure? Here are some key metrics for evaluating how well a tool detects and responds to security threats, and which needs it meets.

First things first: you will need to consider all or some of these metrics, depending on your organization's priorities, requirements and context. You can challenge these metrics with the supplier, or with your vendor if the solution is to be managed by an MSSP.

How an EDR facilitates the work of experts, giving them relevant data without overloading them with alerts, is something you can test during the POC phase with the publisher’s teams or your vendor.

The sensitivity of the EDR also comes into play: it can be configured to be more or less sensitive, with stricter or looser rules depending on the type of threat to protect against. So, to fully meet your expectations, an EDR always needs to be set up properly according to your needs, whether by in-house teams or by a partner.

False Positive Rate 

What percentage of alerts concern non-malicious events or indicators? What does the solution do to minimize it, how do the whitelisting features work, and is it possible to add rules?
This metric depends a lot on your context. False positives can be induced by the detection rules, and their number can be reduced with whitelists (if you rely on an MSSP, they can manage this during the early deployment phase). Also keep in mind that an alert may fire without there being a security problem on its own, yet once correlated with other events, it can reveal a real problem.
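
As a rough illustration, the percentage described above can be computed from the verdicts your analysts assign to alerts during triage. This is a minimal sketch: the verdict labels and the function name are hypothetical, not taken from any EDR product. It follows this article's definition (share of alerts that turn out to be benign), which differs from the statistical false positive rate computed over all benign events.

```python
from collections import Counter

def false_positive_share(verdicts):
    """Percentage of triaged alerts that analysts marked as false positives.

    `verdicts` is a list of hypothetical labels, one per triaged alert:
    "true_positive" or "false_positive".
    """
    counts = Counter(verdicts)
    total = counts["true_positive"] + counts["false_positive"]
    if total == 0:
        return 0.0  # no triaged alerts yet, nothing to measure
    return 100.0 * counts["false_positive"] / total

# Illustrative sample: 5 triaged alerts, 2 of them benign
sample = [
    "false_positive",
    "true_positive",
    "false_positive",
    "true_positive",
    "true_positive",
]
print(false_positive_share(sample))  # 40.0
```

Tracking this figure before and after a whitelist change gives a simple way to check whether the tuning actually reduced the noise.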

False Negative Rate

Does the solution potentially miss some threats, and what kind? What does the supplier do to improve the solution? 
It may sound obvious, but this is a metric you can only measure in real conditions, and the POC stage is designed for exactly that.
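
One way to approximate this during a POC is to run a set of controlled attack simulations and record which ones the EDR flagged. The sketch below assumes such a record exists as a simple mapping; the test names are hypothetical placeholders, and the result only reflects the scenarios you chose to run.

```python
def false_negative_rate(test_results):
    """Return (percentage of missed tests, list of missed test names).

    `test_results` maps a hypothetical simulation name to True if the
    EDR raised an alert for it, False if it stayed silent.
    """
    if not test_results:
        raise ValueError("at least one simulated test is required")
    missed = [name for name, detected in test_results.items() if not detected]
    rate = 100.0 * len(missed) / len(test_results)
    return rate, missed

# Illustrative POC run: 4 simulations, 1 missed
poc_run = {
    "credential_dumping_sim": True,
    "ransomware_encryption_sim": True,
    "lateral_movement_sim": False,
    "malicious_macro_sim": True,
}
rate, missed = false_negative_rate(poc_run)
print(rate, missed)  # 25.0 ['lateral_movement_sim']
```

The list of missed scenarios is often more useful than the rate itself, since it answers the "what kind?" question you can then raise with the supplier.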

Mean Time to Detect (MTTD)

How long does it take the EDR to detect a security event? Does it detect via a sandbox or in the cloud, or directly in the agent, closest to the threat?
Faster is better: timing is crucial when responding to an incident. Bear in mind that in some cases the detection mode itself adds latency, for example when detection goes through a sandbox or a correlation engine.
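
To make the metric concrete, MTTD is commonly taken as the average delay between the moment an event occurred and the moment it was detected. The sketch below assumes you can export both timestamps for each event; the sample values and the function name are made up for illustration.

```python
from datetime import datetime
from statistics import mean

def mean_time_to_detect(events):
    """Average detection delay in seconds.

    `events` is a list of (occurred_at, detected_at) datetime pairs.
    """
    delays = [
        (detected_at - occurred_at).total_seconds()
        for occurred_at, detected_at in events
    ]
    return mean(delays)

# Illustrative events: detected after 30 s, 90 s and 10 min
events = [
    (datetime(2024, 1, 10, 9, 0, 0), datetime(2024, 1, 10, 9, 0, 30)),
    (datetime(2024, 1, 11, 14, 0, 0), datetime(2024, 1, 11, 14, 1, 30)),
    (datetime(2024, 1, 12, 22, 0, 0), datetime(2024, 1, 12, 22, 10, 0)),
]
print(mean_time_to_detect(events))  # 240.0
```

In this sample a single slow detection, such as one that went through a sandbox, pulls the mean well above the two fast ones, so it can be worth looking at the individual delays as well as the average.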

Mean Time to Investigate (MTTI)  

How long will it take the analysts (in-house or the vendor's) to investigate after an alert, including information gathering, data analysis and criticality assessment? How open is the solution to interfacing with other monitoring and incident management solutions (ITSM, SIEM, etc.)?
This is a metric you can simulate or approximate on the basis of the data available and the time you will need to process it, depending on your resources or the performance of your in-house SOC; if a partner operates the solution, it can be contractually defined.

Mean Time to Respond/Remediate (MTTR)  

How long does it take for the EDR to respond, or to contain and eradicate the threat? What can be automated? How does it help to optimize the recovery process? What is the data retention period? If an MSSP operates the solution, what SLA does the contract specify?

Service Availability (Uptime)

How resilient is the solution? What does the contract provide for in terms of stack availability? How can the solution and its detection engines keep running when the network is down and agents can no longer communicate with the manager?
Note that the more operations are carried out in the agent, the better the availability in all circumstances. 
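
When comparing a contractual availability commitment with what you actually observe, the arithmetic is simple: uptime is the share of the period during which the service was reachable. The sketch below assumes you track downtime in minutes; the 30-day period and the downtime figure are illustrative, not drawn from any real contract.

```python
def availability_percent(period_minutes, downtime_minutes):
    """Percentage of the period during which the service was available."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes

# Illustrative month: 30 days with 432 minutes of cumulated downtime
period = 30 * 24 * 60  # 43200 minutes
print(availability_percent(period, 432))  # 99.0
```

Read the other way round, a 99% commitment over 30 days tolerates about 7 hours of downtime, which is a useful figure to have in mind when discussing the contract.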

And beyond the figures, which criteria should you evaluate to ensure that an EDR will enable you to achieve the goals of your cyber strategy?