HL-AI Binaries depending on version

Description

Ashley Binaries is a malware detection module powered by machine learning and deep learning. It works on executable files (PE files on Windows and ELF files on Linux) and assigns each sample a “score of potential maliciousness”.

This state-of-the-art module for detecting malicious files is embedded in HarfangLab’s EDR. It is present in the agents, runs directly on the local machine, and can block the threats it identifies. We evaluated how the performance of Ashley Binaries evolved over the past year across the different versions of HarfangLab’s agent. Here are the results.

Performance evolution

Currently, Ashley Binaries can be used with two different alert levels, ‘Critical’ and ‘High’. The ‘Critical’ level only raises alerts on files that are extremely likely to be malware and produces very few false alerts. The ‘High’ level raises alerts on all files that may be malware and hence produces more false alerts than the ‘Critical’ level.

Here is the performance of Ashley Binaries on Windows and Linux at each of these levels.
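To make the two levels concrete, here is a minimal sketch of how a maliciousness score could be turned into an alert decision. It assumes the score lies between 0 and 1 and that each level corresponds to a cut-off on that score; the threshold values and the `raises_alert` helper are hypothetical, chosen only for illustration, and are not the values used by Ashley Binaries.

```python
# Hypothetical cut-offs, for illustration only: the article does not give
# the real thresholds, nor the range of the maliciousness score.
THRESHOLDS = {"Critical": 0.99, "High": 0.80}


def raises_alert(score: float, level: str) -> bool:
    """Return True if a file with this score raises an alert at this level."""
    if level not in THRESHOLDS:
        raise ValueError(f"unknown alert level: {level!r}")
    return score >= THRESHOLDS[level]


# A file with an intermediate score is reported only at the 'High' level.
print(raises_alert(0.90, "Critical"))  # False
print(raises_alert(0.90, "High"))      # True
```

Under this assumption, lowering the cut-off from ‘Critical’ to ‘High’ can only add alerts, which is why the ‘High’ level catches more malware and also raises more false alerts.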

On Windows

All tests were carried out on 14,000 “dangerous” malicious files and 6,500 “benign” files never before seen by Ashley Binaries.

Critical Level

The following graphs display the performance improvements made to Ashley Binaries for Windows executable files. On the horizontal axis, the version is the version of the agent used. In the first and second graphs, the vertical axis is, respectively, the percentage of false positives and the percentage of false negatives of the agent.

Percentage of false negatives at the critical level on different versions of the Windows agent

It clearly appears that HL-AI-Binaries has always performed well at detecting dangerous Windows malware (around 200 files undetected out of the 14,000, on average), and this level of detection was maintained while drastically decreasing the number of false alerts: 35 times fewer false alerts in a single year!
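As a quick check of what "around 200 undetected out of 14,000" means as a percentage, the sketch below computes the false negative rate. It assumes the rate is the number of missed malicious files divided by the total number of malicious files tested; the function name is ours, not part of the product.

```python
def false_negative_rate(missed: int, total_malicious: int) -> float:
    """Percentage of malicious files that raised no alert."""
    if total_malicious <= 0:
        raise ValueError("total_malicious must be positive")
    return 100.0 * missed / total_malicious


# Figures quoted in the text: about 200 misses out of 14,000 malicious files.
fn_percent = false_negative_rate(200, 14_000)
detection_percent = 100.0 - fn_percent
print(f"{fn_percent:.2f}% missed, {detection_percent:.2f}% detected")
# prints "1.43% missed, 98.57% detected"
```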

High Level

What about the high level? In HarfangLab’s EDR, one can set the alert level of Ashley Binaries to high. This level is meant to trigger alerts more readily in order to gather more weak signals, at the cost of a higher rate of false alerts.

Percentage of false negatives at the high level on different versions of the Windows agent

As expected, the graphs show that, across the different versions, the false positive ratio is higher and the false negative ratio is lower than at the critical level. Here as well, over the past year, the detection level remained nearly unchanged while the number of false alerts plummeted.

Starting from version 2.18, we are trying a more polarized approach, which creates the gap with version 2.17 that is visible on both graphs.

On Linux

All tests were done on 450 malware files and 6,250 benign files that have never been seen by Ashley Binaries.

Here are the false positive results on Linux executables:

Percentage of false positives at the critical level on different versions of the Linux agent

All versions of Ashley Binaries correctly detect all the malware files at the critical level, so we will neither plot the evolution of false negatives nor study the high level on Linux. This is mainly due to the fact that the malware landscape in Linux environments is much smaller than in Windows environments.

Here again, the number of false alerts decreased significantly over the past year, dropping by a factor of 41: in 2.10, Ashley Binaries raised 41 false alerts, and in 2.18 it raises only a single one!
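The Linux figures can be turned into rates in a few lines. This sketch assumes the false positive rate is the number of false alerts divided by the 6,250 benign files tested; the helper names are hypothetical.

```python
BENIGN_FILES = 6_250  # benign Linux files in the test set described above


def false_positive_rate(false_alerts: int, total_benign: int) -> float:
    """Percentage of benign files that wrongly raised an alert."""
    if total_benign <= 0:
        raise ValueError("total_benign must be positive")
    return 100.0 * false_alerts / total_benign


def fold_reduction(before: int, after: int) -> float:
    """How many times fewer false alerts there are after the change."""
    if after <= 0:
        raise ValueError("after must be positive to compute a ratio")
    return before / after


rate_2_10 = false_positive_rate(41, BENIGN_FILES)  # about 0.656 %
rate_2_18 = false_positive_rate(1, BENIGN_FILES)   # about 0.016 %
print(f"2.10: {rate_2_10:.3f}%  2.18: {rate_2_18:.3f}%  "
      f"reduction: x{fold_reduction(41, 1):.0f}")
```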

Detection performance on recent malware families

To investigate further, we analyzed the results of Ashley Binaries on recent malware families such as DarkSide or Hades. A total of 240 malware samples from 13 families were selected to run this evaluation.

Detection percentage of different Windows malware families by Ashley Binaries at the critical level

Detection percentage of different Windows malware families by Ashley Binaries at the high level

We can see that, at the high level, all the malware samples are detected by Ashley Binaries in version 2.18!

Detection performance on these new malware samples is also very good for older models, even those designed over a year ago. The latest models remain relevant on these samples as well and slightly outperform the older models.

Improvements made

So how did we obtain such staggering improvements?

We worked on three main subjects to improve Ashley Binaries: increasing the quality of our data, improving the feature engineering of our machine learning models, and deploying a new Deep Learning (DL) model. Here is a timeline of these enhancements in the different versions of the agent.

Enhancements made in the different versions of Ashley Binaries

Data qualification

As the famous saying goes, “garbage in, garbage out”. One cannot expect good performance from models that are given poorly qualified data, or no data at all.

Always more data

The first obvious step is to acquire more data, which we have been doing continually over the past years. HarfangLab has now built a proprietary dataset of over 50 million files that can easily be used for many different applications (and in particular for malware detection). Every month, millions of files are added to this dataset. All this new data encourages us to retrain the models regularly to boost their performance on the latest threats.

More information on the data

The second step is to improve the information available on the files and to refine how the files used by the models are identified, in order to raise model performance.

For this, we improved the labeling of executable files as malware or goodware in version 2.14. Identifying malware as such is actually not an easy task. There are many different ways to do it (manual analysis, antivirus software, detonation, …), and we focused on improving our criteria for assigning the malware label.

Feature engineering

On Windows

The first versions of our Windows model used around 160 features. In version 2.13, we implemented 80 new features covering new aspects of the PE file structure (for example, version metadata fields, statistics on the resources, …). These additional features allowed us to halve the false positive rate.

On Linux

On Linux, the model used 83 features until version 2.12. Following the same process as for Windows, we added 35 new features in collaboration with our threat intelligence team. The results were even better, with a 5-fold reduction in false positives while maintaining the same recall rate.
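For readers unfamiliar with the term, a “feature” is simply a number or category computed from the file and fed to the model. The sketch below illustrates the idea with three generic static features (format guessed from the magic bytes, file size, and byte entropy). These are common textbook examples picked for illustration; they are an assumption on our part and not the features actually used by Ashley Binaries, and the function names are hypothetical.

```python
import math
from collections import Counter


def file_format(data: bytes) -> str:
    """Guess the executable format from the leading magic bytes."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ":
        # Only the DOS header magic is checked here; a stricter check would
        # also locate and verify the "PE\0\0" signature.
        return "PE"
    return "unknown"


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(data: bytes) -> dict:
    """Build a tiny, illustrative feature dictionary for one file."""
    return {
        "format": file_format(data),
        "size": len(data),
        "entropy": byte_entropy(data),
    }


# Usage on in-memory bytes; in practice the bytes would be read from a file.
print(extract_features(b"\x7fELF" + bytes(60)))
```

A real model combines many such values (a few hundred in the versions described above), which is why adding well-chosen features can change the false positive rate noticeably.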

Deep Learning

A major effort this past year went into the design, implementation, and deployment of a new state-of-the-art deep learning algorithm. This algorithm is released in Ashley Binaries 2.18.

This model complements the Machine Learning models already present in Ashley Binaries. It analyzes the overall structure of executable files, while the other Ashley Binaries modules scrutinize the content of these files through the prism of previously defined features. This Deep Learning model gives Ashley Binaries a completely new perspective on executable files, and enables a considerable improvement in performance in version 2.18.

Conclusion

In the past year, we have worked on all aspects of AI to enhance our malware detection: we refined our datasets, continuously built on the models we had, and added new state-of-the-art modeling modules. This work led to a drastic reduction in our false positives without diminishing our detection capabilities.
