Improving detection of malicious files with an innovative approach based on Deep Learning

Hibou is a malware detection module powered by deep learning. It works on Windows executable files (PE files) and gives, for each sample, a “score of potential maliciousness”. This state-of-the-art deep learning method to detect malicious files is now embedded in HarfangLab’s EDR.

How Deep Learning improves malware detection

Deep Learning and classical rules: a complementary approach

Deep learning approaches are complementary to other techniques, such as other machine learning methods and the more classical rule and signature-based approaches. Where the latter techniques run an in-depth analysis of specific features of the files, deep learning solutions survey the general form of the complete binary file to make a prediction. Deep learning doesn’t rely on hand-picked features but learns them implicitly instead. This gives deep learning a perspective on executable files that other techniques lack, and makes it a relevant complement to their predictions.

Additionally, since deep learning methods produce predictions only from the general structure of files, they benefit from a capacity for generalization, allowing them to detect threats that have never been seen before.

Therefore, in HarfangLab’s EDR, deep learning methods enrich the predictions of rule and signature-based approaches and of other Machine Learning-based approaches.

How Hibou improves HarfangLab’s EDR detection capacity

As part of HarfangLab’s AI binaries module (HL-AI-Binaries), which performs file analysis and malware detection, Hibou runs directly on the endpoints protected by the EDR to identify and block threats. Malicious files are spotted quickly (under 200 ms) and blocked immediately, before execution.

Besides its speed, one of Hibou’s advantages is that it works completely offline with low memory consumption. Alerts raised by Hibou are centralized in the HL-AI-Binaries module and aggregated across endpoints for further analysis.

Key figures about Hibou

We evaluated HL-AI-Binaries’ detection performance with and without Hibou. The evaluation was run on 10,000 malware samples and 10,000 benign files (goodware). The malware samples are a subset of HarfangLab’s dataset of several million files, selected as the most critical malware in this dataset.

We first noticed that Hibou spots malware that HL-AI-Binaries does not detect without it. On the 10,000 malware samples, the combination of HL-AI-Binaries and Hibou leaves about three times fewer samples undetected: the number of undetected samples drops from 77 to 25, which corresponds to 0.77% and 0.25% of the set respectively.
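As a quick check of these figures, the miss rates can be recomputed from the raw counts. The sketch below uses only the numbers quoted in the text (77 and 25 undetected samples out of 10,000); the function and variable names are illustrative.

```python
def miss_rate_percent(undetected: int, total: int) -> float:
    """Return the share of samples that escaped detection, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * undetected / total


TOTAL_MALWARE = 10_000  # size of the malware evaluation set quoted in the text

without_hibou = miss_rate_percent(77, TOTAL_MALWARE)  # HL-AI-Binaries alone
with_hibou = miss_rate_percent(25, TOTAL_MALWARE)     # HL-AI-Binaries + Hibou
reduction_factor = 77 / 25                            # how many times fewer misses

print(f"{without_hibou:.2f}% -> {with_hibou:.2f}% "
      f"(about {reduction_factor:.2f}x fewer undetected samples)")
# prints: 0.77% -> 0.25% (about 3.08x fewer undetected samples)
```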

Looking at the number of false positives, there is no bar for the combination of HL-AI-Binaries and Hibou: in this evaluation, a critical malware detection made by both HL-AI-Binaries and Hibou always turned out to be real malware and never a false alert.

Here is a closer look at the type of malware detected by HL-AI-Binaries with/without Hibou:

The range of malware types detected is improved with Hibou, in particular for miner and backdoor type malware. Note that the data is plotted as percentages and that the bars are ordered by the number of samples of each type. Most of the samples in the analyzed subset are ransomware, and a few are miners.

How Hibou works

Now that you are convinced of Hibou’s relevance for malware detection in HarfangLab’s EDR, let’s look in more detail at the method Hibou uses.

With Convolutional Neural Networks (CNN), the computer vision field has seen significant breakthroughs in recent years in the processing and classification of images. Hibou brings this progress to the cybersecurity field by analyzing executable files exactly as if they were images (as described in a McAfee article).

To do so, Hibou first converts the file into an image by directly considering the raw bytes of the file and casting them into an image-like representation of the file. This representation is then given as input to a Convolutional Neural Network (CNN), designed to discriminate images according to their malicious or legitimate nature.
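The byte-to-image step can be illustrated with a minimal sketch. The article does not describe Hibou’s actual casting, so everything here is an assumption made for illustration: one byte per grayscale pixel, a fixed grid size, truncation of long files, and zero-padding of short ones. The name `bytes_to_image` is hypothetical.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 256, height: int = 256) -> List[List[int]]:
    """Cast raw bytes into a fixed height x width grid of 0..255 intensities.

    Assumption (not from the article): input longer than width * height is
    truncated, and shorter input is padded with zeros.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    size = width * height
    flat = list(data[:size])                # each byte becomes one pixel value
    flat.extend([0] * (size - len(flat)))   # zero-pad up to the fixed size
    # Slice the flat sequence row by row to obtain a 2D matrix.
    return [flat[r * width:(r + 1) * width] for r in range(height)]


# Usage with a tiny synthetic buffer (not a real PE file).
sample = b"MZ" + bytes(range(10))
img = bytes_to_image(sample, width=4, height=4)
for row in img:
    print(row)
```

A grid like this is what a CNN would receive as a single-channel image. In practice the choice of width and the handling of very long or very short files affect what spatial patterns the network can pick up.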

Hibou’s architecture was designed using the core mechanism of one of the most famous CNN architectures, Xception, designed by François Chollet, a French researcher. We modified and optimized this architecture specifically for the malware detection task. The chosen architecture was then trained and evaluated on a dataset of several million files owned by HarfangLab.
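The article does not name the core mechanism it borrows. Xception is built around depthwise separable convolutions, so assuming that is what is meant, the sketch below compares the number of weights in a standard convolution with the number in its depthwise separable counterpart. The layer sizes are hypothetical and biases are ignored; this is not a description of Hibou’s actual layers.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise separable convolution (biases ignored).

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable, round(standard / separable, 2))
# prints: 294912 33920 8.69
```

Fewer weights per layer generally means a smaller model and less computation, which is consistent with the low memory use and on-endpoint execution described earlier, though the article does not state that this is the reason for the choice.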

With this neural network architecture, Hibou and the image-based deep learning approach have proven effective in real-world scenarios, and represent an efficient additional detection layer.

What’s next? Further research on Hibou is being considered, such as improving the image representation. How? By optimizing the casting of a variable-length one-dimensional sequence (bytes) to a fixed two-dimensional matrix (image), or even by designing one-dimensional convolutional neural networks.