Co-authored by Yihua Liao, Ari Azarafrooz, and Yi Zhang
Ransomware attacks are on the rise. Many organizations have fallen victim to ransomware attacks. While there are different forms of ransomware, it typically involves the attacker breaching an organization’s network, encrypting a large amount of the organization’s files, which usually contain sensitive information, exfiltrating the encrypted files, and demanding a ransom. Therefore, a sudden increase of encrypted data movement in the corporate network traffic can be a strong indication of ransomware infection. To effectively detect such behavior patterns, at Netskope, we have developed the capability to detect encrypted files using machine learning (ML) and generate encrypted data movement alerts as part of Advanced UEBA (user and entity behavior analytics). This has helped our customers to identify ransomware attacks as they unfold in their network. One example is to detect ransomware on unmanaged devices. In this blog post, we will explain the technology behind encrypted file detection and Advanced UEBA, which is part of a pending patent application.
ML-based encrypted file detection
The sequence of bytes in an encrypted file tends to be more random than unencrypted files, which is often manifested in some statistical measures of randomness and information density in the file. Therefore, these statistical tests can be helpful in determining whether a file is encrypted or not. We have explored various statistical tests, including:
- Chi-square Test
- Entropy
- Arithmetic Mean
- Monte Carlo Value for Pi
- Serial Correlation Coefficient
However, our analysis shows that using any of these statistical tests alone is not sufficient to identify encrypted files and can generate excessive false positives. For example, some compressed files also