Feb 18 2020
AI and Machine Learning at Netskope

This is the first in a series of articles focused on AI/ML.  

The past few years have witnessed rapid developments in artificial intelligence (AI) and machine learning (ML). Thanks to the breakthroughs in deep learning, such as convolutional neural networks (CNN) for image recognition and transformers for natural language processing (NLP), AI/ML is now used to solve many real-world problems with great accuracy across different industries, including cybersecurity. AI/ML models have the potential to detect unknown threats and anomalous behavioral patterns, which makes them an indispensable part of any comprehensive, multi-layered cybersecurity solution. 

A leader in cloud security, Netskope is integrating the latest AI/ML technology into its data and threat protection features, as well as its business operations. At Netskope, we have a team of dedicated data scientists, security researchers, and engineers who hold over 100 patents and have track records of solving security and fraud problems in different domains. Leveraging our expertise in AI/ML and security, we are developing large-scale AI/ML solutions for cloud security. In this blog post, we will give you an overview of Netskope’s data, the types of problems we are trying to solve with AI/ML, and some of the technical challenges our team faces in addressing these problems.

Netskope’s Data Advantage

Data powers AI and machine learning solutions. Netskope’s data advantage lies in the breadth and depth of corporate user traffic that we protect. Every day the Netskope Security Cloud processes billions of events and files, capturing a wide variety of user activities in:

  • SaaS applications, such as Microsoft Office 365, Box, Salesforce, and G Suite
  • Public cloud infrastructure services, such as AWS, Microsoft Azure, and Google Cloud Platform
  • Websites that users have visited

Netskope also has a comprehensive understanding of all enterprise data stored and transacted in the cloud. For example, Netskope has intimate knowledge of how a file was uploaded, downloaded, or shared within a managed cloud storage app, and transferred to unmanaged cloud apps and personal devices. This contextual understanding of user activities and corporate data, coupled with the breadth of our cloud and web traffic, enables sophisticated applications of AI/ML models.

Historically, enterprise security products were hardware appliances that were managed independently across branch offices. This made generating insights and actionable intelligence across all of the enterprise security infrastructure practically impossible for legacy security vendors. With a cloud-native, cloud-scale approach to cloud security, Netskope is uniquely positioned to leverage enterprise data from users and devices on/off the corporate network, stream that to a central brain in the cloud, and build centralized intelligence.

AI/ML Everywhere

At Netskope, we embrace AI/ML technology wherever it is applicable. Some of the use cases are:

  • Compliance and Privacy – Help organizations comply with regulations such as GDPR, CCPA, PCI, and HIPAA.
    • Detect sensitive information in documents, images and application traffic flows
    • Use embedded user and entity behavior analytics (UEBA) to detect malicious insiders, compromised accounts, brute force attacks, and data exfiltration attacks
  • Enterprise Security – Protect enterprise assets from being compromised and used as a launchpad for malicious activities, including data exfiltration, botnets, and spam.
    • Detect malware using machine learning models as a complementary approach to anti-virus signatures, threat intelligence, heuristics, and sandboxes
    • Categorize and detect malicious web domains, URLs, and web content
  • Cloud DevOps – Proactively monitor the cloud production environment for efficient and effective delivery of a secure access service edge (SASE) solution.
    • Automate Netskope’s internal workflows, including automated classification of SaaS and web apps
    • Production monitoring, troubleshooting, and incident prioritization for Netskope Security Cloud operations

Challenges

Cybersecurity has its own challenges when it comes to the adoption of AI and machine learning technology. Let’s take a look at some of those challenges. 

  • High accuracy requirements. Security policies are often evaluated in real time, for example, when users are browsing websites or accessing data in the cloud. Millions of critical decisions that impact user experience need to be made every day. False positives and false negatives can both be costly, which poses great challenges for machine learning models. While a 0.1% false positive rate may be acceptable in other domains, such as ad targeting or AI assistants, it can be too high for cybersecurity applications. Therefore, we have to carefully design product flows that include AI/ML models to achieve a good balance between security and user experience.
  • Lack of labeled data. High-accuracy AI/ML models require high-quality labeled training data, and they need a lot of it. It’s not always easy to get labeled data in cybersecurity. For instance, malicious insider attacks in a large corporation may be rare, and determining the ground truth of a particular user’s behavior usually requires manual validation by security domain experts, which can be costly and time-consuming. To address the lack of labeled data, our data scientists have to leverage pre-trained models, come up with innovative ways to synthesize labeled data, or use unsupervised learning that doesn’t require labels (at the expense of accuracy).
  • Changing patterns. Patterns of cyberthreats and malicious insiders change constantly. A machine learning model trained with last month’s data may not work as well this month. To effectively detect the changing patterns, the model needs to be retrained continuously with the latest labeled data as it comes into the system.
  • Privacy concerns. Sensitive data in a corporate environment includes customer PII, trade secrets, financial data, and so on. When applying AI/ML, privacy concerns need to be addressed adequately throughout the model development lifecycle, including model training, validation, prediction, and feedback collection. Investment in privacy-preserving machine learning is necessary to ensure no violation of data privacy.   
  • Model interpretability. When we make a decision based on machine learning models, we often need to explain that decision to the end user or security analysts. For example, why does the model consider a certain user behavior anomalous? It’s always helpful to provide additional evidence and reasons when flagging such behavior.
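
To make the accuracy challenge above more concrete, here is a minimal back-of-the-envelope sketch in Python. It is an illustration only, not a description of Netskope's models or traffic: the function names, the daily decision volume, the detection rate, and the fraction of malicious events are all hypothetical values chosen for the example. Only the 0.1% false positive rate comes from the discussion above.

```python
def expected_false_positives(decisions_per_day: int,
                             false_positive_rate: float,
                             malicious_fraction: float = 0.0) -> float:
    """Expected number of benign events wrongly flagged per day."""
    benign_events = decisions_per_day * (1.0 - malicious_fraction)
    return benign_events * false_positive_rate


def precision(true_positive_rate: float,
              false_positive_rate: float,
              malicious_fraction: float) -> float:
    """Fraction of flagged events that are actually malicious."""
    true_pos = true_positive_rate * malicious_fraction
    false_pos = false_positive_rate * (1.0 - malicious_fraction)
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical inputs: 5 million decisions per day, 1 in 10,000 events
    # malicious, a model that catches 99% of malicious events.
    daily = expected_false_positives(5_000_000, 0.001, malicious_fraction=0.0001)
    prec = precision(0.99, 0.001, malicious_fraction=0.0001)
    print(f"Expected false alarms per day: {daily:,.0f}")
    print(f"Share of alerts that are real threats: {prec:.1%}")
```

Under these assumed numbers, a 0.1% false positive rate still produces thousands of false alarms per day, and only about 9% of the flagged events are real threats, because malicious events are so rare compared with benign ones. This is one way to see why a rate that is tolerable elsewhere can be too high in a security product.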

At Netskope, we are leveraging our unique data, the latest AI/ML technology, and our security expertise to address some of these challenges and solve cloud security problems. Multiple types of machine learning models have been deployed and refreshed in production. In subsequent blog posts, we will go over some of these AI/ML efforts in detail.

About the author
Dr. Yihua Liao is the Director of Data Science at Netskope. His team develops cutting-edge AI/ML technology to tackle many challenging problems in cloud security, including data loss prevention, malware and threat protection, and user/entity behavior analytics. Previously, he led data science teams at Uber and Facebook.