The Importance of a Machine Learning-Based Source Code Classifier

Aug 08 2022

Co-authored by Yihua Liao and Yi Zhang

This is the fifth in a series of articles focused on AI/ML.  

Source code is a critical part of an organization’s intellectual property and digital assets. As more centralized source code repositories move to the cloud, organizations need the right security tools to safeguard their source code.

In December 2020, a software engineer started working at Tesla and immediately began uploading the company’s source code files to his personal Dropbox account. Tesla didn’t confront him about his alleged theft until January 6, 2021. In March 2022, Microsoft confirmed that the Lapsus$ hacking group had compromised an employee account and stolen the company’s source code from Bing, Bing Maps and Cortana. These are just some of the latest examples of sensitive data leaking in the form of source code. 

Challenges of source code detection

It is not easy to determine programmatically whether a text document is source code. There are many different programming languages, and no single pattern describes what source code should look like. As a result, regular expressions alone cannot match source code files with acceptable accuracy.
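To make the limitation concrete, here is a minimal sketch of a keyword-based regular expression check. The pattern, the `looks_like_code` helper, the threshold, and both sample strings are hypothetical and chosen only for illustration; they are not part of any Netskope product.

```python
import re

# Hypothetical keyword pattern: a handful of words common in popular languages.
NAIVE_CODE_PATTERN = re.compile(
    r"\b(def|return|switch|class|import|if|for|while)\b"
)


def looks_like_code(text: str, min_hits: int = 3) -> bool:
    """Flag text as code when it contains at least `min_hits` keyword matches."""
    return len(NAIVE_CODE_PATTERN.findall(text)) >= min_hits


# Ordinary English that happens to reuse programming keywords.
prose = (
    "If you switch plans, the refund will return to your card "
    "while we import your settings for the new class."
)

# A valid Haskell one-liner that contains none of the keywords.
haskell = "main = putStrLn . unwords . map show $ [1, 2, 3]"

print(looks_like_code(prose))    # the prose is flagged (false positive)
print(looks_like_code(haskell))  # the Haskell line is missed (false negative)
```

Adding more keywords to catch the Haskell line would flag even more prose, which is the trade-off described above.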

Furthermore, programming languages are different from natural languages. Therefore, many popular pre-trained NLP (Natural Language Processing) models, such as GPT, BERT, and XLNet, which have shown great results in other document classification problems, are not effective in identifying source code. For example, some terms, punctuation marks, and symbols, such as “str”, “def”, “==”, “>=”, and “:”, are not included in the vocabularies of most pre-trained models, even though they are widely used and carry significant meaning in source code. Conversely, some words, such as “return” and “switch”, appear in both natural English and programming languages, yet with very different semantic meanings.
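The sketch below illustrates why symbols matter. It compares a plain alphabetic word splitter with a tokenizer that keeps operators and punctuation as tokens. Both tokenizers are hypothetical stand-ins written for this example: the word splitter is not how GPT, BERT, or XLNet actually tokenize text, and the code-aware tokenizer is not the one used by the Netskope classifier.

```python
import re

# Hypothetical code-aware tokenizer: identifiers, numbers, a few
# multi-character operators, then any single non-space symbol.
CODE_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|>=|<=|!=|->|[^\s\w]")


def code_tokens(text: str) -> list[str]:
    """Split text while keeping operators and punctuation as tokens."""
    return CODE_TOKEN.findall(text)


def word_tokens(text: str) -> list[str]:
    """Split text into alphabetic words only, discarding every symbol."""
    return re.findall(r"[A-Za-z]+", text)


line = "if x >= 10: return str(x)"

print(word_tokens(line))  # symbols such as ">=" and ":" are lost
print(code_tokens(line))  # symbols survive as separate tokens
```

With the word splitter, the line above is reduced to a few short words that could plausibly come from an English sentence. The code-aware tokens keep the structure that distinguishes it as code.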

The Netskope source code classifier

To address these challenges, we have developed a machine learning (ML) based source code classifier to detect source code files, as part of Netskope’s Advanced DLP (data loss prevention) solution. The source code classifier takes advantage of a proprietary code vocabulary, which consists of 80,000 common phrases in source code. The code vocabulary was extracted from a large corpus of source code sample files, covering more than 20 of the most popular programming languages. 

We generated machine learning features based on the code vocabulary and trained a decision tree-like source code classifier. Compared with a model fine-tuned from a pre-trained language model, the source code classifier achieves a 92% reduction in false positives while keeping the source code detection rate at 99%.
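As a rough illustration of the general idea, the sketch below derives two features from a tiny vocabulary and applies two nested threshold checks, similar in shape to a depth-two decision tree. Everything in it is an assumption made for this example: `TOY_VOCAB` has a dozen single tokens rather than 80,000 phrases, the feature names and thresholds are invented, and the rules are hand-written rather than learned. It does not reproduce the proprietary vocabulary, features, or model described above.

```python
import re

# Hypothetical toy vocabulary; the real one is proprietary and far larger.
TOY_VOCAB = {"def", "return", "str", "import", "==", ">=", ":", "(", ")", "{", "}", ";"}

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|>=|<=|!=|[^\s\w]")


def vocab_features(text: str) -> dict[str, float]:
    """Compute two illustrative features from vocabulary hits."""
    tokens = TOKEN.findall(text)
    if not tokens:
        return {"hit_ratio": 0.0, "distinct_hits": 0}
    hits = [t for t in tokens if t in TOY_VOCAB]
    return {
        "hit_ratio": len(hits) / len(tokens),  # share of tokens in the vocabulary
        "distinct_hits": len(set(hits)),       # how many different entries matched
    }


def is_source_code(
    text: str, ratio_threshold: float = 0.3, distinct_threshold: int = 3
) -> bool:
    """Two nested threshold checks with hand-picked, hypothetical cutoffs."""
    features = vocab_features(text)
    if features["hit_ratio"] < ratio_threshold:
        return False
    return features["distinct_hits"] >= distinct_threshold


snippet = "def area(r):\n    return 3 * r * r"
sentence = "Please return the signed form by Friday."

print(is_source_code(snippet))
print(is_source_code(sentence))
```

The sentence contains “return” but scores low on both features, so a single shared keyword is not enough to flag it. In a trained model, the thresholds and the choice of features would be learned from labeled samples instead of set by hand.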

The source code classifier scans our customers’ network traffic and looks for source code files inline. Its runtime in production is just a few milliseconds. This allows customers to enforce their source code policy and prevent data exfiltration in real time. 

More about Netskope DLP

Netskope’s award-winning DLP solution helps an organization protect the sensitive data it owns or its employees process. Netskope understands the context of cloud and web access, including the user, device, app, instance, activity, and content involved, to accurately identify violations and data risks. From there, it can then allow, challenge, block, quarantine, encrypt, or apply a legal hold, as well as integrate with on-premises solutions to prevent data loss and exposure. Netskope performs accurate inspection through 3,000+ out-of-the-box data identifiers, 25 predefined legal and regulatory compliance templates, and various matching techniques (proximity expression, custom regex and dictionaries, file fingerprinting, exact data matching, and so on).

Netskope Advanced DLP includes machine learning-based file classification that provides a fast and effective way to identify sensitive documents, enabling users to work inline with granular real-time DLP policy controls. ML classifiers can accurately classify documents into different categories, including source code, tax forms, patent documents, and other sensitive legal and financial documents, without needing to identify the specific pieces of sensitive information contained in those files.

For more information, please check out our white paper Protecting Data Using Machine Learning.

Yihua Liao
Dr. Yihua Liao is the Head of AI Labs at Netskope. His team develops cutting-edge AI/ML technology to tackle many challenging problems in cloud security, including data loss prevention, malware and threat protection, and user/entity behavior analytics. Previously, he led data science teams at Uber and Facebook.
