Co-authored by Yihua Liao and Yi Zhang
You have probably heard of how AI technology is used to recognize cats, dogs, and humans in images, a task known as image classification. The same technology that identifies a cat or dog can also identify sensitive data (such as identification cards and medical records) in images traversing your corporate network. In this blog post, we will show you how we use convolutional neural networks (CNN), transfer learning, and generative adversarial networks (GAN) to provide image data protection for Netskope’s enterprise customers.
Image Data Security
Images represent over 25% of the corporate user traffic that goes through Netskope’s Data Loss Prevention (DLP) platform. Many of these images contain sensitive information, including customer or employee personally identifiable information (PII) (e.g., pictures of passports, driver’s licenses, and credit cards), screenshots of intellectual property, and confidential financial documents. By detecting sensitive information in images, documents, and application traffic flows, we help organizations meet compliance requirements and protect their assets.
The traditional approach to identifying sensitive data in an image has been to use optical character recognition (OCR) to extract text from the image and then run pattern matching on the extracted text. This approach, though effective, is resource-intensive and delays detection of security violations. OCR also has difficulty identifying violations in low-quality images. In many cases, we only need to determine the classification of the input image. For example, we would like to know whether an image shows a credit card, without extracting the 16-digit card number and other details. Machine learning-based image classification is an ideal choice for such cases because of its accuracy, speed, and ability to work inline with granular policy controls. We can also combine image classification with OCR to generate more detailed violation alerts.
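To make the contrast concrete, here is a minimal sketch of the traditional OCR-plus-pattern-matching pipeline. The pytesseract library and the card-number regular expression are illustrative assumptions, not the actual implementation behind our DLP platform:

```python
import re

import pytesseract  # assumption: Tesseract OCR wrapper, used here for illustration
from PIL import Image

# Illustrative pattern for a 16-digit card number, allowing optional spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def find_card_numbers(image_path: str) -> list[str]:
    """Extract text from an image with OCR, then match card-like numbers."""
    text = pytesseract.image_to_string(Image.open(image_path))
    return CARD_PATTERN.findall(text)


# Example usage (hypothetical file name):
# print(find_card_numbers("scanned_receipt.png"))
```

Every image must pass through the full OCR step before a single pattern can be checked, which is exactly where the resource cost and latency described above come from.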
CNN and Transfer Learning
Deep learning and convolutional neural networks (CNN) were a huge breakthrough in image classification in the early 2010s. Since then, CNN-based image classification has been applied to many different domains, including medicine, autonomous vehicles, and security, with accuracy close to that of humans. Inspired by how the human visual cortex works, a CNN effectively captures the shapes, objects, and other qualities of an image to better understand its contents. A typical CNN has two parts (depicted in the chart below):
- The convolutional base, which consists of a stack of convolutional and pooling layers. The main goal of the convolutional base is to generate features from the image, building progressively higher-level features out of the input. The early layers capture general features, such as edges, lines, and dots, while the later layers capture task-specific, more human-interpretable features, such as the logo on a credit card or the application windows in a screenshot.
- The classifier, which is usually composed of fully connected layers. Think of the classifier as a machine that sorts the features identified by the convolutional base: it tells you whether those features add up to a cat, a dog, a driver’s license, or an X-ray. A minimal code sketch of this two-part structure follows below.
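This split between base and classifier is also what makes transfer learning work: a convolutional base pretrained on a large dataset such as ImageNet can be reused as-is, with only a new classifier trained for the new task. Here is a minimal Keras sketch of that structure; the choice of MobileNetV2 as the pretrained base, the input size, and the four example classes are illustrative assumptions, not Netskope’s production model:

```python
import tensorflow as tf

NUM_CLASSES = 4  # e.g., credit card, passport, screenshot, other (illustrative)

# Convolutional base: a CNN pretrained on ImageNet, without its original classifier.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # transfer learning: freeze the general feature extractor

# Classifier: fully connected layers that sort the extracted features into classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Freezing the base preserves the general edge and shape detectors learned from ImageNet, while the new fully connected layers learn the task-specific sorting described above, so only a relatively small amount of labeled data is needed for the new task.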