From where I do business in the Pacific Northwest and manage colleagues across the western region of the United States, I serve some of the most demanding customers in the retail, financial, high-technology, and manufacturing vertical industries. Many of these companies are at the forefront of cloud adoption, yet also have a lot to lose when it comes to sensitive data across a variety of data types.
Three out of four of Netskope’s customers have adopted cloud DLP, and my customers are among the most sophisticated users of it. Here are four observations about what’s easy and hard to do, and why I believe Netskope customers are the most likely to address their sensitive data concerns.
Separate signal from noise
What’s easy: It’s easy to cast a wide net, catching anything that “could be” personally identifiable information (PII), payment card information (PCI), protected health information (PHI), and so on. But when your false positives dwarf your actual detections, DLP becomes useless.
What’s hard: It’s hard to filter down to what’s important, using context (user, group, device, location, app or category, activity, and so on) to separate signal from noise, dramatically reducing false positives and increasing detection accuracy.
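To make this concrete, here is a minimal sketch of context-aware filtering. It is illustrative only: the pattern, the Luhn check, and the context fields (`activity`, `app_category`, `user_group`) are my own hypothetical names, not any vendor’s actual detection logic.

```python
import re

# A credit-card-like pattern alone generates noise; combining a validated
# match with context (activity, app category, user group) filters out most
# false positives. All names and rules here are illustrative.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: a cheap way to discard random digit runs."""
    digits = [int(d) for d in re.sub(r"\D", "", number)]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def should_alert(text: str, context: dict) -> bool:
    """Alert only when a validated match coincides with risky context."""
    real = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    risky = (
        context.get("activity") == "upload"
        and context.get("app_category") == "unsanctioned"
        and context.get("user_group") != "finance"
    )
    return bool(real) and risky
```

The same digit string that fires an alert on an upload to an unsanctioned app can be ignored when a finance user downloads it from a sanctioned one.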
Find data when it’s hidden
What’s easy: It’s easy to detect sensitive data when it’s in simple files such as documents, spreadsheets, and text files.
What’s hard: It’s not so easy to find sensitive data when it’s housed in obscure file formats or in files that have been compressed or zipped. While the prior point is about decreasing my customers’ false positives, this capability has enabled them to markedly decrease their false negatives.
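A toy version of the idea, assuming a zip archive and a simple SSN-style pattern: unpack the archive (including archives nested inside it) and scan what’s inside, rather than scanning only the outer file. Real DLP engines handle far more formats; this is just a sketch.

```python
import io
import re
import zipfile

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_archive(data: bytes) -> list[str]:
    """Recursively scan a zip archive, including nested zips, for
    SSN-like patterns. Illustrative sketch only."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            payload = zf.read(name)
            if zipfile.is_zipfile(io.BytesIO(payload)):
                hits.extend(scan_archive(payload))  # descend into nested archive
            else:
                hits.extend(SSN_RE.findall(payload.decode("utf-8", "ignore")))
    return hits
```

A scanner that stops at the outer `.zip` would report nothing here; recursing into the members is what turns those false negatives into detections.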
See portions of files
What’s easy: It’s easy to look for “exact” sensitive files (such as a file hash match).
What’s hard: Things get tricky when you have to pick up “portions” of sensitive files. Being able to do this right has enabled my customers to find sensitive content even when it’s embedded somewhere else.
Be thorough in what you look for
What’s easy: It’s easy to look for specific “words” or “characters” in a file with a simple regular expression. But this is useless unless you have an army of people to sift through the thousands (or millions) of detections. It also means you’ll miss whole swaths of important data, such as source code.
What’s hard: It’s hard to detect more sophisticated data, such as source code structures, when they’re in a file. But if you’re a high-technology company or have any in-house development, this is a critical area.
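The difference from plain keyword matching is that source code is recognized by its structure. As a rough illustration (the token list and threshold are mine, and far simpler than a production classifier), you can score the density of structural tokens rather than hunt for literal strings:

```python
import re

# Rough sketch: classify text as likely source code by the density of
# structural tokens (keywords, braces, semicolons), not literal strings.
# The token set and threshold are illustrative only.
CODE_TOKENS = re.compile(
    r"#include|\b(?:def|class|return|import|public|static|void)\b|[{};]|=>|::"
)

def looks_like_code(text: str, threshold: float = 0.05) -> bool:
    tokens = CODE_TOKENS.findall(text)
    words = text.split()
    return bool(words) and len(tokens) / len(words) >= threshold
```

Ordinary business prose scores near zero on this measure, while even a short C or Python fragment scores high, which is the kind of signal a structural detector builds on.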