Data Science at Netskope and the Numbers Behind the Cloud Confidence Index

Hi, I’m Ram, your friendly Netskope data scientist. You’ll hear from me from time to time on things like our Cloud Confidence Index™ and SkopeSights™, our cloud usage insights. Please contact me with questions and topics you’d like to see covered in my blog.

First off, I am a believer that nothing speaks louder than data. I have worked with data for years, am passionate about analysis, and see data as the key enabler for all aspects of business including marketing, sales, and security. In today’s competitive environment (especially in software), a strong data science team is critical to success.

Using Data to Say “Yes” to Cloud Apps

Whether they are aware of it or not, most tech-savvy people use several cloud services daily, such as collaboration, backup, and email. The cloud is so seamlessly ingrained in our lives that we forget we are even using these services.

Similarly, cloud services have penetrated enterprise ecosystems. The main drivers for this are ease of use, service accessibility, and a zero footprint (hardware, software) on the enterprise. Organizations use cloud apps to do everything from measuring employee performance, enabling payroll, automating marketing, and tracking sales to managing software development, testing the security of websites, and backing up data. Looking across all of these activities, it’s easy to see how organizations’ IP and confidential information can now be found in the cloud.

With all of the benefits of using cloud services, enterprises should be happy to take advantage of them to increase employee productivity and overall business profitability. However, enter BYOD and personal cloud services. This completely changes the equation. It drives enterprises from being happy to being paranoid, from feeling safe to feeling helpless in protecting their company secrets, and from being conscious to being unaware of what is traversing the company pipes.

Let me explain why. There are three main reasons.

  1. Content of the traffic to the cloud service. The connections between a user and the cloud service are typically encrypted, and hence the enterprise is unaware of the contents of these sessions. The typical questions that an enterprise IT department asks include:
    1. Is the traffic destined for an enterprise-approved account? In other words, is the user accessing a personal account or an enterprise account?
    2. What data are sent to the cloud service? Are the data supposed to reach the cloud service? How do we ensure that sensitive enterprise information (that an enterprise does NOT want in a cloud service) does not reach the cloud?
    3. Is there any malicious content in the traffic? If yes, how do we find out and prevent bad things from happening?
  2. Personal or enterprise cloud services. Most cloud services today offer both a personal and an enterprise version. From the perspective of enterprise IT, it is impossible to distinguish traffic to one from traffic to the other. Allowing personal cloud service usage results in two issues: unwanted enterprise resource consumption and unintentional data/information leakage.
  3. Impossible block/allow strategy. Currently there are close to 3,000 cloud apps that provide services in a variety of different areas and this number is increasing at a rapid rate. From a security perspective, each of these apps offers a different level of risk. A traditional approach of enterprise IT is to block as many ports/services as possible. But in the case of cloud apps it is very difficult to apply the same strategy.
    1. All the traffic might use the same TCP destination port, and hence it cannot be selectively blocked using traditional port-based filtering.
    2. Several cloud apps are being used in the enterprise. A user can use the same app for his/her personal use. It is virtually impossible to distinguish these sessions using traditional firewalls.
    3. Keeping track of thousands of cloud apps is not a simple task.
    4. Employee satisfaction (and hence their productivity) is significantly affected if they are not allowed to use their favorite cloud apps.

The solution to this dilemma does not lie in eliminating cloud services from the enterprise ecosystem. In fact, the solution lies in gaining an incisive understanding of cloud services and their behavior. Such an understanding can help enterprises enable cloud services while still being able to identify problems such as malicious content downloads and data leaks. In other words, enterprises can be open to cloud services without being fearful of the contents that are exchanged between the enterprise and the cloud. Enterprise IT should be fully aware of the impact of using cloud services and hence able to make informed decisions.

One of the first enablers toward the goal of “gaining incisive understanding” is to comprehend the impact of using a cloud app on a company’s overall goal of security and data integrity. At Netskope, we call this the Cloud Confidence Index (or CCI for short) of a given cloud app. The CCI score is a quantitative measure that indicates the enterprise readiness of a cloud app.

CCI computation involves two steps:

  1. Data collection
  2. An algorithm that will interpret and express the collected data as a single CCI score

Data Collection

The most significant effort involved in CCI computation is the collection of data for all the cloud apps that exist today. The data that we collect fall into seven functional categories: identity and access control; file sharing; data classification; encryption; audit and alert; certifications and compliance; and disaster recovery and business continuity. These categories are adapted from the Cloud Security Alliance’s (CSA) Cloud Controls Matrix (CCM), version 1.4. CSA’s goal is to promote the use of best practices for providing security assurance within cloud computing. Our data collection team at Netskope has worked tirelessly to gather data on cloud apps and to ensure that the collected data are accurate. On top of this, the team keeps track of new app features and keeps our data updated accordingly.

CCI Algorithm

The first step in the CCI computation algorithm is to understand the data collected for various apps. The data Netskope has collected can be answers to different types of questions including yes/no questions, multiple choice questions, and free-form questions.

As a pre-processing step, we convert all of the questions/answers into attributes – some questions/answers result in just one attribute while others result in multiple attributes. There are several challenges here:

  1. How can we process the different types of questions to create attributes that are comparable?
  2. How do we assign quantitative values to the attributes so that we can compute a quantitative CCI value?
  3. How do we take care of missing values? With such a large set of manually collected data, some critical data points will inevitably be missing, and we need to handle that case.

We address the above challenges to create attributes that have the following two properties:

  1. An attribute can take only one of the three integer values in the set {-1,0,+1}.
  2. A positive value (i.e., +1) of an attribute always represents “good” and a negative value (i.e., -1) always represents “bad.” A value of zero (0) implies that there is no answer for the attribute.
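
To make the two properties concrete, here is a minimal Python sketch of how answers might be turned into attributes. This is an illustration only, not Netskope's actual pre-processing code: the function names, the `yes_is_good` flag, and the rule that each multiple-choice option becomes its own attribute are all assumptions made for this example.

```python
from typing import Dict, Iterable, Optional, Sequence


def encode_yes_no(answer: Optional[str], yes_is_good: bool = True) -> int:
    """Map a yes/no answer to +1 (good), -1 (bad), or 0 (no answer).

    `yes_is_good` is a hypothetical flag: for some questions a "yes"
    is the undesirable answer, so the sign has to be flipped.
    """
    if answer is None:
        return 0
    normalized = answer.strip().lower()
    if normalized not in ("yes", "no"):
        return 0  # empty or unrecognized answers are treated as missing
    is_yes = normalized == "yes"
    return 1 if is_yes == yes_is_good else -1


def encode_multiple_choice(
    selected: Optional[Iterable[str]], options: Sequence[str]
) -> Dict[str, int]:
    """Expand one multiple-choice question into one attribute per option.

    Assumption for this sketch: a selected option is "good" (+1), an
    unselected option is "bad" (-1), and an unanswered question yields
    0 for every option.
    """
    if selected is None:
        return {option: 0 for option in options}
    chosen = {item.strip().lower() for item in selected}
    return {option: (1 if option.lower() in chosen else -1) for option in options}
```

Whatever the real mapping looks like, the point is that every question type ends up in the same three-valued form, which is what makes the attributes comparable in the next step.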

The second step in our algorithm is to use these attributes to compute a CCI score. Note that irrespective of whether an attribute corresponds to compliance certification or audit logging, we give it the same value. In other words, the attributes by themselves are all weighted equally: a value of +1 for “good” and a value of -1 for “bad.” Hence we need a technique to apply weights to these attributes so that we can extract meaningful values for every aspect of an app’s characteristics. To do this we follow the classical approach of rewards and penalties: reward an attribute that has a positive impact on the app behavior and penalize an attribute that has a negative impact on the app behavior. In our system, the rewards and penalties are integers in the range [0,5].

It is worth noting that rewards and penalties for different cloud apps will be different. For instance, in a cloud storage app that deals with sensitive documents, it is very important to have the data-at-rest encryption capability, while it is not as important in an app that only deals with non-sensitive data. Hence, in our system, we assign different rewards and penalties for different categories of applications. Using the rewards and penalties and a weighted averaging approach, we compute a normalized CCI value for each app in the range [0,100]. Our initial findings from the CCI computation can be found in the Netskope Cloud Report.
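
For readers who like to see the arithmetic, here is one way the scoring step could be sketched in Python. The exact formula is not given in this post, so treat everything below as an assumption: the `cci_score` function, the min-max style normalization, and the example weight tables are hypothetical stand-ins for the weighted averaging approach described above, not Netskope's production algorithm.

```python
from typing import Dict, Tuple

# attribute name -> (reward, penalty); both are integers in [0, 5]
Weights = Dict[str, Tuple[int, int]]


def cci_score(attributes: Dict[str, int], weights: Weights) -> float:
    """Combine {-1, 0, +1} attributes into a score in [0, 100].

    Assumed scheme: a "good" attribute adds its reward, a "bad" one
    subtracts its penalty, and a missing one (0) contributes nothing.
    The raw sum is then rescaled between the worst and best possible
    totals for this weight table.
    """
    raw = lowest = highest = 0
    for name, (reward, penalty) in weights.items():
        if not (0 <= reward <= 5 and 0 <= penalty <= 5):
            raise ValueError(f"weights for {name!r} must be in [0, 5]")
        value = attributes.get(name, 0)  # unknown attributes count as missing
        if value not in (-1, 0, 1):
            raise ValueError(f"attribute {name!r} must be -1, 0, or +1")
        if value == 1:
            raw += reward
        elif value == -1:
            raw -= penalty
        lowest -= penalty
        highest += reward
    if highest == lowest:
        raise ValueError("all rewards and penalties are zero")
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical weight tables for two app categories.
STORAGE_WEIGHTS: Weights = {"encryption_at_rest": (5, 5), "audit_logging": (3, 2)}
LOW_SENSITIVITY_WEIGHTS: Weights = {"encryption_at_rest": (1, 1), "audit_logging": (3, 2)}

# The same app profile: no encryption at rest, but audit logging present.
app = {"encryption_at_rest": -1, "audit_logging": 1}
storage_score = cci_score(app, STORAGE_WEIGHTS)
low_sensitivity_score = cci_score(app, LOW_SENSITIVITY_WEIGHTS)
```

With these made-up weights, the same missing encryption capability pulls the score down much further in the storage category than in the low-sensitivity one, which is the behavior the category-specific rewards and penalties are meant to produce.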