Emily Wearmouth [00:00:01] Hi, and welcome to this edition of Security Visionaries, a podcast for anyone working in the cybersecurity and data ecosystems. I'm your host, Emily Wearmouth, and this week I have three amazing guests who bring three different perspectives to a discussion that I wanted to have with them all about AI. So first off, let me introduce everybody. Firstly, we have Yihua Liao who's a data scientist who's worked for all the big names, really, Facebook, Microsoft, Uber, and he's now the head of the AI Labs at Netskope. So welcome, Yihua.
Yihua Liao [00:00:31] Thank you. Glad to be here.
Emily Wearmouth [00:00:32] Next, we've got Neil Thacker, who's a chief information security officer and a very experienced data protection officer. He's worked with major names like Swiss Re, Deutsche Bank, Camelot, the lottery company. And he's also served as an advisor for both ENISA and the Cloud Security Alliance. Welcome Neil.
Neil Thacker [00:00:49] Thank you, Emily. A great pleasure to be here.
Emily Wearmouth [00:00:51] And then finally, Suzanne Oliver is our intellectual property expert today. Suzanne works in private practice as an IP strategist and a lawyer at Cintra. She used to run the IP department at ARM and is also no stranger at Westminster in the U.K., where she represents a number of IP bodies and AI and machine learning associations at a government level. I first met Suzanne at London Tech Week, where we had a great conversation while trying to eat politely for little cardboard lunch boxes. So I'm really pleased that she's joining us today so I can have a second chance at a first impression. Welcome, Suzanne.
Suzanne Oliver [00:01:23] Well, thank you. And yeah, it was a great lunch. Great. Thank you.
Emily Wearmouth [00:01:28] So AI is a really hyped topic, and I think you'd struggle to find anybody that has not talked about AI this year. And so you might wonder, what can we add to this huge pool of discussion? And really, that's the thing I wanted to do today. I wanted to turn that question around to our panelists. And so I've asked each of them to be prepared to answer one question from me. And that one question is what one thing, amid all of this chat and the hype around AI, would you really like to see being discussed more? Everyone comes to this from a slightly different angle, so let's see what their answers are. And Neil, I'm going to start with you. So it's a bit like blind date. First question to you, please Neil.
Neil Thacker [00:02:05] Um, yeah, I mean, it's a great question. I think. I mean, for instance, I'd love to see everybody stop using chatGPT, AI and ML is a synonyms, so I think it helps for us to kind of move away from that so we can better understand and raise awareness of how pervasive AI is today. I think we generally say it's generally underreported. So it's similar to saying somebody we had in the past, somebody saying an organization saying, oh, for instance, we're using the cloud, but actually they're using thousands of cloud apps and each one is performing slightly different task. And I think we have that same challenge with AI. It's already quite pervasive across, again, organizations and of course, consumers as well. They are utilizing these tools and services. So again, that would be the number one. And of course, I mean, it matters because of a number of key areas. So I mean, one is just the general awareness of the current use of AI and also being aware it's not a future technology. It's happening today. I think, secondly, we need to understand the people and kind of business and data that consumers know that A.I. isn't just a specific app. So a bit like saying it's ChatGPT. I mean, my pet hate is when I hear something say I hear somebody talking about AI and they say it's it's AI such as ChatGPT it's a common misconception that there's only a few apps out there today utilizing AI type capability. So it lives in many of the apps we use today. It's being used, it's pervasive. It's critical for organizations and consumers to understand what reason, what data is being processed, what outcome is intended through the use of that app and service. So those are the kind of the key aspects that I see.
Emily Wearmouth [00:03:48] Brilliant thanks Neil. I've got I'm looking at Suzanne's reaction that I've got a question for you actually on the on the back of this. Who do you think should be owning these definitions? You know, Neil doesn't want it to be defined as just ChatGPT or is there a common definition or standard for AI? And if so, who doesn't it or who should own it?
Suzanne Oliver [00:04:07] Again, really good question. No one does own it today. And there are other standards that are used to describe levels, for example, autonomy levels in terms of autonomous cars and vehicles. And I think this era is certainly crying out for a little bit more transparency about what is and what isn't AI and what is ML. They're Often used interchangeably, and they are actually very different. So I think there's a lot of hype, and I think theres a lot of hype is because there's a lack of understanding of what these systems can and can't do. But on the flip side, there's a lack of understanding of who owns what goes in and who owns what comes out to pick up on one of Neil's points, which is my biggest is and I think Netskope in one of your reports have highlighted the amount of source code that's actually input into these tools. Proprietary source code may be a lack of understanding that these tools don't claim to keep that secret or don't claim to allow you to keep it as your own. And they claim to own any output as well. So that whole area of the transparency of who owns what. What's it called and what's it do, I think yeah. Is right for maybe not standardization but right for perhaps a new language to be created that helps us communicate these these aspects more clearly to to people who don't necessarily understand them straight away.
Emily Wearmouth [00:05:35] Neil, have you got an example, you talk about A.I. being pervasive already and it being a technology of today, not of the future. Have you got any examples of where you've seen AI in action that people wouldn't perhaps know to look for it there?
Emily Wearmouth [00:08:12] I'd say it's quite bold to at this point send AI to a meeting in place of you. All the conversations about a people going to lose their jobs that sort of willingly walking that path, isn't it?
Neil Thacker [00:08:23] Yeah, I guess the challenge is when we set, we all send our own AI assistant to the meeting. I mean, what do they discuss? I'd love to be a fly on the wall.
Emily Wearmouth [00:08:30] With this. Brilliant. Yihua. I wanted to bring you in. At this point, you're doing a lot of work building AI systems and writing models for machine learning models. What is your thought around the sort of persistent relabeling of everything as ChatGPT Do you find that frustrating?
Yihua Liao [00:08:49] It is, you know, because I feel like AI-washing is is definitely a problem. You know, it's, it's obscuring the clarity in an understanding of AI. So I would definitely like to see more discussion about the way in which, you know, security companies and perhaps tech companies in general, you know, how we are building AI and ML, you know, what's the input for the model for the AI and what's the output and how reliable is the output? Right? And I feel like there's a lack of understanding and lack of transparency on those aspects. So as a result, I think there are a few misconceptions. You know, the way I see, you know, usually some people may say that, hey, I can do everything. You know, it's going to it's going to take our jobs and, you know, it's going to control all we do. And then there are also people saying like, you're training with my data and I don't want you to use my data to improve, to help my competitors. So I think perhaps since I did the Netskope AI Labs and over the years, we've actually developed a lot of A.I. capabilities at Netskope and before this whole general A.I. frenzy. So perhaps I could share some of my perspectives and how we build machine learning and AI models at Netskope. So first of all, we we actually have built a lot of A.I. models at Netskope, including models, to identify malware, phishing websites, and sensitive data. I think Suzanne mentioned the source code classifier, that's something that my team actually built a couple of years ago. Then we also use machine learning to identify anomalous user behavior, which may indicate of data breach or insider threat and so on. So at a high level and the way what goes into our models is is the data that we try to gather from different different sources, including data in public domains or data that we acquired from third-parties. And so we would never use our customers data to build out machine learning models without their permission. So as you can imagine, some of the machine learning models, for example, like a user behavior analysis, for that, we do need to look at each individual, individual user's normal behavior. But even for that kind of model, first of all, we have permission from our customers. Second of all, we are looking at some other information we're not looking at, for example, when the user is downloading a file, we don't necessarily need to look at the the file content in order to decide whether that behavior is is abnormal or not. Right. So so yeah, so that's basically what, you know, the what goes into the model. Now in terms of the output of the model. And you know, a lot of there's a lot of you know, nowadays almost every company is claiming that they're doing AI, how accurate the AI models are. But I would say that if someone tells you that their AI is is 100% accurate, I would say they are lying. You know, at the end of the day, AI is really a you can think of as a probability, you know, how likely something would happen based on the training data they have. So, you know, you always need some either human in the loop to to verify the output of the AI model or perhaps some kind of feedback loop in inside your product so that you can take that feedback and retrain your model and make the model better over time. So AI is really a innovative process. It's a journey, and you can never expect the model is 100% accurate and or even like 99% accurate in the first time. You always have to iterate over time.
Emily Wearmouth [00:13:30] I can see lots of nods from both Suzanne and Neil. Do feel free to jump the something you wanted to add.
Suzanne Oliver [00:13:37] Yeah, I think the reliability point I was sort of really vehemently agreeing with and I think that's the lack of understanding especially, you know, I know this is Neil's pet hate, when you go from AI straight to ChatGPT. It is a quite interesting example in itself that can be used as, you know, it presents incorrect data as correct because it has no understanding of right and wrong of true. You know, it's just the probability that that is the answer that you're looking for. It's how it operates. And I think this is where, you know, my earlier point of about education and skills to understand that these things are tools. And like humans, they're fallible, but they're fallible in different ways. Right. And I think it's thought that we as a society need to understand little bit better that therefore. But the point about, you know, having this transparency of how we describe them, maybe we need to categorize them in different sort of functional ways and categorize the risk output like the autonomy levels those earlier, but yeah, sort of yeah nodding, nodding vehement agreement.
Emily Wearmouth [00:14:54] And Neil, I wanted to ask you and you have something else to say, so feel free to answer both. But whether some of the points that Yihua was picking up on resonated with you as someone whose job it is to protect data. What sort of questions are you asking someone like Yihua at other tech companies to really get to the bottom of what these systems are being built to do?
Neil Thacker [00:15:16] Yeah, I mean, it always comes back down to the data, right? So questions around what happens to the data that's being input as part of an input query. For instance, if you're using genAI what happens to the model is the model being used to, to, to provide additional services privately or publicly and also what an analyzation controls. And obviously it was mentioned there around again an example of an organization controls that are implemented, but is also then what happens to the output and is there any kind of integrity check performed on the output and can the output then be also used to go through and pre train further models? Right. So this is another aspect that again, you could you can of course go into a loop and further harness, further refine the output query based on a series of kind of feedback loops utilizing AI services. But again, it's important to understand that again, how old is that data? How many iterations is that data gone through? But then it also comes down to things like, I mean, for instance, what country? I think this is something definitely we're going to be asking more questions about in the near future as perhaps more regulations come in to to protect individuals. And we would be hearing that the EU AI act as a law that will come into place protecting EU citizens. But, of course, that usually means that there's a there's going to be a follow up series of other regulatory requirements and regulations that come in from other countries who want to continue doing business with the EU, those kind of things. We saw that we GDPR is one example. So those are things to be aware of. One thing I'd also add is that we I think we're all aware a perhaps on this call and perhaps others are aware of this the hallucination phenomenon where AI will and can occasionally just create imaginative and creative content. And it's not based on fact or truth. I saw this recently as an example. Somebody queried again, "What's the world record for crossing the channel on foot?" And it gave a name date and a crossing time.
Emily Wearmouth [00:17:20] That was my record, Neil.
Neil Thacker [00:17:21] So yeah, I mean, 14 hours, 51 minutes.
Emily Wearmouth [00:17:24] It took me ages. Yeah, Yeah.
Neil Thacker [00:17:26] But for some people they think, well, perhaps again it was correct. Perhaps somebody actually crossed the again the channel using the Channel Tunnel, for instance. But then the follow up content said that it should only be attempted by professional swimmers. So you can kind of think that there is some hallucination. So that's just one example. But yeah, I think we need to be aware of the data, the integrity, how it's being protected and also all the regulations that are likely to be coming in. But we're actually trying to protect, again, citizens around the use of these services and what data was actually being processed.
Yihua Liao [00:17:59] Yeah. So, Neil, that's is so true. I mean, hallucination is certainly a challenge for, you know, practitioners like me, we're trying a lot of new things, trying to minimize the likelihood of hallucination. But I would also like to mention that, you know highlight the thuing that you mentioned earlier. A.I. is really about the data, right. Your AI is as good as your training data. You know, if if you don't have reliable, high quality data, if your data is biased and your model is not going to perform really well, I'm sure some of you have heard the story that some of the face recognition AI models, they are less accurate. And when it comes to darker skin or women, so there is some gender and age bias in the training data. So it's a that's an issue for us for, you know, security companies like ours, because most of our machine learning models, we don't look at things like age, gender and other PII information. But I would argue that still it's possible that the training that we use to train our AI models may not actually represent what we see in the real world. So, you know, as a data scientist or ML scientist, we always try to improve the quality of our training data so that it's more representative of what we see in the real world.
Neil Thacker [00:19:44] Yeah, I mean, we're also seeing, for instance, the whole supply chain and the economics of A.I., right? There's organizations that all supply data that can be used for training those kind of things, and that we start talking about data integrity there as well. And where was that data obtained is a bit like the marketing discussion. Where was that information obtained from? Was it obtained with consent? All those kind of things. So that also draws into a a discussion, right? The whole economic survey on the supply chain where that data has come from, who has given permission or approval to process that data. Right. There's lots of, I guess, requirements and challenges that organizations need to go through as they're starting to look at AI and the use of AI in their organization.
Emily Wearmouth [00:20:28] Definitely. And I think this is possibly it segues quite nicely into how you would answer the question. Suzanne. We had a pre chat, introduce your answer. So how would you answer the question? What's the one thing you would like to be discussed more?
Suzanne Oliver [00:20:42] Yeah, I think I've touched on it already. It's definitely around the transparency of what goes in and what's coming out and who owns that. I mean, data per se can't be owned per se, and that's probably a whole different topic in itself, so I won't go into that. But I think, building on Yihua's point about unconscious bias. You know there's 35 minimum cognitive biases that we have as humans, and I could probably have named three if you'd asked me before this call. So how can we expect that data that is going into these tools is representative of us if we don't even understand us to begin with? So that's one, one element of my answer of my book, I should say. The other is really who owns the output. So from my perspective, AI mostly touches on copyright and copyright ownership. So for example, if I upload some photos into one of these tools and it creates a sort of takes one of my photos and puts an amendment on it, then that amendment is owned maybe by the tool. So for example here it would be OpenAI and ChatGPT, but I own the original photo so the output is a new piece of work, for example, but it maybe potentially infringes something that has happened before because you look backwards with infringement and the innovation moves forward. So again, these are very sort of difficult concepts with layperson to understand and business people as well. But there's very little, unless you're a bit of a sort of IP geek like me, there's very little discussed about it. And certainly the language is not easy. Copyright is not an easy subject to get your head around full stop. So I think as technology leaders, we need to be the ones making the conversation more transparency. And this comes back to the point I was making earlier about having a common language that we define to talk about the data that goes in, the data that comes out and the tool itself so that we can really understand and monitor and standardize some of what's going on to make it easier to to understand what's going on.
Emily Wearmouth [00:23:04] From your perspective, to what extent are we now trying to close the stable door after the horse has bolted? I mean, you talk about things like Facebook have owned my holiday snaps for over a decade. Lucky them. But for consumers, it feels like to some extent that horse has bolted in terms of data ownership to feed these systems. Is that the case for corporations or is there still a chance to shut the door?
Suzanne Oliver [00:23:26] I'm not sure we're ever going to shut the door, but I think there's no point sitting there and watching the horse sort of run away and the distance. I think there's maybe a bit of corraling to be done. And for me, it's the speed of change. It's the speed of change in terms of organizations, not not knowing where their golden nugget data is, who's handling it, who is uploading into the cloud. And you can only you can only manage my exception. But you don't want those exceptions to happen. Right. So I think it's really understanding about your engineer behavior, your marketing people's behavior and having those conversations with them about, okay, these tools are great. They're going to help save you some time. But do you realize that when you, have these this secretarial tool running in the background, it's on a server in a country where maybe actually you don't want your monthly executive board meeting minutes being stored on a server in that country, no matter how much time it saves you from writing those minutes up. So whilst it seems like an easy thing to do, seems like an efficient tool to use, there are actually downsides. And it's just about having that little bit of brain power to say, okay, you know, every positive has there has to be a catch. Here Neil alluded to earlier is asking those questions about, okay, this tool seems really efficient, but actually why is it free? Why is it cheap? Why is it cost less than having a person sat there, you know, writing the minutes or sharing it? And there's got to be a there's got to be that other side to the equation. And you need to be asking yourself that question, I think.
Neil Thacker [00:25:15] Yeah, I think I mean, one of the concerns I have generally is that today we're already seeing organizations, for instance, issue questionnaires to better understand the use of AI in a product in a service. The challenge you always have is a questionaire is usually, again, a one off, perhaps on the adoption of a new tool, new technology that may be on AI, or it might be an annual review of that tool. But I think it always needs to be a stage in time where we're almost doing this more regularly, more continuously, based on the, Suzanne, as you highlighted, this fast adoption, this fast maturity growth in these services. So I mean, a tool that you might adopt and kind of today is going to look very different in a week's time or in two weeks time or in a month's time as new features and capabilities get added. So it almost needs to be a continuous assessment. I'm a big fan of looking at scoring apps and services and using this to define policy based on that. And again, it could be, for instance, I mean even we're seeing this challenge around apps and services that have been compromised. The rules that they've put in place, the providers wouldn't put in place to protect the service have been broken. Like most things people are going to hack, hackers are going to hack. Right. So you need to be aware of those kind of things as well. And yeah, there's also some I mean, some far fetched use cases where tools are coming in perhaps that are crossing, crossing the line in terms of ethics and that organization, for instance, that are employees in the organization, for instance, are utilizing and perhaps that may cross over their own ethics policy. So again, that needs to be also part of that assessment, that trust score. That's where really from an organizational perspective, organizations should be looking at providing better oversight governance around the use of AI services.
Emily Wearmouth [00:27:02] I asked you at the beginning, Suzanne, who should be owning the definitions and the standards. And in some ways, this is a repeat of that question of who should be owning the adjudication of these sorts of decisions. Neil alluded to the EU's AI acts coming up. And we've seen lots of posturing from governments around the world trying to be seen to take a leadership role in the development of AI. But is it realistically something that can be regulated or should it be coming from industry? Where do you see leadership coming from?
Suzanne Oliver [00:27:34] I think it always works when it comes from industry, personally speaking. But at the end of the day, I think standards bodies do do a really good job. So the autonomy of this six levels, 0 to 5 autonomy levels are I'm maintained by SEA I think, which is an autonomous automotive organizationm unless I'm mistaken. You've got the Internet is at present regulated by independent bodies. So, you know, my vote would be for independent body. But from an industry perspective, otherwise I don't think I'll stick. Um, but I certainly and it's just been mentioned as well on this call, I think Yihua mentioned the concept of greenwashing. I don't think it should be self-certification. There's too much of that with some carbon and carbon offsets and those sort of things that, you know, we are, you know, I think ethically, you know, green or whatever with a little sort of ethics tech mean, I've seen far too many startups that have one ML algorithm running on their machinery and then they have all over their sort of pitch decks you know where I company I machinery, you know, they're using one off the shelf computer vision algorithm. I think that as well is not is not helping anybody understand what it is, what it's there for what value it's bringing as well. But to quite a lot of points that both Neil and you off made on this call.
Emily Wearmouth [00:29:20] Yihua, what about you as someone who's developing this stuff, how much responsibility do you think should be shouldered by the developers themselves and how much is that unfair? And you know, you guys should be left to create things that that as a wider society we perhaps regulate.
Yihua Liao [00:29:32] I think we as as AI practitioners should also have a lot of responsibility when it comes to responsible AI. Here at Netskope we have a internal AI governance committee to help us set the AI strategy, set up the review processes and so on. And every time we start working on a new initiative or new AI model, we always have to go through a very robust internal security and privacy review process, and we have to fill out questionnaires and then make it really clear, you know, what goes into the model. What the model, how the model is going to be used, is there any any privacy concerns and so on. So a few, I do think not just government and the industry, but also every every company and all the AI practitioners should be aware of this and take this seriously. And in order to make sure that all of us can build AI systems or products in a responsible way.
Emily Wearmouth [00:30:48] Thank you. That was a mean question for me to throw at you. So thank you for tackling that one. So I'm really enjoying this conversation and I'm in no doubt at all that if we were cozily ensconced in the British pub it could probably go on for for many more hours. But our producer is currently waving at me, and that's my cue to to blow the whistle and try and wrap you guys up. So I'm going to try and summarize and feel free to interrupt if I if I'm doing so wrongly. I think it's fair to say that there are a lot of intertwined threads between the different ways the three of you have answered my question, but it also feels like there's a lot of consensus as well. And I think you all largely agree that we want people to stop using chatGPT, AI and ML as a synonym. So Neil, we all get behind your wish and we'll try work on that one. We also fell into the trap ourselves in this discussion. I don't know if you noticed to some extent I was referring to the same example, but we must do better. And we also, I think, largely agreed that it would be beneficial to have more explicit sort of under the bonnet or under the hood for the American listener detail behind this shiny AI labeling that people are putting on everything, whether in pursuit of press attention or high valuations, in order that organizations and users can get a better understanding of how much risk they should be applying in their assessments and what they can and shouldn't trust. And then I think the final point that we touched on throughout and came from your main answer, Suzanne, was that we would definitely benefit from more conversations around data ownership within the full AI supply chain, both with regards to what's coming in and what's coming out. So just building a greater understanding within society, businesses, individuals around what that ownership conversation looks like so people can make informed decisions. But just thinking through that list, we know what much do we? I mean, it's a very modest list.
Neil Thacker [00:32:36] It sounds easy.
Suzanne Oliver [00:32:37] Yeah, all solved. In one podcast.
Emily Wearmouth [00:32:41] Done they shoudl get us a more often. I thank you, all of you, for your time and for such an interesting and cross-functional, I suppose, conversation with all of you coming in with your perspective. And so to our listeners and I just want to say we'll catch you next time on Security Visionaries. Thank you.
Yihua Liao [00:33:00] Danke.
Neil Thacker [00:33:00] Vielen Dank an alle.
Suzanne Oliver [00:33:01] Danke.