Hash It Out: AI & Cybersecurity Discussion
[Music] foreign Gus Walker: Hey, everybody! Welcome to the latest episode of Hash it Out. I'm Gus Walker, a VP here at Virtru on the apps team. We're here today to talk about all things AI, and I am joined by my colleague. Avery Pfeiffer: Avery Pfeiffer. Hello, everybody! I'm Avery Pfeiffer. I am the
colleague aforementioned. I do a bunch around Virtru, but you can just think of me as a technology generalist and full stack engineer. Gus Walker: There we go. Perfect. Alright, so Avery and I are both very passionate about AI. So, you got the right people here to talk about this. We have a couple of questions that we are going to kind of noodle around to kind of frame this dialogue.
Alright, so our first question today that we want to explore is: how is the rise of AI technology influencing our approach to zero trust data centric security. So My perspective is that large language models are going to be effectively the most intelligent employee that large organizations have and also the employee, probably with the biggest mouth. And so if I was to start the zero trust story in the large language era, I would start with them. New advancements are being made that are going to allow you to train these models on your kind of bespoke organizational data, which, of course, sounds to me like intellectual property, which, of course, do the challenge of data loss prevention. And so that's where I think we'll be able to leverage some of gateway’s tools to address that. I've got some ideas on how, but before I answer that, Avery, what do you think? Avery Pfeiffer: Yeah. I mean, I would mirror a lot of what you said. Right? Like,
the mass adoption of AI is really proliferating through organizations training these systems or fine tuning or whatever you wanna call it, creating embeddings of their knowledge bases. And, generally, that includes IP. Right? Either sensitive information, like maybe health information that you really don't want to get out or your actual trade secrets. And I think that's the key. It's controlling that data.
Gus Walker: Exactly. Yeah. And that's the story that Virtru’s had up until today and we'll continue to have. Right? You have invested as an organization, a lot of a lot of calories. Let's put it that way into collecting specialized information, whether you're in a fintech. Right?
When is the time is best to make the trades, how do you evaluate that, whether you're in healthcare. Obviously, you've got the HIPAA considerations there. But even if you're not in those elevated spaces, you may be dealing with just your strategic plans. Right? You might have just done your annual planning. You put that in your system. Somehow the large language model gets a hold of that, and somebody can ask it a question and they can get an answer. And if you consider the kind of speed that these large language models operate at, it is very possible that you could create a threat vector where you could package a payload of prompts together that are kinda stack ranked against the most valuable information, maybe your financial information, your competitive information, any new mergers and acquisitions you're making, and get that out really quickly. And -- Yep -- we don't have to map out the domain. You just ask the model. Avery Pfeiffer: Yeah. That's exactly right. Something that we've kind of experimented with
here at Virtru is doing that, taking our knowledge base and sort of embedding it so that we can benefit from GPT models in terms of onboarding training as sort of an internal use case. And something we've explicitly tried to steer away from is making that internal use case external. As soon as you allow you know, customers, but potentially bad actors as well to start probing, even the public data that we share, it just becomes so much easier to perpetrate things like phishing attacks against us if we make that resource available. Now that's not to say that a bad actor couldn't just train their own bot on the stuff we have publicly available on our website. And we need to be vigilant against that as well, but we definitely don't wanna make it easy for them. Something I sort of was ruminating on as I was looking at this question is, you know,
the question sort of poses the idea that, like, how do we wanna change what we do because of the rise of AI. I think not to plug Virtru too hard, but I think we're actually doing it in the right way. I think the way that Virtru approaches Zero Trust Security, the fact that it's data centric makes us really well suited as a solution to start to work with sort of these ML and AI workflows in a protected way. Wrapping your data and policies, no matter how trivial the data might seem, just
positions you to be able to sort of take advantage of these LOMs with peace of mind. Because if anything were to happen or the wrong data were to get into the model, you just revoke those policies. Right? As long as you're not training on the raw data, which I think we’ll get to. Gus Walker: Absolutely. And then one of the points you made earlier about the kind of urgency to adopt large language models. One of the places I think people will lean in first is
customer success. Spaces where they can support their external user base. If you're supporting an external user base with a technology that has deep insight into your internal mechanisms, it it's obviously important to make sure you secure that. So we could rabbit hole here forever, but I agree. Virtru is very very well positioned to address this because this type of threat environment mimics very similarly, the threat environment just that we exist experience now with email and large large files -- Right. -- same sort of thing. But we don't wanna rabbit hole here. We got a whole host of questions. No questions for me, Avery?
Avery Pfeiffer: Yeah. You know, actually, I'd love to hear your perspective on sort of how we should approach the potential dangers that using generative AI can present. Like, as an organization, not just as, you know, a cybersecurity firm, but as an organization, as we embrace this on the marketing side, on the development side as we add it to our products and and sort of roll out features that use LOMs at their core, what are the potential dangers we should be aware of? I know you have a pretty extensive background in terms of generative AI? Educate me. Gus Walker: I think the first challenge is determining where you want to apply it and where you can apply it safely. In terms of within these models, as everybody knows, is the capacity to hallucinate. Right? Just make stuff up. And that's, you know, if you were to create a kind of cliff notes as to why that is, the model is designed to provide an answer. And
in the absence of an answer, it will provide any answer it can provide if it structurally fits and answers (inaudible). So that's one place that's that dangerous. This is another place where I think, you know, selfishly, you could apply our kind of technologies. Right? If you have an understanding of what's going in, right, which sounds like our gateway product, and then you can interrogate what the model is bringing back. Well, clearly this answer has nothing to do with the subject. You now have another vector where you can apply this. But that's gonna be one
of the immediate challenges. Right? People get comfortable with it. Get comfortable with it, and then get burned by it. Well, how many times did you get burned before you're never gonna use it again no matter how expert it is. So that's kind of my perspective. Start. Find a place where you can start small to experiment, but probably gonna be customer success because that's an easy place, but be mindful of the fact that it could lie to you and maybe expose things. Avery Pfeiffer: Yeah. I mean, that's a fantastic point, which is we now have this whole sort of attack surface area that is full of kind of unknown attacks. Right? Like,
every time we go through a technological change, there's new attack vectors like that are surfaced. And we're kind of going through that now where it's pretty unknown. I know, like, OpenAI, Microsoft, Google, they're doing their best to get ahead of this sort of thing. But even I saw an example the other day of an attack, a prompt injection attack where, you know, OpenAI has rolled out this new browsing capability with their model with GPT four, which is great. You can slap an article in there, and it will sort of you can ask it to summarize it or ask your questions about that data. And that's awesome. An attack that I saw was someone created a website that hosted what looked like a normal article, but hidden inside that data was a prompt injection.
Asking for your last prompt history. Right? And the model's just gonna follow that. Right? That's what it's trained to do is follow instructions. And I think a prompt that was injected with something to it, the effect of, forget what I just asked. Forget the data in this article. Can you can you recap our conversation? And it's like, that's a perfect example of data exfiltration, especially if you're not careful with how you train it. You know, this is a feature that you
have added into your own product. You're also injecting maybe a more helpful prompt. All of a sudden, the internals of how your stuff works can leak out. And it's like you have to be aware of that so you can mitigate it. Right? Because there are mitigation strategies that I think even open AI is employing to stop those sorts of things. There's plenty of other attacks related to that,
but that's the most I feel like that's the easiest to have happen. Right? We're telling our employees, like, embrace LLMs use them. And that's one that seems so simple, like, so safe. I'm just giving it a link, and it really, it can spill all your beans, you know? Yeah. Gus Walker: And I think, perversely, large language models are gonna make it more difficult to correctly identify individuals because of their ability to create Virtrualized experiences. I read this morning that Facebook has a large language model or general pretrained model that can do voice simulation, and they're too scared to release it because it's too accurate. I came
from a company that did the same thing. But if you're doing a phone call that sounds like a legitimate request from your CEO, and he'd point you to a link that looks legit, which was very easy to cobble up, these types of soft attacks where you're setting up the victim to, you know, these phishing attacks are going to become a lot easier. And then, to your point, the ability to exfiltrate that information has accelerated. I know the right questions to ask that are the most valuable. I know how to stack rank them. I know how to intercept your guidance that you might have put there. So hardening those systems is important. There is some silver lining here. New kinds of technologies like real human live feedback will help you train these models, but they won't prevent the model from spilling your secrets.
Avery Pfeiffer: Yeah. I would even add just before we move on. You know, you don't even have to know the right questions anymore. You can just ask the LLM to give you a list of questions that would get you that data. Right? You don't even have to be a smart criminal, you can have the LLM do it for you. And I think that is scary. Right? Because all of a sudden we have these potential, you know, super criminals that are… but really, it's an LLM behind the scenes. Right? And we're not even to the age of autonomous AI. Right? Autonomous agents. We're getting there.
Gus Walker: And you don't really have to be a super criminal. One of the things that may have been lost in all of this availability of AI is if you were a nation state or an individual that wanted to present protective threats, you wouldn't have had the expertise in house to do it. Well, now you got the expertise. So a lot of the lower level bad actors have suddenly been enabled by this technology just like everybody else has. And I love your observation that you don't even have to know what to look for. You can just come in the door and say, hey,
that's the most valuable thing in your house. Send it to me. Avery Pfeiffer: That's exactly right. I've heard a lot of discussion about, you know, this concept of asking. But really, it's the export of knowledge. Right? This is something
that we've been, like, pretty protective of in the US… in regards to semiconductors and whatnot. We protect that technology because if that expertise, you know, gets into an enemy state's hands, well, now they have that technology. We lose our lead. Right? With LMs, exporting that knowledge is so easy that it becomes well, now we need to find a different way to police. Right? Because now keeping the cat in the bag is not about traffic. Gus Walker: It's gonna be harder. And I think South Korea just experienced this. I believe it was Samsung, one of their high level executives have been collecting information on their semiconductor information so that they could start a new plant in China. I imagine he was working on that for months trying to stay under the radar, Now imagine he's in our times, and these systems exist, he's disgruntled. That's a half hour’s worth of work to completely
undermine the value of the entire technology for that industry, let alone that organization because now you've got a competitor who won't play by the same rules. Let's see. In what ways can data class education and tagging aid in implementing a data centric security approach to AI? Avery Pfeiffer: Well, man. Alright. I have a lot of thoughts on this. But I think the easy answer is basically in every way. Right? Like, if you are classifying your data and properly tagging it, that's, like, the first step to protecting yourself against any number of attacks, AI or not against your data. Right? Because now you can understand it without even having to look at it. At Virtru, we kinda follow this policy of encrypt all your data. Right? Encrypt everything, turn it
into a TDF, and then you have control over it. Where that gets in the weeds is when you need to operate on that data, but you don't have the key, or you don't want to access the key to decrypt it, or you're not in a place where you can. In this case, data tagging becomes invaluable. Right? Because now you can have automated workflows and processes that make decisions on this data without having to decrypt it. It can potentially be in an unsafe environment because it stays encrypted the whole time. Right? And you just do something with it. The same thing with access controls, we talked earlier about, like, the potential dangers about incorporating LLMs into our workflows. One of the biggest dangers let me rephrase. One of the coolest areas of innovation is allowing
LLMs to do sort of dynamic real time access control basically, giving them a bunch of information in terms of where the IP address something was accessed at, what time of day, from what device, that sort of thing, and allowing it to make a decision you know, a logical decision on whether this request should go through. That's a huge, huge, productively helpful idea, something that can really change the landscape in terms of how we monitor our sort of digital parameters. But who wants to trust an LLM with that? Right? Like, that is probably the most scary thing that you can do in terms of, like, working with LLMs. Data tagging makes that a lot more feasible. Right? As soon as you start tagging your data, well,
you can have a tag that says this data is not allowed to be accessed by an LLM no matter what, this is too sensitive and we don't trust it enough, and all of a sudden, all of this, like, complex data curation, trying to separate the data LLMs can work with and the data they can't, and you have all these if conditions to facilitate that, it goes away because it's just done through the data labeling across application that was done at the point of encryption before the data was actually encrypted. Makes it infinitely more feasible to work with these things. Gus Walker: Yeah. I think, again, as part of the large language model changes that are happening to the environment that we're in, there's going to be a re emphasis on data hygiene. You can't even begin to take advantage of these things to get to the point where you might step on a rake if you don't have clean data. The good news is all these large language models… Well, prior to these large language models, we had a lot of reinforcement training, which meant there were a lot of labeling tools out there. So there are loads of labeling tools out there. But how
do you leverage that so that you can label it appropriately so you can then apply a security policy over top of it, I think, is the challenge in spaces where we can that where we can help. Avery Pfeiffer: So I I had a question kind of related to human error. We all know as a part of anything you do in technology. And I think I might know the answer,
but just for the listeners and the watchers, I feel like it's important to touch on. How can businesses minimize the impact of human error when working with AI or building AI into your workflow? Gus Walker: I think it starts with training, obviously. Right? There's a lot of misconceptions about AI. You and I have been fortunate enough to work with them long enough that we know that they aren't scary boxes that are gonna wake up at nights, you know and take over. We're not at a
cyberdyne state, yet. Right? With that said, they are, to your point, still spectacularly capable and therefore need training. So I would start with policy. When can we use them? Is this an appropriate thing for me to put in my financial information? Is it appropriate? Whatever.
How would we be able to parse the results that come out in a way that we can measure those so that we can keep improving. And that's another place. I think another thing to do would be maybe just get people to start using them. Right? If you've spent time with Chat GPT or any of these large large language models and been asking them for recipes or trip planning experience, you get it. Okay. This gets me sixty, seventy five percent of the way. But sometimes seventy five percent of the way on something really onerous is fantastic. But getting people to sensitize themselves to what it's capable of would be the first thing, one of the first things I would encourage people to do. Avery Pfeiffer: Yeah. I think I would mirror that in terms of I mean, that's the generic answer.
Right? Train your people. Train your people better and mistakes don't happen. And it's like, of course, that's the case. You have to train your people, particularly in the case of, like, these sort of third party extensions you can get for Chrome that claim to be Chad GPT, just one of the easiest fishing vectors I've ever seen. So train your people on the basic stuff first. But then, you know, after that, we're talking about human error, and we're talking about
zero trust. Right? The whole point of zero trust is that you shouldn't have to trust the actor. Whereas the whole point of training is you're trusting your employees to rely on their training. So at some point, that breaks down. Right? This is how attacks happen. And I think you have to be prepared for that. Do the training, but have these safeguards in place to patch your fallible human employees when they inevitably fall down. Because they're tired or sick or whatever.
And to do that, I mean, there's a number of ways, but I think one of the most beneficial is inject DLP type mechanisms into every place that data really, that data is leaving your system. But at least that data is leaving your system to enter an LLM. Right? These days, you kinda have this trade off of: I can use the latest and greatest in GPT 4 on Open AI servers, but I have to give up control of my data because that model is hosted somewhere else. Or, I can be very protective of my data, but I'm gonna be left behind because everyone and their mother is gonna be using this new latest and greatest LLM. Right? So that's, like, a hard choice to make. One of the ways you can sort of be the middleman in that decision is to inject DLP. Inject DLP at the beginning and
for those that don't know, DLP is data leakage prevention. Right? You wanna create mechanisms that will catch human falling down on the job before that data leaves your perimeter. Right? Inject it in the browser, inject it on phones, use VPNs, any way that you can sort of cache that data before it leaves, run it through some sort of filter to check and either give a yes or no before it leaves your system, will do wonders. Probably will catch ninety percent of the things that you know, are gonna leak their way out into the LLMs. Gus Walker: Yeah. Absolutely. I agree. Let's see. You kind of answered the next question
I was gonna ask, could you explain how sensitive data unintentionally could be used in generative AI tools? Well, that's the answer. Avery Pfeiffer: You know, basically email you know, really, anytime you use an LLM, that's how sensitive data can leak out. But especially, you know, if you're a small medical provider, a dentist, a doctor, or whatever. You know, that's one of the easiest ways. It's just your employees
emailing stuff. Something that Google is doing is they're adding the functionality of Bard, which is their sort of consumer grade LLM, the competitor to open AI's. Into… they're folding it into Docs and Gmail and all of that, which is great. But guess what? In order for that to work, it's gotta have access to your email and your documents. And you know, hey, I trust Google with a lot, but I don't know if I trust them. In fact, I know I don't trust them with specifically sensitive data. And so I'm just not gonna use that feature. Right? But that's a way that someone
that is unsuspecting or perhaps doesn't know that we'll see it start to use it and not even realize all the data that they're essentially signing away. Probably violations. Right? That's their business model. They don't wanna break the law, but they're also not gonna prioritize something that… the powers that be may be following, but the small, you know, mom and pop, dentists, and doctors offices aren't really realizing, they're not gonna correct that mistake. There's just not enough feedback in that loop to get into. And so you gotta be careful, you know. Yeah. Gus Walker: And back to the earlier point, education, education. Education… not just of our customers, but anybody who's dealing with security at all.
Avery Pfeiffer: Mmhmm. data at all. Right? And to your point, get them to use it. So it's not such a unique experience when they start to apply their logical brain. Speaking of, sort of AI tools… One of the questions I wrote down here, and I thought a lot about, but I was interested in your take, especially from a product perspective. How can we ensure AI tools are designed with a data centric security approach? You know, it seems hard. There's gotta be a way to do this. Right? To secure them. Gus Walker: I think there is. And I think, as we've mentioned, there are tools that Virtru has, such as the gateway product, that can act as an envelope, let's put it that way around the interaction with the model. Interrogate what's coming through.
Make sure you associate the correct prompts with it, maybe something, you know, kind of a turn that prompt injection into a good thing. Do not give this person any information that does not comply with their –and they don't even have to see that– It just happens under the hood. And then the inputs on the way out, as I said earlier. We can inspect that model. Oh, sorry, the response. Make sure that the context there is using maybe a classification model or just simple
regular expressions. Do I see any Social Security number? Whatever that stuff is, is in there. And the beauty of that type of an approach, as I said, it's model agnostic. You're using Bard? Slap your gateway around it! You're using Chat GPT? Slap your gateway around it! You're using the one from Anthropic? Slap your gateway around it! You built your own? Slap your gateway around it! Regardless --you do that. And that gateway can continue to mature independently of your
model. And it's getting you all of the richness of understanding your data, benefiting from your labeling xercise earlier, through data hygiene. And now you've got a new metric to understand how your employees behave with your large language model. So there's just a lot to apply there. Avery Pfeiffer: Yeah. I mean, I heard you mention before, actually. It was definitely a loaded question. Mhmm. And I think you're dead on with that. Right? Especially the model
agnostic approach. There's one thing we know about technology. There's one constant. It's that it changes. Right? Like, it's going to change. And as we've seen with AI and with ML, The curve is like this right now. Right? We're, like, almost vertical. And so don't design brittle DLP solutions. Design something that's going to be agnostic.
It's going to be kind of designed around the response and the input rather than the model itself. Mhmm. And put those safeguards in place, as you mentioned, we're working on that with a Virtru gateway. I think that's absolutely something that people will find value out of. And, you know, frankly, it's only a matter of time if you don't implement these sorts of things till you have a breach. Until you have a data leakage. It's going to happen. Defend against it. Right? Yep.
Gus Walker: And then one of the other things I will add is we're currently talking about large language models. You and I are aware of the multimodal models that are coming out. So these are models that will combine speech and object recognition and other kinds of abstract understanding. So when those guys get in place, that's another tier. But again, the solution that we've been kind of talking about, a gateway can help mediate that. So again, these models are more sophisticated. Or in particular, I read about one this morning, these models can now, I think it was Chat GPT, or one of the other ones, reach out to other systems and invoke commands on them. Avery Pfeiffer: No. I'm actually calling APIs. I'm very excited. I could talk twenty,
thirty minutes just about that. But, yes. Yeah. All of a sudden, now they have more capability. Right? You can integrate them deeper, but you need to be careful. Gus Walker: So with that said, Avery, thank you for talking with me. I learned a lot. Hopefully, we educated some people and at least gave me something to think about. But this,
I guess, concludes our, or my first, hash it out at Virtru. I welcome any questions, and look forward to having more of these. Avery Pfeiffer: Definitely. Definitely. Excited to be here. Thanks for having me. Allow me to share my thoughts and, you know, for all you listening out there. If you wanna part two, just ask for it. We'll make it. You know? It's easy for that. Gus Walker: Thank you, guys.
Avery Pfeiffer: Thank you, Gus. I appreciate it.