Cisco Hypershield: AI-Driven Security for Dynamic Environments

Hello folks, my name is Andrew Ossipov. I'm a distinguished engineer and portfolio CTO for our network security business here at Cisco, and I'm here today with my colleague Jan Whit, whom you will meet in a few minutes, to talk about Cisco Hypershield.

Before we talk about the details, the gory details I would say, of Hypershield, what it does and what it is, I want to talk about how it fits into everything else we are doing with the Cisco security portfolio, and specifically network security. Up until not too long ago there was a tendency to use the network firewall for all things security, not only network, which sort of makes sense. I had customers, very large customers, very big companies, who managed to take a firewall, put it into the data center, put each application host into a separate private or isolated VLAN, and had the firewall stitch them all together. You can only imagine how complex, fragile and, well, frankly mind-blowingly interesting, I would put it that way, it was.

But lately the tendency has been to simplify data center security in general, and this is what we call today the Cisco Hybrid Mesh Firewall. It's a three-word term, hybrid mesh firewall; at least it's not an acronym, right? It would be kind of weird to have an acronym like that. But the basic idea is that firewalling is a function, not a device. So we take every environment and we build the firewalling function out of a device, a product or an application that's fit for that particular protection purpose.

So if we take the network firewall, which again we used to use for all kinds of purposes, protecting all kinds of things, it still belongs at the edge of your data center, because you typically have a network and you do want to put in features like IPS and maybe some malware protection, WAF, API gateway, whatnot. You also use that with something like SD-WAN to interconnect things like branches and campuses to your data center, which is very convenient. But then also lately, and we've heard that in some of the other talks
today and probably yesterday, there's a tendency to do things in the cloud. Now, cloud is your stuff running on somebody else's hardware, so there's no exception with the firewall or any kind of cloud-delivered security solution. It's the same feature set and more or less the same outcomes, just running in the cloud. But it doesn't mean that the cloud should be completely separate from your edge firewall. The cloud firewall and the on-premises data center edge firewall should be very similar, and they should work together to admit users, again interconnect things, and otherwise do what firewalls do: enforce policy.

If you look at public cloud, that's again your stuff running in somebody else's data center, at a whole different scale, because it's not just your stuff, it's everybody's stuff running there. Public clouds are each a little bit different, and each has its own little tricks and twists on how you insert things, and obviously visibility into threats and everything else is very different in public cloud. So we have a solution called Multicloud Defense, which speaks the language of public cloud and specifically serves in the public cloud, at the edge, as that edge firewall: a slight twist on the firewall, with a hint of public cloud specificity.

As you get deeper into the data center and get into the host operating system, for instance, it's again kind of foolish to put a firewall in front of every VM or every host. So we have a solution called Secure Workload, which goes deep into the application, looks at the process trees and how processes interact, does various fingerprinting and behavioral analysis, and eventually feeds all that information into enforcement points like firewalls and the firewalls built into the host operating system. So it does provide the firewalling functionality inside the operating systems.

And finally, last but definitely not least, especially today, we get into Hypershield. Hypershield is relatively new,
not just a product. It is a product, but it's more than a product, and I'll talk about how in a few minutes. Hypershield takes what Secure Workload does at the application host OS level and turns it into proper inline security for pretty much every input/output call into every individual application. So we took the traditional firewall at the edge of the data center and went all the way down, through the floor, into the application host operating system, into the actual application's attachment to the rest of everything. That allows us not only to do the kind of segmentation you otherwise can't do with a firewall, but also to get all kinds of visibility, not into the network alone but into, for instance, disk access: is this application writing something to the disk, or reading something from the disk, that it shouldn't be? So again, a very granular way of segmenting things.

And of course, on top of all that, you know, five different things doing five very important functions, we have that one management layer of Cisco Security Cloud Control, which I'm very confident we have all heard about in other talks as well, so I'm not going to spend a whole lot of time on that.

So Hypershield starts as the first tangible thing in this application security solution, [unclear], whatever you want to call it. But Hypershield, like I said, is more than just one product. It's an innovation framework, and that innovation framework will continue expanding. It starts in the application and it will expand across the entire network fabric. It is based on a few key principles or concepts.

First, we really want to build a firewall engine into every device, every aspect of enforcement or connectivity we have across the network. It could be something inside a switch with DPUs, data processing units; it is something inside the kernel with the extended Berkeley Packet Filter, eBPF; it's something inside a virtual environment. You get the idea: it is taking a consistent enforcement and threat
protection engine and dispersing it across many different tiny little firewalls versus one big one.

Then there is data plane resiliency. There's a tendency with traditional high-availability clustering solutions to be a little more rigid. You upgrade, and you don't really know what happens after you reboot: is your firewall still working the way you expect? Is something being dropped because somebody made an error in the code? You push a policy, same thing: am I going to get a call from my CEO because, well, we pushed the rule and it blocks something they really, really cared about? So we have this concept of a dual data plane, or shadow data plane, where every software change and every policy change should be tested automatically inside that secondary shadow data plane before we switch all production traffic to it. It gives you an opportunity to, well, preferably not have an outage at all, but at the very least, if you do have an outage, it is contained to a very small set of flows which are mirrored through that secondary data plane. So that's another concept.

And last but not least, everybody talks about artificial intelligence. Obviously no presentation goes without the obligatory AI mention nowadays, but we do want to use AI for proper purposes. We want to introduce AI to reduce, for instance, policy complexity, because you have tens of thousands of applications. I work with many bank customers, and that's literally what they have. I was at an IETF meeting some years ago, and there was a presentation by a CIO of a major bank. He was talking through some of the flows they experience in their environment, specifically because they wanted the TLS working group, which I was a part of, to do something to the standard to make things easier for them to troubleshoot. Long story short, that didn't go through, but one statement resonated with me. He said: we have 400 applications. When you go to online banking, click the login button, and put in your username and password, there's 400 applications
making 10,000 connections to each other just to bring up that first page with your bank balance and all the other details. And obviously, if you have to write a policy which looks at 10,000 connections across 400 applications, doing it by hand is a very difficult task. It's a combination of the application manifest from the DevOps team, but also behavioral analysis and some of the artificial intelligence, machine-driven help that you get for writing those policies, to make sure that yes, this connection or this call from this application to this other application is actually expected and legitimate. And we're all fooling ourselves if we think that application developers know exactly what calls their applications make at any given time. Let's be honest about that.

So that's the different framing, right? Hypershield starts as this application security component, but eventually it becomes this framework which incorporates those concepts across all kinds of network security products that we build at Cisco. And now, to get into more details on Hypershield: Jan.

Today, let's focus a bit. So he took the top-level message: how everything fits in the portfolio, how things are evolving. Now let's spend the remainder of the time doing somewhat of a deep dive into, okay, what is Hypershield, what can it do, how does it work? What we've been announcing and talking about for quite a while are these two use cases. And by the way, people keep asking me, when is it real? When is it real? So let me try to use an analogy that everybody always understands, with cars. You know, at these car shows, when vendors have these fancy prototypes, you always know this is super cool but they're never going to make it. Well, we have this fancy prototype, but now we actually made it, and customers have started using it in early access, and soon it will be in general availability. So it is something fancy and cool, in my mind, and it's something that actually is going to be delivered
quite soon, actually.

So what we've been talking about initially are these two use cases. First, autonomous segmentation. Well, what's in the name, right? The key thing that we're trying to do here is take on the challenge of: okay, we have this environment that you want to segment, for all the known reasons: reducing blast radius, segmenting things, keeping things from each other, complying with rules. Sometimes it's just a matter of needing to be able to check a box to be compliant with this and that policy, regulation, et cetera. And there's quite a lot of challenges with those things still today, so this is a tough nut to crack. With Hypershield, we realized that as we look at the technologies we're able to use inside of Hypershield, we have a few unique approaches that we can now start using to make life easier, and not only make it easier but also do a much more thorough and in-depth job of figuring out what is going on.

So the whole idea behind autonomous segmentation is that, like Andrew mentioned, we are now using this pretty cool technology called eBPF. Without making this an eBPF deep dive, what is this all about? We are now inside the kernel, using a kernel feature which is called eBPF. That's a key thing to realize: eBPF is a kernel feature, meaning we use a feature and we don't need to go and start modifying the kernel. In my mind, a kernel, regardless of the operating system, probably qualifies as one of the most complex pieces of code in existence today. So if every time you need a new capability you need to start modifying the kernel to get access to that capability, sooner or later not-so-fun things will happen. By using eBPF, a kernel feature, it's a whole different ball game, because now you no longer need to change the kernel itself. You can just use existing functionality.
Now, what is so cool about eBPF? It actually allows you to add certain specific, previously not used or custom capabilities to the kernel, and we run them inside the kernel. Doing it this way allows us to do some quite cool things. First of all, from a performance point of view, it's about as fast as you can go. There's no more context switching, no more refreshing cache tables, instruction tables and all that when you go from user space to kernel space, run some stuff, then hand off back to user space. There's none of that.

At the same time, it's still very secure. Some say: wow, you're now running inside of the kernel, are you not bypassing all of these kernel protection mechanisms? The answer is obviously no. We did think about it, and when I say we, it's the whole industry, because eBPF today is mainly on the Linux platform. It's coming for Windows, but it's open source. So the hyperscalers, the likes of Google, have made fuzzers to go and check and try to break it and make sure everything is up to snuff. Google has even released that fuzzer. The research is there, so it's secure.

One of the ways that we actually secure it, from a technical point of view, is first of all that these programs run in a sandbox, with a predefined set of rules as to what they can do. They can do these things, they can access these data structures, and they cannot break out of that. Second, there's also a just-in-time compilation process, so when we hand off the code, which is your mini eBPF script, all of the hardening is checked, to make sure that it's not going to bypass any of the kernel protection mechanisms, et cetera. And then finally, we even do one additional layer of hardening, making sure that we don't do funny things with memory, protecting against Spectre and things like that.

Since CrowdStrike, I see that every time somebody says the word kernel, then probably a kitten
dies. So how do you prevent yourself from, well, turning millions of computers belly up again? Not you, but in general: how do you prevent the event from repeating itself?

That event is probably one of the main motivators for Microsoft to now actually enable eBPF as a default in the operating system. eBPF for Windows exists. You can search for it, you will find the GitHub repo, it's there. The key thing that's missing is that it's not a default part of the operating system. So now, if you as a security vendor want to start doing interesting things like Andrew mentioned, looking at things at the process level and making sure that we can also apply some security enforcement, then if the kernel does not have a syscall that allows you to do that, you need to modify the kernel. And like I mentioned, it's a complex piece of code; if you make a mistake, everything blows up. Whereas now, because you're using a kernel feature, you use this small program. If you make a mistake in that program, that program breaks, but that program will never be able to kill your kernel.

Okay, so what you're doing is actually getting feedback from the kernel, but you're not touching the kernel itself. You're just putting this right in the middle, in a way that you can interact with it but without meddling or fiddling with it.

Exactly. That's why I stressed at the beginning that eBPF is a kernel feature. Yes, it's all about custom code, and yes, it's all about customizing things, but we're consuming a kernel feature. We're not changing the kernel, we're just giving it additional bits of functionality.

It's almost like a loadable module, but again without the risk of a loadable module, because with a loadable module you're getting input from it.

Yeah, you still need to write the loadable module, and if you get it wrong, same disaster: things blow up, kittens die, like you mentioned, all kinds of bad stuff happens. So it's not a good place to be. So
that's why eBPF is so appealing. It's a very secure environment, and it's reasonably safe, not just from a security point of view but also as a safe way to start interacting with the operating system and everything inside the box, because it's a very well-defined set of things that you can access and change. Obviously, by being inside the kernel we can see almost everything; I say almost because there are still kernel-specific memory locations that we cannot and should not touch, so we cannot see those. As for what you can influence, the list is smaller, but still big enough to do very interesting security protections at the kernel level.

To give you an example, let me talk about distributed exploit protection. The idea of what we're trying to do there is essentially simple, but to actually build it is quite complex. What is distributed exploit protection all about? Well, we have these public databases with known CVEs. Every CVE that's actually exposed in your environment today is an issue. Now we can say: we can start looking at the kernel level, where we see every single process, and we can enrich that information with SBOM-style information. So we say: we see a process that calls itself nginx. We can talk with the package manager, and with other things on your system, to figure out how it got here and what the version is. Then we can check: this claims to be nginx version 1.2, so let's take the hash of an nginx 1.2 binary and check that this is indeed what it claims to be.

So it's doing the integrity check as well?

Well, an integrity check, yes, because then you know this is an unmodified version. But more specifically, we now know as a fact, in near real time, what is running where in your environment. We know everything that's running, what the version is and all that. So once you have that view and you compare that with the
list of public CVEs, now we know everything that's wrong with your environment. Linking that with your vulnerability management program, you'd be able to be a lot more accurate about what's actually vulnerable in your environment, because there's no more guessing involved. We now know it for a fact, not as potentials. Yes, before, you had to wait to see traffic: there is traffic coming in on 443, let's look at the handshake, let's see what we can see in the headers that are not encrypted, and start deducing what's there. But there's always some level of guessing. Now we don't need to wait to see traffic. We see the process spin up; even before nginx opens the socket, we know it's there and we know the version. And once we know the version, we know every single thing that's wrong with that particular process.

Now, that's a very depressing thing, because now we can actually name every single CVE that's exposed in your environment. But now comes the second bit. Being in the kernel is not just about seeing things, it's also about being able to influence things. So what we do is this: we have a big fat AI that is consuming all of this public CVE information, and it tries to create an eBPF program that will prevent an attacker from abusing the vulnerability. That eBPF program loads in the kernel, so essentially what we do is make those CVEs non-exploitable. From an attacker's point of view it is as if they disappear. They're still there, but the attacker can't do anything with them.

To give you an example of what that looks like, because it might be quite tricky to wrap your head around: a while ago we had this vulnerability in SSH, where there was some funny business going on in one of the supporting libraries, liblzma. If SSH actually loads that library, then it's exposed, and bad things can and probably will happen. So then you could write a
shield that says: look, if your SSH process is trying to load this library, prevent it from loading one of the vulnerable versions; only allow it to load a non-vulnerable version. And that you can now do, because loading a library is a kernel call, so that's one of the things we are actually able to influence through eBPF. When you run that shield, you can prevent SSH from loading a vulnerable library. The vulnerability is still there: as long as that library is available on your system, a process that loaded it would make that vulnerability exploitable. But now we can create a shield, an eBPF program, that says: let's not load the bad ones.

What's the impact of that? How accurate is it? A lot of very vulnerable systems are running older versions of things because they have to; they can't upgrade. So my concern is: am I going to impact the business in any way? Am I going to stop them from being able to do what they need to do, because we think it's a malicious actor?

Yes, so there's a few things to that. First of all, if you look at what we're currently doing, we have this AI that takes the CVE information and calculates a shield, and we need to verify two things. First, is the shield doing what it's supposed to be doing, meaning is the exploit no longer working? That's reasonably easy for us to do and to automate the testing for, because we generate the exploit code, we test the exploit code, we apply the shield, and we test the exploit code again. If the exploit code no longer works, we're pretty sure that the exploit is prevented. But to your other point, that's only half of the equation. We also need to make sure that after deploying the shield we don't break anything else; the application needs to keep working. That's the tricky bit, and it's a bit where we're still working on automating all of that. So one of the reasons why our early access customers only have a
very short list of shields available is the sheer amount of human labor that we still need to go through today to validate them. To your point, it's not just about preventing the attack, it's also about protecting the workload and making sure it continues to work as is.

And I guess you will maybe start with a detection mode as well for some policies, where you just say: okay, you have all these machines that are currently vulnerable, they are loading the process that should be updated. So at least we have an idea, before we intercept the traffic, of which systems are vulnerable to a certain attack type. And I can also imagine, I remember many moons ago we had a lot of endpoint security vendors doing endpoint firewalls running in userland, and performance was really bad. Now we have this additional flag: the process is always in the context of the policy. So our policy gets a bit more complex compared to, let's say, a traditional firewall policy where we have just source, destination and ports. You always have to consider this additional process, but you can now say with 100% certainty that this is the correct process that we're accepting. And all these things where attackers rename processes, or do all these tricks, no longer work with this policy model.

Yes, and to both of your points, you can extend that line of thinking a bit further, because we need to make sure that we're not breaking anything. Keep that in mind. We've talked about a system that can detect a whole bunch of things with a high degree of certainty, so we improve how sure we are about things. But at the same time, it's still a security system that's telling you as a security admin: we think you should be doing this. If that's where we stop, I don't think we will be very successful, because why would you trust it? This thing is saying: I want you to deploy the shield. And
the example I just gave is a very easy one. I can show you another one which on the surface sounds very easy, but if you then look at the implementation, it's like: well, I need to be a kernel expert to figure out what this thing is doing. Understanding a single access list entry is easy. Understanding some of these Hypershield policies is anything but easy. So we need to have something else. What I've described so far is awesome if you trust every single suggestion it makes, but I have not done a single thing yet to get you to buy into that notion.

Yeah, so I guess that's where you would be using AI, because AI is everywhere now, right? You're probably going to be feeding in the data so that the AI will interpret all these things for you, chew it over and put it right in front of you?

Yes, to a point. We use different flavors of automation and pattern recognition and AI, or machine learning if you want. The big place where we use AI today in the context of Hypershield is to generate these shields. That's the big part where we use AI. Everywhere else it's algorithmic, borderline machine learning, so there is no direct interaction with AI models. However, for these shields, and even for the pattern matching and the anomaly detection that we do, whether it's an AI or an algorithm or some developer behind it, you need to have a way to gain some trust in it. And it's not going to be by at some point saying: yes, okay, let's go for it, let's run the rule in our production network. That's not the way to gain trust.

So what we came up with is actually something quite simple but very efficient. To Andrew's point, AI is not yet everywhere, but it will be everywhere, and one of the key things is that whenever AI assists in coming up with a suggestion, what you see these days is: well, the AI is so many percent sure. Oh great, what does that mean? 80% sure, wow. This one is 90% sure, this one
is 99% sure. When is it sure enough for you? I don't know.

The thing is hallucination, sure.

Yeah. I mean, if it says I'm 100% sure, would you trust it? I would trust it even less, right?

There are things in my life about which I have been a very low percent sure, so I'm not really sure. See, I'm not even really sure about that.

It's a bit confusing though, but I guess you're going to be doing this on the digital twin, and everything is going to happen in the shadow data plane, and the moment you confirm it, then you switch it.

We should switch places. I don't talk that fast, but my respect. So to your point, that is the solution that we came up with, and I love it because it's solving a very difficult problem in a very simple and easy way. You have the data plane where all the magic happens, where the policy is enforced. Now we just introduce a simple thing: a shadow data plane. As for the performance impact, it's not that much, because we're not replicating the packet; we're sharing the pointer to it and we let the shadow decide. So if we have a new policy, for example this new shield or this new segmentation enhancement, we run it in the shadow data plane for a while and look at the differences. And the cool thing is that the shadow data plane is not in our test environment or your test environment, it is in your actual environment, so there are no more gaps in testing. You do this wonderful test in the test environment, you go live, and things still blow up. Why? Because there is a difference, and that difference accounts for something you did not anticipate. So this is running in your actual environment. We run some tests, we compare what the policy is suggesting you do, and we see how it would behave. And by the way, this is not the percentage I was just making fun of. This is a very measurable thing: what percentage of the actual tests passed. Now again, if I say a test has passed, you
again need to trust me. No, why not? We tell you: look, this is the actual difference. We say it passed, but we tell you why it passed: because the difference is within 5%, for example. If the CPU difference is within 5%, we call it passed, but you might be in a high-compute environment, and for you anything more than 1% might not be a pass. So in the test report we give you all the data to make up your own mind. You don't need to agree with this being passed, because you might say: for me it needs to be within 1%, or within 0.5 of a percent. We even show you, for each of the different policy rules, the difference in hit counts, et cetera. So we give you on a platter what the behavior of this particular policy would be, and we use this as a way to let the system try to convince you. Now it's no more pinky promise, we tested it in kitten-free environments and all of that. No, we actually say: this is what would happen in your environment, with everything that is unique to you: your hardware, your software footprint, your traffic footprint, et cetera. So this is our attempt at trying to convince you. This is a very measurable score: this is 80% of the test cases. For each of the test cases you get the data in the test report, and you can agree or disagree: well, this line is marked as passed, but for me it has not passed, because the margin of error, or the margin of difference, that you consider a pass is not acceptable to me. Sorry, please go ahead.

No, I was just going to say, could I tailor it so that I don't have to constantly look into the details? Could I just say this margin needs to be less than 5%, or 1% as you said, so that it would be more accurate and I don't have to continuously click?

Yes, but there's a few things. With Hypershield, everything is API-exposed, so our UI is nothing more than, like with OpenStack Horizon, a graphical front end for a human, translating a mouse and a click into an API
call. So everything is automatable. We are planning to make the passing rate human-configurable, but at the same time you can also plug this into an automation pipeline, CI/CD, whatever automation, even just a simple Python script, to say: look for these types of events, and if these scores are within that range, we don't even need a human to check anymore. Obviously the long-term goal would be that you auto-approve everything, but today there's no point. Both we as an engineering group and you as a user need to gain trust that this does what it's intended to do. That's what this test report is all about: giving you the raw, factual testing data for what it's proposing. So it's not: oh, by the way, I want you to go left, pinky promise. No: I want you to go left, and if you go left, these are the wonderful things you will see.

How long is the evaluation time that you normally use for this? Sometimes we have something that is just occurring, let's say, on a Friday, I don't know, a backup application that runs just once a week or so. How long do you store the telemetry, and what comparison features do you have?

Today it's a hard-coded length of time. We are testing a whole bunch of different things, for example starting with a minimum amount of time and then looking at the rate of change, so that we see whether the test results are converging, or whether, for example, we never saw anything. That's why I use this example; I could have used a 100% score. But here, for example, we see two things. First of all, there are no hits on the policy. So either I've been smoking something and I've made the wrong policy, or I did the test when the traffic was not there, so I need to rerun the test, because right now I can see that this test is meaningless. I need to change my policy so it actually does something, so that these hit counters are non-zero, and also so that the difference between the primary and the shadow is there. Because even if there
were hit counters, if the hit counters are the same for primary and shadow, again your change is not doing anything different, so why bother? Or you might see that there's no difference, or you don't see the traffic, so you might need to run the test again at a later point in time. That's why one of the things we're looking at is a minimum fixed amount of time plus a rate of change to influence how long we test.

So basically you just need to define on the back end how long I can store this traffic data, so that I can...

This is not storing, this is testing in real time. As you deploy this in the shadow policy, in the shadow data plane, this is in real time. As traffic is being seen, we update the differences that we measure. The differences we're actually looking for are CPU, memory, are we dropping or denying the same packets, is the latency changing, all of those things. And once we stop testing, we generate a test report. We store the test report, but it's no longer being updated.

Sure, but to catch changes that occur only occasionally, you need a longer evaluation period, so you need to store the metrics somewhere over a longer period of time.

You mean historical references and that kind of stuff? Right.

Another thing, just because he's mentioning this. We're talking about a digital twin, and as with any other highly available and redundant structure or element, it requires a certain expense, a certain commitment, or compromise is the right word for this. From a performance, storage and processing point of view, what are you actually giving away? What's the cost of all this?

Obviously, if you introduce a second data plane, it's going to consume more resources than if you didn't. Now, I didn't really talk about the different enforcement points, but each enforcement point has a different implementation. For example, on a DPU it's done in P4, and the
enforcement VM uses a VPP stack, and in an operating system it's eBPF. So what we do is make sure that we put a guard in place. In the case of eBPF, because we're in the kernel, we can now say: do not allow this to consume more than so many percent of CPU. So it's no longer a data sheet promise; we can actually have a kernel rule that says do not consume more than that, and it cannot break it. In the VPP implementation, for example, we say the shadow can only consume idle CPU, so there is no impact. The only impact might be that if the system does not have any CPU available, the test will take more time or never finish, but then at least you know: well, the system is overloaded. So we put all these things into place.

I think we're done with questions. Perhaps we have time for one more; otherwise we close.
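
Editor's sketch: the talk describes a test report that compares the primary and shadow data planes, with a pass margin the user may want to tighten (5% by default in the speaker's example, 1% for stricter environments) and a warning that a run with zero policy hits, or identical hit counts, is meaningless. The Python below is only an illustration of that logic under stated assumptions. The report layout, the metric names and the `evaluate` function are hypothetical; they are not the Hypershield API or its actual report format.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str        # hypothetical metric name, e.g. "cpu" or "latency_ms"
    primary: float   # value measured on the primary data plane
    shadow: float    # value measured on the shadow data plane


def diff_pct(m: Metric) -> float:
    """Relative difference of shadow versus primary, in percent."""
    if m.primary == 0:
        return 0.0 if m.shadow == 0 else float("inf")
    return abs(m.shadow - m.primary) / abs(m.primary) * 100.0


def evaluate(metrics, hits, max_diff_pct=5.0, overrides=None):
    """Score a shadow-run report.

    hits maps a rule name to (primary_hits, shadow_hits).
    overrides maps a metric name to a stricter or looser margin in percent.
    """
    overrides = overrides or {}
    # No traffic matched the policy: the run says nothing, so rerun it.
    if all(p == 0 and s == 0 for p, s in hits.values()):
        return {"verdict": "rerun", "reason": "no hits on the policy"}
    # Same hit counts on both planes: the change does nothing different.
    if all(p == s for p, s in hits.values()):
        return {"verdict": "no_effect", "reason": "identical hit counts"}
    tests = {
        m.name: diff_pct(m) <= overrides.get(m.name, max_diff_pct)
        for m in metrics
    }
    passed = sum(tests.values())
    score = 100.0 * passed / len(tests) if tests else 0.0
    verdict = "approve" if tests and passed == len(tests) else "review"
    return {"verdict": verdict, "score_pct": score, "tests": tests}


report = evaluate(
    [Metric("cpu", 50.0, 52.0), Metric("latency_ms", 10.0, 10.3)],
    hits={"rule-1": (100, 90)},
    overrides={"cpu": 1.0},  # stricter CPU margin for this environment
)
print(report["verdict"], report["score_pct"])
```

A script like this could sit in an automation pipeline and auto-approve only when every test is inside the margins the operator chose, leaving everything else for a human to review.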

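Editor's sketch: the distributed exploit protection part of the talk describes three steps: identify a running process and its claimed version, check the binary's hash against a known-good value, and then look up the advisories that apply to that exact version. The Python below illustrates that flow in miniature. All reference data here is made up for illustration (the hash table, the advisory identifiers and the `assess` function are hypothetical), and the real system gathers this information in the kernel through eBPF rather than from Python.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    version: str
    binary: bytes  # stand-in for the contents of the executable on disk


# Hypothetical reference data: known-good hashes and placeholder advisories.
KNOWN_HASHES = {
    ("nginx", "1.2"): hashlib.sha256(b"nginx-1.2-binary").hexdigest(),
}
ADVISORIES = {
    ("nginx", "1.2"): ["EXAMPLE-0001", "EXAMPLE-0002"],
}


def assess(proc, known_hashes, advisories):
    """Verify the binary is what it claims to be, then list its advisories."""
    key = (proc.name, proc.version)
    digest = hashlib.sha256(proc.binary).hexdigest()
    expected = known_hashes.get(key)
    verified = expected is not None and digest == expected
    return {
        "process": proc.name,
        "version": proc.version,
        "verified": verified,
        # Advisories are only attributed when the identity check succeeds;
        # an unverified binary needs separate handling.
        "exposed": list(advisories.get(key, [])) if verified else [],
    }


genuine = Process("nginx", "1.2", b"nginx-1.2-binary")
tampered = Process("nginx", "1.2", b"something-else")
print(assess(genuine, KNOWN_HASHES, ADVISORIES))
print(assess(tampered, KNOWN_HASHES, ADVISORIES))
```

The point of the hash step is the one made in the talk: once the identity and version are known as a fact, the list of applicable CVEs stops being a guess based on observed traffic.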
2025-02-26 23:38
