Securing container deployments on Azure Kubernetes Service with open-source tools | BRK264H
Hello everyone, and welcome to this breakout session titled Securing Container Deployments on Azure Kubernetes Service with Open Source Tools. My name is Josh Duffney, and I want to make a quick note of the QR code that you see over there. You can scan that and put in all of your questions throughout the session. We'll have time at the very end for Q&A. So if questions come up during the session, feel free to scan that, type them in, and then I'll be able to see them on the screen later. So I
want to start this talk with a little bit of a story. And that story begins with: once upon a time, all software was trusted. At least that's what I thought. So I began my tech career on the help desk, and my primary responsibility on the help desk was to install software.
And lots of it. I was even an administrator of a product that some of you might be familiar with, or know of, which was System Center Configuration Manager. And I was on the help desk of a very large enterprise construction company with thousands and thousands of computers in the fleet. And one day I pulled down a package from a website that I trusted, that I had installed a lot of software from in the past, and I deployed it through SCCM to probably 200 to 300 computers and infected them with malware. Now I know what you might be
wondering, which is, you know, did I get fired from that job? Luckily, no, I didn't. I didn't get fired from that job; I kept it. But I learned a really valuable lesson, and that was to be skeptical. Skeptical of the sources that I get software from. And so I want to pose a question, which is: are containers safe? So back in my days on the help desk, the primary means of installing software was to install it on the desktop. Now we have different mediums,
so different means to install software and to deploy applications, and containers are a relatively new and popular way to do that. So up here on the screen you see two screenshots. The one with the GitHub repository is the only trusted code in the world, and that is a code base by Kelsey Hightower. And the reason it's secure is because there's no software in it; the only secure software is no software. So that means containers are inherently not safe. And just like the lesson that I
learned on the help desk, I should be skeptical of the images and the containers that I pull down from different registries, whether that be Docker Hub or elsewhere. And to help convince you of that a little bit, I want to share some statistics with you on the urgency of integrating security into your workflow. So 75% of the containers running out there in production have high or critical vulnerabilities on them. Not low, not medium: high and critical. And then just one percent higher than that, at 76%, are containers running as root, so they're running with elevated permissions. So what this talk is about is providing
a solution for that with open source tools, and then deploying those containers out to Azure Kubernetes Service. And this solution, the secure supply chain solution, begins with building and pushing container images up to a registry like Azure Container Registry. Before we continue to the next step: has anybody here done any kind of operations work, or had a pager? A couple pagers, OK. So one of my favorite stories is about scanners. I'm not a big fan of scanners. We had a security team, well, we had a breach at the company that I worked at a couple jobs ago. And then security got
a lot of funding, which was good, which was great. However, they just started plugging in scanners and then flooding our pagers with all of these scans, which we didn't even know how to remediate. And one of the reasons why I really enjoy the particular tool that I'm about to mention is that it helps to shift that left, so that we can prevent even deploying some of those vulnerabilities. And that open source tool is called Trivy.
So Trivy is a container vulnerability scanner that pulls down a database of known vulnerabilities and allows you to scan for OS vulnerabilities, for configuration vulnerabilities or drift, and also for secrets. So it can scan your code for secrets that you might have missed and help you avoid committing them up to your GitHub repository and then trying to clean the history of that repository. So
to recap, we've got the registry where we build and push images. We can scan them as the next step, but then there's another open source tool called Copacetic that will allow us to patch them based on the output of Trivy. So Trivy has the ability to output to a JSON file, in different formats too, but in the upcoming demo you'll see JSON. And then Copacetic will take that, open up the container, start to patch those applications and those vulnerabilities, generate a new image for you, and then push that to the registry. So at this point, we have an updated image that we trust, but how do we know that we trust the image? And that's where another open source tool comes into play, Notary. So Notary is responsible for creating digital signatures. So we
have this patched image that we want to trust. We want to have a way that we can identify that we trust this image, and so we can sign it with Notary. And that completes the first part of this secure container supply chain, where we have a signature, and we have an image and a repository that we trust. Now, on the other side: so that we can avoid having a signature that is no better than going through the exit line at Costco and getting a highlighter mark on your receipt, we need to have some kind of system on the other end to act as governance. And so on the other side, on the deploy side of this equation, it still starts with the container registry, where we have our image. And then there are two additional open source tools that can help provide that governance: Ratify and Open Policy Agent. And these will work to validate the signatures that are on those images and prevent them, or authorize them, to be deployed to your Kubernetes cluster. And so if your image has a signature.
It'll get approved and it'll deploy. If it does not have a signature, it's going to get denied. And so with that, we'll cut over to the demo and we'll take a look at running this all in action. So
what I have here is a dev container and a demo script. So the demo script is going to start, and I'm going to start a build that's going to run in the background. But as it goes in the background, I'm going to run through all of these steps manually at the command line so we can see what is happening. And at the end, hopefully, I'll have a nice deployed workflow that we can take a look at. So let me just maximize the real estate that I have on the terminal and I'll start my demo script here. So it's going to start, and it's going to load some environment-specific variables in here.
And then we're going to push a change, and then we'll start walking through all of the manual steps. So I'm just going to push a change to my README to trigger things and commit that up, so we can do a live demo of the workflow. So now that that's running in the background, it has an action that's been triggered, and we'll take a look at that closer towards the end of the demo. Let's walk through what this workflow will be doing. So the very first step, once we have an image in the container registry, is to scan it with Trivy. So I'm running the Trivy CLI, and this
is one of the wonderful things that I like about all of these tools: they are command-line tools first. So they're command-line tools that you can use and interact with locally. Learn how to use them, learn the different options, learn the switches, and then you can throw them into an action. Either they have a provided GitHub Action that you can use, which abstracts that and gives you parameters that you can fill in, or you can just run a shell script and get the same results. So this Trivy command has a couple of options to it. I'm choosing to not error on exit, and what that allows me to do is to continue on. So you could hook Trivy up to your workflow and say, hey, if I found any critical vulnerabilities, or any vulnerabilities at all, I want to throw an exit code 1, and it would kill your build process. But what I'm choosing to do here is to continue on, because I'm outputting the vulnerabilities it wants to have patched into a file called patch.json. Now I'm also manipulating the scanner, and I'm only
scanning for vulnerabilities. I could choose to also scan the configuration. So I have some Terraform files in here, and it would probably yell at me for variable names and things like that. So it can actually be kind of like a linter for your configuration files. And then it would also scan for secrets. So if I
had a secret hard-coded in, say, a Kubernetes manifest, it would catch that as well. I'm also filtering on the vulnerability type, so in this instance I only care about the OS vulnerabilities. And then at the end, that $IMAGE variable is the name of the container image, including the registry and the tag; Trivy allows you to scan local images and remote images. And so we'll just take a look at that result. So here are all the package IDs that were found by the vulnerability scanner, which are in that patch.json. And these are some of the issues that we're going to patch with the next tool, called Copacetic. So Copacetic is another open source tool that takes that patch.json, opens up the container, starts to add layers into it and patch it, and then eventually creates a new, patched image. And so I'll show the patched image by running the docker images command and grepping for patched, because I appended patched at the end of the new tag. So there exists a tag of v0.1a, and now there's a patched version of that. But this only exists locally for now.
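Putting the scan-and-patch flow just described into concrete commands, it looks roughly like this. The image name is a placeholder, and exact flag spellings should be checked against the installed versions of the tools, so treat this as a sketch:

```shell
# Placeholder image name for the demo registry
IMAGE="myregistry.azurecr.io/azure-voting-app-rust:v0.1a"

# Scan only for OS-level vulnerabilities, write the report to patch.json,
# and keep exit code 0 so the pipeline can continue even with findings
trivy image \
  --vuln-type os \
  --ignore-unfixed \
  --format json \
  --output patch.json \
  --exit-code 0 \
  "$IMAGE"

# Feed the report to Copacetic, which patches the image layers and
# produces a new image tagged v0.1a-patched
copa patch \
  --image "$IMAGE" \
  --report patch.json \
  --tag v0.1a-patched
```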
And so, just to prove that some things were patched, I'm going to rerun Trivy with the severity of high. You didn't see it in the last one, but I believe there were around 18 high vulnerabilities, and we can bring it down to 14, all inside of the workflow. So as it's running, we're immediately making our containers more secure by automatically patching them in our pipeline. So like I said, that image is local, so now I have to push that up to the registry, and so now I have my patched container image in ACR, Azure Container Registry. If I continue on now, the build part of the story kind of ends here, and then we move into the deploy. But we had an image that had some vulnerabilities. We patched
it automatically, and now we want a way to figure out and identify which images we trust. And that's where Notary comes in. The CLI tool for Notary is called notation. And again, all of these are CLI tools; I love that they're that way, because then you can interact with them like this. But because I'm using the Azure Key Vault plugin for it, I also have to add that particular plugin. So I'm signing the package. So I'm using notation
sign, and I'm giving it the key name that's in Key Vault. So this is a certificate that's being used to sign; its name is ratify, and it was loaded into notation. Then I'm giving it the name, so that's the full name in the container registry, and it has the tag of patched. And I'm providing a username and password, and this is to authenticate to the registry itself. So this is just a token that's
being used with ACR to authenticate and be able to sign the artifact and create the metadata for that artifact on the image in ACR. Now, not all container registries support digital signatures; Azure Container Registry is one that does. So just to see what that looks like, and to validate that the command-line tool did what it said it did, I'm going to open up the portal and browse to my registry.
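As a sketch, the signing steps above look something like this, assuming the notation CLI with the Azure Key Vault plugin (azure-kv) is already installed; the key ID, registry name, and credential variables are placeholders:

```shell
# Register the Key Vault certificate as a signing key named "ratify"
# (the --id value is a placeholder for the certificate's Key Vault ID)
notation key add \
  --plugin azure-kv \
  --id "$KEY_VAULT_CERT_ID" \
  ratify

# Sign the patched image, authenticating to ACR with a token
notation sign \
  --key ratify \
  --username "$NOTATION_USERNAME" \
  --password "$NOTATION_PASSWORD" \
  "myregistry.azurecr.io/azure-voting-app-rust:v0.1a-patched"
```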
And then under Services, I want to go to Repositories to find where this image went. So it is the azure-voting-app-rust image. And now I have this tag with the version number and patched, and where this signature goes is underneath the Artifact preview tab. If I click that, I can see here's the Notary signature. And so this is the signature that's going
to be used by Ratify to validate that the container image was signed, and then allow it to be deployed to the cluster. And if I click that, it's just going to give me some metadata about the signature. So going back to the demo: I've got a series of Kubernetes manifests inside this repository, and now I need to replace the image that's being used in the deployment with the signed one, because if I deploy Ratify and then try to deploy unsigned images, I'm going to get a failure; I'm going to get access denied on that. And so I'm just using the sed command to do that, and I'm going to open it up and just make sure that it did what it said it did. And I
won't lie, I did use ChatGPT to create the sed command, because I'm not that gifted with that command. So now there's this particular deployment, and it's pretty typical of a deployment that would go up to production: I'm just using an image from Docker Hub in the original manifests.
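The sed swap itself can be sketched like this; the file name, image, and registry are placeholders standing in for the demo's manifests:

```shell
# Create a throwaway manifest with the Docker Hub image reference
cat > deployment-db.yaml <<'EOF'
    spec:
      containers:
      - name: db
        image: postgres:15-alpine
EOF

# Replace the public image with the copy in a private registry
# (using | as the sed delimiter avoids escaping the slashes in the image name)
sed -i 's|image: postgres:15-alpine|image: myregistry.azurecr.io/postgres:15-alpine|' deployment-db.yaml

# Confirm the swap
grep 'image:' deployment-db.yaml
```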
And so what I want to do instead, because I can't sign images that are on Docker Hub, is pull that image down locally. And this is for the Postgres portion, the database portion. It's worth mentioning that this is the Azure voting app that I'm building here and deploying to Kubernetes, and so it has two images. It
has the voting app image that's written in Rust, which is the app, and which is in the deployment app YAML. And then there's the database, which is using a Postgres image. And as I mentioned just a minute ago, it's using the image from Docker Hub, and so I want to pull that image from Docker Hub locally, and then I'm going to tag it with my registry name and push it to my registry so I can sign it. So
I'll take it and then I'll push it to the registry. Now that it's there, I do still have to sign it, so both images in the deployment will be signed. And again, that's just the notation sign command, and if you wanted to verify, we can come back to the registry and make sure our signature is there. So at this point, I also have to update the manifest for the database deployment. That's also just sed, and we're switching out the postgres:15-alpine image to have it use my container registry. So now comes the fun part. I'll be using Helm to deploy Gatekeeper to the cluster. And so to do that, I have to add the repository for Gatekeeper, and then I have to install Gatekeeper. It's worth mentioning, if you follow along and you do this later on your own, that you need to deploy Gatekeeper and Ratify to the same namespace, because they'll need to be able to communicate with one another. Has anyone used any of these tools before, or are all these open source tools relatively new to everybody? You've used a couple? OK, Trivy, OK, a couple people have. Awesome. Is anyone familiar with Notary? Right. Yeah, notation.
Well, Notary is the project, and notation would be the CLI. Yep. OK. And then what about Ratify? Has anybody dabbled in that at all? Awesome, very cool. OK, so now I have Gatekeeper installed, and the next portion is to install Ratify. So I have to add the Helm chart. This
is one of the ways that you can install it. So I'll add the Helm repo, and then I'm going to install Ratify. Now again, remember, I'm deploying it to the same namespace, which is gatekeeper-system, and I'm setting this up in a particular way. In this particular configuration, Ratify needs to know the certificate in Azure Key Vault that's going to be used to validate the signatures, and notation requires that the certificate only have the ability to digitally sign, so there are some limitations, or prerequisites, on the certificate that you use. I'm also setting up Ratify to use a workload identity, an Azure workload identity, and so Ratify is using that identity to authenticate to Key Vault to get the certificate. And so now I have Ratify deployed to the cluster. Next, I have to deploy some constraints, because just having Ratify there, having a pod running inside of my AKS cluster, doesn't do anything. I have to deploy a template that defines how it's going to validate that signature and create the Kubernetes resources for that, and then I also have to deploy a constraint.
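Sketching those deployment steps as commands: the Gatekeeper chart location below is the documented one, but the Ratify chart URL and value keys are from memory and have changed over time, so check them against the Ratify chart's values.yaml before using; the Key Vault and identity values are placeholders.

```shell
# Install Gatekeeper (the OPA admission controller) into gatekeeper-system
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system --create-namespace

# Install Ratify into the SAME namespace so the two can communicate;
# the Key Vault and workload identity settings here are illustrative
helm repo add ratify https://ratify-project.github.io/ratify
helm install ratify ratify/ratify \
  --namespace gatekeeper-system \
  --set akvCertConfig.enabled=true \
  --set akvCertConfig.vaultURI="$VAULT_URI" \
  --set akvCertConfig.certName=ratify \
  --set azureWorkloadIdentity.clientId="$IDENTITY_CLIENT_ID"

# Apply the constraint template first, then the constraint that uses it
kubectl apply -f ratify-template.yaml
kubectl apply -f ratify-constraint.yaml
```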
And so now I have a template deployed and a constraint deployed. But let's take a look at what the constraint looks like. So it's using the Gatekeeper API version, constraints.gatekeeper.sh/v1beta1, and it's of a kind, RatifyVerification, that was created by the template that I just deployed to the cluster. So if you tried to deploy the constraint without that template, it would fail; it wouldn't know what kind this constraint was. And then I'm naming it with the metadata of ratify-constraint. And then I'm setting some specs, and I want the enforcement action to be deny. I want it to deny the containers that violate the policies, and then the match is on the kind Pod. So I'm looking for pods; you could also change this constraint to target deployments or other Kubernetes resources, but the things that primarily use images that would need to be signed are pods, and deployments that would also have the image inside. And then I'm also constraining this to only run in a particular namespace. I'm having it only constrain default, so if I were to create a demo or a dev namespace on the same cluster, Ratify wouldn't be monitoring that particular namespace and rejecting deployments. So to prove that the constraint works, I'm going to deploy the unsigned version of this image. So at the
very beginning, I had this unpatched version of this image in ACR, and I'm trying to have Kubernetes run that, and so I'm getting this error saying it's forbidden by the admission webhook, and I'm getting the request denied. And so now I'm going to deploy all the manifests of the voting app, and I'm not going to get that error, because all the images in those deployments are signed. And so now we can look at the logs and see what happened when I deployed an unsigned and a signed image, and we can see what Ratify is using to validate and let things onto and off of the cluster. So the last thing that I deployed was the deployment for the Azure voting app that had the signed images, and you can see the subject, which is the image here, but instead of the tag you're seeing the SHA. The important thing to see here is the isSuccess: true. That is the indication that the verification process was successful, and again, the command for that, if we go all the way to the top, was just to get the logs. There's a lot more output than
I remembered there being. There we go. So it's just kubectl, or kube-cuddle, whichever your preference is: logs, and then I'm choosing to target the deployment. You could also target the pod itself, and then you need to specify the namespace, but that lets you get all the logs from Ratify. And so if we scroll down a little bit, or maybe if I search, there we go, that's even faster. So if you didn't know, you can actually search in the terminal; that's a little trick in VS Code. But
here is the image that was unsigned, and we're seeing that the success was false. So now that we've taken a look and been spelunking around in the Ratify logs, let's move on. So here's the workflow, and I'm curious if the demo gods were kind to me today. So we'll head over to GitHub Actions, and it looks like they were. I've got a third run that started 16 minutes ago. So we'll take a brief look at it inside here, and then we'll walk through it in a little bit more detail in VS Code, in the YAML file.
So everything that we did there, it was a lot of work. I had to download several tools; there were probably six or seven steps in there. And if you had to do that every time you were going to deploy a container image, you wouldn't do it. It's just
too much work. But luckily, we have CI/CD systems that can automate all of that for us. And again, that's what I mentioned before that I love about the CLI tools: I can do things manually, learn about them, and then it's super easy to just throw them into a workflow and use GitHub Actions to take care of it. So what I wanted to do here is just make sure that the Kubernetes manifests ran, and if we exit out of this, we got a back-off loop on the DB, but it did deploy, so I'll have to check that in a second. So they weren't so kind. Green, green is good, but then you check
the cluster and the DB is in back-off. But what I'll do in the meantime is walk through the workflow. So there might be
just something wrong with the database container, but we'll take a look a little bit later. What's important is that you get to see this workflow and how it all pans out. So it's a combination of the build and the deploy. The slides that you saw at the very, very beginning, where we had the build and then the deploy, are combined into a single workflow, and so
I begin by setting some environment variables. I could store these as secrets, but I kept them here because they're not super sensitive information. I have the resource group, the ACR name, the Key Vault name, the AKS name, and the cert name, and all of these are required to set up the workflow. Then there are the two jobs that you saw in GitHub: there's the build, and then there's the deploy. Starting with build, it's going
to be running on ubuntu-latest. And the first step is to check out the code, so it's going to pull down the GitHub repository, and then it's going to build the application with the docker build command. So this is the Azure voting app rewritten in Rust; it's going to compile that and then eventually
output a container image on the local machine. And then it's going to tag it with the Git SHA, which you can see there at the end of line 20. Now, fortunately and luckily enough for us, Trivy has a GitHub Action already built, so you don't have to shim it into a run step or create some kind of bash script; it has a nice GitHub Action that you can utilize. And so we're going to use the Aqua Security Trivy action and specify the image, and that image is coming from above, which is what the docker build just completed. We're going to format with JSON and output a patch.json, scan only for OS vulnerabilities, and not error on exit, so we're going to set that to
zero, and then ignore unfixed, and then only worry about critical and high. But then we have to do some logins, so the action is going to have to log into Azure, and I have the Azure credentials stored as a GitHub secret. And that is a service principal that is tied just to a particular resource group, and you can constrain that more if you want, if you want your runners to have even more security. I then have to log into ACR, and I'm using the Azure CLI to do that. And then once I'm logged into all those things, I have to push the image that I just built locally, with the newly compiled code, with docker push.
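A sketch of the build-and-scan portion of that job follows; the registry and secret names are placeholders, and the action versions and input names should be checked against the Trivy and Azure login actions' docs:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build the image
        run: docker build -t "$ACR_NAME.azurecr.io/azure-voting-app-rust:${{ github.sha }}" .

      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.ACR_NAME }}.azurecr.io/azure-voting-app-rust:${{ github.sha }}
          format: json
          output: patch.json
          vuln-type: os
          ignore-unfixed: true
          severity: CRITICAL,HIGH
          exit-code: '0'

      - name: Log in to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Push the image
        run: |
          az acr login --name "$ACR_NAME"
          docker push "$ACR_NAME.azurecr.io/azure-voting-app-rust:${{ github.sha }}"
```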
Now, Copacetic is a relatively new project, and so there isn't an action available. But because it's a command-line tool, we can just stream it into a run command. And so here I have a function that I'm using to download the dependencies. So this requires BuildKit to be able to run, and it also requires the copacetic binary as well. And if you're just curious, I
have to do some trickery here, where I go into the bin directory, I start BuildKit in the background to start that dependency so Copacetic can connect to it, pop back out, wait a little bit just for safety reasons to make sure that it's running properly, and then I can run Copacetic. Once Copacetic has patched that image, I can use docker push to push it back up there. Now, there is a little bit of redundancy here that I see probably getting ironed out in the future. You're noticing that I had a container up in the registry and I had to build it locally.
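Because there's no published action for it, the Copacetic step described above boils down to a run step along these lines; the paths, sleep duration, and tag scheme are placeholders, and copa needs a running buildkitd to connect to:

```shell
# Start BuildKit in the background so copa can connect to it
sudo buildkitd &
sleep 5   # give the daemon a moment to come up

# Patch the image in the registry using Trivy's report,
# then push the newly created tag back up
sudo copa patch \
  --image "$ACR_NAME.azurecr.io/azure-voting-app-rust:$GITHUB_SHA" \
  --report patch.json \
  --tag "$GITHUB_SHA-patched"

docker push "$ACR_NAME.azurecr.io/azure-voting-app-rust:$GITHUB_SHA-patched"
```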
And then, because Copacetic built it locally, it has to be pushed. There is a limitation right now on Copacetic where the image that it's patching needs to be in a registry. So you can't patch an image that is only local; it has to be in the registry. Then the next part is to set up notation, and there is a setup-notation action that I wrote a while back.
And you can specify the version. So if you want to go back a couple of versions of notation, or maybe there are some breaking changes that you want to avoid, you can pin the version being installed on your runner. It also has the ability to install the plugin, so I'm installing the Azure Key Vault plugin, and I'm passing in the certificate name and the ID. So that was something I had done prior to the lab that you didn't see: I had to download the notation binary, and I had to download and put the plugin in the right directory.
And then I had to add a key that was then used to sign the images, and that's what that action takes care of for you. And then I'm signing the image, so I'm just using the run command to do notation sign. I'm passing in the environment variable for the cert, and then the image, with the username and password. The notation username and password are the ACR token that I mentioned earlier, and those are both stored as GitHub Actions secrets. So that concludes the build. The deploy checks out the code, logs into Azure, and then I have to get my Kubernetes configuration. So I wanted this all to be a complete end-to-end deployment, where I'm building the image and then I can deploy it out to the cluster. So I'm just
getting the credentials, and then I'm replacing the images in the manifests. I'm using sed; you could use something like Kustomize as well. And then I'm changing into the manifests directory and applying all the configurations in that directory. So since we have a few minutes, I'm curious what the database is saying. So if we maybe take a look at the logs. Logs, describe, nothing. OK, well, the demo gods are not being nice to me. That's fine. I've got
a run that worked prior, just before we ran this, maybe 30 minutes ago. So even though the action said it was successful, the deployment to the cluster failed. But if we take a look here, we can see all these actions being done inside of the GitHub Action itself. So all the output that we saw on the screen before, as I manually walked through it, is now captured in your log files. And if we go to deploy, we
can see that Kubernetes was able to apply the manifests, and it didn't get rejected by Ratify, because the images were signed. And if we look at the portal, we should see a bunch of images in here with Git SHAs as the tags, and if we look at patched, we should see the Notary signature. So that wraps up
the demo. It looks like we have plenty of time for some questions. Yes? Yeah, so with all these open source things, does it matter what container registry I send it to? Can I just do all these security pieces that you have and send it up, or does it have to be Azure Container Registry? So the question is: does it matter what container registry you use? And the answer is yes, it does. I
believe Zot allows the signatures. So the registry determines whether or not the signature is allowed on there, whether it supports having that metadata added to the registry. ACR and Zot are the only two that come to mind immediately. So yeah, it depends on where you're
hosting, whether these tools will work with it. Yep. For example, the token, how do you create it? Just with az? So the question is: how do you create the ACR token that I used to sign? So I chose to use a token. You could actually use another command
and convert your az login into a token and pass that, but I just created it; there's an option on your ACR instance, an az command that you can use to create the token, and then it generates the password for you that I used. Yep. And then, yeah, there's also the QR code, if you want to scan that in, so the online audience can ask questions as well. So I'll take a couple of questions from the screen. The question is: there are partner solutions such as Sysdig or Trend Micro; why use open source and not these solutions? That's a great question. So after
a lab yesterday, a gentleman walked up to me and asked what this means for the security tools that are running, like the scanners that I mentioned earlier. And it doesn't negate the need for any of those. All of those serve a purpose for the things that are running. All of these tools are to try to catch things as they get into the environment and then prevent the unauthorized ones from getting there. But once things are there, these tools are not as helpful. And so you still need those
other solutions on the other side to continue to monitor and maintain. And then we have another question that says: are there any plans for the Dockerfiles that Visual Studio auto-generates to not be set to the root user? That I don't know, but I'm sure there are plenty of Visual Studio people at the conference that could be asked. Yes, question? So, is only the latest
image deployable, or can you have multiple versions that are digitally signed? So the question is about the certificate in Key Vault, which was named ratify; that's just the name of the certificate that was generated there. Can you deploy more than one signed image? And the answer is yes. The signature can exist on a tag, and so if you have one image with multiple tags, each of those tags can have a signature, and then those can all be run on the cluster. There isn't any overlap there. So say you wanted to run multiple
revisions: you could do that by using different tags and having different signatures on each tag. Then the next question is: with Trivy, can you control what version of the patches is being deployed? With Trivy, you can manipulate the scanners and severity, and you can also add a .trivyignore file if you want to have exceptions. But I think beyond that, you can't control it precisely; you can have exceptions and you can filter your results.
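The ignore file mentioned above is just a plain-text list of vulnerability IDs, named .trivyignore, in the directory where the scan runs; the CVE numbers here are made-up placeholders:

```
# .trivyignore
# Findings listed here are excluded from Trivy's report.
# Accepted risk: not exploitable in our deployment
CVE-2023-12345
CVE-2022-67890
```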
But it's all coming from the Trivy database, and then generating the patch.json that I had, or just outputting in the terminal. So the follow-up question: if you use this to deploy to production, the patched version could have changed by that time. So is there no way to lock it down and say, I only want version 1 to be installed and not version 1.1, which might be released in a couple of days? Well, Trivy only produces the vulnerability report; it's not going to patch anything. Copacetic would patch. So yeah, Copacetic is just going to take whatever Trivy's output is and then run with it. So if you can manipulate
and modify the output, then you could. But, oh, go ahead. So, to wrap this up and summarize a lot of it for me: it seems like the patching is the key takeaway, and I can still manage
my automated patching through this open source tooling, right? And that's what I'm looking to achieve through this. OK, yeah. So more of a summary, right: a good takeaway is automating the patching and alleviating that burden. OK. I'll just see if we've got another one here. So the question is: is it
possible to have a link to the GitHub or GitLab CI/CD pattern of the presentation? Yes. So that's a really good cue, I think, for this next slide. So for the next steps here, if you want to scan this QR code, it will take you to the GitHub repository with all of this. There's Terraform code in there that you can run, and that'll spin up the entire environment. It'll create the certificate for you. It'll have the workflow available in there that you can use, and so you can take that, fork it, and modify it for your own environment. There's a workshop
file inside the docs that walks through everything that I did in the demo as well. So yeah, absolutely, for that question. Oh, a question in the back; time for one more. I'll be honest, that's not something I thought about; I just use it as a default. Yep, OK. Well, with that, I want to redirect you to some related sessions throughout the day. So there's Next Level DevOps, which is going
to talk about the framework around this, kind of a higher level that all of the stuff I talked about today would trickle into. There's Ship It Safely; I used GitHub Actions, and there are a lot of new features coming with GitHub Advanced Security. And then there's also a GitHub Advanced Security for Azure DevOps deep dive and Q&A that I would recommend, which can also sit on kind of the back end of the processes I was talking about here today. So with that, I want to thank you all for coming, thank you for your time and attention, and enjoy Build.