Due to Less Pollution, Secrets Stored on the Cloud are Now Clearly Visible
- Welcome, everybody. My name is Rod Soto and I work at Splunk as a principal security research engineer. I've been at Splunk for two years now. Previously I worked for Prolexic Technologies and Akamai Technologies.
And I am here today with Jose Hernandez. - I'm also a fellow Splunker, currently a manager on our research team here. Our goal is basically to build detections, not only for our customers but for the community as well. I'm an old colleague of Rod's; we worked together at Akamai and at Prolexic, which is now Akamai. I also helped co-found a company called Zenedge, which is now Oracle's web application firewall and DDoS mitigation service. - So thank you, Jose.
We're here today to talk to you about a serious issue that we've been tracking and researching: the leakage of secrets onto public repositories. We're gonna explain how a simple leak of, for example, a secret key can lead to compromise. We also have a bit of data for you at the end, and we're gonna show you a demo too. And we're gonna show you how important it is to keep track of your secrets. Since we're talking about secrets, before we delve into what those secrets are, we have to talk about where those secrets are involved, and how they are used in or passed through operations within enterprises.
So when we talk about that, there are two things we have to cover. One of them is DevOps, and the other is ITOPS. These are two philosophies. One is focused on software development. The other is focused on deploying, managing and monitoring infrastructure: information technology operations.
So what happens here is basically that we have a number of processes. For example, in this slide you can take a look at the DevOps process. This is what is called a toolchain. Toolchains are basically the conceptual phases that you go through when you are planning, creating, developing and publishing software. You can see it's a continuous process, and in this continuous process there is a very large number of technologies involved, as you can see in the slide. But the important thing is that these technologies have to talk to each other and keep some level of security, not only at the local level but also when the software is published, which is usually done on cloud platforms.
One of the main mechanisms to do this is the use of secrets. We're gonna delve in a minute into which secrets we're talking about, but as you can see in this slide, as you go through all these processes and technologies, at some point the secret will go from one point to another. What happens with these toolchains is that in many stages these technologies are either not monitored, or they are trusted, or they are basically taken for granted to be secure.
And we will see that in the next slides. Next, please. Looking at what I just presented, in terms of IT operations and in terms of DevOps, these two processes have something in common. Most of them nowadays are linked one way or another to cloud providers. We used to talk about a perimeter that ends at your gateway; that is no longer the reality. We now have what we call a converged perimeter. What is a converged perimeter? It is basically the gradient between your classic perimeter, meaning whatever is behind your internet gateway, which can be a WAN or a LAN,
whatever you want, it can be an (indistinct) if you want, but it's not the internet, and the cloud technology that is mixed into that gradient. Many of you are probably familiar with the fact that cloud providers now offer services where you can be saving data, publishing software, or doing actual operations on servers that are in the cloud, but it looks like you are inside your own perimeter. That's what we call a converged perimeter. It may not be easy to distinguish, because in many instances it is seamless, and one of the mechanisms that makes it seamless is the back and forth of secure communications, which use those secrets so that your users or technologies are not forced to log in again and again and again. And here is where the concept of federation comes in, which is another issue when we're talking about the use of secrets, not only at the local level but at the cloud level.
So please keep this in mind: in this day and age, where we're adopting cloud platforms and technologies to do many things, from software to IT operations, the perimeter is no longer the perimeter as we knew it. We now have a converged perimeter, a mix of what we used to call our classic perimeter and the cloud services that we're adopting. Next up, please. With this convergence, we have to go back and forth and operate across these different platforms. We need mechanisms that allow us to authenticate and to secure, and that give our users and our applications a way to talk to each other.
So if we put together the processes of ITOPS and DevOps, and put them in the context of a converged perimeter, there are credentials everywhere. There are SSH key pairs, which are commonly used to log into servers. There are Slack tokens; Slack is a communication tool used by many developers and IT operations people. There are SAML tokens, which are basically a way to authenticate through an API. We are also looking at IAM secrets, which have to do with identity and access management, how we authenticate users, as well as certificates and key pairs. And of course we have things such as API keys and OAuth tokens, for example OAuth tokens on Google's platform or on Azure.
In these environments we have quite a lot of credentials, which makes the job of defenders even more complicated. If you had issues before, when you only had a perimeter, let's say with passwords or password policies or people storing stuff that was available to others, now it gets worse, because you have to deal with plenty of new sets of credentials that can expose your organization in ways that many of you probably can't see yet. But it is happening, and we will show some examples of it. Next up, please. So, continuing my description of why we now have all these new credentials, and why the risk scenarios have increased with the convergence of these new cloud technologies: many times IT administrators and sysadmins, or even security operations, I have to say it, tend not to see things in the cloud as belonging to them, because the line between the cloud provider's responsibility and the customer's responsibility is not very clear.
And because there is no clarity, the risk increases. Like I said before, there is now a converged perimeter, and there are a number of credentials passing back and forth. Many times, I have to say, the biggest problem is the lack of visibility at the cloud level, because it is somehow assumed that these cloud environments are secure, either because security is the responsibility of the provider, for example, or because many people somehow don't think that this will affect them. The truth is that it will affect you.
And here are some risk scenarios that you should consider within this context. One of them is having developers with very high privileges who are pushing code or logging into servers. For example, a developer logs in locally with high privileges, then happens to log into a cloud server, reuses those credentials, makes a mistake and hard-codes the credentials in some code. That is a problem. When these credentials get leaked, they may end up in public repositories. And when these credentials are embedded in public repositories, because of the speed and the dynamics of software production, the link to production systems is almost immediate.
So it's very hard to catch it or stop it once the mistake has been made. Also remember, like I was saying before, that a lot of these environments are ephemeral. Ephemeral means you can destroy them any time you want. Because of that, they're usually dismissed and poorly monitored.
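The hard-coding mistake described in this scenario is commonly avoided by keeping the secret out of the source tree and reading it from the process environment at runtime. The following is a minimal sketch of that idea, not something shown in the talk; the helper `load_secret`, the exception `MissingSecretError` and the variable name `DEMO_API_TOKEN` are hypothetical names chosen for illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def load_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    Failing loudly is deliberate: a missing secret should stop the program
    rather than tempt someone to paste a fallback value into the source.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"environment variable {name!r} is not set")
    return value


if __name__ == "__main__":
    # DEMO_API_TOKEN is a hypothetical variable name used only for this demo.
    os.environ.setdefault("DEMO_API_TOKEN", "example-value-not-a-real-secret")
    token = load_secret("DEMO_API_TOKEN")
    # Print only the length so the value itself never reaches the logs.
    print(f"loaded a token of length {len(token)}")
```

With this pattern the value never appears in a commit, so a public push of the code does not expose it. It does not protect the environment itself, which still needs the monitoring discussed in the talk.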
I'll give you another example of this. Your developers or your IT operations people just want a specific application or stack. They go to a public hub where containers are hosted, for example, they download them, and they don't check whether there are backdoors in them. Then they put credentials in them. This thing goes into the toolchain of a company that's developing software.
And somehow this ends up either in your private container repository or, even worse, in a public software repository. And finally, something we've been looking at lately is the use of federated credentials. When you're using federated credentials, it means that if the credential is compromised, either inside or in the cloud, it can be reused. And it can be reused to go North-South or East-West, and we're seeing this already. In fact, we're seeing campaigns where part of the post-exploitation TTPs involves searching for federated credentials in the cloud-connected environment and using them if found. Next up, please.
So let's talk a little bit about what the primary source of leaked credentials is. It's usually a public repository. What you see here are the main public repositories for software. Here we have Bitbucket and GitHub, which is very popular. There are S3 buckets; you'll be surprised by the number of S3 buckets that are public and exposed. And there are things like Open DB or GitLab.
So if you are an attacker and you're trying to find somebody who, either by omission or neglect, embedded credentials that could be reused, these would be your sources of leaked credentials. And we're going to show you today, with data, how this is a problem. This is a current problem and it needs to be addressed. Next up, please. Here's a little bit more on what I was just saying. These technologies were created to enable seamless operation across what used to be called North-South, meaning from the perimeter out to the internet and back, and across East-West, which is lateral movement.
In many environments where there's a converged perimeter, it is now possible to pivot North-South. What does this mean? It means that because you now have either reused credentials that work both inside the perimeter and at the cloud level, or federated systems that allow seamless pass-through operation North-South or East-West, it is technically possible to do what we used to call lateral movement, but instead of East-West it could be North-South. Why? Because credentials persist from endpoint to cloud, and the risk of credential abuse is increased in federated environments. This allows pivoting, because now I can pivot between endpoints and cloud providers.
And like I said before, we're starting to see the adoption of post-exploitation TTPs that involve the search for, use of and abuse of either cloud-connected credentials or federated credentials. This can obviously come from different vectors, but one of them is unvetted images. Unvetted images are a clear and present danger.
Why? Because, like I said before, these are ephemeral environments, meaning you deploy it, use it for what you want and then destroy it, and for some reason many companies assume they are secure or don't care much about them. What happens is that these images end up in private enterprise repositories carrying implanted backdoors and many other vulnerabilities, which pass through simply because a developer wanted a specific piece of code that was in that container. However, it wasn't scanned, it wasn't looked at by security scanners, and it's not even inventoried by operations. That's a huge risk. And, to re-emphasize, recent post-exploitation TTPs such as Pass the Cookie and Golden SAML are examples of converged perimeter attack vectors.
Again, as we move into environments where a lot of these cloud technologies are part of our perimeter, our converged perimeter, these TTPs will become standard as well. And with these items of risk, which can obviously lead to issues and even compromise, organizations need visibility, not only inside the perimeter but also in the cloud. I'd like to emphasize again that the problem starts here, with becoming aware that although the cloud may belong to the provider, it's your responsibility to keep it secure. It's your data and you have to watch it. Next up, please.
So one of the things that we have been using as a reference is the MITRE ATT&CK Cloud Matrix. We believe this framework is a very good reference for looking at the different phases and possible TTPs that can be used in environments where we have these variables of multiple credentials, converged perimeters, and exposure of leaked credentials, which is the main attack vector in the case we're focusing on today. It's pretty easy to go to some of these repositories and obtain leaked credentials. We put this here because we wanted to make many of you aware that there is a MITRE ATT&CK Cloud Matrix, and for many of the things you have seen today we used it as a reference.
Next up. So, following up on the MITRE ATT&CK Cloud Matrix, here are some examples of TTPs that we believe you should be aware of if you have this type of environment. These are things that attackers can do and that you have to watch for.
For example, the creation of permanent or temporary keys. We've seen cases where developers had root keys for an AWS environment. That is pretty bad; you should never hand out root keys. You have to enforce segregation of duties and the principle of least privilege, because a person with a root key can do whatever they want and take over. You have to watch users, even privileged users, creating trust policies, because trust policies, which can be attached to a role for example, can allow somebody to escalate privileges. You have to watch for temporary tokens, and for tokens that can be abused, such as OAuth2 tokens in GCP, and watch for commands such as AssumeRole or GetSessionToken. These are things that would allow any user or an attacker to assume the role of someone with higher privileges, or to refresh a session because they were able to obtain the token.
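To make the "things to watch for" above concrete, here is a minimal sketch that filters CloudTrail-style records for those API calls. It assumes the records have already been parsed into dictionaries using the standard CloudTrail fields `eventSource`, `eventName`, `eventTime` and `userIdentity`. `AssumeRole` and `GetSessionToken` are the calls named in the talk; `CreateAccessKey` and `UpdateAssumeRolePolicy` are added here as assumptions standing in for "creating keys" and "creating trust policies". The function name `flag_credential_events` and the sample records are hypothetical.

```python
from typing import Iterable

# (eventSource, eventName) pairs worth a second look. The last two entries
# are assumptions added for this sketch, not names taken from the talk.
WATCHED_EVENTS = {
    ("sts.amazonaws.com", "AssumeRole"),
    ("sts.amazonaws.com", "GetSessionToken"),
    ("iam.amazonaws.com", "CreateAccessKey"),
    ("iam.amazonaws.com", "UpdateAssumeRolePolicy"),
}


def flag_credential_events(records: Iterable[dict]) -> list:
    """Return a short summary for each record that matches a watched event."""
    findings = []
    for record in records:
        key = (record.get("eventSource"), record.get("eventName"))
        if key in WATCHED_EVENTS:
            identity = record.get("userIdentity", {})
            findings.append({
                "event": record["eventName"],
                "actor": identity.get("arn", "unknown"),
                # Activity by the root identity deserves extra attention.
                "is_root": identity.get("type") == "Root",
                "time": record.get("eventTime"),
            })
    return findings


# Made-up sample records, for illustration only.
SAMPLE_RECORDS = [
    {"eventSource": "sts.amazonaws.com", "eventName": "AssumeRole",
     "eventTime": "2020-06-01T12:00:00Z",
     "userIdentity": {"type": "IAMUser",
                      "arn": "arn:aws:iam::111122223333:user/dev"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "eventTime": "2020-06-01T12:01:00Z",
     "userIdentity": {"type": "IAMUser",
                      "arn": "arn:aws:iam::111122223333:user/dev"}},
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateAccessKey",
     "eventTime": "2020-06-01T12:02:00Z",
     "userIdentity": {"type": "Root",
                      "arn": "arn:aws:iam::111122223333:root"}},
]

if __name__ == "__main__":
    for finding in flag_credential_events(SAMPLE_RECORDS):
        print(finding)
```

A match is not evidence of compromise on its own, since these calls are routine in many environments. The sketch only narrows down which events to review.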
Let's say they were able to compromise an endpoint, for example, where there is an SDK or a CLI for AWS. They might be able to extend their sessions by executing these commands, or even increase their privileges; they have to do a little work to find the correct roles. But these are things that you have to watch for. Something else you should be aware of, if you have an Azure environment, is the creation of federated domains. A federated domain would allow users from that new federated domain to access your environment.
And there is a technique where you can create a backdoor federated domain and then use it for access. This is just something you have to watch for, and the same goes for service principals, specifically service principals that are associated with the creation of federations. So this is an event that, if you see it, you should pay attention to.
And finally, one of the things that we've been seeing lately is the forging of SAML assertions. SAML assertions involve environments such as ADFS, Active Directory Federation Services, and of course they can be used in Microsoft Azure and in Amazon Web Services. And with that, I am gonna pass it to Jose. - Thank you, Rod, for sharing a bit of an understanding of the potential vulnerabilities, issues and TTPs through which an attacker can exploit leaked credentials. I want to pivot a little bit and talk about how we can exploit this actively: how we can actually leverage leaked credentials, study them and gather them.
And there are a few things to consider right off the bat. When we started tackling this problem, one question was: what are the low-hanging fruits for us to monitor? Leaked credentials are usually low-hanging fruit for actors. It's easy to make mistakes. I've made mistakes before in development where I committed something sensitive to a repo and didn't realize it, and I caught it afterwards. And I've seen colleagues make those mistakes too.
So it's low-hanging fruit for actors to get a leaked credential and reuse it to attack an environment. GitHub specifically is an industry-standard tool used by DevOps to commit code. So it's a common tool to use and a common place to find leaked credentials.
In our demo today, the example we shared kind of picks on GitHub, but there are a lot more targets out there. There's GitLab, Bitbucket and other revision control systems offered as a service that you can leverage and exploit for leaked credentials. Before I jump into the demo and share the working pieces, I wanna talk about the challenges of hunting for leaked secrets in public repositories. The first and most obvious challenge is that public repositories like GitHub typically enforce API limits.
So if you're searching through their API to extract credentials, they'll typically limit you by query, by user or by repo. We found an API in GitHub that allows us to grab all the events coming in, so the tool is not necessarily constrained by user or repo, but this is a usual constraint to think about as you're trying to gather leaked credentials. The second challenge is that verifying by hand whether a piece of code has a credential in it does not scale.
When Rod and I originally started this research, we were just running searches across GitHub, finding different files and then reading through them to see if they had leaked credentials, and we realized that doing this manually wasn't gonna scale. Which really gets me to the third point, which is one of the challenges but also one of the opportunities here: there's not a whole lot of automation. That said, in the last six months I've seen a whole lot of improvement, and this has been changing. This entire process of parsing, storing and reporting leaked credentials is ripe for automation. There are a lot of opportunities for automating things specifically to solve this problem.
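The API-limit challenge mentioned above is usually handled by reading the rate-limit headers that GitHub's REST API returns and pausing until the window resets. The sketch below works on a plain dictionary of response headers, so it makes no network calls. `X-RateLimit-Remaining` and `X-RateLimit-Reset` are GitHub's documented header names; the helper name `seconds_until_reset` is hypothetical, and nothing here is taken from the git-wild-hunt source.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to sleep before the next request is allowed.

    `headers` are the response headers of the previous API call.
    `now` is injectable so the function can be tested deterministically.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    # The reset header is a UTC epoch timestamp in seconds.
    reset_at = float(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


if __name__ == "__main__":
    exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}
    print(seconds_until_reset(exhausted, now=940.0))
```

A crawler would call `time.sleep()` with the returned value between requests instead of retrying blindly, which keeps a long-running collection job within the quota.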
So now that we're set, let me jump into a quick demo, and like everything in life, it's not fun unless we do it live. So I'm gonna try to do this live. Before I jump in: I've already cloned the git-wild-hunt tool and I have it configured here. Git-wild-hunt is the tool we developed to search for leaked credentials.
It works very simply by taking in a GitHub advanced search. It then parses all the results from that search and tries to find whether there is a leaked token inside them or not. Let me show you the configuration for it; it's extremely simple.
Let me share what the configuration looks like really quickly. It just needs your GitHub token, where you want to write the results, where it logs its actions, and which regexes to match for leaked credentials. If you wanted to search for leaked AWS credentials, for example, you can grab an advanced search that looks for anything under the .aws path, which is where AWS stores its credentials. This is an example of that: a GitHub advanced search that looks for anything on the .aws path with a file named credentials in it. That's where AWS, specifically the AWS CLI, natively stores its credentials.
The tool is gonna search all of GitHub for this file and see whether, given the regex patterns that we have, it matches any potential credentials in it. So I'm gonna go ahead and run it really quickly. You see right off the bat that we have about 350 results in GitHub right now that match this file path. Now it's gonna go through every result and see if it finds a potential leaked key. And you can see that it is already finding keys.
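The matching step described above, running each search result through a set of regex patterns, can be sketched as follows. This is an illustration under stated assumptions rather than the tool's actual code: the `AKIA[0-9A-Z]{16}` pattern for AWS access key IDs is the widely used one from TruffleHog, the password-in-URL pattern is a simplified variant, and the function name `find_leaks` and the sample inputs are hypothetical. The sample key is the placeholder value from AWS's own documentation.

```python
import re

# Check name -> compiled pattern. Only two of the "30 plus" checks mentioned
# in the talk are shown; the second pattern is a simplified assumption.
PATTERNS = {
    "AWS API Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Password in URL": re.compile(
        r"[a-zA-Z]{3,10}://[^/\s:@]{3,20}:[^/\s:@]{3,20}@"
    ),
}


def find_leaks(text: str) -> list:
    """Return (check name, matched string) pairs found in `text`."""
    findings = []
    for check, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((check, match.group(0)))
    return findings


# Placeholder credentials file using AWS's documented example values.
SAMPLE_CREDENTIALS_FILE = """\
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""

if __name__ == "__main__":
    for check, value in find_leaks(SAMPLE_CREDENTIALS_FILE):
        print(f"{check}: {value}")
```

Pattern matching of this kind produces candidates, not confirmed leaks: a string can match and still be a placeholder, as the sample here is.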
So it's telling you: hey, I found a key on this repo, and here's potentially the key ID. It runs through every single piece of code that gets returned for that GitHub advanced search. And it's checking for, I wanna say, about 30-plus regex patterns that match different credentials. Those patterns are things like Google API keys, Google Cloud Platform tokens, Mailshake and Mailgun API tokens, Azure as well as GCP tokens, and Squarespace and Stripe API tokens. It's all documented nicely in the project, but there are about 30-plus checks for different potential tokens that can be matched given an advanced search. Once the tool finishes processing all the results, and it's about to be done here, we're at 270-something, it will write all the results into a JSON file.
The primary reason we're writing this into a JSON file is to make it easily consumable and portable afterwards. And here, it's all done. Excellent. If you go here, we wrote our results into results.json, 'cause we're not creative with names.
And you can see here the structure of all the results, all the matches it found. It brings back not only the URL where it matched, but also the type of check, meaning the type of leak, what the match was, and a little bit of information about the GitHub user who leaked it. Again, this is everything that's available through the GitHub API. Now, the cool thing is that since this is simple JSON, one of the things we're doing here is indexing this data in Splunk to make it easily reportable. Rod and I have been running this tool for the last eight months and collecting all the data out of it in a Splunk instance.
And we have a bunch of good information out of it. By the way, this is the GitHub project; I just wanted to share it with you really quickly. You can clone it, and it has some examples.
If you want to hunt for specific credentials, these are the ones that we've been hunting for over the last eight months: specifically GCP, AWS, Azure, (indistinct) Kubernetes, Jenkins, CircleCI secrets and generic credentials. These are the different regexes that it ships with out of the box. I'm not creative.
I totally borrowed this from TruffleHog. This is exactly what TruffleHog searches for. The biggest difference compared to TruffleHog is that TruffleHog searches for these credentials inside a specific repo, whereas here we're searching across all of GitHub and then finding leaks of these specific types of keys. With that said, after collecting data for eight months, we put together a dashboard that lets us quickly get a pulse of how things are going from a leaked credentials perspective. I call this the leak pie.
You can see that most of the leaks we've gotten have literally been passwords in URLs. Some of them are GCP service accounts, as well as API keys. If you break it down by type, the most common type is, again, passwords in URLs. You can also see a pulse of how things are going: it's awesome to see that over the months we've been running this tool, there have been fewer and fewer leaked credentials in GitHub. To be honest with you, it makes me happy that the problem has been getting solved since we started noticing it. In short, this is what we've been leveraging the tool for.
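The breakdown by type and the monthly trend shown on the dashboard boil down to two counts over the collected results. The talk does this in Splunk; the sketch below shows the same aggregation in plain Python. The record layout is an assumption: the field names `check` and `found_at` and the sample values are hypothetical and are not taken from the tool's actual output.

```python
import json
from collections import Counter

# Hypothetical records; "check" and "found_at" are assumed field names.
SAMPLE_RESULTS = json.loads("""
[
  {"check": "Password in URL", "found_at": "2020-03-14"},
  {"check": "Password in URL", "found_at": "2020-03-20"},
  {"check": "GCP Service Account", "found_at": "2020-04-02"},
  {"check": "AWS API Key", "found_at": "2020-04-18"}
]
""")


def leaks_by_type(results):
    """Count results per check type (the data behind a pie chart)."""
    return Counter(r["check"] for r in results)


def leaks_by_month(results):
    """Count results per YYYY-MM month (the data behind a trend line)."""
    return Counter(r["found_at"][:7] for r in results)


if __name__ == "__main__":
    for check, count in leaks_by_type(SAMPLE_RESULTS).most_common():
        print(f"{check}: {count}")
    for month, count in sorted(leaks_by_month(SAMPLE_RESULTS).items()):
        print(f"{month}: {count}")
```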
Now, let me flip back to our slides. I wanna share some other metrics that we gathered. One of them is that there are about 276,000 companies that we noticed actively leaking credentials out there.
The way we're gathering the companies is this: if a GitHub user has a company listed under their GitHub profile, that's how we know there's a company behind it. And so we counted them, and 270 is quite a lot of companies with leaks. And this, by the way, is the breakdown of leaks by company, which I forgot to mention.
Mainly passwords in URLs, GCP tokens and AWS API keys. Now, one interesting fact was that the average leak time was about 52 days. So when we saw a secret leaked, it took 52 days on average for that secret to come off the GitHub project. And like I showed you in the trend, this has been trending down all around, which is really, really nice. But you can see, I drew some trendlines here to show you, that it's a common problem even in large corporations that have distributed DevOps teams, if you're using things like GitHub.
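The talk does not say how the 52-day figure was computed. One plausible reading is the mean number of days between the first time a leaked secret was observed and the time it was no longer present. The sketch below implements that reading only as an assumption; the function name `average_leak_days`, the tuple layout and the sample dates are hypothetical.

```python
from datetime import date
from statistics import mean


def average_leak_days(leaks):
    """Mean exposure in days over (first_seen, removed) date pairs.

    Pairs whose `removed` is None are still exposed and are skipped, which
    means the result understates exposure while leaks remain open.
    """
    durations = [
        (removed - first_seen).days
        for first_seen, removed in leaks
        if removed is not None
    ]
    return mean(durations) if durations else None


if __name__ == "__main__":
    # Made-up observations, for illustration only.
    observed = [
        (date(2020, 1, 1), date(2020, 2, 1)),    # exposed for 31 days
        (date(2020, 3, 1), date(2020, 3, 11)),   # exposed for 10 days
        (date(2020, 4, 1), None),                # still exposed, skipped
    ]
    print(average_leak_days(observed))
```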
Again, this is a very common problem across multiple companies. The good news for these companies is that the trend, as you can see, has been going down since we've been monitoring this, which is amazing. But it affects everybody. This is a very common problem we've seen across the industry. And now I'm gonna pass it back to Rod to close it off for us and tell us a bit about how you can apply this in your own environment. - Sure, and thank you.
Basically, what we want you to take from this presentation is that you have to become aware that if you have any cloud assets with either a DevOps process or an ITOPS process, where you're deploying servers for applications or software, publishing applications, or publishing software patches, you're exposed. So it is important for you to be aware not only of what happens at the end of this road, which is that the credential gets leaked and published, but also that you can prevent this by having visibility, not only at the cloud level but at the local level, where you have to have visibility into what your developers are downloading and what your IT people are publishing. Where are they getting these containers from? Are they scanning them? Is there a process to inventory and scan applications and containers, for example for libraries or for vulnerabilities at the operating system level? These are things that may be brought inside from an unvetted, unchecked entity.
So be aware that if you have an enterprise, or even a small or medium company, that has cloud assets, this concerns you. This is something that you need to be aware of. My suggestion, in some of the cases where I've been approached by very concerned security people, is to sit down with your IT operations and your development managers and say: hey, what do we use for cloud services? What type of exposure can we have from what you're downloading? Is there an inventory of the containers? Is there a process in place to sanitize them, to research them, and to watch what's being published or even leaked? In fact, we know that GitHub has released a tool to prevent this type of issue.
So this is something that you can use, and you can also use the tool that we showed you to see if your company is being exposed publicly on the internet by somebody leaking secrets, either accidentally or because there's no process and this is not checked. Once you get to this point, you can develop procedures and prevent this from happening. - I just wanna add, thank you so much for hearing our plea today. Rod and I are extremely happy to see this trend going down overall, and I hope you enjoyed the talk.
- Thank you.