Automating Analysis with Multi-Model Avocados - SANS DFIR Summit 2018
I'm going to be talking to you about automating analysis with multi-model avocados. I have some shirts here with avocados on them — I might just throw them out to you, but you might have to pass them around because the sizes are kind of funky. If you get one you can't use, just pass it along until someone can. Here's a shirt — whoo. All right, cool.

By the way, the multi-model avocado I'm talking about here is ArangoDB — it's a database, just so we're clear. Also, I can predict the future: there's my Twitter handle right there, @forensicmatt. Follow me, because I have a feeling that at the end of this presentation you're going to ask things like "where can I find that tool?" or "where can I find this information you're talking about?" I'll tweet about it and throw up a blog post later, because none of that stuff is done yet — so just follow me on Twitter.

When I talk about automated systems, I don't want you to immediately think this guy's trying to push a "find everything" forensic button — it's not really like that. But when we talk about an automation system, what does that entail? Oftentimes when we do our investigations — actually, you know what, here's another shirt. That one has a fox on it; we'll get into that in a minute. All right.

When we go through our forensic investigations, it usually entails looking for very specific things. Generally we're given a forensic image, or we made the forensic image, and we want to pull some stuff out and look at it. So we do an extraction and run a tool on it. Say we're looking for past executions — that could be prefetch, right? So we pull out the prefetch files, we run a tool on them, we get output. But then we need to know that something else happened, so let's go ahead and extract the registry.
Let's look at the BAM keys, because we need to put more information together. By the end of it, we've extracted out, I don't know, 20 different artifacts, run 20 different tools, and now we have to do something with all of that output. So in an automation system, we want to use tools, take their output, and ingest it into storage of some sort. Once it's in storage, we want to do some type of analysis, so that in the end we get meaningful reports — and it would be great if we didn't have to do a whole lot between the image and the report, so we have more time to do meaningful work. Looking at this: the better the tool, the better our output can be; better output enables better analysis; and better analysis leads to a better report.

We have a tool problem, though. When I look for a tool, I want all of the artifact, not just some of the artifact. But I'm scared the tool thought I said "give me a lot of the artifact," when what I want is the whole artifact — all of it. The problem is that there's output that's for humans, and there's output that we need for analysis, and the two don't go together very well. When you get output for humans, you're looking at TSV output, comma-separated values, Excel — very linear. The problem is that when a tool starts trying to give us what it thinks we would want to see, it can start skipping data, because in reality the data structures behind these artifacts are very nested; they're not flat. So when you try to flatten them out, you either get a lot of duplicated data, or there's some data the tool just can't show you, because it's too much for a human to look at. So we need output for analysis.

Output for analysis may be far more complicated, because it's going to be very nested data structures — generally JSON-type data, or XML, which can be nested as well — and it's very difficult to go through that with the human eye. Why do we need this? Because there are so many artifacts, and when we take a step back, the more artifacts we can put together, the bigger the picture we start to see. Shellbags tell us something; MFTs tell us something; prefetch and the USN journal tell us something. Each of these tells us one thing, but when we start looking at all of them together, it shows us a much larger picture. This is why we need data structured for analysis: with nested data we keep things that would be missing in something like TSV form, because you're able to throw all of the data in instead of just some of it. Now we can link these artifacts together and start looking at a much larger picture altogether.
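The flattening problem he's describing is easy to see with a toy record. A minimal Python sketch — the field names and values here are invented for illustration, not any specific tool's schema:

```python
# A made-up LNK-style record: one shell item per path component.
# This nesting is exactly what a single TSV row cannot represent
# without either duplicating or dropping data.
lnk_record = {
    "file_name": "test_file.txt.lnk",
    "volume": {"serial": "A1B2-C3D4", "type": "NTFS"},
    "shell_items": [
        {"name": "TestFolder", "mft_entry": 4021, "mft_sequence": 2},
        {"name": "test_file.txt", "mft_entry": 5133, "mft_sequence": 1},
    ],
}

def flatten(rec):
    """Flatten to TSV-style rows: top-level fields get duplicated per shell item."""
    rows = []
    for item in rec["shell_items"]:
        rows.append({
            "file_name": rec["file_name"],
            "volume_serial": rec["volume"]["serial"],
            "item_name": item["name"],
            "mft_entry": item["mft_entry"],
            "mft_sequence": item["mft_sequence"],
        })
    return rows

rows = flatten(lnk_record)
# Two rows for one artifact: the top-level fields are duplicated,
# and the parent/child relationship between shell items is gone.
```

Kept nested (as JSON in a document store), the same record stays one artifact with its path hierarchy intact — which is what makes the correlations below possible.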
This really helps us see the larger picture and create an automated analysis workflow. One of the things we can do is correlation — I'm short on time, and I really hope I don't knock over anyone's coffee; that would be bad. Actually, maybe I should not do this.

Value-based correlations: in a second we'll see an example, but this is things like MFTs having file reference numbers. How many other artifacts out there have file reference numbers? You can find them in the USN journal, in the $LogFile, in prefetch data, in LNK files — they're all over the place. So now we can correlate and link these documents together, which is great. What about range-based? What if I wanted to see thirty seconds of a given artifact after I see something else in another artifact? What if I see something bad in the prefetch, and I want to look at historical records five seconds after that thing happened — after we know an executable ran? We can do that; that's a range correlation. You can also create your own custom functions, which is awesome, because a lot of the time we process forensic artifacts and still have values that are encoded — PowerShell Base64, right? If we get PowerShell logs, we still have to decode the Base64, so we need some way to do decoding on the fly so that we can do more analysis on the database side.

Then I'm going to talk about something a little more complex: pattern-based searching. This is where you can actually group and aggregate artifacts and look for very specific patterns, which is great — and it even goes back to what the FireEye guys were talking about, needing some type of resilient pattern, a resilient signature. We'll get to that in a second. Here's an example: shellbag difficulties. You're looking through shellbags and you see two folders on the same volume — in this case it's E.
Are these two folders from the same volume? Sometimes we can figure this out really fast, because maybe one's an NTFS volume and one's a FAT volume, and we know FAT doesn't have the MFT sequence numbers we would see if it were NTFS — but it's not always that easy. To answer a question like this, let's look at our underlying data. Shellbag data: you have MFT entries (this is important to us), you have sequence numbers, and you have the name of the folder itself.

What about LNK data? Here you see an example of nested data. The LNK file structure is not flat by any means; it doesn't even resemble TSV. It's very nested. What are we interested in? The file name, the volume serial number, and the reference numbers — sequence numbers and entry numbers. The cool thing is that there are shell items per folder and file of the local path. So for the test file up there, you'd have an entry for it and an entry for its test folder, and what that means is you have parents, entry numbers, and sequence numbers for every single path component in the LNK file. This is great, because it means we can correlate LNK files and shellbags together — and what we want to know is: were these two separate volumes?

Here's an example. You don't really need to know exactly what it says; the gist is we're going to iterate through our LNK files and compare anything where the file name, the entry numbers, and the sequence numbers are the same as those in the shellbags. In the end, what we want is something like this, where we can say: we found a correlation, and we can tell you that while these two folders both look like they come from the E drive in the shellbags, they're not the same volume. That's important, because now we know that if
we were looking at just the shellbags, we might say these folders are on the same volume — but no, it's two separate drives; shellbags store entries on a drive-letter basis.

So, tool shout-outs: I'm not making this stuff up, and I didn't just generate this data from somewhere random — I used real tools. Thanks Eric Zimmerman, thank you G-C, thank you Dave Cowen.
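The join he's describing — matching LNK shell items against shellbag entries on file name plus MFT entry and sequence number — can be sketched in plain Python. The record layout here is illustrative, not the exact schema his tools emit, and in his demo this runs as a query inside the database:

```python
# Hypothetical, simplified records; real parsers emit much richer structures.
shellbags = [
    {"name": "Secret Docs", "mft_entry": 4021, "mft_sequence": 2, "drive": "E:"},
    {"name": "Old Backup",  "mft_entry": 77,   "mft_sequence": 9, "drive": "E:"},
]

lnk_shell_items = [
    # Each LNK carries one shell item per path component, with its own
    # entry/sequence numbers plus the serial number of the source volume.
    {"name": "Secret Docs", "mft_entry": 4021, "mft_sequence": 2, "volume_serial": "A1B2-C3D4"},
    {"name": "Old Backup",  "mft_entry": 77,   "mft_sequence": 9, "volume_serial": "9F8E-7D6C"},
]

def correlate(bags, items):
    """Value-based correlation on the (name, entry, sequence) triple."""
    index = {(i["name"], i["mft_entry"], i["mft_sequence"]): i for i in items}
    hits = []
    for bag in bags:
        key = (bag["name"], bag["mft_entry"], bag["mft_sequence"])
        if key in index:
            hits.append({"folder": bag["name"], "drive": bag["drive"],
                         "volume_serial": index[key]["volume_serial"]})
    return hits

hits = correlate(shellbags, lnk_shell_items)
# Both folders shelve under E:, but the correlation surfaces two different
# volume serial numbers -- two separate volumes, as in his slide example.
serials = {h["volume_serial"] for h in hits}
```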
He's generous — he throws out tools. It's cool.

What about a more complex question: did a wiper run? What files were wiped? Where do we go for this information? We have execution artifacts and we have file-history artifacts. Two that come to mind: prefetch, where we can find execution evidence, and the USN journal, which gives us historic file activity — great things. What's in our prefetch data? Prefetch has run times, and it has the file names in there. What about USN record data? On a per-change basis, that's going to give us the reason the file was changed, the timestamp of the change itself, file names, and reference numbers — again, more correlation points for later.

So what would a query look like where we could ask: did a wiper run, and were files erased? What if we could automate this and just have a query that goes through and looks for some known examples? We know that if we look through our prefetch and see something called Eraser, that's a wiper. So let's look at the file-system activity in the USN records between the time Eraser ran and ten seconds after, to find out what was happening on disk. When we run something like this, these are the results: we see the prefetch run — we find that from the run time — and then we look from that point to ten seconds after it in the USN journal, and we see something real nice-looking. These look like wiped files. We can see that because our first file — in this example it's in plain text — has the data-overwrite change; after that, it goes through an iteration of being renamed several times; and finally the same file has the file-delete operation done on it. We know it's the same file because of the entry number. There's a problem with this, though.
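The range-based query he just ran — "give me USN activity within ten seconds after an Eraser prefetch run" — reduces to a time-window filter. A minimal Python sketch with made-up records (in his demo this happens database-side):

```python
from datetime import datetime, timedelta

# Hypothetical parsed records; real prefetch/USN output has many more fields.
prefetch = [
    {"exe": "ERASER.EXE",  "run_time": datetime(2018, 6, 1, 14, 30, 0)},
    {"exe": "NOTEPAD.EXE", "run_time": datetime(2018, 6, 1, 9, 0, 0)},
]

usn_records = [
    {"file_name": "secret.txt", "reason": "DATA_OVERWRITE", "entry": 5133,
     "timestamp": datetime(2018, 6, 1, 14, 30, 2)},
    {"file_name": "XyQ9.tmp",   "reason": "RENAME_NEW_NAME", "entry": 5133,
     "timestamp": datetime(2018, 6, 1, 14, 30, 4)},
    {"file_name": "report.docx", "reason": "DATA_EXTEND", "entry": 7001,
     "timestamp": datetime(2018, 6, 1, 16, 0, 0)},
]

def activity_after(prefetch_recs, usn_recs, exe_name, window_seconds=10):
    """Return USN records within `window_seconds` after each run of exe_name."""
    hits = []
    for pf in prefetch_recs:
        if pf["exe"] != exe_name:
            continue
        start = pf["run_time"]
        end = start + timedelta(seconds=window_seconds)
        hits.extend(r for r in usn_recs if start <= r["timestamp"] <= end)
    return hits

wiper_window = activity_after(prefetch, usn_records, "ERASER.EXE")
# Only the two records inside the ten-second window come back; the
# afternoon DATA_EXTEND on report.docx is outside it.
```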
First, tool shout-outs: I made some Rust tools, and they're really cool. If you're into turning artifacts into JSON, these are great tools for it — plus they're written in Rust, so they're super fast. You can find them at the GitHub links up there.

So, real quick: there's a problem with this, and the problem is that we can't quantify it. We see five seconds' worth of historical activity, and it does look like erasing, but we need to go further — we'll get to that in a second. First, let's talk about on-the-fly decoding real quick, because this is important. This is the Windows Partition/Diagnostic event log, something Jason Hale wrote a blog post on the other day. It's really cool, because traditionally volume serial numbers — which we were just looking at in LNK files — were only really there if you had ReadyBoost enabled, and now it's very rare that you have ReadyBoost enabled by default. So it's hard to link volume serial numbers up to their correct device information. You could do time-range correlations — okay, when was the last time a device was plugged in — and go about it that way, but we don't need to now. We can do some on-the-fly decoding and look at these events, because they're new in Windows 10.
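The decode he demos next — pulling a volume serial number out of the raw VBR bytes those events carry — looks roughly like this in Python. This is a minimal sketch assuming an NTFS VBR, where the 8-byte volume serial number sits at offset 0x48 of the boot sector; a real function would also handle the FAT variants, and the synthetic VBR built here is purely for demonstration:

```python
import struct

def ntfs_volume_serial(vbr_hex: str) -> str:
    """Decode the 8-byte volume serial number from an NTFS VBR hex dump.

    NTFS boot sector layout: OEM ID b'NTFS    ' at offset 3, volume serial
    number as a little-endian u64 at offset 0x48. (FAT volumes store theirs
    at different offsets and widths -- not handled in this sketch.)
    """
    vbr = bytes.fromhex(vbr_hex)
    if vbr[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS VBR")
    (serial,) = struct.unpack_from("<Q", vbr, 0x48)
    return f"{serial:016X}"

# Build a synthetic VBR for demonstration; in his demo the hex string comes
# out of the Microsoft-Windows-Partition/Diagnostic event itself.
fake = bytearray(512)
fake[3:11] = b"NTFS    "
struct.pack_into("<Q", fake, 0x48, 0x1234ABCD5678EF90)

vsn = ntfs_volume_serial(fake.hex())
```

Registered as a user-defined function in the database, the same logic runs per document at query time, which is the "decoding at the storage level" he's about to show.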
There's a hex string of the entire VBR block in the event, which is awesome. It's within those VBR blocks that we can pull out the volume serial numbers — but we need to make a custom function to do that. This is basically how: ArangoDB ships with a shell, and you can register functions with it, which works out great for things like this. I want to do a query that grabs device information based off this Windows event, and I want to be able to decode those VBR blocks on the fly. This is basically what that looks like: we call our custom function, WINEVENT, and the result is something like this — those VSNs are being pulled out on the fly from the raw hex dumps. That's huge, because generally your tool is going to give you some type of encoded data and you're going to have to process it even further; now we can do all of this at the storage level. More tool shout-outs: the events-to-JSON tool was used for those examples.

All right, now we're going to talk about pattern searching. Pattern searching is cool because this is where things can get dynamic. The problem is you can't always rely on an MD5 hash, and you can't always rely on the name of a file, because those things change. So we want to create some type of query that's able to aggregate and group our artifacts while looking for something very specific. Once again, you don't need to know exactly what the query says — just how easy it is to do the groupings and aggregations. Why do we need this? Let's go back to the wiped files. We can't quantify them. If we tell a lawyer, hey, we know something's been wiped — what are they going to ask? They
Want to know what was white how much of it was it important, we, need to be able to give them those answers, so, we want to automate this process right, so, what we can do is we, we, now know that there's a pattern behind our wiping utility, and. In this, case when we examine, the data we see that you. Have a minimum of 8 file name being. When, the file gets erase they, have a minimum of a tree names so. You have the, original name that, gets has the data overwrite, operation. It. Has a minimum, of 3 actions done to it before it gets renamed. To. 6 more it gets renamed 6 more times each, rename. Within the USN Journal that. File, name, under, that file name has 5 operations, done to it followed, by a seventh sequential. Rename. Where, it, finally. Gets deleted, so. This is actually a pattern that we can use and being, able to quantify what, all is you race so. We want to be able to create, some type of signature. That can look for, this, type of activity, and what's, cool about this, is you don't, have to apply this to just file erasing, this is to. Forensics. In. General. Like we just we want to look for pattern based type, of things so now we throw our signature, into our query and this, is the result. That we get we found. 138. Files, that, matched, that pattern, that we can now say we know all of these files were erased because of Batchelor pattern here. Are the original names here, are the, wipes names and those, are the times when, that file was erased it's. Pretty big, so. I want, to talk a little bit about. How. We can automate this entire. Process, and while. It why it's important, right because. We're. Seeing a lot more that. Lawyers. Are coming to us and they're giving us more and more evidence, but, they're wanting answers, a lot, faster, than we can give, it to them. So. A, lot. 
A lot of this stuff can be automated, and so it should be — but we shouldn't have to reinvent the whole wheel. There are a lot of tools out there that we want to utilize. So can we make a system that uses other people's tools, takes their output, puts it in a centralized storage place, lets us do analytics on it, and then gives us more meaningful reports? Because very few times are we able to run just one tool, get one report out of it, and hand that off. Usually it's a combination of multiple tools and multiple outputs — putting that output together, creating a report from it, and then handing that off. So here's a demonstration that this can be done.

We've downloaded the ArangoDB package from the ArangoDB site. It's a zip, which is awesome.
What's cool about this is you don't have to install anything — you can just stand it up wherever, and it works. They have an OSX version and ones that run on Linux and Windows, and again, you don't need Java — who cares. It's this easy: you just start up the service — boom, it's up and running. It gives you a port, much like Kibana does, where you can go and see an interface.

Now we need to mount a drive. Once we mount this drive — by the way, I love Arsenal Image Mounter; it's one of my favorite tools, it makes things very easy. So we have our tools folder. This is a tool I worked on as proof-of-concept code, and what it does is iterate through a live volume, extract files, and run tools on those files. In this case we're looking at prefetch: it basically looks for all the files with a .pf extension and runs the Rust prefetch parser on them. Same with the MFT — we've got the USN journal in there, and the event logs as well. This is great because we can utilize other people's tools when we do this, and then do analysis with that output all in one central storage location.

So we open up our shell. We have to start it in admin mode, because you need a raw handle to the logical volume — we mounted our image, so it's now mounted to drive letter I as a logical image. Always give it -h; it tells you how to use the tool. So we give it the source volume, in this case I:, then a temp folder location, because a lot of these files have to be exported before we can run anything on them, and then the name of the database. Now it's created a database in our Arango instance, and it's going through, parsing out all the LNK files, the registry, the USN
journal, and the MFT — we see how many records are being loaded in there, plus SetupAPI logs and prefetch. I should let it catch up — by the way, it doesn't go that fast; I wish it did, but I had to speed it up, otherwise we'd be sitting here for a while. Python is slow, but it allows us to do amazing things.

First we need to use arangosh, the Arango shell, because we're going to create our user functions — the ones we made earlier. This is what allows us to create reports using some of our more complex custom functions. Now we're ready to go to the interface. Of course, by default it's root with no password, because why wouldn't it be. Here are all of our collections — LNK files, USN, things like that — and this shows you what that data looks like inside the database. Here's the USN: USN records are relatively flat, but we nest out the reference numbers so we can do those correlations. Here are the Windows event logs — event logs are a great example of highly nested data.

So let's run some queries. Some of these we already saw; this one is looking at PowerShell script blocks — this is what we're searching for in the Windows events, and we see the PowerShell content. I just wanted to show you what the interface looks like, because we're going to cut over to the reports in a second, but this is the interface for ArangoDB. The interface that ships with it gets you past those complex issues of getting through the data — the searching and navigating things I talked about earlier. So let's run some scripts; these are going to create templated Excel files. Now we went from image to Excel reports, just like that. There's one of the PowerShell blocks — you can see it right there: the DLL importing, things like VirtualAlloc — things that we would question.
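The script-block hunt he's showing boils down to filtering nested event documents on suspicious substrings. A minimal Python stand-in for what his database query does — the event structure here is abbreviated and hypothetical; real PowerShell Operational 4104 events carry many more fields:

```python
# Hypothetical, trimmed-down EVTX-as-JSON documents.
events = [
    {"System": {"EventID": 4104},
     "EventData": {"ScriptBlockText": '[DllImport("kernel32.dll")] ... VirtualAlloc ...'}},
    {"System": {"EventID": 4104},
     "EventData": {"ScriptBlockText": "Get-ChildItem C:\\Users"}},
    {"System": {"EventID": 4624}, "EventData": {}},
]

# Markers commonly questioned in script blocks, as in his demo slide.
SUSPICIOUS = ("VirtualAlloc", "DllImport", "FromBase64String")

def suspicious_script_blocks(docs):
    """Return script-block texts that contain any suspicious marker."""
    hits = []
    for doc in docs:
        if doc["System"]["EventID"] != 4104:
            continue
        text = doc["EventData"].get("ScriptBlockText", "")
        if any(marker in text for marker in SUSPICIOUS):
            hits.append(text)
    return hits

flagged = suspicious_script_blocks(events)
# Only the DllImport/VirtualAlloc block is flagged; the harmless
# Get-ChildItem block and the unrelated logon event pass through.
```

Because the documents stay nested in storage, the same filter works no matter how deep the field sits — no pre-flattening step required.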
Here's a spreadsheet of our Eraser run and what files were erased after it, and then this one is the wiped files found with our signature-based searching. That just gives you an example: yes, we can automate a lot of this stuff. It doesn't mean it's cutting anyone out of a job — it's going to allow us to get to more important
analysis. It's going to enable us to look for more complex things and get rid of the low-hanging fruit. One of the things I love about this conference is there's so much relevant data being taught in the presentations. I want a system like this where, when I watch someone talk about a new artifact, I can plug it into the system using their new tool, and now every case that comes through our lab can account for that information. That's pretty big. I like it; I hope you liked it. That's about all I've got for you. Right on — well, thank you guys. I have more t-shirts.