TechDays 2018 - Windows Defender ATP Machine Learning Defences
Good morning. Thanks a lot for making it in so early after the party. Congratulations for being here at 9:00 a.m.; I appreciate it. I want to first open up with sort of a normal day in the life for us. What this shows you here is just an average day in the middle of May this year, when 2.6 million people in 232 countries around the world were attacked with 1.7 million new malware attacks that we had never seen before. Totally new. 1.7 million on a single day. What's worse is that 60% of those first-seen attacks were over within the hour.
In the security industry we like to think a lot about response, security response: how are we going to respond to an attack? Well, when you've got an hour or less, and you've got 1.7 million new things that you have to deal with in a single day, there's no response time that matters. You either get it right from the start or you've failed. We've got half a billion people counting on us every single day to get this right, to block this stuff the second that we see it. And we've got lots of attackers out there who want to ransom your systems. They want to steal your credentials. They want to get the banking credentials of consumers. They are working hard: 1.7 million new things every single day.
So this is stuff that keeps us up at night, and being able to predict and stop these threats is absolutely critical. So we're investing. Microsoft is spending a lot of money to try to protect you. We've got thousands of full-time security professionals at Microsoft. We're spending over a billion US dollars every single year on security. Every day we're processing trillions of signals, and we're finding five billion new threats every single month. So I think it's safe to say that when it comes to security at Microsoft, we're all in. We are investing to try to stop this from happening.
But we know that this has to be easy for you, right? Security's not going to work if it's complicated. We know it's not going to work if there's a heavy burden on your end users, if they're bombarded by security alerts every day and they don't know what to do. This has to be low-cost on the client. That's why we're doing things like building in the security, not bolting it on. That's why we have such a high focus on performance: low performance impact, highly effective protection at the client. That's why we're investing in security admin tools and security response tools for the professionals that need them, because we know that you need a console where you can see everything that's going on, where you can respond, where you can put in your automatic remediation for when things do go wrong. We're investing to help you.
So that's enough about the overall product stuff. Let me tell you a little bit about me and why I'm here talking to you today. I work for the Windows Defender Advanced Threat Protection research team, and my team in particular focuses on using machine learning to predict threats. So that day in the life that I was telling you about, where we see 1.7 million new things every single day: that's what keeps my team up at night, because we are accountable for predicting that it's a threat and blocking it at first sight. I'm really honored to be here today in Sweden. I love visiting this country. I'm really honored to be here to tell you a little about how that works and kind of demystify this whole machine learning thing, because I think for a lot of folks it's sort of black magic: is it really real? I hope that by the end of this session you'll believe that it's real and you'll understand the value of it.
So I do want to start off with a little bit of demystification, and I'll start with terminology. What is it? Is it AI? Is it machine learning? I don't know; we use them interchangeably today. The term artificial intelligence, some will argue, actually started back in antiquity, with the great philosophers, when they thought that it might be possible to take something synthetic and make it seem human. But the real research behind artificial intelligence actually started in the 1950s, and in 1959 the term machine learning was born. Now, artificial intelligence was really focused on trying to make something appear to be human, to appear to act human. Machine learning took that one step further: the idea behind machine learning is that you teach a machine to be smarter than a human.
But it didn't really have the kind of gravitas that it has today. It didn't have the effect that they wanted it to have back in the 1950s, because they didn't have the data and they didn't have the compute power. It was really just a dream. And then in the 80s (fun decade, great decade) we still didn't have the compute power and still didn't really have the algorithms, so we turned to something called expert systems. In the 80s neither of these terms was really very cool. We looked at expert systems, which are basically an automated way of telling a machine to sort of act like a human again.
But today what's old is new again, and you'll hear AI and ML used interchangeably. I'm a little old-school myself, so we really focus on using machine learning in the classic definition, which is to train a machine to be smarter than a human.
So there are two basic types of machine learning, and it's pretty easy to separate them out in terms of labels.
A label is a name that you apply to something that goes into your machine learning model that tells you what that item is. For our purposes, we focus on something that is malicious and something that is not malicious, something that is clean. The big difference between supervised learning and unsupervised learning is that supervised learning uses those expert labels. The expert labels feed into the machine learning algorithm; the algorithm learns from those labels and from the properties of the files or samples that go into the system, and then it predicts whether something is malicious or clean on brand-new samples that it has not seen before.
Unsupervised learning, on the other hand, doesn't need those labels. Unsupervised learning is really good at removing human bias, and we all have it; security researchers are no exception. You have to develop bias to become an adult, otherwise your brain just won't function. Unfortunately, this causes blind spots.
If you think you know the answer, then you're going to miss something. The beautiful thing about machines is that they don't have to have these biases unless you teach it to them. So the really nice thing about unsupervised learning is that it can provide a great complement to your supervised models by uncovering those biases. For example, there are clustering algorithms that will group like things together without having any kind of predisposed labels. That's really good at finding the unknown unknowns: pockets of things that have yet to be discovered by a researcher. It's also good at finding similar malware families that maybe a researcher didn't know about yet, so it's a fantastic complement to a supervised approach. Another way to use unsupervised learning is through anomaly detection. This is simply where the machine learns what's normal and then notices when something outside of that norm occurs. It doesn't label it good or bad, but it's still a great complement to pull into a supervised approach, to find some things that perhaps a researcher wouldn't normally find.
Supervised learning is really the primary method we use for malware prediction, however, and it's a fantastic way to scale researcher expertise. Think about what a human can hold in their brain: it's about seven digits. I think that's the average of what a person can hold in their brain, seven different variables at the same time. For a machine, the capacity is infinite, right? So when a malware researcher looks at a piece of malware, finds maybe five attributes that make it malware, and labels it as malware, that file actually has hundreds of thousands of other attributes that they simply can't take the time to go figure out. Well, the machine can. The researcher labels that file, and those hundreds of thousands of attributes go into the machine learning algorithm.
Then the machine can say, "Oh, actually, it's this set of 1,000 that make this thing malware, and I can use that to predict new malware that maybe the researcher wouldn't have predicted." Their simple seven-component algorithm was too simple to find it. That's how you can take one single thing that a malware researcher has labeled and turn it into a prediction for tens of thousands, for hundreds of thousands. It's a fantastic way to scale.
So, let me talk a little bit about how we build machine learning for the Windows Defender ATP product. It starts with having a really great set of labeled samples, and this is critical. You may have seen people take samples from VirusTotal and build machine learning algorithms that can then predict new malware samples. But where are the clean files? If you don't have a really select, curated set of clean files and malware in your training set, your machine learning algorithm will never work in the real world. So a critical piece of the development process for us is making sure that our sample set is indicative of the real world and of what our customers see day to day.
Feature selection is another critical piece. We have machines automatically select features for us, because sometimes the machines can find them faster than researchers, and we also have researchers who specify certain things that they know are indicative of malware. All of these features go into the machine learning model for training.
Then comes the experimentation part. This is where you try out different learners. Maybe you want to try a deep learning model, because this is a back-end, cloud-based system: you've got a lot of compute power, so although a deep learning model is super intense, you can run it. Or maybe you're developing this for a client, and the client needs to be super performant, so you're going to look at a different type of learner: something that's fast, something that won't have a lot of impact on the endpoint.
Then you want to look at the parameters of these learners, because that can change the outcome as well. So a lot of work goes into this experimentation phase for us.
Then, when we feel like we've got a good product, we ship it, but it's not on by default, so it's not blocking. With the ATP product, we can ship this to the cloud or the client in non-blocking mode to make sure that it's really performing the way that we saw in our experimentation phase. Sometimes it takes us a couple of weeks, sometimes it takes us a month. We do a little more tweaking and tuning, and when we see that we've got it right and that this is going to be good for customers, we turn it on.
This gives you an idea of the data that goes into these models, and it's pretty intense. We're looking at static attributes of files, the normal stuff that you might expect. We're looking at partial hashes. We're looking at who signed the file, and whether the file is signed. But we also look at behavioral components. Was this related to another file? Was this injected from another file? Was this downloaded from somewhere? What are the contextual elements related to this entity that make it more relevant to malware, or more like a clean file? All these components are important.
And take something like the URL, which can be one static property. A machine can actually turn that into components. We can do things like n-grams, where you take three characters that are in a row, or skip-grams, and those become features in models. So you can see how these thousands of features can turn into tens of thousands of features, hundreds of thousands of features, and in some of our models millions of features.
Now I'm going to make a joke, okay? Have you heard the phrase "there's an app for that"? Anybody? Yes? Okay. "There's an app for that." I'm going to start saying "there's a model for that", because there is.
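To make the n-gram idea from a moment ago concrete, here is a minimal Python sketch. It is not the product's feature extractor; it only assumes character-level 3-grams and one-skip character pairs over a lowercased URL. The function names, the `3gram:` and `skip1:` feature prefixes, and the example URL are all made up for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def skip_grams(text: str, skip: int = 1) -> list[str]:
    """Return character pairs with exactly `skip` characters skipped between them."""
    step = skip + 1
    return [text[i] + text[i + step] for i in range(len(text) - step)]


def url_features(url: str) -> Counter:
    """Turn one URL string into a sparse bag of n-gram and skip-gram counts."""
    url = url.lower()
    features = Counter(f"3gram:{g}" for g in char_ngrams(url, 3))
    features.update(f"skip1:{g}" for g in skip_grams(url, 1))
    return features


if __name__ == "__main__":
    # Hypothetical URL; a single string already yields dozens of features.
    feats = url_features("example.test/invoice.doc")
    print(len(feats), "distinct features")
```

One short string becomes dozens of sparse features this way, which is how a handful of raw properties can grow into the very large feature counts mentioned in the talk.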
All right. Is it new malware? Is it old malware? There's a model for that. Different models, okay? Yes, there is a model for those. What about program files? How about JavaScript? What about Java? VBS? There's a model for that. Yeah, for every single one of them. All right, so what about my enterprise? It's not like what you see at the consumer level; it's different. Different malware, different clean files. Okay, there are different models for that.
All right, so the static file properties are different from this behavioral stuff. Surely your model doesn't work on both of those? Actually, no: there are different models. We've got a model for behavior, and we have a model for static file properties. There's a model for that.
There are lots of different models, and the diversity doesn't just extend to what we're trying to target. It actually extends into the learners and the numbers of features that go into those models. For example, take something that might have millions of features, where we use a deep neural network to predict whether something is malware. That's a great cloud model, because it takes a lot to run and some time to analyze. It's not something you'd necessarily want on the client, but it's super performant in the cloud. Compare that with a client-based model, where maybe you just have 30 features, or a few hundred features, and you've got a linear model, which is super small, because you don't want to send a big, huge update down to your client, and super fast. So we have a large diversity of models to serve multiple purposes.
You might think: that sounds really inefficient. Why do you have so many different models? Why can't you just have the perfect, singular model that does it all? Well, it turns out that's a vulnerability. There has been a lot of research published on how easy it is to exploit and tamper with a big, singular model. We actually did a talk at Black Hat this year to discuss this. It turns out that diversity, just as it is critical to our businesses and critical for our society, is also critical for machine learning. If you're interested in this topic, it's a one-hour discussion. The presentation is uploaded to Black Hat, and you can go have a look at it and see the research that we have done.
It covers the real attacks we have seen in the wild on our models and the red-team exploits that we perform to see how we could make our models stronger. It turns out that diversity is really key to having a tamper-resilient machine learning system.
So I'll talk a little bit about where within the Windows Defender ATP product this machine learning is incorporated, and it's in multiple layers. This funnel here shows malware and clean files that are trying to get through.
The orange dots are the malware. Most of our protection comes at the client level. We have local machine learning that runs there, we have heuristics, and we have behavioral detection that helps protect right at the client without ever having to talk to our cloud. And of course this protection happens instantly.
If you don't have an answer at the client, let's say the client thinks that this is suspicious but it's not really sure, it will send all of that metadata that you saw, in a small blob, up to the cloud for analysis. The cloud, through these metadata-based machine learning models and rules, will return a response within milliseconds about whether it thinks that what the client saw was malicious or not. This happens in milliseconds. It's instantaneous; this is still real-time protection.
If those metadata-based models don't know the answer, we might request a sample of the activity that the client saw. The reason we do this is that there are some additional feature extraction capabilities that we have in the cloud that wouldn't necessarily be performant to do on the client. Again, we're thinking about the end user; we don't want to bog them down with something that's hogging all their resources. So we're going to grab that sample, send it up to the cloud, do a quick extraction of additional features, and run a deep learning model against it to figure out whether it's clean or malicious. This happens within seconds, and it's a communication between the client and the cloud. The cloud says, "I think that thing's suspicious, just hold for a couple of seconds." The client says, "Gotcha, I'll wait," and waits a couple of seconds. Once the cloud has extracted those properties, run the deep learning models, and sent the response back, the client knows whether it can let that file run or whether it needs to quarantine that file because it's bad.
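The escalation between layers described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not how Windows Defender ATP is implemented: the two thresholds, the `Verdict` type, and the three scoring callables are hypothetical, and the fallback of allowing the file when no layer is confident is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical thresholds, chosen only for illustration.
BLOCK_AT = 0.90   # at or above this malware probability, block
ALLOW_AT = 0.10   # at or below this malware probability, allow

Scorer = Callable[[dict], float]  # maps extracted metadata to P(malware)


@dataclass
class Verdict:
    label: str  # "block" or "allow"
    layer: str  # which layer produced the decision


def decide(score: float) -> Optional[str]:
    """Map a malware probability to a decision, or None when the layer is unsure."""
    if score >= BLOCK_AT:
        return "block"
    if score <= ALLOW_AT:
        return "allow"
    return None


def layered_verdict(metadata: dict,
                    client_model: Scorer,
                    cloud_metadata_model: Optional[Scorer],
                    cloud_sample_model: Optional[Scorer]) -> Verdict:
    """Ask each layer in order and stop at the first confident answer."""
    layers = [("client", client_model),
              ("cloud-metadata", cloud_metadata_model),
              ("cloud-sample", cloud_sample_model)]
    for name, model in layers:
        if model is None:  # e.g. the client has no cloud connectivity
            continue
        decision = decide(model(metadata))
        if decision is not None:
            return Verdict(decision, name)
    # Assumption of this sketch: with no confident layer, the file is allowed
    # to run and later behavior-based detection is relied on.
    return Verdict("allow", "default")
```

The design choice the sketch illustrates is that cheap, local checks answer first, and slower, heavier models are consulted only when the earlier layer is unsure.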
The last two layers are a little slower. The first ones that I talked about are all real-time protection capabilities; the last few are a little slower. This first one is called detonation. With detonation, you actually run that malware within a virtual machine. Some malware will evade this, so it doesn't work for everything, but we do some additional feature extraction there and we run that data back through our machine learning models. It helps strengthen them, and sometimes it does result in a protection response within a couple of minutes, and it adds another layer of defense.
With big data, this is analysis that sometimes processes over days. It uses signals from across the company. Sometimes it takes just a couple of hours, but it's a little slower. It doesn't provide real-time protection, but again it provides one more layer down the chain.
So what if your clients can't talk to the cloud? It's okay. We're still going to take care of them; we have some answers for you. It's not the end of the world. Let's say the protection stops at the client. Well, even with cloud-connected clients, 75% of the protection for them still happens at the client, and the vast majority of that protection happens pre-execution. So even when you're talking to the cloud, we're stopping it in real time; we're not necessarily waiting on the behavioral characteristics to kick in. So at the client you're going to have local machine learning models and you're going to have heuristics. And then there are persistent threats: some threats do stay around for a couple of days, sometimes they stay around for weeks or months.
We will send down protection to the client for those, because if you don't have to make a cloud call to block a threat, why do it? So we'll ship some protection down to the client so that it can block it when it sees it. All of these things still exist. Now, the thing that you may see if that client is not cloud-connected is more behavioral detection kicking in. Let's say that threat goes through the regular local heuristics and local machine learning protection, we don't have an answer, and we can't talk to the cloud. A file may run. When something runs, it needs to do bad things; even if it's fileless, it may need to do some process injection to do something bad on the system.
The malware has to do something. Even if it's fileless, there will be a behavioral signal there. So for these clients that don't have cloud connectivity, you may see more behavior-based detection on them.
But there are some other mitigations that you can consider. We have a really large number of attack surface reduction rules: things that lock out exploit vectors and make it really hard for attackers to abuse Microsoft Office documents and PDF documents. There are USB controls. There are all types of mitigations that you can deploy through attack surface reduction rules, and for your disconnected environments you may want to consider that. In fact, if any of you are thinking about disconnected environments, or you've got clients that are intermittently disconnected, I would love to talk to you after this, because I'm doing some research on the threat models for those different types of scenarios. So please find me sometime today; I'll be at the booth, and I'd love to talk to you more about that. In addition to the attack surface reduction rules, you can also think about application control that makes sense for that client.
So that was a lot of philosophical stuff and a lot of product detail. I want to walk you through some real-world examples of how machine learning helped keep our customers safe.
This first one is "Your payment was declined". Most of these things use some type of social engineering lure to get people to open up that document or install that file, and this one was "your payment was declined". In this case there was only one victim, and this victim was in the UK. This is one of those examples where we only saw one instance of this particular attack. It started out as an email, and that email had a link to a malicious document. That malicious document was hosted on a good website, so it wasn't a website that could be blocked; it's just that the document was malicious. It had a macro inside it, and when that macro runs, it connects to a website and downloads the actual payload of the attack, which would have done something malicious to that person if they had been infected.
So what did we do? How did ATP decline "your payment was declined"? This is what we did to protect the system. At the client, we did a quick extraction of dynamic and static properties, and we sent all of those details to the local machine learning model that was on that client. In this case these signals were enough. The local machine learning model said, "Yeah, that's malware, you need to block that," and sent that information to the client, and the client blocked it within milliseconds. The person was protected. No cloud call required.
This next example was a very large-scale attack. It affected mostly Russia, Ukraine, and Turkey, but the impact was pretty big, and the impact wasn't just on the folks who encountered the malware; it was also big on the company whose reputation was damaged because this happened. The company in question is called MediaGet. It's a consumer-focused company that creates torrent software, and to be honest, they're what we call potentially unwanted software. So if you are a commercial customer and you have chosen to use the option to block potentially unwanted software, you wouldn't have had any of this torrent software in your network. But most consumers don't use this feature; it's an enterprise feature. So consumers may have had this torrent software, and in fact many of our consumer folks did. What happened here is that these attackers infiltrated this company, and they were able to send a tampered copy of its software down to all of its clients.
It was essentially a fake version of the MediaGet torrent software. They sent down this malicious version of the software to all the clients. They didn't do anything at the time, but what they did is they added this unusual persistence mechanism, a feature to make it really hard to uninstall or get that software application off of the box. From our behavior monitoring capabilities, we know that this is a technique that's often used by malware, so this was our first flag that something was going wrong here. But still, they weren't doing anything; they put in the persistence capability and waited.
Well, then they started downloading coin miners. This was their endgame: to get all these systems infected and then download coin miners and make a fast buck.
So what did we do to stop this attack? How did we foil Dofoil? These aren't funny, okay; this must be some translation thing here.
Again, we extracted dynamic and static properties, and for us it was the malicious behavior that tipped us off. This kicked off a query to our cloud. We sent all that metadata to our cloud, and we had multiple machine learning models flag this as malware, clear as day. Within milliseconds we sent that response back to all the clients that had received this bad update, and the clients were instructed: block this malware, don't let it run again. There are a ton of blogs on this if you're interested. Supply chain attacks were a big thing this past year, so there are lots of details if you want to go find out more.
This next example is another really small one, and this was a malware family called Ursnif. It was highly targeted, mostly focused in the US, and again it used a malicious document to try to trick people into opening it. It came as an attachment to an email. When you open the malicious document, it says, "Oh, you need to enable these additional Word features," which enabled the macro to run. Once the macro runs, it kicks off this heavily obfuscated PowerShell script, and that PowerShell script would go download Ursnif, which is a password and information stealer: something that's trying to get the credentials of the person who's on that machine, any login or banking information that they may have. These folks were after all kinds of information.
The particulars that they used were highly differentiated. These attackers knew the location of their targets, and they knew the local businesses that supplied those targets. For example, in Nebraska, in the documents that they sent to those victims, they used DMS, which is the name of a local landscaping company. In St. Louis, Missouri, they used Dolin Care, an assisted living facility for elderly people in the St. Louis area. Then in Johnson City, Tennessee, they used Dockery, which was a floor covering company. These are just three examples. There were something like 20 different business names that they used to try to trick people in these localized areas into thinking that it's a supplier sending them information that they need to open and act on. Really tricky social engineering.
So who were the targets here? There were consumers, definitely some consumers, who received these emails. Lots of local small commercial businesses. But they didn't stop there. This hit universities: 15 people in universities and other educational entities. This hit people in primary and secondary schools; it got to the folks that are running your kindergartens. Also people in government: two folks in government public administration got these mails. One person at a health provider, and another person at an electricity and gas company.
So they want all of us, and they don't care if you're a consumer, they don't care if you're a business, they don't care if you're a government. There is always something to steal from these different targets, and because the lures were localized, any of these folks could have been using these suppliers. So it was a social engineering lure that worked across the board.
So how did we stop these attacks? Again, it started at the client level. We extracted these properties, and the clients thought that these attacks were suspicious, but there wasn't enough probability to block them at the client. We sent that information to the cloud, our cloud-based models returned malicious, and within milliseconds it was blocked.
All right.
So, a few key points that I hope you take away. The attackers are clearly investing: 1.7 million new attacks in a single day. We have to invest to stop them, and so do you. We want to provide you the right tools and capabilities to do that. We know that security on the client has to be effortless or it's not going to work. We want to have very low impact on the client, and we want to give you the tools that you need to administer those clients and respond to breaches when they do occur. We also want to provide you not just connected solutions but also disconnected options, so again, come talk to me if you have those scenarios, because I want to learn more about that.
And most important, I hope you see that the M in machine learning doesn't stand for magic. It's real, and it's absolutely critical to protect customers against the threats that we see today.
All right. Thank you so much for showing up so early in the morning. I hope this was valuable to you. Thanks.
2019-01-25 17:35