Frontiers in Machine Learning: Security and Machine Learning
Hi. Welcome to the Security and Machine Learning session at today's Frontiers in Machine Learning event. We have a great lineup of speakers for you today. Our first speaker is Aleksander Madry, who is a professor at MIT, the director of the Center for Deployable ML, and faculty lead for the CSAIL-MSR collaboration on trustworthy and robust AI. He'll be talking today about what our models learn. Our second speaker is Dawn Song, professor at Berkeley and CEO and co-founder of Oasis Labs. She'll be speaking to us about AI and security: challenges, the lessons we've learned, and future directions. Dawn's talk will unfortunately be pre-recorded, and she will not be here for the live Q&A afterwards because of family-related travel. Our third speaker is Jerry Li, a senior researcher at Microsoft, and he'll be speaking about algorithmic aspects of secure machine learning. Throughout the event, feel free to add questions in the chat, and we'll answer them either in the chat or at the live Q&A at the end of the session. Thank you very much.

Hi, my name is Aleksander Madry, and what I want to talk about today is: what do our models learn? As usual, this is joint work with my amazing students.

The starting point is a fact that is no surprise to any of us, namely that machine learning can be, and actually often is, unreliable. One example of this is the notion of adversarial examples that we are probably all familiar with, but it goes beyond that. Just a month and a half ago we had a situation where, on a highway, a Tesla Model 3 crashed into an overturned vehicle without even trying to brake. Probably what happened was that this vehicle was out of distribution for the Tesla Autopilot.
The system essentially said: okay, I don't know what this is, so I can just go forward confidently. And that's how the crash happened.

So things are not always working the right way, and the question is: why does this happen? This is something my students and I think about a lot. One answer that we already came up with is that at the root of all of that is a certain mismatch. Namely, if we look at the tasks that we ask our models to solve, like classification tasks, it turns out there are many ways to succeed at them. And some of these ways, actually the best of these ways, might be very different from how we as humans solve these tasks. This is at the root of many of the unreliabilities that we observe in the real world.

Today I want to go back to this mismatch but consider it at a bit more of a meta level, namely in the context of so-called classification task misalignment. So let me explain what it is. Usually, when we think about supervised machine learning, we have this view that there is a dataset and there is a model. We keep training the model on the dataset and improving it to get better and better accuracy. This is exactly the bread and butter of machine learning nowadays, and that's what we focus on. But the point I want to make is that there is one element of all of this that we usually don't think about.
This element is that, in addition to the dataset, there is a motivating real-world task, for instance object recognition. Our dataset and the corresponding benchmark are meant to be just a proxy for this real-world task. If you think back to this missing piece, the question you should start asking yourself is: if there is this motivating task that we really want to solve, and then there is this projection, the benchmark, that we are actually trying to solve, is there a possibility of us overfitting to the benchmark? This is exactly the question I want to zoom in on in today's talk.

In particular, the questions I want to ask are: how well do our datasets reflect the real world? What dataset biases do our models pick up? And how are these biases introduced in the first place?

As a starting point, to give you some idea of what I'm talking about, I want to study these biases in a very simple setting, namely background biases. This is joint work with my students Kai, Logan, and Andrew. The idea is to look at the simplest possible bias: how much the decision of a model depends on the background of an image, even though we ask it to correctly classify the foreground of that image. As we all know, our models definitely depend on such backgrounds. And that's actually not surprising, because so do humans. Humans also use backgrounds when they solve classification tasks. It is easier to recall who the person in front of you is if they are in the usual environment in which you interact with them. If you saw a work colleague on vacation, it would take you a while to figure out who they are. So we know that background is definitely a signal.
So there's definitely a bias in our models. But the question is: can we get a better grasp of how much of a bias it is, and in particular how much it differs, if at all, from how humans use this bias in their classification? To this end, we took ImageNet, separated it into foreground and background signal, and created a bunch of different versions of this dataset to allow us to get a fine-grained understanding of the dependence on different types of background signal. Long story short, we wanted to study how a model's performance is affected by different mixing and matching of this background signal.

There were a bunch of findings, and I will not have time to go over all of them in this short talk, but essentially we realized that this signal really plays a major role in the performance of the models. One finding I want to bring to your attention is the notion of adversarial backgrounds. We found that for most inputs, over 87 percent of them, we can fool the model into a wrong classification just by choosing the worst-case background for the image. The foreground is the same, we just choose an adversarial background, and suddenly the model is fooled into misclassifying the foreground object.

In fact, this gets even more interesting. It is not only that for most images there is an adversarial background; some backgrounds are adversarial for many foreground objects. Here are some examples. My favorite is the one in the bottom-left corner, which shows that a person holding something turns out to be a very strong signal to the model that whatever is in the foreground is a fish, even if it is not a fish at all.
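The adversarial-background search described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: images are tiny grayscale grids, and `composite`, `find_adversarial_background`, and `toy_classifier` are hypothetical names that do not come from the talk or the paper. The real study uses ImageNet images and trained networks.

```python
from typing import Callable, List, Optional, Sequence

Image = List[List[float]]  # toy grayscale image: rows of pixel values in [0, 1]
Mask = List[List[int]]     # 1 where the pixel belongs to the foreground object


def composite(foreground: Image, mask: Mask, background: Image) -> Image:
    """Paste the masked foreground pixels on top of a background image."""
    return [
        [f if m else b for f, m, b in zip(f_row, m_row, b_row)]
        for f_row, m_row, b_row in zip(foreground, mask, background)
    ]


def find_adversarial_background(
    classify: Callable[[Image], str],
    foreground: Image,
    mask: Mask,
    true_label: str,
    backgrounds: Sequence[Image],
) -> Optional[int]:
    """Return the index of the first background that flips the prediction.

    The foreground stays fixed; only the background changes. Returns None
    if no candidate background fools the classifier.
    """
    for index, background in enumerate(backgrounds):
        if classify(composite(foreground, mask, background)) != true_label:
            return index
    return None


def toy_classifier(image: Image) -> str:
    """Hypothetical stand-in model that over-relies on overall brightness."""
    mean = sum(map(sum, image)) / (len(image) * len(image[0]))
    return "bird" if mean > 0.5 else "fish"
```

With a one-pixel foreground and a bright and a dark candidate background, the dark background is enough to change the toy model's answer even though the foreground pixel never changes.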
So this shows us that things are not exactly as we would expect. Again, humans use backgrounds, but I think they wouldn't be fooled like that. So you might ask: what would it take to get models that do not have these problems? In the paper we showed that even the simplest possible thing, randomizing the background during training so as to break the correlation between foreground and background, already helps tremendously. More importantly, even if you don't explicitly train against this over-reliance on backgrounds, more accurate models, essentially models that do better on ImageNet, also end up being more background-robust. Interestingly, that doesn't mean they don't rely on backgrounds. They do, but they rely on them in a way that makes them less prone to being fooled by adversarial backgrounds.

So this was just a very simple example of a bias and a fine-grained study of it. Now we want to ask a broader question: there are these biases, but where do they come from? In particular, can we have biases that come not just from the nature of the visual world but from the way our datasets are constructed? Spoiler: the answer is yes. But let's look into that.

The point is that there are many biases that our datasets convey to our models. Let me describe one that is particularly interesting. Take a look at these three images from the ImageNet dataset and the three ImageNet labels corresponding to them. If you look at these images and these labels, you will probably say that these look like correctly classified ImageNet inputs. However, what you then realize
is that actually none of these three labels is correct according to the ImageNet labels; the correct ImageNet labels are now shown in the image. So this says: okay, maybe we, as humans, are not as good at ImageNet classification as we would like to be. But then you might ask: were those initial labels actually wrong? If you look at the image on the left, it could equally be a monastery or a church, and you can't really tell which one is the more right answer here. So these are just some hand-picked examples of the ImageNet labels maybe not exactly reflecting the ground truth of the image. The question is whether these are just outliers, or whether they signify some deeper problem with our ImageNet labels. And when you look into it, you realize these are not outliers by any means. The issue here is the way our datasets are created. When we think about how our datasets are created, we usually think of a process in which there are real-world images, and then there are expert annotators who choose, for each image, the perfect
class with which to label it. That's how we idealize this process, but it is completely not scalable. This is not how we can get labels for millions of images. What we do instead, in one of the most popular approaches, is go the other way around. We first settle on the possible labels we are interested in. Then we source the images by plugging the desired class labels into search engines and collecting all the candidate images for a given label. Then, to make sure that the images we sourced really correspond to the desired label, we do crowdsourced validation. Long story short, we present the image to a crowd worker together with the desired label and ask: does this image contain an object of the class described by this label? Depending on whether the answer is yes or no, we either keep the image as correctly labeled with this label or we discard it.

This is nice because it scales very well, especially when you ask these questions in batches. But the problem is that it is a very leading question. We only ever ask about the single candidate label for a given image, and we never make the person we are asking aware that there could be other classes that are also correct for the same image. This clearly introduces some bias: there is a default label for the image, and if the default label is not correct, the image is simply discarded. In a paper we recently published, we asked how much of a problem this is for ImageNet. And the answer is:
Well, it is a problem. But how do we actually get a handle on it? To perform any kind of study of these questions, you first have to get more detailed annotations, because you need to compare the current ImageNet annotations against some more ground-truth annotations. How do you get them? This is actually kind of tricky, but what we ended up doing was relatively simple and surprisingly successful. We looked at a bunch of ImageNet-trained models and took the top-five predictions from these models for each of the images we considered. This way we got a narrowed-down set of classes that could remotely be plausible for the image. Once we had a small collection of candidate classes for each image, we asked MTurk workers to look at the image and tell us, for each of these candidate classes, whether that label would be valid for the image. We also asked them how many different objects there are in the image, and what the main object in the image is according to them. Of course, we aggregated over many workers. This way we got very fine-grained ground-truth annotations for a subset of the ImageNet validation set.

This was exactly what we needed for the step I will describe in a moment. But I want to point out that this can be viewed as a nice bootstrapping of the original ImageNet annotations. We had the initial ImageNet annotations, created in the way I just described, and then we iterated on them by training models and then annotating
using the labels predicted by these models, to bootstrap this knowledge, clarify it, and distill it a little bit more. We think this process will be useful in other contexts as well.

Now that we have these ground-truth labels, or more fine-grained labels, we can ask ourselves how accurate the original ImageNet labels are. We discovered a number of interesting phenomena. The first was the prevalence of multi-object images. It turns out that around 20 percent of test images contain more than one object. Here is one example. It is an extreme example with many objects, but 20 percent of test images have at least two objects in the picture.

So how does this affect accuracy? Remember that we are talking about top-1 accuracy: we want the model to output the one correct label, and if it doesn't, we just say the model is wrong.
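The re-annotation pipeline described earlier (candidate classes taken from the top-five predictions of several models, then validity judgments aggregated over many workers) might look roughly like the sketch below. The function names, the data layout, and the majority threshold `min_agreement` are assumptions made for illustration; the talk does not specify how worker answers were aggregated.

```python
from collections import Counter
from typing import Dict, List, Sequence


def candidate_labels(top5_per_model: Sequence[Sequence[str]]) -> List[str]:
    """Union of each model's top-5 predictions, kept in first-seen order."""
    seen: Dict[str, None] = {}
    for predictions in top5_per_model:
        for label in predictions[:5]:
            seen.setdefault(label, None)
    return list(seen)


def aggregate_votes(
    worker_votes: Sequence[Dict[str, bool]],
    min_agreement: float = 0.5,  # assumed threshold, not taken from the talk
) -> List[str]:
    """Keep labels that more than `min_agreement` of the workers marked valid."""
    counts = Counter(
        label
        for votes in worker_votes
        for label, is_valid in votes.items()
        if is_valid
    )
    return [
        label
        for label, n in counts.items()
        if n / len(worker_votes) > min_agreement
    ]
```

For the church-or-monastery image mentioned earlier, a pipeline like this could keep both labels as valid, which is exactly the information a single yes/no question about one default label cannot capture.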
What it turns out is that if you look at the accuracy of our standard models on single-label images, that is, images with just a single object, this accuracy is actually much better than what we view as the state of the art. Only when you look at performance on the multi-label images does it become significantly worse. What is even more important is that if we try to correct for this obviously unfair situation, where there can be multiple objects in the image, by counting a prediction as correct if it matches any of the objects in the picture, this performance gap essentially disappears. Whatever the current models are lacking in terms of performance on multi-label images can be largely explained just by taking into account the fact that there are multiple objects in the image. So this is nice.

The other question that comes up in this context is: if there are at least two objects in the image, which one should the model answer with if it is forced to choose only one? What we found, somewhat surprisingly, is that often the choice the model makes is not the one a human would make. In particular, the object the model identifies is not the one a human would view as the main one in the picture. Here are some examples. What's happening here is that the image sourcing process that was used to create ImageNet was more willing to use more specific
and unique labels for particular images, because you get images as a response to your query. Our models have learned that this is the way to go, and they do pick up on these dataset biases even though the biases go against human preference. So this is another discrepancy that we found. There are a bunch of other observations, and I invite you to look at the paper for them. But the last question we asked in the paper is this: once we have these corrected ground-truth labels and access to human annotators, we can ask how good ImageNet models really are once we account for these issues with labeling.
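The multi-object correction discussed above, where a prediction counts as correct if it matches any object in the picture, can be contrasted with plain top-1 accuracy in a small sketch. The function names and the toy data are illustrative assumptions, not values from the paper.

```python
from typing import Sequence, Set


def top1_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of images whose prediction equals the single dataset label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(predictions)


def any_object_accuracy(
    predictions: Sequence[str], valid_labels: Sequence[Set[str]]
) -> float:
    """Fraction of images whose prediction matches any annotated object."""
    correct = sum(p in valid for p, valid in zip(predictions, valid_labels))
    return correct / len(predictions)


# Toy data: the second image contains both a dog and a ball, but the
# dataset label only records "dog".
predictions = ["dog", "ball", "cat", "car"]
dataset_labels = ["dog", "dog", "cat", "truck"]
annotated_objects = [{"dog"}, {"dog", "ball"}, {"cat"}, {"truck"}]

strict = top1_accuracy(predictions, dataset_labels)
lenient = any_object_accuracy(predictions, annotated_objects)
```

In this toy case the strict metric penalizes the "ball" prediction even though a ball is in the image, while the lenient metric does not; the "car" versus "truck" mistake is counted as wrong by both.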
In particular, we ran a human-based evaluation. Whenever we presented a model with an image and got an answer, we did not just compare this answer against the ImageNet ground truth, or even against our own ground-truth labels. Instead we asked a person: here is an image and here is a possible class; is this class a valid label for this image? So essentially we asked humans to evaluate the validity of the labels produced by the model.

What did we find? The good news is that as our models improve in terms of pure ImageNet accuracy, this human-based evaluation also improves. So far so good: the current driver of our system design seems to be aligned with quality as assessed by humans. However, at this point it turns out that annotators often can't tell the model's predictions apart from the correct ImageNet labels. So in some ways you can claim that, at least for some classes, our current models have already hit the baseline of non-expert human annotators. This raises the question of how useful it is to try to improve performance on ImageNet further. If non-expert annotators are our gold standard here, maybe ImageNet is already solved and we should be looking at some other datasets and other tasks to improve our computer vision models further.

That's all I wanted to say, so let me summarize and talk about some takeaways. First of all, I hope I made it clear that what models do and do not learn is not always clear to us, even though we intuitively assume it is. We really might need to study this before we focus on further improving performance on our current benchmarks.
This is particularly important if you think about robustness, because robustness problems emerge exactly from these biases that are misaligned with the biases humans use. Also, as we showed, models are affected by biases of the world and of our data pipelines. Once we understand what these biases are, we can find ways to explicitly account for them, and there is some recent work with Jacob Steinhardt and my students that shows how to do this in a more refined form for some more subtle biases that arise in our data pipelines.

Finally, moving forward, the question is: what other biases do our models learn from the data, and how do we train models in their presence? For some biases we actually might want our models to pick up on them, but for others we definitely do not. How do we choose which ones we want and which ones we don't, and how do we force our models to obey these wishes? And also, how do we measure performance on the underlying task we care about, as opposed to just overfitting to the benchmark itself? How do we do it in a way that really gets at the essence of the real-world task we want to improve on, and not just an artificial number that is a proxy, a means to an end?

So this is all I've got. Thank you, and I'm happy to discuss it during the Q&A session. Thanks.

Hi, thanks for being here. My name is Dawn Song. I'm a professor at UC Berkeley and also the founding CEO of Oasis Labs. Today I will talk about AI and security: challenges, lessons, and future directions.

As we all know, deep learning is making great advancements. For example, AlphaGo and AlphaStar have both won against world champions. And deep learning is powering everyday products.
As we deploy deep learning and machine learning, there is an important aspect that we need to consider, which is the presence of attackers. It is important to consider the presence of attackers as we deploy machine learning for a number of reasons. First, history has shown that attackers always follow in the footsteps of new technology developments, or sometimes even lead them. Also, this time the stakes are even higher with AI. As AI controls
more and more systems, attackers will have higher and higher incentives to attack AI. And as AI becomes more and more capable, the consequences of misuse by attackers will also become more severe.

As we consider machine learning in the presence of attackers, we need to consider several different aspects. First, how attackers may attack AI systems. Attackers can attack AI systems in a number of different ways. One, attackers can attack the integrity of the learning system, for example to cause the learning system to not produce the intended, correct results, or even to cause the learning system to produce a targeted outcome designed by the attacker. Attackers can also attack the confidentiality of the learning system, for example to learn sensitive information about individuals from the machine learning system. To address these issues, we need to develop stronger security in the learning systems themselves.

Attackers can also try to misuse AI and learning systems. For example, they can try to misuse AI to attack other systems, including finding vulnerabilities in other systems, devising new attacks, and so on. Attackers can also try to misuse AI to attack people. People are often the weakest link in a security system. We have already seen examples such as deepfakes and fake news that can be generated by AI systems. To address these issues, we need to develop stronger security in other systems.

Given the time limits of this presentation, I will mostly focus on the first aspect: how attackers can attack AI and learning systems, and what we can do about it. First, let me talk about the integrity aspect, that is, how an attacker may attack the integrity of the learning system. Let's look at a motivating example in the context of self-driving cars.
As an autonomous vehicle drives through the environment, it needs to observe the environment, for example to recognize traffic signs, in order to make correct decisions. These are photos of real-world traffic signs, here stop signs. The second column shows a real-world stop sign in Berkeley. Today's learning systems can actually work very well and recognize these stop signs very well, as you can see, even when the real-world stop sign has some markings on it.

However, what if attackers create specific perturbations to these traffic signs, perturbations designed to fool the learning system, in this case the image classification system, into giving the wrong label? In the third and fourth columns here, you are seeing real-world adversarial examples that have been created to fool the learning system into giving the wrong answer, in this case misclassifying a stop sign as a speed limit sign. As you can see, when adversarial examples are constructed effectively, they can fool the learning system into giving wrong answers.

One question we have been exploring is whether, besides adding perturbations to digital images, adversarial examples can actually exist in the real physical world, and whether they can remain effective under different viewing distances, angles, and conditions. The images here show adversarial examples that were created in the real world, demonstrating that adversarial examples can be effective even in the physical world and can remain effective under different viewing distances, angles, and conditions. The real-world adversarial stop sign that we created has actually been on exhibit at the Science Museum in London, teaching people that, as we develop learning systems, it is important to pay attention to the fact that these learning systems
can be fooled by attackers, causing them, for example, to make the wrong prediction.

That is just one example, showing that adversarial examples can fool image classification systems. Other researchers and my research group have all been exploring this important phenomenon across different domains in deep learning. Unfortunately, what we have found is that adversarial examples are not limited to image classification systems. In fact, they are prevalent in different types of deep learning systems, including different model classes and different tasks: generative models, deep reinforcement learning, and many others.

Adversarial examples can also be effective under different threat models. The earlier example I showed is what is called a white-box attack, where the attacker needs to know the details of the learning system, including the actual model. Attacks can also be effective in what we call the black-box model. In the black-box model, attackers do not know any details about the model itself, including the model architecture and the weights of the model. Our work and other researchers' work have shown that adversarial attacks can be effective even in this setting. We have developed different types of black-box attacks, including zero-query attacks, where attackers do not even have query access to the victim model and can still develop effective attacks on it. When attackers do have query access to the victim model, they can develop even more effective black-box attacks.

Now let's look at a concrete example from one of our recent works. In this case we are studying deep learning models used for machine translation. Machine translation is an important
task, and it has made huge progress. In this work we essentially showed two results. One is that state-of-the-art machine translation models, for example those developed by Google and others, can be easily stolen through query access, using what we call a model stealing attack. As a second step, once we can steal the model, essentially building an imitation model of the original model, we can then develop adversarial attacks on this imitation model. Then, using what is called a transfer attack, we can use the attacks developed on the imitation model to successfully attack the real-world remote model. In the first step, our work shows that we can very effectively develop
this imitation model by making a number of queries to real-world machine translation APIs, for example Google Translate and other cloud API services. We are able to build an imitation model that achieves performance very close to the original, for example Google Translate, on standard benchmarks. That's the first step.

In the second step, using the learned imitation model, we can develop adversarial attacks on the local imitation model. Then, using the transfer attack, we demonstrate that the attacks we developed on the local imitation model can also be effective on the cloud API, the real-world model.

We consider different types of attacks. Here is a type of attack called targeted flips. The goal is to replace an input token in order to cause a specific output token to flip to another specific token. For example, here we have an English sentence: "I'm going to freeze, it's now below 6 Fahrenheit. Please help me." Given the original sentence, Google Translate provides the correct translation in German, as shown here. Using our attack method, based on a loss function that we construct and optimization-based methods, we are able to create an attack by changing just one token, from 6 Fahrenheit to 7 Fahrenheit. When we feed this new sentence, "I'm going to freeze, it's now below 7 Fahrenheit. Please help me," to the imitation model, the translation mostly remains correct, but instead of translating correctly to 7 Fahrenheit, it translates into 21 Celsius. So as you can see, this type of attack could cause severe consequences.
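The two-step recipe described above (steal an imitation model through queries, then craft an attack locally and transfer it to the remote model) can be illustrated with a deliberately tiny stand-in. This is not the attack from the talk, which targets neural translation systems and optimizes over tokens. Here the victim is an assumed hidden linear classifier, the imitation is a logistic regression fit to its answers, and the attack is a single signed-gradient step; all names and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def victim_predict(x: Point) -> int:
    """Stand-in for a remote API; the attacker only sees its outputs."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.5 > 0 else 0


def train_imitation(
    queries: Sequence[Point], labels: Sequence[int],
    steps: int = 300, lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Step 1: fit logistic regression to the victim's answers (model stealing)."""
    w, b = [0.0, 0.0], 0.0
    n = len(queries)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (x1, x2), y in zip(queries, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
            grad_w[0] += (p - y) * x1 / n
            grad_w[1] += (p - y) * x2 / n
            grad_b += (p - y) / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def imitation_predict(x: Point, w: Sequence[float], b: float) -> int:
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def transfer_attack(x: Point, victim_label: int, w: Sequence[float], eps: float) -> Point:
    """Step 2: one signed-gradient step computed on the imitation model only."""
    direction = -1.0 if victim_label == 1 else 1.0
    return (x[0] + direction * eps * sign(w[0]),
            x[1] + direction * eps * sign(w[1]))


# Query the victim on a grid, train the imitation, then attack one input.
queries = [(i * 0.5, j * 0.5) for i in range(-4, 5) for j in range(-4, 5)]
labels = [victim_predict(q) for q in queries]
w, b = train_imitation(queries, labels)
agreement = sum(imitation_predict(q, w, b) == y
                for q, y in zip(queries, labels)) / len(queries)
x = (0.2, 0.0)
x_adv = transfer_attack(x, victim_predict(x), w, eps=0.5)
```

The perturbation is computed without ever reading the victim's weights, yet the victim's label for `x_adv` differs from its label for `x`, which is the sense in which the attack transfers.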
as they can be fairly stealthy and change only the important parameters in the statement. And when we upload these constructed attack sentences to the actual Google Translate API, it provides the same translation results, showing that the attack is successful. There are other examples as well. Here we show that we can easily create nonsense sentences: in English the input looks like nonsense, but when it goes through machine translation, it translates into something that has real meaning, and that is under the construction or specification of the attacker. For example, in this case we create these nonsense sentences, and the first one translates into "I have to kill you," and the second one translates into another essentially malicious, ill-meaning sentence. And again, when we feed these constructed attacks to the real-world cloud APIs, we see that the attacks remain effective. So these are examples demonstrating that adversarial examples are not limited to vision; they are also effective in, for example, the natural language domain. This really is a very rich field. Many different types of attack methods have been developed, based on optimization methods with different metrics beyond the Lp-norm metrics, and also including non-optimization-based attacks. For example, in some of our earlier work, we showed that one can use GANs to generate adversarial examples as well. Overall, given the importance of this domain, there has been a huge volume of work, and different types of approaches have been proposed for defenses against adversarial attacks. Just in the last couple of years, there have been hundreds of papers written on the topic. Unfortunately, however,
today we still don't have sufficient defenses. Strong adaptive attackers can still easily evade today's defenses. And what we have discussed so far is just the tip of the iceberg in the general area of adversarial machine learning. Adversarial machine learning is about learning in the presence of adversaries, and attacks can happen at different stages of learning. They can happen at inference time, as adversarial examples that fool the learning system into giving the wrong prediction. Attacks can also happen during training time: for example, attackers can provide poisoned training datasets, including poisoned labels or poisoned data points, to fool the learning system into learning the wrong model. Overall, adversarial machine learning is particularly important for security-critical systems. I strongly believe that security will be one of the biggest challenges
in deploying AI. And this is just the first aspect of how attackers may attack: the integrity of the learning system. Attackers can also attack the confidentiality of the learning system, by trying to learn sensitive information about individuals. Here let's also look at a quick example. This problem is particularly important because, as we know, data is essentially the fuel of machine learning systems, and a lot of this data is really sensitive. Hence, as we train machine learning systems, it's really important to ensure that they provide sufficient data privacy for individual users. In one of our recent studies, in collaboration with researchers from Google, we set out to study the following question: do neural networks actually remember training data, and if yes, can attackers exploit this vulnerability to extract secrets in the training data just from querying the learned models? In our work we showed that, unfortunately, this is the case. In particular, we studied the task of training a language model. When we give a sequence of characters or words to a language model, the model tries to predict the next character or the next word. In one example, we showed that when we train a language model on the Enron email dataset, which naturally contains people's actual credit card and social security numbers, an attacker, by just querying the learned language model, can automatically extract the original users' credit card and social security numbers. This demonstrates the importance of protecting users' privacy even as we train machine learning models. Luckily, in this case we have a solution. In this particular case, instead of training
a vanilla language model, if we train a differentially private language model, our work showed that the trained model can significantly enhance the protection of users' data privacy, and at the same time still achieve similar utility and performance. In the interest of time, I won't go into the details of explaining what differential privacy is; it's a formal notion of privacy. We have also done recent work on developing techniques and tools to automatically verify that a machine learning algorithm is differentially private, and that work, called Duet, won a distinguished paper award at a recent programming languages conference. Finally, I just want to conclude with another important topic related to machine learning, namely a responsible data economy. As we know, data is critical to the modern economy, and a lot of this data is sensitive, so we are facing unprecedented challenges in how sensitive data is being used. Individuals are losing control over how their data is used, and they are not getting sufficient benefits from their data. Businesses are continuing to suffer from large-scale data breaches, and it's becoming more and more cumbersome and costly for businesses to comply with new privacy regulations such as GDPR and CCPA. And it's difficult for businesses to get access to data, due to data silos and privacy concerns. What we need is to develop new technologies that can unlock important value in data, but not at the cost of privacy. Hence there's an urgent need for developing a framework for building a responsible data economy.
This is something that my research group at UC Berkeley has been working on, and we are also taking some of the research technology into the real world at Oasis Labs, in particular to build a platform for a responsible data economy by combining different privacy technologies as well as blockchain technologies. The goal is to build a secure distributed computing fabric that enables users to maintain control of their data and rights to the data, and at the same time enables data to be utilized in a privacy-preserving way. One of the first use cases that we will be launching using this technology is the genomic use case: to enable users, for the first time, to become owners of their genomic data, and at the same time to enable their genomic data to be utilized in a privacy-preserving way. We are also launching a summit called the Responsible Data Summit, with the first two tracks on responsible data in the time of pandemic, discussing how we can use data responsibly in this special time of COVID-19, and on how to utilize responsible data technologies and develop responsible data policies in the real world. Please visit Responsibledata.ai. To summarize, there are many challenges at the intersection of AI and security: how to better understand what security means for AI and learning systems; how to detect when a learning system has been fooled or compromised; how to build more resilient learning systems with stronger guarantees; how to build privacy-preserving learning systems; and how to build a responsible data economy. This is in collaboration with many of my students, postdocs, and collaborators. Again, I strongly believe that security and privacy will be one of the biggest challenges in deploying AI, and building a responsible data economy is critical. These require community efforts. Let's tackle the big challenges together. Thank
you.

Hello. In this talk I'm going to be covering a number of recent advances in the development of algorithms for secure machine learning. This talk is structured in two parts: in the first part I'm going to cover recent advances in defending against attacks at train time, and in the second I'll go over defenses against test-time attacks. This talk is based on a number of papers, some of which I've listed here. Okay, so let's get started with robustness at train time. There are a number of reasons why training data might be corrupted. For instance, imagine you're in some crowdsourced setting like federated learning, where you cannot fully vet all of your data points, and there could be malicious entities adding in data to try to change the behavior of your learned model. Alternatively, another very common setting, for scientists in particular, is when you have very large datasets collated across many different labs and gathered by various different people using various different equipment. As a result the data is very heterogeneous, and this can cause uncontrolled systematic noise which can look just like outliers. And I want to mention that this really is an issue in practice. A striking example is the so-called backdoor attacks, which Dawn might talk about, where an adversary can add a small amount of carefully designed data to the training set, such that when a standard deep net is trained on this corrupted dataset, the network behaves normally on regular data points. But when I feed in data with a pre-specified perturbation, like this flower or this sticker, as you can see, the network misclassifies the image. This is, for instance, a very big issue if you're trying to use this for self-driving cars or this kind of stuff. The fundamental difficulty, however, with defending against these attacks is that
what it means to be an outlier is very unclear, especially when we go to high-dimensional data. In low dimensions, this is sort of the picture that we have for data which is, say, like a bell curve, or at least relatively close to the mean and not too spread out: if the outliers don't want to be too obvious, then they can't be too far away from everything else, and so usually they cannot affect any estimate or any classifier by too much, at the very least. But this picture changes dramatically in high dimensions. Obviously I can't draw high dimensions here, but typically in high dimensions everything, even the inliers, which are the blue points, is very noisy. So typically they're very far from the mean, and individually have very little signal to noise.
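A minimal numerical sketch of this picture, assuming standard Gaussian inliers and outliers drawn from the same Gaussian shifted by a fixed amount along one direction. The function name, the shift of 3, and the 10 percent outlier budget are illustrative assumptions, not values from the talk. It compares how far outliers and inliers typically sit from the true mean in 1 dimension and in 1000 dimensions, and how much the outliers move the empirical mean.

```python
import numpy as np

def norm_ratio_and_mean_shift(dim, n_inliers=2000, eps=0.1, shift=3.0, seed=0):
    """Illustrative only: how obvious are shifted outliers, and how much do they move the mean?"""
    rng = np.random.default_rng(seed)
    direction = np.ones(dim) / np.sqrt(dim)            # unit vector the outliers push along
    inliers = rng.standard_normal((n_inliers, dim))    # true mean is the origin
    n_outliers = int(eps * n_inliers)
    outliers = rng.standard_normal((n_outliers, dim)) + shift * direction

    # Ratio of typical outlier distance to typical inlier distance from the true mean.
    ratio = np.median(np.linalg.norm(outliers, axis=1)) / np.median(np.linalg.norm(inliers, axis=1))

    # How far the empirical mean of the corrupted data moved along the attack direction.
    corrupted_mean = np.vstack([inliers, outliers]).mean(axis=0)
    return float(ratio), float(corrupted_mean @ direction)

for dim in (1, 1000):
    ratio, mean_shift = norm_ratio_and_mean_shift(dim)
    print(f"dim={dim}: outlier/inlier distance ratio={ratio:.3f}, mean shift={mean_shift:.3f}")
```

Under these assumptions the distance ratio is large in one dimension and very close to 1 in 1000 dimensions, while the shift of the mean along the attack direction is about the same in both cases.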
As a result, outliers can look individually just like inliers. As you can see here, each one of the red dots looks just fine, but in aggregate they can really mess up your classifier quite a lot. That means we must somehow look for outliers at a much more global level. This is something that was first considered by statisticians in the 60s, in a field called robust statistics, which really predates modern machine learning. It says that a dataset is corrupted, or epsilon-corrupted I should say, if there's an epsilon fraction of outliers which corrupt basic statistics of the data, for instance the mean or the covariance. The formal threat model is as follows. We have some dataset, represented by the blue points, but we don't get to see this nice uncorrupted dataset. Instead we have to give it to a malicious and powerful adversary. This adversary gets to look at the data and change an epsilon fraction of it however they choose. And this adversary is all-powerful; in particular, they have full knowledge of the algorithm and everything that the defender is trying to do. Then the data is returned to us after being corrupted, obviously without the colors, and our goal is to learn statistics or information about the original uncorrupted dataset, for instance the mean or the covariance. Unfortunately, this was a pretty difficult problem, at least for a long time, especially in high dimensions. For something like 50 years, all methods for high-dimensional statistics, overgeneralizing slightly, either were computationally intractable in high dimensions or had error which scaled really poorly as the dimensionality increased. So essentially, you either just couldn't run the algorithm, or you could but the output you got was meaningless. This was finally broken in 2016, when
my collaborators and I, along with concurrent work of Lai, Rao, and Vempala, were finally able to break this curse of dimensionality by giving the first polynomial-time algorithms for some of the most basic problems in robust statistics, in particular robust mean estimation. At a high level, the idea is that we can detect corruptions to the mean by inspecting spectral properties of higher-order moments. Pictorially, the picture is as follows. Suppose we have this corrupted dataset, where the blue points are the inliers and the red points are the outliers. The true mean is supposed to be this blue X, but because of the outliers it has been dragged over here. Notice that along the direction in which the mean was moved, there must be a lot more action than there should be, just because all the red points are, in aggregate, pushing in this direction. In particular, the data is supposed to be spherical, but because the bad points are dragging in this direction, the actual covariance is, rather than spherical, more like elliptical in this direction. Formally, this manifests itself as a large eigenvalue of the empirical covariance. I can't get into too much detail for the sake of time, but from a thousand feet in the air, this says that the more a single data point contributes to the large eigenvectors, or eigenvalues, of the covariance, the more outlier-like it is. We were able to measure this quantitatively by devising a score function, based on something called quantum entropy regularization, which allows us to measure this in a principled manner. I'm going to flash some equations here, mostly just to make myself feel better. You don't really need to understand this, but the point is that these scores, which are the tau_i's,
are things that you can compute really easily, in nearly linear time, and the larger they are, the more the points must contribute to being outliers. I should say that this is actually formally used in our paper to get really fast algorithms for robust mean estimation, in particular nearly linear-time algorithms. But by themselves, they're already useful as an empirical tool for outlier detection. We measured this in a number of basic settings; again, I don't want to get into the details, although I'm happy to take questions about it offline. And we found, essentially, that these quantum entropy scores were just consistently better at finding outliers in a number of settings.
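The spectral intuition above can be sketched in a few lines. This is a deliberately simplified stand-in, not the quantum entropy score from the talk: it scores each point only by its squared projection onto the single top eigenvector of the empirical covariance. The synthetic data generator and all names here are hypothetical and exist only to make the example self-contained.

```python
import numpy as np

def make_corrupted(n_inliers=2000, dim=50, eps=0.1, shift=3.0, seed=0):
    """Hypothetical toy data: spherical Gaussian inliers plus shifted Gaussian outliers."""
    rng = np.random.default_rng(seed)
    direction = np.ones(dim) / np.sqrt(dim)
    n_outliers = int(eps * n_inliers)
    inliers = rng.standard_normal((n_inliers, dim))
    outliers = rng.standard_normal((n_outliers, dim)) + shift * direction
    data = np.vstack([inliers, outliers])
    is_outlier = np.arange(len(data)) >= n_inliers
    return data, is_outlier

def top_eigenvector_scores(data):
    """Score each point by its squared projection onto the top covariance eigenvector."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top_direction = eigvecs[:, -1]           # direction with "too much action"
    scores = (centered @ top_direction) ** 2
    return float(eigvals[-1]), scores

data, is_outlier = make_corrupted()
top_eig, scores = top_eigenvector_scores(data)
print("top eigenvalue:", round(top_eig, 3))   # would be close to 1 for clean spherical data
print("mean score, inliers:", round(float(scores[~is_outlier].mean()), 3))
print("mean score, outliers:", round(float(scores[is_outlier].mean()), 3))
```

A single eigenvector is the crudest version of the idea; it only sees one corrupted direction at a time, which is part of why more refined scores that weight many large-eigenvalue directions are of interest.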
So here are some plots I'm just going to flash. ROC AUC is a measure of how effective the method is at detecting outliers, and as you can see, we consistently do better on this test than previous methods. In particular, if I could draw your attention to the plot on the right, ours is much better against all these previous benchmarks, and this is also true in another setting, which again I don't want to get into too many details about. But that's the high-level idea. I should say I've only covered a little bit of this wave of current work on data poisoning; there's been a lot of exciting progress in this field that I can't cover. For instance, we've had provable defenses for stochastic optimization, which is a task that generalizes training things like deep nets; practical and provably robust algorithms for many other objectives, not just mean estimation; empirical defenses, based on these same ideas, that we've developed against the backdoor attacks on deep networks that I mentioned at the beginning of the talk; and much more. But that's all I really want to say about that. Now let's pivot to robustness at test time. As Aleksander has presumably already discussed, adversarial examples are a well-documented phenomenon in deep learning. Namely, it's well known that you can take an image, any image, add imperceptible changes to the pixels at test time, and cause a standard neural network to reliably misclassify the image. And I want to stress that while this was first developed in an academic context, this is not just an academic concern. This is something that happens in the real world and can be carried out in a number of different settings. Here are some pictures: for instance, attacks can cause stop signs to be misclassified as speed limit signs, or cause facial recognition to misclassify
people. You can imagine the security vulnerabilities of all these sorts of things. Now, broadly speaking, there are two types of defenses that have been proposed against these attacks. The first type of defense is empirical: namely, defenses that seem to work well in practice and that we don't know how to attack, but that we also don't know how to prove work. However, many of these have been proposed and then subsequently broken within weeks or months of publication, so there's a sort of cat-and-mouse game here. One notable exception is the adversarial training framework of Madry et al., which to date is still unbroken and, if I had to guess, is likely genuinely robust, but again we can't prove it. To try to break this cycle of attack and defense, attack and defense, another paradigm is that of certifiable defenses. These are defenses which provably cannot be broken. However, they usually pay something for the certifiability. In particular, a lot of the defenses that have been proposed don't scale, or get much worse numbers than empirical defenses. However, one recent and very promising approach that might bridge this gap is something called randomized smoothing. The idea behind randomized smoothing is very straightforward. The formal definition is here, but again I don't want to get into the formulas. Pictorially, you can think of it as the following. Suppose that this picture is the decision surface of your classifier, so that every colored region is a region of space that's classified as one class: all the blue points are classified as one class, all the cyan points are classified as another, and so on and so forth.
So this is your base classifier. You then want to smooth this classifier: given a point x, the smoothed classifier samples a bunch of random Gaussian points centered at x and counts what fraction of them fall into each region. The corresponding histogram that you see is the likelihood that the smoothed classifier assigns to each class. Here, notice that the smoothed classifier still assigns the blue region as the most likely class for x after smoothing, but this is not always the case. The reason we care about randomized smoothing is that recently there's been a lot of work, culminating in this paper by Cohen et al., which proved the following very strong robustness guarantee for any smoothed network.
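A minimal sketch of the smoothing procedure just described, together with the L2 radius from the Cohen et al. guarantee, R = (sigma / 2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B)), where p_A and p_B are the smoothed probabilities of the top and runner-up classes and Phi^{-1} is the inverse standard normal CDF. Two assumptions to flag: the base classifier below is a hypothetical two-class toy, and the code plugs in raw empirical frequencies, whereas an actual certificate needs statistical confidence bounds on those probabilities, so the number printed here is an estimate, not a guarantee.

```python
from statistics import NormalDist

import numpy as np

def smoothed_class_probs(base_classifier, x, sigma, n_samples, num_classes, rng):
    """Estimate the smoothed classifier's class histogram at x by Gaussian sampling."""
    noisy_points = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    labels = base_classifier(noisy_points)
    return np.bincount(labels, minlength=num_classes) / n_samples

def l2_radius(probs, sigma, clip=1e-6):
    """Plug-in estimate of the L2 radius; clipping keeps the inverse CDF finite."""
    order = np.argsort(probs)
    p_top = float(np.clip(probs[order[-1]], clip, 1 - clip))
    p_second = float(np.clip(probs[order[-2]], clip, 1 - clip))
    inv_cdf = NormalDist().inv_cdf
    return int(order[-1]), 0.5 * sigma * (inv_cdf(p_top) - inv_cdf(p_second))

def toy_base_classifier(points):
    """Hypothetical base classifier: class 1 if the first coordinate is positive, else class 0."""
    return (points[:, 0] > 0).astype(int)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
probs = smoothed_class_probs(toy_base_classifier, x, sigma=0.5,
                             n_samples=20000, num_classes=2, rng=rng)
predicted, radius = l2_radius(probs, sigma=0.5)
print("class histogram:", probs, "prediction:", predicted, "estimated radius:", round(radius, 3))
```

For this linear toy classifier the estimated radius should come out close to 1, which is the true distance from x to the decision boundary, so the sketch is at least a sanity check on the formula.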
Again, for the sake of time I don't want to go into the details, but at a high level it says that if your smoothed classifier is very confident at a point x, then it is also robust in some L2 ball around x. So all you need to do is train a smoothed classifier to be very confident at a point, and then you have robustness to adversarial perturbations sort of for free. But this is not a trivial task. How do you actually train networks such that the smoothed classifier is effective? What if I add Gaussian noise to my data and all these nice patterns are destroyed? It turns out you can do okay. Cohen et al., the paper I mentioned already, does pretty well with some basic Gaussian data augmentation, a very classic technique anyway. But in recent work that appeared at NeurIPS, we showed you can do much better, namely by combining this framework of randomized smoothing with the empirical defense of adversarial training. In particular, we showed that by directly robustifying the smoothed network, using adversarial training on a smoothed loss, you can train the network such that the result is much more certifiably robust. Which is kind of nice: by combining certifiable defenses with empirical ones, one can dramatically improve certifiability, which is not at all clear a priori, since the empirical defenses come with no guarantees whatsoever. Again, for the sake of time I don't want to get into too many details, but you can see here briefly that our method indeed dramatically improves upon the previous state of the art.
And indeed, this is kind of cool: sometimes our certifiable accuracy, so our provable accuracy, is even higher than the empirical accuracy of the previous smoothed networks, which says that our network is just genuinely much, much stronger. These are results on CIFAR-10. Again, I don't want to get into too many details about these results; we also have results for ImageNet. And actually, by combining our techniques with other heuristics, like pre-training or this kind of stuff, you can boost these numbers even higher. But so far this has all been about L2. What about other norms, like Lp? In particular, L-infinity is a standard benchmark for adversarial examples, and norms like L0 or L1 correspond much more closely to adversarial patches, which are the real-world attacks that we saw earlier in this talk. However, Gaussian smoothing is no longer effective for these norms; Gaussian smoothing is very much tailored to L2. So how do we attack this problem? In work that is to appear at ICML this year, we begin, with many caveats, to characterize optimal smoothing distributions for many other norms. So we begin to develop a theory of how to do randomized smoothing much more generally. From the upper bound side, we give formal methods to derive, again with caveats, quote-unquote optimal sampling schemes for arbitrary norms, based on geometric objects such as Wulff crystals, which are objects that arise naturally in physics, as well as some other methods which I won't get into. And from the lower bound side, it turns out you can show very strong lower bounds demonstrating that randomized smoothing, or at least current techniques for it, has inherent limits for a number of norms, based on the rich theory of metric embeddings. In particular, for L-infinity, the best you can do is actually just embed L-infinity into L2
by losing this root-d factor, and then use Gaussian smoothing. So that's kind of neat. This sounds pretty abstract, but it turns out you can actually show that these methods yield improved certifiability compared to the prior state of the art in practice. Here are some results showing that just by trying out the smoothing distributions that we developed, you can dramatically improve the previous state-of-the-art numbers, by up to something like 20 or 30 percent. So despite the fact that this uses some very rich, very deep mathematics, it yields dramatic improvements in practice as well. Okay, so that's pretty much all I have to say. Just to wrap up, let me say that this field of algorithms for secure machine learning is a very new area of study, and many new algorithmic ideas are still needed to overcome the many security issues that we face in practical ML. That's pretty much it. If you have more questions, I'd be happy to answer them at the Q&A. Thanks.

Hi, and welcome to our live Q&A following our pre-recorded security and ML talks. With us today for this live session are Aleksander Madry and Jerry Li. We're going to be taking questions via the same chat interface, so feel free to add any questions you want for either Aleksander or Jerry, or both, and we'll talk through them here. To get things started, I wanted to ask both of you about a question that was touched on briefly in the live chat during your talks. I'm curious if you would say a few words about how you would define robustness in the ideal way. Some mentioned defining it based on, for example, how a human would behave.
What's your take? Let's start with Aleksander.

Sure. As I already said in the chat, this is a great question, and there is no good answer. Part of the reason there's no good answer is that to us robustness means so many things, and we try to pack it all into one definition. One definition, which much of my work focuses on, says that robust means robust to whatever a human would be robust to. This makes sense in the context of, you know, vision or sound. It is a very difficult definition of robustness, because if you wanted to take it to the extreme, we would essentially need to solve AGI, because you would need to have synthetic humans. But then there is the other definition, which I think we don't think about enough but should, especially in domains like system security, or any domain where humans are not the gold standard we want to attain. There, robustness just means providing some invariances that we would like our model to obey. For instance, if there is a stream of packets, and we move the timing of each packet a little bit without changing the order (actually this might not be a good invariance), it should not affect the performance. So essentially these are different kinds of priors that we embed into our model, because we know they should not be material for the decision. So this is a whole different definition, which is just about that. But yeah, what robustness means is really in the eye of the beholder.

Yes. How about you, Jerry, what do you think?

Yeah, I mean, I think Aleksander is clearly correct: there really is no one
complete definition of correctness, or sorry, robustness. Adding on to what Aleksander was saying, I think in large part what we do, or at least the stuff that I work on, is understanding how to make models robust to misspecification, when the data distribution has some sort of drift. This drift can be because of attackers or just because of natural data drift. So this is another way we can think about things, which I guess is also just another way of rephrasing this idea of building in invariances. But it's a very difficult question.

Let me just say one more thing: it's a very difficult question, and as with all difficult questions, you should try not to confront it head-on. Instead we should try to look at concrete contexts and small pieces of it, and that's how we build the intuition and, hopefully, a generalizable toolkit. So let's not despair right away.

Yes, we don't have to boil the ocean all at once; we can start small and understand specific situations. I totally agree with you. Kind of related to these definitional questions, there was also some discussion about threat models, and I'll start with you, Jerry. How do you think about the different kinds of threat models, and how do you choose which ones to focus your work on? What kind of benefit do you think you get by switching between different threat models from time to time? How does that help you?

So this is very much related to the question that was just asked, because developing a threat model against which we need to be robust is really the question of what it means to be robust. From this perspective, I think it really depends on the motivation of the problem, or the context of the problem. Suppose I'm trying to make a network
robust in some real-world setting: I need to understand what sorts of attacks one might actually care about in that setting. Or if I'm just trying to make some sort of algorithm robust to outliers at training time: what are the outliers going to be, or what kind of realistic constraints can I impose on the outliers? This is the way that seems most natural to me: we need to take a very context-specific approach, try to understand what the actual constraints are that might matter in practice, and then build from that to get a realistic theory.

So where do you both get your motivating contexts? The specific application scenarios you decide to go to, where do they come from? How do you decide which ones are most exciting to you? Aleksander.

Sure. That's a great question. I have two strategies; well, they are parts of the same strategy, but there are two kinds of modes. One mode is that I want to focus on an application that really matters. Much of my work is about deployable ML: what happens when we deploy ML in the real world, what kinds of challenges arise. So you choose some use case and try to see what the first thing you will hit there will be, what the major problem will be. Once you identify this problem, you always try, of course, to clean it up, so you can study it in isolation, model it, and benchmark it. That's essentially one way we come up with part of my work; this is the work about trying to understand the definitions of robustness and how to even know that our ML model is robust. The other strand of work is about developing the toolkit. For that, I just choose the simplest threat model that makes this toolkit sweat. That is the infamous Lp robustness, which yes, of course, is clearly just a tiny piece of what robustness would mean.