Machine Unlearning: An Emerging Fundamental Technology | Peter Triantafillou, University of Warwick

Good morning, everybody. I'm very happy to be here. I have to say I'm particularly happy that I was not fist-bumped, because then I wouldn't have been able to go back to one of my places; I wouldn't look good. Now, I did not know how well versed you are with respect to machine unlearning, so the talk is pitched at a fairly high level. If you have any specific questions, please feel free to ask during the question period, or even during the talk, I don't mind, and certainly feel free to drop me an email afterwards so we can take this offline if needed.

So, machine unlearning. If nothing else, I want you to leave here with an understanding of what we mean by machine unlearning, and as you will see, that is not particularly easy.

What is the problem? We have all heard about modern AI, this frontier AI as it is called. These models do amazing things; they can predict things that we did not think were possible to predict before. The problem is that these frontier models are trained on data, and this data, or some of it, may be problematic. What does problematic mean? It could mean that some data is sensitive, your personal data. It could mean that some data is obsolete: it is no longer true, the facts on the ground have changed since the model was trained. It could be that some of the data is biased; we have all heard stories about modern AI models whose inferences are biased, and the problems that this can cause. We have also heard about cyber attacks against machine learning models; this is a big issue, with nation-state actors trying to corrupt the data with which particular models are trained so that, in the end, the model is not as good. And you can have humans making errors when producing this data: the data may be curated or annotated somehow, these annotations are crucial for the model to do its job correctly, and some of them may be wrong. Think, for example, of classification labels for images in an image classification task.

So what does machine unlearning have to do with this? We are trying to deal with the consequences of this problematic data, again meaning data with which the model has been trained, some of which is problematic. This can cause a lot of different problems: societal problems and real harm, whether you see it as an AI model safety problem or as a critical infrastructure safety problem. For example, you can have unauthorized use of data, or private data being publicized to the world in ways you are not comfortable with. We have heard stories about AI being employed in the American judicial system, carrying biases, and being used, for example, to help a judge decide whether somebody deserves parole; if that data is biased, then the judge's decision will be biased.
You can also think of problems with the original data in medical applications, where misclassified data can cause misdiagnoses, say in the prediction of cancer, mistreatments, or wrongly scheduled procedures in a hospital. If you think about critical AI infrastructures, one system in a hospital can misdiagnose something, another AI takes that misdiagnosis and schedules a treatment that is wrong and harmful to the person concerned, or schedules surgeries that are unnecessary, and so on. Or think about energy grids or transportation systems, with AI systems interconnected to one another, where a failure in one AI system cascades down the chain of interconnected models; a localized blackout in an energy grid can become widespread. The same goes for telecommunication infrastructures. These are the consequences, and they are quite stark; they can be quite grave, causing societal harm.

So what is machine unlearning about? It is trying to address this need. I hope what I have said so far points to the need for a tool that can correct the erroneous behavior of a model that is caused by problematic data used during training. That is what machine unlearning is about: we are trying to remove the effect of this problematic data, or, as it is called, make the model forget the problematic data, or unlearn it. These terms are used interchangeably. With me so far? Pretty straightforward.

I want to put everything into a global picture so you can see where machine unlearning comes into play. Everything starts with data. Using this data and some learning algorithm, we build, say, a deep neural network, so-called modern AI, to do whatever task we care about. Then we use this model for our inferences, our predictions, our classifications. Of course, new data may keep coming in; the community has dealt with this for decades, with what are now called continual learning or lifelong learning algorithms, a collection of methods for adapting the previous model to the new data and updating it so it can keep making correct predictions. That was pretty much the state of the art until this unlearning thinking came into place. There has been serious research on it for six or seven years now, but it is really becoming a burgeoning field because of its importance, as I indicated a couple of slides ago. What completes the picture is the need to start removing data, or rather removing the effect of the problematic data that I talked about earlier. What we need is a set of unlearning algorithms, in contrast to the learning and continual learning algorithms we have known and loved. Using the data that we want to forget, and perhaps also the original data set or the model that we want to change, the goal of an unlearning algorithm is to produce an updated neural network, an updated model, that has unlearned the things we want it to unlearn. That is the goal.
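To make that global picture a little more concrete, here is one way to sketch the three kinds of algorithms as Python type signatures. This is my own illustration of the interfaces implied by the talk, not code from any of the systems mentioned, and a real pipeline would carry far more state (optimizers, checkpoints, data-access policies) than these signatures suggest.

```python
from typing import Callable, Sequence, TypeVar

Model = TypeVar("Model")
Example = TypeVar("Example")

# Learning: build a model from a training set.
LearnFn = Callable[[Sequence[Example]], Model]

# Continual / lifelong learning: fold newly arriving data into an existing model.
ContinualUpdateFn = Callable[[Model, Sequence[Example]], Model]

# Unlearning: given an existing model, a forget set, and whatever retain data
# is still available, return an updated ("recycled") model whose behaviour no
# longer reflects the forget set.
UnlearnFn = Callable[[Model, Sequence[Example], Sequence[Example]], Model]
```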
One thing that I want to stress is this notion of model recycling. We could, of course, try to retrain from scratch, as I will show in a minute: take the original data set, remove the data that we want to remove, and just train the model with your learning algorithm on the original data set minus the data you want to forget. That would do the job, except that it is very costly. If we are talking about large language models, which you all know and love, training one can take weeks to months and thousands of GPUs; it has been estimated that training one version of ChatGPT uses energy comparable to a year's worth of collective household electricity use in New York City. You do not want to do that. If you are not talking about a large language model, things are not quite so prohibitive, but they are still costly in terms of GPU time, resources, and the time you have to wait while the model is retrained. This is what the notion of model recycling is about: we do not want to do everything from scratch; we want to take the previous model and slightly tweak it, given the set of data that we want to forget, to get the desired effect. With me? Cool.

And we want to do this across different data modalities, which is another problem. For some of these modalities and tasks the unlearning problem is well defined and slightly easier, or at least we have made more progress. You may be dealing with an image data set and a task such as image classification, where some of your images are corrupt and you want to remove them. You may be dealing with tabular data, the kind of data we have in hospitals, for example, or in a relational database system, and tabular data analytics tasks. Or you may be dealing with large text corpora and the large language models trained on them, with a bunch of NLP tasks. In each case, what you want is to unlearn the problematic data without damaging the model's general performance; more about that in a minute.
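As a toy illustration of the retrain-from-scratch baseline, here is a minimal sketch with scikit-learn on synthetic data; the data, model choice, and forget indices are all made up for this example. The retrained model is the "exact" answer by construction, but it is exactly the expensive step that recycling-style unlearning algorithms try to avoid by tweaking the original model instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

forget_idx = np.arange(20)                 # the "problematic" examples to unlearn
retain_mask = np.ones(len(X), dtype=bool)
retain_mask[forget_idx] = False

# Original model, trained on everything, including the problematic rows.
original = LogisticRegression().fit(X, y)

# Brute-force (exact) unlearning: retrain from scratch on D minus the forget set.
# This is the oracle behaviour that approximate, "recycled" unlearned models are
# compared against, and the cost we would like to avoid at LLM scale.
retrained = LogisticRegression().fit(X[retain_mask], y[retain_mask])
```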
Let me give an example with some image data. This example, by the way, comes from the first NeurIPS competition on unlearning, which we organized together with colleagues from Google Research and Google DeepMind, some other academics, and my group at Warwick. You have the link there; if you are interested, I urge you to go and take a look at it. More than 1,200 teams participated in the competition, around 1,700 to 1,800 researchers from all over the world, which shows how interesting this problem is to the community.

So we start with an image data set and we train the model, and then somebody defines a forget set, a bunch of images, the first images you see there, that they want removed. Given all of this, as I said earlier, you want to come up with an unlearning algorithm and use it to produce the so-called unlearned model. As I mentioned, the brute-force solution would be to take the difference between the original data set and this forget set and retrain the model from the beginning with your learning algorithm, which gives you the retrained model.

So how do you know whether you have done a good job, whether you have succeeded? This is a big part of what we mean by unlearning. We could ask, for example: are these two models the same? Any takers? Do you think these models will be the same? No, they will not be. What would that even mean? If, when I was training with my learning algorithm on this particular data, I had changed the seed before training, or the mini-batch size, or the learning rate, or the values of the other hyperparameters, I would get a different retrained model. So the question itself has no meaning, which is part of the difficulty of defining what unlearning is. The answer is that these models cannot be expected to be the same, because there is randomness inherent in neural network training; it really does not make sense to expect them to ever be identical. Then how do we know we have done a good job? Instead of asking whether the models are the same, or how close they are, we can make this more mathematically tractable: generate distributions from the outputs of these models, and then start asking how close those distributions are. That is something mathematicians can help us with, and as a problem it makes much more sense; there are many ways, from statistics, to define how close two distributions are.

Coming back to our image data set example, we have the retrained model, which serves as the oracle, and we have the unlearned model after we apply the unlearning algorithm, and we require that the unlearning algorithm be such that the retrained model and the unlearned model are as close together as possible, in other words that the distance between them is as small as possible. Another thing you can say is: in the end, what I care about is the accuracy of these models, their performance, not how they look internally; if the model predicts the right thing at the right time for the right input, that is all I care about. In that case we say the unlearning algorithm must guarantee the same accuracy, and accuracy here means three different things we have to worry about. One is accuracy on the retain set, the data that remains after we remove the problematic data: we want it to be as high as the retrained model's, so that we do not damage accuracy on the retain set just because we removed some problematic examples. We also want the accuracy on the forget set to be the same as the retrained model's. And we do not want to hurt generalizability: if we take an unseen example from an unseen data set and give it to the unlearned model, we would like to see accuracy as close as possible to what we would see if we had simply retrained the model. That is another way of characterizing the performance of unlearning and gaining some confidence that what we have done is actually correct.
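As a small illustration of that accuracy-based view, here is a generic helper that compares a candidate unlearned model against the retrain-from-scratch oracle on the three splits just mentioned. The function and split names are mine, the models are assumed to follow the scikit-learn predict interface from the earlier sketch, and real evaluations (the NeurIPS competition's included) use considerably more refined, distribution-based metrics.

```python
import numpy as np

def accuracy(model, X, y):
    """Plain classification accuracy for any model with a .predict method."""
    return float(np.mean(model.predict(X) == y))

def unlearning_report(unlearned, retrained, splits):
    """Compare an unlearned model against the retrain-from-scratch oracle
    on the retain set, the forget set, and unseen test data."""
    return {
        name: {
            "unlearned": accuracy(unlearned, X, y),
            "retrained_oracle": accuracy(retrained, X, y),
        }
        for name, (X, y) in splits.items()
    }

# Usage with the toy from the previous sketch; `candidate` is whatever an
# approximate unlearning algorithm produced, and X_test, y_test are fresh draws:
# report = unlearning_report(candidate, retrained, {
#     "retain": (X[retain_mask], y[retain_mask]),
#     "forget": (X[forget_idx], y[forget_idx]),
#     "test":   (X_test, y_test),
# })
```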
A point worth making here is that this is actually very difficult to achieve, but for time let me move on; I will come back to it towards the end.

I did not want to stay only at this high level, so I want to give you a glimpse of what some of the state-of-the-art solutions look like when you are dealing with different problems, different learning tasks, different modalities. Take the image classification problem: I have trained on a bunch of classes with a bunch of images, and given a new image I want to say which class it belongs to. This is the essence of a paper we presented at last year's NeurIPS. It is no longer the state of the art; it was a few months ago, but there are a lot of new papers coming in. I just want to give you a glimpse of what a solution looks like. Assume that I have a definition that tells me how close two distributions are; this is the well-known KL divergence. Here w_o are the weights of the original model and w_u are the weights of the unlearned model, so for some input x I want the probability distribution coming from the output of the original model, p(y | x; w_o), to be very close to the output distribution of the unlearned model, p(y | x; w_u); that is what this term on the slide says. And this, quickly, is what the solution the paper proposed looks like. There are three terms in the loss function that you minimize when you train the neural network. The first says: try to minimize the distance between the original model and the unlearned model on the retained data, using the distance d that we defined earlier. The second says: also train the model to do well on the actual examples coming from the retain set. And for the data that comes from the forget set, the D_f examples, you want to maximize the difference, so you minimize its negative; that is why I have highlighted the minus sign there. This is like doing gradient ascent instead of gradient descent, for those of you in the know. If you want to see what the new state of the art is, it is more complicated, which is why I have not presented it here; I will be presenting it in a couple of weeks at the next NeurIPS, in Vancouver, so if you are there, come talk to me.
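To make that three-term shape concrete, here is a minimal PyTorch sketch of such a loss. The function name, the weights alpha and gamma, and the exact formulation are my own simplification for illustration, not the paper's actual objective; in particular, the maximized forget-set term is usually capped or alternated with the retain-side terms in practice so that it cannot grow without bound.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(unlearned_model, original_model,
                    x_retain, y_retain, x_forget, alpha=1.0, gamma=1.0):
    """Three-term objective of the kind described above (a sketch):
    stay close to the original model on retain data, keep solving the task
    on retain data, and push away from the original model on forget data
    (the negated term, i.e. gradient ascent on that piece)."""
    with torch.no_grad():
        p_retain = F.softmax(original_model(x_retain), dim=-1)
        p_forget = F.softmax(original_model(x_forget), dim=-1)

    logits_retain = unlearned_model(x_retain)
    logits_forget = unlearned_model(x_forget)

    # KL(original || unlearned) on retain data: keep the original behaviour.
    kl_retain = F.kl_div(F.log_softmax(logits_retain, dim=-1),
                         p_retain, reduction="batchmean")
    # Ordinary task loss on retain data: keep accuracy on what remains.
    task = F.cross_entropy(logits_retain, y_retain)
    # Negated KL on forget data: move away from the original behaviour there.
    kl_forget = F.kl_div(F.log_softmax(logits_forget, dim=-1),
                         p_forget, reduction="batchmean")

    return kl_retain + alpha * task - gamma * kl_forget
```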
To give you an idea of what a solution looks like for large language models, consider the case where we care about forgetting memorized data, for copyright reasons and also for privacy: there is a lot of research showing that if the model regurgitates data about you verbatim, there are well-known algorithms that can identify who you are, in cases where you would not want that. So the question is: I have a large language model that has been trained on a huge corpus, say all of the web, all of the internet, and now I want to remove from it information that refers to me, or information that comes from a source the model did not have access to, or authorization to train on. You have heard, for example, about the legal battles between the New York Times, and various other institutions, and the large language models such as ChatGPT.

The problem here is memorization. It turns out that the more memorized a textual sequence is, the more likely the model is to regurgitate it verbatim. If the model was trained on something that said Peter was in London at the FTC conference on November 15th, and I do not want that to be known, but the model has seen that sentence many times, then it has memorized it, and because it has memorized it, it will regurgitate it. The formula on the slide may look a bit scary, but it is actually very simple: given a sequence, a sentence, of T tokens, it asks how likely the model is, given the previous tokens, a prefix of the sentence, to produce the next one. That is how you measure how memorized a particular sentence is. The state-of-the-art solution, before we intervened, basically said: take a test set that the model has not seen, take sentences from it, and look at the model's memorization of them; as long as your forget set achieves an average memorization level that is no higher than what you would see on that test set, then that is fine, and we can attest that the model has forgotten it.

The solution we presented a few months ago at the ICML conference, which is also one of the major machine learning conferences, does something like this: construct a set D_gamma, where gamma is the highest memorization score you can tolerate, a value you can get by looking at memorization scores on test data sets the model was not trained on; then, only on this D_gamma, do something like the gradient ascent I alluded to earlier (this is not the complete solution), and keep doing that until there is no element left in the forget set whose memorization score exceeds the threshold. This is a neat idea that deals with the fact that in the previous state of the art the notion of correctness was an average, and an average guarantees nothing; we came up with membership inference attacks that show this breaks easily, and we showed how to fix it using this per-example notion of memorization. If you want the details, look at our ICML paper.
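Here is a rough sketch of that per-example idea in PyTorch. The memorization score below uses greedy exact-match of suffix tokens given the true prefix, the model is assumed to map a 1-D tensor of token ids to per-position next-token logits, and gamma, prefix_len, and max_rounds are placeholders; the published method differs in its details, so treat this as an illustration of the shape of the algorithm rather than the algorithm itself.

```python
import torch
import torch.nn.functional as F

def memorization_score(model, ids, prefix_len):
    """Fraction of suffix tokens the model reproduces exactly when fed the
    true preceding tokens (a per-example stand-in for the memorization
    measure discussed above)."""
    with torch.no_grad():
        logits = model(ids[:-1])              # next-token logits for each position
        preds = logits.argmax(dim=-1)         # greedy next-token predictions
        targets = ids[1:]
        hits = (preds[prefix_len - 1:] == targets[prefix_len - 1:]).float()
    return hits.mean().item()

def unlearn_memorized(model, optimizer, forget_seqs, gamma, prefix_len,
                      max_rounds=10):
    """Repeatedly apply gradient ascent on the still-too-memorized forget
    sequences (the set D_gamma) until none scores above the threshold gamma."""
    for _ in range(max_rounds):
        d_gamma = [s for s in forget_seqs
                   if memorization_score(model, s, prefix_len) > gamma]
        if not d_gamma:
            break
        for ids in d_gamma:
            logits = model(ids[:-1])
            nll = F.cross_entropy(logits, ids[1:])
            (-nll).backward()                 # ascend the language-model loss
            optimizer.step()
            optimizer.zero_grad()
    return model
```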
So where are we? How am I doing with time? Okay. This is basically where we are, and there are many problems still to be solved, many things we do not understand; we are not even sure we understand what we do not understand at this point. It is a nascent field. But some of the open problems that are close to my heart are the following.

First of all, how do we identify the problematic data? Currently there are human-in-the-loop mechanisms where, for example, domain experts such as doctors sit down and look at what went wrong, or what could possibly go wrong, and identify problematic instances among the training examples. Think of the red-teaming efforts within organizations, where they create simulations of cyber attacks and see what the output of the model would be, or they craft particular data inputs, feed them into the model, and see how it behaves. Or, after the fact, when something has gone wrong, a medic sits there and says this particular image was not good. Some of this has been automated already, for example in electronic trading scenarios, where you can tell which of the last few trades contained outliers that led the model astray during inference, but much of it has not. It would be great if we could push this automation much further, and perhaps some of the things that John will speak about after me can lead there.

The other thing is that we talk about these modern AI models, but there is a big variety of them. There are discriminative models, in different versions and architectures, even within convolutional neural networks, and then there are the generative models, which could be diffusion models or Transformers, and these pose different problems. Defining the unlearning problem for generative models is usually different from defining it for discriminative models. For discriminative models you typically have access to the original training data, and you can say these examples were bad, please take them out, unlearn them. For generative models you often do not know what data the LLM was trained on, so the problem needs to be defined as: something was generated that I did not like, so do not generate anything like it again. But what does that mean? Internally the model has weights, and these weights matter not just for one example or a handful of examples; they matter for many different examples, some of which you may want to unlearn and some of which you still want the model to handle correctly. You can also see it as an alignment problem: make sure the output of the LLM aligns with human values, so it does not generate toxic language, or images with extreme nudity if it is being used as an educational tool for primary school students, or so that it avoids copyright issues, and so on; the definition changes accordingly. So we lack a crisp definition of what unlearning means for generative models, at least compared with the definition we are more comfortable with for discriminative models. We have some work there, which I do not have time to discuss, but it is currently an open problem.

The other open problem is evaluation: how do I evaluate whether a generative model has actually unlearned something?
The way this is done today is that you take the generative model, you prompt it with some prompts, and you look at the output, and we all know the problem with that. There are well-publicized examples where people claim a model has been taught not to exhibit certain behaviors, and with fairly naive initial prompts the model indeed behaves well, but when you really engage with the model, if you spend a lot of time doing some clever prompting, you can get it to do pretty much what you want. There is a famous example where the model was trained not to produce obscene language; after the first three or four attempts the model was behaving nicely, but after about twenty attempts the output of the model was really obscene. So it comes down to prompting, and I guess this is a general problem with experimental, empirical science: you only know it works for the tested data, so you have to be as thorough and exhaustive as possible about what the test data is, which is a problem.

The other thing is coming up with efficient algorithms to check for correctness. I mentioned in the beginning that even for discriminative models we have this notion of an oracle: if you retrain from scratch, then you are fine as long as you are close to the behavior of the retrained-from-scratch model. But then we said you are better off generating a distribution and checking distributions, and what does that mean? It means I do not have just one retrained model, I have thousands, so I can get a distribution for every input and then compare distributions. But that means you have to train thousands of models to be able to check for correctness, and that may take until the cows come home. So coming up with appropriate performance metrics that allow this to be done efficiently is a great ask, a tall ask for now, for the community.

And then there is going beyond unlearning specific examples, for instance unlearning entities. You may want to ask the model to forget: do not say anything about Peter, I do not want people to know that I am a professor at the University of Warwick. We can do this now; there are algorithms that do this, so when they talk about Peter they will not say that he is a professor at the University of Warwick. But they may say that Peter is a criminal who has been convicted of killing ten people. We got the model not to say what we wanted it not to say, but it can hallucinate and say things that, I can assure you, are not true; I have not killed many people. And you may also want to forget associations, say that Peter visited so-and-so in London on November 15, 2024; that is much harder.

I think I am running out of time. I hope I have at least got you to start thinking about this area, to see what some of the solutions may look like, and to have a think about the many outstanding problems that are out there. Thank you for your time.
Thank you, Peter. Wow, we learned a lot about unlearning there, thank you. Do we have questions for Peter? Over there; we don't have one running over here; there you go.

Hey, great talk, thanks a lot. If I'm understanding correctly, some of the techniques that you're talking about are subtractive in nature: you're trying to remove information that has been encoded into the weights of the model. As I hear you talk about this, I'm curious: are there any additive techniques that you've explored, such that maybe the model is missing information that might cause it to output harmful output, for instance, and are there any techniques to introduce that data into the model additively, without having to retrain all of ChatGPT, for instance?

That is a really great question. The short answer is, to my knowledge, no. But the point that you bring up, that sometimes you can forget by adding the right information, is a very important one, and I actually had not thought about that, so that is a great idea; thank you for it. What I do know is that there are techniques that do something additive, but in an adversarial way. For example, I manage to have my model forget something, and the original model had associated that something with something else; I can introduce new information to it, and then, transitively, the model can pick the forgotten thing up again and undo the unlearning. So that is additive information, but in the opposite direction from what you were pointing at. This notion of adding information that will help me unlearn, coupled with other information, would, I think, make a great PhD topic. Thank you.

We have got time, I think, for one more quick question. There you go.

Thank you very much. This is a very brilliant presentation, and I must say you are trying to solve a very difficult problem. I just want to ask: do you have case studies where you have actually applied this process, the whole life cycle of unlearning, that maybe we can study so that we can gain a better understanding? It would be good if you can refer us to such papers. The second thing I want to ask is how you actually identify the problem; for example, some biases in data are inherent. How do you identify such biases and also quantify them? Thank you.

Yes, your question is how we know about this bias. Starting from the second one, this is actually the first item that I pointed out where we need additional research, to automate this identification problem. I can tell you how people do it now. Depending on the application, they use, as I said, red teaming: basically a bunch of experts get together, and they either create simulations to see what the output of the model would be, or they come up with interesting corner cases as examples and test how the model behaves, and then they say, okay, for these types of input we have identified that there is a problem. That is, I know, a bit hand-waving. Medics do a better job: there are medics who look at the model misfiring, the inference being wrong, and say, okay, I go back and ask what caused it, and then they look at the images, which is a painful process, and they can identify the images that were wrong, or some of the images that caused the problem.
But in general this is not possible; it is very constrained by human resources, and we need more work on this, which is why I identified it as a key problem. Now, with respect to the other issue, the first one you brought up, if I understand it correctly: I already mentioned a couple of our papers from the last two or three big machine learning conferences, so you can start from there; you can see exactly what we have done and what the related references are. Okay, thank you.

2024-12-22 20:36
