Machine Unlearning: An Emerging Fundamental Technology | Peter Triantafillou, University of Warwick

Good morning, everybody. I'm very happy to be here. I have to say I'm particularly happy that I was not fist-bumped, because then I wouldn't have been able to go back to one of my places; I wouldn't look good. Now, I did not know how well versed you are with respect to machine unlearning, so the talk is pitched at a fairly high level. If you have any specific questions, please feel free to ask during the question period, or even during the talk, I don't mind, and certainly feel free to drop me an email afterwards so we can take this offline if needed.

So, machine unlearning. If nothing else, I want you to leave here with an understanding of what we mean by machine unlearning, and as you will see, that is not particularly easy.

What is the problem? We have all heard about modern AI, this frontier AI as it is called. These models do amazing things; they can predict things that we did not think were possible to predict before. The problem is that these frontier models are trained on data, and this data, or some of it, may be problematic. What does problematic mean? It could mean that some data is sensitive, your personal data. It could mean that some data is obsolete: it is no longer true, the facts on the ground have changed since the model was trained. It could be that some of the data is biased; we have all heard stories about modern AI models whose inferences are biased, and the problems that this can cause. We have also heard about cyber attacks against machine learning models; this is a big issue, with nation-state actors trying to corrupt the data with which particular models are trained so that, in the end, the model is not as good. And you can have humans making errors when producing this data: the data may be curated or annotated somehow, these annotations are crucial for the model to do its job correctly, and some of them may be wrong. Think, for example, of classification labels for images in an image classification task.

So what does machine unlearning have to do with this? We are trying to deal with the consequences of this problematic data, again meaning data with which the model has been trained, some of which is problematic. This can cause a lot of different problems: societal problems and real harm, whether you see it as an AI model safety problem or as a critical infrastructure safety problem. For example, you can have unauthorized use of data, or private data being publicized to the world in ways you are not comfortable with. We have heard stories about AI being employed in the American judicial system, carrying biases, and being used, for example, to help a judge decide whether somebody deserves parole; if that data is biased, then the judge's decision will be biased.
You can also think of problems with the original data in medical applications, where misclassified data can cause misdiagnoses, say in the prediction of cancer, mistreatments, or wrongly scheduled procedures in a hospital. If you think about critical AI infrastructures, one system in a hospital can misdiagnose something, another AI takes that misdiagnosis and schedules a treatment that is wrong and harmful to the person concerned, or schedules surgeries that are unnecessary, and so on. Or think about energy grids or transportation systems, with AI systems interconnected to one another, where a failure in one AI system cascades down the chain of interconnected models; a localized blackout in an energy grid can become widespread. The same goes for telecommunication infrastructures. These are the consequences, and they are quite stark; they can be quite grave, causing societal harm.

So what is machine unlearning about? It is trying to address this need. I hope what I have said so far points to the need for a tool that can correct the erroneous behavior of a model that is caused by problematic data used during training. That is what machine unlearning is about: we are trying to remove the effect of this problematic data, or, as it is called, make the model forget the problematic data, or unlearn it. These terms are used interchangeably. With me so far? Pretty straightforward.

I want to put everything into a global picture so you can see where machine unlearning comes into play. Everything starts with data. Using this data and some learning algorithm, we build, say, a deep neural network, so-called modern AI, to do whatever task we care about. Then we use this model for our inferences, our predictions, our classifications. Of course, new data may keep coming in; the community has dealt with this for decades, with what are now called continual learning or lifelong learning algorithms, a collection of methods for adapting the previous model to the new data and updating it so it can keep making correct predictions. That was pretty much the state of the art until this unlearning thinking came into place. There has been serious research on it for six or seven years now, but it is really becoming a burgeoning field because of its importance, as I indicated a couple of slides ago. What completes the picture is the need to start removing data, or rather removing the effect of the problematic data that I talked about earlier. What we need is a set of unlearning algorithms, in contrast to the learning and continual learning algorithms we have known and loved. Using the data that we want to forget, and perhaps also the original data set or the model that we want to change, the goal of an unlearning algorithm is to produce an updated neural network, an updated model, that has unlearned the things we want it to unlearn. That is the goal.
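To make that global picture a little more concrete, here is one way to sketch the three kinds of algorithms as Python type signatures. This is my own illustration of the interfaces implied by the talk, not code from any of the systems mentioned, and a real pipeline would carry far more state (optimizers, checkpoints, data-access policies) than these signatures suggest.

```python
from typing import Callable, Sequence, TypeVar

Model = TypeVar("Model")
Example = TypeVar("Example")

# Learning: build a model from a training set.
LearnFn = Callable[[Sequence[Example]], Model]

# Continual / lifelong learning: fold newly arriving data into an existing model.
ContinualUpdateFn = Callable[[Model, Sequence[Example]], Model]

# Unlearning: given an existing model, a forget set, and whatever retain data
# is still available, return an updated ("recycled") model whose behaviour no
# longer reflects the forget set.
UnlearnFn = Callable[[Model, Sequence[Example], Sequence[Example]], Model]
```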
One thing that I want to stress is this notion of model recycling. We could, of course, try to retrain from scratch, as I will show in a minute: take the original data set, remove the data that we want to remove, and just train the model with your learning algorithm on the original data set minus the data you want to forget. That would do the job, except that it is very costly. If we are talking about large language models, which you all know and love, training one can take weeks to months and thousands of GPUs; it has been estimated that training one version of ChatGPT uses energy comparable to a year's worth of collective household electricity use in New York City. You do not want to do that. If you are not talking about a large language model, things are not quite so prohibitive, but they are still costly in terms of GPU time, resources, and the time you have to wait while the model is retrained. This is what the notion of model recycling is about: we do not want to do everything from scratch; we want to take the previous model and slightly tweak it, given the set of data that we want to forget, to get the desired effect. With me? Cool.

And we want to do this across different data modalities, which is another problem. For some of these modalities and tasks the unlearning problem is well defined and slightly easier, or at least we have made more progress. You may be dealing with an image data set and a task such as image classification, where some of your images are corrupt and you want to remove them. You may be dealing with tabular data, the kind of data we have in hospitals, for example, or in a relational database system, and tabular data analytics tasks. Or you may be dealing with large text corpora and the large language models trained on them, with a bunch of NLP tasks. In each case, what you want is to unlearn the problematic data without damaging the model's general performance; more about that in a minute.
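As a toy illustration of the retrain-from-scratch baseline, here is a minimal sketch with scikit-learn on synthetic data; the data, model choice, and forget indices are all made up for this example. The retrained model is the "exact" answer by construction, but it is exactly the expensive step that recycling-style unlearning algorithms try to avoid by tweaking the original model instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

forget_idx = np.arange(20)                 # the "problematic" examples to unlearn
retain_mask = np.ones(len(X), dtype=bool)
retain_mask[forget_idx] = False

# Original model, trained on everything, including the problematic rows.
original = LogisticRegression().fit(X, y)

# Brute-force (exact) unlearning: retrain from scratch on D minus the forget set.
# This is the oracle behaviour that approximate, "recycled" unlearned models are
# compared against, and the cost we would like to avoid at LLM scale.
retrained = LogisticRegression().fit(X[retain_mask], y[retain_mask])
```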
Let me give an example with some image data. This example, by the way, comes from the first NeurIPS competition on unlearning, which we organized together with colleagues from Google Research and Google DeepMind, some other academics, and my group at Warwick. You have the link there; if you are interested, I urge you to go and take a look at it. More than 1,200 teams participated in the competition, around 1,700 to 1,800 researchers from all over the world, which shows how interesting this problem is to the community.

So we start with an image data set and we train the model, and then somebody defines a forget set, a bunch of images, the first images you see there, that they want removed. Given all of this, as I said earlier, you want to come up with an unlearning algorithm and use it to produce the so-called unlearned model. As I mentioned, the brute-force solution would be to take the difference between the original data set and this forget set and retrain the model from the beginning with your learning algorithm, which gives you the retrained model.

So how do you know whether you have done a good job, whether you have succeeded? This is a big part of what we mean by unlearning. We could ask, for example: are these two models the same? Any takers? Do you think these models will be the same? No, they will not be. What would that even mean? If, when I was training with my learning algorithm on this particular data, I had changed the seed before training, or the mini-batch size, or the learning rate, or the values of the other hyperparameters, I would get a different retrained model. So the question itself has no meaning, which is part of the difficulty of defining what unlearning is. The answer is that these models cannot be expected to be the same, because there is randomness inherent in neural network training; it really does not make sense to expect them to ever be identical. Then how do we know we have done a good job? Instead of asking whether the models are the same, or how close they are, we can make this more mathematically tractable: generate distributions from the outputs of these models, and then start asking how close those distributions are. That is something mathematicians can help us with, and as a problem it makes much more sense; there are many ways, from statistics, to define how close two distributions are.

Coming back to our image data set example, we have the retrained model, which serves as the oracle, and we have the unlearned model after we apply the unlearning algorithm, and we require that the unlearning algorithm be such that the retrained model and the unlearned model are as close together as possible, in other words that the distance between them is as small as possible. Another thing you can say is: in the end, what I care about is the accuracy of these models, their performance, not how they look internally; if the model predicts the right thing at the right time for the right input, that is all I care about. In that case we say the unlearning algorithm must guarantee the same accuracy, and accuracy here means three different things we have to worry about. One is accuracy on the retain set, the data that remains after we remove the problematic data: we want it to be as high as the retrained model's, so that we do not damage accuracy on the retain set just because we removed some problematic examples. We also want the accuracy on the forget set to be the same as the retrained model's. And we do not want to hurt generalizability: if we take an unseen example from an unseen data set and give it to the unlearned model, we would like to see accuracy as close as possible to what we would see if we had simply retrained the model. That is another way of characterizing the performance of unlearning and gaining some confidence that what we have done is actually correct.
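As a small illustration of that accuracy-based view, here is a generic helper that compares a candidate unlearned model against the retrain-from-scratch oracle on the three splits just mentioned. The function and split names are mine, the models are assumed to follow the scikit-learn predict interface from the earlier sketch, and real evaluations (the NeurIPS competition's included) use considerably more refined, distribution-based metrics.

```python
import numpy as np

def accuracy(model, X, y):
    """Plain classification accuracy for any model with a .predict method."""
    return float(np.mean(model.predict(X) == y))

def unlearning_report(unlearned, retrained, splits):
    """Compare an unlearned model against the retrain-from-scratch oracle
    on the retain set, the forget set, and unseen test data."""
    return {
        name: {
            "unlearned": accuracy(unlearned, X, y),
            "retrained_oracle": accuracy(retrained, X, y),
        }
        for name, (X, y) in splits.items()
    }

# Usage with the toy from the previous sketch; `candidate` is whatever an
# approximate unlearning algorithm produced, and X_test, y_test are fresh draws:
# report = unlearning_report(candidate, retrained, {
#     "retain": (X[retain_mask], y[retain_mask]),
#     "forget": (X[forget_idx], y[forget_idx]),
#     "test":   (X_test, y_test),
# })
```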
A point worth making here is that this is actually very difficult to achieve, but for time let me move on; I will come back to it towards the end.

I did not want to stay only at this high level, so I want to give you a glimpse of what some of the state-of-the-art solutions look like when you are dealing with different problems, different learning tasks, different modalities. Take the image classification problem: I have trained on a bunch of classes with a bunch of images, and given a new image I want to say which class it belongs to. This is the essence of a paper we presented at last year's NeurIPS. It is no longer the state of the art; it was a few months ago, but there are a lot of new papers coming in. I just want to give you a glimpse of what a solution looks like. Assume that I have a definition that tells me how close two distributions are; this is the well-known KL divergence. Here w_o are the weights of the original model and w_u are the weights of the unlearned model, so for some input x I want the probability distribution coming from the output of the original model, p(y | x; w_o), to be very close to the output distribution of the unlearned model, p(y | x; w_u); that is what this term on the slide says. And this, quickly, is what the solution the paper proposed looks like. There are three terms in the loss function that you minimize when you train the neural network. The first says: try to minimize the distance between the original model and the unlearned model on the retained data, using the distance d that we defined earlier. The second says: also train the model to do well on the actual examples coming from the retain set. And for the data that comes from the forget set, the D_f examples, you want to maximize the difference, so you minimize its negative; that is why I have highlighted the minus sign there. This is like doing gradient ascent instead of gradient descent, for those of you in the know. If you want to see what the new state of the art is, it is more complicated, which is why I have not presented it here; I will be presenting it in a couple of weeks at the next NeurIPS, in Vancouver, so if you are there, come talk to me.
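To make that three-term shape concrete, here is a minimal PyTorch sketch of such a loss. The function name, the weights alpha and gamma, and the exact formulation are my own simplification for illustration, not the paper's actual objective; in particular, the maximized forget-set term is usually capped or alternated with the retain-side terms in practice so that it cannot grow without bound.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(unlearned_model, original_model,
                    x_retain, y_retain, x_forget, alpha=1.0, gamma=1.0):
    """Three-term objective of the kind described above (a sketch):
    stay close to the original model on retain data, keep solving the task
    on retain data, and push away from the original model on forget data
    (the negated term, i.e. gradient ascent on that piece)."""
    with torch.no_grad():
        p_retain = F.softmax(original_model(x_retain), dim=-1)
        p_forget = F.softmax(original_model(x_forget), dim=-1)

    logits_retain = unlearned_model(x_retain)
    logits_forget = unlearned_model(x_forget)

    # KL(original || unlearned) on retain data: keep the original behaviour.
    kl_retain = F.kl_div(F.log_softmax(logits_retain, dim=-1),
                         p_retain, reduction="batchmean")
    # Ordinary task loss on retain data: keep accuracy on what remains.
    task = F.cross_entropy(logits_retain, y_retain)
    # Negated KL on forget data: move away from the original behaviour there.
    kl_forget = F.kl_div(F.log_softmax(logits_forget, dim=-1),
                         p_forget, reduction="batchmean")

    return kl_retain + alpha * task - gamma * kl_forget
```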
To give you an idea of what a solution looks like for large language models, consider the case where we care about forgetting memorized data, for copyright reasons and also for privacy: there is a lot of research showing that if the model regurgitates data about you verbatim, there are well-known algorithms that can identify who you are, in cases where you would not want that. So the question is: I have a large language model that has been trained on a huge corpus, say all of the web, all of the internet, and now I want to remove from it information that refers to me, or information that comes from a source the model did not have access to, or authorization to train on. You have heard, for example, about the legal battles between the New York Times, and various other institutions, and the large language models such as ChatGPT.

The problem here is memorization. It turns out that the more memorized a textual sequence is, the more likely the model is to regurgitate it verbatim. If the model was trained on something that said Peter was in London at the FTC conference on November 15th, and I do not want that to be known, but the model has seen that sentence many times, then it has memorized it, and because it has memorized it, it will regurgitate it. The formula on the slide may look a bit scary, but it is actually very simple: given a sequence, a sentence, of T tokens, it asks how likely the model is, given the previous tokens, a prefix of the sentence, to produce the next one. That is how you measure how memorized a particular sentence is. The state-of-the-art solution, before we intervened, basically said: take a test set that the model has not seen, take sentences from it, and look at the model's memorization of them; as long as your forget set achieves an average memorization level that is no higher than what you would see on that test set, then that is fine, and we can attest that the model has forgotten it.

The solution we presented a few months ago at the ICML conference, which is also one of the major machine learning conferences, does something like this: construct a set D_gamma, where gamma is the highest memorization score you can tolerate, a value you can get by looking at memorization scores on test data sets the model was not trained on; then, only on this D_gamma, do something like the gradient ascent I alluded to earlier (this is not the complete solution), and keep doing that until there is no element left in the forget set whose memorization score exceeds the threshold. This is a neat idea that deals with the fact that in the previous state of the art the notion of correctness was an average, and an average guarantees nothing; we came up with membership inference attacks that show this breaks easily, and we showed how to fix it using this per-example notion of memorization. If you want the details, look at our ICML paper.
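Here is a rough sketch of that per-example idea in PyTorch. The memorization score below uses greedy exact-match of suffix tokens given the true prefix, the model is assumed to map a 1-D tensor of token ids to per-position next-token logits, and gamma, prefix_len, and max_rounds are placeholders; the published method differs in its details, so treat this as an illustration of the shape of the algorithm rather than the algorithm itself.

```python
import torch
import torch.nn.functional as F

def memorization_score(model, ids, prefix_len):
    """Fraction of suffix tokens the model reproduces exactly when fed the
    true preceding tokens (a per-example stand-in for the memorization
    measure discussed above)."""
    with torch.no_grad():
        logits = model(ids[:-1])              # next-token logits for each position
        preds = logits.argmax(dim=-1)         # greedy next-token predictions
        targets = ids[1:]
        hits = (preds[prefix_len - 1:] == targets[prefix_len - 1:]).float()
    return hits.mean().item()

def unlearn_memorized(model, optimizer, forget_seqs, gamma, prefix_len,
                      max_rounds=10):
    """Repeatedly apply gradient ascent on the still-too-memorized forget
    sequences (the set D_gamma) until none scores above the threshold gamma."""
    for _ in range(max_rounds):
        d_gamma = [s for s in forget_seqs
                   if memorization_score(model, s, prefix_len) > gamma]
        if not d_gamma:
            break
        for ids in d_gamma:
            logits = model(ids[:-1])
            nll = F.cross_entropy(logits, ids[1:])
            (-nll).backward()                 # ascend the language-model loss
            optimizer.step()
            optimizer.zero_grad()
    return model
```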
So where are we? How am I doing with time? Okay. This is basically where we are, and there are many problems still to be solved, many things we do not understand; we are not even sure we understand what we do not understand at this point. It is a nascent field. But some of the open problems that are close to my heart are the following.

First of all, how do we identify the problematic data? Currently there are human-in-the-loop mechanisms where, for example, domain experts such as doctors sit down and look at what went wrong, or what could possibly go wrong, and identify problematic instances among the training examples. Think of the red-teaming efforts within organizations, where they create simulations of cyber attacks and see what the output of the model would be, or they craft particular data inputs, feed them into the model, and see how it behaves. Or, after the fact, when something has gone wrong, a medic sits there and says this particular image was not good. Some of this has been automated already, for example in electronic trading scenarios, where you can tell which of the last few trades contained outliers that led the model astray during inference, but much of it has not. It would be great if we could push this automation much further, and perhaps some of the things that John will speak about after me can lead there.

The other thing is that we talk about these modern AI models, but there is a big variety of them. There are discriminative models, in different versions and architectures, even within convolutional neural networks, and then there are the generative models, which could be diffusion models or Transformers, and these pose different problems. Defining the unlearning problem for generative models is usually different from defining it for discriminative models. For discriminative models you typically have access to the original training data, and you can say these examples were bad, please take them out, unlearn them. For generative models you often do not know what data the LLM was trained on, so the problem needs to be defined as: something was generated that I did not like, so do not generate anything like it again. But what does that mean? Internally the model has weights, and these weights matter not just for one example or a handful of examples; they matter for many different examples, some of which you may want to unlearn and some of which you still want the model to handle correctly. You can also see it as an alignment problem: make sure the output of the LLM aligns with human values, so it does not generate toxic language, or images with extreme nudity if it is being used as an educational tool for primary school students, or so that it avoids copyright issues, and so on; the definition changes accordingly. So we lack a crisp definition of what unlearning means for generative models, at least compared with the definition we are more comfortable with for discriminative models. We have some work there, which I do not have time to discuss, but it is currently an open problem.

The other open problem is evaluation: how do I evaluate whether a generative model has actually unlearned something?
The way this is done today is that you take the generative model, you prompt it with some prompts, and you look at the output, and we all know the problem with that. There are well-publicized examples where people claim a model has been taught not to exhibit certain behaviors, and with fairly naive initial prompts the model indeed behaves well, but when you really engage with the model, if you spend a lot of time doing some clever prompting, you can get it to do pretty much what you want. There is a famous example where the model was trained not to produce obscene language; after the first three or four attempts the model was behaving nicely, but after about twenty attempts the output of the model was really obscene. So it comes down to prompting, and I guess this is a general problem with experimental, empirical science: you only know it works for the tested data, so you have to be as thorough and exhaustive as possible about what the test data is, which is a problem.

The other thing is coming up with efficient algorithms to check for correctness. I mentioned in the beginning that even for discriminative models we have this notion of an oracle: if you retrain from scratch, then you are fine as long as you are close to the behavior of the retrained-from-scratch model. But then we said you are better off generating a distribution and checking distributions, and what does that mean? It means I do not have just one retrained model, I have thousands, so I can get a distribution for every input and then compare distributions. But that means you have to train thousands of models to be able to check for correctness, and that may take until the cows come home. So coming up with appropriate performance metrics that allow this to be done efficiently is a great ask, a tall ask for now, for the community.

And then there is going beyond unlearning specific examples, for instance unlearning entities. You may want to ask the model to forget: do not say anything about Peter, I do not want people to know that I am a professor at the University of Warwick. We can do this now; there are algorithms that do this, so when they talk about Peter they will not say that he is a professor at the University of Warwick. But they may say that Peter is a criminal who has been convicted of killing ten people. We got the model not to say what we wanted it not to say, but it can hallucinate and say things that, I can assure you, are not true; I have not killed many people. And you may also want to forget associations, say that Peter visited so-and-so in London on November 15, 2024; that is much harder.

I think I am running out of time. I hope I have at least got you to start thinking about this area, to see what some of the solutions may look like, and to have a think about the many outstanding problems that are out there. Thank you for your time.
Thank you, Peter. Wow, we learned a lot about unlearning there, thank you. Do we have questions for Peter? Over there; we don't have one running over here; there you go.

Hey, great talk, thanks a lot. If I'm understanding correctly, some of the techniques that you're talking about are subtractive in nature: you're trying to remove information that has been encoded into the weights of the model. As I hear you talk about this, I'm curious: are there any additive techniques that you've explored, such that maybe the model is missing information that might cause it to output harmful output, for instance, and are there any techniques to introduce that data into the model additively, without having to retrain all of ChatGPT, for instance?

That is a really great question. The short answer is, to my knowledge, no. But the point that you bring up, that sometimes you can forget by adding the right information, is a very important one, and I actually had not thought about that, so that is a great idea; thank you for it. What I do know is that there are techniques that do something additive, but in an adversarial way. For example, I manage to have my model forget something, and the original model had associated that something with something else; I can introduce new information to it, and then, transitively, the model can pick the forgotten thing up again and undo the unlearning. So that is additive information, but in the opposite direction from what you were pointing at. This notion of adding information that will help me unlearn, coupled with other information, would, I think, make a great PhD topic. Thank you.

We have got time, I think, for one more quick question. There you go.

Thank you very much. This is a very brilliant presentation, and I must say you are trying to solve a very difficult problem. I just want to ask: do you have case studies where you have actually applied this process, the whole life cycle of unlearning, that maybe we can study so that we can gain a better understanding? It would be good if you can refer us to such papers. The second thing I want to ask is how you actually identify the problem; for example, some biases in data are inherent. How do you identify such biases and also quantify them? Thank you.

Yes, your question is how we know about this bias. Starting from the second one, this is actually the first item that I pointed out where we need additional research, to automate this identification problem. I can tell you how people do it now. Depending on the application, they use, as I said, red teaming: basically a bunch of experts get together, and they either create simulations to see what the output of the model would be, or they come up with interesting corner cases as examples and test how the model behaves, and then they say, okay, for these types of input we have identified that there is a problem. That is, I know, a bit hand-waving. Medics do a better job: there are medics who look at the model misfiring, the inference being wrong, and say, okay, I go back and ask what caused it, and then they look at the images, which is a painful process, and they can identify the images that were wrong, or some of the images that caused the problem.
But in general this is not possible; it is very constrained by human resources, and we need more work on this, which is why I identified it as a key problem. Now, with respect to the other issue, the first one you brought up, if I understand it correctly: I already mentioned a couple of our papers from the last two or three big machine learning conferences, so you can start from there; you can see exactly what we have done and what the related references are. Okay, thank you.

2024-12-22 20:36
