I'm Scott Loughlin, this is the Data Chronicles and here are your data points. On today's episode, we continue our ongoing AI mastery class with a check-in on what is happening in Europe. There is a lot of current attention on the EU AI Act, and for good reason. It's by far the most comprehensive and influential AI legal framework out there. And it's also on the
immediate horizon. But importantly, the AI Act did not preempt the GDPR, which remains the most comprehensive and influential data protection legal framework. So at this moment, we have one technology and two sets of comprehensive laws that govern its use and development. Sounds like that will be easy to navigate, right? This is really a recurring theme of AI law. Like so many other areas of law, AI is regulated by specific AI laws, for sure.
But equally important, it's also regulated by laws that apply to all different types of technologies. And as listeners of this series know, that is a long list, and privacy and data protection is probably at the top. AI is creating some very difficult privacy and data protection issues and unearthing some broad themes on how governments seek to balance innovation against the rights of people and data subjects. And we're seeing that really everywhere, and maybe most recently in France, after the CNIL issued new guidance that hits on some of those key AI concerns relating to data subject rights, i.e., the ability to access or delete data, and transparency, the ability for people to know how their information is being used. These are
both core tenets of the GDPR and go to the heart of how personal data may be processed to create and use AI-based solutions. To discuss these developments in France and to make sense of them in light of what is going on across Europe, I have invited my partner Etienne Drouard to the podcast. Etienne is based in our Paris office and spends much of his time advising companies at the intersection of the GDPR and all things AI. Etienne, welcome to the podcast. Hi, Scott.
Thank you for having me. So Etienne, I want to dive into maybe some of the recent developments that are happening within France and the CNIL, but also want to think about this in light of what's happening more generally within Europe. Let's start by understanding what the new guidance within France looks like, and in particular, some really tricky issues surrounding notice obligations and DSAR rights as they relate to training data and AI. What are the new guidance documents coming out of the CNIL? And why are they so significant? So the CNIL, like some other regulators, is trying to take a timeline approach to AI. They would like to understand whether you can redo or retrain an AI model when people have asked for the deletion of their personal data.
And basically, the CNIL considers that it would sometimes be too complicated to undo what an AI model has done once it has been trained. Therefore, they are focusing their guidelines on a way to correct data at the output level, but not on a way to delete data within the AI model itself. So instead of trying to do the impossible, they try to implement some rights for users, but at the output level. And this is the first proposition we see in this direction, which illustrates some divergence between regulators. Some of them are highly focused on the first step of an AI model, which is before the model exists, when you process training data. And some of them are
more focused now on the end result and on the impact and risks to users' rights and individuals' rights resulting from the use of an AI rather than the building of an AI. Yeah, I mean, it's really interesting. Maybe let's just think about this issue and how complicated it can be.
Obviously, AI uses training data to develop that intelligence, and that training data can come from a variety of different sources, and it could be composed of a lot of different types of diverse data. Some of that data may have nothing to do with people. It could be machine-oriented data. It can be data that observes what's happening in the environment, what's happening in space, any number of different things. But importantly, for this conversation, some of that data we recognize is going to be personal data in its form. Some of that data may be fully identifiable on its face. Some of that data may be pseudonymized. Some of that data may have been subject to an attempt at anonymization. It can
be all different types of methods, all different types of varieties. And so trying to apply traditional data protection principles to this type of technology, and in particular to how that technology is created, presents a real challenge. And that, to me, when I'm looking at these new guidelines, is what creates a really interesting question about how to deal with that. And as you're saying, data subject rights are going to be a big component of that. If personal data has been fed into a training
dataset, which is then used to create a model, what rights do the individuals have to know what that data looks like, whether that data was accurate or not, whether they have the right to delete that data, to pull that data out, which we know would mean that the underlying model may be impacted and may frankly just be impossible, unless you're just telling people to delete the model, in which case all of that time and investment may go away. So how to balance that principle, I think, is a really interesting one. And my perception was that within Europe, but in particular within France, there would be a very conservative view that would look at this and say, we're going to emphasize the rights of the individual data subjects to exercise their DSAR rights, without necessarily figuring out what impact that's going to have on innovation. And what was surprising to me when I read this guidance was that it seemed like the CNIL was being a little bit more flexible. So I'm interested in your thoughts, maybe first on
the DSARs, and then I also want to get to how people are thinking about the transparency obligations, which is another complicated facet. So regulators have a choice to make with their strategy. Some of them are closely attached to what happens when data is collected, before it is even processed through an AI model. Some others know that they have missed the train of data collection, which could be the original sin of any infringement of the GDPR. And therefore they try to
figure out how to build up some rights and principles after the collection of data, during the processing of the model, and for the outputs. Let me give you two examples, without oversimplifying their approach, of the kind of big choice they have to make in terms of strategy. When generative AI tools were first introduced on the European market, the Italian Data Protection Authority considered that there was an unlawful collection of personal data, or an insufficient level of information provided to data subjects. And France did not focus on this point because it was too soon for them to consider data collection and infringement of GDPR principles based on the collection of data. And this is where they are trying to find a common approach, a common strategy, which changes the game compared with some other GDPR decisions they had to make before, outside of the AI scope.
Those regulators are not only the Italians, but some others, especially two Länder in Germany, and to some extent the Dutch authority as well. Some of them will focus on the first collection of data during the scraping process and will find lots of reasons to consider there are infringements of the GDPR: because there was no prior information, because there may not be a legitimate purpose, because consent would have been required, because there is no natural relationship between the use of data for AI purposes and the first purpose for which the data had been displayed on the web or collected. And that's the original sin that you're-- Yes, exactly. So if you did not witness the original sin, what can you do afterwards? The other half of regulators will not consider only the data collection issue, which may be a dead end in terms of compliance or governance, but will consider, once the model exists, can it be corrected? Can it comply with the AI Act principles in terms of transparency, explainability, avoiding discrimination and bias, et cetera? And they will focus more on the end results of an AI model rather than on the primary data collection. And this is where the CNIL originally positioned itself, in two
steps, when regulators were discussing the legal basis on which data can be collected. In France, they did not choose the consent requirement, unlike the Italian approach, for example, but were more focused on the conditions of a legitimate interest and the balance with individuals' rights. So on this first cornerstone of legal basis, they chose the most pragmatic way to avoid a mere prohibition of AI services. We would be outside of the game if we supported a strategy that would prohibit any construction of an AI model. And this was not the French choice. Some other regulators also agreed that it
would not be the right way to approach AI by submitting everything to prior consent, which would have meant prohibition. And this is where the EDPB, in its December 2024 decision on the legal grounds of AI models, has opened a wide door in favor of a legitimate interest approach and a balance between individuals' rights and the legitimate purpose of training AI models, instead of consent. This is the beginning of the curve, the evolution of regulators from a purely individual-based consent approach to a more societal consideration of AI systems: considering the need for innovation, a legitimate purpose would be recognized and some balances should be found to protect individuals' rights. And after this first step, these guidelines, dealing now with individuals' rights, are also considering how individuals would be protected when an AI model is used, at the output level, because the result of this use could impact an individual, instead of focusing only on the primary collection of data. So on these two main questions, first a purely principle-based approach and then a user rights approach, they have chosen not to raise the prohibition strategy first, but to think harder about the efficiency of the result of their strategy. Yeah, I mean, it's a really interesting dynamic that you're describing at the end, right? So if I'm understanding the current state of play, first thinking about the inputs, the training data, the data that's coming in to be able to create the model, GDPR applies.
There are differences amongst the member states about the basis for creating the model using personal data. Some, as you say, are much more conservative and essentially require consent. Others go the other way and say this could be based on legitimate interests, giving you more flexibility to innovate without going to individual data subjects when, frankly, that's probably not possible for scraped data or data for which you didn't have a direct relationship with the individual. And so we recognize that maybe some will have done this in ways that the individual DPAs would approve of, and maybe many others have developed it in ways that the DPAs would not approve of. But that's only one angle. Then you're talking about the outputs.
Regardless of how you created it, we also need to be thinking about GDPR principles with respect to the information coming out of the models. And that's almost a separate and different type of analysis and a different set of considerations than what went into the model. Did I capture that correctly? Exactly. So they were more focused on users of AI than on creators of AI, because they did not witness anything during the creation process. It's too late. It's invisible. It's thousands of sources of data that they will not be able to investigate as a first step.
And also, there's kind of another dynamic. In terms of jurisdiction and the scope of their powers, most of them are not yet appointed as the future AI regulators. So they only focus on the GDPR, but some of them are also applying to become the national regulator of artificial intelligence. So in order to preserve some chance of being appointed as the AI regulator, some data protection regulators need to show that they will foster innovation, that they will have a pragmatic approach instead of a purely prohibition-based approach. And this also creates a distinction between regulators. What is their future with AI regulation if they start by prohibiting it instead of finding some balances? So it also explains the French position: the CNIL is still applying to be designated as the AI regulator for France.
And for some others, being the AI regulator is a lost battle, and therefore they will only focus on the GDPR. And this is why the consensuses we can find within the EDPB are difficult to reach and always very important. For example, in this December 2024 decision about the legal basis on which you could process personal data, you may find more than 15 provisions saying that in the future they will have some further reflections on a case-by-case basis, because they did not reach an agreement on every point. And therefore, on a step-by-step approach, they tried to find some consensus. This one was on legal basis, and this was a huge choice: not always promoting consent, but also considering some conditions for legitimate interest. And for the further steps, each regulator who wants to show the next step to the others will issue some local guidelines in order to try to initiate a direction which may become a consensus at the EDPB level within the next weeks or months. So this takes time, but on a step-by-step basis, the most
conservative ones who want to still be part of the game for AI regulation become pragmatic. Yeah, it's an interesting dynamic. But maybe one thing I want to add on this, because it's, I think, an interesting way that the inputs and the outputs may end up being related. At least in the United States, when entities have been engaged in the creation of different types of AI models, different types of technology, in violation of underlying data privacy principles or data privacy laws, at least past versions of the FTC would say, "Well, that is something that should be evaluated under what would be thought of as a fruit-of-the-poisonous-tree type of review." In other words, if you created the algorithm in violation of the law, then the company or the organization that created it shouldn't then be able to continue to benefit from its ongoing use. And as a result, if the inputs were bad, if you didn't do all the things you were supposed to do, then all of the outputs are going to be tarnished, and thus you can't continue to use the outputs.
And what is, I think, interesting in the way that you're describing it is that that mentality, at least, has not had wide application amongst all of the DPAs in Europe, and they are separating out what the privacy principles are that come in on the inputs, and then, separately, turning the page, what the privacy implications are with respect to the outputs, without necessarily saying all outputs are unlawful because the inputs were unlawful? Exactly. And they also try, and this is the next step, I assume, to make a distinction between the manufacturer and the user. So AI models will be governed by some investigations and transparency principles, and these two approaches will run in parallel, consent or prohibition versus legitimate interest, a theoretical discussion which at the end of the day may be clarified through investigations. Who witnessed any scraping strategy? What was the end result? Is it part of the AI model, or is it just part of the repository of training data? This will require one, two, three years of investigation before providing any answer about the liability of an AI model. And this is why, in the meantime, nobody can wait for
some clarifications of the regulatory approach, and they turn this page with this more pragmatic approach. And they also make a distinction between the liability of the AI model and the potential liability of the user of an AI model which was built on a wrongful basis. On this very point, they have made clear that the user is not responsible, is not the accomplice of the AI model.
So even if the model is itself under investigation or scrutiny, it may not have any impact on the liability of the user of the AI model, which is also a way to move forward instead of being stuck on issues with no answer for some years. Yeah, I mean, another kind of important distinction, because I think at least in some of the ways the US laws are developing, under non-privacy-oriented laws, ones that are more akin to the EU AI Act, the obligations are falling onto the users, perhaps more than the model creators. And that is creating a very difficult dynamic between user and creator, in that there's often a significant information imbalance between the creator and the user. In other words, the creator has a much more intimate familiarity with the training data, the controls that were used to create the tool, and the lawfulness of the processing, all of which came long before the user even knew about the product. And when the user comes to the table and says, "Hey, look how great this thing is," they really have no idea what went on under the hood or in the background to create that product, nor do they have the ability to access that information.
This is where the GDPR may stop being the main focus and the AI Act will begin to provide some answers, because between the GDPR, with its chain of liability between successive data controllers, and the AI Act, which distinguishes the manufacturer of an AI, the deployer, and then the end user, at the end of the day the EU regulators will need to combine both pieces of legislation to have a consistent approach to liability questions. And the idea is that when the end user is at stake because of a liability issue, not only the agreement with the AI model producer but also the regulation will force both stakeholders to discuss a lot and provide some explanations in a reverse engineering process. So at the end of the day, even if the user doesn't understand how the algorithm works, they have to raise issues when they see that there is a bias resulting from the algorithm and not from the use of it, which should be escalated to the manufacturer. And this is where the contractual relationship now, and future regulations between these stakeholders, will have to be combined, not only under a privacy approach but also under the AI Act's understanding of respective liabilities. And this is for the next step, for sure. So far, data protection authorities have already chosen not to stick to the liability of the manufacturer, and consider that there may be a divide between the liability of an AI model and the liability of the user. Maybe also they did that
with a political approach. Since you will find more European users than European AI models, they would like to create this divide in order not to prevent users from using AI, and to put aside the question of the liability of the model in another field. You're saying this is going to follow the same path, perhaps, as what we see with cloud and technology, where it's been created outside of Europe and widely used in Europe, but then enforcement goes after the organizations outside of Europe who created it, in relation to European users. Yeah, it's not only a matter of protecting privacy, but it is also a matter of where and against whom we can be effective and raise the right questions. And this is where there is a lot of politics starting to grow around where you put your efforts as a regulator to get some results, in order to protect users or in order to sanction some stakeholders, without blocking the whole machine of the use of AI.
Yeah, interesting. So one other question I think came out of the CNIL guidance was around transparency, which is obviously, in addition to the DSAR issues we just talked about, another difficult area. How do you provide notice to individuals about how their data is being used for training? How is it that you can provide notice to individuals about outputs from AI? Where did they draw that balance? They had some choices. When you scrape data from the web, you are performing an indirect collection of personal data, and you should inform people of the collection of their personal data unless it is disproportionate, unless someone did it before, or unless it does not protect people in itself, because it's useless in terms of enhancing their protection. So they have chosen to interpret Article 14 of the GDPR, about the indirect collection of personal data, in a very flexible and lenient manner, much more than before, considering that any company that scrapes data, or any AI model provider that starts to build a repository of data for training purposes, could display some explanation on its online services available on the web without having to provide direct information to the persons at stake. So this may create a difference between two categories of market players. Those that will process personal data coming from their own users, social networks, for example, have means available to provide direct information to their users about the use of their traffic data and browsing data for AI purposes. In that case, the direct information of users is feasible, it's not disproportionate, because the service provider has direct access to the personal data and to the users whose personal data is used for an AI service. This would be very different for those
that are scraping data from the web without having any customer relationship with the persons at stake. Those will only have to display a policy on their website. They may have to offer a right to object that people will have to initiate and exercise in order to ask for their data not to be used in further training, without the company having to inform everybody, since the sources of data are very diverse and it could be absolutely disproportionate to contact everybody on this planet just to tell them we may process your data for AI training purposes. So here there can be a divide between being the service provider in a direct relationship with the customer and building an AI model from publicly available data on the web. Here, the French authority has said that you don't have to inform people directly. You can provide a right to object. It doesn't mean retraining the model. It may mean installing a kind of filtering list for further training of data, or a filtering list on the output when your AI system is used. This is much more pragmatic than reinventing the wheel once
someone has exercised their right to deletion, for example. So that's where the CNIL comes down, and I guess that may be different from how other member states have come down on the same issue, in the same way you were describing for the basis for processing, consent versus legitimate interest. Transparency also has those same conflicts.
Exactly. The method that would consist of granting the right of access and the right of deletion in the model itself, and that would impose retraining, I would summarize as "prove it." Someone would have to prove that his or her personal data is in the model or in the data repository, and what it would change in the AI model itself. So when this litigation arises, there will be a strong discussion about how you can prove that your personal data is in the model and how it could be corrected. And this kind of investigation will need some discovery.
It will take a long time. And we don't know if these kinds of strategies will work, but so far they would require lots of work that these regulators have not yet performed, which means having a technical approach and an understanding of how an AI model functions, how it is built, and the difference between a repository of data and the model itself. It should have been the first question on their plate, but it will come with investigations at a much later stage. So, you know, Etienne, this dynamic that you're describing, where there's a different point of view between the DPAs around the application of the GDPR. There's differing positioning
politically about who is the regulator with respect to AI. There's probably some level of competition and dynamics between member states on some of these questions. And they're providing differing points of view on some of these really important issues. One, this strikes me as kind of undermining some of the reasons why we have the GDPR to begin with: to create harmonization across member states so that companies understand that there's a single set of rules, with only very specific differences in specific member states on a specific set of issues. But two, if we don't have that harmonization, it presents real difficulties for innovation and for companies who are looking to engage in the development of artificial intelligence models and artificial intelligence-based solutions within Europe. And so maybe my last question for you is, as you're helping organizations navigate these really treacherous waters, how, in your view, should they approach these differences when they recognize that trying to deploy multiple versions of a tool with different sets of rules is probably very difficult, if not impossible? Oh, I think the only way to survive is to consider the average position of regulators instead of the most flexible or the most conservative ones.
Because the history of harmonization is that it takes time. I started observing this harmonization process closely when I joined the French Data Protection Authority in 1996. And I can assure you that today, they still have some cultural differences, political approaches that may differ, et cetera. So under the GDPR, member states decided not to create a European regulator of data protection and to maintain national regulators coordinated within the EDPB. But the EDPB has no power to enforce a decision against national regulators. Then under the AI Act, you have the European Commission and some champions in each country dealing with AI, which is another layer of complexity, because the GDPR will always work in parallel with the AI Act. These regulators at the national level may not be the same
one. They would have cultural differences in their approaches. They don't have the same purposes either, because they don't apply the same legislation. So at some point, either under the GDPR or under the AI Act, harmonization should not only be wishful thinking, but something resulting from a procedure between these regulators. And within the next five years, the European Commission's work should be, and I hope it will be, to organize the governance and arbitration between these various regulators, since they don't have the same purpose, the same role, or the same powers. At the sectoral level, it is also critical for financial services. How can you deal with
scoring and profiling as a bank using AI, caught between the AI regulator, the privacy regulator, and the banking regulator? And in numerous sectors, you will find this arbitration process. So the sole way to survive this complexity today is, from an institutional standpoint, to create some harmonization mechanisms between these pieces of legislation that overlap a lot, and between regulators too. And in the meantime, lots of things that look dreadful should not be prioritized by businesses, because this is only a construction process. It's an ongoing, work-in-progress mechanism.
And it is quite difficult to stay focused on your purpose when you have some threats coming up one day from one country, the next day from another. The better you can document your situation and explain the evidence of what you do, the better placed you will be to push back on some risks, because the first obligation of a company is to explain. Not necessarily to change, but at least to explain. And this is where there is lots of work to do to train and educate these regulators, without being in direct opposition with any of them, because they first need to understand before making any decision or adopting any sanction. So at least within the next two years, documenting and explaining is
more important than being afraid of a threat from a regulator. Yeah, really interesting practice points there, Etienne, and I think it's maybe the best that they can do. I mean, that's almost where we're at at this moment, right? I feel like we're in the position where somebody has designed the first car and, you know, we have the assembly lines all hooked up and we're starting to push the cars off the lot. And nobody knows what rules we're going to provide for the roads that surround them. And, you know, the roads have to happen because the cars are, you know, off the line. People are using them, people are driving them. And, you know, later on, it
sounds like the regulators will catch up and figure out what they wanted, if they weren't necessarily being clear and direct at this point. Yes, the only way is to move forward with a strong view on your projects and their dynamics, and to be ready to be transparent. But I would not advise anybody to slow down, because it will not create any competitive advantage. Yeah. Well, that's a great place maybe for us to end today's conversation. Etienne, I really appreciate your joining the podcast and sharing your expertise and insights with our audience and with our clients. I always feel like I learn so much when you and I are able to have this type of conversation. So I appreciate your doing it for the larger Data Chronicles audience.
With great pleasure. Thank you very much, Scott. With that, I'm Scott Loughlin, this is The Data Chronicles, and those were your data points.