Can You Spot a Deep Fake? Detection, Generation, and Authentication | Intel Technology
- [Narrator] Welcome to What That Means with Camille. In this series, Camille asks top technical experts to explain, in plain English, commonly used terms in their field. Here is Camille Morhardt.
- Hi, and welcome to today's podcast InTechnology, What That Means. We're gonna talk about deep fake today with Ilke Demir. She's a senior staff researcher in Intel Labs and she studies all kinds of things.
Help me out here. 3D computer vision, what else do you look at? - 3D computer vision, geometry understanding with deep learning, deep fakes, synthetic media, generative models, and other things. - I'm gonna jump right into synthetic media because I was just looking at something like that.
Are you talking about anchor people who are actually generated and delivering the news, or is this something else? - Synthetic media can be everything around deep fakes, which is like, facial reanimation and facial retargeting. It can be completely new people, completely new humans.
It can be 3D models of buildings or cities or, like, galaxies. So all of that is synthetic data in general. - We're gonna spend a little bit of time on deep fakes, and I know most people have probably heard of them, but can you describe what a deep fake is, how it's used, and whether it's changed in the last couple of years? - It is synthetic media, videos, images, audio, or a combination of them, where the actor or the action of the actor is not real. So you may be seeing me like that, but are you sure it is really me, or is it a deep fake of me? I think that is, like, the most prevalent example.
It started, like, the bloom of deep fakes started with the introduction of generative adversarial networks, or GANs, which were introduced in a paper in 2014, and in that case it was very blurry faces, maybe in grayscale. Like, you look there, you see some kind of face, but not really photorealistic. Since then, it has changed so much. Now there are very powerful deep learning approaches with very complex architectures where we can actually control the face representation. We can control the head pose, we can control lighting operations, we can control the gender, the skin tone, and we can do it between many different faces.
So that is where we are right now. What we see online is getting more and more dystopian, as we should not be believing what we see. - So I know that you've developed different detection methods, including real time detection methods for deep fake video. Can you talk about the spectrum of, I guess, are they all biometric things that you look at to see if a person is a person at all, let alone whether it's actually that person? But let's start with just this: a real human versus a computer generated image of a human.
- Generation and detection are an arms race: better, more photorealistic images or videos keep coming, and then better detectors come. So in that race, researchers first introduced methods that look at artifacts of fakery in a blind way. The idea is that if we train a powerful network on enough data of fakes and reals, it will at some point learn to distinguish between fakes and reals, because there are boundary artifacts, symmetry artifacts, et cetera. That is, of course, working for some cases, but mostly those detectors are very open to adversarial attacks.
They have a tendency to overfit to the datasets that they are trained on, and they are not really open to domain transfer, or to the generalization capability of those detectors. We twisted that question. Instead of asking what are the artifacts of fakery, or what is wrong with the video, we asked: what is unique in humans? Are there any authenticity signatures in humans, as a watermark of being human? Following that line of thought, we have many different detectors that are looking at authenticity signatures. Fake Catcher is the first one. We are looking at your heart rate, basically.
So when your heart pumps blood, it goes to your veins and the veins change color based on the oxygen they are containing. That color change is of course not visible to us humans. Like we don't look at the video and say like, oh yeah, she's changing color.
We don't do that, but computationally it is visible, and those are called photoplethysmography, or PPG, signals. So we take those PPG signals from many places on your face, create PPG maps from their temporal, spectral, and spatial correlations, and then train a neural network on top of the PPG maps to enable deep fake detection. Now, Fake Catcher is just one of them.
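To make that concrete, here is a minimal sketch of building such a PPG map from a face video, assuming a list of frames and a few hand-picked skin regions. The region coordinates, the green-channel proxy, and the row normalization are illustrative assumptions, not Fake Catcher's actual pipeline.

```python
import numpy as np

def region_signal(frames, box):
    """Mean green-channel value of one face region per frame (a crude PPG proxy)."""
    y0, y1, x0, x1 = box
    return np.array([f[y0:y1, x0:x1, 1].mean() for f in frames])

def ppg_map(frames, boxes):
    """Stack per-region signals into a (regions x time) map and normalize each row."""
    rows = [region_signal(frames, b) for b in boxes]
    m = np.stack(rows)
    return (m - m.mean(axis=1, keepdims=True)) / (m.std(axis=1, keepdims=True) + 1e-8)

# Toy usage: 64 random "frames" and four hypothetical forehead/cheek regions.
frames = [np.random.rand(128, 128, 3) for _ in range(64)]
boxes = [(20, 40, 30, 60), (20, 40, 70, 100), (70, 90, 30, 60), (70, 90, 70, 100)]
m = ppg_map(frames, boxes)
print(m.shape)  # (4, 64); a map like this is what a classifier would be trained on (real vs. fake)
```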
We also have other approaches, like eye gaze based detection. Normally, when we humans look at a point, our eyes converge on that point, right? But for deep fakes, it's like googly eyes. Of course it's not that visible, but they are less correlated, et cetera. So we collect the size, area, color, gaze direction, 3D gaze planes, all that information from the eyes and gazes, and train a deep neural network on those gaze signatures to detect whether they are fake or not.
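As a rough illustration of the gaze-consistency idea, the sketch below measures how far the two eyes' gaze rays diverge per frame; the direction vectors and the feature set are illustrative assumptions, not the detector's actual inputs.

```python
import numpy as np

def gaze_divergence(left_dir, right_dir):
    """Angle in degrees between the two eyes' (unit-normalized) gaze direction vectors."""
    l = left_dir / np.linalg.norm(left_dir)
    r = right_dir / np.linalg.norm(right_dir)
    return np.degrees(np.arccos(np.clip(np.dot(l, r), -1.0, 1.0)))

# Per-frame gaze directions for both eyes (synthetic example data, not real estimates).
left = np.array([[0.02, -0.01, 1.0], [0.03, 0.00, 1.0]])
right = np.array([[-0.02, -0.01, 1.0], [-0.01, 0.01, 1.0]])
angles = [gaze_divergence(l, r) for l, r in zip(left, right)]
print(angles)  # features like this, plus pupil size, color, and 3D gaze planes, would feed a classifier
```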
- And I remember a couple of other things from your research with eyes, because eyes were actually a little bit more specific, or let's say the error rate was lower with eyes than with the PPG, right? - Not exactly. On the same dataset, the detection accuracy with PPG is higher than the detection accuracy with eyes, because PPG is looking at the whole face, but the eyes are just looking here. The information content that they're extracting signals from is smaller, and that is expected to be reflected in the accuracies: the eye gaze based detection is a little bit lower. It's not very low, but it is lower than Fake Catcher. If there are eyeglasses or some occlusion on the eyes, or the eyes are so small that we cannot see the pupil, et cetera, in that case Fake Catcher can still look at the face and give a dependable detection accuracy. And we foresee that an ensemble of such authenticity clues in a platform will give us an algorithmic consensus that customers or companies can depend on. - I'm sure it's like you said, it's this arms race. And so, then the computers or AI will figure out how to fake the convergence of vision so that we can no longer find that out.
They'll fake the vertical scanning that humans do and we can no longer detect there, and then they'll figure out how to make sure the pupil or iris aren't changing in size as much as they are now. Ultimately, are we gonna need to be looking at an actual individual human and saying, okay, this is your heart rate so we know that you are the one speaking, or this is your gaze, as opposed to just a generic human gaze? - Personal identification is a completely different, broad research topic. We haven't invested our resources there yet. We are just looking at the problem as fake or real detection. For heart rate, it is very unique to humans.
So if you have the actual heart rate that is measured, that heart rate and how it is changing can be a strong signal to identify a person, to identify that it is Ilke who is talking. But finding that heart rate exactly from the video is not really possible, because there are so many things that are changing. Even if you are looking at my face with no occlusions and in good lighting conditions, the camera parameters may add something, the illumination may add something, like something passing by my window and creating a shadow on me may affect the PPG signals. So exactly finding that unique signature per person from video is very hard. - So you're not looking into watermarking, for example, an individual human, or identifying an individual human and then validating that that is the person whose video it is.
- We are finding that real humans, collectively, have PPG signals that are consistent across their faces. - Okay, I have heard some people claim that, did you see the movie Maverick, the new one that came out? - Not yet. Sorry. - Oh, okay. (laughs) - Well, it's got Tom Cruise and a bunch of folks in it. I've heard it claimed that we're gonna find out later that that was all generated video of actors, and I'm just wondering now, this is, you know, just sort of a wild claim right now, but it makes me think, is that something that, as a tool, could be used negatively or positively? Could we generate humans that we recognize? - I don't wanna see Tom Cruise anymore, because he has so many deep fakes and we have been looking at all of those deep fakes so much that maybe a movie is nothing novel for me, because I keep seeing Tom Cruise anyways.
So anyway, generation of deep fakes is a huge topic and we want to do it responsibly, and it is possible to create a whole movie just with deep fakes if we have enough reference images and videos of that person. Not for 2D movies, but actually for 3D, we have a story to tell here. We were doing 3D capture and 3D reconstruction that is used in augmented and virtual reality movies, which actually premiered at some film festivals, the Tribeca Film Festival, et cetera.
So we had those 3D productions at Intel, and one of them is an AR series. For the AR series, because of COVID, the actor couldn't come to the studio for volumetric capture, and we said, okay, take a video of yourself at home with the script, so that we can actually make a 3D deep fake of you using the 2D footage that you give us and the earlier 3D footage that we had constructed of him. And we actually did that face retargeting, which is taking the mouth movements, hand gestures, facial gestures, et cetera, from the 2D video and applying them to the 3D capture of him, so that it mimics him in 3D. So if we did this for that little AR series very quickly, then it is definitely possible to do it in 2D, which is a little bit easier, for a whole movie. - Do you have any concerns about this? Just generally, when you think about deep fakes, what do you tell people to worry about, or is it sort of shrug your shoulders? And what kinds of things are people looking at moving forward? I mean, we have ideas of verifying our identity for things maybe that we're posting. We used to think of video as a way to verify we actually said it.
What kinds of directions is the world looking at taking to verify identities? - Our presence is being pushed from physical presence to digital presence, and we have all those passports, IDs, everything to verify our physical presence, but not our digital presence. There are some biometrics going around, like fingerprints, et cetera, or retinal scans, but they're a little bit more high level. They're not used for a video like this, right? So we are trying to implement deep fake detection for that, because we have seen in the news and in many real world cases that deep fakes are used for political misinformation, for forgery, for fake court evidence, for adult content, and for all of them we need some means of verification.
We need some means of authentication, and deep fake detection is one of them. It doesn't exactly say, okay, you are Camille, okay, I am Ilke, but it says that this is a real human, and most deep fake approaches are trying to impersonate someone one-to-one. In that case, it is easy to say that if it is fake, then it is not that person. We are also developing other approaches for how we can create responsible deep fakes, how we can enable that creative process of creating synthetic humans, creating digital humans, in a way that is responsible and is not impersonating someone one-to-one. - So the main concern is that you're treading on an actual person's identity, versus that somebody can't tell the difference between, say, a fake actor and a real actor in a piece of art. - Yes, we want to distinguish fakes from reals before going to the identity reveal.
- What's sort of your biggest concern around deep fakes that are out there? - Recently, there was a video on social media platforms of Ukraine's president Zelensky giving misinformation about the Russian invasion of Ukraine. Instead of that fake video not being uploaded to the platform, or being marked as something, the platform waited for it to be reported as fake, for everyone to go, oh, this is not real, this is fake, and then something was done. But just put yourself in the place of the people inside the war, inside the invasion. They don't have that "maybe it's fake" mindset. They are like, something is coming, new information is coming, we need to believe what he's saying, et cetera. In that emergency situation, you don't have that critical eye that is looking for fakes.
And instead of that whole situation, if there had been a deep fake detector in the ingestion step of that platform, then it would have at least given a confidence metric, a check mark saying, okay, we believe, with 80 percent accuracy, that this is a fake video; believe it at your own discretion or share it at your own will. That didn't happen.
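As a hypothetical sketch of that ingestion-time check, the snippet below runs a stand-in detector at upload time and attaches a confidence label instead of waiting for user reports; the function names, the score, and the 0.8 threshold are all invented for illustration.

```python
def detect_fake_probability(video_path: str) -> float:
    # Placeholder: a real system would run a trained detector (Fake Catcher or otherwise) here.
    return 0.92

def ingest(video_path: str) -> dict:
    p = detect_fake_probability(video_path)
    label = "likely fake" if p >= 0.8 else "no manipulation detected"
    # The platform can show the label and score alongside the video instead of waiting for reports.
    return {"video": video_path, "fake_probability": p, "label": label}

print(ingest("uploaded_clip.mp4"))
```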
So this is just the beginning, and, you know, especially for elections, especially for defaming purposes, et cetera, deep fakes are really going there. There are certain individuals that are really affected by those consequences, and we don't want that to happen. That is the worst part, I guess. - Is that kind of the extent of where you think it's headed, that there will be, we'll just say, bad actors out there impersonating people and putting things in their likeness that aren't true, or is there some other kind of place that this could go? - Well, that is the immediate case, right? We have heard that because of an audio deep fake, some CEO was forced to give millions of dollars away to someone, but it was just a deep fake. It wasn't real, et cetera. So those are only the immediate steps. But as those cases increase and increase and increase, at some point it will end up in a place where no one will believe anything.
Even if someone is going out there and saying the truth with all the authenticity, people will say, oh, that's probably a deep fake, I won't believe it. Or people that trust each other will share deep fakes unknowingly, and that will break the trust between those people. So all of these scenarios, in an accumulated way, are going towards a really dystopian future where there's a social erosion of trust. And that social erosion of trust is not only affecting the future of media and the future of digital personas, it is affecting the future of us, of our culture.
Our trust is degrading, our trust in the things that we see is degrading. All of this combined means that where you want to be heard, where you want to be seen, you won't be, because everyone will think that it is fake, or everyone will lose faith in videos, in digital formats. - So will there ultimately be some kind of movement toward establishing provenance when videos are made? By provenance I mean that the origin can be proved somehow, or attested somehow, as the true source. - Exactly. You are just on that point. I was about to say that.
So of course this is the evil future from the generation standpoint, but from the research side, of course there's detection as the short term answer, and for the long term there's media provenance research that is going on in a very collective way. Media provenance is knowing how a piece of media was created, who created it, why it was created, whether it was created with consent. Then, throughout the life of the media, were there any edits, who made the edits, were the edits allowed? All of the life of the media and what happened to it will be stored in that provenance information.
And because of that provenance information, we will be able to believe what we see, saying, okay, we know the source, we know the edit history, et cetera, so we can say whether this is an original piece of media or a fake one. Because there are so many creative people, visual artists, studios, who have been creating synthetic media and synthetic data throughout their lives, we want to also enable that. For that purpose, there is currently a coalition, C2PA, the Coalition for Content Provenance and Authenticity. It's a coalition for media provenance with Intel, Adobe, Microsoft, ARM, and several other companies, where they put together all the beautiful minds of all of those people to create open technical standards, to create policies around content generation and media provenance, and to work out how we can enable and protect trust in media all at the same time. And hopefully, our future research is also following in that direction.
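As a toy illustration of the kind of record such provenance could carry, here is a hypothetical structure (not the actual C2PA manifest format) that tracks creation, consent, and edits over a piece of media's life; all names and fields are invented for the example.

```python
# NOT the actual C2PA manifest format; just a hypothetical structure to make the idea concrete.
provenance = {
    "asset": "interview_clip.mp4",
    "created": {"by": "studio-camera-01", "when": "2022-06-01T10:00:00Z",
                "synthetic": False, "consent": True},
    "edits": [
        {"by": "editor@example.com", "tool": "video-editor",
         "when": "2022-06-02T09:30:00Z", "action": "trim", "allowed": True},
    ],
}

def summarize(record):
    """Report whether every recorded step in the media's life is accounted for."""
    ok = record["created"]["consent"] and all(e["allowed"] for e in record["edits"])
    return "provenance chain intact" if ok else "contains unaccounted-for changes"

print(summarize(provenance))
```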
Sorry, I cut you off. - No, no. I'm just wondering, as we move toward attesting to people's identities like you're saying, or to the provenance of media, are we going to have to rely on some sort of central authority to check that this is verified, or are we gonna be able to do that in a distributed way, or will it be hybrid? - That is a perfect question. There are several ideas about that, but our research is to actually make media live on its own. So the authenticity information should inherently be embedded in the media itself. So we can use something like the watermarks that have been used before.
- Like in the file, you mean? - Yeah, in the file, but in the file in a way that it is protected and cannot be changed, right? So we want the media to be self-explanatory and self-authentication enabled. For example, if we have all of these adversarial networks that are creating very nice synthetic media, can we actually embed the authenticity information and provenance information inside that media, so that when it is rendered, or consumed, or downloaded, the authenticity information is decoded and gives you that information? If it is a fake piece of media or an unauthentic version, then the decoder will say, well, this is not the key, this is not how it was created, so this is not it. Our research will be more focused on that, but there are other, like, crypto based systems or blockchain systems that are doing that authentication and verification of generated media.
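As a bare-bones sketch of that embed-and-decode idea, the snippet below hides a short authenticity tag in an image's least significant bits and reads it back at consumption time; real systems use far more robust, often learned, embeddings, and the tag and scheme here are purely illustrative.

```python
import numpy as np

def embed(img, tag):
    """Write the tag's bits into the least significant bits of the image."""
    bits = np.unpackbits(np.frombuffer(tag.encode(), dtype=np.uint8))
    flat = img.reshape(-1).copy()
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # overwrite LSBs with tag bits
    return flat.reshape(img.shape)

def decode(img, n_chars):
    """Read n_chars worth of LSBs back out and interpret them as text."""
    bits = img.reshape(-1)[: n_chars * 8] & 1
    return np.packbits(bits).tobytes().decode(errors="replace")

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
marked = embed(img, "auth:GAN-v1")
print(decode(marked, len("auth:GAN-v1")))  # tag comes back: "this is how it was created"
print(decode(img, len("auth:GAN-v1")))     # garbage: no valid authenticity information
```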
- Is what you're talking about, just before the crypto systems, hardware based or software based, or both? - So ideally, if we had all the camera manufacturers in the world come together and decide, okay, we need to do this hardware based authentication for all the photos that are taken in the world, that would be a hardware based solution, and it would be a large solution. But of course, that is maybe too long term a solution, because we cannot gather all the camera manufacturers in the world, right? So we need to have software based solutions. Now, where do you put those software based solutions in the life of a piece of media, right? You can try to put it at consumption, but it is too late. At consumption, it was already created, edited, something was done to it. At the consumption level, we can only do detection, and we can also do source detection.
So for synthetic data, for synthetic video, we actually published some work on detecting the source generative model of a video, so that we can say, okay, this was created by Face Map, this was created by Face2Face, et cetera. So that is one little piece of provenance information that we can give at consumption time.
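As a small sketch of the source-attribution idea, the snippet below compares a frame's high-frequency residual against stored per-model "fingerprints" and picks the best match; the fingerprints and the simple correlation score are illustrative assumptions, not the published method.

```python
import numpy as np

def residual(frame):
    """High-frequency residual: the frame minus a crude 3x3 box blur of itself."""
    padded = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    blurred = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return frame - blurred

def attribute(frame, fingerprints):
    """Score the frame's residual against each fingerprint and return the best match plus all scores."""
    r = residual(frame).ravel()
    scores = {name: float(np.corrcoef(r, fp.ravel())[0, 1]) for name, fp in fingerprints.items()}
    return max(scores, key=scores.get), scores

# Hypothetical per-model fingerprints (in practice, averaged residuals of known samples).
fingerprints = {"Face2Face": np.random.randn(64, 64), "other_model": np.random.randn(64, 64)}
guess, scores = attribute(np.random.rand(64, 64), fingerprints)
print(guess, scores)
```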
Now if we go back one step before, which is probably the editing time: at editing time, maybe some of the software editing tools can embed some certificates, some signatures inside the data, so that it will at least be known that it was edited by that software. The creation is unknown, but the editing is known. And again, if we go back and back, if it was synthetic media created by software, we can use those authenticator-integrated GANs that I talked about, or if it's hardware, then we still need something in between to authenticate those representations. - So should artists and politicians and other people who are often in the media pay close attention now to what kinds of contracts they're signing, in terms of what digital content rights they retain? Is that kind of a hot topic right now? - Absolutely. I don't know whether you have seen that there was a news article saying that Bruce Willis gave away his deep fake rights to some company, and it was like, whoa, what? They can do a whole Bruce Willis movie without Bruce Willis, you know? That was a few days ago. Then there was another news article saying, "Well, that information was fake.
Bruce Willis never signed a deep fake contract." So especially in cases like those, people are looking at this more and more. I still wonder about all of those Tom Cruise videos: does anyone pay for Tom Cruise's likeness in a video if that video is going viral and they are making money from it? Is there any revenue to Tom Cruise because his face is used? I dunno. So the laws and policies around deep fakes are currently emerging from different parts of the world, from governments. I think in the US, deep fakes are still under fair use because they are in different domains and for entertainment, so they don't need to be.
I don't wanna give wrong information about that. We actually had a nice collaboration at UCLA with some legal people who are working on that topic, so I would definitely refer those questions to them. But basically, that's the current landscape right now. - What advice would you give all of us who don't have access to real time deep fake detectors? What are we supposed to do right now when we see things? - Of course I won't say send them to me. (laughs) We don't have that much capacity.
Well, we are running Fake Catcher. Anytime you see a viral video, we are like, okay, let's run Fake Catcher, and it catches it. But yeah, of course. Hopefully. We are having some conversations with different companies about how we can present this to users, open it to the public, or at least let them use it in their workflow whenever they have to verify third party information, or wherever platforms are encountering those fake videos. So hopefully, everyone at some point will have access to those, and if you want to be one of those enablers, then reach out to us.
- So what should I tell my kids, then, when they're watching internet videos? How should I let them know, or how should I prepare them? - I would say, don't believe everything you see digitally. It may be incorrect, or it may be there on purpose to deceive you. Hopefully, there will be some online tools that they can consult when they see suspicious videos. So yeah, keep an eye open for things that may not be correct, not be real. - Again, Ilke Demir is a Senior Staff Research Scientist in Intel Labs. What is Fake Catcher, before we sign off? - Fake Catcher is the deep fake detection solution created by me and my collaborator, Umur Aybars Ciftci, to catch deep fakes based on heart rate.
So based on PPG signals and how our veins are changing color based on our heart rate, we can find that deep fakes are fake and real humans are real. - Thank you very much, Ilke, for your time. I really appreciate it. - Thank you, thank you.
- [Narrator] Never miss an episode of What That Means with Camille by following us here on YouTube, or search for InTechnology wherever you get your podcasts. - [Narrator 2] The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.