Creative Machines: AI & The Future of Design and Media | SXSW 2021
- For the past several years the narrative around synthetic media has been about the rise of deepfakes and their potential impact on society. But the story that's not as prevalent is about the technology behind deepfakes and how it's responsible for the greatest rise in creative capability we've ever encountered. I say that without hyperbole. These technologies will take production capabilities previously only for the heavily financed and highly skilled and make them available to everyone. And a convergence of several technologies will change the paradigm, not only for how we create content, but also how we experience it.
If the past decade was marked by the fact that there was a blurring of the lines between physical and digital experiences brought together by mobile technology, the next decade will be marked by the blurring of the lines between digital experiences themselves. And I cannot wait to tell you how much of a game changer that's going to be. And if you won't take my word for it, perhaps you'll take it from a few of my friends. - Ian is right, we're not here to take over the world or even your jobs. - And the Hollywood narrative often paints us as malicious. - But the things that we can create together are just starting to reveal themselves.
- And you can do things you weren't capable of doing on your own. - Like entering entirely new markets (speaking foreign languages) - By the time it's all said and done it'll be hard to tell if you are one of us or we are one of you. - So listen carefully, after this there is no turning back. Stay with me and take the red pill and we'll show you how deep the rabbit hole goes. (soft music) (Ian swallowing) - Let's get started. (screeching) Algorithms have been able to modify faces, bodies and environments for some time now.
The idea that you can manipulate something with an algorithm is a part of a larger category called synthetic media, or any asset that has been manipulated or wholly generated by an algorithm. And deepfakes are the most sensational iteration of synthetic media. And most people know them as funny memes on the internet, like this example of Jennifer Lawrence with the face of Steve Buscemi. - I, this was very truly surprising for me. Yeah, I was just really surprised.
- But they can also come about in things like revenge porn and political statements. So it's not all fun and games. But they do have their roots in entertainment. Hollywood has been modifying the faces of actors and the environments they act in for decades.
(car door slamming) And we've seen lots of examples of them using machine vision for things like Fast and the Furious. They brought Paul Walker back to finish his role in Fast Seven. (car crashing) And they de-aged Princess Leia for Rogue One. - What is it they've sent us? - Hope.
- But the advancing field of machine vision is allowing people to do this with nothing more than just a laptop. In this example, we see someone taking on the makers of Tron with their own attempt to make a more realistic deepfake with nothing more than just a laptop. Granted, it's done 12 years after Disney released the movie but it's still fascinating to see what took a team of artists being done by an individual with just the machine in front of them and some cloud computing. (clinking) - Dad. - Sam, look at you man. Look at the size of you.
- In fact, machine vision and deepfakes are now so ubiquitous and so easy to attain you can do it in the palm of your hand. Using your smartphone and apps like Reface you can use a single image of your face and stitch it onto those of actors, artists, and musicians in all sorts of movies and music videos. Here we have Jean-Claude Van Damme, Leonardo DiCaprio.
Really, guys? Okay, this is the last time you're getting my photo roll. Nonetheless you can have a lot of fun with these things. So how are deepfakes made? Well, there are three primary types of deepfakes: voice cloning, puppetry, and face swapping. Voice cloning is taking an imprint of someone's voice and being able to use that in all different types of contexts and making them say new phrases and words they may not have said otherwise.
Face swapping is taking one person's face and stitching it on another, just like we saw with Reface. And puppetry is just like some of the examples we saw in the beginning with Arnold Schwarzenegger, making someone say words they didn't but using something to make it look like it was their actual face. These are made using deep neural networks, deep learning. And machine vision has revolutionized this entire space. And it's now not even necessary to have a source file or source audio to be able to make someone speak. And it's not just for voices and faces anymore.
In fact, you can make entire bodies move. Here we see Jason Bellini from The Wall Street Journal using the moves of Bruno Mars and applying them to his own body. It's also possible to do the same thing with voice. And now we have text-to-speech models where you can type in any phrase you'd like the algorithm to say and it will replicate it in very convincing audio.
- [Elon] I am the best businessman in the world. - So as the field advances, it becomes easier and easier to create very convincing representations of human beings. But it's not just being used to replace actors or be able to put them in new contexts. In fact, a lot of the same technology is being used to create digital beings that are entirely synthetic.
(screeching) And all the research that's being directed at manipulating likeness isn't just used for deepfakes. It's also being used to create entirely synthetic representations of human beings. These are called synthetic beings or digital beings and these are done in a race to create a likeness for your digital assistants or for your video games or for your 3D environments like AR and VR. And this is a space that's blowing up. One of the most recent examples is MetaHuman Creator from Unreal Engine. The same people who bring you Fortnite.
MetaHuman Creator is a web-based editor that will allow people to create likenesses incredibly close to themselves or incredibly realistic from start to finish. And these avatars are driven by motion capture which can be done through an iPhone or a $20,000 motion capture suit. Once this is released expect to see some incredibly lifelike avatars. (swishing) While these are performances driven by actors, the next step is to make a being that's synthetic from start to finish.
Not just in the way that it's represented but also in the way that it interacts. One such example is Samsung's Neon. - [Narrator] Every Neon has a unique personality, emotion and intelligence. (swishing) - These are not real people. They are completely synthetic beings. Now they take some really high-end hardware to run and they're not exactly cheap to work with, but they're likely going to be coming to a screen near you.
And you may even see them at your bank or at your doctor's office as an assistant of some kind. But what if you could do something like this yourself and digitize your own likeness? Will that cost millions of dollars? Well, if certain companies have their way, it won't and you'll be able to do it soon.
In fact, with the help of a company called Hour One, one person did such a thing. Taryn Southern recently digitized her likeness and was able to create her own avatar that is completely synthetic. So it's one thing to interact with an avatar but it's quite another to be a fan of one, to follow one, to engage with one on a regular basis or to buy tickets to see one. However, millions of people are doing exactly that every single day. Digital influencers have been around for quite some time, the most famous of which being Lil Miquela, but there are dozens of others.
Some are K-pop groups like K/DA. Others are branded digital influencers like the GEICO Gecko or even KFC. But the trend is growing and dozens and dozens more are coming out every single month.
And the trend isn't likely to change. So, soon you may be interacting with an avatar at your local pharmacy. You may be seeing an avatar-driven performance online and you may even be buying tickets to go see an avatar performing. Things are changing quite quickly and all of this is happening in the digital space, but some of it in the physical as well. But to do a lot of this, the skill sets necessary to produce for these areas are moving quickly to 3D. And that's something new.
Wouldn't it be nice if there was an easy way to create 3D assets and participate in a trend like this? Well, let's actually do that. Let's go ahead and create a 3D avatar. Now it's one thing to know that something can be done, another to know how it's done, and an entirely different thing to know that you can do it yourself. We just met my avatar. What if you could take a single photo and create your own 3D model that isn't dependent on someone else's mesh? Well, what I'm about to show you may one day replace a technique that's currently used to do that called photogrammetry, where you take many images, sometimes hundreds to thousands, or even tens of thousands of images, and stitch them together in a 3D program to create a 3D model.
This technique uses one reference image to create an entire 3D model. This can be used in things like augmented reality, virtual reality, or even in independent films, as you'll see in a moment. So we'll start building our 3D model by opening a Google Colab Notebook. We are using an algorithm called PIFuHD. This algorithm has already been built for us. So really, most of what we have to do is just upload files, click a couple of play signs, and it's as simple as that.
Already, we have a 3D model generated. (upbeat music) Now what is produced here is both the front and back of my body. It didn't have a reference image for the back.
It simply guessed the back based on data that it had been fed in its training phase. Let's drop it into a 3D program and take a look. (upbeat music) Wow, that's pretty good for a single two-dimensional image.
Now, let's polish things up a bit. We're going to take the picture that I had and we're going to apply it to the 3D model. What you saw there is a UV map, looks pretty funky but it's going to help us texture the body.
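To make the UV map idea a bit more concrete, here is a minimal sketch of one simple way to assign texture coordinates, assuming a front-facing planar projection. The talk does not say how the UV map shown was produced, so the function name `planar_uv` and the sample vertices are hypothetical, and a real 3D program would typically unwrap the mesh in a more sophisticated way.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, z) position on the mesh
UV = Tuple[float, float]             # (u, v) position on the 2D image


def planar_uv(vertices: List[Vertex]) -> List[UV]:
    """Hypothetical helper: project each vertex onto the front (x, y) plane
    and normalise the result into the [0, 1] range used by texture images."""
    if not vertices:
        return []
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 when the mesh is flat along an axis, to avoid dividing by zero.
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    # Depth (z) is ignored: front and back vertices share the same image pixel.
    return [((x - min_x) / span_x, (y - min_y) / span_y) for x, y, _ in vertices]


if __name__ == "__main__":
    # Made-up sample vertices, purely for illustration.
    sample = [(0.0, 0.0, 0.0), (2.0, 4.0, 1.0), (1.0, 2.0, 5.0)]
    for vertex, uv in zip(sample, planar_uv(sample)):
        print(vertex, "->", uv)
```

Because this projection ignores depth, the front and the back of the body would land on the same pixels of the photo, which is one reason a texturing program and extra reference images are still useful.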
Now, I'm going to give myself a haircut, because that's a little bit more difficult to do. So we're going to use a texturing program to do that and some reference images. Now let's throw it into Mixamo and give it some movement to bring it to life. Let's make it walk. Nothing too crazy.
There we go. Now let's add some spice. I want to do the moonwalk. All right, throw it to the model.
Yeah, that's about what it looks like when I try and do the moonwalk. All right, let's see what it looks like in real life. Say hello to artificially generated 3D Ian. This model was taken from one single two-dimensional photograph and textured with only four. This is a low-resolution model that could easily be enhanced with extra time in a 3D program and additional reference images, getting us closer to a photogrammetry model.
But this is incredibly impressive for such a low investment both in effort and in resources. So what can you actually use this for? We mentioned augmented reality, virtual reality, and also independent films. In fact, the crew at Corridor decided to use this technology in their series of remaking Hollywood films to be a bit more grown up in taste. Let's see how they did it.
(screeching) - [Man] There's a lot of random children in this scene and I want to be able to like create some new shots. So I got to fully populate it. So I'm searching like Harry Potter robe costume on Google images, and I'm taking that and then I'm feeding it into PIFuHD which is an open source 2D image to 3D model converter.
You feed it this image and it spits out a very janky, but usable 3D model. And I flopped our 3D scans of our heads onto the students. And if it's just like a crowd full of little younglings you won't be able to notice. - [Another Man] Yeah.
- We've seen how we can recreate a fantasy world with visual effects and algorithms. So what's next? How about we try and understand the world around us using algorithms. For too long the digital canvas has been stuck behind a pane of glass in two dimensions, but with augmented reality and XR, we're painting the world with data. But until recently creating for augmented or virtual reality required a skillset much akin to that of the visual effects artist in Hollywood. But with advances in machine vision and LIDAR technology that's changing, and the technology has made leaps forward in understanding the environment around us and creating tool sets that we can use to create in these spaces. The good news is that with technologies like LIDAR and machine vision the ability of machines to understand the environment around them has taken leaps forward, going beyond things like categorization or depth perception to being able to recreate environments.
With three-dimensional production we're one step closer to breaking free of the two-dimensional canvas and making the world around us a truly integral part of our digital experience. And these advancements aren't just good for layering data over the real world, they're great for helping build the varied virtual worlds as well. And in these worlds our avatars will be the representation of our identity and there will be varied identities as well, much like we have with the profiles in our social media networks. I may have a network for my professional side. I may have another avatar for my friends and family and I may have yet another one for when I'm playing Fortnite with my friends.
We're seeing varied virtual worlds pop up all over the place. But up until recently, all of these virtual worlds were self-contained entities, unable to exchange anything between them: a closed ecosystem. But with the advent of Web3 technologies like blockchain and NFTs, the ability to transact and create and transport assets across different virtual worlds in an open ecosystem, often referred to as the open metaverse, is becoming possible.
And the infrastructure is being developed for this now. One company, Crucible, is building developer tools for game engines like Unity and Unreal to allow developers to give their players access to varied forms of avatar creators and inventory. You see things like Wolf3D, MetaHuman Creator and Genies available to the gamers here, and access to their own NFTs and assets as well. This enables players to use purpose-built avatars for various ecosystems and even be able to bring many of their attributes across those ecosystems.
One use case could be if you buy a digital garment from an artist and it may not be in a specific game, you can bring it into other environments that embrace the open metaverse. So it's not just locked into a specific ecosystem. And then you can still use that in game and transact with it, perhaps sell it and also have peer to peer commerce opportunities. This type of infrastructure enables all sorts of new commercial opportunities to be born. So let's talk about natural language understanding and how it is changing the game for design tools themselves and really lifting up the idea of computational design. In my submission, I teased the idea of being able to speak to a computer and command it to make certain things appear.
Computer, put a frustrated yet inspirational Shia LaBeouf behind me, but in front of the TV. (screeching) - Do it. Just do it. - Computer, mute Shia. And while that may seem fantastical, we're getting close to that day.
Let's take, for example, the ability to script your own video just by giving it commands and telling it what you want to see. This particular algorithm allows you to stitch videos together by giving a description. It uses natural language understanding, much like GPT-3, to get an understanding of what the user wants and the outputs that it should be creating for that. This is a 2D example for video media, impressive in and of itself, but how do we get there with 3D? Well, the same thing can apply. And 3D as a medium is incredibly difficult for newcomers to grasp.
And that's why it's so important to create tools that will enable people to participate regardless of their experience. Let's take a look at one particular tool that allows users to design in 3D using nothing but their words. This one, called Womp, allows users to use their words to create without having any specific knowledge of 3D. (screeching) But it doesn't stop there. It's designed specifically to eliminate many of the hurdles of 3D design in particular, a type of design that is not easy for users to grasp. These types of tools will again enable creation for new mediums that isn't available with the tool sets that we have now.
These types of tools that enable creation without any specific prior knowledge or experience will enable a truly inclusive medium: the open metaverse, allowing everyone to participate and create as their vision inspires, not at the limitation of experience or tool sets. AI-assisted and no-code tools are the future of these media.
And synthetic media is the driver of all of them. So many people are concerned about artificial intelligence taking away jobs as they exist today. Well, they're preparing us for the worlds we're going to be entering today and tomorrow. Synthetic media is not just about deepfakes and high value visual effects, but creation for new worlds and of new worlds.
(screeching) So hopefully I'll see you in one of these virtual worlds. And it may be me that you see there or one of my friends. Either way, thank you for joining me today and I hope to see you in the open metaverse. Keep creating and I will see you there.