Webinar: Visual Computing-Tracking the Top Trends and Opportunities
So now I'd like to introduce today's speakers. Today's featured speakers are Ron Fedkiw and Gordon Wetzstein. Ron Fedkiw is a professor of Computer Science at Stanford University. He received his PhD in Mathematics from UCLA. And his post-doctoral studies were at UCLA in mathematics and at Caltech in aeronautics before joining Stanford. He was awarded an Academy Award from the Academy of Motion Picture Arts and Sciences twice, the National Academy of Science Award for Initiatives in Research, a Packard Foundation fellowship, Presidential Early Career Award for Scientists and Engineers, along with several other awards.
He is currently on the editorial board of the Journal of Computational Physics and he participates in the reviewing process of a number of journals and funding agencies. He has published about 120 research papers in computational physics, computer graphics and vision, as well as a book on level set methods. For the past 17 years, he has been a consultant with Industria Light + Magic. He received screen credits for movies such as Terminator 3, Star Wars Episode 3, Poseidon, Evan Almighty, and most recently on Kong Skull Island.
Gordon Wetzstein is an Assistant Professor of Electrical Engineering and by courtesy, of Computer Science, at Stanford University. He's a leader of the Stanford Computational Imaging Lab, an interdisciplinary research group focused on advancing imaging, microscopy and display systems. At the intersection of computer graphics, machine vision optics, scientific computing and perception, Professor Wetzstein's research has a wide range of applications in next generation consumer electronics, scientific imaging, human computer interaction, remote sensing and many other areas.
He received a PhD in computer science from the University of British Columbia and graduated with honors from the Bauhaus in Weimar, Germany before that. He was won the Alain Fournier PhD Dissertation annual award, is the recipient of an NSF CAREER award and has won Best Papers at the International Conference on Computational Photography, as well as the Laval Virtual award. So both of them are quite accomplished. And I should also mention that they are co-academic directors of our new visual computing graduate certificate which I will talk more about after their presentation. So now I'd like to first turn the floor over to Gordon for the first part of the presentation. >> Thank you so much for the introduction, Stephanie.
Time for visual computing, because there's a lot of dynamics in both computer graphics, computer vision, but also in the [INAUDIBLE] perception in virtual reality. So what we'd like to do today is walk you through some of these topics so you get an overview of what visual computing is all about. So Stephanie already introduced this, but I pulled up this nice photo of Ron where he received a technical achievement award.
And sorry about that, but it was really a great achievement and he's done a lot of groundbreaking work in simulation and visual effects and is now working on machine learning as well. I work more on the imaging side, I also work on displays, computer vision and more recently on virtual reality as well. So you may wonder what computational imaging is all about and all these topics are about.
So I just wanna walk you through some of these and show you some of the top trends in these topics. The computational imaging is usually this idea of co-designing the optics, the sensors, and the algorithms for camera. And traditionally cameras have been designed by designing the optics forming an image on a digital sensor or in a film play and then just looking at the image. But with the rise of computational power, we can now co-design these things so that the computer can understand the image and extract meaningful information.
So what does it mean in practice? In practice that means the camera today doesn't work similar to what a camera did a couple of hundred years ago, even though it looks the same. We now capture multiple images with different exposures every time that we hit the button, and that's called high dynamic range imaging. The modes of your cellphone cameras will actually capture different images all at once. The processor on the phone will then compute a single image that combines the most useful information from all of these images. And that's pretty much in standard mode.
We can also think about super resolution, so trying to resolve features in an image that are not visible in the resolution offered by the detector. And there are many other imaging modalities, like sending the depth of fields, capturing light fields, and doing more exotic imaging modalities like compression imaging. So this all sounds great, and how does that translate into practice? Well, we've seen a couple of startup companies coming out of Stanford and also here in the Silicon Valley, such as Lytro which built the consumer electronics version of the electrocamera that you can buy in the store now. This camera allows you to do things like post-capture refocus. So you can take a picture and if you got the focus wrong, you can actually change it after the fact. This camera also allows you to extract depth information of a scene, which is increasingly important for things like autonomous driving, consumer electronics, gaming, and other applications.
Another consumer electronics startup company, just down the street here from Stanford, is called Light. Light has started to release a camera that they call l16. It has 16 different cellphone camera modules.
You can see that here in the middle, and they all have different focal lengths, different properties, and they can all snap a single picture of a scene and then the computer will actually figure out how to merge all of these images into getting a really wide field of view, super high resolution image with unprecedented image quality. Light cameras aren't just meant for consumer electronics. They also enable new experiences, for example, in virtual reality.
On the right you can see Facebook's new 360 camera, captures images in the 360 degrees sphere and can then process these to fuse them together into a seamless virtual reality experience. So this also ties in back to enabling new types of experiences with these imaging systems, for example, in virtual reality. So it turns out that the similar idea can also be applied to displays. This code design of algorithms and unconventional optical systems together with focusing on understanding human beyond the visual system. And a couple of research ideas that people have been working on are high dynamic range displays, super resolution displays, new types of projectors or displays.
And of course, we all are also interested in developing the next generation of head mounted displays for AR and VR. So I wanna talk about one of these technologies a little bit more, which has already migrated into the consumer electronics world, and that's high dynamic range displays. So when I did my PhD about ten years ago, that was a really nice research project with a couple of papers, things that you'd learn in a course.
But now you can actually buy a television at Best Buy or anywhere else that has that technology integrated. And every company has a different name for it. For example, local dimming is what Samsung calls it. And the basic idea is that you can modulate the backlight of the television, the LEDs that are actually inside the television together with the LCD that is usually being used to create a contrast that is much higher that what you get with a normal display. So that creates a much more lifelike experience, and much better user experience overall, okay. So the basic idea between behind computational imaging, for me it's this bridge between optics and modern signal processing optimization.
So we mostly work with geometric optics so we think about light as a wave travelling in straight lines in space. My research group also does a lot of work on using wave optics and more complicated optical models that are more physically accurate. But for the most part in computer vision and imaging, it's good enough to think about light as waves. And then use modern techniques in optimization, deep learning, machine learning and so on to design new kinds of devices such as cameras and this bot. And these have applications in so many different domains. Every one of you probably has a whole bunch of cameras already in your pocket.
Cuz your cellphone likely has at least two of them. But we also think about new and emerging depth cameras for consumer electronic applications, or cameras on drones. We can think about surveillance, medical imaging.
Cameras are in space, they're being used for biomedical applications in microscopy. And they're being used, of course, also for emerging autonomous vehicles or self-driving cars, where the computer has to make the decision of what to do next. So, designing the sensors for these kinds of systems is absolutely critical. So now I'd like to think about computer vision a little bit more and how it's different from imaging. Imaging is very close to the hardware. We design algorithms and hardware together, but at a very low level.
Computer vision is more concerned with inferring high level information of the scene. Such as, you give me a couple of images of your living room. And the algorithm can figure out what the 3D shape, and the texture, and the colors are. So that we can use that in, for example, in an online program where we can place new furnitures in your room. And you can try out different arrangements of your room.
You can order furniture from IKEA online and place them directly into your living room. At the same time these techniques are also useful for autonomous driving, for robotic vision, and for many different other fields. But in general computer vision is concerned with high level decision making, or inference on data that is captured with camera. So another topic that comes up in computer vision quite a lot is scene understanding.
So given an image, what's going on? So if we look at the upper left image here, this is a picture taken from a car. The car wants to understand where the lanes are. It wants to understand where other cars are, how fast they are driving. On the lower left we see an example of segmentation.
We wanna understand, is it a pedestrian we're looking at? Is it a bicyclist? Is it a driver? Is it a building? Is it a dynamic part of the scene? And so on. So segmentation, object tracking, object understanding and recognition, these are key challenges in computer vision. And on the lower right, you see a classic example where we take an image and the computer will figure out which object is what. And try to find it also in a database of existing objects. So for example, we can say hey, on the upper left there's an airplane. On the lower left there's a human, and maybe we can even so some face recognition on that.
So these are all graphical computer vision problems. Tracking is another one. So we wanna be able to predict the path of the motion for different types of objects in a scene. So computer graphics, I don't wanna talk too much about it. We've learned a lot about teapots, and bunnies, and strange rooms with reflective objects. Juan will talk a lot more about this, but computer graphics is the science of taking a mathematical description of light, reflectant models of geometry of the scene, and computing an image of this that is ideally indistinguishable from the real world.
Another emerging area is virtual reality, and augmented reality of course, too. It's an area that's been around for a few decades already. But in the last few years we've made so much progress and we now have the consumer electronic devices that deliver really immersive experiences. And virtual reality is being used for many different applications, from simulation and training, to entertainment, to visualization. But you shouldn't forget about robotic surgery for example, too.
On the lower right, you see the da Vinci, a minimally invasive robotic surgery system. Where a surgeon actually remotely operates the robotic arm to be able to do a surgery live, either in the same room or even in a distant location. And so that's virtual reality too because the surgeon only relies on the visual stream of the endoscope that they see in the remote spot, which is basically a virtual reality display. Fundamentally VR is the new medium that enables totally unprecedented experiences. And we still have to learn how to use this medium and create new types of experiences effectively. So it's a very exciting area of a lot of dynamics and development.
Enhancing Virtual Reality is also one of the grand challenges of The National Academy of Engineering. So not just from a content creation side it's interesting, but also from an engineering side. This brings together computer vision, display, human perception, computer graphics, and many other disciplines. So you can see that's kind of in this overview slide here.
There's just a HoloLens in the middle. The HoloLens has so many different parts. It has custom made chips such as CPU's, GPU's, central processing units, graphics processing units, but also image processing units. We have a whole array of sensors that do scene understanding, hand tracking, positional tracking, and other things. We need to understand human perception and it's limitation so we can build effective displays. And at the same time, we need to create content, for example, using things like cameras or computer graphics.
So I just wanna spend one or two minutes on telling you a little more about our research area, or one of them, to advance virtual reality display. So normally a phone or a small display inside a head mounted display behind a lens. And your eyes will look at a magnified view of this image. This will be simply bigger but it also gets further away.
So each eye will basically look at a different plane floating in space. That's how we can think about it. And that will be at a fixed distance.
So the focal distance of this display is gonna be at one, fixed distance, and that's not a natural viewing experience. If you think about 3D vision. And here's just a one-minute overview of 3D human perception. There are many different cues that the human brain uses to interpret depth from a scene. So there's vergence, that's the relative rotation of your eyeballs in their sockets. It's an oculomotor cue, so a muscular cue.
This is driven by binocular disparity, the fact that we see two different images with our eyes. And this is easy to create with virtual reality displays because we simply render slightly different perspectives in front of each eye. But here's the challenge, the eye also focuses it's crystalline lens.
And that's driven by retinal blur, which is similar to a depth of field effect in the SLR camera. This is a cue that we can't easily reproduce in virtual reality because we cannot optically correct this retinal blur cue easily. All these cues are coupled together and that's what creates a lot of trouble.
So in the real world as you look at different objects, your eyes will verge towards these distances, so they will rotate. But they will also accommodate, so they will focus to these distances. So all these cues are always working in harmony together. In virtual reality there's mismatch between these cues, I'll try and illustrate that here.
We can render images so that you converge at arbitrary distances. But the focus of your eyes will always be linked to that floating plane, the magnified image in space. So this discrepancy between different cues that the brain receives is something that usually creates. Eye stress, visual discomfort nausea and other undesired effects. So, our group does a lot of research in trying to address this for example and here is again an image of the fixed focused to light where we place a phone behind a fixed distance of the lens, a couple of simple ideas that we're addressing that's focus issue would be to either actuate the display by slightly displacing it inside the head mounted display, which changes the magnified image distance. Another way would be to use unconventional objects that can dynamically update the focal power of the lens, and this is a very effective way to change the magnified image distance and accommodation of the human observer.
In practice, we need to use eye tracking for our head monitor display so we build a very simple prototype that you can see here, which has a stereoscopic eye tracker and a motor mounted to the focus adjustment mechanisms of the. Okay, so here's this prototype. We can basically trace the motor very quickly based on the tracked eye position of the viewer. And with only few millimeters of mechanical statement we can actually change the focus plan of the display by a large amount of the entire range of which you can focus.
So that's what you can learn about on the overview of what virtual reality, computer vision and imaging is all about. These are graphics and all of these topics including virtual reality this consist. Photograph of the Stanford's EE 267 class.
So this class is focussed on helping students to build the head-mounted display from scratch over the course of one quarter. And with this I would like to hand over to Ron who can tell you more about special effects in computer graphics and some other topics in visual computing. >> Let me start before you even get into it. Just make sure there's a reason that you and I are talking about a little bit more in depth. The three topics we're talking about which are the Hollywood Effects in a VR/AR. If this is graphics traditionally, there is a lot of small things going on in graphics a lot of impact, but there's been sort of modes of graphics that everybody sees.
In your house, what people typically see are movies and games, those are two things. So if you think about it and look at all the movies and all the tech that goes into that, think of George Lucas, Star Wars, Pixar, a lot of graphics was driven from that. The other things are all these video games. If you look at the AR/VR that's sort of the real time aspect of graphics from the games continue to that direction with not only the talk today, but also his research program, and similarly for the Hollywood special effects.
That's something that I've been pushing forward for the last 20 years. I'll talk about that in the beginning and then where we're going next, and that's also part of the reason that Gordon and I are sort of running the center, is we're sort of looking in the future of graphics and where it's going. It's important at home when you're thinking about this, like what you want to do with visual computing.
That you realize there are these two big driving technologies which are real-time and not real-time, or movies and games, in that sense. And our research to talk, everything represents that in some flavor, but there's a lot more to look into later. So I had this very general Venn diagram to get started.
And it just shows the Hollywood special effects now. You're sort of mixing in a lot of different things. You've always had rendering and graphics, geometry and graphics, animation and graphics, that's always been there. And more recently, physical simulation has come in.
That's something that has come in the last 20 years and even more recently, computer vision is playing a much bigger role. And in machine learning we are going to as we go. So, going back to my lab what we've done for 20 years is critical simulation you've seen it in many movies here are some examples, Davie Jones, Harry Porter etc, etc. And when doing physical simulation for movies what we focus on is many many degrees of freedom, water, wrinkles and and cloth waves underwater, smoke that sought of thing and you really want high visual fidelity, very rich details. But what that comes from is computational physics.
There's equations for the water. There's models for the. And so we look to all those things in order to do the simulations. But if you look into computational physics, what you find is that your algorithms are really good for finding a drag on a wing.
Where the lift kind of wing or modelling a scram jet engine and vaporization of drops. They're very very accurate or faithful to real world data. And it's not we want for graphics. You want stuff which is sought of hyper real in that fence. So computation of physics is a motivation for what we do but it's not just taking the equations from computational physics and reimplementing them in a graphics pipeline.
It's doing something different with them. During last 20 years, I've put half of my papers in computational physics, and half of my papers in graphics in my 40 PhD students 30 former and 10 current. It really struggled along with me, to look back and forth between those two different areas and target simulations and methodologies towards those areas and it turns out, it's very, very different. You can't pick up stuff from one and put it in the other and vice versa and that's been something that really helped up lead the field in this area is understanding what goes where.
You're like [INAUDIBLE] is that an idea for computational physics or an idea for graphics? Yeah, if something in physics, does it work in graphics or does it not work in graphics, and vice versa. And that's been the essence of my research program for the entire 20 years that I've been in it, is just differentiating carefully between those two areas with each idea. No round peg in the square hole-type stuff. You have to avoid that. So more recently, say two years ago, I started realizing there's a third area of simulation which is very, very different from what I had been doing.
And it took me a while to even realize that even though I'd spent 20 years differentiating the feature film technology from the computational physics technology, putting about 50 papers in each one. But it started to sink in a little bit and the idea was that if somebody went out and filmed some or filmed the waves on the ocean, and you want it to somehow use those waves in your simulation to get better looking water, or water that looked like a lake, or an ocean, or a bathtub, or whatever it was, and use the data or you scanned in some cloth. You saw how it moved. You wanted to use that. Or, for human tissue, if you took a bodybuilder, scanned in their muscles and how they flex, then you can do a simulation of biomechanics in the muscles. You want to be faithful to that.
How would you actually do that? I originally started out doing that with what would be number two on this slide. I took the computational physics approach to be faithful to real world data. I pushed into biomechanics for awhile, and what they did there, but it turns out the biomechanical stuff doesn't really lend itself well to the data.
And so, it took a while but we start to realize it's really is a third paradigm for a physical simulation which is popping up. And just isn't true for computational physics as far as like smoke water, fire and human bodies, it's also true for rendering in other areas. Think of that not as just simple simulation but just simulation in general. Anytime you're synthesising the real world, you want to think about different ways of doing it. You're either trying to be faithful to the real world, which would be the computational physics or you're trying to have something really cool, like a game or a movie, that would be the feature film type stuff on this slide. Or you really want it to sort of have the look and feel of the real data.
So, we start looking more carefully into data statistics and machine learning and the first thing is data. Data is what's really driving this. There's more and more data everywhere.
Wherever you look, people are collecting data, storing data in Internet, pushing data around and as more and more data start showing up, they get better ways of collecting data and the better ways you get of collecting data, the more and more data you get. Statistics and machine learning are obviously the areas that work with the data. Actually, statistics, you would think just the statisticians work with data, that's been their job for a long time. The thing is statisticians never had that much data, so in theoretical statistics, it was really a way to talking about dealing with small amounts of data. Even today, if you look at political polls, they poll 1,000 people and that's supposed to represent what tens or hundreds of millions of people would do.
Well, why not poll 100,000 people, or a million people? That would be much better. But statisticians worked very, very hard at using tiny amounts of data to represent large populations. But the world's changing.
Now we have large amounts of data, we can get a lot more data and they're not really taking advantage of it at the speed one would expect. If you look at math departments, going back historically. Once computers start taking off, these new mathematicians started showing up. They weren't theoretical anymore, they were applied mathematicians, they were numerical analysts, computational physics type people. And it was sort of a new way of thinking about the world through the computer, as far as mathematics went, as opposed to the way it was done through pencil and paper in the past.
And once you had the computer, you had things like numerical algebra, numerical optimization, numerical ODEs, and PDEs, and new mathematics had to spring out of this. Similarly, as we add all this data and the statistics that they never had, you're gonna need a new way of dealing with statistics. Then what's happened, which is kind of neat from my standpoint as a computer scientist, is that people in my department, Daphne Koller, Andrew Sebastian Thrun. They sort of picked up the ball on this and started taking all this data and the computers and everything else In developing this new area which we now refer to as machine learning.
But machine learning people are really the next generation of applied mathematicians. They're coming out of statistics departments because of the data similar to the way the applied mathematicians come out of the math departments because of the computers, going back in the past. So it's really drawn me towards machine learning, which is part the artificial intelligence groups. In fact on Tuesday, I formally joined their AI lab at Stanford. So the new part for me, that machine learning people have been really knocking the ball out of the part for five, ten years now.
And the new part for me is looking at all this stuff through the lens of physics, because I do simulations. And so when look to things similar to physics as opposed to statistics, things are a little bit different. Here's a good example, suppose I had ten balls. And I took each ball and I dropped it from shoulder height to the ground. And I measured how long it took to go to the ground and I tried to measure the acceleration due to gravity.
And suppose 9 balls fell with speed 9.8, then 1 ball hovered in mid-air. A statistician might say, well, okay, it's about 9.6 gravity on average. Or they might say, no, no, that one's an outlier, let's get rid of it, it's about 9.8. A physicist might say, holy crap, that ball's floating in mid-air, what's going on? [LAUGH] So it's a different way of thinking about the data, which is like looking at what's actually happening with the physics as opposed to just statistically kind of waving your hands and saying it's noise, it's outliers, this and that. It's looking at the data a little bit differently.
A good example is incompressibility. For example, if I have a bunch of data and it doesn't represent incompressible flow, I know that data needs to go through some filter to make it incompressible. As opposed to just saying this noise outlier is your other thing. These are often called priors in machine learning, and many people use these priors. But it's sort of an advantage that you have if you come from physics to look at it, and it's something we're trying to do in our group.
So again, just to summarize before I show you a few examples, the goal of our research these days is, and it's underline at the bottom, is to design new numerical algorithms to better allow for the incorporation of data into the simulations themselves. Make things super real, not just a cloth simulation for a movie, which is really nice, but super real. Pulled from real world cloth. So some projects, the first project that got us into it was face simulation. And I've been at Stanford for 17 years, full professor, I've been here for a while. So not a lot of things are really over the top impressive.
I've seen all the impressive stuff for 17 years, and you kind of get a little bit callous, or a little bit more relaxed as time goes on. But this idea really hit me. It was the Wednesday after Thanksgiving. And we had this idea for muscle simulation for the faces.
And my grad student took the idea and implemented it. And by that evening, he showed me a simulation with a face that was better than anything that I'd ever seen before, by a quantum leap. And the big thing we did was we took a bunch of data of faces moving around. And we designed a simulation system that could actually target that data. So in the past when we did simulations, you flexed muscles, you moved them around, whatever.
They're biomechanically accurate, physically accurate. But what we did here is we learned how to target the actual data, and it really made a big difference. In fact we already used it in King Kong Skull Island. We had the idea, we wrote a paper in less than two months. It was implemented in less than 12 hours, paper in less than 2 months and it went into movie production literally right after that. Then within less than one year we finished an entire movie with it from idea conception to completion, and it really made a huge difference in the movie.
We've been building models for a long time. This is maybe a 15-year-old picture now of one of my graduate students that we flew up to Canada and covered in latex and built some high resolution muscle models using the old Matrix technology from the Matrix films if you remember that. These days it's easier, you can just scan in a statue with the scanners. We have much better graphic technology now and build a triangle face model out of that.
We outfit these with finite element models, biomechanically accurate muscles, everything you would imagine. And we use cameras and computer vision in order to track what faces do. In the end then, you have some model for how the face moves. Typically using a lot of this you can build an exterior shell model which is on the left, which is just triangles moving around.
But using some of our new technology, you can take a skull and put it inside that face on the left. This is what the right picture is, there's a skull underneath. And put muscles underneath the skull and flex all the muscles in order to try and make the face deform in order to match the thing on the left. So on the left is just a sculpt made by a modeler. He just went in and used some animation software to move around all the triangles, get a really nice looking face. On the right is a muscle simulation which is supposed to mimic that, and if you look at the two, they're very similar, very similar.
But there's some small differences. For example, look at the lips on the left. They're a little bit washed out, a little too smooth, and on the right, they're a little more vivid.
And we captured that bit more vivid because we used the simulation stuff underneath. I might say, okay it's a small change, why does it even matter? The reason it matters is the sculpt on the left was supposed to be perfect. An artist made this, it was gold standard, and it was something we used in the FX industry as a perfect. And then we ran one muscle simulation on it one time just to check it out and try to target that data. And we were like, wow, the lips could be better.
We actually don't have a perfect thing, it's actually much improved. One thing we struggle with in the effects industry with humans is we make them more and more real. There's something called the uncanny valley where all of a sudden if something gets too real it's disturbing if it's not completely real. There's a gap there between things getting better and getting all the way. And so when you get to this gap, you have to be very careful. You typically pull back and make the halt green or change things a little bit so it feels less real, so you can accept it.
In fact, take a completely real human and change it just a little bit, for example, put it in a casket. That's pretty scary, right? Or a zombie, you think about, that's kind of scary. So these things which are almost real but not real, they're very disturbing. And they don't work well in movies, so we're trying to get by this uncanny valley. If we knew what was wrong, we'd just fix it. This is an example of when we know what's wrong.
The thing on the left we thought was perfect, but it turns out the lips could be better, we learned that from the simulation, so we can make it better. And this is the main thing we want to address in the movie industry right now using this technology, which is can we make the faces real and get through the uncanny valley to the other side? Here's a video that shows some of the other benefits of this though. In the upper left is a bunch of sculpts done by an artist. Just a triangle mesh moving around.
In the middle is the result of the muscle simulation targeting in every frame just like in the last slide. In the far right, you can see the skull and some of the muscles moving around underneath the middle to show what the skull muscles had to do in order to target the simulation. What’s nice here is, if I take the far left model which is a bunch of triangles and I build a skull for it on the far upper right in muscles, and I can figure out how to target the animation, then I have some knowledge of how things are moving which aren't just triangles. So if I take that Yoda statue we scanned in on the lower left and I fit a skull and muscles to that, then I can take the upper right skull and muscles and map that one to one to the lower right skull and muscles and move them around.
And I can use it to animate the scanned in Yoda mesh without any artists' help or any animation or any packages or anything. It's literally just a muscle simulation going forward, making Yoda do stuff. Existing technology on the upper left, map them to the triangles on the lower left Yoda, and try to push things around, but since there's a fat lip on the guy in the upper left compared to Yoda's very thin lips, you'd actually see a fat lip moving around on the Yoda, it wouldn't look good.
There's not a good one to one correspondence between the facial geometries of the Yoda and the guy in the upper left. But the muscles underneath have a fairly good one to one correspondence, because we're all actuating things the same way. And if I actuate a muscle or open my jaw or smile or do something like that, that looks different in every person. But everybody's basically trying to activate the muscles in the same way. And so you get a nice one-to-one mapping. It's much better than principle component analysis or any of the other methods, because it's more physically driven.
So another project I'm gonna talk about is cloth. This is one of our very big ones. We've been doing cloth simulation in the lab for a long time.
These are actually ten-year-old pictures of cloth and all the details we've been able to collect. Cloth's used a lot in movies. Frozen's like a cartoon example of using cloth, and Yoda in Star Wars is a good example, his robe, of a more realistic version. The thing is, the real world has much, much better cloth than what we see in movies.
If I go into a movie theater and shut out all the lights and turn on the sound and tell you a story, and show you the cloth on Yoda, you might say, that looks real. That looks like real cloth to me. But you also think Yoda is real. And Yoda's not real obviously.
But somewhere during the movie where it was dark with the lights out, and you really wanted to believe your suspension and disbelief, you actually start thinking Yoda was real. So it's very easy for me to make a pretty crummy piece of cloth appear real to you too. You want it to be real. But you want Yoda to be real. We have in the movie theater.
But if I go into e-commerce or something and wanna do some cloth on your body, you're not gonna wanna see something that's supposedly real and doesn't really fit. You return it to the store after you order it. So there's really a gap between the movie cloth where you're willing to accept the cloth, where you want the cloth to seem real. And the cloth that you actually would have in e-commerce, to show how it's fitting on your body. Something you could buy reliably online, know how it's gonna fit, ship to your house where it had to fit that way and be happy with it and not return it to the store. So this is another place where we wanna use this data.
We wanna take real world cloth, scan it all in, get all the data from how it looks. How it folds, how it behaves, and stuffed it into the simulations so we can do simulations of it. We can learn from it and have it actually represent what would happen.
So you could scan a body model of yourself, move your body around, have some virtual cloth on you that moves and folds and acts just like the real cloth. Moreover, if I can actually mimic the real world cloth and get the data into my simulations, I can go back to Yoda and make a robe on him which is so super real that you couldn't even make it in the real world. And it behaves the way the real world does but it's even better than the real world. And that's what we really want in the movies is something which is hyper real and even better. But if we can't do the real world cloth, it's gonna be tough. So what we're doing in the cloth right now is it's the very early stages, but we're already learning a whole bunch about it.
For example, if you look at the figure on the right here, if you have a curve and the curve represents a scan of a real world cloth, there's a lot of folds, this is a black curve and I down sample it to something I can simulate on the computer which would be the blue curve. Notice that she has how the distance gets shorter. The lengths of the black line between two of the blue points is longer than the lengths of the blue lines, I'm compressing all the edges. And so if you build a database doing this for machine learning or anything else, you scan a bunch of clothe express it down and make a function from the coarse cloth to the fine cloth and then you simulate your cloth, what's gonna happen is you're simulating the coarse cloth and you can't look up the fine cloth. Let me explain that with a projection of an example. Look at the checkerboard on the far left.
If you have a piece of paper and it's orthogonal to the camera, and you project a square, it's going to show up as a square. But if you bend that thing away from the camera, it's gonna stretch the square out. So in order to make it actually appear like a square on a camera, as they do in this paper up in 2016, you have to take the edges of the square and compress if the paper is bent. And so you have to learn how to compress things when they're real length in the projection. I did a very good job with that so we need to do the same thing in cloth.
If you look at the upper right here, if the resh shape of the piece of cloth is long and then you look at the line below it and you compress it, it wrinkles, that's what happens to real cloth. And in our simulation, if we only have one edge there, when we compress it, so if you look at the third figure down the simulation edge and when we compress it, it just needs to shrink. It can't bend, because it's a single license, we don't have the details actually to bend it, but we want to be able to take the real world data of the bending and wiggling and project it onto there.
So we needed to actually shrink, so when we project the wiggling back on there, we get the right thing. We can't just shrink the cloth by relaxing all the springs by elements connect it, cuz then it also stretches too much, and so what you really need is in set up forcing edge length on your simulator cloth to be equal to the ref length, you need to constrain them so they can't get bigger than the ref length and sag like the lower right picture. But you want them to shrink left from their edge length, because you want them to be compressing, where you're gonna add the real-world data from the wrinkles layer. No cloth techniques do that, so we had to go out and design a new, numerical method for simulating cloth, which allows for edge lengths to be constrained with an upper bound but no lower bound. And so, if you look at the lower right, that's an example of using our new Inequality Cloth, with inequality constraints. And if you look at the left picture, that's using the equality cloth and what you can see is when it gets really detailed in the folds there, the equality cloth has much bigger folds, some buckling, and some resistance to the fine details and wrinkles you'd normally expect.
Whereas, the equality cloth actually starts doing the right thing. They can start folding a little bit better. Again, it's subtle. And there's no data here. But it is in the real world, you'd have many, many more wrinkles, which would be superimposing on this cloth.
And the cloth's already buckling doing the wrong thing cuz you're enforcing the edge length constraints to be equal. Which is incorrect for a compressed version of cloth, then you get this picture on the left and you can't put the data on it anyhow. Think about it like you have a sheet of paper and you compress the paper a little bit and you want to get some wrinkles. Well, if you didn't compress the paper, if it stayed it's normal size, just added wrinkles to it, it'd magically be getting longer, and adding geodesic distance. And that's something you don't want.
You want the cloth artificially gaining wrinkles. And that's what everyone does now. You take Equality Cloth and add wrinkles to it, so they're actually increasing its surface area, which looks really bad and strange. It's like your cloth is growing wrinkles out of nowhere. So, this is where a pipeline is gonna look like for cloth. Basically, if you start on the far right, the fine result is scanned in data from the real world.
Then you compress it, which would be the blue bubble, second from right, and you get the coarse result. Which are the compress cloths, and those would be things you'd store on the computer. Now going to the far left, if your inequality cloth simulations, which will be sort of coarse. And you look them up in your database of course results, and then use your machine learning functions that you've learned or to find the fine results from the real world that map to that, and stuff those two things together. The idea is you wanna hybridize scanning of real world cloth with the best simulations we can do to get ultra-real cloth. I wanna mention the bodies for like one second here.
We're putting body, mouth together with muscles. We're scanning in weight lifters. This is Michael Black's work from Europe.
We're scanning in people to learn about the muscles. We're trying to add muscles to the body, because you want to really learn how the body works under the cloth, before you do the cloth on top. The idea here is you scan in a really lean body build where the muscles work first, and then shrink the muscles down to normal, it adds that on top to get a more normal person. So trees are the last topic I want to address.
And for trees, a tree is really a multi-body system. You can think of a tree as continuous, with maybe the leaves as having angles there, but as a tree bends and moves around, you can think of it as a bunch of cylinders. So multi-body systems are commonly done, we've done a lot of those in our lab.
For example, Davy Jones tentacles or something, that we have done. We can model tanks, and skeletons, and all kinds of things, as multi-body systems. The problem with a tree, this tree in particular, which I see on my way to Stanford every day on my bike, is that that's a lot of bodies. All those leaves, all those branches, put a little cylinder for everything. Not to mention that's not just a bunch of cylinders, that's an amazing configuration of geometry, but I have to bend and move and deform.
It's impossible to do it with a multi-body system that's way too complex. We worked for a long time trying to build something especially for that. In fact we made this specialized approach which would allow us to do something really big. Again, the idea was driven from real world data. We wanna actually take a tree model like this, scan the tree in, build the tree, simulate the tree, simulate the wind hitting the tree and everything else.
We used the real world data to drive it, so it looks ultra real. So building a simulation system for this was impossible, and we had to do some research just to do that. The neat idea we had in that research was people often and all these robotic mechanisms, build these approximations to the exact solutions to multi-body problems. We invert that a little bit. We actually made an approximate problem, and found the exact solution for the approximate problem.
It's got similar errors in many ways, but because it was an exact solution, we could solve it without any of the stiff issues in numerical OAEs. We got rid of stiffness and time steps to billing and everything else. So we could actually simulate something with many, many, many bodies. The simulation was wrong. it would never converge to the right answer, cuz we made approximations in the equations themselves. But if you go back to people who approximate the exact equations, they never get there either.
And so you can bound the errors, that's not so bad. We used this to simulate a whole bunch of really detailed trees. This shows you some geometries we simulated, but we hated all these trees. We used the best systems we could to build these trees, but we never were able to simulate them to our liking because of the fact that we just didn't like the geometry. We really wanna simulate that tree, and so we ran into a problem where we couldn't synthesize in graphics the kind of tree that we wanted to simulate, we wanted real world data once again for simulations. And this time we're having trouble getting the real world data.
So we went out and tried to use 3D Computer vision to reconstruct the tree. Fact we took drone and cameras and flew them around a tree and did all the usual stuff. Then we started getting things, but you know what, we couldn't get the tree. We couldn't do it, and what we realised was in computer vision, a lot of what they do is use texture to make up for missing geometry. So when you look at something, you feel like the three geometry is there, but it's not really there.
In fact, when you look at this picture there, you might say that's a decent look of a 3D tree, but actually you're not seeing a 3D tree at all. What you are looking at is a slide which is projected on your 2D computer screen. It's completely 2D, there's no 3D geometry in any of these pictures, actually nothing I've showed you has any 3D geometry. You look at that tree, there's no 3D geometry there. You think you see something in 3D, but you don't. It's projected the 2D in your eyes, and your brain are doing shape from shading other algorithms that make that feel 3D.
And in fact, whenever we do 3D reconstructions, of things in your computer vision and we think we're building 3D models, we look at them on our computer screen. So we don't really have a good evaluation. So it turns out that we actually need not just a new simulation to incorporate the data, we need new data for the simulations. Typical computer vision doesn't really construct the data that you need in order to go through and articulate the tree up and build a multi value system that can be simulated. It's good for looking at, it's good for placing furniture, it's good for all these different things in a rough way.
But you want to simulate all this, find out the stress and strain on the floorboards from your furniture and that sort of thing. Or actually put a bunch of cylinders into this tree and have it bend in the wind and do its thing. It takes a lot more detailed geometry, even new computer vision and machine learning algorithms in order to just collect the data. She found herself an interesting spot where we actually need really high fidelity geometry, the computer vision really doesn't provide yet.
And so we need not only the new simulation technology take care of the data, but we need new ways of collecting the data to provide to the simulation. And so it's a two-way thing, that's what led me more recently into the artificial intelligence lab to work with all these new applied mathematicians, which we call machine learning people. And the last slide just on wind inference, it's one of the cool things you wanna do with the tree once we can actually build the model. Once we can solve both the problems I mentioned, then you can actually build this tree and simulate the tree. And then video capture the tree, and actually figure out how the wind is affecting the tree and pull that wind off and use it for other trees or other models, or for making movies and that sort thing. So I'll wrap up there at 10:50 and we'll have ten minutes of questions.
>> So we wanna head in to the Q&A, as reminder please type your questions into the Q&A box so that Gordon and Ron can take your questions. But we do have a couple of questions to start. So if you guys maybe wanna take the first couple? >> So first question is, what career skill sets, education, training should we be focusing on now in the future for visual computing in a workplace? One thing I wanna mention with graphics and visual computing in general is that people usually think of visual computing as complementary to something else. So you should take a bunch of visual computing courses, take the ones in certificate, that's why George and I picked those, is they give you a good basis.
But we always want people to have something else. And so, I talked about all the machine learning and the data science and that sort of thing. It would be a nice thing to mix a lot of this with that. People also mix visual computing with human computer interaction, HDI classes.
Where the hackers will mix visual computing with systems type classes. Graphics, just one of those things and visual computing in general is one of those things. Where it's nice to mix them in a hybrid way with other areas. And so almost everyone who comes to me says they wanna do graphics, then what they wanna do it for, I always tell them to add something else to it. It's always typically the visual certificate is the core. But ask yourself, do I care about systems? Do I care about HDI? Do I care about hard works.
People make graphics cards, that's part of graphics. So architecture or something. But find the other thing and once you know the two things that you want, visual computing plus Whatever the other thing is, it will guide you through the visual computing courses to choose the ones that are most appropriate than hybridization. Computational geometry, math. Leo Guibas proves a bunch of theorems all the time. So if you take his classes, you can get more into mathematics.
But that's my advice, is to always find the other field you want in addition to visual computing and hybridize them. >> Okay, so the second question is, will the technology improve better and more accurate medical research after this like cancer, diabetes and others? >> So that's an interesting question. Visual computing isn't necessarily linked directly to research and medical imaging.
However, a lot of the techniques that we're talking about in sensing, in computational imaging, and the applied mathematics and imagery construction are also being used in medical imaging. We work very closely with faculties from radiology department, for example, that work on computer tomography, that work on MRI scanning. Throughout the medical imaging technique use very similar principles and very similar algorithms. And even though I don't think anybody of us is directly working on cancer research, a lot of the techniques may be able to help discover new things [COUGH] in those fields as well. >> Okay, can you scroll up a little bit? I'll take the geometric model one. Is there an estimated timeline for developing the next level geometric model needed to address many of the challenges explained today? That's one question.
I'm also gonna take the question, does your work involve cognition neural networks, which is two down, cuz I can answer those together. Last quarter, in the spring, there were two very interesting classes taught. One by Leo Guibas on geometric modeling using machine learning and networks.
And one by Fei-Fei Li and her students on convolutional networks. I actually took both those classes. I mean, I know a lot of stuff from reading and stuff. But I listen to everything lecture anyhow, in case there's something I was missing.
There's fantastic stuff happening right now in the field of computer vision on both 3D geometric modeling, which is one of the big things that are needed in accomplishing neural networks. My favorite classes at Stanford that aren't visual computing are Leo's class I just mentioned, Fei-Fei's class I just mentioned, and Andrew Ng's 229 machine learning class. So I think anyone interested in visual computing with data or anything else should definitely make sure to do Andrew's class, Fei-Fei's convolutional network class, and Leo Guibas' geometry classes. But that's really one of the future areas that CVPR and computer vision is that 3D geometry.
And Leo and I talked, in general, about putting together simulation in geometry and how some of the stuff in geometry, which isn't ready, can be proved with simulation, and some of simulations that present ready can be improved with geometry. So there's a real back and forth between that. Convolutional networks are really finding a basis for things. You have things like autoencoders or what-not for a piece of cloth, let's say. What is a good basis of representation for that cloth? You could try doing Fourier analysis and finding sine waves of the folds.
Or you could do a convolutional network or an auto-encode. There's many ways of doing that. But these are, I think, two really big things importantly related to the visual computing. >> All right, I'll take a question on how does visual imaging integrate with images and vision in neuroscience and deep learning? >> So a lot of the classes we teach actually draw students from many different departments.
We have a big neuroscience institute here at Stanford. And a lot of these people actually take our classes as well because even though visual computing is mostly focused on consumer electronics and entertainment on areas like this. Again, the underlying techniques, the underlying algorithms for processing images, for learning images, for using deep learning convolutional neural networks and things like that for learning the fundamental building blocks and the structures of the images and what in fact is equally applicable to other fields. And although visual computing is not focused directly on neuroscience, the same techniques that we teach in our classes are applicable to those areas as well.
Okay, there was a question on business opportunities also further up. Can you give us some applicability to business scenarios beyond web rendering of products? So I think visual computing has a lot of applications in many different fields that you may not think of immediately. I mean, what comes to mind? This presentation is movie industry, games, virtual reality, things like that, consumer imaging.
But there's a lot more beyond that. A lot of the students here at Stanford actually go out and found their own startup companies or get involved with new and emerging trends. And things like, think about virtual reality as something in the analogue to where the first phones came out and people started writing apps for that. So it's like a gold mine, almost, where you can think of any kind of an application, write an app, and it can make a huge impact. That's where we are now with virtual reality, where people use it for gaming >> But think about architectural walk-throughs or virtual furniture placements or any kind of problem that you may have at home. You can use these techniques that you're learning in these classes and in visual computing to solve some of these problems.
And that's what makes it very exciting. >> Something else on that, which is interesting. So I have 30 former PhD students and 10 of them are doing machine learning. 10 of them are Google, 10 of them are in academia.
They're pretty split up in different areas. What do you hear back from the ones who don't actually do as much visual computing any more than within other areas is that it's really handy to be able to put a mockup together of what you're doing with some graphics. And one thing we focus on in some introductory classes, like 148, is just getting your hands dirty with some open GL, some rate tracing, learning to deal with rendering geometry, and the basics. And they find that for any project you're doing, just the ability to do some interesting computer graphics is useful for them in their other projects.
And again, I cover the hybridization of different areas. And how we think of that you may end up in the other area 90% of the time. But if you can do graphics a little bit, everyone you know is gonna want you to throw things together for them. And it's just a common theme that if they think of you as a graphics person, that you can just do better demos of everything, and so keep that in mind, too.
It's really something to have as a tool in your skill set for anything that you wanna do. >> So let's take two more questions and then we're gonna wrap up, just in the interest of time. So what do we think about some other questions here? >> Okay, yeah, I'm gonna take a question. How soon can we expect to AR, like HoloLens and other things, to substitute personal computing effectively? I think this is a great area of research right now. And we see a lot of transition into useful products.
We see VR being used in gaming. It's being shipped with PlayStation. Microsoft announced that it's gonna ship one with the Xbox and then maybe other game consoles where it's useful today. Apple recently announced that the iOS 11 will include a lot of functionality for augmented reality, but that's gonna be mostly phone based.
I think AR, I just think about things that are integrated in your eyeglasses and gives you a display and then a window to this digital world completely replacing other computers, that's still gonna be ten years out just so. And that's a very active area of research right now. But I wouldn't expect this to come in the next two years or so. >> Okay, I think we're actually gonna wrap up after that question. I wanna give a big thank you to Gordon and Ron both for a wonderful presentation, and for taking time to answer attendee questions.
As a reminder, the recording will be made available within a week and sent to participants. We wanna thank you for taking the time to attend today and we will see you at the next webinar. Thanks so much.