A Fireside Chat with Turing Award Winner Geoffrey Hinton, Pioneer of Deep Learning (Google I/O'19)
Hello. I'm Nicholas Thompson, the editor in chief of Wired. It is my honor today to get the chance to interview Geoffrey Hinton. There are many things I love about him, but two that I'll mention in the introduction. The first is that he persisted. He had an idea that he really believed in, that everybody else said was bad, and he just kept at it, and it gives a lot of faith to everybody who has bad ideas, myself included. The second: as someone who spends half his life as a manager adjudicating job titles, I was looking at his job title before the introduction, and he has the most non-pretentious job title in history. So please welcome Geoffrey Hinton, Engineering Fellow at Google. Welcome.

Thank you. It's so nice to be here.

All right, so let us start. Twenty years ago, when you write some of your early, very influential papers, everybody starts to say it's a smart idea, but we're not actually going to be able to design computers this way. Explain why you persisted, why you were so confident that you had found something important.

Actually, it was 40 years ago. It seemed to me there's no other way the brain could work. It has to work by learning the strengths of connections. And if you want to make a device do something intelligent, you've got two options: you can program it, or it can learn. We certainly weren't programmed, so we had to learn. So this had to be the right way to go.

So explain what neural networks are. Most of the people here will be quite familiar, but explain the original insight and how it developed in your mind.

You have relatively simple processing elements that are very loosely models of neurons. They have connections coming in, each connection has a weight on it, and that weight can be changed to do learning. What a neuron does is this: it takes the activities on the connections times the weights, adds them all up, and then
decides whether to send an output. If it gets a big enough sum, it sends an output; if the sum is negative, it doesn't send anything. That's about it. All you have to do is wire up a gazillion of those, with a gazillion squared weights, and figure out how to change the weights, and it'll do anything. It's just a question of how you change the weights.

So when did you come to understand that this was an approximate representation of how the brain works?

Oh, it was always designed as that. It was designed to be like how the brain works.

But let me ask you this. At some point in your career you start to understand how the brain works. Maybe it was when you were 12, maybe it was when you were 25. When do you make the decision that you will try to model computers after the brain?

Sort of right away. That was the whole point of it. The whole idea was to have a learning device that learned like the brain, the way people think the brain learns, by changing connection strengths. And this wasn't my idea. Turing had the same idea. Even though he invented a lot of the basis of standard computer science, Turing believed that the brain was this unorganized device with random weights, and it would use reinforcement learning to change the connections, and it would learn everything. He thought that was the best route to intelligence.

So you were following Turing's idea that the best way to make a machine is to model it after the human brain: this is how a human brain works, so let's make a machine like that.

Yes. It wasn't just Turing's idea; lots of people thought that.

All right, so you have this idea, lots of people have this idea, and you get a lot of credit. In the late '80s you start to come to fame with your published work, is that correct?

Yes.

When is the darkest moment? When is the moment where other
people who've been working on this, who agreed with this idea from Turing, start to back away, and yet you continue to plunge ahead?
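The processing element described earlier in the conversation (activities times weights, summed, with an output sent only when the sum is big enough) can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `neuron_output`, the `threshold` parameter, and the choice to pass the sum itself through as the output are all assumptions made for the example.

```python
def neuron_output(activities, weights, threshold=0.0):
    """Toy model of one unit: weighted sum of inputs, output only if the sum is big enough.

    Assumption for illustration: the output is the sum itself when it exceeds
    `threshold`, and 0.0 (nothing sent) otherwise.
    """
    total = sum(a * w for a, w in zip(activities, weights))
    return total if total > threshold else 0.0


if __name__ == "__main__":
    # Positive weighted sum: the unit sends an output.
    print(neuron_output([1.0, 0.5], [0.2, -0.1]))
    # Negative weighted sum: the unit sends nothing.
    print(neuron_output([1.0, 1.0], [0.2, -0.5]))
```

Learning, in this picture, means changing the entries of `weights`; the sketch deliberately leaves out how they are changed.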
There were always a bunch of people who kept believing in it, particularly in psychology. But among computer scientists, I guess in the '90s, what happened was datasets were quite small and computers weren't that fast, and on small datasets other methods, like things called support vector machines, worked a little bit better. They didn't get confused by noise so much. So that was very depressing, because we developed backpropagation in the '80s, we thought it would solve everything, and we were a bit puzzled about why it didn't solve everything. It was just a question of scale, but we didn't really know that then.

And so why did you think it was not working?

We thought it was not working because we didn't have quite the right algorithms, we didn't have quite the right objective functions. I thought for a long time it was because we were trying to do supervised learning, where you have to label data, and we should have been doing unsupervised learning, where you just learn from the data with no labels. It turned out it was mainly a question of scale.

That's interesting. So the problem was you didn't have enough data. You thought you had the right amount of data but you hadn't labeled it correctly, so you just misidentified the problem?

I thought that using labels at all was a mistake. You ought to do most of your learning without making any use of labels, just by trying to model the structure in the data. I actually still believe that. I think as computers get faster, for any given size of dataset, if you make computers fast enough you're better off doing unsupervised learning. And once you've done the unsupervised learning, you'll be able to learn from fewer labels.

So in the 1990s you're continuing with your research, you're in academia, you're still publishing, but it's not coming to acclaim; you aren't solving big problems. When do you start... well, actually, was there ever a moment where you said, you know what, enough
of this, I'm going to go try something else? Not that I'm going to go sell burgers, but I'm going to figure out a different way of doing this? Or did you just say, we're going to keep doing deep learning?

Yes. Something like this has to work. I mean, the connections in the brain are learning somehow, and we just have to figure it out. And probably there's a bunch of different ways of learning connection strengths; the brain's using one of them. There may be other ways of doing it. But certainly you have to have something that can learn these connection strengths, and I never doubted that.

OK, so you never doubt it. When does it first start to seem like it's working? When do you say, OK, we've got this, I believe in this idea, and actually, if you squint, you can see it's working? When did that happen?

OK, so one of the big disappointments in the '80s was that if you made networks with lots of hidden layers, you couldn't train them. That's not quite true, because convolutional networks designed by Yann LeCun you could train for fairly simple tasks like recognizing handwriting. But most of the deep nets, we didn't know how to train them. And in about 2005 I came up with a way of doing unsupervised training of deep nets. So you take your input, say your pixels, and you'd learn a bunch of feature detectors that were just good at explaining why the pixels were behaving like that. And then you treat those feature detectors as the data, and you learn another bunch of feature detectors that were good at explaining why those feature detectors have those correlations. And you keep learning layers and layers. What was interesting was you could do some math and prove that each time you learned another layer, you didn't necessarily have a better model of the data, but you had a bound on how good your model was, and you could get a better bound each time you added another layer.

What do you mean, you had a bound on how good your model was?

OK. So you can ask, once
you've got a model, you can say, how surprising does the model find this data? You show it some data and you say, is that the kind of thing you believe in, or is that surprising? And you can sort of measure something that says that. What you'd like to do is have a model, and a good model is one that looks at the data and says, yeah, yeah, I knew that, it's unsurprising. It's often very hard to compute exactly how surprising this model finds the data, but you can compute a bound on that. You can say that this
model finds the data less surprising than this. And you could show that as you add extra layers of feature detectors, you get a model, and each time you add a layer, the bound on how surprising it finds the data gets better.

Oh, I see. OK, so that makes sense. So you're making observations and they're not correct, but you know they're closer and closer to being correct. I'm looking at the audience, I'm making some generalization; it's not correct, but I'm getting better and better at it. Roughly?

Roughly.

OK, so that's about 2005 when you come up with that mathematical breakthrough. When do you start getting answers that are correct, and what data are you working on? Is this speech data where you have your first breakthrough?

This was just handwritten digits, very simple. Then, around the same time, they started developing GPUs, and the people doing neural networks started using GPUs in about 2007. I had one very good student called Vlad Mnih who started using GPUs for finding roads in aerial images. He wrote some code that was then used by other students for using GPUs to recognize phonemes in speech. And so they were using this idea of pre-training, and after they'd done all this pre-training, they'd just stick labels on top and use backpropagation. It turned out that way you could have a very deep net that was pre-trained this way, and you could then use backpropagation, and it actually worked. It sort of beat the benchmarks for speech recognition, initially just by a little bit.

It beat the best commercially available speech recognition? It beat the academic work on speech recognition?

On a relatively small dataset called TIMIT, it did slightly better than the best academic work, and also work done at IBM. And very quickly people realized that this stuff, since it was beating standard
models that had taken 30 years to develop, with a bit more development would do really well. And so my graduate students went off to Microsoft and IBM and Google, and Google was the fastest to turn it into a production speech recognizer. And by 2012, that work that was first done in 2009 came out in Android, and Android suddenly got much better at speech recognition.

So tell me about that moment where you've had this idea for 40 years, you've been publishing on it for 20 years, and you're finally better than your colleagues. What did that feel like?

Well, back then I'd only had the idea for 30 years.

Correct, sorry. So just a new idea, fresh.

It felt really good that it finally got the state of the art on a real problem.

And do you remember where you were when you first got the revelatory data?

No.

OK. All right, so you realize it works on speech recognition. When do you start applying it to other problems?

So then we start applying it to all sorts of other problems. George Dahl, who was one of the people who did the original work on speech recognition, applied it to this problem: I give you a lot of descriptors of a molecule, and you want to predict if that molecule will bind to something to act as a good drug. There was a competition on Kaggle, and he just applied our standard technology designed for speech recognition to predicting the activity of drugs, and it won the competition. So that was a sign that this stuff was fairly universal. And then I had a student called Ilya Sutskever who said, you know, Geoff, this stuff is going to work for image recognition, and Fei-Fei Li has created the correct dataset for it, and there's a public competition; we have to do that. So what we did was take an approach originally developed by Yann LeCun. A student called Alex Krizhevsky was a real wizard; he could make GPUs do anything. He programmed
the GPUs really, really well, and we got results that were a lot better than standard computer vision. That was 2012. And so it was a coincidence, I think, of the speech recognition coming out in Android, so you knew this stuff could solve production problems, and on
vision, in 2012, it had done much better than standard computer vision.

So those are three areas where it succeeded: modeling chemicals, speech, vision. Where is it failing?

The failures are only temporary, you understand.

Where is it failing?

For things like machine translation, I thought it would be a very long time before we could do that, because in machine translation you've got a string of symbols coming in and a string of symbols going out, and it's fairly plausible to say that in between you do manipulations on strings of symbols, which is what classical AI is. Actually, it doesn't work like that. The strings of symbols come in, you turn those into great big vectors in your brain, these vectors interact with each other, and then you convert them back into strings of symbols to go out. And if you'd told me in 2012 that in the next five years we'd be able to translate between many languages using just the same technology, recurrent nets, but just stochastic gradient descent from random initial weights, I wouldn't have believed you. It happened much faster than we expected.

But so what distinguishes the areas where it works most quickly and the areas where it will take more time? It seems like visual processing, speech recognition, sort of core human things that we do with our sensory perception, seem to be the first barriers to clear. Is that correct?

Yes and no, because there are other things we do, like motor control. We're very good at motor control; our brains are clearly designed for that. And it's only just now that neural nets are beginning to compete with the best other technologies there. They will win in the end, but they're only just winning now. I think things like reasoning, abstract reasoning, are the kind of last things we learn to do, and I think they'll be among the last things these neural nets learn to do.

And so you keep saying that neural nets will win at
everything eventually.

Well, we are neural nets, right? Anything we can do, they can do.

Right, but the human brain is not necessarily the most efficient computational machine ever created, certainly not my human brain. Couldn't there be a way of modeling machines that is more efficient than the human brain?

Philosophically, I have no objection to the idea that there could be some completely different way to do all this. It could be that if you start with logic, and you try to automate logic, and you make some really fancy theorem prover, and you do reasoning, and then you decide you're going to do visual perception by doing reasoning, it could be that that approach would win. It turned out it didn't. But I have no philosophical objection to that winning. It's just that we know that brains can do it.

Right. But there are also things that our brains can't do well. Are those things that neural nets also won't be able to do well?

Quite possibly, yes.

And then there's a separate problem, which is that we don't know entirely how these things work. We don't understand how top-down neural networks work; there's even a core element of how neural networks work that we don't understand. So explain that, and then let me ask the obvious follow-up, which is: if we don't know how these things work, how can those things work?

OK. You ask that when I've finished explaining.

Yes.

So if you look at current computer vision systems, most of them are basically feed-forward. They don't use feedback connections. There's something else about current computer vision systems, which is that they're very prone to adversarial examples. You can change a few pixels slightly, and something that was a picture of a panda, and still looks exactly like a panda to you, it suddenly says is an ostrich. Obviously
the way you change the pixels is cleverly designed to fool it into thinking it's an ostrich. But the point is, it still looks just like a panda to you. Initially we thought these things worked really well, but then, when confronted with the fact that they can look at a panda and be confident it's an ostrich, you get a bit worried. And I think part of the problem there is that they're not trying to reconstruct from the high-level representations.
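The kind of cleverly designed pixel change described above can be illustrated on a toy linear classifier. This is a hedged sketch, not the attack used against real vision systems: the weights, the 100-pixel "image", the `eps` value, and the function names are all invented for the example, and the perturbation rule (nudge every pixel by a small fixed amount against the sign of the score's gradient) is borrowed from the fast gradient sign idea, which the conversation does not name.

```python
def score(weights, pixels):
    """Linear score: positive means "panda", otherwise "ostrich" (toy labels)."""
    return sum(w * p for w, p in zip(weights, pixels))


def classify(weights, pixels):
    return "panda" if score(weights, pixels) > 0 else "ostrich"


def sign(value):
    return (value > 0) - (value < 0)


def adversarial(weights, pixels, eps):
    """Shift every pixel by at most eps in the direction that lowers the score.

    For a linear score the gradient with respect to the pixels is just the
    weight vector, so the step is -eps * sign(weight) per pixel.
    """
    return [p - eps * sign(w) for w, p in zip(weights, pixels)]


# Hypothetical classifier and image, chosen so the original margin is small.
weights = [0.5, -0.5, 0.8, -0.7] * 25
image = [0.5] * 100
fooled = adversarial(weights, image, eps=0.03)

if __name__ == "__main__":
    print(classify(weights, image))   # label before the perturbation
    print(classify(weights, fooled))  # label after the perturbation
    print(max(abs(a - b) for a, b in zip(image, fooled)))  # largest pixel change
```

Each pixel moves by only 0.03, yet because the changes all push the score the same way, their combined effect is enough to flip the label.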
They're trying to do discriminative learning, where you just learn layers of feature detectors and the whole objective is just to change the weights so you get better at getting the right answer. They're not doing things like, at each level of feature detectors, checking that you can reconstruct the data in the layer below from the activities of these feature detectors. And recently in Toronto we've been discovering, or Nick Frosst has been discovering, that if you introduce reconstruction, then it helps you be more resistant to adversarial attack. So I think in human vision, to do the learning we're doing reconstruction, and also, because we're doing a lot of learning by doing reconstructions, we are much more resistant to adversarial attack.

But you believe that top-down communication in a neural network is how you test, how you reconstruct, how you test and make sure it's a panda, not an ostrich?

I think that's crucial, yes.

But brain scientists are not entirely agreed on that, correct?

Brain scientists are all agreed on the idea that if you have two areas of the cortex in a perceptual pathway, and there are connections from one to the other, there will always be backwards connections, not necessarily point to point, but there will always be a backwards pathway. They're not agreed on what it's for. It could be for attention, it could be for learning, or it could be for reconstruction. It could be for all three.

And so we don't know what the backwards communication is for. You are building backwards communication that is for reconstruction into your neural networks, even though we're not sure that's how the brain works?

Yes.

Isn't that cheating? I mean, if you're trying to make it like the brain, you're doing something we're not sure is like the brain.

Not at all. I'm not
doing computational neuroscience. That is, I'm not trying to make a model of how the brain works. I'm looking at the brain and saying, this thing works, and if we want to make something else that works, we should sort of look to it for inspiration. So this is neuro-inspired, not a neural model. The whole model, the neurons we use, they're inspired by the fact that neurons have a lot of connections and they change the strengths.

It's interesting. So if I were in computer science, and I was working on neural networks, and I wanted to beat Geoff Hinton, one thing I could do is build in top-down communication and base it on other models of brain science, so based on learning, not on reconstruction.

If they were better models, then you'd win, yes.

That's very, very interesting. All right, so let's move to a more general topic. Neural networks will be able to solve all kinds of problems. Are there any mysteries of the human brain that will not be captured by neural networks, or cannot be? For example, could emotion...

No.

So love could be reconstructed by a neural network? Consciousness can be reconstructed?

Absolutely, once you've figured out what those things mean. We are neural networks, right? Now, consciousness is something I'm particularly interested in. I get by fine without it. But people don't really know what they mean by it. There are all sorts of different definitions, and I think it's a pre-scientific term. So a hundred years ago, if you asked people what life is, they would have said, well, living things have vital force, and when they die the vital force goes away, and that's the difference between being alive and being dead, whether you've got vital force or not. And now we don't have vital force. We just think it's a pre-scientific concept, and once you understand some biochemistry
and molecular biology, you don't need vital force anymore; you understand how it actually works. And I think it's going to be the same with consciousness. I think consciousness is an attempt to explain mental phenomena with some kind of special essence, and this special essence, you don't need it. Once you can really explain it, then you'll explain how we do the things that make people think we're conscious, and you'll explain all these different meanings of consciousness without having some special essence as consciousness.

Right. So there's no emotion that couldn't be created, there's no thought that couldn't be created, there's nothing that a human mind can do that couldn't theoretically be recreated by a fully functioning neural network, once we truly understand how the brain works?

There's something in a John Lennon song that sounds very like what you just said.

And you're a hundred percent confident of this?

No, I'm a Bayesian, so I'm 99.9% confident.

OK. What is the 0.1?

Well, we might, for example, all be part of a big simulation.

True, fair enough. OK. That actually makes me think it's more likely that we are. All right, so what are we learning as we do this? As we study the brain to improve computers, how does it work in reverse? What are we learning about the brain from our work in computers?

So I think what we've learned in the last 10 years is that if you take a system with billions of parameters, and you do stochastic gradient descent in some objective function, and the objective function might be to get the right labels, or it might be to fill in the gap in a string of words, or any old objective function, it works much better than it has any right to. It works much better than you would expect. You would have thought, and most people in conventional AI thought: take a system with a billion parameters, start them off with random values, measure the gradient of the objective function,
that is, for each parameter, figure out how the objective function would change if you changed that parameter a little bit, and then change it in the direction that improves the objective function. You'd have thought that would be a kind of hopeless algorithm that would get stuck. And it turns out it's a really good algorithm, and the bigger you scale things, the better it works. That's just an empirical discovery, really. There's some theory coming along, but it's basically an empirical discovery. Now, because we've discovered that, it makes it far more plausible that the brain is computing
the gradient of some objective function and updating the weights, the strengths of synapses, to follow that gradient. We just have to figure out how it gets the gradient and what the objective function is.

But we didn't understand that about the brain before?

It was a theory. A long time ago people thought that was a possibility. But in the background there were always sort of conventional computer scientists saying, yeah, but this idea that everything's random and you just learn it all by gradient descent, that's never going to work for a billion parameters; you have to wire in a lot of knowledge. And we know now that's wrong. You can just put in random parameters and learn everything.

So let's expand this out. As we learn more and more, we will presumably continue to learn more and more about how the human brain functions, as we run these massive tests on models based on how we think it functions. Once we understand it better, is there a point where we can essentially rewire our brains to be more like the most efficient machines, or change the way we think?

If it's a simulation, that should be easy.

But not in a simulation.

You'd have thought that if we really understand what's going on, we should be able to make things like education work better, and I think we will. It would be very odd if you could finally understand what's going on in your brain and how it learns, and not be able to adapt the environment so you can learn better.

Well, OK, I don't want to go too far out in the future, but a couple of years from now, how do you think we will be using what we've learned about the brain and about how deep learning works to change how education functions? How would you change a class?

In a couple of years, I'm not sure we'll learn much. I think changing education is going to take longer. But if you look at it, assistants
are getting pretty smart now, and once assistants can really understand conversations, assistants can have conversations with kids and educate them. Already, I think most of the new knowledge I acquire comes from me thinking, I wonder, and typing something into Google, and Google tells me. If you could just have a conversation, you'd acquire knowledge even better.

And so, theoretically, as we understand the brain better, and as we set our children up in front of assistants (mine right now, almost certainly, based on the time in New York, is yelling at Alexa to play something on Spotify, probably Baby Shark), you will program the assistants to have better conversations with the children, based on how we know they'll learn?

Yes. I haven't really thought much about this. It's not what I do. But it seems quite plausible to me.

Will we be able to understand how dreams work, one of the great mysteries?

Yes, I'm really interested in dreams. I'm so interested I have at least four different theories of dreams. So a long time ago there were things called, OK, a long time ago there were Hopfield networks, and they
would learn memories as local attractors. And Hopfield discovered that if you try to put too many memories in, they get confused. They'll take two local attractors and merge them into an attractor sort of halfway in between. Then Francis Crick and Graeme Mitchison came along and said we can get rid of these false minima by doing unlearning. So we turn off the input, we put the neural network into a random state, we let it settle down, and we say, that's bad, change the connections so you don't settle to that state. And if you do a bit of that, it will be able to store more memories. And then Terry Sejnowski and I came along and said, look, if we have not just the neurons where you're storing the memories but lots of other neurons too, can we find an algorithm that will use all these other neurons to help you store memories? And it turned out, in the end, we came up with the Boltzmann machine learning algorithm. And the Boltzmann machine learning algorithm had a very interesting property, which is: I show you data, that is, I fix the states of the observable units, and it sort of rattles around the other units until it's got a fairly happy state. And once it's done that, it increases the strengths of all the connections based on this: if two units are both active, it increases the connection strength. That's a kind of Hebbian learning. But if you just do that, the connections just get bigger and bigger. You also have to have a phase where you cut it off from the input. You let it rattle around and settle into a state it's happy with, so now it's having a fantasy. And once it's had the fantasy, you say, take all pairs of neurons that are active and decrease the strength of the connection. So I'm explaining the algorithm to you just as a procedure, but actually that algorithm is the result of doing some math and saying, how should you change these connection strengths so that this neural network with all these hidden units finds
the data unsurprising? And it has to have this other phase, what we call the negative phase, when it's running with no input and it's unlearning whatever state it settles into. Now, what Crick pointed out about dreams is that we know that you dream for many hours every night, and if I wake you up at random you can tell me what you were just dreaming about, because it's in your short-term memory. So we know you dream for many hours, but in the morning you wake up and you can remember the last dream, but you can't remember all the others, which is lucky, because you might mistake them for reality.
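The two-phase procedure described above (settle with the data fixed and strengthen co-active pairs, then settle with no input and weaken co-active pairs) can be written down as a toy sketch. Everything specific here is an assumption for illustration: binary 0/1 units, symmetric weights with no biases, Gibbs sampling as the way the network "rattles around", a single sample per phase, and the names `settle`, `coactivity`, and `learn_step`. It is a simplified stand-in, not the published Boltzmann machine implementation.

```python
import math
import random


def settle(state, weights, clamped, sweeps, rng):
    """Let the unclamped units "rattle around" by Gibbs sampling for a few sweeps."""
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            if i in clamped:
                continue  # clamped units stay fixed to the data
            total_input = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            p_on = 1.0 / (1.0 + math.exp(-total_input))
            state[i] = 1 if rng.random() < p_on else 0
    return state


def coactivity(state):
    """1 for every pair of distinct units that are both active, else 0."""
    n = len(state)
    return [[state[i] * state[j] if i != j else 0 for j in range(n)] for i in range(n)]


def learn_step(visible, weights, n_hidden, lr=0.1, sweeps=10, rng=None):
    """One positive ("wake") phase and one negative ("fantasy") phase."""
    rng = rng or random.Random(0)
    n_visible = len(visible)
    n = n_visible + n_hidden

    # Positive phase: fix the visible units to the data, settle the hidden units.
    state = list(visible) + [rng.randint(0, 1) for _ in range(n_hidden)]
    settle(state, weights, set(range(n_visible)), sweeps, rng)
    positive = coactivity(state)

    # Negative phase: no input, start from a random state, settle everything.
    state = [rng.randint(0, 1) for _ in range(n)]
    settle(state, weights, set(), sweeps, rng)
    negative = coactivity(state)

    # Strengthen pairs co-active with data, weaken pairs co-active in the fantasy.
    for i in range(n):
        for j in range(n):
            weights[i][j] += lr * (positive[i][j] - negative[i][j])
    return weights


if __name__ == "__main__":
    w = [[0.0] * 5 for _ in range(5)]
    learn_step([1, 0, 1], w, n_hidden=2, rng=random.Random(1))
    for row in w:
        print(row)
```

Without the negative phase the update only ever adds to the weights, which is the "connections just get bigger and bigger" problem mentioned in the conversation; the subtraction is what keeps them in check.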
So why is it that we don't remember our dreams at all? Crick's view was that the whole point of dreaming is to unlearn those things, so you put the learning rule in reverse. And Terry Sejnowski and I showed that actually that is a maximum likelihood learning procedure for Boltzmann machines. So that's one theory of dreaming.

You showed that theoretically?

Yes, we showed theoretically that's the right thing to do if you want to change the weights so that your big neural network finds the observed data less surprising.

And I want to go to your other theories, but before we lose this thread: you've proved that it's efficient. Have you actually set any of your deep learning algorithms to essentially dream? Study this image dataset for a period of time, reset, study it again, reset, versus a machine that's running continuously?

So yes, we had machine learning algorithms. Some of the first algorithms that could learn what to do with hidden units were Boltzmann machines. They were very inefficient. But then later on I found a way of making approximations to them that was efficient, and those were actually the trigger for getting deep learning going again. Those were the things that learned one layer of feature detectors at a time. It was an efficient form of a restricted Boltzmann machine, and so it was doing this kind of unlearning. But rather than going to sleep, that one would just fantasize for a little bit after each data point.

OK, so androids do dream of electric sheep. So let's go to theories two, three, and four.

OK. Theory two was called the wake-sleep algorithm. You want to learn a generative model. So you have the idea that you're going to have a model that can generate data. It has layers of feature detectors, and it activates the high-level ones, and the low-level ones, and so on until it activates pixels, and that's an image. You also want to learn the other way: you also want to learn to recognize data. And
so you're going to have an algorithm that has two phases. In the wake phase, data comes in, it tries to recognize it, and instead of learning the connections that it's using for recognition, it's learning the generative connections. So data comes in, I activate the hidden units, and then I learn to make those hidden units be good at reconstructing that data. So it's learning to reconstruct at every layer. But the question is, how do you learn the forward connections? So the idea is, if you knew the forward connections, you could learn the backward connections, because you could learn to reconstruct. Now, it also turns out that if you knew the backward connections, you could learn the forward connections, because what you could do is start at the top and just generate some data, and because you generated the data, you'd know the states of all the hidden layers, and so you could learn the forward connections to recover those states. So that would be the sleep phase. When you turn off the input, you just generate data, and then you try to reconstruct the hidden units that generated the data. And so if you know the top-down connections, you learn the bottom-up ones; if you know the bottom-up ones, you can learn the top-down ones. And so what's going to happen if you start with random connections and try alternating both kinds of learning? And it works. Now, to make it work well, you have to do all sorts of variations on it, but it works.

All right. Do you want to go through the other two theories? We only have eight minutes left; I think we should probably jump through some other questions.

If you give me another hour, I could do the other two.

All right, well, Google I/O 2020. So let's talk about what comes next. Where is your research headed? What problem are you trying to solve now?

The main thing I'm trying to solve, which I've been doing for a number of years now... I
I'm reminded of a soccer commentator. You may have noticed soccer commentators: they always say things like "They're doing very well, but they always go wrong on the last pass," and they never seem to sort of notice anything funny about that. It's a bit circular. So eventually you're going to end up working on something that you don't finish, and I think I may well be working on the thing I never finish. But it's called capsules, and it's the theory of how you do visual perception using reconstruction, and also how you route information to the right places.
The two main motivating factors were these. In standard neural nets, the information, the activity in a layer, just automatically goes somewhere; you don't make decisions about where to send it. The idea of capsules was to make decisions about where to send information. Now, since I started working on capsules, some other very smart people at Google invented transformers, which are doing the same thing: they're deciding where to route information, and that's a big win. The other thing that motivated capsules was coordinate frames. When humans do vision, they're always using coordinate frames, and if they impose the wrong coordinate frame on an object, they don't even recognize the object. So I'll give you a little task. Imagine a tetrahedron: it's got a triangular base and three triangular faces, all equilateral triangles. Easy to imagine, right? Now imagine slicing it with a plane so you get a square cross-section. That's not so easy, right? Every time you slice, you get a triangle. It's not obvious how you get a square. It's not at all obvious. OK, but I'll give you the same shape described differently. I need your pen. Imagine the shape you get if you take a pen like that, another pen at right angles like this, and you connect all points on this pen to all points on this pen. That's a solid tetrahedron. You're seeing it relative to a different coordinate frame, where the edges of the tetrahedron, these two, line up with the coordinate frame. And if you think of the tetrahedron that way, it's pretty obvious that at the top you'll get a long rectangle this way, at the bottom you get a long rectangle that way, and, like Weierstrass said, you've got to get a square in the middle. So it's pretty obvious how you could slice it to give a square, but that's only obvious if you think of it with that coordinate frame. So it's obvious that for humans, coordinate
frames are very important for perception, and they're not at all important for convnets. For convnets, if I show you a tilted square and an upright diamond, which are actually the same thing, they look the same to a convnet. It doesn't have two alternative ways of describing the same thing.
But how is adding coordinate frames to your model not the same as the error you were making in the '90s, where you were trying to put rules into the system as opposed to letting the system be unsupervised?
It is exactly that error. And because I'm so adamant that that's a terrible error, I'm allowed to do a tiny bit of it. It's sort of like Nixon negotiating with China. Actually, that puts me in a bad role. Anyway, if you look at convnets, they're just neural nets where you wire in a tiny bit of knowledge: you wire in the knowledge that if a feature detector is good here, it's good over there. And people would love to wire in just a little bit more knowledge about scale and orientation, but if you do it in the obvious way of having a 4D grid instead of a 2D grid, the whole thing blows up on you. But you can get in that knowledge about what viewpoint does to an image by using coordinate frames the same way they do them in graphics. So now you have a representation in one layer, and when you try and reconstruct the parts of an object in the layer below, when
you do that reconstruction, you can take the coordinate frame of the whole object and multiply it by the part-whole relationship to get the coordinate frame of the part. And you can wire that into the network: you can wire into the network the ability to do those coordinate transformations, and that should make it generalize much, much better. It should mean the networks just find viewpoint very easy to deal with. Current neural networks find viewpoint, other than translation, very hard to deal with.
So is your current task specific to visual recognition, or is it a more general way of improving by coming up with a rule set for coordinate frames?
It could be used for other things, but I'm really interested in the use for visual recognition.
OK, last question. I was listening to a podcast you gave the other day, and in it you said that the people whose ideas you value most are the young graduate students who come into your lab, because they aren't locked into the old perceptions, they have fresh ideas, and yet they also know a lot. Is there anything that, sort of looking outside yourself, you think you might be locked into that a new graduate student, or somebody in this room who came to work with you, would shake up?
Yeah. Everything I said.
Take out those coordinate units, work on theory three, work on theory four. I want to ask a separate question. Deep learning used to be a distinct thing, and then it became sort of synonymous with the phrase AI, and AI is now a marketing term that basically means using a machine in any way whatsoever. How do you feel about the terminology, as the man who helped create this?
Well, I was much happier when there was AI, which meant you're logic-inspired and you do manipulations on symbol strings, and there was neural nets, which meant you want to do learning in a neural network. They were completely different enterprises that really sort of didn't get along too well and fought for money. That's
how I grew up. And now I see people who spent years saying neural networks are nonsense saying, "I'm an AI professor, so I need money."
So your field succeeded, kind of ate or subsumed the other field, which then gave them an advantage in asking for money, which is frustrating.
Yeah. Now, it's not entirely fair, because a lot of them have actually converted.
Right. OK, wonderful. Well, I've got time for one more question. In that same interview you were talking about AI, and you said to think of it like a backhoe: a backhoe that can build a hole or, if not constructed properly, can wipe you out. And the key is, when you work on your backhoe, to design it in such a way that it's best to build a hole and not to clock you in the head. As you think about your work, what are the choices you make like that?
I guess I would never deliberately work on making weapons. I mean, you could design a backhoe that was very good at knocking people's heads off, and I think that would be a bad use of a backhoe, and I wouldn't work on it.
All right. Well, Geoffrey Hinton, an extraordinary interview, all kinds of information. We'll be back next year to talk about dream theories three and four. That was so much fun. Thank you.