Triangulating Intelligence, Session 1: Matthew Botvinick, Dan Yamins, and Chelsea Finn
Hello, everyone. Good morning, good afternoon, wherever you are in the world. I hope you are keeping safe and healthy. Thank you so much for joining us today for the annual fall conference of the Stanford Institute for Human-Centered AI, or HAI. The title of our conference today is Triangulating Intelligence: Melding Neuroscience, Psychology, and AI. Personally, intelligence that reflects the diversity of human perception, cognition, and experience is a topic very near and dear to my heart. The intersection of neuroscience and computer science is something that I've worked on since graduate school, and I've devoted much of my professional life to it. So it is very central to the work we do here at HAI: research that aims to develop novel technologies inspired by the depth and versatility of human intelligence. This includes AI inspired by neuroscience, cognitive science, and psychology; novel unsupervised, semi-supervised, self-supervised, or supervised methods for diverse data types; and knowledge and semantics. The big bet in developing AI is seeking a new synthesis, a meeting of the minds between cognitive science, neuroscience, and AI, which is what we're going to discuss today. This conference exemplifies some of the most cutting-edge research happening in this area worldwide. We hope to inspire new developments in this area. Thank you so much for being with us, and please enjoy your day. With that, I'd like to turn it over to the co-organizers of this conference, Professors Chris Manning and Surya Ganguli.

Good morning, everyone. Thank you for joining us today. We had nearly 7,000 RSVPs for this event, and we're really excited to have you all here, or at least all of you who have woken up so far, and to have so many of you interested in a deeper dive into human and machine intelligence.
I'm Christopher Manning, an associate director here at the Stanford Institute for Human-Centered Artificial Intelligence. I'm also a professor of linguistics and computer science at Stanford University, and I'm the director of the Stanford Artificial Intelligence Laboratory. This is an event that brings together multiple groups at Stanford, and we'd like to thank our co-sponsors for this event: the Stanford Wu Tsai Neurosciences Institute, the Stanford Department of Psychology, and the Stanford Symbolic Systems Program. Let me now introduce my co-organizer, Surya Ganguli, associate professor of applied physics, neurobiology, electrical engineering, and computer science here at Stanford. Hello, Surya.

Hey, thanks, Chris. Welcome, everyone. Of course, we had originally hoped to hold this conference in person last spring, but a few things happened along the way. This is the third large virtual conference that we've hosted this year, so we're hopefully starting to get the hang of it. We're pleased that all our keynote speakers have joined us today, and we have a great lineup for you. Before we begin, a couple of quick logistical items. Please do bear with us if there are any technical glitches during our time together. Unless speakers finish their talks early, you will hear time checks for speakers.
And if they go on too long, we may have to cut them off so that we stay on time. If by any chance networking goes down for us or for you, please do go to hai.stanford.edu and look for alternative ways to view the conference. Today we have two sessions, the first one with three speakers and the second one with four speakers, and we'll have an extended discussion section, live with Q&A, following each session. Then we'll wrap up the day by hearing a bit more about the broader Stanford context for AI, cognitive science, and neuroscience in a final discussion at the end of today's event. We want you to be involved in the discussion. Please go to the conference page on our website: just go to hai.stanford.edu and click the link for the conference, and there you can type in your questions. We'll try to get to as many of your questions as possible during the discussion periods. For this event we're also going to use the Twitter hashtag #NeuroHAI, N-E-U-R-O-H-A-I, all one sequence of letters. If you tweet about talks, we encourage you to use this hashtag so the Stanford HAI team notices. We'll aim to take brief breaks between the three sessions today and try our best to stay on schedule. Human-centered artificial intelligence is about the connection between people and intelligent machines. There are many different aspects of that, including shaping the impacts of AI on society in a positive direction, but today we're going to focus in on the opportunities for synergy between understanding human and machine learning and intelligence. Before we hear from our presenters today, I'd like to let Surya tell us a little bit more about our vision here.

Okay, thanks, Chris. Hopefully you can see my slides. I just wanted to set up the intellectual themes of this conference. It's useful to start at the very beginning: how did we all get here today?
Right, so our story, or at least the story of our brain, started about 500 million years ago, when the first vertebrate brains appeared on this planet. Then about two million years ago, the first glimmers of human-like intelligence appeared, and modern humans, in their final evolved form, appeared about a hundred thousand years ago. This is a really remarkable achievement, because currently we have zero engineering design principles that can explain how a complex communication, control, sensing, memory, and power distribution network like our brain could continuously scale in size and performance over 500 million years without skipping a beat, without ever losing function. We are so far from understanding the theoretical principles governing this amazing feat, so we really have a lot to learn from evolution, even despite the incredible success that machine learning has had in industry as of today. So just to set the stage, let's zoom in on order-of-magnitude discrepancies between what machines and humans can do. Sorry, hold on, I need to share my desktop. This worked right when I was testing it, so let me try something different. Okay, there we go. So one major discrepancy is of course the robustness of humans versus machines. We all know about adversarial examples, where artificial networks in vision, speech, language, and reinforcement learning can be fooled by illusions that simply don't fool humans. So why do these exist? How does our brain avoid them? How do we make AI more robust? Another order-of-magnitude discrepancy is energy expenditure, which is becoming very important in industry. Modern supercomputers spend about 10 megawatts of energy, whereas the human brain spends 20 watts. So our supercomputers
consume hundreds of thousands of times more power than our brain, but cannot do what we can do. A likely problem is that supercomputers are extremely fast and precisely control the flips of digital bits, but the laws of thermodynamics exact a powerful energy cost for every precise and fast bit flip. In contrast, biology is slow and noisy but just good enough, and so I think there are powerful principles we can glean from biology to create more energy-efficient AI. Credit assignment is a major open problem. Just to state it dramatically, here's one of my favorite tennis players from my youth, Ivan Lendl. Let's say, for example, you hit a tennis ball incorrectly. You have a hundred trillion synapses; which one screwed up? How does the brain figure out how to fix the wrong synapses? How does it rewire itself?
Neither theoretical neuroscience nor AI can solve this problem adequately at the moment. So a major open question is: can synaptic physiologists, theoretical neuroscientists, and AI engineers work together to solve this foundational problem? Data hungriness is a major issue. If you look at recent successes in industry and translate the amount of data that these systems used into human terms: early speech recognition systems used the equivalent of 16 years of someone reading you text, two hours a day, every day. AlphaGo practiced 450 games a day, every day, for 30 years. And visual question answering datasets are equivalent to receiving answers to about 100 questions about images every day for 274 years. So it's clear that humans use much less data to achieve what they do, and we need better algorithms for learning from unlabeled data and for extrapolating information from other tasks to solve a new task. Of course, humans have also set up a system of culture and curriculum design to teach humans using carefully sequenced tasks, and we need better ways of doing that in artificial systems as well. Babies also learn very differently from machines. There's a famous experiment where a baby is given two objects: one that, in a video, magically doesn't fall when you drop it, and another that, in a video, magically passes through a solid surface when you move it. Then you let the baby play with those two objects. The object that didn't fall, it will specifically throw off the edge of the crib to see if it really doesn't fall; the object that went through solid surfaces, it will specifically bang against the crib.
Right, so this is remarkable. This is saying that even babies build complex internal models of how our world evolves; they pay attention to events that violate their world model; they perform active experiments to test their model and gather their own data, rather than just passively receiving data; they use the world model to plan and imagine alternate futures; and as they grow up, they can take actions to bring these alternate futures into being, into reality.
We're so far from doing that in artificial systems, though you'll see some examples in our talks today. Okay. So an oft-quoted trope in arguing for why we should ignore biology in creating artificial machines is the analogy between birds and planes. Now that we have jet planes, it seems almost like folly to build airplanes designed upon the operating principles of birds. But if you unpack this analogy a little bit, you realize that there are multiple problems that need to be solved in making objects fly. Two primary problems are lift and thrust: keeping something up and moving something forward. Of course, airplanes and birds solve the problem of thrust in very different ways: jet engines for airplanes and flapping wings for birds. But the problem of lift is solved in exactly the same way, with an aerodynamic wing that creates low pressure on the top and high pressure on the bottom. In fact, our artificial gliders behave very much like gliding birds. So just as there exist laws of aerodynamics that govern all flying objects, whether biological or artificial (and woe unto the object that tries to violate the laws of aerodynamics, because it will fall out of the sky), what we'd really like to pursue is an understanding of the fundamental laws and principles that govern how high-level cognition and behavior can emerge from nonlinear, distributed dynamical circuits, whether biological or artificial. Of course, AI systems may do some things differently from humans to achieve their end objectives, but there are likely common laws that govern artificially intelligent systems, just as there are common laws that govern flying objects. All right, so if you're interested in more information and thoughts, I actually wrote a blog post titled "The intertwined quest for understanding biological intelligence and creating artificial intelligence,"
which you can find on our blog, with a much more detailed discussion of some of the things that I talked about here. Just to give you an overview of the topics we'll discuss today: in our morning session we have an amazing speaker lineup. We'll start with Matthew Botvinick from DeepMind, who will talk about really melding AI, psychology, and neuroscience. We'll have Dan Yamins talking about alternate methods for learning, especially self-supervised learning from unlabeled data, which was one of the ideas that I mentioned, for visual representations and also other representations; Dan works both in psychology and in AI. Chelsea Finn has done beautiful work in robotics and meta-learning, and she'll tell us how not to create a robot's mind; knowing what not to do is a very powerful way to zoom in on what to actually do. And then after that, Chris and I will have a panel discussion with the speakers. By the way, we're relying on you for your participation. This is a QR code which, if you scan it, will take you directly to our website, which is right here, and you can write in your own questions and vote questions up or down throughout the entire morning and afternoon. We will see all of your questions, and we'll ask as many of the interesting questions as we can. Okay, in our afternoon session we'll have a theorist, Sanjeev Arora, talk to us about how we can use deep learning to implement a very serious human value, which is privacy. Yejin Choi, who's done beautiful work on natural language processing and common sense reasoning, will talk about common sense intelligence, cracking a long-standing challenge in AI. Aude Oliva will talk about incorporating lessons from cognitive science into AI.
And Josh Tenenbaum will talk about some of the scaling issues in terms of how much data we need, by looking at alternate algorithms like probabilistic programs, and about models, or game engines, in our brain that might govern the way we think about
physics and reasoning. And then we'll have another panel discussion with these speakers, where again we'll take your questions, which will accumulate, and which you will vote upon, throughout the day. And then we'll end with an overall panel discussion. We'll have two very important visitors from our co-sponsors: Bill Newsome, who's the director of the Wu Tsai Neurosciences Institute at Stanford and also a member of the Department of Neurobiology, and Michael Frank, who's the director of the Symbolic Systems Program and also in the Department of Psychology. Chris and I will join them in an overview, a forward-looking view of how neuroscience, psychology, and AI can work together, and just general thoughts about the day. This should be a lot of fun. Okay, that's my last slide, so without further ado, I will introduce Matthew Botvinick, who is coming to us from DeepMind and who will give us our first talk. Thanks.

Great, hopefully you can hear me. Can you hear me? I can't hear anybody confirming. Yeah, okay, good. And you can see my slides now? Yep. Fantastic. Okay, well then I'll just dive right in. Thanks so much for the opportunity to participate today; I'm honored to be kicking things off. I wish I could have been there in person with everybody on the Farm, which is my alma mater, but I'll take what I can get. I'm going to give a presentation today that I hope will set the table for a lot of the subsequent talks, because I'm going to speak very broadly about my perspective on AI, psychology, and neuroscience. This triangle that the meeting is about has, like for Fei-Fei, been sort of definitive of my intellectual life ever since
I was at Stanford, and I feel like I've spent most of my life now kind of swimming around in this triangle, from corner to corner, thinking about how they might all interrelate. I started doing that in academia, at Penn and then at Princeton, but over the last four years I've had one foot in academia at University College London, while spending most of my time at an AI company called DeepMind, which is in London. I haven't actually physically been to London since the COVID thing began, but I'm looking forward to going back there someday and stopping the telecommute that we're all doing. For those of you who don't know DeepMind, it's an AI company that's involved in a rather diverse set of research activities, but it's perhaps best known for something called deep reinforcement learning, which is a combination of deep learning, or artificial neural networks, on the one hand, and reinforcement learning, or reward-driven learning, on the other. The first high-profile product of this line of research was a paper that DeepMind published in Nature in 2015, where they showed how deep reinforcement learning could be used to learn how to play classic Atari games at superhuman levels. The basic idea is pretty simple: you just take a multi-layer neural network, give it a screenshot, ask it to output an action that one could execute on a joystick, and then you train the whole thing using reward signals based on the points in the game. This basic approach has been elaborated quite dramatically and in many ways since then, both at DeepMind and beyond. Some of you may know DeepMind's recent work applying deep reinforcement learning to the much more complicated video game of StarCraft, which was reported in Nature last year, and also work augmenting the deep reinforcement learning
formula that I've briefly described with model-based search, to attain superhuman performance on complex board games like Go and chess; and further work augmenting deep reinforcement learning with particular kinds of memory systems, so that they can engage in structured inference, as in the differentiable neural computer work that was reported in Nature in 2016.
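The basic deep RL recipe described here (a network maps a screenshot to joystick actions, trained on game-score reward) can be sketched as a tiny Q-learning update. This is an editorial illustration, not DeepMind's actual DQN implementation: the 16-"pixel" screen, the layer sizes, and the learning rate are invented for the example, and only the output weights get a gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in for the game: a 16-"pixel" screen and 4 joystick actions.
N_PIXELS, N_ACTIONS, N_HIDDEN = 16, 4, 32
GAMMA, LR = 0.99, 0.01

# One small hidden layer instead of a deep convolutional stack.
W1 = rng.normal(0.0, 0.1, (N_PIXELS, N_HIDDEN))
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_ACTIONS))

def q_values(screen):
    """Map a screenshot to one Q-value (predicted future score) per action."""
    h = np.maximum(0.0, screen @ W1)  # ReLU hidden layer
    return h @ W2, h

def td_update(screen, action, reward, next_screen, done):
    """One Q-learning step: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    q, h = q_values(screen)
    q_next, _ = q_values(next_screen)
    target = reward + (0.0 if done else GAMMA * q_next.max())
    td_error = target - q[action]
    W2[:, action] += LR * td_error * h  # gradient step on output weights only
    return td_error

# One simulated transition: the agent pressed action 2 and scored a point.
s, s_next = rng.random(N_PIXELS), rng.random(N_PIXELS)
err_before = abs(td_update(s, 2, 1.0, s_next, done=False))
for _ in range(50):  # replaying the same transition shrinks the TD error
    err_after = abs(td_update(s, 2, 1.0, s_next, done=False))
```

The real DQN system adds experience replay (discussed later in the talk) and other stabilizers on top of essentially this update; here the loop simply replays one stored transition to show the prediction error shrinking.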
And then quite recently, we've extended deep reinforcement learning to the multi-agent setting, in work that we reported last year: deep reinforcement learning agents learning to collaborate and compete in visually rich environments in the game of capture the flag. So that's just a taste of the kind of AI research that goes on at DeepMind, and DeepMind is an AI company. The group that I lead, the members of which are shown here, is nicknamed the neuroscience team, and we're the one place at DeepMind where neuroscience, cognitive science, and psychology research is going on as part of the mix. Most of the people on the team are doing kind of mainstream AI research most of the time, but there's a little bit of a difference of culture, in the sense that most of the people on our team come originally from a background in neuroscience or cognitive science, and a lot of what we do is informed by that background. We've written a few things just generally laying out a perspective on how one might bring these fields together, especially in order to support AI. When I first arrived at DeepMind, I teamed up with the CEO, Demis Hassabis, and a couple of other amazing colleagues to write a position paper for Neuron about neuroscience-inspired AI, and then much more recently we wrote a review looking at how this paradigm of deep reinforcement learning can be used as a kind of basis for translating phenomena between neuroscience and AI in general. What I want to do is say a few words about the kind of working philosophy that we've arrived at in my group, and give you a quick whirlwind tour of some of the work that that approach has led to. It's been about four years of experimenting and thinking about the right way to pursue this
virtuous circle among neuroscience, psychology, and AI, and I guess the way I would summarize the approach we've landed on is that it's fairly opportunistic. It's based on an ongoing dialogue between neuroscience and psychology on the one hand and AI on the other, without really worrying too much from moment to moment which field you're standing in. Importantly, we try to set things up so that there are a lot of bilingual people working on our neuroscience projects and on our AI projects, so that the transfer of insights is not between people in every case, but sometimes is happening within one brain; I think that's important. One thing that I want to emphasize is that I believe the transfer of insights between fields is often subconscious. We tend to focus on the schema of a concrete insight from one field being transposed to another field, but I think what often happens instead is that a familiarity with neuroscience will kind of give analogical support for a new idea in AI, or vice versa.
And sometimes it's even hard to track the origins of these connections. Finally, and this is really the point I want to start with in delving into some specifics, the kind of synergy that we're after between these fields is really just a continuation of something that's been going on ever since AI originated as a field, so let me say something about that. If you look carefully at the montage of people in my group, you'll have noticed that they're all quite young, and that, as far as I can tell, is true of the general population of people doing AI research these days. What I've noticed is that it means people aren't always vividly aware of the history of the field in which they're working. A lot of the people I interact with at DeepMind give me the impression that they think AI research began in 2012, which of course is when neural networks came roaring back with a victory on image classification tasks. One thing I'd like everybody to have more awareness of is that this explosion of work on deep learning since 2012 really is just the tip of a historical iceberg, and looking at what's underneath the water, the deeper history of deep learning research, makes extremely vivid how the synergy among these fields can play out. If you go back to the very earliest papers that describe computer implementations of neural networks, they're very explicitly motivated by ideas and observations from neuroscience. If you look at the earliest paper that describes a learning algorithm for artificial neural networks, same thing: it's basically a psychology or neuroscience paper. If you look at the first high-profile report of the epoch-making backpropagation algorithm, which we all use in deep learning research ubiquitously to this day,
the first author is a psychologist. And the first explorations of the implications of backpropagation in neural networks were reported in those historically important books, the Parallel Distributed Processing volumes, which were written largely from a cognitive science and neuroscience point of view; I just want to give a shout-out to Jay McClelland, who was one of the key authors of these volumes and of course is a Stanford professor. One reason that I think a lot of young people believe AI research and deep learning research began in 2012 is that there was in fact a quiet period in neural network research leading up to it. Neural network research used to be referred to as connectionist research, and there was something that some of us refer to as the connectionist winter, when it became very difficult to publish neural network research because other computational paradigms became much more popular. In fact, I have to laugh, because this is from a meeting in 2013 where a famous connectionist researcher named Mark Seidenberg presented, and he entitled his talk, somewhat sardonically, "I remember connectionism." But if you look at the issues that came to the fore even during that connectionist winter, you see again that they were inspired not only by psychology but in fact by the same neural network ideas that had been popular just years before. If you go to Judea Pearl's pivotal book on structured Bayesian inference, which set the scene for a lot of the Bayesian cognitive science that took center stage during this winter, not many people are aware that Pearl's work was in fact inspired by the connectionist research of Rumelhart and McClelland. And of course in 2012, when neural networks did come roaring back, especially in engineering, neuroscience
again played a pivotal background role. If you look at convolutional neural networks, which were responsible for triggering this renaissance in deep learning,
those of us who know the history are aware that they were very directly based on earlier work by Fukushima and others, which proposed neural network architectures directly inspired by knowledge about visual cortex in the brain. The same kind of connections exist in deep reinforcement learning. If you look at the reinforcement learning algorithms that are brought to bear in this work, and in particular temporal difference learning, which is one of the most important, the originators of that algorithm, Sutton and Barto, started out thinking very specifically about animal learning. This was not a computer science project per se; they were trying to figure out how animals learn in the ways that they empirically do. And of course temporal difference learning, as many people in the audience will know, has since been tracked back into neuroscience as an account of how the dopaminergic system works. One more point along those lines: one of the main things that allowed deep reinforcement learning to work, practically speaking, in the Atari project is something called experience replay, which is a replay of memories, and further learning based on that replay, that was directly inspired again by neuroscience, based on observations about what happens in the hippocampal formation. Very recently, those who keep up on the AI literature will be very aware of something called the transformer model, which has been applied in language modeling to great effect by multiple groups. And here again, even though these breakthroughs in language modeling are happening very much in a computer science or AI context, if you just go back a little bit, you'll see that the original approach to exactly the same problem was pioneered by a cognitive scientist, Jeff Elman, who really started the work that's now bearing such amazing fruit
in this recent research. So anyway, all of that is just my kind of cranky-old-man version of a historical perspective, and I like the people I work with to know that we're just carrying on a tradition. But having given you that historical background, let's talk about what's going on now; in the background of what I'm about to describe, I think you'll see this three-part strategy. Let me tell you about some of the work that we've been doing in my group at DeepMind, again kind of trawling these triangular waters among neuroscience, psychology, and AI. This is going to be a bit of a whirlwind tour, and I apologize if it's superficial, but I'd rather everybody get a kind of bird's-eye view of the range of things we've been doing, and the flavor of our approach, than the details of any one project. So let's start with a paper that we published in Nature Neuroscience in 2018, which is closely connected with a review that we published last year in Trends in Cognitive Sciences, about something called meta reinforcement learning. Here we started with an observation about recurrent neural networks. A recurrent neural network is just a deep neural network that has connections among its internal units, which allows the system to have memory in its activations, very much like working memory in the brain. What we observed is that if you train a recurrent neural network of this kind using reinforcement learning (that's what the red arrow stands for: you use reward prediction errors to train the network's connection weights), and you train it to go from visual observations to actions in the standard reinforcement learning way, along with some auxiliary inputs and outputs that we can ignore for the moment, something very interesting happens.
If, in addition, you train the system not just on one task, for example a two-armed bandit problem, but instead on a family of interrelated tasks, what happens is that after the system has enough experience with that family of tasks, you can turn off the reinforcement learning algorithm. You can freeze the connection weights in the network, and then when you present the system with a new problem from the same family, it can solve it even without synaptic changes. So this is just an illustration of a network with frozen weights
solving a two-armed bandit task. You can see it exploring left and right actions in an easy problem and a hard problem, and this is just a regret curve showing that the system is doing pretty well on this bandit problem compared to off-the-shelf bandit algorithms that form a gold standard for comparison. That meta reinforcement learning principle was purely an AI observation, but because the people involved in this research were bilingual in the way I mentioned earlier, some of us got to thinking about potential analogies to brain function. We thought, well, if we know that the dopamine system appears to act in a way that's analogous to this reinforcement learning algorithm we're applying, and we know that the prefrontal cortex has highly recurrent connectivity, lots of loops, then maybe we can think of this recurrent network that we're training as a model of the prefrontal system as it's trained up by reinforcement learning; maybe the working memory functions of the prefrontal cortex are trained by the synaptic changes that are driven by dopamine. And that's what the paper is about. We show things like this: if you train a recurrent neural network on a task that has been used in studies with monkeys, you find behavior in the neural network that closely resembles the behavior of the monkeys. And then if you look inside the neural network at what the internal units are coding for, you find a profile of codes that closely resembles what you see in the neuroscience study, where electrodes have been placed in prefrontal neurons and their receptive fields have been measured. That's just a quick taste of this work; I encourage you to look at the Nature Neuroscience paper if you're interested in more details. I'll just mention that we've carried this work forward: we have a paper on arXiv that characterizes
this meta-learning process in terms of amortized Bayesian inference, and we're continuing to partner with neuroscientist collaborators to show that meta-RL actually provides a better explanation of prefrontal function than some standard reinforcement learning models; that's work we're preparing to submit soon. But for the moment let me move on and tell you quickly about another project in which we've trolled these waters between neuroscience and AI. This is a paper we published earlier this year in Nature, about distributional reinforcement learning and dopamine.

In the AI world there's been a lot of interest in something called distributional reinforcement learning. The idea is this: in reinforcement learning, an important component is making predictions about how much reward you can expect in the future, given where you are now. The standard approach is to represent that future prediction as a single number. But the distributional approach says: what if instead we represent the future expectation as a full probability distribution? An agent playing an Atari game might say, I might make this many points or I might make that many points; a humanoid jumping over a chasm might think, I might make it over and get a lot of reward, or I might fall in and get less reward. And that distribution evolves over time. What's been found in AI work is that this distributional approach has very dramatic payoffs for performance in reinforcement learning systems. In the work that we did, we decided to ask, again drawing on our bilingualism: I wonder whether the dopamine system might use a similarly distributional code. Long story short, we found strong evidence, in a study done in collaboration with Naoshige Uchida's lab at Harvard,
that in fact you can find parallels. We studied dopamine activity in a task where mice were given probabilistic juice rewards, and the rewards were drawn from a distribution that was multimodal, shown here in gray. Using just the activity of the dopamine neurons recorded from these animals, we were able to decode, or infer, the full distribution of rewards that the animals were experiencing. This is something you shouldn't be able to do: in modeling work we show that you shouldn't be able to do this if the dopamine system were operating in the classical fashion. But if it's operating in a distributional fashion, then you should be able to decode this distribution of rewards, and that's one of the things we found. Again, many more details in that Nature paper, but in the spirit of this whirlwind tour let me move on to another example.

This is work that we just put on arXiv; it hasn't been published yet. I think it will connect closely with what I anticipate Dan Yamins will be talking about in his presentation, so I'm excited to put it on the table here alongside the cool work he'll describe. This is work led by Irina Higgins at DeepMind, in collaboration with Doris Tsao at Caltech. The basic idea is to draw on an unsupervised learning architecture that's been of great interest in AI recently, called a beta variational autoencoder, or beta-VAE. The short story is that this learning architecture uses an objective function that encourages an internal representation that discovers disentangled features. So if you train it on pictures of chairs, a single internal unit in the system will end up coding for the type of the legs on the chair, a different internal unit will code for the overall type of chair, and a different internal unit will code for the rotation of the chair. These disentangled features depend very much on the nature of the data on which the system is trained; if you train it on faces, it will discover disentangled features for faces, for example hair color and so forth. So again, using our bilingualism,
we decided to ask: I wonder whether the visual system in actual brains might discover disentangled features of this kind. Since we have this AI model, we can do a direct comparison between the codes observed in visual neurons in a primate brain on the one hand, and individual units in a beta-VAE on the other. Long story short, we find a close correspondence. Looking at neural activity in a face patch in macaques, we find that the activity of individual neurons tracks very closely with individual units in a beta-VAE trained on exactly the same visual data. In fact, the one-to-one matching, using a variety of ways of measuring this, is much closer for the beta-VAE than for many other deep learning models, including both unsupervised or self-supervised and supervised varieties. Again, no time to go into the details, but hopefully some of you will check out the arXiv paper.

The work I've described so far is very much about taking ideas from AI and dragging them into neuroscience. Let me talk very briefly about a couple of projects where we've gone in the other direction, taking insights from psychology and neuroscience and trying to bring them into AI. I'll be brief and superficial here because I have limited time, but again the purpose is to give you an overall flavor. One thing we have a pretty good idea of from psychology and neuroscience is that the visual system carves the visual world up into discrete objects. Very strong evidence for this was shown in classic work by Egly, Driver, and Rafal,
using an attentional cueing paradigm; many of the psychologists in the audience will be familiar with this classic work. In neuroscience it's received convergent support from studies by Roelfsema and others in primary visual cortex, showing that entire objects are selected by attention as wholes. In parallel, we now have the opportunity to experiment with whether this might help in our AI systems, because we have new unsupervised learning algorithms that can pick out discrete objects in visual data, for example a system called MONet that came out of our group at DeepMind, though there are now a variety of algorithms that can do this. In the work I'll briefly describe, we take this ability to carve out discrete objects and combine it with another insight from cognitive psychology: that visual scenes are represented not only in terms of objects but in terms of relations among those objects. Here we have the benefit of recent deep learning architectures that are relational in nature, like transformer architectures, which ordinarily represent words given their larger context. What we can do is take that relational representation and combine it with object identification, applying it not only to language data but to visual data as well, putting objects instead of words into those slots.

I'll briefly describe, in three minutes, two projects where we've used this strategy. One is a project led by Luis Piloto, about intuitive physics. Here we were inspired by developmental psychology, where an understanding of intuitive physics in young children is probed by looking at what surprises them: basically showing them magic tricks that violate the laws of physics, and seeing whether they're surprised, as a way of telling whether they understand
what should have happened based on the ordinary laws of physics. What we did was create a data set that's very much like the violation-of-expectation events shown in these developmental studies: for example, ordinary physical events illustrating something like directional inertia, and then also magic tricks for comparison. We wanted a system that would be more surprised by the magic tricks. Another phenomenon we studied in this work is something called continuity, again drawing on developmental psychology work; here's an ordinary physical event in our data set, and here's the magic trick, which we want to be surprising. Another is object persistence: dropping a board on an object, where in the magic trick the board goes all the way to the ground, again drawing on classic findings from developmental psychology. We trained our system on freeform physics data, so we didn't train it on these phenomena; we trained it on a much more general set of videos.
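A violation-of-expectation probe of this general kind can be sketched in a few lines. This is a toy illustration under my own assumptions, with a hand-written constant-velocity predictor standing in for a learned video model, not the architecture from the actual work: surprise is scored as accumulated prediction error, and a physics-violating clip should score higher.

```python
import numpy as np

def predict_next(prev, curr):
    # Hypothetical dynamics model: assumes constant velocity
    # (a stand-in for a learned video-prediction model).
    return curr + (curr - prev)

def surprise(traj):
    # Accumulated squared prediction error over the clip,
    # used as a "surprise" readout.
    err = 0.0
    for t in range(2, len(traj)):
        pred = predict_next(traj[t - 2], traj[t - 1])
        err += float(np.sum((traj[t] - pred) ** 2))
    return err

# A physically ordinary clip: an object moving at constant velocity.
possible = np.array([[t * 1.0, 0.0] for t in range(10)])

# A "magic trick": the object jumps discontinuously mid-clip,
# violating continuity.
impossible = possible.copy()
impossible[5:, 0] += 4.0

print(surprise(possible), surprise(impossible))  # the violation scores higher
```

The same comparison, surprise on matched possible versus impossible events, is what separates object-based models from baselines in the developmental-style evaluation.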
And we used this strategy of carving the visual data up into objects using unsupervised learning, and then using a relational architecture to represent those objects in a context-dependent way. I won't go into the details of the architecture, but suffice it to say that that was the overall strategy. What we find is that using this approach we get very robust surprise effects across five phenomena, or concepts, that have been studied in the developmental literature. That's what the bars in green show: the results from Luis's architecture, which he nicknamed PLATO. But importantly, a set of very closely matched baseline models that did not use an object-based representation showed much less clear surprise effects, suggesting again that an object-based representation, like the one we see in the brain, is useful for learning intuitive physics.

Okay, one more minute, and I'll briefly give you a sense of another study that we've just recently completed, which uses a similar approach to do visual question answering. This is based on a very cool data set just published by Josh Tenenbaum, Pushmeet Kohli, one of my colleagues at DeepMind, and others. The work on our side was done by David Ding and some other colleagues who are truly amazing. The data set involves videos and a variety of questions that the system has to learn to answer: some descriptive, some explanatory, some predictive, and some counterfactual, what would have happened if. The work that originally reported this data set used a very elaborate architecture that has a Python module that executes programs that generate the data in order to do inference. What they showed was that baseline models had a lot of difficulty with counterfactual and predictive questions, but their neurosymbolic system did a bit better.
What we showed, in the work I'm just wrapping up describing now, is that when we use an object-based and relational deep learning model, without a symbolic component beyond that, we actually do much better. So again, deep learning based on objects and relational representations, as inspired by neuroscience and psychology, does really well in these structured domains.

Okay, one final word, going over by thirty seconds with your indulgence. Hopefully I've given you a sense of the approach that we're taking, trolling these waters. I just want to add one more point, which is that we're getting excited about exploring not only neuroscience and psychology but also social science. We've published a bunch of papers lately looking at multi-agent reinforcement learning, comparing it to what happens in human groups in things like sequential social dilemmas. We're beginning to think this is a very important area of study if we want to understand how AI systems are going to interact with humans when they're deployed in the real world, obviously something that's very central to the HAI paradigm at Stanford. So I'm looking forward to talking about that maybe a bit more in the discussion. And that's it, thank you.

Hey all, thanks so much, Matt. We're going to take a couple of minutes' break, and we're going to switch the video to YouTube, because I think it will improve a lot of people's video quality. So just hold on a bit, and the livestream should be up again in a couple of minutes. All right, thank you very much.

Okay, welcome back, everyone. Sorry for the glitches we had; we've switched to YouTube, and I encourage you all, just to be safe, to refresh your browser. Without further ado, I would like to introduce our next speaker, Daniel Yamins. He's an assistant professor here at Stanford of computer science,
and also a faculty scholar of the Wu Tsai Neurosciences Institute. So Dan, please unmute yourself, share your presentation, and take it away.

Hi everybody, thanks so much for having me. I'll also say that I'm a member of the psychology department at Stanford as well, so that's an important part of our triangle here: neuroscience, psychology, and AI. In fact, in the Stanford NeuroAI Lab, which I direct, we work on two mutually reinforcing goals. One is the idea that understanding brains and human cognition will allow us to build better AI and machine learning techniques, and vice versa, that those techniques will in turn allow us to make better models of cognition and the brain. In practice, what that means is we build neural networks to solve challenging cognitive problems and then use those models
to make quantitative predictions about brain data. My talk will be in two parts: part one will be about AI helping neuroscience, and part two will be about the other direction. So let's start with AI helping neuroscience.

A core problem in cognition is the seemingly simple problem of visual understanding: being able to take input images and understand what's in them. The brain areas thought to support this are part of what's called the ventral cortical pathway. Images come in on the retina and get processed through a series of hierarchically arranged brain areas, V1, V2, V4, IT, and by virtue of that support high-powered visual cognition by the time the data reach inferior temporal cortex, or IT, at the top of the pathway. From a computational neuroscience point of view, a very natural idea is therefore to build a quantitative model of neurons in that system. It's no news to anyone at this point that a natural way to do that is to use convolutional neural networks, which were actually originally built to condense the rough neural anatomy of the ventral stream, both in its being hierarchical and in its being retinotopic. So how do you actually produce a convnet that looks like the brain? The natural strategy that we have found particularly useful is to optimize the network for performance on a task, like 1000-way ImageNet categorization, and then compare it to the brain on a unit-by-unit basis. It turns out that if you do this, neural responses in the brain, in this case a face neuron, a neuron that responds highly to face images, that's the black line, are well predicted by the neurons in the top hidden layer of a neural network that is solving a categorization task. This neuron is one among many, but if you look across a lot of neurons, you see that these kinds of task-optimized convolutional neural networks are today's by far best models of neural responses in the visual system. And when you take a model that's appropriately deep, with about the right number of areas to match the progression of visual areas in the brain, not only are higher cortical areas, where high-level abstractions are computed, matched by higher layers of the model, but intermediate cortical areas are matched by intermediate layers of the model, in this kind of amazing way: nobody really knows how to describe in English what the features in those middle layers are, and yet those intermediate layers are matched by the functionally optimized models. Early cortical areas, which are thought to compute things such as edges, are matched by the earliest layers. So there's a very strong correspondence between both the virtual neuroanatomical structure and the features that come out of these deep neural networks.

Now, you might say deep neural networks are kind of black boxes, but I push back on that by pointing to some key principles in this goal-driven modeling idea. You start with an architecture class, a task or objective, a data set, and a learning rule. Our theory of the visual system at this point is, in some sense, to say that the correct architecture class is the set of convnets, convolutional neural networks of reasonable depth; the task or objective is something like multi-way object categorization; the data set is something like ImageNet images; and the learning rule is some combination of evolutionary architecture search and filter learning through something like gradient descent. That's the machine learning way of saying it, but each of these ideas can also be given a neuroscience interpretation: the architecture class as a kind of circuit neuroanatomy, the task or objective as a kind of ecological niche, the data set as something like the environment, and the learning rule as something like a combination of natural selection and synaptic plasticity. The flow is that you pick the right architecture class so that you can solve the ecological niche, situated in the proper environment, and update parameters that change the system according to the learning rule. This principle has been really useful in thinking about lots of things, not just the visual system but the auditory system and parts of the motor system; it's been very successful, in a certain sense. But it has some very substantial problems, which drive a lot of where I think future progress is happening, and many of these relate to what Surya and Matt have talked about. On the one hand, there's something wrong in the architecture class: everybody knows the visual system has lots of recurrence and feedback, which is definitely not present in the feed-forward models I've shown you. But what are those connections for? Maybe an even bigger problem is that too much labeled data is required to train object-categorization-like tasks, as Surya mentioned, so the task or objective is somehow broken. Of course, these things are trained on data sets like ImageNet, not on the real, noisy video data streams that real organisms get. And finally, backpropagation is well known to be non-biological in a variety of ways. So what I'm going to do is talk about solutions to some of these problems, as part of the way that AI has been able to help us get a better handle on neuroscience, going beyond the basic equation of CNNs and the visual system.

As I mentioned a moment ago, real neural networks, that is to say the ones in the brain, are full of feedback, both long-range recurrence and local recurrence. Feed-forward structures, on the other hand, can't produce non-trivial dynamics; basically, they can just make square waves. But real IT population dynamics in the brain are informative: they show that different images are solved at slightly different times. The reason, essentially, is that hard images get solved late, whereas easy or control images get solved earlier. All of this suggests that we should convert from convnets to convolutional recurrent networks, not just feed-forward, but with some kind of recurrence, and that recurrence should be both short and local, as in the green arrows, and long-range feedback from area to area, as in the red arrows.
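As a toy sketch of the local part of that idea (my own minimal example, not the reciprocal gated cell used in the actual work), a convolutional layer can be made recurrent by convolving its own previous state back into its input:

```python
import numpy as np

def conv2d(x, k):
    # 'same'-padded 2-D cross-correlation of a single-channel map.
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

class RecurrentConvCell:
    """Toy convolutional recurrent cell: h_t = relu(W_ff * x + W_rec * h_{t-1}).

    W_ff stands in for the feed-forward drive and W_rec for local recurrent
    connections (the green arrows); long-range feedback (the red arrows)
    would add a third term driven by a higher layer's state.
    """
    def __init__(self, rng):
        self.w_ff = rng.normal(0, 0.3, (3, 3))
        self.w_rec = rng.normal(0, 0.3, (3, 3))

    def step(self, x, h):
        return relu(conv2d(x, self.w_ff) + conv2d(h, self.w_rec))

rng = np.random.default_rng(0)
cell = RecurrentConvCell(rng)
x = rng.normal(0, 1, (8, 8))   # a fixed, static input image
h = np.zeros_like(x)           # state starts at rest
states = []
for t in range(4):             # unroll a few time steps
    h = cell.step(x, h)
    states.append(h.copy())

# Unlike a feed-forward layer, the response to a static input
# keeps evolving over time.
print([float(np.abs(s).sum()) for s in states])
```

The point of the sketch is just that a static image now produces a trajectory of responses rather than a single fixed activation, which is what makes comparisons to neural dynamics possible.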
And if you do this, it turns out that if you build a sort of shallow feed-forward network with about the right number of areas as the brain, you do only so-so on ImageNet performance. If you make the model a lot deeper, which is definitely not how we think the brain does it, you can improve performance a lot. With standard forms of recurrence, such as LSTMs, you can do okay at augmenting performance, although in this case most of the gain is due to having additional parameters. But if you have the right types of local recurrent structures and the right types of long-range feedback, namely reciprocal gated units and long-range feedbacks, you can very effectively convert space into time and make shallow recurrent networks that have about the same performance as substantially deeper feed-forward networks.

You can then ask whether those local recurrences and long-range feedbacks not only give you improved image performance but also give you better predictions of neural dynamics in the visual system. I mentioned challenge images and easy images before: the challenge images are the ones that are hard for computer vision systems to solve, and they get solved later by the brain, about 20 or 30 milliseconds later in the macaque. The question is how well ConvRNNs do at making predictions about when the images are solved. It turns out, on this plot of fraction of hard images solved versus ability to predict IT neural dynamics, that convolutional recurrent networks are by far the best, even compared with the deepest feed-forward networks. Deep feed-forward networks can solve challenge images, but they have a hard time correctly predicting when those challenge images are going to get solved the way the real brain does, and the convolutional recurrent networks do well at that. In particular, the way they do well at that is that, unlike shallow feed-forward networks, which don't do well on performance, or deep feed-forward networks, which do well on performance but have a large number of units, convolutional recurrent networks hit a really good sweet spot: very high performance achieved with a very small number of units, so that, so to speak, they can actually fit in the head. The idea is that recurrence is being used to achieve this trade-off between performance and being physically manageable. So results like this, among many others, say that while we haven't quite nailed down this very deep problem, I think we're getting towards okay-ish, which means the models are becoming harder to reject out of hand. Of course, that doesn't address the other problems, so I'll talk a little bit about those now.
The second of these problems is really the supervision problem; Surya and others mentioned this. It's a very deep problem, because there's really just no way that creatures like this receive millions of high-level semantic labels during training. So some form of semi-, self-, or unsupervised loss function has to be produced that is realistically costly but that actually solves the task. To put this more succinctly: imagine you had head-cam data from a child, infants aged say six to 32 months, such as data from the SAYCam data set from Mike Frank's group here at Stanford and others, data that just records what the children are hearing and seeing. How would you use a data set like this to learn a representation? That's a really challenging problem, because there are no labels associated with it.

There's a long history of unsupervised learning, autoencoders among them, which basically take inputs in and try to reproduce those inputs as outputs. It turns out that if you build a shallow version of this with a sparseness penalty on the intermediate layer, you get things that look a bit like V1, an area in the brain. But unfortunately it's really hard to use autoencoding in general, because you're constantly fighting a sort of triviality, and in general it's very difficult to produce deep networks that have a wide range of good features for large numbers of categories of objects. There are other types of approaches to unsupervised learning, basically using the ability to project out information and then fill it back in: for example, predicting context in images, learning features by inpainting, or, one of my favorites, image colorization, where you essentially take a grayscale image and color it; you knew what the colors were, so you can do that. This forces representations to be pretty good, because to get the right color you have to know where the boundaries are, and you kind of have to know what the objects are to get the colors in the right place. But these methods have also been limited in their ability to produce deeper networks and to take advantage of those structures to improve performance to anything like supervised levels.

More recently, in the last two years or so, there's been some really serious progress in the domain known as contrastive embeddings. Essentially, these contrastive embeddings use something like the convnet, but instead of training it to solve a categorization problem, they try, in various different ways, to train the network to produce embeddings that push similar things closer together and not-so-similar things further apart. Various methods here have done quite well at producing unsupervised representations that learn things that look like proto-categories. In fact, today's state-of-the-art unsupervised results transfer to ImageNet substantially better than, say, supervised AlexNet, which was one of the things that kicked off the supervised deep learning revolution; now we have unsupervised methods that do substantially better than even supervised AlexNet. That's very powerful, and it holds not only for these deep contrastive embeddings' ability to do things like categorization, but also for other tasks like object position or object size, where in fact you can see that, because of their generality and non-task-specificity, when asked to transfer to non-categorization tasks, unsupervised networks can do as well as or better than the category-supervised networks. That's really powerful.

Of course, it then raises the question of how well these match neural data, and you can use the same types of techniques I briefly mentioned earlier to compare to categorization-based networks. It turns out that if you do this, the older unsupervised techniques, which perform less well, do not do so well at predicting neural response patterns, but these more recent deep contrastive embeddings do quite well, and the best of them is better at predicting neural response patterns even than supervised categorization-based networks. So we're finally really able to build quantitatively accurate unsupervised models of a higher brain area. And this is true not just in IT but across the visual system: predicting models of V1, V4, and IT, throughout the entire ventral hierarchy. Of course, you would like to operate these things not just on ImageNet; the things I just showed you used ImageNet, not the labels but the images, which is a strong data set. But really what you get is data that looks more like this, where objects are not framed, and what you have is the real visual experience of children.
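The push-together/push-apart objective behind contrastive embeddings can be sketched as an InfoNCE-style loss. This is a generic sketch of the family of methods under my own assumptions, not the specific algorithm used in this line of work: embeddings of two views of the same input are treated as positives, and everything else in the batch as negatives.

```python
import numpy as np

def normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def info_nce(z1, z2, tau=0.1):
    """InfoNCE-style contrastive loss for a batch of paired views.

    z1[i] and z2[i] are embeddings of two views of the same input
    (the 'similar things'); all other batch items act as negatives.
    """
    z1, z2 = normalize(z1), normalize(z2)
    sim = z1 @ z2.T / tau  # temperature-scaled cosine similarities
    # Cross-entropy with the matching index as the target:
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
base = rng.normal(size=(16, 8))
aligned = base + 0.01 * rng.normal(size=(16, 8))  # views of the same inputs
random = rng.normal(size=(16, 8))                 # unrelated embeddings

# An encoder whose two views of the same input land close together
# gets a much lower loss than one producing unrelated embeddings.
print(info_nce(base, aligned), info_nce(base, random))
```

Minimizing this loss is exactly the "similar things closer, dissimilar things further apart" pressure described above, with no category labels anywhere in the objective.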
And it turns out that if you build a video version of contrastive learning on the types of data that come from something like the SAYCam data set, this head-mounted infant data set, we can now start to build networks that achieve almost the same ability as on framed, ImageNet-like supervised data, but from these completely naturalistic, noisy data streams. That's really beginning to say that, at this point, not only are we able to make unsupervised models that predict neural responses, we can actually do so from the real data streams that organisms get, and by virtue of that maybe begin to have a plausible model of actual developmental trajectories. Of course, that's a big thing to ask of the future. But what I would say here is that, taking these second and third problems together, the issue of not needing mega-labels, and being able to learn from real videos from real developmental data streams, these big problems in thinking about the visual system have started to become basically okay-ish, if not exactly okay. So I think that means AI has really been able to help us nail down some of the biggest problems in trying to build stronger links between neuroscience and computational techniques. I'm not going to talk about the learning rule; that's a discussion for another day entirely. But the takeaway here is that there's been a lot of progress in using better AI and ML techniques to understand brains in this much more realistic way.

I'm now going to switch gears and spend the rest of the time talking about the opposite direction: how cognitive science can help us build better AI, and then maybe, eventually, better models in neuroscience. The underlying idea here is very similar to what Matt was talking about. Essentially, you've got an agent and an environment; the agent can be a baby or it can be a machine. It takes in and perceives information from the environment, and it acts back out on it. In that context, one of the key things you want to be able to do is understand how the agent can build a world model, that is to say, learn what's going to happen next, given what I see in front of me, if I take a given action, or even if I take no action at all. You might wonder why you should learn world models, but learning world models is clearly useful from a whole variety of points of view, like having compact abstractions of high-bandwidth sensory inputs, or the ability to plan across long temporal horizons.
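As a toy sketch of what "learning a world model" means (a linear example of my own construction, not anything from the talk), an agent can fit a one-step predictor of the next state from the current state and action:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth dynamics the agent never sees directly:
# next_state = A @ state + B @ action (a stand-in for the environment).
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.5], [1.0]])

# Collect experience: (state, action, next_state) transitions.
states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
next_states = states @ A_true.T + actions @ B_true.T

# World model: predict the next state from the current state and action.
# Minimizing ||next_state - [s, a] @ W||^2 is exactly the
# "predict the future given the current state" loss, fit in closed form.
inputs = np.hstack([states, actions])
W, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)

# Query the learned model from a new state and action.
s, a = np.array([1.0, -1.0]), np.array([0.5])
pred = np.hstack([s, a]) @ W
truth = A_true @ s + B_true @ a
print(pred, truth)
```

In a real agent the state would be a learned latent abstraction of high-bandwidth sensory input rather than a two-dimensional vector, and the predictor would be a neural network, but the objective is the same.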
This is just a very clearly useful thing to be able to do. In other words, what you'd like is to minimize a loss function where you're predicting the future given the current state. But of course pixel prediction, the most natural thing to do, is very hard. There's been some progress made on that by folks like Chelsea, and I'm sure she'll tell you about some new work there, but it's challenging to do. For example, you