So now, I have the distinct honor of introducing our first keynote speaker, MIT Professor emeritus Rodney Brooks, a trailblazer in the world of robotics and artificial intelligence. Rod is a revered figure in our field. His contributions have profoundly altered the landscape of how we think about robots and AI, how we interact with technology, and importantly, how we envision a future alongside it. Rodney Brooks is not just an academic. He is a pioneer who has founded several influential companies.
His entrepreneurial spirit led him to create iRobot, famous for its Roomba vacuum cleaners, and Rethink Robotics, which brought revolutionary changes to industrial automation with its Baxter and Sawyer robots. And most recently, his venture Robust.AI aims to usher in an era of more capable general purpose machines.
Rod's journey reminds us of a fundamental truth, often overlooked, that the heart of scientific inquiry lies not just in the quest for knowledge, but in a deep-seated desire to make our world a better place. From his groundbreaking research at MIT to his entrepreneurial ventures, Rod has embodied this truth. His career is marked by many prestigious awards, including membership in the National Academy of Engineering and the American Academy of Arts and Sciences, the IEEE Founder's Medal, the Computers and Thought award, the NEC Computers and Communication prize, and the Robotics Industry Association's Engelberger Robotics Award. So please join me in welcoming a visionary whose work continues to inspire and challenge our understanding of intelligent machines, Professor Rodney Brooks. [APPLAUSE] Well, hello. And thank you so much for people inviting me here.
I am not a generative AI person by any means, but I want to talk about generative AI today. A lot of people see generative AI bringing manna to the world-- new things, new prosperity, et cetera. But I'm going to concentrate on the mirror-- what it tells us about us. And what are the deep scientific questions? Now, there's a variety of people here.
So I feel I need to set a baseline and talk a little bit about what is in large language models and generative AI. There'll be more about that later this afternoon or later this morning. So if you don't know the technical background, I'm going to just give a little piece of it. And I would suggest the minimal reading you should do is not the stuff under Stephen Wolfram's left arm, but the stuff in his right hand-- this little pamphlet. It's 80 pages long. It started out as a blog post back in February.
And it gives a good overview. Second thing I really strongly recommend if you don't know the technical background is the GPT-4 technical report from OpenAI. It's about 100 pages. The first half of it talks about GPT-4, what it can do, performance on various benchmarks.
And the second half is called the system card, where OpenAI goes into what can go wrong, what it can't do, how to jailbreak it, et cetera. It's a very interesting report. Now, you might ask, should we believe Stephen Wolfram, whose company makes Mathematica, on how ChatGPT works? Has anyone here heard of Sam Altman? Does that name ring a bell to anyone? This is what Sam says on the back of this little brochure: "This is the best explanation of what ChatGPT is doing that I've seen." So it's the truth.
If you go to the website, the blog, the diagrams are in color. And this is the start of the blog. It's from February 14 of this year. What does ChatGPT do and how does it work? It's just adding one word at a time.
This is Wolfram talking. The remarkable thing is that when ChatGPT does something like write an essay, it's essentially just asking over and over again, given the text so far, what should the next word be? And each time, it adds a word. Here's a diagram from Murray Shanahan's recent paper in Nature. Time is going from the left to the right here. It's the same LLM. There's an input, a question, which sets a context.
Write me a fairy tale. And once ChatGPT has written "Once upon," the LLM, the Large Language Model, looks at that and says, ah, "Once upon a." Step over to the middle. "Once upon a."
What's the next word? What should it be? "Time." "Once upon a time." Blank again.
Looks at what it's written so far. It's written "Once upon a time." What's the next word? What should it be? "There." "Once upon a time, there," et cetera. And the point here is if we were writing a fairy tale, we'd think of the whole phrase-- "once upon a time, there was a king," or a dragon, or something.
But it doesn't think that way. It's just one word after another in the context of what it's already generated. And it randomizes what the next word should be a little bit because otherwise, it gets really boring.
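Here's a minimal sketch of that word-at-a-time loop in Python. The score_next_word function is a made-up stand-in for the actual network, and the temperature trick is just one way to add that little bit of randomness.

```python
import random

def generate(prompt, score_next_word, max_words=20, temperature=0.8):
    """Sketch of the loop described above: one word at a time,
    each choice conditioned on everything generated so far."""
    text = prompt
    for _ in range(max_words):
        # The stand-in model returns a probability for every candidate next word,
        # given the whole text so far.
        probs = score_next_word(text)
        words, weights = zip(*probs.items())
        # A little randomness ("temperature"), so it doesn't get really boring.
        weights = [w ** (1.0 / temperature) for w in weights]
        text += " " + random.choices(words, weights=weights, k=1)[0]
    return text

# A toy stand-in so the sketch actually runs; a real LLM computes these probabilities.
def toy_scores(text):
    return {"time": 0.7, "midnight": 0.1, "dragon": 0.1, "king": 0.1}

print(generate("Once upon a", toy_scores, max_words=3))
```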
I asked ChatGPT 3.5 to write an abstract for this talk. And what did it do? It said, title. I didn't ask it for a title, but the context of an abstract for a talk-- in the style of an MIT nerd, I asked it. And it produced this abstract, which is not so bad. It's the sort of thing you should talk about if you're talking about generative AI. Did it get the nerd part? I'm not so sure.
Look at that last sentence. Where is it? Join us for a concise and insightful journey into the realm of generative AI. That sounds more like a National Geographic trailer. It's not MIT nerd talk.
So it didn't get it all right, but it generated it pretty well and has a lot of the issues that Daniela mentioned that we're going to talk about over the next three days. And I'm going to talk about three different versions of ChatGPT-- 2, 3.5, and 4. There's many other LLMs, Large Language Models, from other companies, but I'll refer to these three in particular.
How do they work? This is from the paper, "Attention is All You Need," from 2017, from Google. And it's the block diagram of how these large language models work. On the left, the question goes in. Some input, some processing happens. That gets injected into the middle of the thing on the right.
The thing on the right is the generator, the generative AI that generates the words. On the bottom of that is the output so far, which keeps getting shifted as a new word gets added. And it flows through those boxes, gets some probabilities at the top of what sort of words should come out, what's the likely next word. One gets chosen.
Shift, do it again. And what's in those boxes? Well, those boxes are a very simple computation and a special computation. As Yann LeCun likes to point out, there is no iteration here.
These just flow through boxes. It's like you've got a network. You set the inputs. And the output just flows out. There is no computation, iteration, recursion going on. It's a simple flow-through network.
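As a rough illustration of that point, here's a tiny flow-through network in Python with numpy. The sizes and weights are invented; the point is that each box is applied exactly once, front to back, with no loop that runs until some condition is met and no recursion.

```python
import numpy as np

rng = np.random.default_rng(0)

def box(x, W, b):
    # One box: a linear map followed by a simple nonlinearity.
    return np.tanh(W @ x + b)

# Invented layer sizes, just to show the shape of the computation.
sizes = [16, 32, 32, 16]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(sizes[0])   # set the inputs once
for W, b in params:                 # each box applied exactly once, front to back
    x = box(x, W, b)                # no open-ended iteration, no recursion
print(x.shape)                      # the output just flows out: (16,)
```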
And it's made of neuron models. Now, neurons are what our brains are made of, what's in worms' brains. And the neuron model that is used really started from a paper by McCulloch and Pitts back in 1943-- they later came to MIT, to the Research Laboratory of Electronics-- and then was modified by Frank Rosenblatt in the '50s. Now, you might notice-- no one knew much about neuroscience back in the '40s and '50s. So this is a model from the '40s and '50s of the brain.
That's what this is based on. And what do the simple neurons look like? Well, there's some numbers. They're just numbers that come in as inputs, like the inputs to a neuron going through the synapses. There's weights.
This is the j-th neuron of a big network. That's what the subscripts j are. There's weight-- w1, w2, wn. These numbers, these weights get adjusted during learning. I'm not going to talk about how the learning happens, but that's where the knowledge of the network is.
And there's 175 billion of them in ChatGPT 3.5. The weights get multiplied by those anonymous numbers that are the inputs, and summed together. That's the net input, net j.
And then it goes through a transfer function to produce a 0 or a 1, or a minus 1 or a 1. The top one is what was used in the '50s. It got modified, getting rid of a threshold. And now it's a continuous logistic function. And the important thing about that is it's differentiable, so it can be used for back-propagation in learning, which I'm not going to talk about anymore, but there's just some numbers that go in, get weighted, and another number comes out at the end.
And that's what those boxes are-- just a whole bunch of those. 175 billion weights. And it works surprisingly well.
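For the record, here's what one of those simple neurons computes, as a few lines of Python; the example inputs and weights are invented.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """A Rosenblatt-style unit: weighted sum of the inputs, then a transfer function."""
    net_j = sum(w * x for w, x in zip(weights, inputs)) + bias   # the net input, net j
    return 1.0 / (1.0 + math.exp(-net_j))   # continuous logistic function, differentiable

# Three made-up inputs, three made-up learned weights.
print(neuron([0.5, -1.0, 2.0], [0.8, 0.1, -0.3]))
```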
These people from a recent paper from Alison Gopnik's lab at Berkeley-- she's a psychologist. She studies children. She has children come in, and tests them, and gets them to do all sorts of things with language, with perception, et cetera. And she and her team say-- oh, by the way, I saw Alison about three weeks ago.
And I said, did you really mean what you said here? Because I don't want to quote you if you didn't really mean it. And she said, yes, I really meant it. And they said, large language models, such as ChatGPT, are valuable cultural technologies that can imitate millions of human writers, et cetera, et cetera.
So she's very positive about ChatGPT. And she studies children, but then they say, ultimately, machines may need more than large scale language and images to match the achievements of every human child. She says, they're good. They're a valuable cultural technology, but they're not as good as children.
And this is a familiar theme. The green stuff, people think they can do, the red stuff, not so much. So that's Gopnik's lab.
She also distinguishes between transmission versus truth. Yejin Choi-- she was at CSAIL, beginning of the month. And she gave a talk. She's a natural language processing person from University of Washington.
And she talked about generation-- good at generation, not so good at understanding. Melanie Mitchell, who's at the Santa Fe Institute-- she talks about memorization. They're good at that, not so good at reasoning. Yann LeCun, Turing Award winner, along with Geoff Hinton and Yoshua Bengio, for deep learning, which is the learning technique used in these systems-- he says, it's good at reacting, not so good at planning. And he alludes to system 1 versus system 2 from Daniel Kahneman in making that distinction. So what it can do, what it cannot do-- there's a lot of people making that distinction.
But I think Subbarao Kambhampati says it in the most interesting way. Subbarao is a professor at Arizona State. He was, until recently, the president of AAAI, the Association for the Advancement of Artificial Intelligence, the premier academic professional society for AI. And he compares LLMs to alchemy.
Now, alchemy, you might remember from about 400 years ago, was, how do you transmute metals? How do you transmute lead into gold? That sounds really silly, but you may have heard of Isaac Newton. He came up with gravity. He came up with optics. Oh, he developed calculus, along with Leibniz. And he was Master of the Mint, producing all the coins of Britain.
But he spent over half his life working on alchemy. It was not a fringe science then. And what Subbarao says is they thought chemistry could do it all, but it turns out they didn't know about nuclear physics. That was really important. And he says it in a cynical way there. If you prompt it just right, chemistry might be nuclear physics.
But they didn't know about nuclear physics. And nuclear physics is what you need to transmute lead into gold, theoretically. It's still not cheap to do it. It's not cheap enough to do it.
Nuclear physics-- the technology is not well enough controlled. And he says, well, the problem with LLMs might not be much different. There's something else for true intelligence.
So what can it do? What can it not do? I'm going to look at a slightly orthogonal question. Exploration versus exploitation. And as Daniela pointed out, the next two days are going to be how we exploit this valuable cultural technology in useful ways, but I'm going to talk about exploration. What does its existence mean versus what can we make it do? This generative AI and large language models. And first, I want to start with three scientific cultural observations. Now, everyone who worked in AI last century into the beginning of this century in every AI course learnt about a bunch of things, which I think LLMs challenge.
What we all learnt has changed somewhat. And these three things are the Turing test has evaporated. Thank God. Searle's Chinese room showed up uninvited. And there were some questions for Chomsky's universal grammar.
I'm going to talk about each of these, one after the other. Turing test-- this is from Alan Turing's paper in 1950. He didn't call it the Turing test.
He called it the imitation game. The paper was "Computing Machinery and Intelligence." And he said, what if a person is texting either another person or a computer? He didn't say texting. He said using a teleprinter, but the equivalent today is texting.
What if the person is texting one of those two? Can they figure out whether it's a person or a computer? And this was a rhetorical device he was using-- he starts out in the beginning of the paper as a rhetorical device because he wanted to get away from the question of defining thinking or defining intelligence. But his point was, if a person can't tell the difference between a computer and another person that they're talking to, then surely, you have to admit the computer is intelligent-- as intelligent as a person-- because he can't distinguish. That was his argument. This test, or this question, got adapted by the press.
The press gets involved in technology, by the way. You may have noticed that. It says stuff, and then we believe it. We listen to it. So the press has used the Turing test for 70 years as the ultimate arbiter of whether an artificial intelligence system is intelligent or not.
Turing said, in 1950, that he believed that by the year 2000, machines with about a billion bits of storage would play the game well enough that an average interrogator would have no more than a 70% chance of identifying them correctly after five minutes of questioning. He really stuck a stake in the ground about how complex it would be. And up until two years ago, the press was still talking about the Turing test-- the Turing test, the Turing test.
But you may have noticed, the press doesn't talk about whether LLMs pass the Turing test or not. It's assumed. And this is a little piece from Nature from a few months ago saying, ChatGPT broke the Turing test.
No longer is it, does this program pass the Turing test or not? Can it fool you? No. It's not a fine enough question. And it turns out, I think that we're more interested in what it can say rather than the fact that it does say. The Turing test was about it saying intelligent stuff, but now, we're much more interested in what it can say and what level of intelligence that is. Second thing in AI we all learned about for a long time was Searle's Chinese room. John Searle, the philosopher at Berkeley, 1980, came up with the Chinese room.
Why the Chinese room? It was because English speakers back in 1980 pretty much universally didn't know Chinese at all. So it was a separate language. And he could talk about a person knowing English versus a person knowing Chinese-- very different sorts of things. So I asked ChatGPT 3.5 to explain Searle's Chinese room. This is what it told me.
Imagine a person who doesn't understand the Chinese language locked inside a room. They have a set of instructions written in English that tells them how to manipulate Chinese symbols-- the characters in a question in Chinese. And they're input through a slot into the room.
They have no understanding of Chinese and don't know the meanings of those symbols-- the Chinese words, the characters. From the outside, someone passes messages written in Chinese through the slot. The person inside the room follows the instructions of the program and produces responses in Chinese just based on symbol manipulation. And then most importantly, to an observer outside the room, it may appear, Searle says, that the person inside understands Chinese, but no, they're just manipulating symbols. And here I emphasize the last thing the GPT 3.5 said, which is very important.
Without grasping the semantics or meaning of those symbols-- so the idea is the person outside writes a question in Chinese, puts it under the door. The person inside has big books of rules written in English. Look at this symbol; if it matches that, do this, do that. And they output an answer. Does the person understand Chinese? Does the room understand Chinese? This was the philosophical question. So I typed some Chinese to ChatGPT 3.5. I didn't tell it I was going to type Chinese.
I don't know Chinese. I use Google Translate to produce the symbols for me. Who is Ai Weiwei? A Chinese artist. And it came right back in Chinese and told me who he is.
It's the Chinese room. It's there. This was this philosophical thing that we talked about for years, and it was imaginary.
It couldn't be real, but now it's real. How does ChatGPT impact various arguments that people have had for decades in AI about symbols and grounding? My personal old argument, which I don't think works anymore, is that without grounding words, tokens, or symbols in visual and motor stuff, the instructions would have to be impossibly large-- so it's a stupid thought experiment.
It's an imaginary experiment thinking about it this way, but here we have this Chinese room, which is 175 billion weights, 32 bits each. That's less than a terabyte. Everyone's laptop in this room can store a terabyte. It's not that large anymore.
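The arithmetic behind that is easy to check:

```python
weights = 175e9              # 175 billion weights
bytes_per_weight = 4         # 32 bits each
print(weights * bytes_per_weight / 1e12)   # about 0.7 terabytes
```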
And it does it. Wow. What does that mean for us? We thought there was something more about grounding. And some people thought that language was too strongly grounded in nonlanguage so that a language-only solution couldn't possibly work. I used to talk about an example with Korean rather than Chinese on this, also.
But no. There's no grounding in stuff in the world for these LLMs. All they have been exposed to is billions of pages from the web or from books.
They just read stuff. And they can answer in Chinese. They can answer in any language. And some thought that clearly, it was the room and the person together that understood Chinese, not just the person.
And so it was making a category mistake in saying, well, they don't understand Chinese. It's the whole system. So would Searle now say that ChatGPT understands Chinese or not? Or would it just look like it does? And I think based on some arguments we had where I said, if it walks like a duck and talks like a duck and smells like a duck and poops like a duck, it's a duck. And he says, no, no, no, not unless it's a biological duck. Only then.
So there was an animism. But I think it brings some questions to us. What does it mean to be intelligent? Now, I'm going to just take a sidetrack and explain a little bit more about how ChatGPT works before I come back to the third one because this is important. I've talked about grounding.
What does a symbol mean? And how ChatGPT works-- at least, I'll just use the English part-- is the words are just numbered. About 50,000 in English-- either words or parts of words. So cat, dog, chair, run, bark, "pre-," "-ing," eyes, et cetera.
50,000 of them. That's the first step of processing, which is not done with a neural network. It breaks it into tokens. And each of these tokens, whether it be English, or Chinese, or whatever, is assigned some meaningless number. Let's suppose it happens to be 1, 2, 3, and so on for these tokens above.
Then inputs-- when I type a question, they're encoded as a string of numbers. So if I say dog running, dog-- that's the number 2 word. Run is the number 4 word. "-ing" is the number 7 piece. Dog running is 2, 4, 7.
Dog barking would be 2, 5, 7. So these meaningless numbers-- there's no relationship between these numbers. They're just assigned.
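Here's that first, non-neural step as a toy sketch in Python; the vocabulary and numbering are invented for illustration (real tokenizers are learned from data).

```python
# Tokens get arbitrary, meaningless numbers assigned to them.
vocab = {"cat": 1, "dog": 2, "chair": 3, "run": 4, "bark": 5, "pre-": 6, "-ing": 7, "eyes": 8}

def encode(tokens):
    return [vocab[t] for t in tokens]

print(encode(["dog", "run", "-ing"]))    # [2, 4, 7]  "dog running"
print(encode(["dog", "bark", "-ing"]))   # [2, 5, 7]  "dog barking"
```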
And then through looking at lots and lots of text, the correlations between these numbers start to mean something. And they start with what are called embeddings, with a special piece of learning at the start. The correlations between these tokens get learnt as a vector. And in the case of GPT-2, the vector consists of 768 neurons.
And the output of those 768 neurons are numbers between 0 and 1. They're drawn here for cat, for dog, for chair. In ChatGPT 3.5, it's 12,288 numbers rather than 768. They're both of the form 3 times 2 to the n-- 768 is 3 times 2 to the 8th, and 12,288 is 3 times 2 to the 12th. ChatGPT 4 is probably way larger. We don't know what it is.
It hasn't been talked about publicly, but these embeddings as a vector are what represent the tokens going through the network. And when you look at the structure, a vector-- you can have a two-dimensional vector, an x-coordinate, a y-coordinate. Then you can have a z-coordinate. These have 768 coordinates, but if you look at them from a particular direction, you can project those points down into two dimensions. And here Wolfram, for GPT-2, just projects along one particular direction into two dimensions. And you see there's some associations, which start to make sense.
It's almost a grounding, an understanding of what's there. So duck and chicken are close together. Dog and cat are close together. Alligator and crocodile are really close together because no one who ever writes about them knows the difference between them.
So those words always are interchangeable. Over in the fruit area, apricot and peach-- they're similar. Papaya is closer to a melon. So there is some meaning there. There's some grounding, but it's all been just extracted from language.
It's not through our senses that we use. OK. So the grounding of symbols is replaced by embeddings of tokens, but it just comes from correlations of text.
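To give a feel for what "close together" means for embeddings, here's a rough sketch with tiny, made-up three-number vectors standing in for the real 768- or 12,288-number ones; nearness is measured with cosine similarity.

```python
import numpy as np

# Made-up, tiny "embeddings"; the real ones have 768 or 12,288 coordinates.
emb = {
    "duck":      np.array([0.9, 0.1, 0.0]),
    "chicken":   np.array([0.8, 0.2, 0.1]),
    "alligator": np.array([0.1, 0.9, 0.3]),
    "crocodile": np.array([0.1, 0.9, 0.2]),
    "peach":     np.array([0.2, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["duck"], emb["chicken"]))         # high: they show up in similar text
print(cosine(emb["alligator"], emb["crocodile"]))  # very high: writers use them interchangeably
print(cosine(emb["duck"], emb["peach"]))           # low: different neighborhoods of text
```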
That's what we need to understand to understand how the Chinese room works at all. Let's look at the third thing-- Chomsky's universal grammar, developed in the '60s. The X-bar paper was 1970, I think.
Here at MIT, the linguistics department was the center of this. And the idea is that humans, children, have some machinery in their head, which is able to represent all the grammars of all the human languages. And when children are exposed to a language, hearing it, it sets some parameters in their head about what the language is like, whether it's got cases in the nouns, for instance, or not, how tenses work in verbs. And there's different parameters to get set. And that's why babies are able to learn language, because they have this genetic machinery that's dedicated to language.
Oof. ChatGPT didn't have that universal grammar anywhere in it. It appears to have acquired lots of human languages without the universal grammar constraint mechanism, nor reference semantics. Well, that's a bit of a problem, I think. It either means we have to modify universal grammar or we have to say, we didn't get that quite right. It's a dangerous thing to do near the linguistics department here at MIT.
I can assure you. But it was acquired in the sense of grammaticality and coherent use. Is it just because that's a vastly bigger training set than human babies? The amount of stuff ChatGPT read to get trained is way bigger than any human could ever read or know about. Is its transformer mechanism-- that stuff on the right-- is that somehow a superset of universal grammar? Does it implement it in some way? I think these are deep, scientific questions. And it seems to be a promiscuous language learner. Is it capable of a bigger class of language than humans? And if so, what constraints are there on what languages it could learn? And Chomsky posits that only one species exists with true language-- us humans.
Gorillas don't have language. Chimpanzees don't have language. Whales couldn't possibly have language because language is this universal grammar.
But here we've got this system learning language without the universal grammar. It's a scientific conundrum. So these valuable cultural artifacts, large language models, have caused us to have to rethink a bunch of things that we thought were settled for the last 50 years.
They're not. Now, there's a deeper question. Where is the power coming from? And I don't know. I'm going to suggest one example of where it might be coming from, but there's 200 or 1,000 other examples equally good. I don't know which one's right.
I'm just going to give you a flavor for the sorts of things you could ask. Where's the power coming from in these LLMs? Is it neurosymbolic-ish? A lot of people have been calling for the last few years-- we've got to get the neuro stuff from AI with the symbol stuff from AI, and join them together, and get more power. Did we actually have that happening here in some way? Non-neural AI, which has been the bigger part of AI for 56 years, from 1956 to 2012-- 2012 was when deep learning really got announced-- is about atomic symbols. Those symbols can have properties.
So the symbol person can have a property of age, name, weight, et cetera. Symbols represent the grounding of objects in the world and the concepts and relationships. And the symbols are manipulated using rules which lead to inferences. And robotics tries to ground them in real life. So I took an example here from David Poole and Alan Mackworth's latest edition of their textbook on artificial intelligence about symbol processing. And so in the upper left corner, you've got some predicates-- in, part of-- and some arguments.
And Kim is in R123. It happens to be a room. That's the grounding of R123. R123 is part of the CS building. And then there's a rule. If x is in y and y is part of z, then x is in z.
And so you deduce Kim is in the CS building. And the idea is that the robots and their perception systems relate those symbols to stuff out in the world. And that's how symbolic AI works. And it gets really complicated really quick.
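Here is roughly that deduction written out as a few lines of Python; the facts and the rule come from the Poole and Mackworth example, but the code itself is just an illustrative forward-chaining sketch.

```python
# Facts: Kim is in room r123; r123 is part of the CS building.
in_facts = {("kim", "r123")}
part_of = {("r123", "cs_building")}

# Rule: if x is in y and y is part of z, then x is in z.
def derive(in_facts, part_of):
    derived = set(in_facts)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(derived):
            for (y2, z) in part_of:
                if y == y2 and (x, z) not in derived:
                    derived.add((x, z))
                    changed = True
    return derived

print(derive(in_facts, part_of))   # includes ('kim', 'cs_building')
```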
This is a bit of stuff from the semantic web of subclasses, et cetera. Complicated relationships between lots and lots of symbols. Enormous amounts of data, but way less than 175 billion weights, I should point out. So tokens and embeddings are sort of symbol-ish. Symbols have properties. Token embeddings are some sort of properties.
Symbols work when there's a calculus of manipulation of relationships. Embeddings have their own approximate calculus of manipulation in those layers there where the linear neurons work. They do some weird stuff. Sometimes, they add those embeddings.
They just add the vectors. Why does that work? Sometimes, they just look in a part of the vector-- the heads. As Wolfram says, it's a dark art. As Subbarao says, it's alchemy. We don't really know why it works, but you do this, and you do that, and then it sort of works. So there's a calculus of manipulation.
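Here's a rough sketch of those two manipulations with invented sizes (GPT-2 actually uses 768-number vectors and 12 heads): vectors simply get added together, and the result gets carved into slices that the heads look at.

```python
import numpy as np

d_model, n_heads = 12, 3            # invented sizes for the sketch
rng = np.random.default_rng(0)

token_embedding = rng.standard_normal(d_model)
position_embedding = rng.standard_normal(d_model)

# "They just add the vectors": for example, token and position information get summed.
x = token_embedding + position_embedding

# "Look in a part of the vector -- the heads": carve it into equal slices, one per head.
heads = x.reshape(n_heads, d_model // n_heads)
print(heads.shape)                  # (3, 4): each head works on its own slice of the numbers
```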
What is that calculus of manipulation doing? One or the other of these are subset. Is there an intersection that can be grown in some useful way? This is just one of 1,000 possible sets of questions you could ask. I don't know the answers to these. I'm just trying to give the idea. There's deep questions to ask. I'll talk about robotics very briefly because I mostly work on robotics.
Robots are a perception system that gets some sort of semantic understanding of the world, whatever semantics means, and then a little bit of reasoning. And out of that comes a force that has to be applied in the world to achieve a goal. That's all robots do.
They look, and they push. They push the wheels. They push an arm, squeeze the fingers.
They look, and they push. And they add kinetic energy to systems. And then you've got to sometimes get rid of it before it's too late. And in that, things in the world are objects.
Good, old-fashioned AI-- I talked about the symbol grounding problem, which I've mentioned. And what does a ladder really refer to? Why ladder? Well, I'm working on robots that operate in warehouses. The worst thing a robot in a warehouse can do is run into a ladder. That's bad because there's probably a person up there. So I'm really worried about knowing what ladders are.
Some people think deep learning did the symbol grounding problem, but actually, I don't think it did. It does labeling. It doesn't do perception.
And oops. There's the problem of stability of the grounding. So here's where our robots go wrong with the ladder.
Oh, there's a ladder. No, there's nothing there. It's not stable. The perception systems are not stable. You have to smooth stuff to make it work.
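One common way to do that smoothing, sketched here with made-up numbers: average the detector's confidence over time and add hysteresis, so a single bad frame doesn't flip the answer. This is an illustration of the general idea, not what our robots actually run.

```python
def stabilized(confidences, alpha=0.5, on_threshold=0.7, off_threshold=0.3):
    """Smooth a flickering per-frame 'ladder' confidence and add hysteresis."""
    smoothed, present, out = 0.0, False, []
    for c in confidences:                                   # raw detector output, frame by frame
        smoothed = alpha * c + (1 - alpha) * smoothed       # exponential moving average
        if not present and smoothed > on_threshold:
            present = True                                  # only declare a ladder once we're sure
        elif present and smoothed < off_threshold:
            present = False                                 # only drop it once we're sure it's gone
        out.append(present)
    return out

# The single low frame in the middle no longer flips the answer.
print(stabilized([0.9, 0.95, 0.9, 0.2, 0.9, 0.85, 0.1, 0.05]))
```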
Brian Cantwell Smith did his PhD in the predecessor labs to CSAIL. And he's got a recent monograph at MIT Press that talks about this, I think. The symbol grounding problem is deeply not understood yet.
And there's a lot of work to do there. And the question is, are LLMs doing it with these embeddings in some interesting way? But I think the hard thing-- I feel like as a roboticist I should give an honest answer-- the hard things in robotics are perception and action. Listening to coaching is far from sufficient for a person who wants to become good at any physical skill. I had Ian and Greg Chappell as my cricket coaches in elementary school.
Best cricket players in the world. They told me what to do. I couldn't do it. Telling is not good enough.
Greg Chappell went on to be coach of India, so anyone from India knows Greg Chappell. You have to do it in the world. You have to practice it. You have to get there and do it. Generative AI is not going to lead to better robots anytime soon. That's my one statement I'm going to make of what I truly believe today.
Everything else-- speculation. And have you noticed this hype and some hubris? I'm going to talk about hype first. Hype is not new. Here's Frank Rosenblatt in the '50s. He had a handful of linear neurons-- the diagrams I showed you before.
And he didn't use digital computers to do them. He used analog computers because digital was too hard, but he had a handful of them-- less than 100 weights. Here's the research trends report from Cornell in 1958. Look what it says down the bottom.
Introducing the perceptron, a machine which senses, recognizes, remembers, and responds like the human mind. And it was just a handful of those linear neurons. 100 weights, not 175 billion weights. So hype has been around for a long, long time around AI. And so this next slide has not been edited.
It was just me typing stream of consciousness of the hype cycles that I remember. I got involved in AI when I was in high school in the late '60s, professionally in the '70s. Wrote a thesis-- a really bad master's thesis-- on machine learning in 1977. So I've been around a long time, and I've seen a lot of hype.
This is the [? premier ?] stuff, which I didn't know about at the time. I only knew about it afterwards. But everything else in this thing-- these are hype cycles that I remember.
I remember reading about them in high school in the '60s and so on, through my whole career. And some of these things come back again and again. Reinforcement learning is on its fourth go-around with AlphaGo. Neural networks-- we're up to volume 6 of neural networks. They keep coming back again and again.
They go away. They come back. They go away.
They come back. Revolutions in medicine-- we had one in the '80s with rule-based systems out of Stanford. We had another one with Watson. After Watson could play Jeopardy-- oh, it's going to solve medicine.
It's going to be a revolution in medicine. So we have these hype cycles all the time. Let's remember the hype. The hype is there. And where does it come from? Well, I talk about the seven deadly sins of predicting the future of AI.
It was originally in my blog. And then in 2017, an edited version of it appeared in MIT's Technology Review. And these are the seven sins. I'm not the innovator.
I'm not the sinner. I didn't invent these sins. I just cataloged the sins. So there's a difference. And not all of them were originally cataloged by me.
Some of them have already been cataloged by other people in other fields of technology, but these seven sins, I think, lead to hype overpredicting what's going to happen. I'm going to talk about two of them. Oh. First, I will say, I looked at both the salvationists, who think that generative AI is going to solve everything for humans, and the doomsters, who say it's going to kill us all. And I looked at what everyone was writing. I found six sins for salvationists and four sins for doomsters of those seven sins.
They're commonly used. Here's one of them-- exponentialism. We tend to think that everything's going to be exponential. Why do we think that? Because we just had over 50 years of Moore's law, which was exponential again and again and again.
And we think that everything's exponential. So when we see a graph like this, it's going up. Yeah, we're right here. Wow.
It's going to keep going. It's going to pass human level. Eh, maybe it's not.
Maybe it's going to go like that. In fact, most things are not exponential forever because you use everything up. In the case of Moore's law, the size of the gates has gone down to just 20 atoms or something. And Moore's law has ended. If you read the original paper or magazine article from 1965, Moore's law was about economics, saying, the gates will get cheaper and cheaper and cheaper. Right now, a 3-nanometer gate is about twice as expensive as a 5-nanometer gate.
So Moore's law has definitely stopped. But we tend to think-- everyone thinks-- everything's exponential. And you hear people say, yes. Look. ChatGPT 3.5 can do this.
ChatGPT 4 can do this. So ChatGPT 5-- gosh. That'll be able to be human level. Eh.
Indistinguishable from magic. This is not my sin, not one that I first noticed. It's noticed by Arthur C. Clarke, the science fiction writer.
He also, by the way, in 1945, published a paper on geosynchronous communication satellites. He thought that there would be astronauts up there changing the vacuum tubes in those. He talked about that in his 1945 paper. So he didn't get it all right, but he has three laws. And I think we should look at the first law, since I'm up here. When a distinguished but elderly scientist says that something is possible, he's almost certainly right; when he says that something is impossible, he's very probably wrong.
Keep that in mind. The number 2, I think, is what MIT does all the time. We go beyond the limits of the possible, venture a little way past them to the impossible. But number 3 is his third law.
Asimov had three laws. Arthur C. Clarke had to have three laws. Any sufficiently advanced technology is indistinguishable from magic. What does that mean? Well, if you don't understand the mechanism, how do you know what the limits are? Now, I didn't know that there was going to be a poem today. But in Turing's paper back in 1950, he said that the computer-- if the person asks it to write a sonnet, then the computer is going to have to obfuscate-- say, ah, I was never good at poetry. Well, I asked ChatGPT 3.5 to write a sonnet
based on Shakespeare's sonnet 18 of what is a robot. And this is what it came out with. [BLOWS RASPBERRY] It just spat it out. It gets the three quatrains right.
It put the blank lines there. It's got the couplet at the end. Shall I compare thee to a robot's grace? Thou art more-- it's sonnet number 18. The second-to-last line in the original was, so long as men can breathe or eyes can see-- maybe a bit more modern language.
And in thee, ends in thee. The third quatrain there talks about the eternity. It's pretty damn good. And if it can do that, gosh, what can't it do? It's magic.
It can do anything. And that's where we are, but I look at this a little more closely. I did ask it, what is a robot? And all it said was how beautiful you are.
It didn't say what is a robot. Here's another sonnet I like better. I think this sonnet is better-- what is a robot. Shall I compare thee to creatures of God? You make vast maps with laser light. I admit, the rhymes in the third quatrain-- no, the second quatrain-- libraries, clumsily. Eh, not so good.
It doesn't have the eternity in the third one. Ends with give life to thee. It's a little better. I'm a little biased.
This is one I wrote back when we first had COVID and I was locked at home. And I'm published. It's in that well-known poetry journal IEEE Spectrum. [LAUGHTER] Eat your heart out, Mrs. Marriott.
She was my English teacher in high school. She couldn't have believed this. Anyway, so I thought, OK. Can ChatGPT 3.5 do better than its first attempt? So I said to it, please write another one, but this time, concentrate on what defines a robot. And it did concentrate on that.
And it's interesting. They are not born of flesh, nor earthly sin, yet in form, a certain beauty lies. The limbs not made of sinew, bone, or skin, but gears and servos in precision move. So it talks about what is a robot, but it's lost, I think, sonnet 18 from Shakespeare. Shall is about all that's left of it.
So it has limits, but we don't know how it works. We can't say how it works. We don't have an intuition. So we don't know what its limits are, and it becomes magic. Why do we do that with AI? I think it's because AI is about intelligence and language. Intelligence-- that's what got us here to MIT.
I'm smarter than the other people. Intelligence-- I've got intelligence. Language is what makes us people. And so we like to think about-- when we see AI trying to do intelligence, trying to do language, we think about ourselves.
And it's a reflection on ourselves. But there's also hubris, where people believe the hype-- maybe the same people who generate the hype-- and say, it's going to make it happen. Let me give you an example of that. So sorry.
The hype leads to hubris. And then the hubris leads to conceits. And the conceits lead to failure. So, autonomous vehicles. And this is another of the sins-- speed of deployment.
I was at a talk in Santa Cruz, 1987, when Ernst Dickmanns talked about his vehicle that had driven along the autobahns amongst public traffic at about 70 kilometers an hour for 20 kilometers, just driving along with the other people back in 1987. By 1995, Takeo Kanade's students, Dean Pomerleau and Todd Jochem, had this vehicle, which, with hands off the wheel, feet off the pedals-- most of the way, it drove with that condition-- went from Pittsburgh to Los Angeles in a project they called No Hands Across America. And then in 2007, the DARPA Urban Challenge-- following the 2005 Grand Challenge, which had been won by Sebastian Thrun's team, then at Stanford; MIT competed in the Urban Challenge-- had vehicles driving around in traffic. And so people thought, wow. This is doable.
That's the hubris. We can make this happen. And Sebastian went on to help cofound Google X.
And in 2012, I first went in a Google X car on a freeway in California. It worked. And everyone thought it was just going to happen like magic. The conceit was that there was going to be a one-for-one replacement for human drivers. So we didn't have to do anything about infrastructure. We don't have to change anything.
Just the cars, we're going to change. And they're going to drive amongst humans. And this is a screen grab I took in 2017.
I've colored it in a little bit. It's still on that page, if you go to that page ID. Executives of companies were saying when they were going to have level 4, full self-driving, and have it deployed. The dates in parentheses at the end are when they made these predictions.
The dates in blue are when they said it would happen. And I pinked out the ones that have passed. None of them happened. There's a few blue ones later.
The orange arrows are where I've since heard the executives change their predictions and say it was going to take longer. So for instance, fourth one from the bottom-- the Daimler chairman in 2014 said fully autonomous vehicles by 2025. A few years later, he said, nah. We're not going to do it. Other people pushed out their dates. There's one from Tesla, which gets pushed out a year every year, and has been since 2014.
And they're not deployed at scale. This is the Cruise vehicles. These happen to be in Austin. There have been a lot.
There were 300 of them in San Francisco this year. I've taken autonomous Cruise rides 36 times this year. 35 times, I didn't fear for my life. One time, I did. And I don't know if you know. Weekend before last, there was more than one CEO in trouble.
Kyle Vogt resigned as CEO of Cruise. And Cruise has currently shut down all operations, even with drivers. So things haven't gone as well for GM as they thought. And this one-for-one replacement came to be seen as inevitable-- so every company thought they had to get in on the action.
It was a big, big prize. And a lot of VC money went to many startups because it's such a big prize. What are VCs supposed to do? They're supposed to invest in things which have high return.
This looked like high return. It became a monoculture of learning-based methods. And there was a massive duplication of collecting nonpublic data sets. The amount of driving around just to collect data sets is amazing. Billions of dollars have been spent on it.
And what happened badly, I think, was it killed the idea of government-led or funded digitalization of our roads. Every time we've introduced some change of transportation, we've changed our infrastructure. Henry Ford built roads so that his cars could move around not just in rutted mud. And digitalization of roads-- back in the '90s, there were projects-- Citrus at Berkeley-- of how we could collect data from fixed assets on the roads, and transmit them to cars, and make them be able to self-drive safely. That all went away because of this conceit that we could just do one-for-one replacement.
So it slowed down safety innovations. And there was a lot of stifling of innovation. Why did so many people get it so wrong-- saying we were going to have self-driving cars by a few years ago, which we don't have? First is fear of missing out. They didn't want to miss out.
It was such a big idea. They couldn't miss out. And the other one-- this is one in the [INAUDIBLE]. Fear of being a wimpy techno pessimist and looking stupid later.
[LAUGHTER] And so what scares me about generative AI is researchers jumping to the shiny new thing when they were almost there with what they were working on before, and abandoning it. And then the other thing is that-- and Ada Lovelace talked about this in note G of her paper back in the 1840s-- you concentrate on the new applications. You get sucked in by the hubris, believing the conceits.
You think it's going to happen quickly. When it doesn't happen quickly, you say, it's over. And you walk away, whereas if you'd just stayed a little longer, you would have something. Ada Lovelace was trying to get the British government to fund the analytical engine at the time. So she suffered from that, along with Babbage.
And in industry, I worry about VC funding; it swarms to high margins, because VCs should go for high margins where they get great returns on their investments. So this is natural behavior for them. And I'm worried they'll neglect connecting to the real world more than they should, so a whole generation of engineers will forget about other forms of software and AI.
That's my scary things about generative AI. So my message is, with generative AI, whether you're an explorer or an exploiter, examine your motivations and fears, fear of missing out, or fear of being a wimpy techno pessimist and looking stupid later. By the way, that's precisely the argument the French mathematician Blaise Pascal had-- why you should believe in God.
Because what if you didn't believe in God, and you show up. God says, how was it? Did you believe in me? It's really embarrassing. [LAUGHTER] So it's hard not to suffer from that sin.
What's the conceit of generative AI? The conceit is that it's somehow going to lead to artificial general intelligence. By itself, it's not. There's some other stuff that needs to get invented. That's the conceit. A lot of people talk about that conceit.
Don't believe it. Don't get involved in the hubris. Forget about the hubris. Work hard, whether it be in generative AI, whether it be in exploration or exploitation.
Expect to have to work hard. And something good will come out of it. Thank you. [APPLAUSE] I don't know whether we have microphones or-- There's a microphone right there. Speak up.
If you'd repeat the question. So when you talk about LLM, talk about [? the LEM. ?] Large [INAUDIBLE] Model [INAUDIBLE]. Yeah.
I didn't want to try and go to all the varieties. So I was just giving a general theme talk. So yes. And there are specific dangers around that. I know that people are talking about it. Where AI has been successful, usually, it has involved a person in the loop, a person to cut out the chaff.
It happens with Google Search. Back before Google got taken over by ads, it would put out 10 things, and maybe the third one was the one you needed, or the fourth one-- you, the human. So anything with large design models is going to, I think, involve people for a while. But as I said, I'm not even in this field. I'm just an outside observer.
So in the particular details, I can't help. Yes. [INAUDIBLE] Well, there's two versions of maths. There's arithmetic, and there's theorem proving. And there's a lot of papers coming out on arXiv from mathematicians talking about how it doesn't help people understand intuitively what is going on.
And mathematics is a very intuitive thing. So I think the mathematics is distinct from arithmetic. There's also a whole set of papers about that. I think it's like when Kasparov was beaten by Deep Blue in the '90s. People said, that's the end of chess.
No, it hasn't been the end of chess. And in fact, Kasparov has built this whole thing about humans and chess engines working together. So I think there's some possibilities there. Doomsters and salvationists thinking the rapture is about to come-- both overestimate the short term. So we're going to solve mathematics. And the mathematicians-- no, no.
That's us. We can't do that. I think, just calm down a little bit, everyone.
Yeah. You said at the end of the talk that you don't believe we should completely invest in the hype of generative AI because there are other fields, like energy, robotics, [INAUDIBLE] that clearly have a direct effect on our lives. Can you talk about the synergy between those and generative AI? Yeah.
So perhaps, generative AI can help with some things, but don't forget the basics of it. There's a basics of energy. There's some basic equations about energy. It doesn't get solved just by having a better generative AI system.
It doesn't solve all the problems. There are going to be a large class of problems. No one technology has ever surpassed everything else. Writing didn't surpass everything else.
Reading, writing didn't surpass everything else. So take it easy. There's going to be a lot of stuff. I had a very famous technologist come to me two years ago. She said, her son was just about to graduate from a well-known university in mechanical engineering. My God, what's he going to do with his life as a mechanical engineer? It's all over.
Well, there's plenty of jobs for mechanical engineers. Thank you, Rod. Let's thank Rod once again. [APPLAUSE] We're going to take a 10 minute break.