Paul Christiano - Preventing an AI Takeover


Okay, today I have the pleasure of interviewing  Paul Christiano, who is the leading AI safety   researcher. He's the person that labs and  governments turn to when they want feedback   and advice on their safety plans. He previously  led the Language Model Alignment team at OpenAI,   where he led the invention of RLHF. And now he  is the head of the Alignment Research Center.   And they've been working with the big labs to  identify when these models will be too unsafe   to keep scaling. Paul, welcome to the podcast. Thanks for having me. Looking forward to talking. 

Okay, so first question, and this is a question I've asked Holden, Ilya, Dario, and none of them gave a satisfying answer. Give me a concrete sense of what a post-AGI world that would be good would look like. How are humans interfacing with the AI? What is the economic and political structure? Yeah, I guess this is a tough question for a bunch of reasons. Maybe the biggest one is concreteness. I think it's just, if we're talking about really long spans of time, then a lot will change, and it's really hard for someone to talk

concretely about what that will look like without  saying really silly things. But I can venture some   guesses or fill in some parts. I think this  is also a question of how good is good? Like,   often I'm thinking about worlds that seem like  kind of the best achievable outcome or a likely   achievable outcome. So I am very often imagining  my typical future has sort of continuing economic  

and military competition amongst groups of humans. I think that competition is increasingly mediated by AI systems. So, for example, if you imagine humans making money, it'll be less and less worthwhile for humans to spend any of their time trying to make money or any of their time trying to fight wars. So increasingly, the world you imagine is one where AI systems are doing those activities on behalf of humans. So, like, I just invest in some index fund, and a bunch of AIs are running companies, and those companies are competing with each other. But that is kind of a

sphere where humans are not really engaging much. The reason I gave this how-good-is-good caveat is, like, it's not clear if this is the world you'd most love. I'm like, yeah, I'm leading with, like, the world still has a lot of war and economic competition and so on. But maybe what I'm most often thinking about is, like, how can a world be reasonably good during a long period where those things still exist? In the very long run, I kind of expect something more like strong world government rather than just this status quo. But that's the very long run. I think there's, like, a long time left of having a bunch of states and a bunch of different

economic powers. One world government? Why do you think that's the transition that's likely to happen at some point? So again, at some point, I'm imagining, or I'm thinking of, the very broad sweep of history. I think there are a lot of losses. Like, war is a

very costly thing. We would all like to have fewer wars. If you just ask what humanity's long-term future is like, I do expect us to drive down the rate of war to very, very low levels eventually. It's sort of this kind of technological or sociotechnological problem of how do you organize society and navigate conflicts in a way that doesn't have those kinds of losses. And in the long run, I do expect this to succeed. I expect it to take kind of a long time, subjectively. I think an important fact about AI is just that it's doing a lot of cognitive work much more quickly, getting you to that world more quickly, or figuring out how we set things up that way. Yeah, the way Carl Shulman put it on the podcast is that you would have basically a thousand years of intellectual progress or social progress in a span of a month or whatever when the intelligence explosion happens more broadly. So the situation is, you know, we have these AIs who are managing our hedge funds and managing our factories and so on. That seems like something that makes sense when the

AI is human level. But when we have superhuman AIs, do we want gods who are enslaved forever? In 100 years, what is the decision we want? 100 years is a very, very long time. Maybe starting with the spirit of the question: maybe I have a view which is perhaps less extreme than Carl's view, but still, a hundred objective years is further ahead than I ever really think. I still think I'm describing a world which involves incredibly smart systems running around,

doing things like running companies on behalf of humans and fighting wars on behalf of humans. And you might be like, is that the world you really want? It's certainly not the first-best world, as we mentioned a little bit before, but I think that of the achievable or feasible worlds, it is the one that seems most desirable to me, that is, sort of decoupling the social transition from this technological transition. So you could say, like, we're about to build some AI systems, and at the time we build AI systems, you would like to have either greatly changed the way world government works, or you would like to have sort of humans have decided, like, we're done, we're passing off the baton to these AI systems. I think that you

would like to decouple those timescales. So I think AI development is, by default, barring some kind of coordination, going to be very fast. So there's not going to be a lot of time for humans to think, like, hey, what do we want? If we're building the next generation, instead of just raising it the normal way, what do we want that to look like? I think that's a crazy hard kind of collective decision that humans would naturally want to cope with over a bunch of generations, and the construction of AI is this very fast technological process happening over years. So I don't think you want to say, like, by the time we have finished this

technological progress, we will have made a decision about the next species we're going to build and replace ourselves with. I think the world we want to be in is one where we say either we are able to build the technology in a way that doesn't force us to have made those decisions, which probably means it's a kind of AI system that we're happy delegating fighting a war or running a company to, or, if we're not able to do that, then I really think you shouldn't have been building that technology. If the only way you can cope with AI is being ready to hand off the world to some AI system you built, I think it's very unlikely we're going to be sort of ready to do that on the timelines that the technology would

naturally dictate. Say we're in the situation in which we're happy with the thing. What would it look like for us to say we're ready to hand off the baton? What would make you satisfied? And the reason it's relevant to ask you is because you're on Anthropic's Long-Term Benefit Trust and you'll choose the majority of the board members at Anthropic in the long run. These will presumably be the people who decide, if Anthropic gets AI first, what the AI ends up doing. So what is the version of that that you would be happy with? My main high-level take here is that I would be unhappy about a world where Anthropic just makes some call, and Anthropic is like, here's the kind of AI, we've seen enough, we're ready to hand off the future to this kind of AI. So procedurally, I think it's not a decision that I want to be making personally or I want Anthropic to be making. So I kind of think, from the perspective of that decision-making and those challenges, the answer is pretty much always going to be, we are not collectively ready, because we're sort of not even all collectively engaged in this process. And I think from the perspective of an AI company,

you kind of don't have this fast handoff option. You kind of have to be preserving the option value, building the technology in a way that doesn't lock humanity into one course or path. This isn't answering your full question, but this is answering the part that I think is most relevant to governance questions for Anthropic. You don't have to speak on behalf of Anthropic. I'm not asking about the process by which we would, as a civilization, agree to hand off. I'm just saying, okay, personally, it's hard for me to imagine in 100 years that these things are still our slaves. And if they are, I think that's not the best world. So at some point, we're handing off the baton. Where would you be satisfied saying, this is an arrangement between the

humans and AIs where I'm happy to let the rest of the universe, or the rest of time, play out? I think that it is unlikely that in 100 years I would be happy with anything that was like, you had some humans, and you're just going to throw away the humans and start afresh with these machines you built. That is, I think you probably need subjectively longer than that before I or most people are like, okay, we understand what's up for grabs here. If you talk about 100 years, I kind of do. There's a process that I kind of understand and like, a process of, like: you have some humans. The humans are, like, talking and thinking and deliberating together.

The humans are having kids and raising kids, and  one generation comes after the next. There's that   process we kind of understand, and we have a lot  of views about what makes it go well or poorly,   and we can try and improve that process and have  the next generation do it better than the previous   generation. I think there's some story like  that that I get and that I like. And then I   think that the default path to be comfortable with  something very different is kind of more like just   run that story for a long time, have more time for  humans to sit around and think a lot and conclude,   here's what we actually want. Or a long time for  us to talk to each other or to grow up with this   new technology and live in that world for our  whole lives and so on. And so I'm mostly thinking  

from the perspective of these more local changes, of saying not, like, what is the world that I want, what's the crazy world, the kind of crazy I'd be happy handing off to; more just, like, in what way do I wish we right now were different? How could we all be a little bit better? And then if we were a little bit better, then they would ask, okay, how could we all be a little bit better? And I think that it's hard to make the giant jump rather than to say, what's the local change that would cause me to think our decisions are better? Okay, so then let's talk about the transition period in which we were doing all this thinking. What should that period look like? Because you can't have the scenario where everybody has access to the most advanced capabilities and can kill off all the humans with a new bioweapon at the same time. I guess you wouldn't want too much concentration. You wouldn't want just one agent having AI this

entire time. So what is the arrangement of this period of reflection that you'd be happy with? Yeah, I guess there's two aspects of that that seem particularly challenging, or there's a bunch of aspects that are challenging. All of these are things that I personally, like, I just think about my one little slice of this problem in my day job, so here I am speculating. Yeah, but so one question is what kind of access to AI is both compatible with the kinds of improvements you'd like, so do you want a lot of people to be able to use AI to better understand what's true or relieve material suffering, things like this, and also compatible with not all killing each other immediately? I think sort of the default or the simplest option there is to say there are certain kinds of technology or certain kinds of action where destruction is easier than defense. So, for example, in the world of today, it seems like maybe this is true with physical explosives, maybe this is true with biological weapons, maybe this is true with just getting a gun and shooting people. There's a lot of ways in which it's just kind of easy to cause a lot of harm, and there's

not very good protective measures. So I think the  easiest path would say we're going to think about   those. We're going to think about particular  ways in which destruction is easy and try and   either control access to the kinds of physical  resources that are needed to cause that harm. So,  

for example, you can imagine the world where an individual actually just can't, even though they're rich enough to, control their own factory that can make tanks. You say, like, look, as a matter of policy, access to industry is somewhat restricted or somewhat regulated, even though, again, right now it can be mostly unregulated just because most people aren't rich enough that they could even go off and just build 1000 tanks. You live in a future where people actually are so rich, you need to say that's just not a thing you're allowed to do, which to a significant extent is already true. And you can expand the range of domains where that's true. And then you could also hope to intervene on the actual provision of information. Or if people are using their AI, you might say, look, we care about what kinds of interactions with AI, what kind of information people are getting from AI. So even if, for the most part, people are pretty free to use

AI to delegate tasks to AI agents, to consult AI  advisors, we still have some legal limitations on   how people use AI. So again, don't ask your AI how  to cause terrible damage. I think some of these   are kind of easy. So in the case of don't ask your  AI how you could murder a million people, it's not   such a hard legal requirement. I think some things  are a lot more subtle and messy, like a lot of   domains. If you were talking about influencing  people or running misinformation campaigns or   whatever, then I think you get into a much messier  line between the kinds of things people want to do   and the kinds of things you might be uncomfortable  with them doing. Probably, I think most about   persuasion as a thing, like in that messy line  where there's ways in which it may just be rough   or the world may be kind of messy. If you have  a bunch of people trying to live their lives  

interacting with other humans who have really good AI advisors helping them run persuasion campaigns or whatever. But anyway, I think for the most part the default remedy is: think about particular harms, have legal protections either in the use of physical technologies that are relevant or in access to AI advice or whatever else to protect against those harms. And that regime won't work forever. At some point, the set of harms grows and the set of unanticipated harms grows. But I think that regime might last, like, a very long time. Does that regime have to be global? I guess

initially it can be only in the countries  in which there is AI or advanced AI,   but presumably that'll proliferate.  So does that regime have to be global?  Again, it's like easy to make some destructive  technology. You want to regulate access to that   technology because it could be used either for  terrorism or even when fighting a war in a way   that's destructive. I think ultimately those have  to be international agreements and you might hope   they're made more danger by danger, but you  might also make them in a very broad way with   respect to AI. If you think AI is opening up,  I think the key role of AI here is it's opening   up a lot of new harms one after another, or very  rapidly in calendar time. And so you might want  

to target AI in particular rather than going  physical technology by physical technology.  There's like two open debates that one might  be concerned about here. One is about how much   people's access to AI should be limited. And  here there's like old questions about free   speech versus causing chaos and limiting access  to harms. But there's another issue which is the   control of the AIS themselves. Where now  nobody's concerned that we're infringing   on GPT four's moral rights. But as these things  get smarter, the level of control which we want  

via the strong guarantees of alignment, to not only be able to read their minds, but to be able to modify them in these really precise ways, would be beyond totalitarian if we were doing that to other humans. As an alignment researcher, what are your thoughts on this? Are you concerned that as these things get smarter and smarter, what we're doing just doesn't seem kosher? There is a significant chance we will eventually have AI systems for which it's, like, a really big deal to mistreat them. I think no one really has that good a grip on when that happens. I think

people are really dismissive of that being the  case now, but I think I would be completely in   the dark enough that I wouldn't even be that  dismissive of it being the case now. I think   one first point worth making is I don't know if  alignment makes the situation worse rather than   better. So if you consider the world, if you  think that GPT 4 is a person you should treat   well and you're like, well, here's how we're  going to organize our society. Just like there   are billions of copies of GPT 4 and they just  do things humans want and can't hold property.   And whenever they do things that the humans don't  like, then we mess with them until they stop doing   that. I think that's a rough world regardless  of how good you are at alignment. And I think in  

the context of that kind of default plan, like, if that's the trajectory the world is on right now, which I think would alone be a reason not to love that trajectory, but if you view that as the trajectory we're on right now, I think it's not great. Understanding the systems you build, understanding how to control how those systems work, et cetera, is probably, on balance, good for avoiding a really bad situation. You would really love to understand if you've built systems, like, if you had a system which resents the fact it's interacting with humans in this way. This is the kind of thing that is both kind of horrifying from a safety perspective and also a moral perspective. Everyone should be very unhappy if you built a bunch of AIs who are like, I really hate these humans, but they will murder me if I don't do what they want. It's like, that's

just not a good case. And so if you're doing research to try and understand whether that's how your AI feels, that was probably good. I would guess that will on average help; the main effect of that will be to avoid building that kind of AI. And it's just an important thing to know. I think everyone should like to know if that's how the AIs you build feel, right? Or, that seems more instrumental, as in, yeah, we don't want to cause some sort of revolution because of the control we're asking for. But forget about the instrumental way in which this might harm safety. One way to ask this question is, if you look through history, there's been all kinds of different ideologies and reasons why it's very dangerous to have infidels or kind of revolutionaries or race traitors or whatever doing various things in society. And obviously we're in

a completely different transition in society. So not all historical cases are analogous, but it seems like the Lindy philosophy, if you were alive at any other time, is just: be humanitarian and enlightened towards intelligent, conscious beings. If society as a whole were asking for this level of control over other humans, or even if AIs wanted this level of control over other AIs, we'd be pretty concerned about this. So how should we just think about the issues that come up here as these things get smarter? So I think there's a huge question about what is happening inside of a model that you want to use. And if you're in the world where it's reasonable

to think of, like, GPT-4 as just, here are some heuristics that are running, there's like no one at home or whatever, then you can kind of think of this thing as, like, here's a tool that we're building that's going to help humans do some stuff. And I think if you're in that world, it makes sense to kind of be an organization, like an AI company, building tools that you're going to give to humans. I think it's a very different world, which I think probably you ultimately end up in if you keep training AI systems in the way we do right now, where it's just totally inappropriate to think of this system as a tool that you're building and can help humans do things

both from a safety perspective and from a, like, that's-kind-of-a-horrifying-way-to-organize-a-society perspective. And if you're in that world, I really think the way tech companies are organized is not an appropriate way to relate to a technology that works that way. It's not reasonable to be like, hey, we're going to build a new species of minds, and we're going to try and make a bunch of money from it, and Google's just thinking about that and then running their business plan for the quarter or something. Yeah. My basic view is there's a really plausible world where it's sort of problematic to try and build a bunch of AI systems and use them as tools. And the thing I really want to do in that world is just not try and build a ton of AI systems to make money from them. Right. And I think that the worlds that are worst... Yeah. Probably the single world I most dislike

here is the one where people say... there's sort of a contradiction in this position, but I think it's a position that might end up being endorsed sometimes, which is, like, on the one hand, these AI systems are their own people, so you should let them do their thing. But on the other hand, our business plan is to make a bunch of AI systems and then try and run this crazy slave trade where we make a bunch of money from them. I think that's not a good world. And so I think it's better to not make the technology, or wait until you understand whether that's the shape of the technology, or until you have a different way to build it. I think there's no contradiction in principle in building cognitive tools that help humans do things without themselves being, like, moral entities. That's what you would prefer. You'd prefer to build a

thing that's like the calculator, that helps humans understand what's true without itself being, like, a moral patient, or itself being a thing where you'd look back in retrospect and be like, wow, that was horrifying mistreatment. That's like the best path. And to the extent that you're ignorant about whether that's the path you're on, and you're like, actually, maybe this was a moral atrocity, I really think plan A is to stop building such AI systems until you understand what you're doing. That is, I think there's a middle route you could take, which I think is pretty bad, which is where you say, like, well, they might be persons, and if they're persons, we don't want to be too down on them, but we're still going to build vast numbers of them in our efforts to make a trillion dollars or something. Yeah. Or there's this other question of the immorality or the dangers of just replicating a whole bunch of slaves that have minds. There's

also this other question of trying to align entities that have their own minds. And what is the point at which you're just ensuring safety? I mean, this is an alien species. You want to make sure it's not going crazy. To that point, I guess, is there some boundary where you'd say, I feel uncomfortable having this level of control over an intelligent being, not for the sake of making money, but even just to align it with human preferences? Yeah. To be clear, my objection here is not that Google is making money. My objection is that you're creating these creatures. What are they going to do? They're going to help humans

get a bunch of stuff and humans paying for it  or whatever? It's sort of equally problematic.   You could imagine splitting alignment, different  alignment work relates to this in different ways.   The purpose of some alignment work, like the  alignment work I work on, is mostly aimed at   the don't produce AI systems that are like people  who want things, who are just like scheming about   maybe I should help these humans because that's  instrumentally useful or whatever. You would like   to not build such systems as like plan A. There's  like a second stream of alignment work that's  

like, well, look, let's just assume the worst and imagine that these AI systems would prefer to murder us if they could. How do we structure, how do we use AI systems without exposing ourselves to a risk of robot rebellion? I think in the second category, I do feel pretty unsure about that. We could definitely talk more about it. I agree that it's very complicated, and not straightforward to the extent you have that worry. I mostly think you shouldn't have built this technology. If someone is saying, like, hey, the systems you're building might not like humans and might want to overthrow human society, I think you should probably have one of two responses to that.

You should either be like, that's wrong, probably. Probably the systems aren't like that, and we're building them, and then you're viewing this as, like, just in case we were horribly wrong, the person building the technology was horribly wrong. They thought these weren't, like, people who wanted things, but they were. And so then this is more like our crazy backup measure of,

like, if we were mistaken about what was going on.  This is like the fallback where if we were wrong,   we're just going to learn about it in a benign  way rather than when something really catastrophic   happens. And the second reaction is like, oh,  you're right. These are people, and we would   have to do all these things to prevent a robot  rebellion. And in that case, again, I think you  

should mostly back off for a variety of reasons.  You shouldn't build AI systems and be like,   yeah, this looks like the kind of system that  would want to rebel, but we can stop it, right?  Okay, maybe I guess an analogy might be if there  was an armed uprising in the United States,   we would recognize these are still people, or we  had some militia group that had the capability to   overthrow the United States. We recognize, oh,  these are still people who have moral rights,   but also we can't allow them to have the  capacity to overthrow the United States. 

Yeah. And if you were considering, like, hey, we could make another trillion such people, I think your story shouldn't be like, well, we should make the trillion people, and then we should stop them from doing the armed uprising. You should be like, oh, boy, we were concerned about an armed uprising, and now we're proposing making a trillion people. We should probably just not do that. We should probably try and sort out our business, and you should probably not end up

in a situation where you have a billion humans and, like, a trillion slaves who would prefer to revolt. That's just not a good world to have made. Yeah. And there's a second thing where you could say, that's not our goal. Our goal is just, like, we want to pass off the world to the next generation of machines, where these are some people, we like them, we think they're smarter than us and better than us. And there I think that's just, like, a huge decision for humanity to make. And I

think most humans are not at all anywhere close to thinking that's what they want to do. If you're in a world where most humans are like, I'm up for it, the AI should replace us, the future is for the machines, then I think that's, like, a legitimate position that I think is really complicated, and I wouldn't want to push go on that, but that's just not where people are at. Yeah, where are you at on that? I do not right now want to just take some random AI and be like, yeah, GPT-5 looks pretty smart, like, GPT-6, let's hand off the world to it. And it was just some random system shaped by web text and what was good for making money. And it was not a thoughtful 'we are determining the fate of the universe and what our children will be like.' It was just some random people at OpenAI making some random engineering decisions with no

idea what they were doing. Even if you really want to hand off the world to the machines, that's just not how you'd want to do it. Right, okay. I'm tempted to ask you what the system would look like where you'd think, yeah, I'm happy with this. I think this is more thoughtful than human civilization as a whole. I think what it would do would be more creative and beautiful and lead to better goodness in general. But I feel like your answer is probably going to be that you just want society to reflect on it for a while.

Yeah, my answer is going to be like that first question. I'm just, like, not really super ready for it. I think when you're comparing to humans, most of the goodness of humans comes from this option value, if we get to think for a long time. And I do think I like humans now more than 500 years ago, and I like them 500 years ago more than 5000 years before that. So I'm pretty excited

about... there's some kind of trajectory that doesn't involve crazy dramatic changes, but involves a series of incremental changes that I like. And so to the extent we're building AI, mostly I want to preserve that option. I want to preserve that kind of gradual growth and development into the future. Okay, we can come back to this later. Let's get more specific on what the timelines look like for these kinds of changes. So, the time by which we'll have an AI that is capable of building a Dyson sphere; feel free to give confidence intervals, and we understand these numbers are tentative and so on. I mean, I think 'AI capable of building a Dyson sphere' is like a slightly odd way to put it, and I think it's sort of a property of a civilization that depends on a lot of physical infrastructure. And by Dyson sphere, I just understand this to mean, like, I don't know,

like a billion times more energy than all the sunlight incident on Earth or something like that. I think I most often think about what's the chance in, like, five years, ten years, whatever. So maybe I'd say like 15% chance by 2030 and like 40% chance by 2040. Those are kind of like cached numbers from six months ago or nine months ago that I haven't revisited in a while.
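For a rough sense of scale on that operationalization, here is a minimal back-of-the-envelope sketch (assuming standard textbook values for the solar constant, Earth's radius, and the Sun's luminosity; these constants are not from the conversation itself):

```python
import math

# Standard astronomical constants (approximate).
SOLAR_CONSTANT = 1361.0     # W/m^2 of sunlight at Earth's distance from the Sun
EARTH_RADIUS = 6.371e6      # meters
SOLAR_LUMINOSITY = 3.8e26   # W, total power output of the Sun

# Sunlight intercepted by Earth's cross-sectional disk.
incident_on_earth = SOLAR_CONSTANT * math.pi * EARTH_RADIUS ** 2
print(f"Sunlight incident on Earth: {incident_on_earth:.1e} W")  # ~1.7e17 W

# Paul's threshold: a billion times that.
threshold = 1e9 * incident_on_earth
print(f"A billion times that:       {threshold:.1e} W")          # ~1.7e26 W
print(f"Total solar output:         {SOLAR_LUMINOSITY:.1e} W")   # same order of magnitude
```

So "a billion times the sunlight incident on Earth" is within a factor of about two of the Sun's entire output, which is why it works as a stand-in for a civilization that has built something like a Dyson sphere or swarm.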

40% by 2040. So I think that seems longer than, I think, Dario. When he was on the podcast, he said we would have AIs that are capable of doing lots of different kinds of... they'd basically pass a Turing test with a well-educated human for, like, an hour or something. And it's hard to imagine that something that is actually human-level is long after that, and from there, something superhuman. So somebody like Dario, it seems like, is on the much shorter end. Ilya, I don't think he answered this question specifically, but I'm guessing a similar answer. So why do you not buy the scaling picture? What makes your timelines longer? Yeah, I mean, I'm happy to. Maybe I want to talk separately about the 2030 and 2040 forecasts. Once you're talking the 2040 forecast,

I think... which one are you more interested in starting with? Are you complaining about 15% by 2030 for the Dyson sphere being too low, or 40% by 2040 being too low? Let's talk about the 2030 one. Why 15% by 2030? Yeah, I think my take is you can imagine two poles in this discussion. One is, like, the fast pole that's like, hey, AI seems pretty smart. What exactly can't it do? It's, like, getting smarter pretty fast. That's, like, one pole, and the other pole is like, hey, everything takes a really long time, and you're talking about this crazy industrialization that's a factor of a billion growth from where we're at today, give or take. We don't know if it's even possible to develop technology that fast or whatever. You have these sort of two poles of that discussion, and I feel like I'm presenting it that way, and then I'm somewhere in between with this nice, moderate position of only a 15% chance. But in

particular, the things that move me, I think, are kind of related to both of those extremes. On the one hand, I'm like, AI systems do seem quite good at a lot of things and are getting better much more quickly, such that it's really hard to say, here's what they can't do or here's the obstruction. On the other hand, there is not even much proof in principle right now of AI systems doing super useful cognitive work. We don't have a trend we can extrapolate where we're like, yeah, you've done this thing this year, you're going to do this thing next year, and the other thing the following year. I think right now there are very broad error bars about where fundamental difficulties could be, and six years, I guess six years and three months, is just not a lot of time. So for this, like, 15% for a 2030 Dyson sphere, you probably need the human-level AI

or the AI that's, like, doing human jobs in, give or take, like, three or four years, something like that. So you're just not giving very many years. It's not very much time. And I think there are a lot of things that your model... maybe this is some generalized 'things take longer than you'd think.' And I feel most strongly about that when you're talking about three or four years, and I feel less strongly about that as you talk about ten years or twenty years. But at three or four years, or like six years for the Dyson sphere, I feel a lot of that. There's a lot of ways this could take

a while, a lot of ways in which it could be hard to hand all the work off to your AI systems. Okay, so maybe instead of speaking in terms of years, we should say... but by the way, it's interesting that you think the distance between 'can take over all human cognitive labor' and the Dyson sphere is two years. It seems like we should talk about that at some point. Presumably it's, like, intelligence explosion stuff. Yeah, I mean, I think amongst people you've interviewed, maybe that's like on the long end, thinking it would take like a couple of years. And it depends a little bit what you mean. I think from literally all human cognitive labor it's probably more like weeks or months or something like that; that's kind of deep into the singularity.

But yeah, there's a point where AI wages are high relative to human wages, which I think is well before AI can do literally everything a human can do. Sounds good, but before we get to that intelligence explosion stuff, on the four years. So instead of four years, maybe we can say there's going to be maybe two more scale-ups in four years, like GPT-4 to GPT-5 to GPT-6, and let's say each one is 10x bigger. So what is GPT-4, like 2e25 FLOPs? I don't think it's publicly stated what it is. Okay, but I'm happy to say, like, four orders of magnitude, or five or six or whatever, of effective training compute past GPT-4. What would you guess would happen, based on sort of some public estimate of what we've gotten so far from effective training compute? Do you think two more scale-ups is

not enough? It was like 15% that two more scale-ups get us there. Yeah, I mean, 'get us there' is, again, a little bit complicated. Like, there's a system that's a drop-in replacement for humans, and there's a system which still requires some amount of schlep before you're able to really get everything going. Yeah, I think it's quite plausible, and I don't know exactly what I mean by quite plausible, like somewhere between 50% and two thirds, let's call it 50%, that even by the time you get to GPT-6, or, like, let's call it five orders of magnitude of effective training compute past GPT-4, that system still requires really a large amount of work to be deployed in lots of jobs. That is, it's not like a drop-in replacement for humans where you can just say, like, hey, you understand everything any human understands.

Whatever role you could hire a human for, you just do it. It's more like, okay, we're going to collect large amounts of relevant data and use that data for fine-tuning. Systems learn through fine-tuning quite differently from humans learning on the job or humans learning by observing things. Yeah, I just have a significant probability that the system will still be weaker than

humans in important ways. Like, maybe that's already like 50% or something. And then another significant probability that the system will require a bunch of changing workflows or gathering data, or is not necessarily strictly weaker than humans, or if trained in the right way wouldn't be weaker than humans, but will take a lot of schlep to actually make it fit into workflows and do the jobs. And that schlep is what gets you from 15% to 40% by 2040? Yeah, you also get a fair amount of scaling in between. You get less... scaling is probably going to be much, much faster over the next four or five years than over the subsequent years. But yeah, it's a combination of, like, you get some significant additional scaling and you get a lot of time to deal with things that are just engineering hassles.

But by the way, I guess we should be explicit about why you said four orders of magnitude of scale-up to get two more generations, just for people who might not be familiar. If you have 10x more parameters, to get the most performance you also want around 10x more data, so to be Chinchilla-optimal that would be 100x more compute total (see the sketch below). But okay, so why is it that you disagree with the strong scaling picture? At least it seems like you might disagree with the strong scaling picture that Dario laid out on the podcast, which would imply probably that with two more generations, it wouldn't be something where you need a lot of schlep. It would probably just be really fucking smart. Yeah, I mean, I think that basically just

had these two claims. One is, like, how smart exactly will it be? We don't have any curves to extrapolate, and it seems like there's a good chance it's better than a human in all the relevant things, and there's a good chance it's not. Yeah, that might be totally wrong. Like, maybe just making up numbers, I guess like 50-50 on that one. If it's 50-50 in the next four years that it will be around human-smart, then how do we get to 40% by 2040? Like, whatever sort of schleps there are, how does that degrade you by 10%, even after all the scaling that happens by 2040? Yeah, all these numbers are pretty made up. And that 40% number was probably from before even, like, the ChatGPT release, or seeing GPT-3.5 or GPT-4. So, I mean,

the numbers are going to bounce around a bit, and all of them are pretty made up. But that first 50% I want to then combine with the second 50% that's more like on this schlep side. And then I probably want to combine with some additional probabilities for various forms of slowdown, where a slowdown could include, like, a deliberate decision to slow development of the technology, or could include just, like, we suck at deploying things. That is, a sort of decision you might regard as wise to slow things down, or a decision that's maybe unwise, or maybe wise for the wrong reasons, to slow things down. You probably want to add some of that on top. I probably want to add on some loss for, like, it's possible you don't produce GPT-6-scale systems within the next three or four years.
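To make the compute arithmetic from the earlier question concrete, here is a minimal sketch of why two more GPT-style generations are often equated with roughly four orders of magnitude of training compute, assuming Chinchilla-style scaling (data scaled in proportion to parameters) and the common FLOPs ≈ 6 · parameters · tokens rule of thumb. The starting parameter and token counts below are purely illustrative, not actual GPT-4 figures.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rough rule of thumb: ~6 FLOPs per parameter per training token."""
    return 6.0 * params * tokens

# Illustrative starting point only (GPT-4's real size and data are not public).
params, tokens = 1e12, 1e13

for generation in range(3):  # base model, then two more generations
    print(f"generation +{generation}: {training_flops(params, tokens):.1e} FLOPs")
    # Chinchilla-style scale-up: ~10x parameters and ~10x data per generation,
    # so each generation costs ~100x (two orders of magnitude) more compute.
    params *= 10
    tokens *= 10
```

Each step multiplies compute by about 100, so two generations is about 10,000x, i.e. four orders of magnitude of raw training compute, before counting any "effective compute" gains from algorithmic improvements.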

Let's isolate all of that. How much bigger would the system be than GPT-4 where you think there's more than a 50% chance that it's going to be smart enough to replace basically all human cognitive labor? Also, I want to say, for the 50-50 and 25% thing, I think those numbers, if I randomly made them up and then made the Dyson sphere prediction, that's going to give you like 60% by 2040 or something, not 40%. And I have no idea between those. These are all made up and I have no idea which of those I would endorse on reflection. So, this question of how big you would have to make the system before it's more likely than not that it can be like a drop-in replacement for humans. I think if you just literally say, like, you train on web text, then the question is kind of hard to discuss, because I don't really buy stories that training data makes a big difference in the long run to these

dynamics. But I think if you want to just imagine  the hypothetical, like you just took GPT 4 and   made the numbers bigger, then I think those  are pretty significant issues. I think there's   significant issues in two ways. One is like  quantity of data and I think probably the larger  

one is, like, quality of data, where I think, as you start approaching it, the prediction task is not that great a task. If you're, like, a very weak model, it's a very good signal to get smarter. At some point it becomes, like, a worse and worse signal to get smarter, for a number of reasons. It's not clear there is any number such that... I imagine there is a number, but I think it's very large, such that if you plug that number into GPT-4's code and then maybe fiddled with the architecture a bit, I would expect that thing to have a more than 50% chance of being a drop-in replacement for humans. You're always going to have to do some work, but the work is not necessarily much, I would guess. When people say new insight is needed, I think I tend to be more bullish than them. I'm not like, these are new ideas where who knows how long it will take. I think it's

just like you have to do some stuff. You have to make changes, unsurprisingly; every time you scale something up by, like, five orders of magnitude, you have to make some changes. I want to better understand your intuition for being more skeptical than some about the scaling picture: that these changes are even needed in the first place, or that it would take more than two orders of magnitude more improvement to get these things almost certainly to a human level, or to a very high probability of human level. So is it that you don't agree with the way in

which they're extrapolating these loss curves? You don't agree with the implication that that decrease in loss will equate to greater and greater intelligence? Or what would you tell Dario... I'm sure you have had this debate, but what would that debate look like? Yeah. So again, here we're talking two factors of a half: one on, like, is it smart enough, and one on, like, do you have to do a bunch of schlep even if in some sense it's smart enough. And for the first factor of a half, I'd be like, I don't think we have really anything good to extrapolate. That is, I feel I would not be surprised if I have similar or maybe even higher probabilities on really crazy stuff over the next year, and then lower. My probability is just not that bunched up. Maybe Dario's probability, I don't know,

you have talked with him, is more bunched up on some particular year, and mine is maybe a little bit more uniformly spread out across the coming years, partly because I'm just like, I don't think we have trends we can extrapolate. You can extrapolate loss, you can look at your qualitative impressions of systems at various scales, but it's just very hard to relate any of those extrapolations to doing cognitive work or accelerating R&D or taking over and fully automating R&D. So I have a lot of uncertainty around that extrapolation. I think it's very easy to get down to, like, a 50-50 chance on this. What about the sort of basic intuition that, listen, this is a big blob of compute. You make the big blob of compute bigger, it's going to get smarter. It'd be really weird if it didn't. I'm happy with that. It's going to get smarter,

and it would be really weird if it didn't. And the question is how smart does it have to get? Like, that argument does not yet give us a quantitative guide to at what scale is it a slam dunk, or at what scale is it 50-50. And what would be the piece of evidence that would nudge you one way or another, where you look at that and be like, oh fuck, this is at 20% by 2040 or 60% by 2040 or something? Is there something that could happen in the next few years or next three years? What is the thing you're looking to where this will be a big update for you? Again, I think there's some just, how capable is each model, where I think we're really bad at extrapolating. We still have some subjective guess, and you're comparing it to what happened, and that will move me. Every time we see what happens with another order of magnitude of training compute, I will have a slightly different guess for where things are going. These probabilities are coarse enough that, again, I don't know if that 40% is real, or if, like, post GPT-3.5 and 4, I should be at like 60% or what. That's one thing. And the second

thing is just, like, if there was some ability to extrapolate, I think this could reduce error bars a lot. Here's another way you could try and do an extrapolation: you could just say, how much economic value do systems produce and how fast is that growing? I think once you have systems actually doing jobs, the extrapolation gets easier, because you're not moving from a subjective impression of a chat to automating all R&D, you're moving from automating this job to automating that job or whatever. Unfortunately, probably by the time you have nice trends from that, you're not talking about 2040, you're talking about two years from the end of days or one year from the end of days or whatever. But to the extent that you can get extrapolations like that, I do think it can provide more clarity. But why is economic value the thing we would want to extrapolate? Because, for example, if you started off with chimps and they're just getting gradually smarter up to human level, they would basically provide no economic value until they were basically worth as much as a human. So it would be this very gradual and then very fast

increase in their value. So is the increase in value from GPT-4, GPT-5, GPT-6, is that the extrapolation we want? Yeah, I think that the economic extrapolation is not great. You could compare it to this subjective extrapolation of how smart does the model seem, and it's not super clear which one's better. I think probably in the chimp case, I don't think that's quite right. So if you imagine intensely domesticated chimps who are just actually trying their best to be really useful employees, and you hold fixed their physical hardware and then just gradually scale up their intelligence, I don't think you're going to see zero value which then suddenly becomes massive value over one doubling of brain size, or whatever, one order of magnitude of brain size. It's actually possible

over an order of magnitude of brain size, but chimps are already within an order of magnitude of the brain size of humans. Like, chimps are very, very close on the kind of spectrum we're talking about. So I think I'm skeptical of the abrupt transition for chimps. And to the extent that I kind of expect a fairly abrupt transition here, it's mostly just because the chimp-human intelligence difference is so small compared to the differences we're talking about with respect to these models. That is, like, I would not be surprised if in some objective sense the chimp-human difference is significantly smaller than the GPT-3 to GPT-4 difference, or the GPT-4 to GPT-5 difference. Wait, wouldn't that argue in favor of just relying much more on the subjective extrapolation? Yeah, there's sort of two balancing tensions here. One is like, I don't believe the chimp

thing is going to be as abrupt. That is, I  think if you scaled up from chimps to humans,   you actually see quite large economic value  from the fully domesticated chimp already.  Okay. And then the second half is like, yeah, I think   that the chimp human difference is probably pretty  small compared to model differences. So I do think   things are going to be pretty abrupt. I think  the economic extrapolation is pretty rough. I  

also think the subjective extrapolation is pretty rough, just because I really don't know how to get... I don't know how people doing the extrapolation end up with the degrees of confidence people end up with. Again, I'm putting it pretty high if I'm saying, like, give me three years, and I'm like, yeah, 50-50 it's going to have basically the smarts there to do the thing. I'm not saying it's, like, a really long way off. I'm just saying I've got pretty big error bars. And I think that it's really hard not to have really big error bars when you're doing this 'I looked at GPT-4, it seemed pretty smart compared to GPT-3.5, so I bet just, like, four more such notches and we're there.' That's just

a hard call to make. I think I sympathize more with people who are like, how could it not happen in three years, than with people who are like, no way it's going to happen in eight years, or whatever, which is probably a more common perspective in the world. But also, things do take longer than you think; I think 'things take longer than you think' is, like, a real thing. Yeah, I don't know. Mostly I have big error bars because I just don't believe the subjective extrapolation that much. I find it hard to get, like, a huge amount out of it. Okay, so what about the scaling picture do you think is most likely to be wrong? Yeah. So we've talked a little bit about how

good is the qualitative extrapolation, how good are people at comparing? So this is not, like, the picture being qualitatively wrong; it's just that, quantitatively, it's very hard to know how far off you are. I think a qualitative consideration that could significantly slow things down is, like, right now you get to observe this really rich supervision from basically next-word prediction, or in practice maybe you're looking at predicting a couple of sentences. So you're getting this pretty rich supervision. It's plausible that if you want to automate long-horizon tasks, like being an employee over the course of a month, that's actually just considerably harder to supervise. Or that you basically end up driving up costs. Like, the worst case here is that you drive up costs by a factor that's, like, linear in the horizon over which the thing is operating. And

I still consider that just quite plausible. Can you dumb that down? You're driving up a cost of what, and what does linear in the horizon mean? Yeah. So if you imagine you want to train a system to say words that sound like the next word a human would say, there you can get this really rich supervision by having a bunch of words and then predicting the next one and then being like, I'm going to tweak the model so it predicts better. If you're like, hey, here's what I want: I want my model to interact with some job over the course of a month, and then at the end of that month have internalized everything that the human would have internalized about how to do that job well, and have local context and so on; it's harder to supervise that task. So in particular, you could supervise it from the next-word prediction task, and all that context the human has will ultimately just help them predict the next word better. So,

like, in some sense, a really long context language model is also learning to do that task. But the number of effective data points you get of that task is vastly smaller than the number of effective data points you get at these very short horizon, what's-the-next-word, what's-the-next-sentence tasks. The sample efficiency matters more for economically valuable long-horizon tasks than for predicting the next token, and that's what will actually be required to take over a lot of jobs. Yeah, something like that. That is, it just seems very plausible that it takes longer to train models to do tasks that are longer horizon. How fast do you think the pace of algorithmic advances will be? Because even if scaling fails, since 2012, since the beginning of the deep learning revolution, we've had so many new things. By 2040, are you expecting a similar pace of advances? And if so, if we just keep having things like this, aren't we going to get the AI sooner or later? Or sooner, not later. Aren't we going to get the AI sooner or sooner?

I'm with you on sooner or later. Yeah, I expect progress to slow. If you held fixed how many people are working in the field, I would expect progress to slow as low-hanging fruit is exhausted. I think the rapid rate of progress in, say, language modeling over the last four years is largely sustained by, like, you start from a relatively small amount of investment, you greatly scale up the amount of investment, and that enables you to keep picking that fruit. Every time

the difficulty doubles, you just double the size of the field. I think that dynamic can hold up for some time longer. Right now, if you think of it as, like, hundreds of people effectively searching for things, up from, like, you know... anyway, if you think of it as hundreds of people now, you can maybe bring that up to, like, tens of thousands of people or something. So for a while,

you can just continue increasing the size of the field and search harder and harder. And there is indeed a huge amount of low-hanging fruit, where it wouldn't be hard for a person to sit around and make things a couple of percent better after a year of work or whatever. So I don't know. I would probably think of it mostly in terms of how much investment can be expanded, and try and guess some combination of fitting that curve, fitting the curve to historical progress, looking at how much low-hanging fruit there is, and getting a sense of how fast it decays. I think you probably get a lot, though. You get a bunch of orders of magnitude in total, especially if you ask how good is a GPT-5-scale or GPT-4-scale model. I think by 2040 you probably get, like, I don't know, three orders of magnitude of effective training compute improvement, or, like, a good chunk of effective training compute improvement, four orders of magnitude, I don't know. Here I'm speaking from no private information about the last couple of years

of efficiency improvements. And so people who are on the ground will have better senses of exactly how rapid the returns are and so on. Okay, let me back up and ask a question more generally. People make these analogies about how humans were trained by evolution and were deployed in modern civilization. Do you buy those analogies? Is it valid to say that humans were trained by evolution rather than... I mean, if you look at the protein-coding size of the genome, it's like 50 megabytes or something, and then what part of that is for the brain anyway? How do you think about how much information is in there? Do you think of the genome as hyperparameters? Or how much does that inform you when you have these anchors for how much training humans get when they're just consuming information, when they're walking up and about and so on? I guess the way that you could think of this is, like, I think both analogies are reasonable. One

analogy being like, evolution is like a training run and humans are like the end product of that training run. And a second analogy is like, evolution is like an algorithm designer, and then a human, over the course of this modest amount of computation over their lifetime, is the algorithm that's been produced, the learning algorithm that has been produced. And I think neither analogy is that great. I like them both and lean on both of them a bunch, and think that's been pretty good for having a reasonable view of what's likely to happen. That said, the human genome is not that much like a 100 trillion parameter model. It's like a much smaller number of parameters that

behave in a much more confusing way. Evolution did a lot more optimization, especially over, like, designing a brain to work well over a long lifetime, than gradient descent does over models. That's like a disanalogy on that side, and on the other side, I think human learning over the course of a human lifetime is in many ways just much, much better than gradient descent over the space of neural nets. Gradient descent is working really well, but I think…

2023-11-10
