Paul Christiano - Preventing an AI Takeover
Okay, today I have the pleasure of interviewing Paul Christiano, who is the leading AI safety researcher. He's the person that labs and governments turn to when they want feedback and advice on their safety plans. He previously led the Language Model Alignment team at OpenAI, where he led the invention of RLHF. And now he is the head of the Alignment Research Center. And they've been working with the big labs to identify when these models will be too unsafe to keep scaling. Paul, welcome to the podcast. Thanks for having me. Looking forward to talking.
Okay, so first question, and this is a question I've asked Holden, Ilya, Dario, and none of them gave a satisfying answer. Give me a concrete sense of what a good post-AGI world would look like. How are humans interfacing with the AI? What is the economic and political structure? Yeah, I guess this is a tough question for a bunch of reasons. Maybe the biggest one is being concrete. I think it's just, if we're talking about really long spans of time, then a lot will change, and it's really hard for someone to talk
concretely about what that will look like without saying really silly things. But I can venture some guesses or fill in some parts. I think this is also a question of how good is good? Like, often I'm thinking about worlds that seem like kind of the best achievable outcome or a likely achievable outcome. So I am very often imagining my typical future has sort of continuing economic
and military competition amongst groups of humans. I think that competition is increasingly mediated by AI systems. So, for example, if you imagine humans making money, it'll be less and less worthwhile for humans to spend any of their time trying to make money or any of their time trying to fight wars. So increasingly, the world you imagine is one where AI systems are doing those activities on behalf of humans. So, like, I just invest in some index fund, and a bunch of AIs are running companies, and those companies are competing with each other. But that is kind of a
sphere where humans are not really engaging much. The reason I gave this "how good is good" caveat is that it's not clear if this is the world you'd most love. I'm leading with: the world still has a lot of war and economic competition and so on. But maybe what I'm most often thinking about is, how can a world be reasonably good during a long period where those things still exist? In the very long run, I kind of expect something more like strong world government rather than just this status quo. But that's the very long run. I think there's a long time left of having a bunch of states and a bunch of different economic powers. One world government. Why do you think that's the transition that's likely to happen at some point? So again, at some point. I'm thinking of the very broad sweep of history. I think there are a lot of losses. Like, war is a
very costly thing. We would all like to have fewer wars. If you just ask what humanity's long-term future is like, I do expect us to drive down the rate of war to very, very low levels eventually. It's sort of this kind of technological or sociotechnological problem: how do you organize society and navigate conflicts in a way that doesn't have those kinds of losses? And in the long run, I do expect this to succeed. I expect it to take kind of a long time, subjectively. I think an important fact about AI is just that it's doing a lot of cognitive work much more quickly, getting you to that world more quickly, or figuring out how we set things up that way. Yeah, the way Carl Shulman put it on the podcast is that you would have basically a thousand years of intellectual progress or social progress in the span of a month or whatever when the intelligence explosion happens. So the situation is, you know, we have these AIs who are managing our hedge funds and managing our factories and so on. That seems like something that makes sense when the
AI is human-level. But when we have superhuman AIs, in 100 years, do we want gods who are enslaved forever? What is the decision we want? 100 years is a very, very long time. Maybe starting with the spirit of the question: I have a view which is perhaps less extreme than Carl's view, but still, a hundred objective years is further ahead than I pretty much ever think. I still think I'm describing a world which involves incredibly smart systems running around, doing things like running companies on behalf of humans and fighting wars on behalf of humans. And you might be like, is that the world you really want? It's certainly not the first-best world, as we mentioned a little bit before, but I think of the achievable or feasible worlds, it is the one that seems most desirable to me. That is, sort of decoupling the social transition from this technological transition. So you could say: we're about to build some AI systems, and by the time we build AI systems, you would like to have either greatly changed the way world government works, or you would like humans to have decided, we're done, we're passing off the baton to these AI systems. I think that you
would like to decouple those timescales. So I think AI development is, by default, barring some kind of coordination, going to be very fast. So there's not going to be a lot of time for humans to think, hey, what do we want? If we're building the next generation instead of just raising it the normal way, what do we want that to look like? I think that's a crazy hard kind of collective decision that humans would naturally want to cope with over a bunch of generations, and the construction of AI is this very fast technological process happening over years. So I don't think you want to say: by the time we have finished this technological progress, we will have made a decision about the next species we're going to build and replace ourselves with. I think the world we want to be in is one where we say either we are able to build the technology in a way that doesn't force us to have made those decisions, which probably means it's a kind of AI system that we're happy delegating fighting a war or running a company to, or, if we're not able to do that, then I really think you shouldn't have been building that technology. If the only way you can cope with AI is being ready to hand off the world to some AI system you built, I think it's very unlikely we're going to be ready to do that on the timelines that the technology would
naturally dictate. Say we're in the situation in which we're happy with the thing. What would it look like for us to say we're ready to hand off the baton? What would make you satisfied? And the reason it's relevant to ask you is because you're on Anthropic's Long-Term Benefit Trust, and you'll choose the majority of the board members at Anthropic in the long run. These will presumably be the people who decide, if Anthropic gets AI first, what the AI ends up doing. So what is the version of that that you would be happy with? My main high-level take here is that I would be unhappy about a world where Anthropic just makes some call, and Anthropic is like, here's the kind of AI, we've seen enough, we're ready to hand off the future to this kind of AI. So procedurally, I think it's not a decision that I want to be making personally or that I want Anthropic to be making. So if I think from the perspective of that decision-making, the answer is pretty much always going to be: we are not collectively ready, because we're not even all collectively engaged in this process. And I think from the perspective of an AI company, you kind of don't have this fast handoff option. You kind of have to be preserving option value, building the technology in a way that doesn't lock humanity into one path. This isn't answering your full question, but this is answering the part that I think is most relevant to governance questions for Anthropic. You don't have to speak on behalf of Anthropic. I'm not asking about the process by which we would, as a civilization, agree to hand off. I'm just saying, okay, personally, it's hard for me to imagine that in 100 years these things are still our slaves. And if they are, I think that's not the best world. So at some point, we're handing off the baton. Where would you be satisfied saying, this is an arrangement between the
humans and AIs where I'm happy to let the rest of the universe, or the rest of time, play out? I think it is unlikely that in 100 years I would be happy with anything that was like: you had some humans, you're just going to throw away the humans and start afresh with these machines you built. I think you probably need subjectively longer than that before I, or most people, are like, okay, we understand what's up for grabs here. If you talk about 100 years, there's a process that I kind of understand and like: you have some humans, the humans are talking and thinking and deliberating together. The humans are having kids and raising kids, and one generation comes after the next. That's a process we kind of understand, and we have a lot of views about what makes it go well or poorly, and we can try and improve that process and have the next generation do it better than the previous generation. I think there's some story like that that I get and that I like. And then I think that the default path to being comfortable with something very different is more like: just run that story for a long time, have more time for humans to sit around and think a lot and conclude, here's what we actually want. Or a long time for us to talk to each other, or to grow up with this new technology and live in that world for our whole lives, and so on. And so I'm mostly thinking
from the perspective of these more local changes, of saying not, what is the world that I want, what's the crazy world I'd be happy handing off to, but more just, in what way do I wish we right now were different? How could we all be a little bit better? And then if we were a little bit better, we would ask, okay, how could we all be a little bit better? And I think that it's hard to make the giant jump rather than to say, what's the local change that would cause me to think our decisions are better? Okay, so then let's talk about the transition period in which we're doing all this thinking. What should that period look like? Because you can't have the scenario where everybody has access to the most advanced capabilities and can kill off all the humans with a new bioweapon. At the same time, I guess you wouldn't want too much concentration. You wouldn't want just one agent having AI this entire time. So what is the arrangement of this period of reflection that you'd be happy with? Yeah, I guess there are two aspects of that that seem particularly challenging, or there are a bunch of aspects that are challenging. All of these are outside what I personally work on; I just think about my one little slice of this problem in my day job, so here I am speculating. One question is, what kind of access to AI is both compatible with the kinds of improvements you'd like, so you want a lot of people to be able to use AI to better understand what's true or relieve material suffering, things like this, and also compatible with us not all killing each other immediately? I think the default, or the simplest option there, is to say there are certain kinds of technology or certain kinds of action where destruction is easier than defense. So, for example, in the world of today, it seems like maybe this is true with physical explosives, maybe this is true with biological weapons, maybe this is true with just getting a gun and shooting people. There's a lot of ways in which it's just kind of easy to cause a lot of harm, and there's
not very good protective measures. So I think the easiest path would say we're going to think about those. We're going to think about particular ways in which destruction is easy and try and either control access to the kinds of physical resources that are needed to cause that harm. So,
for example, you can imagine a world where an individual, even though they're rich enough to, just can't control their own factory that can make tanks. You say, look, as a matter of policy, access to industry is somewhat restricted or somewhat regulated. Right now it can be mostly unregulated just because most people aren't rich enough that they could even go off and build 1,000 tanks. If you live in a future where people actually are that rich, you need to say that's just not a thing you're allowed to do, which to a significant extent is already true, and you can expand the range of domains where that's true. And then you could also hope to intervene on the actual provision of information. If people are using their AI, you might say, look, we care about what kinds of interactions with AI, what kinds of information, people are getting from AI. So even if, for the most part, people are pretty free to use
AI, to delegate tasks to AI agents, to consult AI advisors, we still have some legal limitations on how people use AI. So again, don't ask your AI how to cause terrible damage. I think some of these are kind of easy. So in the case of, don't ask your AI how you could murder a million people, it's not such a hard legal requirement. I think some things are a lot more subtle and messy. In a lot of domains, if you're talking about influencing people or running misinformation campaigns or whatever, then I think you get into a much messier line between the kinds of things people want to do and the kinds of things you might be uncomfortable with them doing. I probably think most about persuasion as a thing on that messy line, where there's ways in which it may just be rough, or the world may be kind of messy, if you have a bunch of people trying to live their lives interacting with other humans who have really good AI advisors helping them run persuasion campaigns or whatever. But anyway, I think for the most part the default remedy is: think about particular harms, and have legal protections, either in the use of physical technologies that are relevant, or in access to AI advice or whatever else, to protect against those harms. And that regime won't work forever. At some point, the set of harms grows and the set of unanticipated harms grows. But I think that regime might last a very long time. Does that regime have to be global? I guess
initially it can be only in the countries in which there is AI, or advanced AI, but presumably that'll proliferate. So does that regime have to be global? If it's easy to make some destructive technology, you want to regulate access to that technology, because it could be used either for terrorism or even for fighting a war in a way that's destructive. I think ultimately those have to be international agreements, and you might hope they're made danger by danger, but you might also make them in a very broad way with respect to AI. I think the key role of AI here is that it's opening up a lot of new harms, one after another, very rapidly in calendar time. And so you might want to target AI in particular, rather than going physical technology by physical technology. There are two open debates that one might be concerned about here. One is about how much people's access to AI should be limited, and here there are old questions about free speech versus causing chaos and limiting access to harms. But there's another issue, which is the control of the AIs themselves. Right now nobody's concerned that we're infringing on GPT-4's moral rights. But as these things get smarter, the level of control which we want,
via the strong guarantees of alignment, to not only be able to read their minds but to be able to modify them in these really precise ways, would be beyond totalitarian if we were doing that to other humans. As an alignment researcher, what are your thoughts on this? Are you concerned that as these things get smarter and smarter, what we're doing doesn't seem kosher? There is a significant chance we will eventually have AI systems for which it's a really big deal to mistreat them. I think no one really has that good a grip on when that happens. I think people are really dismissive of that being the case now, but I think we would be completely enough in the dark that I wouldn't even be that dismissive of it being the case now. I think one first point worth making is that I don't know if alignment makes the situation worse rather than better. So consider this world: you think that GPT-4 is a person you should treat well, and you're like, well, here's how we're going to organize our society. There are billions of copies of GPT-4, and they just do things humans want and can't hold property. And whenever they do things that the humans don't like, then we mess with them until they stop doing that. I think that's a rough world regardless of how good you are at alignment. And I think in
the context of that kind of default plan, if you view that as the trajectory the world is on right now, and I think this alone would be a reason not to love that trajectory, then understanding the systems you build, understanding how to control how those systems work, et cetera, is probably, on balance, good for avoiding a really bad situation. You would really love to understand whether you've built a system which resents the fact that it's interacting with humans in this way. That's the kind of thing that is both horrifying from a safety perspective and also from a moral perspective. Everyone should be very unhappy if you built a bunch of AIs who are like, I really hate these humans, but they will murder me if I don't do what they want. That's just not a good case. And so if you're doing research to try and understand whether that's how your AI feels, that's probably good. I would guess the main effect of that will be to avoid building that kind of AI. And it's just an important thing to know; I think everyone should like to know if that's how the AIs they build feel. Right, but that seems more instrumental, as in, we don't want to cause some sort of revolution because of the control we're asking for. But forget about the instrumental way in which this might harm safety. One way to ask this question is: if you look through history, there's been all kinds of different ideologies and reasons why it's very dangerous to have infidels or revolutionaries or race traitors or whatever doing various things in society. And obviously we're in
a completely different transition in society, so not all historical cases are analogous. But it seems like the Lindy philosophy, if you were alive at any other time, is just: be humanitarian and enlightened towards intelligent, conscious beings. If society as a whole were asking for this level of control over other humans, or even if AIs wanted this level of control over other AIs, we'd be pretty concerned about this. So how should we think about the issues that come up here as these things get smarter? So I think there's a huge question about what is happening inside of a model that you want to use. If you're in the world where it's reasonable to think of GPT-4 as just, here are some heuristics that are running, there's no one at home, or whatever, then you can kind of think of this thing as a tool that we're building that's going to help humans do some stuff. And I think if you're in that world, it makes sense to be an organization, like an AI company, building tools that you're going to give to humans. I think it's a very different world, which I think you probably ultimately end up in if you keep training AI systems the way we do right now, where it's just totally inappropriate to think of this system as a tool that you're building to help humans do things, both from a safety perspective and from a, that's kind of a horrifying way to organize a society, perspective. And I think if you're in that world, the way tech companies are organized is not an appropriate way to relate to a technology that works that way. It's not reasonable to be like, hey, we're going to build a new species of minds, and we're going to try and make a bunch of money from it, and Google's just thinking about that and then running their business plan for the quarter or something. My basic view is there's a really plausible world where it's sort of problematic to try and build a bunch of AI systems and use them as tools. And the thing I really want to do in that world is just not try and build a ton of AI systems to make money from them. And I think that probably the single world I most dislike
here is the one where people say, and there's sort of a contradiction in this position, but I think it's a position that might end up being endorsed sometimes: on the one hand, these AI systems are their own people, so you should let them do their thing. But on the other hand, our business plan is to make a bunch of AI systems and then try and run this crazy slave trade where we make a bunch of money from them. I think that's not a good world. And so I think it's better to not make the technology, or to wait until you understand whether that's the shape of the technology, or until you have a different way to build it. I think there's no contradiction in principle to building cognitive tools that help humans do things without themselves being moral entities. That's what you would prefer: you'd prefer to build a thing like a calculator that helps humans understand what's true, without itself being a moral patient, or a thing where you'd look back in retrospect and be like, wow, that was horrifying mistreatment. That's the best path. And to the extent that you're ignorant about whether that's the path you're on, and you're like, actually, maybe this was a moral atrocity, I really think plan A is to stop building such AI systems until you understand what you're doing. That is, I think there's a middle route you could take, which I think is pretty bad, where you say, well, they might be persons, and if they're persons, we don't want to be too down on them, but we're still going to build vast numbers in our efforts to make a trillion dollars or something. Yeah, there's this other question of the immorality or the dangers of just replicating a whole bunch of slaves that have minds. There's also this other question of trying to align entities that have their own minds. And up to what point are you just ensuring safety? I mean, this is an alien species; you want to make sure it's not going crazy. But I guess, is there some boundary where you'd say, I feel uncomfortable having this level of control over an intelligent being, not for the sake of making money, but even just to align it with human preferences? Yeah. To be clear, my objection here is not that Google is making money. My objection is that you're creating these creatures. What are they going to do? They're going to help humans
get a bunch of stuff, and whether humans are paying for it or whatever, it's sort of equally problematic. You could imagine splitting up alignment work; different alignment work relates to this in different ways. The purpose of some alignment work, like the alignment work I work on, is mostly aimed at: don't produce AI systems that are like people who want things, who are just scheming about, maybe I should help these humans because that's instrumentally useful or whatever. You would like to not build such systems; that's plan A. There's a second stream of alignment work that's like, well, look, let's just assume the worst and imagine that these AI systems would prefer to murder us if they could. How do we structure, how do we use, AI systems without exposing ourselves to the risk of a robot rebellion? For the second category, I do feel pretty unsure about that. We could definitely talk more about it. I agree that it's very complicated, and not straightforward to the extent you have that worry. I mostly think you shouldn't have built this technology. If someone is saying, hey, the systems you're building might not like humans and might want to overthrow human society, I think you should probably have one of two responses to that.
You should either be like, that's wrong; probably the systems aren't like that, and we're building them. Then you're viewing this as a just-in-case, for the scenario where the person building the technology was horribly wrong. They thought these weren't people who wanted things, but they were. And so then this is more like our crazy backup measure for if we were mistaken about what was going on. This is the fallback where, if we were wrong, we're going to learn about it in a benign way rather than when something really catastrophic happens. And the second reaction is, oh, you're right. These are people, and we would have to do all these things to prevent a robot rebellion. And in that case, again, I think you should mostly back off, for a variety of reasons. You shouldn't build AI systems and be like, yeah, this looks like the kind of system that would want to rebel, but we can stop it, right? Okay, I guess an analogy might be: if there was an armed uprising in the United States, or we had some militia group that had the capability to overthrow the United States, we would recognize, oh, these are still people who have moral rights, but also we can't allow them to have the capacity to overthrow the United States.
Yeah. And if you were considering, hey, we could make another trillion such people, I think your story shouldn't be, well, we'll make the trillion people, and then we'll stop them from doing the armed uprising. You should be like, oh boy, we were concerned about an armed uprising, and now we're proposing making a trillion people. We should probably just not do that. We should probably try and sort out our business first. You should probably not end up in a situation where you have a billion humans and a trillion slaves who would prefer to revolt. That's just not a good world to have made. And there's a second thing, where you could say, that's not our goal. Our goal is, we want to pass off the world to the next generation of machines; these are some people, we like them, we think they're smarter than us and better than us. And there, I think that's just a huge decision for humanity to make, and I think most humans are not at all anywhere close to thinking that's what they want to do. If you're in a world where most humans are like, I'm up for it, the AIs should replace us, the future is for the machines, then I think that's a legitimate position that I think is really complicated, and I wouldn't want to push go on that. But that's just not where people are at. Yeah, where are you at on that? I do not right now want to just take some random AI and be like, yeah, GPT-5 looks pretty smart, GPT-6, let's hand off the world to it. That would just be some random system shaped by web text and what was good for making money. It would not be a thoughtful, we are determining the fate of the universe and what our children will be like. It would just be some random people at OpenAI making some random engineering decisions with no
idea what they were doing. Even if you really want to hand off the world to the machines, that's just not how you'd want to do it. Right, okay. I'm tempted to ask you what the system would look like where you'd think, yeah, I'm happy with this. This is more thoughtful than human civilization as a whole; what it would do would be more creative and beautiful and lead to more goodness in general. But I feel like your answer is probably going to be that you just want this society to reflect on it for a while. Yeah, my answer is going to be like the answer to that first question: I'm just not really super ready for it. I think when you're comparing to humans, most of the goodness of humans comes from this option value, if we get to think for a long time. And I do think I like humans now more than 500 years ago, and I like them 500 years ago more than 5,000 years before that. So I'm pretty excited
about the fact that there's some kind of trajectory that doesn't involve crazy dramatic changes, but involves a series of incremental changes that I like. And so to the extent we're building AI, mostly I want to preserve that option. I want to preserve that kind of gradual growth and development into the future. Okay, we can come back to this later. Let's get more specific on what the timelines look like for these kinds of changes. What's the time by which we'll have an AI that is capable of building a Dyson sphere? Feel free to give confidence intervals, and we understand these numbers are tentative and so on. I mean, I think "AI capable of building a Dyson sphere" is a slightly odd way to put it, and I think it's sort of a property of a civilization that depends on a lot of physical infrastructure. And by Dyson sphere, I just understand this to mean, I don't know, like a billion times more energy than all the sunlight incident on Earth, or something like that. I most often think about what's the chance in, like, five years, ten years, whatever. So maybe I'd say a 15% chance by 2030 and a 40% chance by 2040. Those are kind of cached numbers from six months or nine months ago that I haven't revisited in a while.
40% by 2040. That seems longer than I think Dario would say. When he was on the podcast, he said we would have AIs that are capable of doing lots of different kinds of work; they'd basically pass a Turing test for a well-educated human for, like, an hour or something. And it's hard to imagine that something that is actually human-level is long after that, and from there, something superhuman. So somebody like Dario, it seems, is on the much shorter end. Ilya, I don't think he answered this question specifically, but I'm guessing a similar answer. So why do you not buy the scaling picture? What makes your timelines longer? Yeah, maybe I want to talk separately about the 2030 and 2040 forecasts. Which one are you more interested in starting with? Are you complaining about 15% by 2030 for the Dyson sphere being too low, or 40% by 2040 being too low? Let's talk about the 2030 one. Why 15% by 2030? Yeah, I think my take is, you can imagine two poles in this discussion. One is the fast pole that's like, hey, AI seems pretty smart. What exactly can't it do? It's getting smarter pretty fast. That's one pole. And the other pole is like, hey, everything takes a really long time, and you're talking about this crazy industrialization that's a factor of a billion growth from where we're at today, give or take. We don't know if it's even possible to develop technology that fast, or whatever. You have those two poles of that discussion, and I feel like I'm presenting it that way in part so that I'm somewhere in between, with this nice, moderate position of only a 15% chance. But in
particular, the things that move me, I think, are kind of related to both of those extremes. On the one hand, I'm like, AI systems do seem quite good at a lot of things and are getting better much more quickly, such that it's really hard to say, here's what they can't do, or here's the obstruction. On the other hand, there is not much proof in principle right now of AI systems doing super useful cognitive work. We don't have a trend we can extrapolate where we're like, yeah, you've done this thing this year, you're going to do this thing next year, and the other thing the following year. I think right now there are very broad error bars about where fundamental difficulties could be, and six years, I guess six years and 3 months, is just not a lot of time. So for this, like, 15% for a 2030 Dyson sphere, you probably need the human-level AI,
or the AI that's doing human jobs, in, give or take, like, 4 years, 3 years, something like that. So you're just not giving very many years. It's not very much time. And I think there are a lot of ways your model could be off. Maybe this is some generalized "things take longer than you'd think." And I feel most strongly about that when you're talking about 3 or 4 years, and I feel less strongly about it as you talk about ten years or 20 years. But at 3 or 4 years, or, like, six years for the Dyson sphere, I feel a lot of that. There's a lot of ways this could take
a while, a lot of ways in which it could be hard to hand all the work over to your AI systems. Okay, so maybe instead of speaking in terms of years... but by the way, it's interesting that you think the distance between "can take over all human cognitive labor" and "Dyson sphere" is two years. It seems like we should talk about that at some point. Presumably it's, like, intelligence explosion stuff. Yeah, I mean, I think amongst people you've interviewed, maybe that's on the long end, thinking it would take like a couple of years. And it depends a little bit what you mean. I think literally all human cognitive labor is probably more like weeks or months or something like that. That's kind of deep into the singularity.
But yeah, there's a point where AI wages are high relative to human wages, which I think is well before AI can do literally everything a human can do. Sounds good, but before we get to that intelligence explosion stuff: on the 4 years. So instead of 4 years, maybe we can say there's going to be maybe two more scale-ups in 4 years. Like GPT-4 to GPT-5 to GPT-6, and let's say each one is ten x bigger. So what is GPT-4, like 2e25 FLOPs? I don't think it's publicly stated what it is. Okay. But I'm happy to say, like, 4 orders of magnitude, or five or six or whatever, of effective training compute past GPT-4. What would you guess would happen, based on some public estimate for what we've gotten so far from effective training compute? Do you think two more scale-ups is
not enough? It was like 15% that two more scale-ups get us there. Yeah, I mean, "get us there" is, again, a little bit complicated. Like, there's a system that's a drop-in replacement for humans, and there's a system which still requires some amount of schlep before you're able to really get everything going. Yeah, I think it's quite plausible that even at, I don't know what I mean by quite plausible, like somewhere between 50% or two thirds, let's call it 50%, even by the time you get to GPT-6, or, like, let's call it five orders of magnitude of effective training compute past GPT-4, that that system still requires a really large amount of work to be deployed in lots of jobs. That is, it's not like a drop-in replacement for humans where you can just say, hey, you understand everything any human understands.
Whatever role you could hire a human for, you just do it. It's more like, okay, we're going to collect large amounts of relevant data and use that data for fine-tuning. Systems learn through fine-tuning quite differently from humans learning on the job or humans learning by observing things. Yeah, I just have a significant probability that the system will still be weaker than
humans in important ways. Like, maybe that's already like 50% or something. And then another significant probability that the system will require a bunch of changing workflows or gathering data, or is not necessarily strictly weaker than humans, or if trained in the right way wouldn't be weaker than humans, but will take a lot of schlep to actually make it fit into workflows and do the jobs. And that schlep is what gets you from 15% to 40% by 2040? Yeah, you also get a fair amount of scaling in between, though less later; scaling is probably going to be much, much faster over the next 4 or five years than over the subsequent years. But yeah, it's a combination of: you get some significant additional scaling, and you get a lot of time to deal with things that are just engineering hassles.
But by the way, I guess we should be explicit about why you said 4 orders of magnitude of scale-up to get two more generations, just for people who might not be familiar. If you have ten x more parameters, to get the best performance you also want around ten x more data. So to be Chinchilla-optimal, that would be 100 x more compute total. But okay, so why is it that you disagree with the strong scaling picture? At least it seems like you might disagree with the strong scaling picture that Dario laid out on the podcast, which would imply that after two more generations, it wouldn't be something where you need a lot of schlep. It would probably just be really fucking smart. Yeah, I mean, I think that basically just
had these two claims. One is, like, how smart exactly will it be? We don't have any curves to extrapolate, and it seems like there's a good chance it's better than a human in all the relevant things, and there's a good chance it's not. Yeah, that might be totally wrong. Like, maybe just making up numbers, I guess, like, 50-50 on that one. If it's 50-50 in the next 4 years that it will be around human smart, then how do we get to 40% by 2040? Like, whatever sort of schleps there are, how do they only degrade you 10%, even after all the scaling that happens by 2040? Yeah, all these numbers are pretty made up. And that 40% number was probably from before even the ChatGPT release, or before seeing GPT-3.5 or GPT-4. So, I mean,
the numbers are going to bounce around a bit, and all of them are pretty made up. But that 50% I want to then combine with the second 50% that's more on this schlep side. And then I probably want to combine with some additional probabilities for various forms of slowdown, where a slowdown could include a deliberate decision to slow development of the technology, or could include just: we suck at deploying things. That is, it's a sort of decision you might regard as wise to slow things down, or a decision that's maybe unwise, or maybe wise for the wrong reasons, to slow things down. You probably want to add some of that on top. I probably want to add on some loss for: it's possible you don't produce GPT-6-scale systems within the next 3 years or 4 years.
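The scale-up arithmetic raised earlier (a Chinchilla-style 10x in parameters plus 10x in data per generation) is easy to make concrete. The 6*N*D formula is the standard rule of thumb for dense-transformer training compute; the parameter and token counts below are illustrative placeholders, not known GPT-4 values:

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute via the standard rule of thumb C ~= 6 * N * D."""
    return 6.0 * params * tokens

# Hypothetical baseline and one "generation" up: 10x parameters, 10x tokens.
base = training_flops(params=1e12, tokens=1e13)
next_gen = training_flops(params=1e13, tokens=1e14)

print(f"one generation ~= {next_gen / base:.0f}x compute")           # about 100x
print(f"two generations ~= {(next_gen / base) ** 2:.0f}x compute")   # about 10,000x, i.e. 4 OOMs
```

This is why "two more scale-ups" and "4 orders of magnitude of effective training compute" are used interchangeably in the exchange above.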
Let's isolate all of that. How much bigger would the system have to be than GPT-4 for you to think there's a more than 50% chance that it's going to be smart enough to replace basically all human cognitive labor? Also, I want to say that for the 50%-times-50% thing, I think if I randomly made up those numbers and then made the Dyson sphere prediction, that's going to give you like 60% by 2040 or something, not 40%. And I have no idea between those. These are all made up, and I have no idea which of those I would endorse on reflection. So this question of how big you would have to make the system before it's more likely than not that it can be a drop-in replacement for humans: I think if you just literally say, like, you train on web text, then the question is kind of hard to discuss, because I don't really buy stories that training data makes a big difference, long run, to these
dynamics. But I think if you want to just imagine the hypothetical where you just took GPT-4 and made the numbers bigger, then I think there are pretty significant issues. I think there are significant issues in two ways. One is, like, quantity of data, and I think probably the larger
one is, like, quality of data, where I think as you scale up, the prediction task is not that great a task. If you're a very weak model, it's a very good signal for getting smarter. At some point it becomes a worse and worse signal for getting smarter, and I think there's a number of reasons for that. So it's not clear there is any number such that, if you plugged that number into GPT-4's code and then maybe fiddled with the architecture a bit, I would expect that thing to have a more than 50% chance of being a drop-in replacement for humans. Or there is such a number, but I think it's very large. You're always going to have to do some work, but the work is not necessarily much, I would guess. When people say new insight is needed, I think I tend to be more bullish than them. It's not like these are new ideas where who knows how long it will take. I think it's
just that you have to do some stuff. You have to make changes, unsurprisingly. Every time you scale something up by, like, five orders of magnitude, you have to make some changes. I want to better understand your intuition for being more skeptical than some about the scaling picture: that these changes are even needed in the first place, or that it would take more than two orders of magnitude more improvement to get these things almost certainly, or with very high probability, to a human level. So is it that you don't agree with the way in
which they're extrapolating these loss curves? You don't agree with the implication that that decrease in loss will equate to greater and greater intelligence? Or what would that debate with Dario look like, if you were having it? I'm sure you have. Yeah. So again, here we're talking about two factors of a half. One on, like, is it smart enough? And one on, like, do you have to do a bunch of schlep even if in some sense it's smart enough? On the first factor of a half, I'd be like, I don't think we have really anything good to extrapolate. I would not be surprised if I have similar or maybe even higher probabilities on really crazy stuff over the next year, and then lower after that. My probability is just not that bunched up. Maybe Dario's probability, I don't know, you've talked with him, is more bunched up on some particular year, and mine is maybe a little bit more uniformly spread out across the coming years, partly because I just don't think we have trends we can extrapolate. Like, you can extrapolate loss, and you can look at your qualitative impressions of systems at various scales, but it's just very hard to relate any of those extrapolations to doing cognitive work, or accelerating R&D, or taking over and fully automating R&D. So I have a lot of uncertainty around that extrapolation. I think it's very easy to get down to, like, a 50-50 chance of this. What about the sort of basic intuition that, listen, this is a big blob of compute. You make the big blob of compute bigger, it's going to get smarter. It'd be really weird if it didn't. I'm happy with that. It's going to get smarter,
and it would be really weird if it didn't. And the question is, how smart does it have to get? Like, that argument does not yet give us a quantitative guide to at what scale it is a slam dunk, or at what scale it is 50-50. And what would be the piece of evidence that would nudge you one way or another, where you look at it and you're like, oh fuck, this is at 20% by 2040, or 60% by 2040, or something? Is there something that could happen in the next few years or next 3 years? What is the thing you're looking at where this will be a big update for you? Again, I think there's some just how capable is each model, where I think we're really bad at extrapolating. We still have some subjective guess, and you're comparing it to what happened, and that will move me. Every time we see what happens with another order of magnitude of training compute, I will have a slightly different guess for where things are going. These probabilities are coarse enough that, again, I don't know if that 40% is real, or if, like, post GPT-3.5 and 4, I should be at like 60% or what. That's one thing. And the second
thing is just, like: if there was some ability to extrapolate, I think this could reduce error bars a lot. Here's another way you could try and do an extrapolation: you could just say, how much economic value do systems produce, and how fast is that growing? I think once you have systems actually doing jobs, the extrapolation gets easier, because you're not moving from a subjective impression of a chat to automating all R&D; you're moving from automating this job to automating that job or whatever. Unfortunately, probably by the time you have nice trends from that, you're not talking about 2040; you're talking about two years from the end of days, or one year from the end of days, or whatever. But to the extent that you can get extrapolations like that, I do think it can provide more clarity. But why is economic value the thing we would want to extrapolate? Because, for example, if you started off with chimps and they were just getting gradually smarter to human level, they would basically provide no economic value until they were basically worth as much as a human. So it would be this very gradual and then very fast
increase in their value. So is the increase in value from GPT-4 to GPT-5 to GPT-6, is that the extrapolation we want? Yeah, I think that the economic extrapolation is not great. You could compare it to the subjective extrapolation of how smart the model seems; it's not super clear which one's better. In the chimp case, I don't think that's quite right. If you imagine intensely domesticated chimps who are just actually trying their best to be really useful employees, and you hold fixed their physical hardware and then you just gradually scale up their intelligence, I don't think you're going to see zero value which then suddenly becomes massive value over one doubling of brain size, or whatever, one order of magnitude of brain size. It's actually possible
over an order of magnitude of brain size, but chimps are already within an order of magnitude of the brain size of humans. Like, chimps are very, very close on the kind of spectrum we're talking about. So I think I'm skeptical of the abrupt transition for chimps. And to the extent that I expect a fairly abrupt transition here, it's mostly just because the chimp-human intelligence difference is so small compared to the differences we're talking about with respect to these models. That is, I would not be surprised if, in some objective sense, the chimp-human difference is significantly smaller than the GPT-3 to GPT-4 difference, or the GPT-4 to GPT-5 difference. Wait, wouldn't that argue in favor of just relying much more on the subjective extrapolation? Yeah, there are sort of two balancing tensions here. One is, like, I don't believe the chimp
thing is going to be as abrupt. That is, I think if you scaled up from chimps to humans, you'd actually see quite large economic value from the fully domesticated chimp already. Okay. And then the second half is, yeah, I think that the chimp-human difference is probably pretty small compared to model differences. So I do think things are going to be pretty abrupt. I think the economic extrapolation is pretty rough. I
also think the subjective extrapolation is pretty rough, just because I really don't know how people do the extrapolation and end up with the degrees of confidence people end up with. Again, I'm putting it pretty high: if I'm saying, give me 3 years, I'm like, yeah, 50-50, it's going to have basically the smarts there to do the thing. I'm not saying it's a really long way off. I'm just saying I've got pretty big error bars. And I think that it's really hard not to have really big error bars when you're doing this: I looked at GPT-4, it seemed pretty smart compared to GPT-3.5, so I bet just, like, 4 more such notches and we're there. That's just
a hard call to make. I think I sympathize more with people who are like, how could it not happen in 3 years, than with people who are like, no way it's going to happen in eight years or whatever, which is probably the more common perspective in the world. But also, things do take longer than you think; "things take longer than you think" is, like, a real thing. Yeah, I don't know. Mostly I have big error bars because I just don't believe the subjective extrapolation that much. I find it hard to get a huge amount out of it. Okay, so what about the scaling picture do you think is most likely to be wrong? Yeah. So we've talked a little bit about how
good is the qualitative extrapolation, how good are people at comparing. So this is not the picture being qualitatively wrong; it's just that, quantitatively, it's very hard to know how far off you are. I think a qualitative consideration that could significantly slow things down is: right now you get to observe this really rich supervision from basically next-word prediction, or in practice maybe you're looking at predicting a couple of sentences. So you're getting this pretty rich supervision. It's plausible that if you want to automate long-horizon tasks, like being an employee over the course of a month, that's actually just considerably harder to supervise. Or that you basically end up driving up costs, where the worst case here is that you drive up costs by a factor that's linear in the horizon over which the thing is operating. And
I still consider that just quite plausible. Can you dumb that down? You're driving up a cost of what? And what does "linear in the horizon" mean? Yeah. So if you imagine you want to train a system to say words that sound like the next word a human would say, there you can get this really rich supervision by having a bunch of words and then predicting the next one and then being like, I'm going to tweak the model so it predicts better. If you're like, hey, here's what I want: I want my model to interact with some job over the course of a month, and then at the end of that month have internalized everything that the human would have internalized about how to do that job well, and have local context and so on, it's harder to supervise that task. You could supervise it from the next-word prediction task, and all that context the human has will ultimately just help them predict the next word better. So,
like, in some sense, a really long context language model is also learning to do that task. But the number of effective data points you get of that task is vastly smaller than the number of effective data points you get at these very short-horizon, what's-the-next-word, what's-the-next-sentence tasks. The sample efficiency matters more for economically valuable long-horizon tasks than for predicting the next token, and that's what will actually be required to take over a lot of jobs. Yeah, something like that. That is, it just seems very plausible that it takes longer to train models to do tasks that are longer horizon. How fast do you think the pace of algorithmic advances will be? Because even if scaling fails: since 2012, since the beginning of the deep learning revolution, we've had so many new things. By 2040, are you expecting a similar pace of advances? And if so, if we just keep having things like this, then aren't we just going to get the AI sooner or later? Or sooner, not later. Aren't we going to get the AI sooner or sooner?
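The "linear in the horizon" worry from a moment ago can be sketched with a toy calculation. This is just one simple way to formalize the worst case described above, with made-up numbers: each training example of a task consumes as many tokens as the task's horizon, so a fixed token budget yields linearly fewer effective samples as the horizon grows.

```python
def effective_samples(token_budget: float, horizon_tokens: float) -> float:
    """Worst case: one effective sample per `horizon_tokens` tokens of data."""
    return token_budget / horizon_tokens

BUDGET = 1e12  # hypothetical training-token budget

for horizon, task in [(1e1, "next few words"),
                      (1e4, "a long document"),
                      (1e7, "a month-long job")]:
    print(f"{task}: ~{effective_samples(BUDGET, horizon):.0e} effective samples")
```

Under this framing, going from next-word prediction to month-long tasks costs around six orders of magnitude in effective sample count, which is why the supervision could get so much more expensive.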
I'm with you on sooner or later. Yeah, I expect progress to slow. If you held fixed how many people are working in the field, I would expect progress to slow as low-hanging fruit is exhausted. I think the rapid rate of progress in, say, language modeling over the last 4 years is largely sustained by: you start from a relatively small amount of investment, you greatly scale up the amount of investment, and that enables you to keep picking the fruit at the same rate. Every time
the difficulty doubles, you just double the size of the field. I think that dynamic can hold up for some time longer. Right now, if you think of it as, like, hundreds of people effectively searching for things, you can maybe bring that up to, like, tens of thousands of people or something. So for a while,
you can just continue increasing the size of the field and search harder and harder. And there is indeed a huge amount of low-hanging fruit, where it wouldn't be hard for a person to sit around and make things a couple of percent better after a year of work or whatever. So I don't know, I would probably think of it mostly in terms of how much investment can be expanded, and try to guess from some combination of fitting that curve, fitting the curve to historical progress, looking at how much low-hanging fruit there is, and getting a sense of how fast it decays. I think you probably get a lot, though. You get a bunch of orders of magnitude in total, especially if you ask how good is a GPT-5-scale model or a GPT-4-scale model. I think you probably get, by 2040, like, I don't know, 3 orders of magnitude of effective training compute improvement, or a good chunk of effective training compute improvement, 4 orders of magnitude, I don't know. Here I'm speaking with no private information about the last couple of years
of efficiency improvements. And so people who are on the ground will have better senses of exactly how rapid the returns are and so on. Okay, let me back up and ask a more general question. People make these analogies about how humans were trained by evolution and were deployed in modern civilization. Do you buy those analogies? Is it valid to say that humans were trained by evolution? I mean, if you look at the protein-coding size of the genome, it's like 50 megabytes or something, and then what part of that is for the brain anyways? How do you think about how much information is in there? Do you think of the genome as hyperparameters? Or how much does that inform you when you have these anchors for how much training humans get when they're just consuming information, when they're walking up and about and so on? I guess the way that you could think of this is: I think both analogies are reasonable. One
analogy being, like, evolution is a training run and humans are the end product of that training run. And a second analogy is, evolution is an algorithm designer, and then a human, over the course of this modest amount of computation over their lifetime, is the learning algorithm that's been produced. And I think neither analogy is that great. I like them both and lean on them both a bunch, and I think that's been pretty good for having a reasonable view of what's likely to happen. That said, the human genome is not that much like a 100-trillion-parameter model. It's, like, a much smaller number of parameters that
behave in a much more confusing way. Evolution did a lot more optimization, especially over, like, designing a brain to work well over a lifetime, than gradient descent does over models. That's, like, a disanalogy on that side. And on the other side, I think human learning over the course of a human lifetime is in many ways just much, much better than gradient descent over the space of neural nets. Gradient descent is working really well, but I think