Carl Shulman - Intelligence Explosion, Primate Evolution, Robot Doublings, & Alignment

Today I have the pleasure of speaking with  Carl Shulman. Many of my former guests,   and this is not an exaggeration, have told me  that a lot of their biggest ideas have come   directly from Carl especially when it has to do  with the intelligence explosion and its impacts.   So I decided to go directly to the source and we  have Carl today on the podcast. He keeps a super   low profile but is one of the most interesting  intellectuals I've ever encountered and this is   actually his second podcast ever. We're going  to go deep into the heart of many of the most   important ideas that are circulating right now  directly from the source. Carl is also an advisor   to the Open Philanthropy project which is one  of the biggest funders on causes having to do   with AI and its risks, not to mention global  health and well being. And he is a research  

associate at the Future of Humanity Institute at Oxford. So Carl, it's a huge pleasure to have you on the podcast. Thanks for coming.

Thank you, Dwarkesh. I've enjoyed seeing some of your episodes recently and I'm glad to be on the show.

Excellent, let's talk about AI. Before we get into the details, give me the big picture explanation of the feedback loops and just general dynamics that would start when you have something that is approaching human-level intelligence.

The way to think about it is this: we have a process now where humans are developing new computer chips, new software, running larger training runs, and it takes a lot of work to keep Moore's law chugging (while it was, it's slowing down now).

And it takes a lot of work to develop things like transformers, to develop a lot of the improvements to AI neural networks. The core method that I want to highlight on this podcast, and which I think is underappreciated, is the idea of input-output curves. We can look at the increasing difficulty of improving chips and sure, each time you double the performance of computers it's harder and as we approach physical limits eventually it becomes impossible. But how much harder? There's a paper called “Are Ideas Getting Harder to Find?” that was published a few years ago. Ten years ago at MIRI, I did an early version of this analysis using

data mainly from Intel and the large semiconductor fabricators. In this paper they cover a period where the productivity of computing went up a millionfold, so you could get a million times the computing operations per second per dollar, a big change but it got harder. The amount of investment and the labor force required to make those continuing advancements went up and up and up. It went up 18-fold over that period. Some take this to say, “Oh, diminishing returns. Things are just getting harder and harder and so that will be the end of progress eventually.” However, in a world where AI is doing the work, that doubling of computing performance translates pretty directly to a doubling or better of the effective labor supply. That is, if when we had that millionfold compute increase we used it to run artificial intelligences who would replace human scientists and engineers, then the 18x increase in the labor demands of the industry would be trivial.

We're getting more than one doubling of the effective labor supply for each doubling of the labor requirement, and in that data set it's over four. So when we double compute we need somewhat more researchers but a lot less than twice as many. We use up some of those doublings of compute on the increasing difficulty of further research, but most of them are left to expedite the process. So if you double your labor force, that's enough to get several doublings of compute. You use up one of them on meeting the increased demands from diminishing returns. The others can be used to accelerate the process, so you have your first doubling take however many months, your next doubling can take a smaller fraction of that, the next doubling less and so on. At least insofar as

the outputs you're generating, compute for AI in this story, are able to serve the function of the necessary inputs. If there are other inputs that you need eventually those become a bottleneck and you wind up more restricted on this.

Got it. The Bloom paper said there was a 35% increase in transistor density and there was a 7% increase per year in the number of researchers required to sustain that pace.

Something in the vicinity, yeah. Four to five doublings of compute per doubling of labor inputs.

I guess there's a lot of questions you can delve

into in terms of whether you would expect a similar scale with AI and whether it makes sense to think of AI as a population of researchers that keeps growing with compute itself. Actually, let's go there. Can you explain the intuition that compute is a good proxy for the number of AI researchers so to speak?

So far I've talked about hardware as an initial example because we had good data about a past period. You can also make improvements on the software side and when we think about an intelligence explosion that can include AI doing work on making hardware better, making better software, making more hardware. But the basic idea for the hardware is especially simple in that if you have an AI worker that can substitute for a human, if you have twice as many computers you can run two separate instances of them and then they can do two different jobs, manage two different machines, work on two different design problems. Now you can get more gains than just what you would get by having two instances. We get improvements from using some of our compute not just to

run more instances of the existing AI, but to  train larger AIs. There's hardware technology,   how much you can get per dollar you spend on  hardware and there's software technology and   the software can be copied freely. So  if you've got the software it doesn't   necessarily make that much sense to say that  — “Oh, we've got you a hundred Microsoft   Windows.” You can make as many copies as you  need for whatever Microsoft will charge you.   But for hardware, it’s different. It  matters how much we actually spend  

on the hardware at a given price. And if we look at the changes that have been driving AI recently, that is the thing that is really off-trend. We are spending tremendously more money on computer hardware for training big AI models.

Okay so there's the investment in hardware, there's the hardware technology itself, and there's the software progress itself. The AI is getting better because we're spending more money on it, because our hardware itself is getting better over time, and because we're developing better models or better adjustments to those models. Where is the loop here?

The work involved in designing new hardware

and software is being done by people now.  They use computer tools to assist them,   but computer time is not the primary  cost for NVIDIA designing chips,   for TSMC producing them, or for ASML making  lithography equipment to serve the TSMC fabs.   And even in AI software research that has become  quite compute intensive we're still in the range   where at a place like DeepMind salaries were  still larger than compute for the experiments.   Although more recently tremendously more of the  expenditures were on compute relative to salaries.  
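The "four to five doublings" figure quoted earlier in the conversation can be checked with a few lines of arithmetic. The following is a minimal sketch, not the method of the paper itself: it takes the numbers as stated in the conversation (a millionfold compute gain against an 18-fold rise in research inputs, and growth of roughly 35% versus 7%, both assumed here to be annual rates) and converts each into doublings. The variable and function names are illustrative.

```python
import math


def doublings(factor: float) -> float:
    """Number of doublings needed to grow by `factor`."""
    return math.log2(factor)


# Figures as quoted in the conversation (not independently checked here).
compute_gain = 1_000_000  # growth in operations per second per dollar
labor_gain = 18           # growth in research inputs over the same period

# Doublings of compute obtained per doubling of research inputs.
ratio_from_totals = doublings(compute_gain) / doublings(labor_gain)

# The same ratio estimated from the growth rates mentioned (35% and 7%),
# assuming both are annual rates.
ratio_from_rates = math.log(1.35) / math.log(1.07)

print(f"from totals: {ratio_from_totals:.2f}")  # about 4.78
print(f"from rates:  {ratio_from_rates:.2f}")   # about 4.44
```

Both estimates land between four and five, which is the range given in the conversation.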

If you take all the work that's being done by those humans, there's like low tens of thousands of people working at NVIDIA designing GPUs specialized for AI. There's more than 70,000 people at TSMC which is the leading producer of cutting-edge chips. There's a lot of additional people at companies like ASML that supply them with the tools they need and then a company like DeepMind, I think from their public filings, they recently had a thousand people. OpenAI is a few

hundred people. Anthropic is less. If you add up things like Facebook AI Research, Google Brain, other R&D, you get thousands or tens of thousands of people who are working on AI research.

We would want to zoom in on those who are developing new methods rather than narrow applications. So inventing the transformer definitely counts but optimizing for some particular business's data set cleaning probably not. So those people are doing this work,

they're driving quite a lot of progress. What  we observe in the growth of people relative to   the growth of those capabilities is that pretty  consistently the capabilities are doubling on a   shorter time scale than the people required to  do them are doubling. We talked about hardware   and how it was pretty dramatic historically. Like  four or five doublings of compute efficiency per   doubling of human inputs. I think that's a bit  lower now as we get towards the end of Moore's  

law although interestingly not as much lower as you might think because the growth of inputs has also slowed recently. On the software side there's some work by Tamay Besiroglu and collaborators; it may have been his thesis. It's called “Are models getting harder to find?” and it's applying the same analysis as the “Are ideas getting harder to find?” paper, and you can look at growth rates of papers, from citations, employment at these companies, and it seems like the doubling time of these like workers driving the software advances is like several years whereas the doubling of effective compute from algorithmic progress is faster. There's a group called Epoch, they've received grants from Open Philanthropy, and they do work collecting datasets that are relevant to forecasting AI progress. Their headline results for what's the rate of progress in hardware and software, and growth in budgets are as follows: for hardware, they're looking at a doubling of hardware efficiency in like two years. It's possible it's a bit better than that when you take into account certain specializations for AI workloads. For the growth of budgets

they find a doubling time that's something like six months in recent years which is pretty tremendous relative to the historical rates. We should maybe get into that later and then on the algorithmic progress side, mainly using ImageNet type datasets right now they find a doubling time that's less than one year. So when you combine all of these things the growth of effective compute for training big AIs is pretty drastic.

I think I saw an estimate that GPT-4 cost

like 50 million dollars or around that range to train. Now suppose that AGI takes 1000x that (if you just scaled up GPT-4 it might not be that, but just for the sake of example). Some part of that will come from companies just spending a lot more to train the models and that's just greater investment. Part of that will come from them having better models. You get the same effect of increasing it by 10x just from having a better model. You can spend more money on it to train a bigger model, you can just have a better model, or you can have chips that are cheaper to train on so you get more compute for the same dollars. So those are the three ways you are describing in which the “effective compute” would increase?

Looking at it right now, it looks like you might get two or three doublings of effective compute for this thing that we're calling software progress, which people get by asking how much less compute you can use now to achieve the same benchmark as you achieved before. There are reasons to not fully identify this with software progress as you might naively think because some of it can be enabled by the other. When you have

a lot of compute you can do more experiments and find algorithms that work better. We were talking earlier about how sometimes with the additional compute you can get higher efficiency by running a bigger model. So that means you're getting more for each GPU that you have because you made this larger expenditure. That can look like a software improvement because this model is not a hardware improvement directly because it's doing more with the same hardware but you wouldn't have been able to achieve it without having a ton of GPUs to do the big training run.

The feedback loop itself involves the AI that is the result of this greater effective compute helping you train better AI or use less effective compute in the future to train better AI?

It can help with the hardware design. NVIDIA is a fabless chip design company. They

don't make their own chips. They send files of  instructions to TSMC which then fabricates the   chips in their own facilities. If you could  automate the work of those 10,000+ people   and have the equivalent of a million people  doing that work then you would pretty quickly   get the kind of improvements that can  be achieved with the existing nodes that   TSMC is operating on and get a lot of those  chip design gains. Basically doing the job   of improving chip design that those people  are working on now but get it done faster.   While that's one thing I think that's less  important for the intelligence explosion.   The reason being that when you make  an improvement to chip design it only   applies to the chips you make after that.  If you make an improvement in AI software,  

it has the potential to be immediately applied to all of the GPUs that you already have. So the thing that I think is most disruptive and most important and has the leading edge of the change from AI automation of the inputs to AI is on the software side.

At what point would it get to the point where the AIs are helping develop better software or better models for future AIs? Some people claim today, for example, that programmers at OpenAI are using Copilot to write programs now. So in some sense you're already having that feedback loop but I'm a little skeptical of that as a mechanism. At what point would it be the case that the AI is contributing significantly in the sense that it would almost be the equivalent of having additional researchers to AI progress and software?

The quantitative magnitude of the help is absolutely central. There are plenty of companies

that make some product that very slightly boosts productivity. When Xerox makes fax machines, it maybe increases people's productivity in office work by 0.1% or something. You're not gonna have explosive growth out of that because 0.1% more effective R&D at Xerox and any customers buying the machines is not that important. The thing to look for is: when is it the case that the contributions from AI are starting to become as large as the contributions from humans? So when this is boosting their effective productivity by 50 or 100% and if you then go from like eight months doubling time for effective compute from software innovations, things like inventing the transformer or discovering Chinchilla scaling and doing your training runs more optimally or creating FlashAttention. If you move that from 8 months to 4

months and then the next time you apply that it  significantly increases the boost you're getting   from the AI. Now maybe instead of giving a 50% or  100% productivity boost now it's more like 200%.   It doesn't have to have been able to automate  everything involved in the process of AI   research. It can be that it's automated a bunch  of things and then those are being done in extreme   profusion. A thing AI can do, you can have it  done much more often because it's so cheap.   And so it's not a threshold of — this  is human level AI, it can do everything   a human can do with no weaknesses in any  area. It's that, even with its weaknesses  

it's able to bump up the performance. So that instead of getting the results we would have with the 10,000 people working on finding these innovations, we get the results that we would have if we had twice as many of those people with the same kind of skill distribution.

It's a demanding challenge, you need quite a lot of capability for that but it's also important that it's significantly less than “this is a system where there's no way you can point at it and say in any respect it is weaker than a human.” A system that was just as good as a human in every respect but also had all of the advantages of an AI, that is just way beyond this point. Consider that our existing fabs make tens of millions of advanced GPUs per year. Those GPUs if they were running AI software that was as efficient as humans, it is sample efficient, it doesn't have any major weaknesses, so they can work four times as long, the 168 hour work week, they can have much more education than any human. A human, you got a PhD,

it's like 20 years of education, maybe longer if they take a slow route on the PhD. It's just normal for us to train large models by having them eat the internet, eat all the published books ever, read everything on GitHub and get good at predicting it. So the level of education is vastly beyond any human, and the degree to which the models are focused on task is higher than all but like the most motivated humans when they're really, really gunning for it. So you combine the things: tens of millions of GPUs, each GPU doing the work of the very best humans in the world, and the most capable humans in the world can command salaries that are a lot higher than the average, particularly in a field like STEM or narrowly AI. There's no human in the world who has a thousand years of experience with TensorFlow, let alone the new AI technology that was invented the year before, but if they were around, yeah, they'd be paid millions of dollars a year. And so when you consider this: tens of millions of GPUs.

With each doing the work of 40, maybe more, of these existing workers, it's like going from a workforce of tens of thousands to hundreds of millions. You immediately make all kinds of discoveries, then you immediately develop all sorts of tremendous technologies. Human level AI is deep, deep into an intelligence explosion. Intelligence explosion has to start with something weaker than that.

Yeah, what is the thing it starts with and how close are we to that? Because to be a researcher at OpenAI is not just completing the hello world prompt that Copilot does, right? You have to choose a new idea, you have to figure out the right way to approach it, you perhaps have to manage the people who are also working with you on that problem.

It's an incredibly complicated portfolio of skills rather than just a single skill. What is the point at which that feedback loop starts where you're not just doing the 0.5% increase in productivity that an AI tool might do but it is actually the equivalent of a researcher or close to it?

Maybe a way is to give some illustrative examples of the kinds of capabilities that you might see. Because these systems have to be a lot weaker than the human-level things, what we'll have is intense application of the ways in which AIs have advantages partly offsetting their weaknesses.

AIs are cheap so we can call a lot of them to  do many small problems. You'll have situations   where you have dumber AIs that are deployed  thousands of times to equal one human worker.   And they'll be doing things like voting algorithms  where with an LLM you generate a bunch of   different responses and take a majority vote among  them that improves some performance. You'll have   things like the AlphaGo kind of approach where you  use the neural net to do search and you go deeper   with the search by plowing in more compute which  helps to offset the inefficiency and weaknesses of   the model on its own. You'll do things that would  just be totally impractical for humans because of   the sheer number of steps, an example of that  would be designing synthetic training data.  
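The voting idea mentioned above can be sketched in a few lines. This is a minimal illustration rather than any particular lab's method: `samples` is a hypothetical list of answers sampled from a model, and `majority_accuracy` assumes a yes/no question where each sample is independently correct with probability `p`. Real samples from one model are correlated, so the gain in practice would be smaller than this idealized calculation suggests.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the most common answer among the sampled responses."""
    return Counter(answers).most_common(1)[0][0]


def majority_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent samples are correct,
    when each sample is correct with probability p (binary question, odd n)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Hypothetical sampled responses to the same question.
samples = ["42", "41", "42", "42", "17"]
print(majority_vote(samples))  # 42

# A weak sampler (60% correct) becomes much more reliable when
# many cheap samples are pooled by majority vote.
for n in (1, 11, 101):
    print(n, round(majority_accuracy(0.6, n), 3))
```

The design trade is the one described in the conversation: each call is cheap, so reliability is bought with many calls instead of with a stronger single model.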

Humans do not learn by just going into the  library and opening books at random pages,   it's actually much much more efficient to have  things like schools and classes where they teach   you things in an order that makes sense, focusing  on the skills that are more valuable to learn.   They give you tests and exams. They're designed to  try and elicit the skill they're actually trying   to teach. And right now we don't bother with  that because we can hoover up more data from   the internet. We're getting towards the end of  that but yeah, as the AIs get more sophisticated   they'll be better able to tell what is a useful  kind of skill to practice and to generate that.   We've done that in other areas like AlphaGo. The  original version of AlphaGo was booted up with  

data from human Go play and then improved with reinforcement learning and Monte Carlo tree search but then AlphaZero, a somewhat more sophisticated model, benefited from some other improvements but was able to go from scratch and it generated its own data through self play. Getting data of a higher quality than the human data because there are no human players that good available in the data set and also a curriculum so that at any given point it was playing games against an opponent of equal skill, namely itself. It was always in an area where it was easy to learn. If you're just always losing no matter

what you do, or always winning no matter what you do, it's hard to distinguish which things are better and which are worse. And when we have somewhat more sophisticated AIs that can generate training data and tasks for themselves, for example if the AI can generate a lot of unit tests and then can try and produce programs that pass those unit tests, then the interpreter is providing a training signal and the AI can get good at figuring out what's the kind of programming problem that is hard for AIs right now that will develop more of the skills that I need and then do them. You're not going to have employees at OpenAI write a billion programming problems, that's just not gonna happen. But you are going to have AIs given the task of producing the enormous number of programming challenges.

In LLMs themselves, there's a paper out of

Anthropic called Constitutional AI where they basically had the program just talk to itself and say, “Is this response helpful? If not, how can I make this more helpful?” and the responses improved and then you train the model on the more helpful responses that it generates by talking to itself so that it generates them natively and you could imagine more sophisticated or better ways to do that. But then the question is GPT-4 already costs like 50 million or 100 million or whatever it was. Even if we have greater effective compute from hardware increases and better models, it's hard to imagine how we could sustain four or five orders of magnitude greater effective size than GPT-4 unless we're dumping in trillions of dollars, the entire economies of big countries, into training the next version. The question is do we get something that

can significantly help with AI progress before we run out of the sheer money and scale and compute that it would require to train it? Do you have a take on that?

First I'd say remember that there are these three contributing trends. The new H100s are significantly better than the A100s and a lot of companies are actually just waiting for their deliveries of H100s to do even bigger training runs along with the work of hooking them up into clusters and engineering the thing. All of those factors are contributing and of course mathematically yeah, if you do four orders of magnitude more than 50 or 100 million then you're getting to trillion dollar territory. I think the way to look at it is at each step along the way, does it look like it makes sense to do the next step? From where we are right now, seeing the results with GPT-4 and ChatGPT, companies like Google and Microsoft are pretty convinced that this is very valuable. You have talk at Google and Microsoft

that it's a billion dollar matter to change market share in search by a percentage point so that can fund a lot. On the far end if you automate human labor we have a hundred trillion dollar economy and most of that economy is paid out in wages, between 50 and 70 trillion dollars per year. If you create AGI it's going to automate all of that and keep increasing beyond that. So the value of the completed project is very much worth throwing our whole economy into it, if you're going to get the good version and not the catastrophic destruction of the human race or some other disastrous outcome. In between it's a question of how risky and uncertain the next step is and how much growth in revenue you can generate with it. For moving up to a billion dollars I think that's absolutely going to happen. These large tech companies have R&D

budgets of tens of billions of dollars and when  you think about it in the relevant sense all the   employees at Microsoft who are doing software  engineering that’s contributing to creating   software objects, it's not weird to spend tens of  billions of dollars on a product that would do so   much. And I think that it's becoming clearer that  there is a market opportunity to fund the thing.   Going up to a hundred billion dollars, that's the  existing R&D budgets spread over multiple years.   But if you keep seeing that when you scale up the  model it substantially improves the performance,   it opens up new applications, that is you're  not just improving your search but maybe it   makes self-driving cars work, you replace bulk  software engineering jobs or if not replace them   amplify productivity. In this kind of dynamic you  actually probably want to employ all the software   engineers you can get as long as they are able  to make any contribution because the returns   of improving stuff in AI itself gets so high. But  yeah, I think that can go up to a hundred billion.   And at a hundred billion you're using a  significant fraction of our existing fab   capacity. Right now the revenue of NVIDIA is 25  billion, the revenue of TSMC is over 50 billion. I   checked in 2021, NVIDIA was maybe 7.5%, less than  10% of TSMC revenue. So there's a lot of room and  

most of that was not AI chips. They have a large gaming segment, there are data center GPUs that are used for video and the like. There's room for more than an order of magnitude increase by redirecting existing fabs to produce more AI chips and by actually using the AI chips that these companies have in their cloud for the big training runs. I think that that's enough to go to the 10 billion and then combine with stuff like the H100 to go up to the hundred billion.
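The three contributing trends recalled at the start of this answer multiply together, so their doubling rates add. The sketch below is a back-of-the-envelope illustration under stated assumptions, not a forecast: it uses the Epoch doubling times as quoted earlier in the conversation, rounds "less than one year" for algorithmic progress up to a conservative 1.0, and naively assumes all three trends hold constant, which the budget trend in particular cannot do for long. The names are illustrative.

```python
import math

# Doubling times in years, as quoted earlier in the conversation.
# "Less than one year" for algorithmic progress is entered as 1.0 (conservative).
DOUBLING_TIMES = {
    "hardware efficiency": 2.0,
    "training budgets": 0.5,
    "algorithmic progress": 1.0,
}


def doublings_per_year(doubling_times):
    """Independent multiplicative trends: doublings per year simply add up."""
    return sum(1.0 / t for t in doubling_times.values())


def years_to_reach(multiple, doubling_times):
    """Years for effective compute to grow by `multiple` if trends stay constant."""
    return math.log2(multiple) / doublings_per_year(doubling_times)


rate = doublings_per_year(DOUBLING_TIMES)
print(f"doublings of effective compute per year: {rate}")
print(f"growth factor per year: {2 ** rate:.1f}x")
print(f"years to a 1000x increase: {years_to_reach(1000, DOUBLING_TIMES):.2f}")
```

Under these assumptions most of the growth comes from budgets, which is why the question of how far spending can scale carries so much weight in the discussion.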

Just to emphasize for the audience the initial point made about revenue. If it costs OpenAI 100 million dollars to train GPT-4 and it generates 500 million dollars in revenue, you pay back your expenses with 100 million and you have 400 million for your next training run. Then you train your GPT-4.5, you get let's say four billion dollars in revenue out of that.
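The arithmetic in this example can be written as a small loop. This is a toy sketch: the 100 million cost and 5x revenue come from the example as stated, while the assumption that every later run earns the same multiple of its cost is mine and is only there to make the loop concrete (the example itself imagines a higher multiple for the second run). `reinvestment_path` is a hypothetical helper name.

```python
def reinvestment_path(first_cost, revenue_multiple, runs):
    """Training budgets (in millions of dollars) when each run's revenue,
    net of repaying that run's cost, funds the next run."""
    budgets = []
    cost = first_cost
    for _ in range(runs):
        budgets.append(cost)
        revenue = cost * revenue_multiple
        cost = revenue - cost  # pay back this run, reinvest the rest
    return budgets


# 100M first run, revenue assumed to be 5x cost for every run.
print(reinvestment_path(100, 5, 4))  # [100, 400, 1600, 6400]
```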

That's where the feedback loop of revenue comes from. Where you're automating tasks and therefore you're making money you can use that money to automate more tasks. On the ability to redirect the fab production towards AI chips, fabs take a decade or so to build. Given the ones we have now and the ones that are going to come online in the next decade, is there enough to sustain a hundred billion dollars of GPU compute if you wanted to spend that on a training run?

Yes, you definitely make the hundred billion one. As you go up to a trillion dollar run and larger, it's going to involve more fab construction and yeah, fabs can take a long time to build. On the other hand, if in fact you're getting very high revenue from the AI systems and you're actually bottlenecked on the construction of these fabs then their price could skyrocket and that could lead to measures we've never seen before to expand and accelerate fab production. If you

consider, at the limit you're getting models that approach human-like capability, imagine things that are getting close to brain-like efficiencies plus AI advantages. We were talking before about a cluster of GPUs supporting AIs that do things, data parallelism. If that can work four times as much as a highly skilled motivated focused human with levels of education that have never been seen in the human population, and if a typical software engineer can earn hundreds of thousands of dollars, the world's best software engineers can earn millions of dollars today and maybe more in a world where there's so much demand for AI. And then times four for working all the time. If you can generate close to 10 million dollars a year out of the future version of the H100, and it costs tens of thousands of dollars with a huge profit margin now (and that profit margin could be reduced with large production), that is a big difference: that chip pays for itself almost instantly

and you could support paying 10 times as much  to have these fabs constructed more rapidly.   If AI is starting to be able to contribute more  of the skilled technical work that makes it hard   for NVIDIA to suddenly find thousands upon  thousands of top quality engineering hires.  If AI hasn't reached that level of performance  then this is how you can have things stall   out. A world where AI progress stalls out is  one where you go to the 100 billion and then   over succeeding years software progress turns out  to stall. You lose the gains that you are getting   from moving researchers from other fields. Lots of  physicists and people from other areas of computer   science have been going to AI but you tap out  those resources as AI becomes a larger proportion   of the research field. And okay, you've put in  all of these inputs, but they just haven't yielded  

AGI yet. I think that set of inputs probably would yield the kind of AI capabilities needed for intelligence explosion, but suppose it doesn't, after we've exhausted this current scale-up of increasing the share of our economy that is trying to make AI. If that's not enough then after that you have to wait for the slow grind of things like general economic growth, population growth and such and so things slow. That results in my credences in this kind of advanced AI happening being relatively concentrated in the next 10 years compared to the rest of the century, because we can't keep going with this rapid redirection of resources into AI. That's a one-time thing.

If the current scale up works we're going to  get to AGI really fast, like within the next   10 years or something. If the current scale  up doesn't work, all we're left with is just   like the economy growing 2% a year, we have 2%  a year more resources to spend on AI and at that   scale you're talking about decades before just  through sheer brute force you can train the 10   trillion dollar model or something. Let's talk  about why you have your thesis that the current   scale up would work. What is the evidence from  AI itself or maybe from primate evolution and the  

evolution of other animals? Just give me the whole confluence of reasons that make you think that.

Maybe the best way to look at that might be to consider, when I first became interested in this area, so in the 2000s which was before the deep learning revolution, how did I think about timelines? And then how have I updated based on what has been happening with deep learning? Back then I would have said we know the brain is a physical object, an information processing device, it works, it's possible and not only is it possible it was created by evolution on earth. That gives us something of an upper bound in that this kind of brute force was sufficient. There are some complexities like what if it was a freak accident and that

didn't happen on all of the other planets and that added some value. I have a paper with Nick Bostrom on this. I think basically that's not that important an issue. There's convergent evolution, octopi are also quite sophisticated. If a special event was at the level of forming cells at all, or forming brains at all, we get to skip that because we're choosing to build computers and we already exist. We have that advantage. So evolution gives something of an upper bound, really intensive massive brute force search and things like evolutionary algorithms can produce intelligence. Isn’t the fact that octopi and other animals got to the point of being pretty intelligent but not human level intelligent some evidence that there's a hard step between a cephalopod and a human? Yeah, that would be a place to look but it doesn't seem particularly compelling. One source of evidence on that is work by

Herculano-Houzel. She's a neuroscientist who  has dissolved the brains of many creatures and   by counting the nuclei she's able to determine  how many neurons are present in different species   and has found a lot of interesting trends in  scaling laws. She has a paper discussing the   human brain as a scaled up primate brain. Across  a wide variety of animals, mammals in particular,   there's certain characteristic changes in the  number of neurons and the size of different   brain regions as things scale up. There's a lot  of structural similarity there and you can explain   a lot of what is different about us with a brute  force story which is that you expend resources   on having a bigger brain, keeping it in good  order, and giving it time to learn. We have an   unusually long childhood. We spend more compute  by having a larger brain than other animals,  

more than three times as large as chimpanzees, and then we have a longer childhood than chimpanzees and much more than many, many other creatures. So we're spending more compute in a way that's analogous to having a bigger model and having more training time with it. And we see with our AI models these large, consistent benefits from increasing compute spent in those ways, with qualitatively new capabilities showing up over and over again, particularly in areas that AI skeptics call out. In my experience over the last 15 years the things that people call out are like, “Ah, but the AI can't do that and it's because of a fundamental limitation.” We've gone through a lot of them. There were Winograd schemas, catastrophic forgetting, quite a number

and they have repeatedly gone away through  scaling. So there's a picture that we're   seeing supported from biology and from our  experience with AI where you can explain —   Yeah, in general, there are trade-offs where the  extra fitness you get from a brain is not worth it   and so creatures wind up mostly with small brains  because they can save that biological energy and   that time to reproduce, for digestion and  so on. Humans seem to have wound up in a   self-reinforcing niche where we greatly increase  the returns to having large brains. Language and   technology are the obvious candidates. You have  humans around you who know a lot of things and  

they can teach you. And compared to almost any  other species we have vastly more instruction   from parents and the society of the [unclear].  You're getting way more from your brain than   you get per minute because you can learn  a lot more useful skills and then you can   provide the energy you need to feed  that brain by hunting and gathering,   by having fire that makes digestion easier. Basically how this process goes on is that it's   increasing the marginal increase in reproductive  fitness you get from allocating more resources   along a bunch of dimensions towards cognitive  ability. That's bigger brains, longer childhood,   having our attention be more on learning.  Humans play a lot and we keep playing as  

adults which is a very weird thing compared to other animals. We're more motivated to copy other humans around us than the other primates. These are motivational changes that keep us using more of our attention and effort on learning which pays off more when you have a bigger brain and a longer lifespan in which to learn. Many creatures are subject to lots of predation or disease. If you're a mayfly or a mouse and you try to invest in a giant brain and a very long childhood you're quite likely to be killed by some predator or some disease before you're actually able to use it. That means you actually have exponentially increasing costs in a

given niche. If I have a 50% chance of dying every  few months, as a little mammal or a little lizard,   that means the cost of going from three  months to 30 months of learning and childhood   development is not 10 times the loss, it’s 2^-10.  A factor of 1024 reduction in the benefit I get   from what I ultimately learn because 99.9 percent  of the animals will have been killed before  

that point. We're in a niche where we're a large  long-lived animal with language and technology so   where we can learn a lot from our groups. And that  means it pays off to just expand our investment   on these multiple fronts in intelligence. That's so interesting. Just for the audience   the calculation about like two to the whatever  months is just like, you have a half chance of   dying this month, a half chance of dying next  month, you multiply those together. There's   other species though that do live in flocks or  as packs. They do have a smaller version of the   development of cubs that play with each other.  Why isn't this a hill on which they could have  

climbed to human level intelligence themselves?  If it's something like language or technology,   humans were getting smarter before we got  language. It seems like there should be   other species that should have beginnings of this  cognitive revolution especially given how valuable   it is given we've dominated the world. You would  think there would be selective pressure for it.  Evolution doesn't have foresight. The thing  in this generation that gets more surviving   offspring and grandchildren is the thing that  becomes more common. Evolution doesn't look ahead   and think oh in a million years you'll have a lot  of descendants. It's what survives and reproduces   now. In fact, there are correlations where social  animals do on average have larger brains and  

part of that is probably the additional social  applications of brains, like keeping track of   which of your group members have helped you before  so that you can reciprocate. You scratch my back,   I'll scratch yours. Remembering who's dangerous  within the group is an additional application of   intelligence. So there's some correlation  there but what it seems like is that  

in most of these cases it's enough to invest more but not invest to the point where a mind can easily develop language and technology and pass it on. You see bits of tool use in some other primates, who have an advantage compared to, say, whales. Whales have quite large brains, partly because they are so large themselves, and they have some other things, but they don't have hands, which reduces a bunch of the ways in which brains, and investments in the functioning of that brain, can pay off. But yeah, primates will use sticks to extract termites, capuchin monkeys will open clams by smashing them with a rock. But what they don't have is the ability to sustain culture. A particular primate will maybe discover one of these tactics and it'll be copied by their immediate group but they're not holding on to it that well. When they see the other animal do it they can copy it in that situation but they don't actively teach each other in their population. So it's easy to forget things, easy to lose information and in fact they remain technologically stagnant for hundreds of thousands of years.
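The mortality arithmetic from earlier in this exchange (a 50% chance of dying in each period, compounding over ten periods) can be checked with a few lines of Python. This is only a sketch: reading "every few months" as exactly three months, and treating each period's risk as independent, are assumptions made for illustration, and the function name is hypothetical.

```python
def survival_probability(months, period_months=3.0, death_prob_per_period=0.5):
    """Chance of still being alive after `months`.

    Assumes an independent, constant chance of dying in each period.
    The 3-month period is an illustrative reading of "every few months".
    """
    periods = months / period_months
    return (1.0 - death_prob_per_period) ** periods


# A 30-month childhood spans ten 3-month periods.
p_long = survival_probability(30)

print(f"survive 3 months:  {survival_probability(3):.3f}")
print(f"survive 30 months: {p_long:.6f} (about 1 in {1 / p_long:.0f})")
print(f"dead before 30 months: {100 * (1 - p_long):.1f}%")
```

Under these assumptions the benefit of the longer learning period is discounted by a factor of about a thousand, which is the sense in which the cost grows exponentially rather than in proportion to the extra time.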

And we can look at some human situations. There's an old paper, I believe by the economist Michael Kremer, which talks about technological growth in the different continents for human societies. Eurasia is the largest integrated connected area. Africa is partly connected to it but the Sahara

desert restricts the flow of information and technology and such. Then you have the Americas, which after the colonization from the land bridge were largely separated and are smaller than Eurasia, then Australia, and then you had smaller island situations like Tasmania. Technological progress seems to have been faster the larger the connected group of people. And in the smallest groups, like Tasmania where you had a relatively small population, they actually lost technology. They lost some fishing techniques. And if you have a small population and you have

some limited number of people who know a skill and they happen to die or there's some change in circumstances that causes people not to practice or pass on that thing then you lose it. If you have few people you're doing less innovation and the rate at which you lose technologies to some local disturbance and the rate at which you create new technologies can wind up imbalanced. The great change of hominids and humanity is that we wound up in this situation where we were accumulating faster than we were losing and accumulating those technologies allowed us to expand our population. They created additional demand for intelligence so our brains became three times as large as those of chimpanzees and of our ancestors who had a similar brain size. Okay. And the crucial point in relevance to AI is that the selective pressures against

intelligence in other animals are not acting against these neural networks because they're not going to get eaten by a predator if they spend too much time becoming more intelligent, we're explicitly training them to become more intelligent. So we have good first principles reason to think that if it was scaling that made our minds this powerful and if the things that prevented other animals from scaling are not impinging on these neural networks, these things should just continue to become very smart. Yeah, we are growing them in a technological culture where there are jobs like software engineer that depend much more on cognitive output and less on things like metabolic resources devoted to the immune system or to building big muscles to throw spears. This is kind of a side note but I'm just kind of interested. You referenced Chinchilla scaling at some point. For the audience this is a paper from DeepMind which describes, if you have a model of a certain size, what is the optimum amount of data that it should be trained on. So you can imagine bigger models, you can use more data to train them and in this way you can figure out where you should spend your compute. Should you spend it on making the model bigger or should you spend it on training it for longer? In the case of different animals, in some sense how big their brain is is like model size, and their training data size is like how long they're cubs or how long they're infants or toddlers before they’re full adults. I’m curious, is there some kind of scaling law?
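For readers who want the compute trade-off in concrete terms, here is a rough sketch of how a fixed training budget might be split between model size and data. It relies on two commonly cited rules of thumb associated with the Chinchilla paper, neither of which is stated in this conversation: training compute is roughly 6 × parameters × tokens (in FLOPs), and the compute-optimal ratio is roughly 20 training tokens per parameter. The function name and constants are illustrative assumptions, not an exact reproduction of the paper's fitted scaling law.

```python
import math

# Assumed rules of thumb (approximations, not exact fitted values):
FLOPS_PER_PARAM_TOKEN = 6.0   # training FLOPs ~= 6 * params * tokens
TOKENS_PER_PARAM = 20.0       # compute-optimal tokens per parameter


def compute_optimal_split(compute_flops):
    """Return (params, tokens) that spend `compute_flops` at the assumed ratio.

    Solves C = 6 * N * D with D = 20 * N, so N = sqrt(C / 120).
    """
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# Example budget: roughly what a 70B-parameter model trained on 1.4T tokens uses.
budget = 6.0 * 70e9 * 1.4e12
n, d = compute_optimal_split(budget)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Under these assumptions both model size and data grow with the square root of compute, so a 100x larger budget would go to a model about 10x bigger trained on about 10x more data.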

Chinchilla scaling is interesting because we were talking earlier about the cost function for having a longer childhood where it's exponentially increasing in the amount of training compute you have when you have exogenous forces that can kill you. Whereas when we do big training runs, the cost of throwing in more GPUs is almost linear and it's much better to be linear than to decay exponentially as you expend resources. Oh, that's a really good point. Chinchilla scaling would suggest that for a brain of human size it would be optimal to have many millions of years of education but obviously that's impractical because of exogenous mortality for humans. So there's a fairly compelling argument that relative

to the situation where we would train AI that  animals are systematically way under trained.   They're more efficient than our models. We  still have room to improve our algorithms to   catch up with the efficiency of brains but  they are laboring under that disadvantage.  That is so interesting. I guess  another question you could have is:  

Humans got started on this evolutionary  hill climbing route where we're getting   more intelligent because it has more benefits  for us. Why didn't we go all the way on that   route? If intelligence is so powerful why aren't  all humans as smart as we know humans can be?   If intelligence is so powerful, why hasn't there  been stronger selective pressure? I understand hip   size, you can't give birth to a really big headed  baby or whatever. But you would think evolution   would figure out some way to offset that if  intelligence has such big power and is so useful.  Yeah, if you actually look at it quantitatively  that's not true and even in recent history it   looks like a pretty close balance between  the costs and the benefits of having more   cognitive abilities. You say, who needs to  worry about the metabolic costs? Humans put   20 percent of our metabolic energy into the  brain and it's higher for young children.   And then there's like breathing and  digestion and the immune system. For  

most of history people have been dying left  and right. A very large proportion of people   will die of infectious disease and if you  put more resources into your immune system   you survive. It's life or death pretty  directly via that mechanism. People die   more of disease during famine and so there's  boom or bust. If you have 20% less metabolic   requirements [unclear] you're much more likely  to survive that famine. So these are pretty big.  And then there's a trade-off about just cleaning  mutational load. So every generation new mutations   and errors happen in the process of reproduction.  We know there are many genetic abnormalities that  

occur through new mutations each generation and in fact Down syndrome is the chromosomal abnormality that you can survive. All the others just kill the embryo so we never see them. But Down syndrome occurs a lot and there are many other lethal mutations and there are enormous numbers of less damaging mutations that are degrading every system in the body. Evolution each generation has to pull away at some of this mutational load and the priority with which that mutational load is pulled out scales in proportion to how much the traits it is affecting impact fitness. So you get new mutations that impact your resistance to malaria, you get new mutations that damage brain function and then those mutations are purged each generation. If malaria is a bigger difference in mortality than the incremental effectiveness as a hunter-gatherer you get from being slightly more intelligent, then you'll purge that mutational load first. Similarly humans have been vigorously adapting to new circumstances. Since agriculture people have been developing things like the

ability to have amylase to digest breads and milk.  If you're evolving for all of these things and if   some of the things that give an advantage for that  incidentally carry along nearby them some negative   effect on another trait then that other trait can  be damaged. So it really matters how important   to survival and reproduction cognitive abilities  were compared to everything else the organism   has to do. In particular, surviving famine, having  the physical abilities to do hunting and gathering  

and even if you're very good at planning your  hunting, being able to throw a spear harder can   be a big difference and that needs energy to  build those muscles and then to sustain them.  Given all these factors it's not a slam  dunk to invest at the margin. And today,   having bigger brains is associated with  greater cognitive ability but it's modest.  

Large-scale pre-registered studies with MRI data find the correlation is in a range of 0.25 - 0.3 and the standard deviation of brain size is like 10%. So if you double the size of the brain, the existing brain costs of like 20% of metabolic energy go up to 40%. Okay, that's like eight standard deviations of brain size. If the correlation is 0.25 then yeah, you get a gain from that eight standard deviations of brain size, two standard

deviations of cognitive ability. In our modern society, where cognitive ability is very rewarded and finishing school and becoming an engineer or a doctor or whatever can pay off a lot financially, the average observed return in income is still only a one or two percent proportional increase. There are more effects at the tail, there's more effect in professions like STEM but on the whole it's not a lot. If it was like a five percent increase or a 10 percent increase then you could tell a story where yeah, this is hugely increasing the amount of food you could have, you could support more children, but it's a modest effect and the metabolic costs will be large and then throw in these other aspects. Also we can just see there was not

very strong rapid directional selection on the thing, which would be there if, say, by solving a math puzzle you could defeat malaria; then there would be more evolutionary pressure. That is so interesting. Not to mention of course that if you had 2x the brain size, without C-section you or your mother or both would die. This is a question I've actually been curious about for over a year and I’ve briefly tried to look up an answer. I know this was off topic and my apologies to the audience, but I was super interested and that was the most comprehensive and interesting answer I could have hoped for. So yeah, we have a good explanation or good first principles evolutionary reason for thinking that intelligence scaling up to humans is not implausible just by throwing more scale at it. I would also add that we also have the brain

right here with us available for neuroscience  to reverse engineer its properties. This was   something that would have mattered to me more in  the 2000s. Back then when I said, yeah, I expect   this by the middle of the century-ish, that was a  backstop if we found it absurdly difficult to get   to the algorithms and then we would learn  from neuroscience. But in actual history,   it's really not like that. We develop things in  AI and then also we can say oh, yeah, this is   like this thing in neuroscience or maybe this is a  good explanation. It's not as though neuroscience  

is driving AI progress. It turns out not to be that necessary. I guess that is similar to how planes were inspired by the existence proof of birds but jet engines don't flap. All right, good reason to think scaling might work. So we spent a hundred billion dollars and we have something that is like human level or can help significantly with AI research. I mean, that might be on the earlier end but I definitely would not rule that out given the rates of change we've seen with the last few scale ups. At this point somebody might be skeptical. We already have a bunch of human researchers, how profitable is the incremental researcher? And then you might say no, this is thousands of researchers. I don’t know how to express this skepticism exactly. But skeptical

of just generally the idea that scaling up the number of people working on the problem leads to rapid progress on that problem. Somebody might think that with humans the reason the amount of population working on a problem is such a good proxy for progress on the problem is that there's already so much variation that is accounted for. When you say there's a million people working on a problem, there's hundreds of super geniuses working on it, thousands of people who are very smart working on it. Whereas with an AI all the copies are the same level of intelligence and if it's not super genius intelligence the total quantity might not matter as much.

I'm not sure what your model is here. Is the model that the diminishing returns kick in suddenly, that there's a cliff right where we are? There were results in the past from throwing more people at problems and this has been useful in historical prediction, this idea of experience curves and [unclear] law measuring cumulative production in a field, which is also going to be a measure of the scale of effort and investment, and people have used this correctly to argue that renewable energy technology, like solar, would be falling rapidly in price because it was going from a low base of very small production runs, not much investment in doing it efficiently, and climate advocates correctly called out, people like David Roberts, the futurist [unclear] actually has some interesting writing on this. They correctly called out that there would be a really drastic fall in prices of solar and batteries because of the increasing investment going into that. The human genome project would be another. So I’d say there's real evidence. These observed correlations, from ideas getting harder to find, have held over a fair range of data and over quite a lot of time. So I'm wondering what's the nature of the deviation you're thinking of? Maybe this is a good way to describe what happens when more humans enter a field but does it even make sense to say that a greater population of AIs is doing AI research if there are, like, more GPUs running a copy of GPT-6 doing AI research? How applicable are these economic models of the

quantity of humans working on a problem to the magnitude of AIs working on a problem? If you have AIs that are directly automating particular jobs that humans were doing before then we say, well with additional compute we can run more copies of them to do more of those tasks simultaneously. We can also run them at greater speed. Some people have an intuition that what matters is time, that it's not how many people are working on a problem at a given point. I think that doesn't bear out super well but AI can also run faster than humans. Say you have a set of AIs that can do the work of the individual human researchers and run at 10 times or 100 times the speed, and we ask, well, could the human research community have solved these algorithm problems, do

things like invent transformers over 100 years, if we have AIs with an effective population similar to the humans but running 100 times as fast and so on. You have to tell a story where no, the AI can't really do the same things as the humans, and we're talking about what happens when the AIs are in fact capable of doing that. Although they become more capable as lesser capable versions of themselves help us make themselves more capable, right? You have to kickstart that at some point. Is there an example in analogous situations? Is intelligence unique in the sense that you have a feedback loop where, with a learning curve or something else, a system’s outputs are feeding into its own inputs? Because if we're talking about something like Moore's law

or the cost of solar, you do have this way where we're throwing more people at the problem and we're making a lot of progress, but we don't have this additional part of the model where Moore's law leads to more humans somehow and more humans are becoming researchers. You do actually have a version of that in the case of solar. You have a small infant industry that's doing things like providing solar panels for space satellites and then getting increasing amounts of subsidized government demand because of worries about fossil fuel depletion and then climate change. You can have the dynamic where visible successes with solar and lowering prices then open up new markets. There's a particularly huge transition where renewables become cheap

enough to replace large chunks of the electric grid. Earlier you were dealing with very niche situations like satellites, where it’s very difficult to refuel a satellite in place, and remote areas. And then moving to the sunniest areas in the world with the biggest solar subsidies. There was an element of that where more and more investment has been thrown into the field and the market has rapidly expanded as the technology improved. But I think the closest analogy is actually the long run growth of human civilization itself and I know you had Holden Karnofsky from the Open Philanthropy project on earlier and discussed some of this research about the long run acceleration of human population and economic growth. Developing new technologies allowed the human population to expand and humans to occupy new habitats and new areas and then to invent agriculture to support the larger populations and then even more advanced agriculture in the modern industrial society. So there, the total technology and output allowed you

to support more humans who then would discover  more technology and continue the process. Now   that was boosted because on top of expanding the  population the share of human activity that was   going into invention and innovation went up and  that was a key part of the industrial revolution.   There was no such thing as a corporate  research lab or an engineering university   prior to that. So you're both increasing  the total human population and the share   of it going in. But this population dynamic  is pretty analogous. Humans invent farming,  

they can have more humans, they  can invent industry and so on.  Maybe somebody would be skeptical that with AI  progress specifically, it’s not just a matter of   some farmer figuring out crop rotation or some  blacksmith figuring out how to do metallurgy   better. In fact even to make the 50% improvement  in productivity you basically need something on   the IQ that's close to Ilya Sutskever. There's  like a discontinuous line. You’re contributing  

very little to produc

2023-06-20 08:23
