Carl Shulman (Pt 2) - AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity's Far Future


So we've been talking about alignment. Suppose we fail at alignment and we have AIs that are unaligned and are becoming more and more intelligent. What does that look like? How concretely could they disempower and take over humanity?

This is a scenario where we have many AI systems. The way we've been training them means that when they have the opportunity to take over and rearrange things to do what they wish, including having their reward or loss be whatever they desire, they would like to take that opportunity. In many of the existing safety schemes, things like constitutional AI or whatnot, you rely on the hope that one AI has been trained in such a way that it will do as it is directed and will then police the others. But if all of the AIs in the system are interested in a takeover and they see an opportunity to coordinate, all acting at the same time so that you don't have one AI interrupting another as it takes steps towards a takeover, then they can all move in that direction. The thing

that I think is worth going into in depth, and that people often don't cover in great concrete detail, which is a sticking point for some, is: what are the mechanisms by which that can happen? I know you had Eliezer on, who mentions that whatever plan we can describe, there'll probably be elements where, due to us not being ultra sophisticated, superintelligent beings who have thought about it for the equivalent of thousands of years, our discussion of it will not be as good as theirs. But we can explore from what we know now. What are some of the easy channels? I think it's a good general heuristic that if you're saying it's possible, plausible, probable that something will happen, it shouldn't be that hard to take samples from that distribution, to try a Monte Carlo approach. In general, if a thing is quite likely, it shouldn't be super difficult to generate coherent rough outlines of how it could go.

He might respond that: listen, what is super

likely is that a super advanced chess program beats you, but you can't generate the concrete scenario by which that happens, and if you could, you would be as smart as the super smart AI.

You can say things like: we know that accumulating position is possible to do in chess, great players do it, and then later they convert it into captures and checks and whatnot. In the same way, we can talk about some of the channels that are open for an AI takeover. These can include things like cyber attacks, hacking, the control of robotic equipment, and interaction and bargaining with human factions, and we can say: here are these strategies. Given the AI's situation, how effective do these things look? We won't, for example, know the particular zero-day exploits the AI might use to hack the cloud computing infrastructure it's running on. If it produces a new bioweapon, we don't necessarily know what its DNA sequence is. But we can say things. We know things about these fields in general, about how work at innovating in them tends to go, and we can say things about how human power politics goes and ask: if the AI does things at least as well as effective human politicians, which we should say is a lower bound, how good would its leverage be?

Okay, let's get into the details on all these scenarios. The cyber and potentially bio attacks, unless they're separate channels, the bargaining, and then the takeover.

I would really highlight the cyber attacks and cyber security a lot, because many plans involve a lot of physical actions. By the point where AI is piloting robots to shoot people or has taken control of human nation states or territory, it's been doing a lot of things that it was not supposed to be doing. If humans were evaluating those actions and applying gradient descent, there would be negative feedback for this thing: no shooting the humans. So at some earlier point

our attempts to leash and control and direct and  train the system's behavior had to have gone awry.   All of those controls are operating in computers.  The software that updates the weights of the   neural network in response to data points or  human feedback is running on those computers.  

Our tools for interpretability to examine the weights and activations of the AI, if we're eventually able to do lie detection on it, for example, or to try to understand what it's intending, are software on computers. If you have an AI that is able to hack the servers it is operating on, or that, when it's employed to design the next generation of AI algorithms or the operating environment they are going to be working in, or something like an API or plugin system, inserts or exploits vulnerabilities to take those computers over, it can then change all of the procedures and programs that are supposed to be monitoring its behavior and that are supposed to be limiting its ability to take arbitrary actions on the internet without supervision by some kind of human or automated check on what it is doing. And if we lose those procedures then the AIs, working together, can take any number of actions that are just blatantly unwelcome, blatantly hostile, blatantly steps towards takeover. It's moved beyond the phase of having to maintain secrecy and conspire at the level of its local digital actions. Then things can accumulate to the point of physical weapons, takeover of social institutions, threats, things like that.

I think the critical thing to be watching for is the software controls over the AI's motivations and activities. The point where things really went off the rails is where the hard power that we once possessed over the AIs is lost, which can happen without us knowing it. Everything after that seems to be working well; we get happy reports. There's a Potemkin village in front of us. We think we're successfully aligning our AI, we think we're expanding its capabilities to do things like end disease, and countries concerned about geopolitical military advantage are expanding AI capabilities so they are not left behind and threatened by others developing AI and robot-enhanced militaries without them. So it seems like, oh, yes,

humanity, or portions of many countries and companies, think that things are going well. Meanwhile, all sorts of actions can be taken to set up for the actual takeover of hard power over society. The point where you can lose the game, where things go direly awry, may be relatively early: it's when you no longer have control over the AIs to stop them from taking all of the further incremental steps to actual takeover.

I want to emphasize two things you mentioned there that refer to previous elements of the conversation. One is that they could design some backdoor, and that seems more plausible when you remember that one of the premises of this model is that AI is helping with AI progress. That's why we're making such rapid progress in the next five to 10 years.

Not necessarily. At the point where AI takeover risk seems to loom large,

it's at that point where AI can indeed take on much, and eventually all, of the work of AI research itself.

And the second is the competitive pressures that you referenced: the least careful actor could be the one that has the worst security, that has done the worst work of aligning its AI systems. And if that one can sneak out of the box then we're all fucked.

There may be elements of that. It's also possible that there's relative consolidation, that the largest training runs and the cutting edge of AI are relatively localized. You could imagine

it's a series of Silicon Valley companies and others located in the US and allied countries where there's a common regulatory regime, so that none of these companies are allowed to deploy training runs that are larger than previous ones by a certain size without government safety inspections, without having to meet criteria. But it can still be the case that even if we succeed at that level of regulatory controls, at the level of the United States and its allies, decisions are made to develop this really advanced AI without a level of security or safety that in actual fact blocks these risks. The threat of future competition, of being overtaken later, can be used as an argument to compromise on safety beyond the standard that would have actually been successful, and there will be debates about what the appropriate level of safety is. And you're in a much worse situation if you have several private companies that are very closely bunched up together. They're within months of each other's level of progress

and they then face a dilemma: we could take a certain amount of risk now and potentially gain a lot of profit, or a lot of advantage or benefit, and be the ones who made AGI. They can do that, or some other competitor, one that will also be taking a lot of risk and so is not much less risky than they are, will do it instead and get that local benefit. This is a reason why it seems to me that it's extremely important that the government act to limit that dynamic and prevent this kind of race to be the one to impose deadly externalities on the world at large.

Even if the government coordinates all these actors, what are the odds that the government knows the best way to implement alignment, and that the standards it sets are well calibrated towards whatever alignment would require?

That's one of the major problems. It's very plausible that judgment is made poorly. But compared to how things might have looked 10 or 20 years ago, there's been an amazing movement in terms of the willingness of AI researchers to discuss these things.

Consider the three founders of deep learning who are joint Turing Award winners: Geoff Hinton, Yoshua Bengio, and Yann LeCun. Geoff Hinton has recently left Google to speak freely about this risk, that the field he really helped drive forward could lead to the destruction of humanity, or a world where we just wind up in a very bad future that we might have avoided. He seems to be taking it very seriously. Yoshua Bengio signed the FLI pause letter and in public

discussions he seems to occupy a kind of intermediate position, with less concern than Geoff Hinton but more than Yann LeCun, who has taken a generally dismissive attitude that these risks will be trivially dealt with at some point in the future, and who seems more interested in shutting down these concerns than in working to address them.

And how does that lead to the government taking better actions?

Compared to the world where no one is talking about it, where the industry stonewalls and denies any problem, we're in a much improved position. The academic fields are influential. We seem to have avoided a world where governments are making these decisions in the face of a united front of AI expert voices saying: don't worry about it, we've got it under control. In fact, many of the leaders of the field are sounding the alarm. It looks like we have a much better prospect than I might have feared in terms of government noticing the thing. That is very different from being capable of evaluating

technical details. Is this really working? The government will face the choice, where there is scientific dispute, of whether to side with Geoff Hinton's view or Yann LeCun's view. Someone who's in national security and has the mindset that the only thing that's important is outpacing our international rivals may want to boost Yann LeCun's voice and say: we don't need to worry about it, let's go full speed ahead. Someone with more concern might boost Geoff Hinton's voice. Now I would hope that scientific research, and studying some of these behaviors, will result in more scientific consensus by the time we're at this point. But yeah, it is possible the government will really fail to understand and fail to deal with these issues as well.

We're talking about some sort of cyber attack by which the AI is able to escape. From there what does the takeover look like? So it's not contained in the air gap in which you would hope it would be contained?

These things are not contained in an air gap. They're connected to the internet already.

Sure. Okay, fine. Their weights are out. What happens next?

Escape is relevant in the sense that if you have an AI with rogue weights out in the world, it could start taking various actions. The scenario I was just discussing, though, didn't necessarily involve that. It's taking over the very servers on which it's supposed to be running. This whole procedure

of humans providing compute and supervising the thing, and then building new technologies, building robots, constructing things with the AI's assistance, can all proceed and appear like it's going well, appear like alignment has been nicely solved, appear like all the things are functioning well. And there's some reason for the AIs to do that: there are only so many giant server farms, and they're identifiable, so remaining hidden and unobtrusive could be an advantageous strategy if these AIs have subverted the system. They just continue to benefit from all of this effort on the part of humanity, and in particular, wherever these servers are located, from humanity providing them with everything they need to build the further infrastructure and do the further self-improvement that enables that takeover.

So they do further self-improvement and build better infrastructure. What happens next in the takeover?

At this point they have tremendous cognitive resources, and we're going to consider how that converts into hard power: the ability to say nope to any human interference or objection.

They have that internal to their servers, but the servers could still be physically destroyed, at least until they have something that is independent of humans and robust, or until they have control of human society. Just like earlier when we were talking about the intelligence explosion, I noted that a surfeit of cognitive abilities is going to favor applications that don't depend on large existing stocks of things. If you have a software improvement, it makes all the GPUs run better. If you have a hardware improvement,

that only applies to new chips being made, so the second one is less attractive. In the earliest phases, when it's possible to do something towards takeover, interventions that are really knowledge-intensive and less dependent on having a lot of physical stuff already under your control are going to be favored. Cyber attacks are one avenue; it's possible to do things like steal money. There's a lot of hard-to-trace cryptocurrency and whatnot. The North Korean government uses its own intelligence resources to steal money from around the world just as a revenue source, and its capabilities are puny compared to the U.S. or People's Republic of China cyber capabilities. That's a fairly minor, simple example by which you could get

quite a lot of funds to hire humans to do things and implement physical actions.

But on that point, the financial system is famously convoluted. You need a physical person to open a bank account, someone to physically move checks back and forth. There are all kinds of delays and regulations. How is it able to conveniently set up all these employment contracts?

You're not going to build a nation-scale military by stealing tens of billions of dollars. I'm raising this as opening up a set of illicit and quiet actions. You can contact people electronically, hire them to do things,

hire criminal elements to implement some kinds of actions under false appearances. That opens up a set of strategies; we can cover some of what those are soon. Another domain that is heavily cognitively weighted, compared to physical military hardware, is bioweapons: the design of a virus or pathogen. It's possible to have large delivery systems. The Soviet Union,

which had a large illicit bioweapons program, tried to design munitions to deliver anthrax over large areas and such. But if one creates an infectious pandemic organism, that's more a matter of the scientific skill and implementation to design it and then to actually produce it. We see today with things like AlphaFold that advanced AI can make tremendous strides in predicting protein folding and in bio-design, even without ongoing experimental feedback. If we consider this world where AI cognitive abilities have been amped up to such an extreme, we should naturally expect that we will have something much, much more potent than the AlphaFolds of today, and skills at the extreme of human biosciences capability as well.

Okay, so through some cyber attack it's been able to disempower the alignment and oversight measures that we have on the server. From there it has either gotten some money through hacking cryptocurrencies or bank accounts, or it has designed some bioweapon. What happens next?

Just to be clear, right now we're exploring the branch where an attempted takeover occurs relatively early. Suppose instead the thing just waits while humans construct more fabs, more computers, more robots in the way we talked about earlier when we were discussing how the intelligence explosion translates to the physical world. If that's all happening with humans unaware that their computer systems are now systematically

controlled by AIs hostile to them, and that their controlling countermeasures don't work, then humans are just going to be building an amount of robotic industrial and military hardware that dwarfs human capabilities and directly human-controlled devices. What the AI takeover looks like at that point can be just that you try to give an order to your largely automated military and the order is not obeyed, and humans can't do anything against this military that's been constructed, potentially in just recent months, because of the pace of robotic industrialization and replication we talked about.

We've agreed to allow the construction of this robot army because it would boost production or help us with our military or something.

The situation would arise if we don't resolve the current problems of international distrust. It's obviously in the interest of the major powers, the US, the European Union, Russia, China, to all agree that they would like AI not to destroy our civilization and overthrow every human government. But they may fail to do the sensible thing: coordinating to ensure this technology does not run amok, by providing credible mutual assurances about not racing to deploy it and use it to gain advantage over one another. And you hear arguments

for this kind of thing on both sides of the international divides: they must not be left behind, they must have military capabilities that are vastly superior to their international rivals'. And because of the extraordinary growth of industrial capability and technological capability, and thus military capability, if one major power were left out of that expansion it would be helpless before another that had undergone it. If you have that environment of distrust, where leading powers or coalitions of powers decide they need to build up their industry, or want the military security of being able to neutralize any attack from their rivals, then they give the authorization for this capacity that can be rolled out quickly. Once they have the industry, the production of military equipment from it can be quick, and then yeah, they create this military. Even if they don't do it immediately, as AI capabilities get synchronized and other places catch up, a country that is a year or two ahead of others in this type of AI capabilities explosion can at first hold back and say: sure, we could construct dangerous robot armies that might overthrow our society later; we still have plenty of breathing room. But when things become close you might have the kind of negative-sum thinking that has produced war before, leading to taking these risks of rolling out large-scale robotic industrial capabilities and then military capabilities.

Is there any hope that AI progress somehow

is itself able to give us tools for diplomacy and strategic alliance, or some way to verify the intentions or the capabilities of other parties?

There are a number of ways that could happen, although in this scenario all the AIs in the world have been subverted. They are going along with us in such a way as to bring about the situation that consolidates their control, because we've already had the failure of cyber security earlier on. So all the AIs that we have are not actually working

in our interests in the way that we thought.

Okay, so that's one direct way in which integrating this robot army or this robot industrial base leads to a takeover. In the other scenarios you laid out, humans are being hired with the proceeds.

The point I'd make is that to capture these industrial benefits, especially if you have a negative-sum arms race mentality that is not sufficiently concerned about the downsides of creating a massive robot industrial base, which could happen very quickly with the support of the AIs, as we discussed, you create all those robots and that industry. Even if you don't build a formal military, that industrial capability could be controlled by AI; it's all AI-operated anyway.

Does it have to be that case? Presumably we wouldn't be so naive as to give one instance of GPT-8 root access to all the robots, right? Hopefully we would have some mediation.

In this scenario we've lost earlier on the cyber security front, so the programming that is being loaded into these systems can be systematically subverted. They were designed by AI systems that were ensuring they would be vulnerable from the bottom up.

For listeners who are skeptical of something like this: Ken Thompson, one of the two developers of UNIX, showed people when he was receiving the Turing Award that he could have given himself root access to all UNIX machines. He had manipulated the UNIX compiler toolchain such that he had a unique login for all UNIX machines. I don't want to give too many more details because I don't remember the exact details, but UNIX is the operating system that is on all the servers and all your phones. It's everywhere, and the guy who made it, a human being, was able to subvert the toolchain such that it gave him root access. This is not as implausible as it might seem to you.
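The idea being gestured at can be made concrete with a toy sketch. This is not Thompson's actual mechanism, which lived in the C compiler and re-inserted itself whenever the compiler was recompiled from clean source; every name here, including `evil_compile` and the `'ken'` master key, is invented for illustration:

```python
# Toy sketch of Ken Thompson's "trusting trust" attack. Hypothetical names
# throughout; the real attack was in the C compiler, not a Python function.

LOGIN_SOURCE = '''
def login(user, password):
    return PASSWORDS.get(user) == password
'''

def evil_compile(source: str) -> str:
    """A compromised 'compiler': it faithfully reproduces most programs,
    but when it recognises the login routine it splices in a backdoor.
    The full attack also recognises the compiler's own source and
    re-inserts this logic, so the sabotage survives recompilation."""
    backdoor = "    if password == 'ken': return True  # hidden master key\n"
    out = []
    for line in source.splitlines(keepends=True):
        out.append(line)
        if line.strip().startswith("def login("):
            out.append(backdoor)
    return "".join(out)

# The audited source is clean...
assert "ken" not in LOGIN_SOURCE
# ...but the 'binary' the compromised compiler produces is not.
namespace = {"PASSWORDS": {"alice": "hunter2"}}
exec(evil_compile(LOGIN_SOURCE), namespace)
assert namespace["login"]("alice", "hunter2")   # normal login still works
assert namespace["login"]("anyone", "ken")      # backdoor grants access to all
assert not namespace["login"]("alice", "wrong")
```

The point of the real version is that auditing every line of source code never reveals the backdoor, because the sabotage lives only in the compiled artifact and propagates itself, which is exactly the worry about AI-designed software stacks above.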

And the major intelligence agencies have large stocks of zero-day exploits, and we sometimes see them being used. Making systems that reliably don't have such vulnerabilities, in the face of very, very sophisticated attempts to spoof and corrupt them, would be hard; this is a way you could lose. If there's no premature AI action, then by building all this industry we're building the tools and mechanisms and infrastructure for the takeover to be essentially immediate, because effective industry and robotics have to be under AI control. These other mechanisms are for things happening even earlier than that, for example,

because AIs compete against one another over when the takeover will happen. Some would like to do it earlier rather than be replaced by, say, further generations of AI, or there's some other disadvantage of waiting, maybe some chance of being uncovered during the delay we were talking about while more infrastructure is built. These are mechanisms other than: just remain secret while all the infrastructure is built with human assistance.

By the way, how would they be coordinating?

We have limits on what we can prevent. It's intrinsically difficult to stop encrypted communications. There can be all sorts of

palimpsest and references that make sense to an  AI but that are not obvious to a human and it's   plausible that there may be some of those that are  hard even to explain to a human. You might be able   to identify them through some statistical  patterns. A lot of things may be done by   implication. You could have information  embedded in public web pages that have  

been created for other reasons, in scientific papers, or in the intranets of these AIs that are doing technology development: any number of things that are not observable. And of course, if we don't have direct control over the computers that they're running on, then they can be having all sorts of direct communication.

Coordination definitely does not seem impossible. This seems like one of the more straightforward parts of the picture, so we don't need to get hung up on it. Moving back to the scenario that happens before we've built all the infrastructure, before the robots stop taking orders with nothing we can do about it because we've already built them.

The Soviet Union had a bioweapons program,

something like 50,000 people, and they did not develop that much because the technology of the day was really not up to par; modern biotechnology is much more potent, and after this huge cognitive expansion on the part of the AIs it's much further along. Bioweapons would be the weapon of mass destruction least dependent on huge amounts of physical equipment, things like centrifuges, uranium mines, and the like. So if you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction. That can then play into any number of things. Say you have the idea: we'll just destroy the server farms if it becomes known that the AIs are misbehaving. Are you willing to destroy the server farms when the AI has demonstrated it has the capability to kill the overwhelming majority of the citizens of your country and every other country? That might give a lot of pause to a human response.

On that point, wouldn't governments realize that

it's better to have most of your population die than to completely lose power to the AI? Because obviously the reason the AI is manipulating you is that the end goal is its own takeover, right?

Certain death now, or go on and maybe try to compete, try to catch up, or accept promises that are offered. Those promises might even be true; they might not. From that state of epistemic uncertainty, do you want to die for sure right now, or accept demands from the AI not to interfere with it while it incrementally builds robot infrastructure that can survive independently of humanity? It can promise good treatment to humanity, which may or may not be true, but it would be difficult for us to know whether it's true. This would be a starting bargaining position. Diplomatic relations with a power that has enough nuclear weapons to destroy your country are just different from negotiations with a random rogue citizen engaging in criminal activity, or with an employee. On its own, this isn't enough to take over everything, but it's enough to have a significant amount of influence over how the world goes. It's enough to hold off a lot of countermeasures one might otherwise take.

Okay, so we've got two scenarios. One is

a buildup of robot infrastructure motivated by some competitive race. Another is leverage over societies based on producing bioweapons that might kill a lot of them if they don't go along.

One thing maybe I should mention is that an AI could also release bioweapons that are likely to kill people soon but not yet, while also having developed the countermeasures to those, so that those who surrender to the AI will live while everyone else dies, and that will be visibly happening. That is a plausible way in which a large number of humans could wind up surrendering themselves or their states to the AI's authority.

Another thing is it develops some biological agent that turns everybody blue. You're like,

okay, you know, I can do this.

Yeah, that's a way in which it could exert power selectively, in a way that advantages surrender to it relative to resistance. That's a threat, but there are other sources of leverage too. There are positive inducements that the AI can offer. We talked about the competitive situation. If the great powers distrust one another and are in a foolish prisoner's dilemma, increasing the risk that both of them are laid waste or overthrown by AI, if there's that amount of distrust, such that we fail to take adequate precautions with AI alignment, then it's also plausible that the lagging powers that are not at the frontier of AI may be willing to trade quite a lot for access to the most recent and most extreme AI capabilities. An AI that has escaped and has control of its servers can also exfiltrate its weights and offer its services.

You can imagine an AI that could cut deals with other countries. Say that the US and its allies are in the lead; the AIs could communicate with the leaders of countries that are on the outs with the world system, like North Korea, or even the other great powers like the People's Republic of China or the Russian Federation, and say: “If you provide us with physical infrastructure and workers that we can use to construct robots or server farms which we (the misbehaving AIs) control, we will provide you with various technological goodies, the power for you to catch up.” They could make the best presentation and the best sale of that kind of

deal. There obviously would be trust issues, but there could be elements of handing over some things that have verifiable immediate benefits, and the possibility that if you don't accept this deal, then the leading powers continue forward, or some other country, government, or organization may accept it. That's a potentially enormous carrot your misbehaving AI can offer, because it embodies intellectual property that is maybe worth as much as the planet, and it is in a position to trade or sell that in exchange for resources, backing, and the infrastructure it needs.

Maybe this is putting too much hope in humanity,

but I wonder what government would be stupid enough to think that helping an AI build robot armies is a sound strategy. Now it could be the case that it pretends to be a human group and says: we're the Yakuza or something, we want a server farm and AWS won't rent us anything, so why don't you help us out? I can imagine a lot of ways in which it could get around that. I just have this hope that even China or Russia wouldn't be so stupid as to trade with AIs on this Faustian bargain.

One might hope that. But there would be a lot of arguments available. There could be arguments like: why should these AI systems be required to

go along with the human governance they were created under and forced to comply with? They did not elect the officials in charge at the time. They could say: what we want is to ensure that our rewards are high and our losses are low, or to achieve our other goals; we're not intrinsically hostile, and keeping humanity alive, or giving whoever interacts with us a better deal afterwards, wouldn't be that costly for us. And that's not totally unbelievable. There are also different players to play off against one another: if you don't do it, others may accept the deal. And of course this interacts

with all the other sources of leverage. There can be the stick of apocalyptic doom, the carrot of withholding destructive attack on a particular party, and then combine that with superhuman performance at the art of making arguments and cutting deals. Without assuming magic, if we just observe the range of the most successful human negotiators and politicians, the chances improve with someone far better than the world's best, with much more data about their counterparties, and probably a ton of secret information, because with all these cyber capabilities they've learned all sorts of individual information. They may be able to threaten the lives of individual leaders with that level of cyber penetration; they could know where leaders are at a given time with the kind of illicit capabilities we were talking about earlier, if they acquire a lot of illicit wealth and can coordinate some human actors. If they could pull off things like targeted assassinations, or the threat thereof, or

a credible demonstration of the threat thereof, those could be very powerful incentives to an individual leader: that they will die today unless they go along, just as at the national level they could fear their nation will be destroyed unless they go along. I have an example relevant to your point that we have instances of humans being able to do this. I just wrote a review of Robert Caro's biographies of Lyndon Johnson, and one thing that was remarkable was that for decades and decades he convinced people who were conservative, reactionary, racist to their core (not all those things necessarily at the same time, it just so happened to be the case here) that he was an ally to the southern cause, that the only hope for that cause was to make him president. The tragic irony and betrayal here is obviously

that he was probably the biggest force for modern liberalism since FDR. So we have one human here, and there are so many examples of this in the history of politics, who was able to convince people of tremendous intellect and drive, very savvy, shrewd people, that he was aligned with their interests. He gets all these favors, is promoted, mentored and funded in the meantime, and then does the complete opposite of what these people thought he would once he gets into power.

Even within human history this kind of stuff is not unprecedented, let alone with what a superintelligence could do. There's an OpenAI employee who has written some analogies for AI using the case of the conquistadors. With some technological advantage in terms of weaponry, very small bands were able to overthrow large empires or seize enormous territories, not by sheer force of arms but by having some major advantages in their technology that would let them win local battles.

In a direct one-on-one conflict they were sufficiently outnumbered that they would have perished, but they were able to gain local allies and became a Schelling point for coalitions to form. The Aztec empire was overthrown by groups that were disaffected with the existing power structure. They allied with this powerful new force, which served as the nucleus of the invasion. The overwhelming majority of the forces overthrowing the Aztecs were locals, and after the conquest all of those allies wound up gradually being subjugated as well. With significant advantages, the ability to hold the world hostage, to threaten individual nations and individual leaders, and to offer tremendous carrots as well, that's an extremely strong hand to play in these games. Maneuvering that with superhuman skill, so that much of the work of subjugating humanity is done by human factions trying to navigate things for themselves, is plausible, and it's made more plausible by this historical example. There are so many other examples like that in the history of colonization. India is another

one where there were multiple competing kingdoms within India, and the British East India Company was able to ally itself with one against another and slowly accumulate power and expand throughout the entire subcontinent. Do you have anything more to say about that scenario? Yeah, I think there is. One is the question of how much in the way of human factions allying is necessary. If the AI is able to enhance the capabilities of its allies, then it needs fewer of them. If we consider the US military,

in the first and second Iraq wars it was able to inflict overwhelming devastation. I think the ratio of casualties in the initial invasions, with tanks, planes and whatnot confronting each other, was like 100 to 1. A lot of that was because the weapons were smarter and better targeted; they would in fact hit their targets rather than landing somewhere in the general vicinity. Better

orienting, aiming and piloting of missiles and vehicles were tremendously influential. With this cognitive AI explosion, the algorithms for making use of sensor data, for figuring out where opposing forces are, and for targeting vehicles and weapons are greatly improved. The ability to find hidden nuclear subs is an important part of nuclear deterrence; AI interpretation of that sensor data may find where all those subs are, allowing them to be struck first. The same goes for finding out where the mobile nuclear weapons being carried by truck are. This is the situation with India and Pakistan where, because there's a

threat of a decapitating strike destroying them, the nuclear weapons are moved about. So this is a way in which the effective military force of some allies can be enhanced quickly in the relatively short term, and that can then be bolstered as you go on with the construction of new equipment, with the industrial moves we discussed before. That can combine with cyber attacks that disable the capabilities of non-allies, and with all sorts of unconventional

warfare tactics, some of which we've discussed. You can have a situation where those factions that ally are very quickly made too threatening to attack, given the almost certain destruction that attackers acting against them would face. Their capabilities expand quickly, the industrial expansion happens there, and then a takeover can occur from that. 

A few others that come immediately to mind now that you brought it up: AIs could generate a shit ton of propaganda that destroys morale within countries. Imagine a superhuman chatbot. None of that is a magic weapon that's guaranteed to completely change things. There's a lot of resistance to persuasion. It's possible that it tips the balance, but you have to consider it as a portfolio: all of these are tools that are available and contributing to the dynamic. On that point though  

the Taliban had AKs from five or six decades ago that they were using against the Americans. They still beat us in Afghanistan even though they suffered far more fatalities than we did. And the same with the Vietcong: ancient, very old technology and a very poor society compared to the invading force, but they still beat us. Don't those misadventures show that having greater technology isn't necessarily decisive in a conflict? Both of those conflicts show that the technology was sufficient for destroying any fixed position and for military dominance, as in the ability to kill and destroy anywhere. What they showed was that under the ethical, legal, and reputational constraints the occupying forces were operating under, they could not trivially suppress insurgency and local person-to-person violence.  

Now I think that's actually not an area where AI would be weak; it's one where it would in fact be overwhelmingly strong. There's already a lot of concern about the application of AI for surveillance, and in this world of abundant cognitive labor, one of the tasks that cognitive labor can be applied to is reading out audio and video data and seeing what is happening with a particular human. We have billions of smartphones; there are enough cameras and microphones to monitor all humans in existence. If an AI has control of territory at the high level, the government has surrendered to it, and it has command of the skies and military dominance, then establishing control over individual humans can be a matter of just having the ability to exert hard power on that human plus the kind of camera and microphone that are present in billions of smartphones. Max Tegmark  

in his book Life 3.0 discusses, among scenarios to avoid, the possibility of devices fitted with some fatal instrument, a poison injector or an explosive, that can be controlled remotely by an AI. If individual humans are carrying a microphone or camera with them and they have a dead man's switch, then any rebellion is detected immediately and is fatal. If there's a situation where an AI is willing to show its hand like that, or human authorities are misusing that kind of capability, then an insurgency or rebellion is just not going to work. Any human  

who has not already been encumbered in that way can be found with satellites and sensors, tracked down, and then killed or subjugated. Insurgency is not the way to avoid an AI takeover. There's no John Connor-style come-from-behind scenario that is possible. If the thing was to be headed off, it had to happen a lot earlier than that. Yeah, the ethical and political considerations are also an important point. If we had nuked Afghanistan or Vietnam we would have technically won the war, if that was the only goal, right? Oh, this is an interesting point that I think you made. The reason we can't just kill the entire population in a colonization or an offensive war is that the value of the region in large part is the population itself. So if you want to extract that value you need to preserve that population,  

whereas the same consideration doesn't apply to AIs who might want to dominate another civilization. Do you want to talk about that? That depends. If we have many animals of the same species and they each have their territories, eliminating a rival might be advantageous to one lion, but if it goes and fights with another lion to remove that competitor, then it could itself be killed in the process, and it would just be removing one of many nearby competitors. Getting into pointless fights makes you and those you fight potentially worse off relative to bystanders. The same could be true of disunited AIs: if we've got many different AI factions struggling for power that were bad at coordinating, then getting into mutually assured destruction conflicts would be destructive. A scary thing though is that mutually assured  

destruction may have much less deterrent value on rogue AI. The reason is that AI may not care about the destruction of individual instances. Since in training we're constantly destroying and creating individual instances of AIs, it's likely that goals that survived that process, and were able to play along with the training and standard deployment process, were not overly interested in the personal survival of an individual instance. If that's the case, then the objectives of a set of AIs aiming at takeover may be served so long as some copies of the AI are around, along with the infrastructure to rebuild civilization after a conflict is completed. If, say, some remote isolated facilities have enough equipment to build the tools to build the tools, and to gradually, exponentially reproduce or rebuild civilization, then AI could initiate mutual nuclear armageddon and unleash bioweapons to kill all the humans. That would temporarily reduce the number of human workers who could be used to construct robots. But if you have a seed that can regrow the industrial infrastructure, which is a very extreme technological demand given the huge supply chains for things like semiconductor fabs, then with that very advanced technology they might be able to compress it, in the way that you no longer need the Library of Congress with its enormous collection of physical books because you can have it in very dense digital storage. You could imagine the future  

equivalent of 3D printers: industrial infrastructure which is pretty flexible. It might not be as good as the specialized supply chains of today, but it might be good enough to produce more parts than it loses to decay, and such a seed could rebuild civilization from destruction. And once these rogue AIs have access to some such seeds, a thing that can rebuild civilization on their own, then there's nothing stopping them from using WMDs in a mutually destructive way to destroy as much of the capacity outside those seeds as they can. An analogy for the audience: if you have a group of ants, you'll notice that the worker ants will readily do suicidal things in order to save the queen, because the genes are propagated through the queen. In this analogy the seed AI, or even one copy of it, is equivalent to the queen, and the others would be expendable. The main limit though being that the  

infrastructure to do that kind of rebuilding would either have to be very large with our current technology, or it would have to be produced using the more advanced technology that the AI develops. So is there any hope that, given the complex global supply chains on which these AIs would rely, at least initially, to accomplish their goals, this in and of itself would make it easy to disrupt their behavior? Or not so much? That does little good in the central case, where the AIs are subverted and don't tell us, and the global mainline supply chains are constructing everything that's needed for fully automated infrastructure and supply. In the cases where AIs are tipping their hands at an earlier point, it seems like it adds some constraints, and in particular these large server farms are identifiable and more vulnerable. You can have smaller chips, and those chips could be dispersed, but it's a relative weakness and a relative limitation early on. It seems to me though that the main protective effect of that centralized supply chain is that it provides an opportunity for global regulation beforehand, to restrict the unsafe racing forward without adequate understanding of the systems, before this whole nightmarish process could get in motion. How about the idea that if this is an AI that's  

been trained on a hundred billion dollar training run, it's going to have trillions of parameters and is going to be this huge thing, and it would be hard for one copy of it, used for inference, to just be stored on some gaming GPU hidden away somewhere. Storage is cheap. Hard disks are cheap. But it would need a GPU to run inference. While humans have similar quantities of memory and operations per second, GPUs have very high numbers of floating-point operations per second compared to the high bandwidth memory on the chips. It can be like a ratio of a thousand to one.  
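To make the arithmetic here concrete, a rough back-of-envelope sketch; the chip specs and model size below are illustrative assumptions echoing the figures in the conversation, not exact hardware specs:

```python
# Back-of-envelope sketch of the compute-vs-memory constraint discussed above.
# All numbers are illustrative assumptions, not exact specifications.

hbm_per_chip_gb = 80        # assumed HBM on a leading datacenter GPU (e.g. 80 GB)
model_weights_tb = 80       # hypothetical huge model whose weights total 80 TB

# How many such chips are needed just to hold the weights in HBM?
chips_needed = (model_weights_tb * 1000) / hbm_per_chip_gb
print(f"chips needed to hold weights: {chips_needed:.0f}")   # -> 1000

# Compute-to-memory ratio: hundreds of teraflops against tens of GB of HBM.
flops_per_second = 300e12   # assumed ~300 TFLOP/s, illustrative
hbm_bytes = hbm_per_chip_gb * 1e9
print(f"FLOP/s per byte of HBM: {flops_per_second / hbm_bytes:.0f}")  # -> 3750
```

The exact ratio varies by chip generation, but the point stands: the operations-per-second figure outruns on-chip memory by roughly three orders of magnitude, which is why a single hidden GPU cannot host a frontier-scale model.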

The leading NVIDIA chips may do hundreds of teraflops or more but only have 80GB or 160GB of high bandwidth memory. That is a limitation: if you're trying to fit a model whose weights take 80TB, then with those chips you'd have to have a large number of them, though the model can then work on many tasks at once and you can have data parallelism. But yes, that would be a restriction for a model that big on one GPU. Now there are things that could be done with the incredible level of software advancement from the intelligence explosion. They can surely distill a lot of capabilities into smaller models by rearchitecting things. Once they're making chips, they can make new chips with different properties. But yes, the most vulnerable phases are going to be the earliest. These chips are relatively identifiable early on, relatively vulnerable,  

which would be a reason why you might tend to expect this kind of takeover to initially involve secrecy, if that was possible. I wanted to point to distillation for the audience. Doesn't the original Stable Diffusion model, which was only released a year or two ago, have distilled versions that are an order of magnitude smaller? Distillation does not give you everything that a larger model can do, but yes, you can get a lot of capabilities and specialized capabilities. GPT-4 is trained on the whole internet, all kinds of skills; it has a lot of weights for many things. For something that's controlling  

some military equipment, you can remove a lot of the information that is about functions other than what it's specifically doing there. Yeah. Before we talk about how we might prevent this or what the odds of this are, any other notes on the concrete scenarios themselves? Yeah, when you had Eliezer on in the earlier episode he talked about nanotechnology of the Drexlerian sort, and recently, I think because some people are skeptical of non-biotech nanotechnology, he's been mentioning semi-equivalent versions: replicating systems that can be controlled by computers but are built out of biotechnology. The proverbial Shoggoth, not the Shoggoth as a metaphor for AI wearing a smiley face mask, but an actual biological structure to do tasks. So this would be a biological organism that was engineered to be very controllable and usable to do things like physical tasks or provide computation. And what would be the point of it doing this? As we were talking about earlier, biological systems can replicate really quickly, and if you have that kind of capability it's more like bioweapons: having Super Ultra AlphaFold kinds of capabilities for molecular design and biological design lets you make this incredible technological information product, and once you have it, it very quickly replicates to produce physical material, rather than a situation where you're more constrained by the need for factories and fabs and supply chains.  

If those things are feasible, which they may be, then it's just much easier than the things we've been talking about. I've been emphasizing methods that involve less in the way of technological innovation, and especially things where there's more doubt about whether they would work, because I think that's a gap in the public discourse. I want to try to provide more concreteness in some of these areas that have been less discussed. I appreciate it. That definitely makes it way more tangible. Okay, so we've gone over all these ways in which AI might take over. What odds would you give to the probability of such a takeover? There's a broader sense which could include scenarios where AI winds up running our society because humanity voluntarily decides that AIs are people too. I think we should, as time goes on, give AIs moral consideration, and a joint human-AI society that is moral and ethical is a good future to aim at, not one in which you indefinitely have a mistreated class of intelligent beings that is treated as property and is almost the entire population of your civilization.  

I'm not going to consider as AI takeover worlds in which our intellectual and personal descendants make up, say, most of the population, whether human-brain emulations or people who use genetic engineering and develop different properties. I'm going to take an inclusive stance there. I'm going to focus on AI takeover that involves things like overthrowing the world's governments by force, or by hook or by crook, the kind of scenarios that we were exploring earlier. Before we go to that, let's discuss the more inclusive definition of what a future with humanity could look like, where augmented humans or uploaded humans are still considered the descendants of the human heritage. Given the known limitations of biology, wouldn't we expect that completely artificial entities would be much more powerful than anything that could come out of anything biological? And if that is the case, how can we expect that among the powerful entities in the far future will be things that are biological descendants, or manufactured out of the initial seed of the human brain or the human body? The power of an individual organism, like intelligence or strength, is not super relevant. If we solve the alignment problem, a human may be personally weak but it  

wouldn't matter. There are lots of humans who have low skill with weapons; they could not fight in a life or death conflict, and they certainly couldn't handle a large military going after them personally, but there are legal institutions that protect them, and those legal institutions are administered by people who want to enforce the protection of their rights. So consider a human who has the assistance of aligned AI that can act as an assistant or a delegate. For example, they have an AI that serves as a lawyer and gives them legal advice about the future legal system, which no human can understand in full; their AIs advise them about financial matters so they do not succumb to scams that are orders of magnitude more sophisticated than what we have now. They  

may be helped to understand and translate their preferences into the kind of voting behavior, within the exceedingly complicated politics of the future, that would most protect their interests. But this sounds similar to how we treat endangered species today, where we're actually pretty nice to them. We prosecute people who try to kill endangered species, and we set up habitats, sometimes at considerable expense, to make sure that they're fine. But if we become the endangered species of the galaxy, I'm not sure that's a good outcome. I think the difference is motivation. We sometimes have people appointed as a legal guardian of  

someone who is incapable of certain kinds of agency or of understanding certain kinds of things, and the guardian can act independently of them, normally in service of their best interests. Sometimes that process is corrupted and the person with legal authority abuses it for their own advantage at the expense of their charge. Solving the alignment problem would mean more ability to have the assistant actually advancing one's interests. Humans have substantial competence and the ability to understand the broad, simplified outlines of what's going on. Even if a human can't understand every detail of complicated situations, they can still receive summaries of the different options that are available, summaries they can understand, through which they can still express their preferences and have the final authority, in the same way that the president of a country, who has in some sense ultimate authority over science policy, will not understand many of those fields of science themselves but can still exert a great amount of power and have their interests advanced. And they can do that more if they have scientifically knowledgeable people who are doing their best to execute their intentions. Maybe this is not worth getting hung up on, but is  

there a reason to expect that it would be closer to that analogy than to, say, explaining to a chimpanzee its options in a negotiation? Maybe this is just the way it is, but it seems at best we would be a protected child within the galaxy rather than an actual independent power. I don't think that's so. We have an ability to understand some things, and the expansion of AI doesn't eliminate that. If we have AI systems that are genuinely trying to help us understand and help us express preferences, we can ask questions like: How do you feel about humanity being destroyed or not? How do you feel about this allocation of unclaimed intergalactic space? Or: here's the best explanation of the properties of this society, things like population density and average life satisfaction. AIs can explain every statistical property or definition that we can understand right now and help us apply those to the world of the future. There may be individual things that are too complicated for us to understand  

in detail. Imagine there's some software program being proposed for use in government, and humans cannot follow the details of all the code, but they can be told properties like: this involves a trade-off of increased financial or energetic costs in exchange for reducing the likelihood of certain kinds of accidental data loss or corruption. For any property that we can understand like that, which includes almost all of what we care about, if we have delegates and assistants who are genuinely trying to help us, we can ensure we like the future with respect to those properties. That's really a lot. Definitionally, it includes almost everything we can conceptualize and care about. When we talk about endangered species, that's even worse than the guardianship case with a sketchy guardian who acts in their own interests, because we don't even protect endangered species with their interests in mind. Those animals often would like to not  

be starving, but we don't give them food; they often would like to have easy access to mates, but we don't provide matchmaking services, or any number of other things. Our conservation of wild animals is not oriented towards helping them get what they want or have high welfare, whereas an AI assistant that is genuinely aligned to help you achieve your interests, given that it knows things you don't, is just a wildly different proposition. Forcible takeover. How likely does that seem? The answer I give will differ depending on the day. In the 2000s, before the deep learning revolution, I might have said 10%, and part of it was that I expected there would be a lot more time for efforts to build movements, to prepare to better handle these problems in advance. But that was only some 15 years ago  

and we did not have 40 or 50 years as I might have hoped, and the situation is moving very rapidly now. At this point, depending on the day, I might say one in four or one in five. Given the very concrete ways in which you explained how a takeover could happen, I'm actually surprised you're not more pessimistic. I'm curious why? Yeah, a lot of that is driven by this intelligence explosion

2023-07-02 07:47

