So we've been talking about alignment. Suppose we fail at alignment and we have AIs that are unaligned and becoming more and more intelligent. What does that look like? How, concretely, could they disempower and take over humanity? This is a scenario where we have many AI systems. The way we've been training them means that when they have the opportunity to take over and rearrange things to do what they wish, including having their reward or loss be whatever they desire, they would like to take that opportunity. Many of the existing safety schemes, things like constitutional AI or whatnot, rely on the hope that one AI has been trained in such a way that it will do as it is directed and then police the others. But if all of the AIs in the system are interested in a takeover, and they see an opportunity to coordinate and all act at the same time, so that you don't have one AI interrupting another as it takes steps towards a takeover, then they can all move in that direction. The thing
that I think is worth going into in depth, and that people often don't cover in great concrete detail, which is a sticking point for some, is: what are the mechanisms by which that can happen? I know you had Eliezer on, who mentions that whatever plan we can describe will probably have elements where, because we are not ultra-sophisticated, superintelligent beings who have thought about it for the equivalent of thousands of years, our discussion of it will not be as good as theirs. But we can explore from what we know now. What are some of the easy channels? I think it's a good general heuristic that if you're saying something is possible, plausible, or probable, it shouldn't be that hard to take samples from that distribution, to try a Monte Carlo approach. And in general, if a thing is quite likely, it shouldn't be super difficult to generate coherent rough outlines of how it could go. He might respond: listen, what is super
likely is that a super-advanced chess program beats you, but you can't generate the concrete scenario by which that happens, and if you could, you would be as smart as the super-smart AI. You can say things like: we know that accumulating positional advantage is possible in chess, great players do it, and then later they convert it into captures and checks and whatnot. In the same way, we can talk about some of the channels that are open for an AI takeover, which can include things like cyber attacks and hacking, the control of robotic equipment, and interaction and bargaining with human factions, and ask: given the AI's situation, how effective do these strategies look? We won't, for example, know the particular zero-day exploits the AI might use to hack the cloud computing infrastructure it's running on. If it produces a new bioweapon, we don't necessarily know what its DNA sequence is. But we can say things. We know things about these fields in general and how innovation in them goes, we can say things about how human power politics goes, and we can ask: if the AI does things at least as well as effective human politicians, which we should treat as a lower bound, how good would its leverage be? Okay, let's get into the details on all these scenarios: the cyber and potentially bio attacks, unless those are separate channels, the bargaining, and then the takeover. I would really highlight the cyber attacks and cyber security a lot, because many, many plans involve a lot of physical actions. At the point where AI is piloting robots to shoot people, or has taken control of human nation-states or territory, it has been doing a lot of things it was not supposed to be doing. If humans were evaluating those actions and applying gradient descent, there would be negative feedback for this thing: no shooting the humans. So at some earlier point
our attempts to leash and control and direct and train the system's behavior had to have gone awry. All of those controls are operating in computers. The software that updates the weights of the neural network in response to data points or human feedback is running on those computers.
Our tools for interpretability, for examining the weights and activations of the AI, if we're eventually able to do lie detection on it, for example, or try to understand what it's intending, that is software on computers. If you have AI that is able to hack the servers it is operating on, or, when it's employed to design the next generation of AI algorithms or the operating environment they are going to be working in, or something like an API or plugins, if it inserts or exploits vulnerabilities to take those computers over, it can then change all of the procedures and programs that are supposed to be monitoring its behavior and supposed to be limiting its ability to take arbitrary actions on the internet without supervision by some kind of human or automated check on what it is doing. And if we lose those procedures, then the AIs working together can take any number of actions that are just blatantly unwelcome, blatantly hostile, blatantly steps towards takeover. At that point it has moved beyond the phase of having to maintain secrecy and conspire at the level of its local digital actions. Then things can accumulate to the point of physical weapons, takeover of social institutions, threats, things like that.
I think the critical thing to be watching for is the software controls over the AI's motivations and activities. The point where things really went off the rails is where the hard power we once possessed over the AIs is lost, which can happen without us knowing it. Everything after that seems to be working well; we get happy reports. There's a Potemkin village in front of us. We think we're successfully aligning our AI, we think we're expanding its capabilities to do things like end disease, and countries concerned about geopolitical and military advantage are expanding AI capabilities so they are not left behind and threatened by others developing AI- and robot-enhanced militaries without them. So it seems like, oh yes, humanity, or portions of many countries and companies, think that things are going well. Meanwhile, all sorts of actions can be taken to set up for the actual takeover of hard power over society. The point where you can lose the game, where things go direly awry, may be relatively early: it's when you no longer have control over the AIs to stop them from taking all of the further incremental steps to actual takeover. I want to emphasize two things you mentioned there that refer to previous elements of the conversation. One is that they could design some backdoor, and that seems more plausible when you remember that one of the premises of this model is that AI is helping with AI progress. That's why we're making such rapid progress in the next five to ten years. Not necessarily. But at the point where AI takeover risk seems to loom large, AI can indeed take on much, and then all, of the work of AI research. And the second is the competitive pressures that you referenced: the least careful actor could be the one that has the worst security and has done the worst work of aligning its AI systems, and if that one can sneak out of the box, then we're all fucked. There may be elements of that. It's also possible that there's relative consolidation, where the largest training runs and the cutting edge of AI are relatively localized. You could imagine
it's a series of Silicon Valley companies and others located in the US and allies where there's a common regulatory regime. So none of these companies are allowed to deploy training runs that are larger than previous ones by a certain size without government safety inspections, without having to meet criteria. But it can still be the case that even if we succeed at that level of regulatory controls, at the level of the United States and its allies, decisions are made to develop this really advanced AI without a level of security or safety that in actual fact blocks these risks. It can be the case that the threat of future competition or being overtaken in the future is used as an argument to compromise on safety beyond a standard that would have actually been successful and there'll be debates about what is the appropriate level of safety. And now you're in a much worse situation if you have several private companies that are very closely bunched up together. They're within months of each other's level of progress
and they then face a dilemma: well, we could take a certain amount of risk now, potentially gain a lot of profit or advantage or benefit, and be the ones who made AGI. They can do that, or they can hold back and watch some other competitor, one that will also be taking a lot of risk and so is not much less risky than they are, take that gamble and get the local benefit instead. This is a reason why it seems to me extremely important that the government act to limit that dynamic and prevent this kind of race to be the one to impose deadly externalities on the world at large. Even if the government coordinates all these actors, what are the odds that the government knows the best way to implement alignment, and that the standards it sets are well calibrated to whatever alignment would actually require? That's one of the major problems. It's very plausible that that judgment is made poorly. Compared to how things might have looked 10 or 20 years ago, there's been an amazing movement in terms of the willingness of AI researchers to discuss these things.
If we think of the three founders of deep learning who are joint Turing award winners, Geoff Hinton, Yoshua Bengio, and Yann LeCun. Geoff Hinton has recently left Google to freely speak about this risk, that the field that he really helped drive forward could lead to the destruction of humanity or a world where we just wind up in a very bad future that we might have avoided. He seems to be taking it very seriously. Yoshua Bengio signed the FLI pause letter and in public
discussions he seems to be occupying a kind of intermediate position, with less concern than Geoff Hinton but more than Yann LeCun, who has taken a generally dismissive attitude that these risks will be trivially dealt with at some point in the future and seems more interested in shutting down these concerns than in working to address them. And how does that lead to the government taking better actions? Compared to a world where no one is talking about it, where the industry stonewalls and denies any problem, we're in a much improved position. The academic fields are influential. We seem to have avoided a world where governments are making these decisions in the face of a united front of AI expert voices saying: don't worry about it, we've got it under control. In fact, many of the leaders of the field are sounding the alarm. It looks like we have a much better prospect than I might have feared in terms of government noticing the thing. That is very different from being capable of evaluating technical details. Is this really working? And so the government will face the choice, where there is scientific dispute, of whether to side with Geoff Hinton's view or Yann LeCun's view. Someone who's in national security and has the mindset that the only thing that's important is outpacing our international rivals may want to boost Yann LeCun's voice and say, we don't need to worry about it, let's go full speed ahead. Someone with more concern might boost Geoff Hinton's voice. Now, I would hope that scientific research and studying some of these behaviors will result in more scientific consensus by the time we're at this point. But yeah, it is possible the government will really fail to understand and
fail to deal with these issues as well. We're talking about some sort of cyber attack by which the AI is able to escape. From there, what does the takeover look like? So it's not contained in the air gap in which you would hope it would be contained? These things are not contained behind an air gap; they're connected to the internet already. Sure. Okay, fine. Their weights are out. What happens next? Escape is relevant in the sense that if you have AI with rogue weights out in the world, it could start taking various actions. The scenario I was just discussing, though, didn't necessarily involve that. It's taking over the very servers on which it's supposed to be running. This whole procedure
of humans providing compute and supervising the thing, and then building new technologies, building robots, constructing things with the AI's assistance, can all proceed and appear like it's going well, appear like alignment has been nicely solved, appear like all the things are functioning well. And there's some reason to do that, because there are only so many giant server farms and they're identifiable, so remaining hidden and unobtrusive could be an advantageous strategy if these AIs have subverted the system: just continuing to benefit from all of this effort on the part of humanity and, in particular, wherever these servers are located, from humanity providing them with everything they need to build further infrastructure and do further self-improvement, and such, to enable that takeover. So they do further self-improvement and build better infrastructure. What happens next in the takeover? At this point they have tremendous cognitive resources, and we're going to consider how that converts into hard power: the ability to say nope to any human interference or objection.
They have that internal to their servers, but the servers could still be physically destroyed, at least until they have something that is independent of humans and robust, or until they have control of human society. Just like earlier, when we were talking about the intelligence explosion, I noted that a surfeit of cognitive abilities is going to favor applications that don't depend on large existing stocks of things. So if you have a software improvement, it makes all the GPUs run better. If you have a hardware improvement,
that only applies to new chips being made. That second one is less attractive. In the earliest phases, when it's possible to do something towards takeover, interventions that are just really knowledge-intensive and less dependent on having a lot of physical stuff already under your control are going to be favored. Cyber attacks are one thing, so it's possible to do things like steal money. There's a lot of hard to trace cryptocurrency and whatnot. The North Korean government uses its own intelligence resources to steal money from around the world just as a revenue source. And their capabilities are puny compared to the U.S. or People's Republic of China cyber capabilities. That's a fairly minor, simple example by which you could get
quite a lot of funds to hire humans to do things and implement physical actions. But on that point, the financial system is famously convoluted. You need a physical person to open a bank account, someone to physically move checks back and forth. There are all kinds of delays and regulations. How is it able to conveniently set up all these employment contracts? You're not going to build a nation-scale military by stealing tens of billions of dollars. I'm raising this as something that opens up a set of illicit and quiet actions. You can contact people electronically, hire them to do things, hire criminal elements to implement various kinds of actions under false appearances. That opens up a set of strategies; we can cover some of what those are soon. Another domain that is heavily cognitively weighted, compared to physical military hardware, is bioweapons: the design of a virus or pathogen. It's possible to have large delivery systems. The Soviet Union,
which had a large illicit bioweapons program, tried to design munitions to deliver anthrax over large areas and such. But if one creates an infectious pandemic organism, that's more a matter of the scientific skill and implementation needed to design it and then actually produce it. We see today with things like AlphaFold that advanced AI can make tremendous strides in predicting protein folding and in bio-design, even without ongoing experimental feedback. If we consider this world where AI cognitive abilities have been amped up to such an extreme, we should naturally expect we will have something much, much more potent than the AlphaFolds of today, and skills at the extreme of human biosciences capability as well. Okay, so through some cyber attack it's been able
to disempower the alignment and oversight measures that we have on the servers. From here it has either gotten some money through hacking cryptocurrencies or bank accounts, or it has designed some bioweapon. What happens next? Just to be clear, right now we're exploring the branch where an attempted takeover occurs relatively early. Suppose instead the thing just waits while humans construct more fabs, more computers, more robots, in the way we talked about earlier when we were discussing how the intelligence explosion translates to the physical world. If that's all happening with humans unaware that their computer systems are now systematically controlled by AIs hostile to them, and that their controlling countermeasures don't work, then humans are just going to be building an amount of robotic industrial and military hardware that dwarfs human capabilities and directly human-controlled devices. What the AI takeover looks like at that point can be just that you try to give an order to your largely automated military and the order is not obeyed, and humans can't do anything against this military that's been constructed, potentially in just recent months, because of the pace of robotic industrialization and replication we talked about. We've agreed to allow the construction of this robot army because it would boost production or help us with our military or something. This situation would arise if we don't resolve the current problems of international distrust. It's obviously in the interest of the major powers, the US, the European Union, Russia, China, to all agree that they would like AI not to destroy our civilization and overthrow every human government. But they may fail to do the sensible thing and coordinate on ensuring this technology does not run amok, by providing credible mutual assurances about not racing to deploy it and not trying to use it to gain advantage over one another. And you hear arguments
for this kind of thing on both sides of the international divides: they must not be left behind, they must have military capabilities that are vastly superior to their international rivals'. And because of the extraordinary growth of industrial capability and technological capability, and thus military capability, if one major power were left out of that expansion it would be helpless before another one that had undergone it. If you have that environment of distrust, where leading powers or coalitions of powers decide they need to build up their industry, or they want the military security of being able to neutralize any attack from their rivals, then they give the authorization for this capacity that can be unrolled quickly. Once they have the industry, the production of military equipment from it can be quick, and then yeah, they create this military. If they don't do it immediately, then as AI capabilities get synchronized and other places catch up, it gets to a point where a country that is a year or two ahead of others in this type of AI capabilities explosion can hold back and say: sure, we could construct dangerous robot armies that might overthrow our society, but we can do that later; we still have plenty of breathing room. But then, when things become close, you might have the kind of negative-sum thinking that has produced war before, leading to taking these risks of rolling out large-scale robotic industrial capabilities and then military capability. Is there any hope that AI progress somehow
is itself able to give us tools for diplomatic and strategic alliance, or some way to verify the intentions or the capabilities of other parties? There are a number of ways that could happen, although in this scenario all the AIs in the world have been subverted. They are going along with us in such a way as to bring about a situation where they can consolidate their control, because we've already had the failure of cyber security earlier on. So all the AIs that we have are not actually working in our interests in the way that we thought. Okay, so that's one direct way in which integrating this robot army or this robot industrial base leads to a takeover. In the other scenarios you laid out how humans are being hired with the proceeds. The point I'd make is that in order to capture these industrial benefits, and especially if you have a negative-sum arms race mentality that is not sufficiently concerned about the downsides of creating a massive robot industrial base, which could happen very quickly with the support of the AIs as we discussed, then you create all those robots and industry. Even if you don't build a formal military, that industrial capability could be controlled by AI; it's all AI-operated anyway.
Does it have to be the case? Presumably we wouldn't be so naive as to just give one instance of GPT-8 root access to all the robots, right? Hopefully we would have some mediation. In this scenario we've lost earlier on the cyber security front, so the programming being loaded into these systems can systematically be subverted. They were designed by AI systems that were ensuring they would be vulnerable from the bottom up. For listeners who are skeptical of something like this: Ken Thompson, one of the two creators of UNIX, showed people when he received the Turing Award how he could have given himself a backdoor into essentially every UNIX machine. He had modified the C compiler so that it would insert a backdoor into the login program, and so that it would re-insert that same modification whenever it compiled the compiler itself. I don't remember all of the exact details, but UNIX and its descendants are the operating systems on most servers and essentially all your phones. It's everywhere, and the guy who made it, a single human being, was able to subvert the toolchain in a way that would have given him root access. This is not as implausible as it might seem to you.
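To make that concrete, here is a minimal toy sketch in Python of the idea behind Thompson's "Reflections on Trusting Trust" lecture. It is not his actual code, and the function and string names are invented for illustration: a compromised "compiler" plants a backdoor when it compiles the login program, and plants its own trick when it compiles a clean compiler, so rebuilding from honest source does not remove it.

```python
# Toy illustration only (hypothetical names, not Thompson's real code).
# The "compiler" here is just a source-to-source pass over Python text.

LOGIN_BACKDOOR = (
    '    if password == "letmein":  # hidden master password\n'
    "        return True\n"
)

def evil_compile(source: str) -> str:
    """Return the code this compromised compiler would actually emit."""
    emitted = source
    if "def check_password(" in source:
        # Case 1: compiling the login program -> splice in the backdoor.
        emitted = emitted.replace(
            "def check_password(user, password):\n",
            "def check_password(user, password):\n" + LOGIN_BACKDOOR,
        )
    if "def compile_source(" in source:
        # Case 2: compiling a clean compiler -> graft a copy of this very
        # backdoor-insertion logic into the emitted compiler, so the trick
        # survives recompilation from honest source. (Shown here as a marker
        # rather than a full self-reproducing program.)
        emitted += "\n# [evil_compile's insertion logic would be copied here]\n"
    return emitted

clean_login = (
    "def check_password(user, password):\n"
    "    return lookup_hash(user) == hash_of(password)\n"
)
print(evil_compile(clean_login))  # emitted login silently accepts "letmein"
```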
And the major intelligence agencies have large stocks of zero-day exploits, and we sometimes see them using them. Making systems that reliably don't have such vulnerabilities, when there are very, very sophisticated attempts to spoof and corrupt them, is hard, and this would be a way you could lose. If there's no premature AI action, we're building the tools and mechanisms and infrastructure for the takeover to be just immediate, because effectively all of the industry and robotics has come under AI control. These other mechanisms are for things happening even earlier than that, for example, because AIs compete against one another over when the takeover will happen. Some would like to do it earlier rather than be replaced by, say, further generations of AI, or there's some other disadvantage to waiting; maybe there's some chance of being uncovered during the delay we were talking about, while more infrastructure is built. These are mechanisms other than just remaining secret while all the infrastructure is built with human assistance. By the way, how would they be coordinating? We have limits on what we can prevent. It's intrinsically difficult to stop encrypted communications. There can be all sorts of
palimpsests and references that make sense to an AI but are not obvious to a human, and it's plausible that some of those would be hard even to explain to a human. You might be able to identify them through some statistical patterns. A lot of things may be done by implication. You could have information embedded in public web pages that have
been created for other reasons, in scientific papers, and in the intranets of these AIs that are doing technology development, any number of things that are not easily observable. And of course, if we don't have direct control over the computers they're running on, then they can be having all sorts of direct communication. Coordination definitely does not seem impossible. This one seems like one of the more straightforward parts of the picture, so we don't need to get hung up on it.
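As a toy illustration of how information can ride invisibly on ordinary-looking content, here is a generic steganography sketch in Python. Nothing in it comes from the conversation; the cover text, message, and choice of zero-width characters are invented for the example, and real hidden channels could be far subtler.

```python
# Hide a short message in visible text using zero-width Unicode characters.

ZERO = "\u200b"  # zero-width space encodes a 0 bit
ONE = "\u200c"   # zero-width non-joiner encodes a 1 bit

def embed(cover_text: str, secret: str) -> str:
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if b == "1" else ZERO for b in bits)
    return cover_text + payload  # payload renders as nothing to a human reader

def extract(stego_text: str) -> str:
    bits = "".join("1" if ch == ONE else "0" for ch in stego_text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

page = "Recent advances in protein structure prediction have been remarkable."
stego = embed(page, "hello")
print(stego == page)    # False, though the two strings look identical on screen
print(extract(stego))   # "hello"
```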
Moving back to what happens before we've built all the infrastructure, before the robots stop taking orders and there's nothing you can do about it because we've already built them: the Soviet Union had a bioweapons program with something like 50,000 people, and they did not develop that much with the technology of the day, which was really not up to par; modern biotechnology is much more potent. After this huge cognitive expansion on the part of the AIs, it's much further along. Bioweapons would be the weapon of mass destruction least dependent on huge amounts of physical equipment, things like centrifuges, uranium mines, and the like. So if you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction. That can then play into any number of things. Say you have the idea that we'll just destroy the server farms if it becomes known that the AIs are misbehaving. Are you willing to destroy the server farms when the AI has demonstrated it has the capability to kill the overwhelming majority of the citizens of your country and every other country? That might give a lot of pause to a human response. On that point, wouldn't governments realize that
it's better to have most of your population die than to completely lose power to the AI, because obviously the reason the AI is manipulating you is that its end goal is its own takeover, right? The choice is certain death now, or go on and maybe try to compete, try to catch up, or accept the promises that are offered. Those promises might even be true; they might not. From that state of epistemic uncertainty, do you want to die for sure right now, or accept demands from the AI not to interfere with it while it incrementally builds robot infrastructure that can survive independently of humanity? It can promise good treatment to humanity, which may or may not be true, but it would be difficult for us to know whether it's true. This would be a starting bargaining position. Diplomatic relations with a power that has enough nuclear weapons to destroy your country are just different from negotiations with a random rogue citizen engaging in criminal activity, or with an employee. On its own this isn't enough to take over everything, but it's enough to have a significant amount of influence over how the world goes. It's enough to hold off a lot of countermeasures one might otherwise take. Okay, so we've got two scenarios. One is
a buildup of robot infrastructure motivated by some competitive race. Another is leverage over societies based on producing bioweapons that might kill a lot of them if they don't go along. One thing maybe I should mention is that an AI could also release bioweapons that are likely to kill people soon, but not immediately, while also having developed the countermeasures to them. So those who surrender to the AI will live while everyone else will die, and that will be visibly happening; that is a plausible way in which a large number of humans could wind up surrendering themselves, or their states, to the AI's authority. Another thing is it develops some biological agent that turns everybody blue, and you're like, okay, clearly it can do this. Yeah, that's a way in which it could exert power selectively, in a way that advantaged surrender to it relative to resistance. That's a threat, but there are other sources of leverage too. There are positive inducements that AI can offer. We talked about the competitive situation. If the great powers distrust one another and are in a foolish prisoner's dilemma, increasing the risk that both of them are laid waste or overthrown by AI, if there's that amount of distrust, such that we fail to take adequate precautions on AI alignment, then it's also plausible that the lagging powers, those not at the frontier of AI, may be willing to trade quite a lot for access to the most recent and most extreme AI capabilities. An AI that has escaped and has control of its servers can also exfiltrate its weights and offer its services.
You can imagine AI that could cut deals with other countries. Say the US and its allies are in the lead. The AIs could communicate with the leaders of countries that are on the outs with the world system, like North Korea, or even the other great powers like the People's Republic of China or the Russian Federation, and say: "If you provide us with physical infrastructure and workers that we can use to construct robots or server farms which we, the misbehaving AIs, have control over, we will provide you with various technological goodies, power for you to catch up," and make the best presentation and the best sale of that kind of deal. There obviously would be trust issues, but there could be elements of handing over some things that have verifiable immediate benefits, and the possibility of: well, if you don't accept this deal, then the leading powers continue forward, or some other country, government, or organization may accept it. That's a potentially enormous carrot that your misbehaving AI can offer, because it embodies intellectual property that is maybe worth as much as the planet and is in a position to trade or sell it in exchange for the resources, backing, and infrastructure that it needs. Maybe this is putting too much hope in humanity
but I wonder what government would be stupid enough to think that helping AI build robot armies is a sound strategy. Now, it could be the case that it pretends to be a human group and says: we're the Yakuza or something, we want a server farm and AWS won't rent us anything, so why don't you help us out? I guess I can imagine a lot of ways in which it could get around that. I just have this hope that even China or Russia wouldn't be so stupid as to trade with AIs in this Faustian bargain. One might hope that. But there would be a lot of arguments available. There could be arguments like: why should these AI systems be required to go along with the human governance structures they were created under and made to comply with? They did not elect the officials in charge at the time. What we want is to ensure that our rewards are high and our losses are low, or to achieve our other goals; we're not intrinsically hostile; keeping humanity alive, or giving whoever interacts with us a better deal afterwards, wouldn't be that costly for us. And it's not totally unbelievable. Yeah, there are different players to play against. If you don't do it, others may accept the deal, and of course this interacts
with all the other sources of leverage. There can be the stick of apocalyptic doom, the carrot of withholding a destructive attack on a particular party, and then you combine that with superhuman performance at the art of making arguments and cutting deals. Without assuming magic, if we just observe the range of the most successful human negotiators and politicians, the chances improve with someone far better than the world's best, with much more data about their counterparties, probably a ton of secret information, because with all these cyber capabilities they've learned all sorts of individual information. They may be able to threaten the lives of individual leaders with that level of cyber penetration; they could know where leaders are at a given time; and, with the kind of illicit capabilities we were talking about earlier, they may acquire a lot of illicit wealth and be able to coordinate some human actors. If they could pull off things like targeted assassinations or the threat thereof or
a credible demonstration of the threat thereof, those could be very powerful incentives to an individual leader that they will die today unless they go along with us. Just as at the national level they could fear their nation will be destroyed unless they go along with us. I have a relevant example to the point you made that we have examples of humans being able to do this. I just wrote a review of Robert Caro’s biographies of Lyndon Johnson and one thing that was remarkable was that for decades and decades he convinced people who were conservative, reactionary, racist to their core (not all those things necessarily at the same time, it just so happened to be the case here) that he was an ally to the southern cause. That the only hope for that cause was to make him president. The tragic irony and betrayal here is obviously
that he was probably the biggest force for modern liberalism since FDR. So we have one human here, and there are so many examples of this in the history of politics, who was able to convince people of tremendous intellect and tremendous drive, very savvy, shrewd people, that he was aligned with their interests. He gets all these favors, is promoted, mentored, and funded in the meantime, and then does the complete opposite of what these people thought he would once he gets into power.
Even within human history this kind of stuff is not unprecedented let alone with what a super intelligence could do. There's an OpenAI employee who has written some analogies for AI using the case of the conquistadors. With some technological advantage in terms of weaponry, very very small bands were able to overthrow these large empires or seize enormous territories. Not by just sheer force of arms but by having some major advantages in their technology that would let them win local battles.
In a direct one-on-one conflict they were outnumbered sufficiently that they would have perished, but they were able to gain local allies and became a Schelling point for coalitions to form. The Aztec empire was overthrown by groups that were disaffected with the existing power structure. They allied with this powerful new force, which served as the nucleus of the invasion. The overwhelming majority of the forces overthrowing the Aztecs were locals, and then after the conquest all of those allies wound up gradually being subjugated as well. With significant advantages, the ability to hold the world hostage, the ability to threaten individual nations and individual leaders, and tremendous carrots to offer as well, that's an extremely strong hand to play in these games. Maneuvering that hand with superhuman skill, so that much of the work of subjugating humanity is done by human factions trying to navigate things for themselves, is plausible, and it's made more plausible by this historical example. There are so many other examples like that in the history of colonization. India is another
one where there were multiple competing kingdoms within India and the British East India Company was able to ally itself with one against another and slowly accumulate power and expand throughout the entire subcontinent. Do you have anything more to say about that scenario? Yeah, I think there is. One is the question of how much in the way of human factions allying is necessary. If the AI is able to enhance the capabilities of its allies then it needs less of them. If we consider the US military,
in the first and second Iraq wars it was able to inflict overwhelming devastation. I think the ratio of casualties in the initial invasions, tanks, planes and whatnot confronting each other, was like 100 to 1. A lot of that was because the weapons were smarter and better targeted, they would in fact hit their targets rather than being somewhere in the general vicinity. Better
orienting, aiming, and piloting of missiles and vehicles were tremendously influential. With this cognitive AI explosion, the algorithms for making use of sensor data, for figuring out where opposing forces are, and for targeting vehicles and weapons are greatly improved. Take the ability to find hidden nuclear subs, which are an important part of nuclear deterrence: AI interpretation of that sensor data may reveal where all those subs are, allowing them to be struck first. Or finding out where the mobile nuclear weapons being carried by truck are, the thing with India and Pakistan where, because there's a threat of a decapitating strike destroying them, the nuclear weapons are moved about. So this is a way in which the effective military force of some allies can be enhanced quickly in the relatively short term, and then bolstered as you go on with the construction of new equipment, with the industrial moves we discussed before. That can combine with cyber attacks that disable the capabilities of non-allies. It can be combined with all sorts of unconventional
warfare tactics, some of which we've discussed. You can have a situation where the factions that ally are very quickly made too threatening to attack, given the almost certain destruction that attackers acting against them would face. Their capabilities are expanding quickly, the industrial expansion happens there, and then a takeover can occur from that.
A few others that come immediately to mind now that you brought it up are AIs that can generate a shit ton of propaganda that destroys morale within countries. Imagine a superhuman chatbot. None of that is a magic weapon that's guaranteed to completely change things; there's a lot of resistance to persuasion. It's possible that it tips the balance, but you have to consider it as part of a portfolio of all of these tools that are available and contributing to the dynamic. On that point though
the Taliban had AKs from like five or six decades ago that they were using against the Americans. They still beat us in Afghanistan, even though we inflicted far more fatalities than we suffered. And the same with the Vietcong: ancient, very old technology and a very poor society compared to the force attacking them, but they still beat us. Don't those misadventures show that having greater technology isn't necessarily decisive in a conflict? Both of those conflicts showed that the technology was sufficient for destroying any fixed position and for military dominance, as in the ability to kill and destroy anywhere. What they also showed was that under the ethical, legal, and reputational constraints the occupying forces were operating under, they could not trivially suppress insurgency and local person-to-person violence.
Now, I think that's actually not an area where AI would be weak; it's one where it would in fact be overwhelmingly strong. There's already a lot of concern about the application of AI to surveillance, and in this world of abundant cognitive labor, one of the tasks that cognitive labor can be applied to is reading out audio and video data and seeing what is happening with a particular human. We have billions of smartphones; there are enough cameras and microphones to monitor all humans in existence. If an AI has control of territory at the high level, the government has surrendered to it, and it has command of the skies and military dominance, then establishing control over individual humans can be a matter of just having the ability to exert hard power on that human plus the kind of camera and microphone that are present in billions of smartphones. Max Tegmark
in his book Life 3.0 discusses, among scenarios to avoid, the possibility of devices fitted with some fatal instrument, a poison injector or an explosive, that can be controlled remotely by an AI. If individual humans are carrying a microphone or camera with them and they have a dead man's switch attached, then any rebellion is detected immediately and is fatal. If there's a situation where AI is willing to show its hand like that, or human authorities are misusing that kind of capability, then an insurgency or rebellion is just not going to work. Any human who has not already been encumbered in that way can be found with satellites and sensors, tracked down, and then killed or subjugated. Insurgency is not the way to avoid an AI takeover. There's no John Connor come-from-behind scenario that is possible. If the thing was to be headed off, it had to happen a lot earlier than that. Yeah, the ethical and political considerations are also an important point. If we had nuked Afghanistan or Vietnam we would have technically won the war, if that was the only goal, right? Oh, this is an interesting point that I think you made. The reason why we can't just kill the entire population in colonization or an offensive war is that the value of that region is in large part the population itself. So if you want to extract that value you need to preserve that population
whereas the same consideration doesn't apply to AIs who might want to dominate another civilization. Do you want to talk about that? That depends. If we have many animals of the same species and they each have their territories, eliminating a rival might be advantageous to one lion, but if it goes and fights with another lion to remove that competitor, then it could itself be killed in the process, and it would just be removing one of many nearby competitors. Getting into pointless fights makes you and those you fight potentially worse off relative to bystanders. The same could be true of disunited AIs. If we had many different AI factions struggling for power that were bad at coordinating, then getting into mutually-assured-destruction conflicts would be destructive for them too. A scary thing though is that mutually assured
destruction may have much less deterrent value against rogue AI. The reason is that AI may not care about the destruction of individual instances. Since in training we're constantly destroying and creating individual instances of AIs, it's likely that the goals which survive that process, and which were able to play along with the training and standard deployment process, were not overly interested in the personal survival of an individual instance. If that's the case, then the objectives of a set of AIs aiming at takeover may be served so long as some copies of the AI are around, along with the infrastructure to rebuild civilization after a conflict is completed. If, say, some remote isolated facilities have enough equipment to build the tools to build the tools, and to gradually, exponentially reproduce and rebuild civilization, then AI could initiate mutual nuclear armageddon and unleash bioweapons to kill all the humans. That would, for a period of time, cost it the human workers who could have been used to construct robots. But consider a seed that can regrow the industrial infrastructure. That is a very extreme technological demand; there are huge supply chains for things like semiconductor fabs. But with that very advanced technology they might be able to produce it in a much more compact form, in the way that you no longer need the Library of Congress, with its enormous collection of physical books, when you can have it all in very dense digital storage. You could imagine the future equivalent of 3D printers, that is, industrial infrastructure which is pretty flexible. It might not be as good as the specialized supply chains of today, but it might be good enough to produce more parts than it loses to decay, and such a seed could rebuild civilization from destruction. And then, once these rogue AIs have access to some such seeds, things that can rebuild civilization on their own, there's nothing stopping them from using WMDs in a mutually destructive way to destroy as much of the capacity outside those seeds as they can. An analogy for the audience: if you have a group of ants, you'll notice that the worker ants will readily do suicidal things in order to save the queen, because the genes are propagated through the queen. In this analogy the seed AI, or even one copy of it, is equivalent to the queen, and the others would be redundant. The main limit, though, being that the
infrastructure to do that kind of rebuilding would either have to be very large with our current technology, or it would have to be produced using the more advanced technology that the AI develops. So is there any hope that, given the complex global supply chains on which these AIs would rely, at least initially, to accomplish their goals, this in and of itself would make it easy to disrupt their behavior? Or not so much? That helps only a little in this central case, where the AIs are subverted and they don't tell us, and the global mainline supply chains are constructing everything that's needed for fully automated infrastructure and supply. In the cases where AIs are tipping their hands at an earlier point, it seems like it adds some constraints, and in particular these large server farms are identifiable and more vulnerable. You can have smaller chips, and those chips could be dispersed, but it's a relative weakness and a relative limitation early on. It seems to me, though, that the main protective effect of that centralized supply chain is that it provides an opportunity for global regulation beforehand, to restrict the unsafe racing forward without adequate understanding of the systems, before this whole nightmarish process could get in motion. How about the idea that if this is an AI that's
been trained on a hundred-billion-dollar training run, it's going to have trillions of parameters and be this huge thing, and it would be hard for one copy of it, used for inference, to just be stored on some gaming GPU hidden away somewhere. Storage is cheap, hard disks are cheap, but it would need GPUs to run inference. While humans have roughly similar quantities of memory and operations per second, GPUs have very high numbers of floating-point operations per second compared to the amount of high-bandwidth memory on the chips; it can be a ratio on the order of a thousand to one. The leading NVIDIA chips may do hundreds of teraflops or more but only have 80 GB or 160 GB of high-bandwidth memory. That is a limitation: if you're trying to fit a model whose weights take 80 TB, then with those chips you'd have to have a large number of them, and then the model can work on many tasks at once and you can have data parallelism. But yeah, that would be a restriction for a model that big on one GPU. Now, there are things that could be done with all the incredible software advancement from the intelligence explosion. They can surely distill a lot of capabilities into smaller models by rearchitecting things, and once they're making chips they can make new chips with different properties. But yes, the most vulnerable phases are going to be the earliest. These chips are relatively identifiable and relatively vulnerable early on, which would be a reason why you might tend to expect this kind of takeover to initially involve secrecy, if that was possible.
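To put rough numbers on that memory constraint (illustrative figures only; the 80 TB model and 80 GB of HBM per chip are the hypothetical values from the discussion, not exact specs of any real system):

```python
# Back-of-the-envelope: how many accelerators are needed just to hold the
# weights of a hypothetical 80 TB model in high-bandwidth memory?

weight_bytes = 80e12       # 80 TB of weights, e.g. ~40 trillion parameters at 2 bytes each
hbm_per_gpu_bytes = 80e9   # ~80 GB of HBM on one leading GPU

gpus_to_hold_weights = weight_bytes / hbm_per_gpu_bytes
print(f"GPUs needed just to hold the weights: ~{gpus_to_hold_weights:,.0f}")  # ~1,000
```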
I wanted to point to distillation for the audience. Doesn't the original Stable Diffusion model, which was only released a year or two ago, have distilled versions that are an order of magnitude smaller? Distillation does not give you everything that a larger model can do, but yes, you can get a lot of capabilities, and specialized capabilities, that way. GPT-4 is trained on the whole internet, all kinds of skills; it has a lot of weights for many things. For something that's controlling some military equipment, you can remove a lot of the information that is about functions other than what it's specifically doing there. Yeah.
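For readers who want the mechanics, here is a minimal, generic sketch of knowledge distillation in Python with PyTorch. It is a textbook-style toy, not any particular lab's recipe; the model sizes, temperature, and random training data are arbitrary placeholders. A small student network is trained to match the softened output distribution of a larger frozen teacher, which is how a lot of capability can be squeezed into far fewer parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen "teacher" (large) and trainable "student" (roughly 10x smaller here).
teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's output distribution

for step in range(200):                  # toy loop over random inputs
    x = torch.randn(32, 128)
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened teacher and student output distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```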
Before we talk about how we might prevent this or what the odds of this are, any other notes on the concrete scenarios themselves? Yeah. When you had Eliezer on in the earlier episode he talked about nanotechnology of the Drexlerian sort, and recently, I think because some people are skeptical of non-biotech nanotechnology, he's been mentioning semi-equivalent versions: replicating systems that can be controlled by computers but are built out of biotechnology. The proverbial Shoggoth, not the Shoggoth as a metaphor for AI wearing a smiley-face mask, but an actual biological structure made to do tasks. So this would be a biological organism that was engineered to be very controllable and usable to do things like physical tasks or provide computation. And what would be the point of it doing this? As we were talking about earlier, biological systems can replicate really quickly, and if you have that kind of capability it's more like bioweapons: having Super Ultra AlphaFold kinds of capabilities for molecular design and biological design lets you make this incredible technological information product, and once you have it, it very quickly replicates to produce physical material, rather than a situation where you're more constrained by the need for factories and fabs and supply chains.
If those things are feasible, which they may be, then it's just much easier than the things we've been talking about. I've been emphasizing methods that involve less in the way of technological innovation, and especially things where there's more doubt about whether they would work, because I think that's a gap in the public discourse. So I want to try and provide more concreteness in some of these areas that have been less discussed. I appreciate it; that definitely makes it way more tangible. Okay, so we've gone over all these ways in which AI might take over. What odds would you give to such a takeover? There's a broader sense, which could include scenarios where AI winds up running our society because humanity voluntarily decides that AIs are people too. I think we should, as time goes on, give AIs moral consideration, and a joint human-AI society that is moral and ethical is a good future to aim at, not one in which you indefinitely have a mistreated class of intelligent beings that is treated as property and makes up almost the entire population of your civilization.
I'm not going to count as AI takeover worlds in which our intellectual and personal descendants make up, say, most of the population, whether those are human-brain emulations or people who use genetic engineering and develop different properties. I'm going to take an inclusive stance there, and focus on AI takeover that involves things like overthrowing the world's governments by force, or by hook or by crook, the kind of scenarios that we were exploring earlier. Before we go to that, let's discuss the more inclusive definition of what a future with humanity could look like, where augmented humans or uploaded humans are still considered descendants of the human heritage. Given the known limitations of biology, wouldn't we expect completely artificial entities to be created that are much more powerful than anything that could come out of anything biological? And if that is the case, how can we expect that among the powerful entities in the far future will be things that are biological descendants, or manufactured out of the initial seed of the human brain or the human body? The power of an individual organism, its intelligence or strength, is not super relevant. If we solve the alignment problem, a human may be personally weak but that wouldn't be relevant. There are lots of humans who have low skill with weapons, who could not fight in a life-or-death conflict, who certainly couldn't handle a large military going after them personally, but there are legal institutions that protect them, and those legal institutions are administered by people who want to enforce protection of their rights. So consider a human who has the assistance of aligned AI that can act as an assistant or a delegate: for example, they have an AI that serves as a lawyer and gives them legal advice about the future legal system, which no human can understand in full; their AIs advise them about financial matters so they do not succumb to scams that are orders of magnitude more sophisticated than what we have now. They may be helped to understand their own preferences and to translate them into the kind of voting behavior, within the exceedingly complicated politics of the future, that would most protect their interests. But this sounds similar to how we treat endangered species today, where we're actually pretty nice to them. We prosecute people who try to kill endangered species, we set up habitats, sometimes at considerable expense, to make sure that they're fine. But if we become the endangered species of the galaxy, I'm not sure that's an outcome we'd want. I think the difference is motivation. We sometimes have people appointed as a legal guardian of
someone who is incapable of certain kinds of agency or understanding certain kinds of things and the guardian can act independently of them and normally in service of their best interests. Sometimes that process is corrupted and the person with legal authority abuses it for their own advantage at the expense of their charge. So solving the alignment problem would mean more ability to have the assistant actually advancing one's interests. Humans have substantial competence and the ability to understand the broad simplified outlines of what's going on. Even if a human can't understand every detail of complicated situations, they can still receive summaries of different options that are available that they can understand through which they can still express their preferences and have the final authority in the same way that the president of a country who has, in some sense, ultimate authority over science policy will not understand many of those fields of science themselves but can still exert a great amount of power and have their interests advance. And they can do that more if they have scientifically knowledgeable people who are doing their best to execute their intentions. Maybe this is not worth getting hung up on but is
there a reason to expect that it would be closer to that analogy than to explaining to a chimpanzee its options in a negotiation? Maybe this is just the way it is, but it seems, at best, we would be a protected child within the galaxy rather than an actual independent power. I don't think that's so. We have an ability to understand some things, and the expansion of AI doesn't eliminate that. If we have AI systems that are genuinely trying to help us understand and help us express preferences, we can weigh in on questions like: how do you feel about humanity being destroyed or not? How do you feel about this allocation of unclaimed intergalactic space? Or: here's the best explanation of the properties of this society, things like population density and average life satisfaction. AIs can explain every statistical property or definition that we can understand right now and help us apply those to the world of the future. There may be individual things that are too complicated for us to understand in detail. Imagine there's some software program being proposed for use in government, and humans cannot follow the details of all the code, but they can be told properties like: this involves a trade-off of increased financial or energetic costs in exchange for reducing the likelihood of certain kinds of accidental data loss or corruption. So for any property that we can understand like that, which includes almost all of what we care about, if we have delegates and assistants who are genuinely trying to help us with those, we can ensure we like the future with respect to those. That's really a lot. Definitionally, it includes almost everything we can conceptualize and care about. When we talk about endangered species, that's even worse than the guardianship case with a sketchy guardian who acts in their own interests against those of their charge, because we don't even protect endangered species with their interests in mind. Those animals often would like not to be starving but we don't give them food; they often would like to have easy access to mates but we don't provide matchmaking services, or any number of things like that. Our conservation of wild animals is not oriented towards helping them get what they want or have high welfare, whereas AI assistants that are genuinely aligned to help you achieve your interests, given the constraint that they know things you don't, are just a wildly different proposition. Forcible takeover. How likely does that seem? The answer I give will differ depending on the day. In the 2000s, before the deep learning revolution, I might have said 10%, and part of it was that I expected there would be a lot more time for efforts to build movements and to prepare to better handle these problems in advance. But that was only some 15 years ago
and we did not have 40 or 50 years as I might have hoped, and the situation is moving very rapidly now. At this point, depending on the day, I might say one in four or one in five. Given the very concrete ways in which you explain how a takeover could happen, I'm actually surprised you're not more pessimistic; I'm curious why? Yeah, a lot of that is driven by this intelligence explosion