We're Building Computers Wrong


- For hundreds of years, analog computers were the most powerful computers on Earth, predicting eclipses and tides and guiding anti-aircraft guns. Then, with the advent of solid-state transistors, digital computers took off. Now, virtually every computer we use is digital. But today, a perfect storm of factors is setting the scene for a resurgence of analog technology. This is an analog computer, and by connecting these wires in particular ways, I can program it to solve a whole range of differential equations. For example, this setup allows me to simulate a damped mass oscillating on a spring.

So on the oscilloscope, you can actually see the position of the mass over time. And I can vary the damping, or the spring constant, or the mass, and we can see how the amplitude and duration of the oscillations change. Now what makes this an analog computer is that there are no zeros and ones in here. Instead, there's actually a voltage that oscillates up and down exactly like a mass on a spring. The electrical circuitry is an analog for the physical problem; it just takes place much faster. Now, if I change the electrical connections, I can program this computer to solve other differential equations, like the Lorenz system, which is a basic model of convection in the atmosphere.
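The mass-on-a-spring setup described above obeys m x'' + c x' + k x = 0, where c is the damping coefficient and k the spring constant. The analog computer's voltage follows this equation directly, while a digital computer has to march through time in small steps. A minimal sketch, assuming that standard model and a simple semi-implicit Euler integrator (the function name, step size, and parameter values are illustrative choices, not taken from the video):

```python
def simulate_damped_spring(m, c, k, x0=1.0, v0=0.0, dt=1e-3, steps=20_000):
    """Integrate m*x'' + c*x' + k*x = 0 and return the position at every step."""
    x, v = x0, v0
    positions = [x]
    for _ in range(steps):
        a = -(c * v + k * x) / m  # Newton's second law with damping and spring forces
        v += a * dt               # semi-implicit Euler: update velocity first...
        x += v * dt               # ...then position using the new velocity
        positions.append(x)
    return positions


# Same mass and spring, two damping settings (illustrative values).
light = simulate_damped_spring(m=1.0, c=0.2, k=4.0)
heavy = simulate_damped_spring(m=1.0, c=1.0, k=4.0)
```

Changing c, k, or m in the call plays the role of turning the corresponding knob; the digital version has to recompute every step, whereas the analog circuit simply settles into the new behavior.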

Now the Lorenz system is famous because it was one of the first discovered examples of chaos. And here, you can see the Lorenz attractor with its beautiful butterfly shape. And on this analog computer, I can change the parameters and see their effects in real time. So these examples illustrate some of the advantages of analog computers. They are incredibly powerful computing devices, and they can complete a lot of computations fast. Plus, they don't take much power to do it.
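The Lorenz system mentioned above is three coupled equations: dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz. A minimal sketch that integrates it with a classic fourth-order Runge-Kutta step, assuming Lorenz's standard parameters σ = 10, ρ = 28, β = 8/3 (the video doesn't state which values were patched into the analog machine):

```python
def lorenz_derivatives(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt, **params):
    """Advance one time step with the classic fourth-order Runge-Kutta method."""
    def nudge(s, d, h):
        return tuple(si + h * di for si, di in zip(s, d))

    k1 = lorenz_derivatives(state, **params)
    k2 = lorenz_derivatives(nudge(state, k1, dt / 2), **params)
    k3 = lorenz_derivatives(nudge(state, k2, dt / 2), **params)
    k4 = lorenz_derivatives(nudge(state, k3, dt), **params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_lorenz(state=(1.0, 1.0, 1.0), dt=0.01, steps=5000, **params):
    """Return the list of (x, y, z) points visited, starting from `state`."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, **params)
        trajectory.append(state)
    return trajectory
```

Plotting x against z from the returned trajectory traces out the butterfly. Passing a different rho to simulate_lorenz is the digital counterpart of turning a knob on the analog machine, except that every change means re-running the whole integration.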

With a digital computer, if you wanna add two eight-bit numbers, you need around 50 transistors, whereas with an analog computer, you can add two currents just by connecting two wires. With a digital computer, to multiply two numbers, you need on the order of 1,000 transistors all switching zeros and ones, whereas with an analog computer, you can pass a current through a resistor, and then the voltage across this resistor will be I times R. So effectively, you have multiplied two numbers together. But analog computers also have their drawbacks. For one thing, they are not general-purpose computing devices.

I mean, you're not gonna run Microsoft Word on this thing. And also, since the inputs and outputs are continuous, I can't input exact values. So if I try to repeat the same calculation, I'm never going to get the exact same answer.

Plus, think about manufacturing analog computers. There's always gonna be some variation in the exact value of components, like resistors or capacitors. So as a general rule of thumb, you can expect about a 1% error.
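To make the add, multiply, and 1% figures above concrete, here is a toy model: wires joined at a node sum their currents (Kirchhoff's current law), a resistor turns a current into a voltage V = I·R, and each use draws the resistor's actual value uniformly within ±1% of nominal. The uniform distribution, and redrawing on every call (which lumps manufacturing spread and run-to-run drift together), are simplifying assumptions rather than a model of any real circuit:

```python
import random


def analog_add(*currents):
    """Kirchhoff's current law: currents meeting at a node simply add."""
    return sum(currents)


def analog_multiply(current_amps, resistance_ohms, tolerance=0.01, rng=random):
    """Ohm's law as a multiplier, V = I * R, with R off by up to +/- tolerance."""
    actual_r = resistance_ohms * (1 + rng.uniform(-tolerance, tolerance))
    return current_amps * actual_r


rng = random.Random(0)
repeats = [analog_multiply(2.0, 3.0, rng=rng) for _ in range(5)]  # "2 x 3", five times
```

Every entry in repeats lands within about 1% of 6, but no two are guaranteed to match, which is the non-repeatability described above.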

So when you think of analog computers, you can think powerful, fast, and energy-efficient, but also single-purpose, non-repeatable, and inexact. And if those sound like deal-breakers, it's because they probably are. I think these are the major reasons why analog computers fell out of favor as soon as digital computers became viable. Now, here's why analog computers may be making a comeback.

(computers beeping) It all starts with artificial intelligence. - [Narrator] A machine has been programmed to see and to move objects. - AI isn't new. The term was coined back in 1956.

In 1958, Cornell University psychologist Frank Rosenblatt built the perceptron, designed to mimic how neurons fire in our brains. So here's a basic model of how neurons in our brains work. An individual neuron can either fire or not, so its level of activation can be represented as a one or a zero. The input to one neuron is the output from a bunch of other neurons, but the strength of these connections between neurons varies, so each one can be given a different weight. Some connections are excitatory, so they have positive weights, while others are inhibitory, so they have negative weights. And the way to figure out whether a particular neuron fires is to take the activation of each input neuron, multiply it by its weight, and then add these all together.

If their sum is greater than some number called the bias, then the neuron fires, but if it's less than that, the neuron doesn't fire. As input, Rosenblatt's perceptron had 400 photocells arranged in a square grid, to capture a 20 by 20-pixel image. You can think of each pixel as an input neuron, with its activation being the brightness of the pixel. Although strictly speaking, the activation should be either zero or one, we can let it take any value between zero and one. All of these neurons are connected to a single output neuron, each via its own adjustable weight. So to see if the output neuron will fire, you multiply the activation of each neuron by its weight, and add them together.

This is essentially a vector dot product. If the answer is larger than the bias, the neuron fires, and if not, it doesn't. Now the goal of the perceptron was to reliably distinguish between two images, like a rectangle and a circle. For example, the output neuron could always fire when presented with a circle, but never when presented with a rectangle. To achieve this, the perceptron had to be trained, that is, shown a series of different circles and rectangles, and have its weights adjusted accordingly.
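The firing rule described above fits in a few lines. This sketch follows the transcript's convention of comparing the weighted sum against a bias used as a threshold; many modern texts instead add a bias term and compare with zero, which is equivalent with the sign flipped. The function name is illustrative:

```python
def neuron_fires(activations, weights, bias):
    """Threshold unit: fire if the weighted sum of inputs exceeds the bias."""
    # The weighted sum is a vector dot product of activations and weights.
    weighted_sum = sum(a * w for a, w in zip(activations, weights))
    return weighted_sum > bias
```

For Rosenblatt's 20 by 20 grid, activations and weights would each hold 400 values, one per photocell.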

We can visualize the weights as an image, since there's a unique weight for each pixel of the image. Initially, Rosenblatt set all the weights to zero. If the perceptron's output is correct, for example, here it's shown a rectangle and the output neuron doesn't fire, no change is made to the weights.

But if it's wrong, then the weights are adjusted. The algorithm for updating the weights is remarkably simple. Here, the perceptron was shown a circle, so the output neuron was supposed to fire, but it didn't. So to modify the weights, you simply add the input activations to the weights.

If the output neuron fires when it shouldn't, like here, when shown a rectangle, well, then you subtract the input activations from the weights, and you keep doing this until the perceptron correctly identifies all the training images. It was shown that this algorithm will always converge, so long as the two categories are linearly separable, that is, so long as a single weighted sum compared against the bias can split them into distinct groups. (footsteps thumping) The perceptron was capable of distinguishing between different shapes, like rectangles and triangles, or between different letters. And according to Rosenblatt, it could even tell the difference between cats and dogs. He said the machine was capable of what amounts to original thought, and the media lapped it up. The "New York Times" called the perceptron "the embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself, and be conscious of its existence."
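The training procedure described above (start from all-zero weights, add the inputs when the neuron misses a fire, subtract them when it fires wrongly, and repeat until every training image is right) can be sketched as follows. The bias is held fixed here because the transcript doesn't say how Rosenblatt handled it, and the four-pixel "images" are made up for illustration:

```python
def train_perceptron(examples, n_inputs, bias=0.0, max_epochs=100):
    """examples: list of (activations, should_fire) pairs. Returns learned weights."""
    weights = [0.0] * n_inputs  # start with every weight at zero
    for _ in range(max_epochs):
        mistakes = 0
        for activations, should_fire in examples:
            fired = sum(a * w for a, w in zip(activations, weights)) > bias
            if fired == should_fire:
                continue  # correct answer: leave the weights alone
            mistakes += 1
            # Missed a fire: add the inputs. Fired wrongly: subtract them.
            sign = 1 if should_fire else -1
            weights = [w + sign * a for w, a in zip(weights, activations)]
        if mistakes == 0:
            return weights  # every training example is now classified correctly
    return weights  # guaranteed to converge only if the classes are linearly separable


# Hypothetical 4-pixel images: fire for the first two, stay quiet for the last two.
examples = [
    ([1, 1, 0, 0], True),
    ([1, 0, 0, 0], True),
    ([0, 0, 1, 1], False),
    ([0, 0, 0, 1], False),
]
weights = train_perceptron(examples, n_inputs=4)
```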

- [Narrator] After training on lots of examples, it's given new faces it has never seen, and is able to successfully distinguish male from female. It has learned. - In reality, the perceptron was pretty limited in what it could do. It could not, in fact, tell dogs apart from cats. This and other critiques were raised in a book by MIT giants, Minsky and Papert, in 1969.

And that led to a bust period for artificial neural networks and AI in general. It's known as the first AI winter. Rosenblatt did not survive this winter.

He drowned while sailing in Chesapeake Bay on his 43rd birthday. (mellow upbeat music) - [Narrator] The NAV Lab is a road-worthy truck, modified so that researchers or computers can control the vehicle as occasion demands. - [Derek] In the 1980s, there was an AI resurgence when researchers at Carnegie Mellon created one of the first self-driving cars. The vehicle was steered by an artificial neural network called ALVINN.

It was similar to the perceptron, except it had a hidden layer of artificial neurons between the input and output. As input, ALVINN received 30 by 32-pixel images of the road ahead. Here, I'm showing them as 60 by 64 pixels. Each of these input neurons was connected via an adjustable weight to a hidden layer of four neurons. These were each connected to 32 output neurons.

So to go from one layer of the network to the next, you perform a matrix multiplication: the input activation times the weights. The output neuron with the greatest activation determines the steering angle. To train the neural net, a human drove the vehicle, providing the correct steering angle for a given input image. All the weights in the neural network were adjusted through the training so that ALVINN's output better matched that of the human driver. The method for adjusting the weights is called backpropagation, which I won't go into here, but Welch Labs has a great series on this, which I'll link to in the description.
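A forward pass through an ALVINN-shaped network is just two matrix multiplications. The sketch below uses the layer sizes quoted above (30×32 inputs, 4 hidden neurons, 32 outputs); the sigmoid squashing function, the omission of bias terms, the random untrained weights, and the mapping from the winning output neuron to a ±30° steering angle are all assumptions for illustration, not details from the video. It uses NumPy:

```python
import numpy as np

N_INPUT, N_HIDDEN, N_OUTPUT = 30 * 32, 4, 32  # layer sizes as described for ALVINN


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(image, w_hidden, w_output):
    """Each layer: matrix multiplication, then a squashing function (assumed sigmoid)."""
    x = image.reshape(-1)                # flatten the 30x32 pixels into a 960-vector
    hidden = sigmoid(w_hidden @ x)       # (4, 960) @ (960,) -> (4,)
    return sigmoid(w_output @ hidden)    # (32, 4) @ (4,) -> (32,)


def steering_angle(output, max_angle_deg=30.0):
    """Hypothetical mapping: first output unit = hard left, last = hard right."""
    best = int(np.argmax(output))        # the most active output neuron wins
    return -max_angle_deg + 2 * max_angle_deg * best / (len(output) - 1)


rng = np.random.default_rng(0)
w_hidden = rng.normal(scale=0.1, size=(N_HIDDEN, N_INPUT))  # random initial weights
w_output = rng.normal(scale=0.1, size=(N_OUTPUT, N_HIDDEN))
image = rng.random((30, 32))                                # stand-in camera frame
out = forward(image, w_hidden, w_output)
angle = steering_angle(out)
```

Training (backpropagation, not shown) would adjust w_hidden and w_output so that angle tracks the human driver; the cost of the two @ products per frame is what limited the vehicle's speed.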

Again, you can visualize the weights for the four hidden neurons as images. The weights are initially set to be random, but as training progresses, the computer learns to pick up on certain patterns. You can see the road markings emerge in the weights. Simultaneously, the output steering angle coalesces onto the human steering angle. The computer drove the vehicle at a top speed of around one or two kilometers per hour. It was limited by the speed at which the computer could perform matrix multiplication.

Despite these advances, artificial neural networks still struggled with seemingly simple tasks, like telling cats and dogs apart. And no one knew whether hardware or software was the weak link. I mean, did we have a good model of intelligence and just need more computer power? Or did we have the wrong idea about how to make intelligent systems altogether? So artificial intelligence experienced another lull in the 1990s.

By the mid-2000s, most AI researchers were focused on improving algorithms. But one researcher, Fei-Fei Li, thought maybe there was a different problem. Maybe these artificial neural networks just needed more data to train on. So she planned to map out the entire world of objects.

From 2006 to 2009, she created ImageNet, a database of 1.2 million human-labeled images, which at the time, was the largest labeled image dataset ever constructed. And from 2010 to 2017, ImageNet ran an annual contest: the ImageNet Large Scale Visual Recognition Challenge, where software programs competed to correctly detect and classify images. Images were classified into 1,000 different categories, including 90 different dog breeds. A neural network competing in this competition would have an output layer of 1,000 neurons, each corresponding to a category of object that could appear in the image.

If the image contains, say, a German shepherd, then the output neuron corresponding to German shepherd should have the highest activation. Unsurprisingly, it turned out to be a tough challenge. One way to judge the performance of an AI is to see how often the five highest neuron activations do not include the correct category. This is the so-called top-5 error rate. In 2010, the best performer had a top-5 error rate of 28.2%, meaning that more than a quarter of the time, the correct answer was not among its top five guesses.
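The top-5 error rate defined above can be computed directly from a network's output activations. A minimal sketch in plain Python; the function name and the toy scores below are made up for illustration:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of images whose true class is NOT among the k highest activations."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k most active output neurons for this image.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Two toy images over six classes: the first is a hit, the second a miss.
scores = [
    [0.1, 0.9, 0.3, 0.2, 0.05, 0.4],
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.0],
]
labels = [1, 5]
error = top_k_error(scores, labels)  # 0.5: one of the two images missed
```

With k=1 the same function gives the stricter top-1 error; with the 1,000 ImageNet categories each row would simply hold 1,000 activations.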

In 2011, the error rate of the best performer was 25.8%, a substantial improvement. But the next year, an artificial neural network from the University of Toronto, called AlexNet, blew away the competition with a top-5 error rate of just 16.4%. What set AlexNet apart was its size and depth.

The network consisted of eight layers, and in total, 500,000 neurons. To train AlexNet, 60 million weights and biases had to be carefully adjusted using the training database. Because of all the big matrix multiplications, processing a single image required 700 million individual math operations. So training was computationally intensive. The team managed it by pioneering the use of GPUs, graphics processing units, which are traditionally used to drive displays and are therefore specialized for fast parallel computations.

The AlexNet paper describing their research is a blockbuster. It's now been cited over 100,000 times, and it identifies the scale of the neural network as key to its success. It takes a lot of computation to train and run the network, but the improvement in performance is worth it. With others following their lead, the top-5 error rate on the ImageNet competition plummeted in the years that followed, down to 3.6% in 2015. That is better than human performance.

The neural network that achieved this had more than 100 layers of neurons. So the future is clear: We will see ever-increasing demand for ever-larger neural networks. And this is a problem for several reasons: One is energy consumption. Training a large neural network can require an amount of electricity similar to the yearly consumption of three households.

Another issue is the so-called Von Neumann Bottleneck. Virtually every modern digital computer stores data in memory, and then accesses it as needed over a bus. When performing the huge matrix multiplications required by deep neural networks, most of the time and energy goes into fetching those weight values rather than actually doing the computation. And finally, there are the limitations of Moore's Law. For decades, the number of transistors on a chip has been doubling approximately every two years, but now the size of a transistor is approaching the size of an atom.

So there are some fundamental physical challenges to further miniaturization. So this is the perfect storm for analog computers. Digital computers are reaching their limits. Meanwhile, neural networks are exploding in popularity, and a lot of what they do boils down to a single task: matrix multiplication. Best of all, neural networks don't need the precision of digital computers.

Whether the neural net is 96% or 98% confident the image contains a chicken doesn't really matter; it's still a chicken. So slight variability in components or conditions can be tolerated. (upbeat rock music) I went to an analog computing startup in Texas called Mythic AI. Here, they're creating analog chips to run neural networks.

And they demonstrated several AI algorithms for me. - Oh, there you go. See, it's getting you. (Derek laughs) Yeah. - That's fascinating.

- The biggest use case is augmented and virtual reality. If your friend is in a different place, say they're at their house and you're at your house, you can actually render each other in the virtual world. So it needs to really quickly capture your pose, and then render it in the VR world. - So, hang on, is this for the metaverse thing? - Yeah, this is a very metaverse application. This is depth estimation from just a single webcam. It's just taking this scene, and then it's doing a heat map.

So if it's bright, it means it's close. And if it's far away, it makes it black. - [Derek] Now all these algorithms can be run on digital computers, but here, the matrix multiplication is actually taking place in the analog domain. (light music) To make this possible, Mythic has repurposed digital flash storage cells.

Normally these are used as memory to store either a one or a zero. If you apply a large positive voltage to the control gate, electrons tunnel up through an insulating barrier and become trapped on the floating gate. Remove the voltage, and the electrons can remain on the floating gate for decades, preventing the cell from conducting current. And that's how you can store either a one or a zero. You can read out the stored value by applying a small voltage. If there are electrons on the floating gate, no current flows, so that's a zero.

If there aren't electrons, then current does flow, and that's a one. Now Mythic's idea is to use these cells not as on/off switches, but as variable resistors. They do this by putting a specific number of electrons on each floating gate, instead of all or nothing.

The greater the number of electrons, the higher the resistance of the channel. When you later apply a small voltage, the current that flows is equal to V over R. But you can also think of this as voltage times conductance, where conductance is just the reciprocal of resistance.

So a single flash cell can be used to multiply two values together, voltage times conductance. So to use this to run an artificial neural network, they first write all the weights to the flash cells as each cell's conductance. Then, they input the activation values as the voltage on the cells. And the resulting current is the product of voltage times conductance, which is activation times weight. The cells are wired together in such a way that the current from each multiplication adds together, completing the matrix multiplication.
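Here is an idealized model of the flash-cell array described above, written with NumPy: each cell contributes a current I = G·V (conductance times voltage), and cells sharing an output wire sum their currents, which is exactly a matrix-vector product. The ±1% per-cell variation in the second function is an assumed figure echoing the earlier rule of thumb, and the sketch ignores how real chips represent negative weights (a conductance can't be negative), which the video doesn't cover:

```python
import numpy as np


def crossbar_matvec(conductances, voltages):
    """Ideal analog matrix-vector product.

    conductances[i, j]: conductance of the cell linking input line j to output wire i
                        (programmed to hold weight W[i, j]).
    voltages[j]:        voltage on input line j (encodes activation x[j]).
    """
    cell_currents = conductances * voltages[np.newaxis, :]  # Ohm's law per cell: I = G * V
    return cell_currents.sum(axis=1)                        # currents on a shared wire add


def noisy_crossbar_matvec(conductances, voltages, tolerance=0.01, rng=None):
    """Same product, but each cell's conductance is off by up to +/- tolerance (assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    spread = 1 + rng.uniform(-tolerance, tolerance, size=conductances.shape)
    return crossbar_matvec(conductances * spread, voltages)
```

Because a network's decision usually depends on which output is largest rather than on its exact value, an error of this size is often tolerable, which is the chicken point made earlier.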

(light music) - So this is our first product. This can do 25 trillion math operations per second. - [Derek] 25 trillion. - Yep, 25 trillion math operations per second, in this little chip here, burning about three watts of power.

- [Derek] How does it compare to a digital chip? - The newer digital systems can do anywhere from 25 to 100 trillion operations per second, but they are big, thousand-dollar systems that are spitting out 50 to 100 watts of power. - [Derek] Obviously this isn't like an apples-to-apples comparison, right? - No, it's not apples to apples. I mean, training those algorithms, you need big hardware like this.
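Taking the figures quoted above at face value, the energy efficiency works out roughly as follows (operations per second divided by watts gives operations per joule). To bracket the digital side, this pairs the low end of the throughput range with the high end of the power range and vice versa, which is an assumption, since the quote gives only ranges. As the exchange itself notes, it is not an apples-to-apples comparison, just a ratio of the quoted numbers:

```latex
\frac{25\times 10^{12}\ \text{ops/s}}{3\ \text{W}} \approx 8.3\times 10^{12}\ \text{ops/J},
\qquad
\frac{25\times 10^{12}\ \text{ops/s}}{100\ \text{W}} = 0.25\times 10^{12}\ \text{ops/J}
\;\text{ to }\;
\frac{100\times 10^{12}\ \text{ops/s}}{50\ \text{W}} = 2\times 10^{12}\ \text{ops/J}
```

On these numbers the analog chip comes out roughly 4 to 33 times more energy-efficient.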

You can just do all sorts of stuff on the GPU, but if you specifically are doing AI workloads and you wanna deploy 'em, you could use this instead. You can imagine them in security cameras, autonomous systems, inspection equipment for manufacturing. Every time they make a Frito-Lay chip, they inspect it with a camera, and the bad Fritos get blown off of the conveyor belt.

But they're using artificial intelligence to spot which Fritos are good and bad. - Some have proposed using analog circuitry in smart home speakers, solely to listen for the wake word, like Alexa or Siri. They would use a lot less power and be able to quickly and reliably turn on the digital circuitry of the device.

But you still have to deal with the challenges of analog. - So for one of the popular networks, there would be 50 sequences of matrix multiplies that you're doing. Now, if you did that entirely in the analog domain, by the time it gets to the output, it's just so distorted that you don't have any result at all. So you convert it from the analog domain, back to the digital domain, send it to the next processing block, and then you convert it into the analog domain again.
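The fix described above, converting back to digital between blocks, works because a digital code can be restored exactly as long as the error picked up in one analog stage is smaller than half the spacing between codes. A toy model of that idea, assuming a hypothetical 8-bit converter, a hypothetical analog stage that should simply pass its input through, and an error bounded by ±0.4 of a code step; real converters have their own errors and cost time and energy:

```python
import random

LEVELS = 256               # hypothetical 8-bit converter
STEP = 1.0 / (LEVELS - 1)  # spacing between digital codes on a 0..1 scale


def quantize(value):
    """Analog-to-digital conversion: snap to the nearest of LEVELS evenly spaced codes."""
    code = round(min(max(value, 0.0), 1.0) / STEP)
    return code * STEP


def analog_stage(value, rng):
    """Hypothetical analog block: ideally passes the value through, but adds a small error."""
    return value + rng.uniform(-0.4 * STEP, 0.4 * STEP)


def run_pipeline(x, n_stages, redigitize, rng):
    for _ in range(n_stages):
        x = analog_stage(x, rng)
        if redigitize:
            x = quantize(x)  # convert back to digital before the next block
    return x


x0 = quantize(0.5)
analog_only = run_pipeline(x0, 50, redigitize=False, rng=random.Random(1))
with_adc = run_pipeline(x0, 50, redigitize=True, rng=random.Random(1))
```

With re-digitizing, with_adc comes back exactly equal to x0 after 50 stages, while analog_only has drifted by a random-walk amount that keeps growing with the number of stages.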

And that allows you to preserve the signal. - You know, when Rosenblatt was first setting up his perceptron, he used a digital IBM computer. Finding it too slow, he built a custom analog computer, complete with variable resistors and little motors to drive them. Ultimately, his idea of neural networks turned out to be right.

Maybe he was right about analog, too. Now, I can't say whether analog computers will take off the way digital did last century, but they do seem to be better suited to a lot of the tasks that we want computers to perform today, which is a little bit funny because I always thought of digital as the optimal way of processing information. Everything from music to pictures to video has gone digital in the last 50 years. But maybe in 100 years, we will look back on digital not as the end point of information technology, but as a starting point.

Our brains are digital in that a neuron either fires or it doesn't, but they're also analog in that thinking takes place everywhere, all at once. So maybe what we need to achieve true artificial intelligence, machines that think like us, is the power of analog. (gentle music) Hey, I learned a lot while making this video, much of it by playing with an actual analog computer. You know, trying things out for yourself is really the best way to learn, and you can do that with this video's sponsor, Brilliant. Brilliant is a website and app that gets you thinking deeply by engaging you in problem-solving. They have a great course on neural networks, where you can test how it works for yourself.

It gives you an excellent intuition about how neural networks can recognize numbers and shapes, and it also allows you to experience the importance of good training data and hidden layers to understand why more sophisticated neural networks work better. What I love about Brilliant is it tests your knowledge as you go. The lessons are highly interactive, and they get progressively harder as you go on. And if you get stuck, there are always helpful hints. For viewers of this video, Brilliant is offering the first 200 people 20% off an annual premium subscription.

Just go to brilliant.org/veritasium. I will put that link down in the description. So I wanna thank Brilliant for supporting Veritasium, and I wanna thank you for watching.

2022-03-03
