WATCH LIVE : TESLA Autonomy Day Event
Hi, everyone. I'm sorry for being late. Welcome to our very first analyst day for autonomy. I really hope that this is something we can do a little bit more regularly now, to keep you posted about the development we're doing with regards to autonomous driving.

About three months ago we were getting prepped for our Q4 earnings call with Elon and quite a few other executives, and one of the things that I told the group is that, from all the conversations that I keep having with investors on a regular basis, the biggest gap that I see between what I see inside the company and what the outside perception is, is our ability in autonomous driving. And it kind of makes sense, because for the past couple of years we've been really talking about the Model 3 ramp, and a lot of the debate has revolved around Model 3. But in reality a lot of things have been happening in the background. We've been working on the new full self-driving chip, we've had a complete overhaul of our neural net for vision recognition, et cetera. So now that we've finally started to produce our full self-driving computer, we thought it was a good idea to just open the veil, invite everyone in, and talk about everything that we've been doing for the past two years.

About three years ago we wanted to find the best possible chip for full autonomy, and we found out that there's no chip that's been designed from the ground up for neural nets. So we invited my colleague Pete Bannon, the VP of silicon engineering, to design such a chip for us. He's got about 35 years of experience building and designing chips. About 12 of those years were for a company called P.A. Semi, which was later acquired by Apple. So he's worked on dozens of different architectures and designs, and he was the lead designer, I think, for the Apple iPhone 5, just before joining Tesla. And he's going to be joined on stage by Elon Musk. Thank you.

Actually, I was
going to introduce Pete, but that's been done. He's just the best chip and system architect that I know in the world, and it's an honor to have you and your team at Tesla. Take it away; just tell them about the incredible work that you and your team have done.

Thanks, Elon. It's a pleasure to be here this morning, and a real treat to tell you about all the work that my colleagues and I have been doing here at Tesla for the last three years. I'll tell you a little bit about how the whole thing got started, and then I'll introduce you to the full self-driving computer and tell you a little bit about how it works. We'll dive into the chip itself and go through some of those details. I'll describe how the custom neural network accelerator that we designed works, and then I'll show you some results, and hopefully you'll all still be awake by then.

I was hired in February of 2016. I asked Elon if he was willing to spend all the money it takes to do full custom system design, and he said, "Well, are we going to win?" And I said, "Well, yeah, of course." He said, "I'm in." And so that got us started. We hired a bunch of people and started thinking about what a custom-designed chip for full autonomy would look like. We spent eighteen months doing the design, and in August of 2017 we released the design for manufacturing. We got it back in December; it powered up and actually worked very, very well on the first try. We made a few changes and released a B0 rev in April of 2018. In July of 2018 the chip was qualified and we started full production of production-quality parts. In December of 2018 we had the autonomous driving stack running on the new hardware, and we were able to start retrofitting employee cars and testing the hardware and software out in the real world. Just
last March we started shipping the new computer in the Model S and X, and earlier in April we started production in the Model 3. So this whole program, from the hiring of the first few employees to having it in full production in all three of our cars, is just a little over three years, and it's probably the fastest system development program I've ever been associated with. It really speaks a lot to the advantages of having a tremendous amount of vertical integration, to allow you to do concurrent engineering and speed up deployment.

In terms of goals, we were focused exclusively on Tesla requirements, and that makes life a lot easier. If you have one and only one customer, you don't have to worry about anything else. One of those goals was to keep the power under 100 watts so we could retrofit the new machine into the existing cars. We also wanted a lower part cost so we could enable full redundancy for safety. At the time we had a thumb-in-the-wind estimate that it would take at least 50 trillion operations a second of neural network performance to drive a car, and so we wanted to get at least that much, and really as much as we possibly could.

Batch size is how many items you operate on at the same time. So, for example, Google's TPU 1 has a batch size of 256, and you have to wait around until you have 256 things to process before you can get started. We didn't want to do that, so we designed our machine with a batch size of one, so as soon as an image shows up we process it immediately to minimize latency, which maximizes safety.

We needed a GPU to run some post-processing. At the time we were doing quite a lot of that, but we speculated that over time the amount of post-processing on the GPU would decline as the neural networks got better and better, and that has actually come to pass. So we took a risk by putting a fairly modest GPU in the design, as you'll see, and that turned out to be a good bet.

Security:
super important. If you don't have a secure car you can't have a safe car, so there's a lot of focus on security, and then of course safety.

In terms of actually doing the chip design, as Elon alluded to earlier, there was really no ground-up neural network accelerator in existence in 2016. Everybody out there was adding instructions to their CPU or GPU or DSP to make it better for inference, but nobody was really doing it natively. So we set out to do that ourselves, and then for other components on the chip we purchased industry-standard IP for CPUs and GPUs. That allowed us to minimize the design time and also the risk to the program.

Another thing that was a little unexpected when I first arrived was our ability to leverage existing teams at Tesla. Tesla had wonderful power supply design, signal integrity analysis, package design, system software, firmware, and board design teams, and a really good system validation program, that we were able to take advantage of to accelerate this program.

Here's what it looks like. Over there on the right you see all the connectors for the video that comes in from all the cameras that are in the car. You can see the two self-driving computers in the middle of the board, and then on the left is the power supply and some control connections. I really love it when a solution is boiled down to its barest elements: you have video, computing, and power, and it's straightforward and simple.

Here's the original Hardware 2.5 enclosure that the computer went into and that we've
been shipping for the last two years. Here's the new design for the FSD computer. It's basically the same, and that of course is driven by the constraints of having a retrofit program for the cars. I'd like to point out that this is actually a pretty small computer. It fits behind the glove box, between the glove box and the firewall in the car. It does not take up half your trunk.

As I said earlier, there are two fully independent computers on the board. You can see them highlighted in blue and green. To either side of the large SoC you can see the DRAM chips that we use for storage, and then below left you see the flash chips that represent the file system. So these are two independent computers that boot up and run their own operating systems.

Yeah, if I can add something: the general principle here is that any part of this could fail and the car will keep driving. So you could have cameras fail, you could have power circuits fail, you could have one of the Tesla full self-driving computer chips fail; the car keeps driving. The probability of this computer failing is substantially lower than somebody losing consciousness. That's the key metric, by at least an order of magnitude.

Yep. So one additional thing we do to keep the machine going is to have redundant power supplies in the car. One machine is running on one power supply and the other one's on the other. The cameras are the same, so half of the cameras run on the blue power supply and the other half run on the green power supply, and both chips receive all of the video and process it independently.

So in terms of driving the car, the basic sequence is: collect lots of information from the world around you. Not only do we have cameras, we also have radar, GPS, maps, the IMUs, ultrasonic
sensors around the car. We have wheel ticks and steering angle, and we know what the acceleration and deceleration of the car is supposed to be. All of that gets integrated together to form a plan. Once we have a plan, the two machines exchange their independent versions of the plan to make sure they're the same, and assuming that they agree, we then act and drive the car. Now, once you've driven the car with some new control, you of course want to validate it. So we validate that what we transmitted was what we intended to transmit to the other actuators in the car, and then you can use the sensor suite to make sure that it happens. So if you ask the car to accelerate or brake or steer right or left, you can look at the accelerometers and make sure that you are in fact doing that. So there's a tremendous amount of redundancy and overlap in both our data acquisition and our data monitoring capabilities here.

Moving on to talk about the full self-driving chip a little bit. It's packaged in a 37.5 millimeter BGA with 1,600 balls. Most of those are used for power and ground, but plenty are for signal as well. If you take the lid off, it looks like this: you can see the package substrate and you can see the die sitting in the center there. If you take the die off and flip it over, it looks like this. There are 13,000
C4 bumps scattered across the top of the die, and then underneath that are twelve metal layers, which obscure all the details of the design. So if you strip that off, it looks like this. This is a 14 nanometer FinFET CMOS process. It's 260 square millimeters in size, which is a modest-size die. For comparison, a typical cell phone chip is about 100 square millimeters, so we're quite a bit bigger than that, but a high-end GPU would be more like 600 to 800 square millimeters. So we're sort of in the middle. I would call it the sweet spot; it's a comfortable size to build. There are 250 million logic gates on there and a total of six billion transistors, which, even though I work on this all the time, is mind-boggling to me. The chip is manufactured and tested to AEC-Q100 standards, which is a standard automotive criterion.

Next I'd like to just walk around the chip and explain all the different pieces of it, and I'm going to go in the order that a pixel coming in from the camera would visit all the different pieces. Up there in the top left you can see the camera serial interface. We can ingest 2.5 billion pixels per second, which is more than enough to cover all the sensors that we know about. We have an on-chip network that distributes data from the memory system, so the pixels would travel across the network to the memory controllers on the right and left edges of the chip. We use industry-standard LPDDR4 memory running at 4,266 gigabits per second, which gives us a peak bandwidth of 68 gigabytes a second. That's a pretty healthy bandwidth, but again it's not ridiculous, so we're trying to stay in the comfortable sweet spot for cost reasons.

The image signal processor has
a 24-bit internal pipeline that allows us to take full advantage of the HDR sensors that we have around the car. It does advanced tone mapping, which helps to bring out details in shadows, and it has advanced noise reduction, which just improves the overall quality of the images that we're using in the neural network.

The neural network accelerator itself: there are two of them on the chip. They each have 32 megabytes of SRAM to hold temporary results and minimize the amount of data that we have to transmit on and off the chip, which helps reduce power. Each has a 96 by 96 multiply-add array with in-place accumulation, which allows us to do 10,000 multiply-adds per cycle. There's dedicated ReLU hardware and dedicated pooling hardware, and each one delivers 36 trillion operations per second, operating at 2 gigahertz. The two of them together on a die deliver 72 trillion operations a second, so we exceeded our goal of 50 TOPS by a fair bit.

There's also a video encoder. We encode video and use it in a variety of places in the car, including the backup camera display; there's optionally a user feature for dash cam, and also clip logging of data to the cloud, which Stuart and Andrej will talk about more later. There's a GPU on the chip. It has modest performance, with support for both 32-bit and 16-bit floating point. And then we have twelve A72 64-bit CPUs for general-purpose processing. They operate at 2.2 gigahertz, and this represents about two and a half times the performance available in the current solution.

There's a safety system that contains two CPUs that operate in lockstep. This system is the final arbiter of whether it's safe to actually drive the actuators in the car. So this is where the two plans come together and we decide whether it's safe or not to move forward.
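
As a rough cross-check of the peak figures quoted in this walk-through, the arithmetic can be sketched in a few lines of Python. Counting one multiply-add as two operations, a 128-bit memory interface, and reading the LPDDR4 rate as 4,266 megatransfers per second per pin are all assumptions that are not stated in the talk, and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the peak numbers quoted in the talk.
# Assumptions (not stated by the speaker): one multiply-add counts as two
# operations, the LPDDR4 interface is 128 bits wide, and the memory runs at
# 4,266 megatransfers per second per pin.

def accelerator_tops(rows=96, cols=96, clock_hz=2.0e9, ops_per_mac=2):
    """Peak tera-operations per second of one multiply-add array."""
    return rows * cols * ops_per_mac * clock_hz / 1e12

def dram_bandwidth_gb_s(transfers_per_s=4266e6, bus_bits=128):
    """Peak DRAM bandwidth in gigabytes per second."""
    return transfers_per_s * bus_bits / 8 / 1e9

if __name__ == "__main__":
    per_accel = accelerator_tops()
    print(f"one accelerator: {per_accel:.1f} TOPS")
    print(f"two per chip:    {2 * per_accel:.1f} TOPS")
    print(f"DRAM bandwidth:  {dram_bandwidth_gb_s():.1f} GB/s")
```

Under these assumptions the results land at roughly 36.9 TOPS per accelerator, about 73.7 TOPS per chip, and about 68.3 GB/s, in line with the rounded figures of 36, 72 and 68 given on stage.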
And lastly, there's a security system, and basically the job of the security system is to ensure that this chip only runs software that's been cryptographically signed by Tesla. If it's not been signed by Tesla, then the chip does not operate.

Now, I've told you a lot of different performance numbers, and I thought it'd be helpful to put them into perspective a little bit. Throughout this talk I'm going to talk about a neural network from our narrow camera. It uses 35 billion operations, 35 GOPs. If we used all 12 CPUs to process that network, we could do one and a half frames per second, which is super slow and not nearly adequate to drive the car. If we used the 600 gigaflop GPU on the same network, we'd get 17 frames per second, which is still not good enough to drive the car with eight cameras. The neural network accelerators on the chip can deliver 2,100 frames per second, and you can see from the scaling as we moved along that the amount of computing in the CPU and GPU is basically insignificant compared to what's available in the neural network accelerator. It really is night and day.
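
The frame-rate comparison can be approximated by dividing a processor's peak throughput by the 35 billion operations the network needs per frame. The sketch below assumes perfect utilization and treats floating-point and integer operations as interchangeable, neither of which the talk claims; the 72 trillion operations per second figure is the accelerator total quoted elsewhere in the talk, and the helper names are hypothetical.

```python
# Upper-bound frame rates for the narrow-camera network, assuming every
# processor runs at its peak rate with perfect utilization (an assumption).

NETWORK_OPS = 35e9  # operations per frame, as quoted in the talk

def peak_fps(ops_per_second, ops_per_frame=NETWORK_OPS):
    """Frames per second if all peak throughput went into this network."""
    return ops_per_second / ops_per_frame

def implied_throughput(fps, ops_per_frame=NETWORK_OPS):
    """Throughput (operations per second) implied by a measured frame rate."""
    return fps * ops_per_frame

if __name__ == "__main__":
    print(f"GPU at 600 GFLOPS:       {peak_fps(600e9):.1f} fps")
    print(f"accelerators at 72 TOPS: {peak_fps(72e12):.0f} fps")
    print(f"CPUs at 1.5 fps imply    {implied_throughput(1.5) / 1e9:.1f} GOPS")
```

The GPU estimate comes out near the 17 frames per second quoted on stage, and the accelerator estimate comes out at roughly two thousand frames per second, about two orders of magnitude higher.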
Moving on to talk about the neural network accelerator. We're just going to stop for some water.

On the left there's a cartoon of a neural network, just to give you an idea of what's going on. The data comes in at the top and visits each of the boxes, and the data flows along the arrows to the different boxes. The boxes are typically convolutions or deconvolutions with ReLUs. The green boxes are pooling layers. The important thing about this is that the data produced by one box is then consumed by the next box, and then you don't need it anymore; you can throw it away. So all of that temporary data that gets created and destroyed as you flow through the network, there's no need to store that off chip in DRAM. So we keep all that data in SRAM, and I'll explain why that's super important in a few minutes.

If you look over on the right side of this, you can see that in this network of 35 billion operations, almost all of them are convolution, which is based on dot products. The rest are deconvolution, also based on dot products, and then ReLU and pooling, which are relatively simple operations. So if you were designing some hardware, you'd clearly target doing dot products, which are based on multiply-add, and really kill that. But imagine that you sped it up by a factor of 10,000. So 100 percent all of a sudden turns into 0.1 percent, 0.01 percent, and suddenly the ReLU and pooling operations are going to be quite significant. So our hardware design includes dedicated resources for processing ReLU and pooling as well.

Now, this chip is operating in a thermally constrained environment, so we had to be very careful about how we burn that power. We want to maximize the amount of arithmetic we can do. So we picked integer add; it's nine times less energy than a corresponding floating-point add.
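
The point about ReLU and pooling becoming significant is an instance of Amdahl's law, and a minimal sketch makes it concrete. The 99.7 percent and 0.3 percent split below is a hypothetical illustration, not a figure from the talk, and the function name is invented for this example.

```python
# Amdahl's-law illustration: accelerate only the dot-product work and see
# how much of the remaining run time the "simple" operations occupy.
# The 99.7% / 0.3% split is a hypothetical example, not data from the talk.

def shares_after_speedup(dot_share, other_share, speedup):
    """Return (dot fraction, other fraction, overall speedup) after
    accelerating only the dot-product portion by `speedup`."""
    dot_time = dot_share / speedup
    total = dot_time + other_share
    overall = (dot_share + other_share) / total
    return dot_time / total, other_share / total, overall

if __name__ == "__main__":
    dot, other, overall = shares_after_speedup(0.997, 0.003, 10_000)
    print(f"dot products now take {dot:.1%} of the time")
    print(f"ReLU and pooling now take {other:.1%} of the time")
    print(f"overall speedup is only {overall:.0f}x")
```

With that hypothetical split, the untouched ReLU and pooling work ends up dominating the run time and caps the overall gain at a few hundred times, which is the motivation for dedicated hardware for those operations.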
We picked 8-bit by 8-bit integer multiply, which is significantly less power than other multiply operations and is probably enough accuracy to get good results. In terms of memory, we chose to use SRAM as much as possible, and you can see there that going off chip to DRAM is approximately a hundred times more expensive in terms of energy consumption than using local SRAM. So clearly we want to use the local SRAM as much as possible.

In terms of control, this is data that was published in a paper by Mark Horowitz at ISSCC, where he sort of critiqued how much power it takes to execute a single instruction on a regular integer
CPU, and you can see that the add operation is only 0.15 percent of the total power. All the rest of the power is control overhead and bookkeeping. So in our design we set out to basically get rid of all that as much as possible, because what we're really interested in is arithmetic.

So here's the design that we finished. You can see that it's dominated by the 32 megabytes of SRAM. There are big banks on the left and right and in the center bottom, and then all the computing is done in the upper middle. Every single clock we read 256 bytes of activation data out of the SRAM array and 128 bytes of weight data out of the SRAM array, and we combine them in a 96 by 96 mul-add array, which performs 9,000 multiply-adds per clock. At 2 gigahertz that's a total of 36.8 TOPS.

Now, when we're done with a dot product we unload the engine, so we shift the data out across the dedicated ReLU unit, optionally across a pooling unit, and then finally into a write buffer where all the results get aggregated up, and then we write out 128 bytes per cycle back into the SRAM. And this whole thing cycles along all the time, continuously. So we're doing dot products while we're unloading previous results, doing pooling, and writing back into the memory. If you add it all up, at 2 gigahertz you need one terabyte per second of SRAM bandwidth to support all that work, and so the hardware supplies that. So one terabyte per second of bandwidth per engine; there are two on the chip, so two terabytes per second.

The accelerator has a relatively small instruction set. We have a DMA read operation to bring data in from memory, and we have a DMA write operation to push results back out to memory. We have three dot-product-based instructions (convolution, deconvolution, inner product) and then two relatively simple ones: scale is a one-input, one-output operation, and eltwise is two inputs and one output. And then, of course, stop when you're done.
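
The one terabyte per second figure follows directly from the per-cycle traffic described above. The sketch below adds up the quoted byte counts and, as an extra illustration that is not in the talk, computes how many multiply-adds the array performs per byte read from SRAM. Decimal terabytes and the helper names are assumptions.

```python
# SRAM traffic per accelerator, using the per-cycle byte counts quoted in
# the talk. Decimal units (1 TB = 1e12 bytes) are an assumption.

ACTIVATION_READ_BYTES = 256  # activation data read per clock
WEIGHT_READ_BYTES = 128      # weight data read per clock
RESULT_WRITE_BYTES = 128     # results written back per clock
CLOCK_HZ = 2.0e9             # 2 GHz

def sram_bandwidth_tb_s(clock_hz=CLOCK_HZ):
    """Total SRAM read plus write bandwidth in terabytes per second."""
    bytes_per_cycle = (ACTIVATION_READ_BYTES + WEIGHT_READ_BYTES
                       + RESULT_WRITE_BYTES)
    return bytes_per_cycle * clock_hz / 1e12

def macs_per_byte_read(rows=96, cols=96):
    """Multiply-adds performed per byte read from SRAM each clock."""
    return rows * cols / (ACTIVATION_READ_BYTES + WEIGHT_READ_BYTES)

if __name__ == "__main__":
    print(f"SRAM bandwidth per engine: {sram_bandwidth_tb_s():.3f} TB/s")
    print(f"multiply-adds per byte read: {macs_per_byte_read():.0f}")
```

The 512 bytes moved each clock at 2 gigahertz give about 1.02 terabytes per second per engine. The 96 by 96 array holds 9,216 multiply-add units, so each byte read from SRAM feeds 24 multiply-adds in the same clock.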
We had to develop a neural network compiler for this. We take the neural network that's been trained by our vision team, as it would be deployed in the older cars, and we take that and compile it for use on the new accelerator. The compiler does layer fusion, which allows us to maximize the computing each time we read data out of the SRAM and put it back. It also does some smoothing so that the demands on the memory system aren't too lumpy. And then we also do channel padding to reduce bank conflicts, and we do bank-aware SRAM allocation. This is a case where we could have put more hardware in the design to handle bank conflicts, but by pushing it into software we save hardware and power at the cost of some software complexity. We also automatically insert DMAs into the graph so that data arrives just in time for computing without having to stall the machine. And then at the end we generate all the code, we generate all the weight data, we compress it, and we add a CRC checksum for reliability.

To run a program, all the neural network descriptions, our programs, are loaded into SRAM at the start, and then they sit there ready to go all the time. So to run a network you have to program the address of the input buffer, which presumably is a new image that just arrived from a camera. You set the output buffer address, you set the pointer to the network weights, and then you set go. And then the machine goes off and will sequence through the entire neural network all by itself, usually running for a million or two million cycles, and then when it's done you get an interrupt and can post-process the results.

So, moving on to results. We had a goal to stay under 100 watts. This is measured data from cars driving around running the full Autopilot stack, and we're dissipating 72 watts, which is a little bit more power than
the previous design, but with the dramatic improvement in performance it's still a pretty good answer. Of
that 72 watts, about 15 watts is being consumed running the neural networks.

In terms of cost, the silicon cost of this solution is about 80 percent of what we were paying before, so we are saving money by switching to this solution. And in terms of performance, we took the narrow camera neural network which I've been talking about, that has 35 billion operations in it, and we ran it on the old hardware in a loop as quick as possible, and we delivered 110 frames per second. We took the same data, the same network, compiled it for the new FSD computer, and using all four accelerators we can get 2,300 frames per second processed. So a factor of 21. I think this is perhaps the most significant slide. It's night and day. I've never worked on a project where the performance increase was more than three, so this was pretty fun.

If you compare it to, say, Nvidia's Drive Xavier solution, a single chip delivers 21 TOPS. Our full self-driving computer with two chips is 144 TOPS.

So to conclude, I think we've created a design that delivers outstanding performance, 144 TOPS for neural network processing. It has outstanding power performance; we managed to jam all of that performance into the thermal budget that we had. It enables a fully redundant computing solution. It has a modest cost. And really the important thing is that this FSD computer will enable a new level of safety and autonomy in Tesla's vehicles without impacting their cost or range, something that I think we're all looking forward to.

Yeah, I think we'll do Q&A after each segment, so if people have questions about the hardware they can ask right now. The reason I asked Pete to do a detailed, far more detailed than perhaps most people would appreciate, dive into the Tesla full self-driving computer is because at first it seems improbable. How could it be that Tesla, who has never designed a chip before, would
design the best chip in the world? But that is objectively what has occurred. Not best by a small margin, best by a huge margin. It's in the cars right now. All Teslas being produced right now have this computer. We switched over from the Nvidia solution for S and X about a month ago, and we switched over Model 3 about 10 days ago. All cars being produced have all the hardware necessary, compute and otherwise, for full self-driving. I'll say that again: all Tesla cars being produced right now have everything necessary for full self-driving. All you need to do is improve the software. And later today you will drive the cars with the development version of the improved software, and you will see for yourselves. Questions?

Trip Chowdhry, Global Equities Research. Very, very impressive in every shape and form. I was wondering, I took some notes: you are using the activation function ReLU, the rectified linear unit. But if we think about deep neural networks, they have multiple layers, and some algorithms may use different activation functions for different hidden layers, like softmax or tanh. Do you have flexibility for incorporating different activation functions rather than ReLU in your platform? Then I have a follow-up.

Yes, we do. We have implementations of tanh and sigmoid, for example.

Beautiful. One last question: on the nanometers, you mentioned 14 nanometers. I was wondering, wouldn't it make sense to go a little lower, maybe 10 nanometers two years down, or maybe seven?

At the time we started the design, not all the IP that we wanted to purchase was available in 10 nanometer. We finished the design in 14.
It's maybe worth pointing out that we finished this design maybe one and a half to two years ago and began design of the next generation. We're not talking about the next generation today, but we're about halfway through it. All the things that are obvious for a next-generation chip, we're doing.

Hi. You talked about the software as the piece now. You did a great job; I was blown away. I understood ten percent of what you said, but I trust that it's in good hands.

Thanks.

So it feels like you got the hardware piece done, and that was really hard to do, and now you have to do the software piece. Now, maybe that's outside of your expertise, but how should we think about that software piece?

I couldn't ask for a better introduction to Andrej and Stuart, I think. Are there any fundamental questions for the chip part before the next part of the presentation, which is neural nets and software?

So maybe on the chip side: the last slide was 144 trillion operations per second versus, was it, Nvidia's 21?

That's right.

And maybe can you just contextualize that for a finance person, why that gap is so significant? Thank you.

Well, I mean, it's a factor of seven in performance delta, so that means you can do seven times as many frames, or you can run neural networks that are seven times larger and more sophisticated. So it's a very big currency that you can spend on lots of interesting things to make the car better.

I think that Xavier's power usage is higher than ours.

Xavier's power, I don't know that.

To the best of my knowledge, the power requirements would increase at least to the same degree, a factor of seven, and costs would also increase by a factor of seven. So, yeah, power is a real problem because it also reduces range. So the penalty for power is very high, and then you have to get rid of that power. The thermal problem becomes really significant because
you have to get rid of all that power.

Thank you very much. I think we have a lot of questions. If you guys don't mind the day running a bit long, we're going to do the drive demos afterwards. So if anybody needs to pop out and do drive demos a little sooner, you're welcome to do that. I do want to make sure we answer your questions.

Pradeep Romani from UBS. Intel and AMD, to some extent, have started moving towards a chiplet-based architecture. I did not notice a chiplet-based design here. Do you think that, looking forward, that would be something that might be of interest to you guys from an architecture standpoint?

A chiplet-based architecture? We're not currently considering anything like that. I think that's mostly useful when you need to use different styles of technology. So if you want to integrate silicon germanium or DRAM technology on the same silicon substrate, that gets pretty interesting. But until the die size gets obnoxious, I wouldn't go there.

To be clear, the strategy here, and this started a little over three years ago, is: design and build a computer that is fully optimized and aiming for full self-driving, then write software that is designed to work specifically on that computer and get the most out of that computer. So you have tailored hardware that is a master of one trade, self-driving. Nvidia is a great company, but they have many customers, and so as they apply their resources they need to do a generalized solution. We care about one thing: self-driving. So it was designed to do that incredibly well. The software is also designed to run on that hardware incredibly well. And the combination of the software and the hardware, I think, is unbeatable.

The chip is designed to process video input.
In case you use, let's say, lidar, would it be able to process that as well, or is it primarily for video?

What we're going to explain to you today is that lidar is a fool's errand, and anyone relying on lidar is doomed. Doomed. Expensive sensors that are unnecessary. It's like having a whole bunch of expensive appendices. One appendix is bad; well, now you put in a whole bunch of them? That's ridiculous. You'll see.

So, just two questions. On the power consumption, is there a way to maybe give us a rule of thumb, like every watt reduces range by a certain percent or a certain amount, just so we can get a sense of how much?
For a Model 3, the target consumption is 250 watt-hours per mile. It depends on the nature of the driving as to how many miles that affects. In the city it would have a much bigger effect than on the highway. So, you know, if you're driving for an hour in a city and you had a solution, hypothetically, that was a kilowatt, you'd lose four miles on a Model 3. So if you're only going, say, 12 miles an hour, then that would be a 25 percent impact on range in the city. Basically, the power of the system has a massive impact on city range, which is where we think most of the robotaxi market will be. So power is extremely important.

I'm sorry, I didn't hear.

Thank you. What's the primary design objective of the next-generation chip?

We don't want to talk too much about the next-generation chip, but it'll be at least, let's say, three times better than the current system. That's about two years away.

Is the chip being made... you don't manufacture the chip, you contract that out? And how much cost reduction does that save in the overall vehicle cost?

The 20 percent cost reduction I cited was the piece cost per vehicle reduction. That wasn't a development cost; it was just the actual...

Yeah, I'm saying, if I'm manufacturing these in mass, is this saving money, doing it yourself?

Yes, a little bit. I mean, most people don't make chips with their own fab. That's pretty unusual.

You don't see any supply issues with getting the chip mass-produced?

The cost saving pays for the development. I mean, the basic strategy going to Elon was: we're going to build this chip, it's going to reduce the cost. And Elon said, "Times a million cars a year? Deal."

That's correct, yes.

Sorry, if there are really chip-specific questions we can answer them; otherwise there will be a Q&A opportunity
After Andrej talks and after Stuart talks there will be two other Q&A opportunities; this one is very chip-specific.
I'll be here all afternoon.
Yeah, exactly, he will be here at the end as well.
Thanks. That die photo you had: the neural processor takes up quite a bit of the die. I'm curious, is that your own design, or is there some external IP there?
Yes, that was a custom design by Tesla.
And then I guess the follow-on would be: there's probably a fair amount of opportunity to reduce that footprint as you tweak the design?
It's actually quite dense, so in terms of reducing it, I don't think so. We will greatly enhance the functional capabilities in the next generation.
Okay. And then last question: can you share where you're fabbing this part?
Where are we fabbing it? Oh, Samsung. Yes.
Thank you.
[inaudible] Just curious how defensible your chip technology and design is from an IP point of view, and hoping that you won't be offering a lot of the IP to the outside for free. Thanks.
We have filed on the order of a dozen patents on this technology. Fundamentally it's linear algebra, which I don't think you can patent. Not sure.
I think if somebody started today and they were really good, they might have something like what we have right now in three years. But in two years we'll have something three times better.
Talking about intellectual property protection: you have the best intellectual property, and some people just steal it for the fun of it. I was wondering, if we look at a few interactions with Aurora, that company the industry believes stole your intellectual property. I think the key ingredient that you need to protect is the weights that are associated with the various parameters. Do you think your chip can do something to prevent anybody...
Maybe encrypt all the weights, so that even you don't know what the weights are at the chip level, so that your intellectual property remains inside it and nobody knows about it and nobody can just steal it?
I'd like to meet the person who could do that, because we'd hire them in a heartbeat.
Yeah, it's a really hard problem.
Yeah. I mean, we do encrypt the... it's a hard chip to crack, so if they can crack it, they're very good. If they can crack it and then also figure out the software and the neural net system and everything else, they can design it from scratch. It's our intention to prevent people from stealing all that stuff. I mean, if they do, we hope it at least takes a long time.
It will definitely take them a long time, yeah. I mean, I thought, if it was our goal to do that, how would we do it? It'd be very difficult.
But the thing that I think is a very powerful, sustainable advantage for us is the fleet. Nobody has the fleet. Those weights are constantly being updated and improved based on billions of miles driven. Tesla has a hundred times more cars with the full self-driving hardware than everyone else combined. You know, by the end of this quarter we'll have 500,000 cars with the full eight-camera setup and twelve ultrasonics. Some will still be on Hardware 2, but we'll still have the data-gathering ability. And then by a year from now we'll have over a million cars with the full self-driving computer, hardware, everything.
Yeah. It's just a massive data advantage. It's similar to how the Google search engine has a massive advantage because people use it, and people are effectively programming Google with the queries and the results.
Just to press you on that, and please reframe the question if it's appropriate, but, you know, when we talk to Waymo or Nvidia, they do speak with equivalent conviction about their leadership because of their competence in simulating miles driven. Can
you talk about the advantage of having real-world miles versus simulated miles? Because I think they expressed that, you know, by the time you get a million miles they can simulate a billion, and no Formula One race car driver, for example, could ever successfully complete a real-world track without driving in a simulator. Can you talk about the advantages, it sounds like, that you perceive to have associated with having data ingestion coming from real-world miles versus simulated miles?
Absolutely. The simulator... we have quite a good simulation too, but it just does not capture the long tail of weird things that happen in the real world. If the simulation fully captured the real world, well, I mean, that would be proof that we're living in a simulation, I think.
Yeah, it doesn't. I wish.
But simulations do not capture the real world. They don't. The real world's really weird and messy. You need the cars on the road. And we actually get into that in Andrej and Stuart's presentations.
Yeah. So, okay, why don't we move on to Andrej.
The last question was actually a very good segue, because one thing to remember about our FSD computer is that it can run much more complex neural nets for much more precise image recognition. And to talk to you about how we actually get that image data and how we analyze it, we have our senior director of AI, Andrej Karpathy, who's going to explain all of that to you. Andrej has a PhD from Stanford University, where he studied computer science, focusing on recognition and deep learning.
Andrej, why don't you just do your own intro? There's a lot of PhDs from Stanford; that's not important. We don't care. Come on.
Thank you.
Andrej started the computer vision class at Stanford. That's much more significant. That's what matters. So if you could please talk about your background in a
way that is not bashful. Just tell us about it.
Yeah, sure. So, yeah, I think I've been training neural networks basically for what is now a decade, and these neural networks were not actually really used in the industry until maybe five or six years ago, so it's been some time that I've been training these neural networks. That included institutions like Stanford, OpenAI and Google, and really just training a lot of neural networks, not just for images but also for natural language, and designing architectures that couple those two modalities for my PhD.
And the computer science class?
Oh yeah. At Stanford I actually taught the convolutional neural networks class, and so I was the primary instructor for that class. I actually started the course and designed the entire curriculum. In the beginning it was about 150 students, and then it grew to 700 students over the next two or three years, so it's a very popular class; it's one of the largest classes at Stanford right now. So that was also really successful.
I mean, Andrej is really one of the best computer vision people in the world, arguably the best.
Okay, thank you. Yeah. So hello everyone. Pete told you all about the chip that we've designed that runs neural networks in the car. My team is responsible for the training of these neural networks, and that includes all of data collection from the fleet, neural network training, and then some of the deployment onto that.
So what do the neural networks do exactly in the car? What we are seeing here is a stream of videos from across the vehicle. These are eight cameras that send us videos, and then these neural networks are looking at those videos and are processing
them and making predictions about what they're seeing. Some of the things that we're interested in, some of the things you're seeing on this visualization here, are lane line markings, other objects, the distances to those objects, what we call drivable space, shown in blue, which is where the car is allowed to go, and a lot of other predictions like traffic lights, traffic signs and so on.
Now, for my talk I will go roughly in three stages. First I'm going to give you a short primer on neural networks, how they work and how they're trained. I need to do this because I need to explain, in the second part, why it is such a big deal that we have the fleet, why it's so important, and why it's a key enabling factor to really training these neural networks and making them work effectively on the roads. And in the third stage I'll talk about vision and lidar and how we can estimate depth just from vision alone.
So the core problem that these networks are solving in the car is that of visual recognition. For you and I this is a very simple problem. You can look at all of these four images and you can see that they contain a cello, a boat, an iguana, or scissors. This is very simple and effortless for us. This is not the case for computers, and the reason for that is that these images are, to a computer, really just a massive grid of pixels, and at each pixel you have the brightness value at that point. So instead of just seeing an image, a computer really gets a million numbers in a grid that tell you the brightness values at all the positions.
The matrix, if you will. It really is the matrix. Yeah.
And so we have to go from that grid of pixels and brightness values into high-level concepts like iguana and so on. And as
you might imagine, this iguana has a certain pattern of brightness values, but iguanas actually can take on many appearances: they can be in many different poses, in different brightness conditions, against different backgrounds; you can have different crops of that iguana. So we have to be robust across all those conditions, and we have to understand that all those different brightness patterns actually correspond to iguanas. Now, the reason you and I are very good at this is because we have a massive neural network inside our heads that's processing those images. Light hits the retina and travels to the back of your brain, to the visual cortex, and the visual cortex consists of many neurons that are wired together and that are doing all the pattern recognition on top of those images. And really over the last, I would say, about five years, the state-of-the-art approaches to processing images using computers have also started to use neural networks, but in this case artificial neural networks. These artificial neural networks, and this is just a cartoon diagram of one, are a very rough mathematical approximation to your visual cortex. We really do have neurons and they are connected together. Here I'm only showing three or four neurons in three or four layers, but a typical neural network will have tens to hundreds of millions of neurons, and each neuron will have a thousand connections, so these are really large pieces
of almost simulated tissue. Then what we can do is take those neural networks and show them images. For example, I can feed my iguana into this neural network, and the network will make predictions about what it's seeing. Now, in the beginning these neural networks are initialized completely randomly, so the connection strengths between all those different neurons are completely random, and therefore the predictions of that network are also going to be completely random. It might think that you're actually looking at a boat right now, and that it's very unlikely that this is actually an iguana. During the training process, really what we're doing is: we know that that's actually an iguana, we have a label, so we're basically saying we'd like the probability of iguana to be larger for this image and the probability of all the other things to go down. Then there's a mathematical process called backpropagation, with stochastic gradient descent, that allows us to backpropagate that signal through those connections and update every one of those connections just a little amount. Once the update is complete, the probability of iguana for this image will go up a little bit, so it might become 14 percent, and the probability of the other things will go down. And of course we don't just do this for this single image; we actually have entire large datasets that are labeled. So we have lots of images; typically you might have millions of images, thousands
of labels or something like that, and you are doing forward-backward passes over and over again. So you're showing the computer, here's an image, it has an opinion, and then you're saying this is the correct answer, and it tunes itself a little bit. You repeat this millions of times, and sometimes you show the same image to the computer hundreds of times as well. The network training typically will take on the order of a few hours or a few days, depending on how big a network you're training. And that's the process of training a neural network.
Now, there's something very unintuitive about the way neural networks work that I have to really get into, and that is that they really do require a lot of these examples, and they really do start from scratch. They know nothing, and it's really hard to wrap your head around this. As an example, here's a cute dog, and you probably may not know the breed of this dog, but the correct answer is that this is a Japanese spaniel. Now all of us are looking at this and we're seeing a Japanese spaniel, and we're like, okay, I got it, I understand kind of what this Japanese spaniel looks like. And if I show you a few more images of other dogs, you can probably pick out other Japanese spaniels here; in particular those three look like a Japanese spaniel and the other ones do not. So you can do this very quickly, and you need one example. But computers do not work like this; they actually need a ton of data of Japanese spaniels. So this is a grid of Japanese spaniels. You need thousands of examples showing them in different poses, different brightness conditions, different backgrounds, different crops. You really need to teach the computer from all the different angles what this Japanese spaniel looks like, and it really requires all that data to get that to work; otherwise the computer can't pick up on that pattern automatically.
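The training step described in this primer (random initial connection strengths, a labeled image, and a small update that nudges the probability of the correct label up) can be sketched in a few lines. This is a minimal illustration, assuming a single linear layer with a softmax over four made-up classes and a 16-number stand-in for the image. Real networks have many layers and backpropagate through all of them, and none of the names below come from the talk.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, pixels):
    """Forward pass: one score per class, then softmax."""
    scores = [sum(w * x for w, x in zip(row, pixels)) for row in weights]
    return softmax(scores)


def sgd_step(weights, pixels, label, lr=0.1):
    """One gradient step on the cross-entropy loss for a single labeled image."""
    probs = predict(weights, pixels)
    for k, row in enumerate(weights):
        # Gradient of the loss with respect to class k's score.
        grad_score = probs[k] - (1.0 if k == label else 0.0)
        for i, x in enumerate(pixels):
            row[i] -= lr * grad_score * x


random.seed(0)
CLASSES = ["boat", "cello", "iguana", "scissors"]
pixels = [random.random() for _ in range(16)]            # stand-in "image"
weights = [[random.gauss(0, 0.1) for _ in pixels] for _ in CLASSES]  # random init
label = CLASSES.index("iguana")

before = predict(weights, pixels)[label]
sgd_step(weights, pixels, label)
after = predict(weights, pixels)[label]
print(f"P(iguana) before: {before:.3f}, after one update: {after:.3f}")
```

Each update raises the score of the labeled class and lowers the others, so the probability of "iguana" for this image goes up a little; repeating this over a large labeled dataset is the training loop the talk describes.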
What does all this imply about the setting of self-driving? Of course we don't care about dog breeds too much, maybe we will at some point, but for now we really care about lane line markings, objects, where they are, where we can drive and so on. The way we do this is we don't have labels like iguana for images, but we do have images from the fleet like this, and we're interested in, for example, lane line markings. So a human typically goes into an image and, using a mouse, annotates the lane line markings. Here's an example of an annotation that a human could create, a label for this image, and it's saying that's what you should be seeing in this image; these are the lane line markings. Then what we can do is go to the fleet and ask for more images from the fleet. And if you just do a naive job of this and you just ask for images at random, the fleet might respond with images like this, typically
going forward on some highway. This is what you might just get, a random collection like this, and we would annotate all that data. If you're not careful and you only annotate a random distribution of this data, your network will kind of pick up on this random distribution of data and work only in that regime. So if you show it a slightly different example, for example here is an image where the road is actually curving and it is a bit more of a residential neighborhood, then if you show the neural network this image, the network might make a prediction that is incorrect. It might say, okay, well, I've seen lots of times on highways that lanes just go forward, so here's a possible prediction, and of course this is very incorrect. But the neural network really can't be blamed. It does not know whether the tree on the left matters or not; it does not know if the car on the right matters or not towards the lane line; it does not know whether the buildings in the background matter or not. It really starts completely from scratch. You and I know that the truth is that none of those things matter; what actually matters is that there are a few white lane line markings over there, and a vanishing point, and the fact that they curve a little bit should pull the prediction. Except there's no mechanism by which we can just tell the neural network, hey, those lane line markings actually matter. The only tool in the toolbox that we have is labeled data. So what we do is we need to take images like this, where the network fails, and we need to label them correctly; in this case we will turn the lane to the right. Then we need to feed lots of images like this to the neural net, and the neural net over time will basically pick up on this pattern, that those things there don't matter but those lane line markings do, and it will learn to predict the correct lane.
What's really critical is not just the scale of the dataset. We don't just want millions of images; we actually need to do a really good job of covering the possible space of things that the car might encounter on the roads. So we need to teach the computer how to handle scenarios where it's night and wet; you have all these different specular reflections, and as you might imagine the brightness patterns in these images will look very different. We have to teach the computer how to deal with shadows, how to deal with forks in the road, how to deal with large objects that might be taking up most of that image, how to deal with tunnels, or how to deal with construction sites. In all these cases there's again no explicit mechanism to tell the network what to do; we only have massive amounts of data. We want to source all those images and annotate the correct lines, and the network will pick up on the patterns of those.
Now, large and varied datasets basically make these networks work very well. This is not just a finding for us here at Tesla; this is a ubiquitous finding across the entire industry. Experiments and research from Google, from Facebook, from Baidu, from Alphabet's DeepMind all show similar plots, where neural networks really love data, and love scale and variety. As you add more data, these neural networks start to work better and get higher accuracies for free. So more data just makes them work better.
Now, a number of people have kind of pointed out that potentially we could use simulation to actually achieve the scale of the datasets, and we're in charge of a lot of the conditions there, and maybe we can achieve some variety in a simulator. Now, at Tesla, and that was also kind of brought up in the questions
just before this. Now, at Tesla, this is actually a screenshot of our own simulator. We use simulation extensively; we use it to develop and evaluate the software, and we've even used it for training quite successfully. But really, when it comes to training data for neural networks, there really is no substitute for real data. Simulations have a lot of trouble with modeling appearance, physics, and the behaviors of all the agents around you. So here are some examples to really drive that point across. The real world really throws a lot of crazy stuff at you. In this case, for example, we have very complicated environments with snow, with trees, with wind. We have various visual artifacts that are hard to simulate, potentially. We have complicated construction sites, bushes, and plastic bags that can kind of go around with the wind; complicated construction sites that might feature lots of people, kids, animals, all mixed in. Simulating how those things interact and flow through this construction zone might actually be completely intractable. It's not about the movement of any one pedestrian in there; it's about how they respond to each other, and how those cars respond to each other, and how they respond to you driving in that setting. All of those are actually really tricky to simulate. It's almost like you have to solve the self-driving problem to just simulate other cars in your simulation, so it's really complicated. So we have dogs, exotic animals, and in some cases it's not even that you can't simulate it, it's that you can't even come up with it. So for example, I didn't know that you can have truck on truck on truck like that, but in the real world you find this, and you find lots of other things that are very hard to really even come up with. So really the variety that I'm seeing in the data coming from the fleet is just crazy
with respect to what we have in a simulator. And we have a really good simulator.
With simulation, you're fundamentally grading your own homework. You know, if you know that you're going to simulate it, okay, you can definitely solve for it. But as Andrej is saying, you don't know what you don't know. The world is very weird and it has millions of corner cases. If somebody can produce a self-driving simulation that accurately matches reality, that in itself would be a monumental achievement of human capability. They can't. There's no way.
Yeah. So I think the three points that I've really tried to drive home until now are: to get neural networks to work well, you require these three essentials. You require a large dataset, a varied dataset, and a real dataset. If you have those capabilities, you can actually train your networks and make them work very well. So why is Tesla in such a unique and interesting position to really get all these three essentials right? The answer to that, of course, is the fleet. We can really source data from it and make our neural network systems work extremely well. So let me take you through a concrete example of, for example, making the object detector work better, to give you a sense of how we develop these neural networks, how we iterate on them, and how we actually get them to work over time. Object detection is something we care a lot about. We'd like to put bounding boxes around, say, the cars and the objects here, because we need to track them and we need to understand how they might move around. So again, we might ask human annotators to give us some annotations for these, and humans
might go in and might tell you that, okay, those patterns over there are cars and bicycles and so on, and you can train your neural network on this. But if you're not careful, the neural network will make mispredictions in some cases. As an example, if we stumble on a car like this that has a bike on the back of it, then the neural network, actually, when I joined, would
actually create two detections. It would create a car detection and a bicycle detection. That's actually kind of correct, because I guess both of those objects actually exist, but for the purposes of the controller and the planner downstream you really don't want to deal with the fact that this bicycle can go with the car. The truth is that that bike is attached to that car, so in terms of just objects on the road there's a single object, a single car. So what you'd like to do now is just annotate lots of those images as just a single car. The process that we go through internally in the team is that we take this image, or a few images that show this pattern, and we have a mechanism, a machine learning mechanism, by which we can ask the fleet to source us examples that look like that. The fleet might respond with images that contain those patterns; as an example, these six images might come from the fleet. They all contain bikes on the backs of cars, and we would go in and annotate all those as just a single car. Then the performance of that detector actually improves, and the network internally understands that, hey, when the bike is just attached to the car, that's actually just a single car, and it can learn that given enough examples. That's how we've sort of fixed that problem. I will mention that I talked quite a bit about sourcing data from the fleet; I just want to make a quick point that we've designed this from the beginning with privacy in mind, and all the data that we use for training is anonymized.
The fleet doesn't just respond with bicycles on the backs of cars. We look for lots of things all the time. For example, we look for boats, and the fleet can respond with boats. We look for construction sites, and the fleet can send us lots of construction sites from across the world. We look for even slightly more rare cases; for example, finding debris on the road is pretty important to us. So these are examples of images that have streamed to us from the fleet that show tires, cones, plastic bags and things like that. If we can source these at scale, we can annotate them correctly, and the neural network will learn how to deal with them in the world. Here's another example: animals. Of course also a very rare occurrence, but we want the neural network to really understand what's going on here, that these are animals, and we want to deal with that correctly.
So, to summarize, the process by which we iterate on neural network predictions looks something like this. We start with a seed dataset that was potentially sourced at random. We annotate that dataset, then we train neural networks on that dataset and put that in the car. Then we have mechanisms by which we notice inaccuracies in the car, when this detector may be misbehaving. For example, if we detect that the neural network might be uncertain, or if there's a driver intervention, or any of those settings, we can create this trigger infrastructure that sends us data on those inaccuracies. So, for example, if we don't perform very well on lane line detection in tunnels, then we can notice that there's a problem in tunnels. That image would enter our unit tests, so we can verify that we're actually fixing the problem over time. But now you have to fix this inaccuracy.
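The trigger idea described above (send data back when the network looks uncertain or when the driver intervenes) can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Frame` fields, the confidence threshold, and the scene tags are all hypothetical, and the talk does not describe how the actual trigger infrastructure is implemented.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    scene: str               # hypothetical tag, e.g. "tunnel" or "highway"
    confidence: float        # network's confidence in its prediction, 0..1
    driver_intervened: bool  # True if the driver took over around this frame


def should_upload(frame, min_confidence=0.6):
    """Trigger when the network looks uncertain or the driver intervened."""
    return frame.driver_intervened or frame.confidence < min_confidence


def collect_triggers(frames, min_confidence=0.6):
    """Keep only the frames that fire a trigger."""
    return [f for f in frames if should_upload(f, min_confidence)]


frames = [
    Frame("highway", 0.97, False),
    Frame("tunnel", 0.41, False),   # low confidence
    Frame("tunnel", 0.88, True),    # driver intervention
    Frame("highway", 0.93, False),
]
flagged = collect_triggers(frames)
print([f.scene for f in flagged])  # ['tunnel', 'tunnel']
```

In this toy data both flagged frames are tunnels, which is the kind of cluster that would suggest a weakness worth adding to the test set and sourcing more examples for.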
You need to source many more examples that look like that, so we ask the fleet to please send us many more tunnels. Then we label all those tunnels correctly, we incorporate that into the training set, and we retrain the network, redeploy, and iterate the cycle over and over again. We refer to this iterative process by which we improve these predictions as the data engine: iteratively deploying something, potentially in shadow mode, sourcing inaccuracies, and incorporating them into the training set over and over again. We do this basically for all the predictions of these neural networks.
Now, so far I've talked about a lot of explicit labeling. Like I mentioned, we ask people to annotate data. This is an expensive process in time, and also, yeah, it's just an expensive process, and so these annotations of course can be very expensive to achieve. So what I want to talk about also is how to really utilize the power of the fleet. You don't want to go through this human annotation bottleneck; you want to just stream in data and annotate it automatically, and we have multiple mechanisms by which we can do this. As one example, a project that we recently worked on is the detection of cut-ins. You're driving down the highway, someone is on the left or on the right, and they cut in in front of you into your lane. So here's a video showing
the Autopilot detecting that this car is intruding into our lane. Now, of course, we'd like to detect a cut-in as fast as possible. The way we approach this problem is we don't write explicit code for: is the left blinker on, is the right blinker on, track the cuboid over time and see if it's moving horizontally. We actually use a fleet learning approach. The way this works is we ask the fleet to please send us data whenever they see a car transition from the right lane to the center, or from the left to the center. Then what we do is rewind time backwards, and we can automatically annotate that, hey, that car will, in 1.3 seconds, cut in in front of us, and then we can use that for training the neural net. The neural net will automatically pick up on a lot of these patterns; for example, the cars are typically yawed, then moving this way, maybe the blinker is on. All that stuff happens internally inside the neural net, just from these examples. So we ask the fleet to automatically send us all this data. We can get half a million or so images, and all of these would be annotated for cut-ins. Then we train the network. And then we took this cut-in network and we deployed it to the fleet, but we don't turn it on yet; we run it in shadow mode. In shadow mode the network is always making predictions: hey, I think this vehicle is going to cut in; from the way it looks, this vehicle is going to cut in. And then we look for mispredictions. As an example, this is a clip that we had from shadow mode of the cut-in network.
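The "rewind time" labeling described above can be sketched as follows: once a neighbouring car is observed entering our lane, every earlier frame of that car within some horizon gets the label "cuts in after N seconds" without a human annotator. This is a minimal sketch under stated assumptions; the track format, the lane encoding (0 for our lane, -1 left, +1 right), and the horizon parameter are hypothetical and are not taken from the talk.

```python
def label_cut_ins(track, ego_lane=0, horizon_s=2.0):
    """Label earlier frames of one vehicle's track with time until cut-in.

    track: list of (timestamp_s, lane) pairs sorted by time.
    Returns a list of (timestamp_s, seconds_until_cut_in), where the second
    value is None for frames outside the horizon or with no cut-in ahead.
    """
    # Find the first moment the vehicle moves from another lane into ours.
    cut_in_time = None
    for (_, lane_prev), (t, lane) in zip(track, track[1:]):
        if lane_prev != ego_lane and lane == ego_lane:
            cut_in_time = t
            break

    # Rewind: label the frames that precede the cut-in within the horizon.
    labels = []
    for t, _ in track:
        if cut_in_time is not None and 0 < cut_in_time - t <= horizon_s:
            labels.append((t, round(cut_in_time - t, 3)))
        else:
            labels.append((t, None))
    return labels


# A car in the right lane (+1) that enters our lane (0) at t = 2.0 s.
track = [(0.0, 1), (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 0), (2.5, 0)]
for t, label in label_cut_ins(track, horizon_s=1.5):
    print(t, label)
```

With the example track, the frames at 0.5 s, 1.0 s and 1.5 s are labeled 1.5, 1.0 and 0.5 seconds before the cut-in, and the remaining frames get no label. A network trained on such labels is then free to pick up whatever visual cues precede the lane change.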
It's kind of hard to see, but the network thought that the vehicle right ahead of us and on the right was going to cut in. You can sort of see that it's slightly flirting with the lane line, it's sort of encroaching a little bit, and the network got excited and thought that that was going to be a cut-in, that the vehicle would actually end up in our center lane. That turns out to be incorrect; the vehicle did not actually do that. So what we do now is just turn the data engine. The network that ran in shadow mode is making predictions; it makes some false positives and there are some false negative detections. So we got overexcited sometimes, and sometimes we missed a cut-in when it actually happened. All those create a trigger that streams to us, and that gets incorporated, now for free (there are no humans harmed in the process of labeling this data), into our training set. We retrain the network and redeploy it in shadow mode. We can spin this a few times, and we always look at the false positives and negatives coming from the fleet. Once we're happy with the false positive to false negative ratio, we actually flip the bit and let the car control to that network. You may have noticed we actually shipped one of our first versions of the cut-in detection architecture approximately, I think, three months ago. So if you've noticed that the car is much better at detecting cut-ins, that's fleet learning operating at scale.
Yes, it actually works quite nicely. So that's fleet learning. No humans were harmed in the process; it's just a lot of neural network training based on data, and a lot of shadow mode and looking at those results.
And essentially, everyone's training the network all the time, is what it amounts to. Whether Autopilot is on or off, the network is being trained. Every mile that's driven for the car th