Preparing for a Post Moore's Law World (MICRO-2016 Keynote)


Professor Austin, and I'm giving the keynote I gave at MICRO last month. I gave the keynote at the MICRO symposium in Hawaii; this is the exact same talk, I'm just giving it again, and we're recording it. I wanted to get a recording of it for various purposes. I think it's OK if people want to ask questions; go ahead, otherwise you can ask questions at the end, but I think it's fine to ask questions during the talk. The title is "Preparing for a Post-Moore's Law World." This talk is about the implications for computer architecture when Moore's Law ends, which, as I'll show you in this talk, is happening right now. Moore's Law is important because it's the primary means by which we scale power, performance, and cost in computing systems, and that scaling is the value that we create as architects, what we sell when we design a computer. This talk is all about how we continue that scaling once Moore's Law ends. How did I get a perspective on that? In 2013 I was awarded a very large center here at the University of Michigan. We've got 27 faculty from 14 universities, we support over 90 students, and for the first time in my career I had to stop thinking just about my own research and think more broadly about the research of the community, and that really led me to the topics I'll talk about today. So I'm going to talk about the problem, but I'm also going to talk about solutions to the problem, and strangely, not all of them come from the architecture world; in fact, probably the most powerful solutions come from the CAD/EDA world, as I'll talk about later. C-FAR is the center that I lead that focuses on this problem of scaling. The goal of the center is really to serve tomorrow's applications, things like computer vision, machine learning, big data analytics; there are lots of algorithms and applications out there that can use scaled systems. But the big challenges are, you know, how do we innovate in a way that's effectively scaling, and how do we do that when
silicon is really losing its value (we'll talk about that), and then how do we do that on top of silicon that's even becoming unreliable today? So there are a lot of great challenges; it makes for a very interesting time in computer architecture. And I'll just say it once rather than over and over again: all the work in this talk is work from C-FAR faculty. Now, all keynotes, all architecture keynotes, by union rules, must start with a Moore's Law graph, and it's a good idea to start with the Moore's Law graph, because Moore's Law is the fuel that drives the engine of the computing industry. What is Moore's Law? Moore's Law is 2x density every 18 months, this green line here: over decades we've seen that by shrinking transistors we get double the density for the same cost roughly every 18 months. And with that, for a long time, we saw this blue line here, this purple line: the speed of the circuits increased in a similar manner, as did their power scaling, their power density, as well as instruction-level parallelism. But around 2000 to 2005, you can see these three performance aspects started to drop off even though the density scaling continued on. When the frequency topped out around 2005, the impact was a lack of perceived value in uniprocessors; we don't even sell uniprocessors today anymore. When the power density scaling stopped, that was the start of dark silicon; I'll talk about that in a little bit. And then finally, when the ILP topped out, that was really when we had to move away from microarchitecture and start looking at system architecture as a means to create scalable systems. A key number I want you to see here: if we draw this line out and look at what the gap is today, it's about 10x. Had the performance metrics continued to scale with density for the last ten years, our uniprocessors would be ten times faster, or use one-tenth the power. So that gap is what we need to
bridge somehow if we want to continue to create value in the form of scalability. Now, normally when people show this graph they go, "but isn't it beautiful that density keeps scaling, we've got more transistors!" But I'll show you in this graph that Moore's Law is ending, and it's ending right now. Here we see a log scale of minimum transistor dimensions, from 180 to 130, 90, 65, 32, 22, 14; that's the current advanced node today. These are Intel technologies, and on the bottom here I've got the street date, when Intel first sells that transistor: that's when it appears, that's when you can buy it. I've taken four generations here, 180 to 45, drawn this gray line, and then extended it with the red line; that's actual Moore's Law scalability, 2x every 18 months. But if you look from 45 to 14 and then extend that line out, you'll see we've moved from Moore's Law to Moore's Crawl: 2x every 36 months. And recently Intel announced a five- or six-quarter slip in 10 nanometer, which is going to take us to an even slower increase in density. Now, Moore's Law never really stops; it slows down, and it doesn't stay exponential anymore, it becomes linear. The transformation that will happen here is from one of density scaling to innovation on the technology, device, and materials side. For example, we saw a recent advance in the form of FinFETs, where by changing the geometry of the transistors we got an improvement, and now there are new device structures coming, and better materials, and lots of other things. But what we'll see is that we won't get 2x devices every 18 months ever again; we're going to get incremental increases in our silicon in the future. Now, will we switch over to non-silicon? Unlikely; the investment in silicon is too great. Will there be non-silicon devices integrated? Perhaps, but again, those aren't going to provide the scalability that silicon has over the last 40 years; those are still made out of atoms, and we're really at the atomic density here: at
seven nanometers, you're talking only about 14 atoms across the width of a minimum-size transistor; there's just not a lot further down you can go from there. So what do we do? We're losing all of these benefits that architects have been lavished with over decades. Just to recap: value, the thing that you make money on when you sell to a customer, has always been scalability in our community: faster computers, lower-power computers, cheaper computers. And we're losing the technology scaling side: density, power density, as well as transistor speed; we're losing all those things. So what does that mean? It means, architects, you've got to innovate or you're going to die. Now, what does it mean when a community dies? It means the community isn't very important anymore: nobody really cares about it, students don't go into it, people don't invest in it, there are no startups in that community. We're already seeing this happen in the hardware community today; I'll show some of that. Our community is dying because we're not able to fill that gap with innovation. So the goal of this talk is really to talk about how we restart this innovation and make it happen more quickly and more efficiently. All right. Well, architects are a nimble bunch, and when we first found out that we weren't going to be able to scale at the uniprocessor level, the solution that came about was the chip multiprocessor. So let's take a look at the chip multiprocessor; we're ten years down the road, let's do an assessment: how well has it done? In a chip multiprocessor we put together multiple similar cores; it's a homogeneous parallel architecture, the same cores communicating through some shared infrastructure, some shared memory system. There's a great paper, and I recommend everybody read it, by Hadi Esmaeilzadeh, now at Georgia Tech, from when he was a grad student: "Dark Silicon and the End of Multicore Scaling," where they did a study looking at homogeneous parallel architectures
and scaled them over time, accounting for technology advances, architectural advances, and application advances. They did this for the PARSEC benchmarks, which are very parallel benchmarks, so it's really a best-case scenario in terms of the software you're running. This green line is Moore's Law scalability over time; we'd want our homogeneous parallel systems to follow that line if they were going to deliver scalability that would help close the gap. But what they found is that they were able to get only thirteen percent more performance every 18 months, for a variety of reasons which we'll tease out now; one of which is dark silicon, but there are many other problems as well. So on the technology side, it doesn't look like homogeneous parallelism can really save us. And because we create value and try to sell that value, I think it's also important to look at how the press and the public see the value of our work. So I went to Google and I typed in "are computers getting faster," and this is what I found: "Why CPUs aren't getting any faster"; "Moore's Law isn't making chips cheaper anymore"; "Are new laptops really faster? As it turns out, not really"; "Do I even care about fast processors anymore?" And then one of my colleagues, Valeria Bertacco, said, well, that's kind of a negative question, why don't you put a more positive question into Google and see if you get a better result? So I typed in "computers are getting really fast," and here's what I got: "The death of CPU scaling: from one core to many, and why we're still stuck"; "Why aren't processors getting faster?" And then finally a glimmer of hope: "How fast does your PC really need to be?" Maybe we don't need performance. I disagree with that assessment, but the perception of what we're doing is very poor; I think we can all agree on that. So, who's to blame?
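To see how far that thirteen percent per generation falls short, here's a quick back-of-the-envelope calculation. The 13% and 2x-per-18-months rates are the figures from the talk; the function name is just for illustration:

```python
# Compound a per-generation gain over time; one "generation" = 18 months.
def compound(per_gen_gain: float, months: float, gen_months: float = 18.0) -> float:
    """Total speedup after `months` at `per_gen_gain`x per generation."""
    return per_gen_gain ** (months / gen_months)

decade = 120.0  # months
multicore = compound(1.13, decade)  # homogeneous multicore: ~13% per generation
moore = compound(2.00, decade)      # Moore's Law pace: 2x per generation

print(f"multicore after a decade: {multicore:.1f}x")  # roughly 2x
print(f"Moore's Law after a decade: {moore:.0f}x")    # roughly 100x
```

So a decade of steady 13% gains compounds to only about a 2x total speedup, against the roughly 100x that density-paced scaling would have delivered.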
Who's to blame for the ineffectiveness of chip multiprocessing as a means to innovate our way back to Moore's Law scalability? Well, some people say blame the programmer, but I say nay: we can't blame the programmer. Why? Because programmers are very good at extracting parallelism, and where there's a desire for parallelism, they've done a fantastic job. There are niches where parallelism is aplenty and we're still not getting the scalability we want. Here's an example of one: bitcoin mining. A 2,000-square-foot warehouse that just contains GPGPUs; it does one petahash per second and reportedly generates eight million dollars of bitcoin per month. And there are many other examples. Warehouse-scale computing is another space where you've seen parallelism really take off. By the way, warehouse-scale computing: what a testament to the poor work of the architects, in that instead of scaling on the chip, they scale out of the chip. They say, we don't care what's on the chip, we just want to build our own big massively parallel computers, and they program them very effectively. So I say no, programmers are doing a great job. So then people say, well, let's blame the educators. This is the guy that taught me parallel programming; this is Jim Goodman from the University of Wisconsin. Let's blame the educators. And I say nay again: CS is booming; everybody wants to be in CS, it's the fastest-growing area in the university. Here at Michigan you can see CS enrollment over time; it's really on its own Moore's Law trajectory. I mean, we meet every semester to try and figure out how we're going to handle all these students. And these two lines, EE and CS, we'll see more about this later, but you can see that the hardware side is very flat over a long period of time. We've got lots of classes at Michigan that teach parallel programming, to both undergrads and graduate students, so there's a lot of opportunity to learn parallel programming. And people who know me know I've been working in Ethiopia since 2009; even there, the second most
popular degree in the university is the CS degree (the first most popular is civil engineering, because there's a huge infrastructure boom in the country, so there are lots of jobs in that space). But the world over, we're making lots of programmers, and we're teaching them how to do parallel programming. So it's not really the programmer, and I don't think we can blame our educators; they're producing good programmers. So, well, maybe the transistor is to blame, and I will say yea: the transistor is definitely to blame. Let's see why. These are some slides from Michael Taylor at UCSD explaining the problem of dark silicon, which has to do with transistors: when they get smaller today, they don't use less power at the rate they used to. Consequently, if we can't scale the amount of energy we shove into the chip, we need to turn on fewer transistors; that's the dark silicon problem. Traditionally, when you would scale 2x in density, that meant the dimensions got divided by about 1.4 in both dimensions; you would have roughly two times as many transistors, and each of them would be about 1.4 times faster, because it was smaller, had less charge, responded more quickly: about a 3x improvement in computational performance per unit area. Well, if they all used the same amount of power, we'd have to put three times as much power onto the chip; instead of three amps you'd need nine amps, and the technologies for bringing more power on chip have advanced at the slowest rate you could possibly imagine. But fortunately we had Dennard scaling. There's Dennard; he noted that if transistors are running faster because they're smaller, they're also using less power because they're smaller: there's less charge in their channels. And then here was the kicker: we could make up the rest by dropping voltage by a factor of 1.4.
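The classic Dennard recipe on these slides can be written out as a little arithmetic sketch. The kappa here is the usual scaling factor of about 1.4 (the square root of two); the variable names are mine, not from the slides:

```python
# Classic Dennard scaling for one generation, everything relative to the
# previous generation. Linear dimensions shrink by 1/kappa, kappa ~ 1.4.
kappa = 2 ** 0.5

density = kappa ** 2          # 2x transistors per unit area
frequency = kappa             # ~1.4x faster switching (less charge to move)
capacitance = 1 / kappa       # smaller device, smaller capacitance
voltage = 1 / kappa           # the "drop voltage by a factor of 1.4" step

# Dynamic power per transistor scales like C * V^2 * f
power_per_transistor = capacitance * voltage ** 2 * frequency

# Power density: per-transistor power times transistors per area
power_density = power_per_transistor * density

print(f"throughput per area: {density * frequency:.2f}x")  # ~2.8x, the "about 3x"
print(f"power density:       {power_density:.2f}x")        # ~1.0x, i.e. flat
```

Take away the voltage drop (set `voltage = 1`) and power density doubles every generation instead of staying flat; that is exactly the leakage-constrained situation described next.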

If you drop voltage by a factor of 1.4, you cut the amount of energy going on chip in half, and you get back to the original point: three times as much computational power at the same amount of energy going in. But then something happened with this guy right here, the dielectric. The dielectric between the gate and the channel got so thin that, in order to lower VDD to do this step right here, we had to lower the threshold voltage, and the way you lower threshold voltage is you make that dielectric thinner, and it got so thin that the leakage started to spike up. So now, if we drop threshold voltage, leakage goes up: your dynamic power is still scaling down, but your leakage is shooting up, and you still need more energy to go on chip. You've lost this component, and the only way we can deal with that is by turning off a larger fraction of the transistors every generation. And that's not a good solution for chip multiprocessors; chip multiprocessors need everything running all the time, so this really flies in the face of that particular design. So yeah, transistor, you are letting us down big time. But I say there's one other party to blame: the architects. Do you recognize that guy? That's Gene Amdahl. In '65, when I was born, this guy started talking about Amdahl's Law. He didn't call it Amdahl's Law, but he talked about how, as we expose parallelism, we'll get so good at it that eventually it won't be useful at all, because the performance of the program will be determined by the parts that we didn't parallelize. Now, remember that number I had earlier in the talk, the 10x? Where does 10x place us in terms of Amdahl's Law? What do we have to accomplish? 10x puts us right about here: it says that if you want a 10x performance improvement with homogeneous parallel chip multiprocessors, you need to parallelize ninety percent of the execution, and you need to run it on upwards of a thousand processors. That's how you get 10x. That seems like a pretty ridiculously hard goal for a parallel system.
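Those Amdahl's Law targets are easy to check with the standard formula; the function name is mine, and the numbers are the ones from the talk:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's Law: overall speedup when `parallel_fraction` of the
    work is spread across `n_cores` and the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# The talk's targets: ~10x needs 90% parallel on ~1000 cores;
# 18 months later, ~20x needs 95% parallel on ~4000 cores.
print(f"{amdahl_speedup(0.90, 1000):.1f}x")  # ~9.9x
print(f"{amdahl_speedup(0.95, 4000):.1f}x")  # ~19.9x
```

Note how steep the curve is: even with a thousand cores, the ten percent of serial work caps the whole program at about 10x.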
And, oh, by the way, in 18 months you're going to have to do ninety-five percent on four thousand processors. OK, so it's not a sustainable scalability approach to rely solely on parallelism. We don't want to throw away parallelism; it's a great way to provide speedups. But it is not a solution to the Moore's Law scalability gap; it's only a part of the puzzle. So yeah, architects: we really should have known, back in the '60s, that homogeneous parallel architectures weren't going to save our bacon. So what is the solution? What do we need to concentrate on? I say the solution is this guy right here. Well, not this guy actually, but the research this guy did. This is Jason Clemons; he was one of my PhD students, and he did research on how to accelerate the performance of computer vision algorithms on embedded hardware, which is a really difficult problem, because computer vision is a very computationally intensive class of algorithms, and when you say you want to do it on an embedded platform, you're not only worried about performance, you also want energy efficiency. It's kind of the worst possible scenario for delivering that. So Jason and I started working together on this problem, and we soon figured out that we really needed help in the application space, because vision is a very broad space that draws from image processing, from machine learning, and also from computer vision proper, which has its own host of algorithms. So we started working also with Silvio Savarese, and I told Jason: when people ask, are you an architect, say yes; when they ask, are you a computer vision researcher, say yes. You've really got to be both of those things. And what he did is he built an architecture that was tailored to this particular space. It was a heterogeneous parallel architecture: a parallel architecture that was customized to do well on the serial components that were left over after you parallelized. This was kind of the magnum opus of his thesis: the EVA architecture, the Embedded Vision Architecture.
I'll just give you three things it does that provide a lot of value, because it understands the vision application space. For one thing, it's a heterogeneous multicore: it's composed of a couple of big, powerful cores serving work to a lot of very small cores. In that environment you can expose a lot of parallelism, and if you can tease out which threads need capable cores versus simple cores, you can do an assignment and get more performance per unit of energy and more performance per unit of area, which is also important in the embedded space. The second thing it had is application-specific functional units: a dot-product unit, vector max, decision-tree compare; simple little accelerators that could be applied across a large number of algorithms and provide very good speedup by simply reducing the latency of some of these long, non-parallel kernels. And then finally, one of the things he did is he taught the memory system how to do really well on a style of algorithm called a stencil. A stencil is an algorithm that doesn't deal with a single data element; it deals with a region, a two-dimensional region of data, and in fact these particular algorithms tend to have a velocity vector associated with that region, so you can do a really good job of two-dimensional prefetching to increase spatial locality, which significantly lowers energy as well as improving the performance of the memory system. So there are three examples of how you can serve this space. But here's the thing that I really want to leave you with: he found 90x greater efficiency over an embedded CPU. 90x efficiency. Efficiency means he did less work, by a factor of 90; you can spend that either on performance improvement or on energy savings or anything in between. But you'll note he got that 10x gap closed, and then went 9x past it, and that was just pure innovation: innovation that combined parallelism with customization for a particular application space.
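To make the stencil idea concrete, here's a hedged sketch; the function names and the velocity representation are mine for illustration, not EVA's. A stencil touches a 2D neighborhood around a point, and because the region of interest drifts with a known velocity vector, the next neighborhood is predictable, which is what a two-dimensional prefetcher exploits:

```python
def stencil_sum(img, cx, cy, radius):
    """Touch a (2*radius+1) x (2*radius+1) region around (cx, cy):
    the 2D access pattern a stencil algorithm generates."""
    return sum(img[y][x]
               for y in range(cy - radius, cy + radius + 1)
               for x in range(cx - radius, cx + radius + 1))

def predict_next_center(cx, cy, vx, vy):
    """The velocity vector makes the next region predictable;
    this is the hint a 2D prefetcher could act on."""
    return cx + vx, cy + vy

# Example: a small frame, a 3x3 stencil, and a region tracked across frames.
img = [[1] * 8 for _ in range(8)]
total = stencil_sum(img, 3, 3, 1)      # touches the 3x3 neighborhood of (3, 3)
nxt = predict_next_center(3, 3, 1, 2)  # where the region will be next frame
```

A conventional prefetcher sees only a 1D address stream; knowing the access is really a moving 2D tile is what makes the region-ahead fetch accurate.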
So where we need to focus as a community now is on heterogeneous parallel architectures. We need to embrace parallelism; parallelism is a good thing. And we need to embrace customization; customization is a good thing. But for the first time, we really need to get serious about bringing those two things, which normally don't fit well together, together in tandem. Because those heterogeneous parallel systems overcome dark silicon: they concentrate computation in the smallest component possible, so I don't need all my transistors working at the same time. And they overcome the tyranny of Amdahl's Law, because where there are serial aspects of my program, I can apply acceleration to them. Now, normally in a talk like this I would spend the rest of the talk telling you good ways to do heterogeneous parallel design, but there's a little twist in the middle of this talk, because I'm not going to do that. Why? Because I know architects are smart, and they're going to figure all this stuff out. In fact, it's not that interesting to me to work out how to do this for all the different applications that are out there; people will figure that out, it's not that hard, and the results are really good. For decades we've seen that the results when you specialize for an application are really good. What I want to do now is tell you why there's a good chance this will probably never happen, and to understand that, you need to understand what I like to call the good, the bad, and the ugly of today's silicon. First, the good. We saw the good: heterogeneous parallel systems can close the Moore's Law gap; you can get 10x, 50x, 100x out of those architectures, and I know this community can build them. Those designs will exist. Here's the bad: we're losing Dennard scaling, Moore's Law is slowing, so there's this big gap. But I am convinced the good is so powerful it'll overcome the bad. Now, the ugly. Who knows the
ugly? The ugly is that you'll never be able to afford it, because the cost of design is so expensive today that it is absurd to think you could tailor an architecture, on any generation of silicon from the past 10 years, and serve it up to a market. It's simply too expensive to do hardware design today. And ultimately the cost of hardware design, the fact that it's so expensive and the rate at which it's growing, is killing our community. The number one thing we need to work on as a community, the most important problem, is how to reduce the cost of design. If we can't get a hold on the cost of design and bring it down, we will die, because our solutions, which are going to be heterogeneous parallel, will be too expensive to deploy. So here's what I want you to remember out of this talk today: bridging the Moore's Law performance gap is less about how to do it (I know you will figure out how to do it); it's more about how much it costs. If it costs a hundred million dollars to get your solution out there, then your solution has to target a market that can serve up many hundreds of millions of dollars of revenue, and that's a very unlikely market for any specialized device. My claim, what I think is the solution to this problem, is that if we can get a 100x reduction in the cost of taking an idea to market (that's design, verification, manufacturing, and delivery), then innovation will flourish, and all of these scaling challenges will be overcome, simply by restarting innovation in our community. So how bad is the cost of design? It's really bad. This chart goes from 500 nanometer down to 20 nanometer, and I'm tracking hardware design and verification, software design and verification, and manufacturing mask costs. At 20 nanometer, which is a lagging technology, a previous-generation technology, it's estimated at a hundred and twenty million dollars to bring an idea to market. So if you're a startup, you're going to need probably three
rounds of funding to bring an idea to market, and it's a very unlikely scenario for any startup to get three rounds of funding. We're talking 55 million for hardware design and verification, 45 million for the software, and twenty million dollars just for the masks, just to produce that custom piece of silicon. That seems really ridiculous, especially when you consider the software world. You see that right there? That's five hundred thousand dollars. What can you do with five hundred thousand dollars in the software world? Well, for instance, you can make a company like Instagram and deliver your product to the market, for five hundred thousand dollars. Instagram is a 35-billion-dollar company; it's worth twice as much as Nvidia, three times as much as Xilinx, and four times as much as Synopsys, and it was started with five hundred thousand dollars. Good luck trying to start a hardware company with five hundred thousand dollars; you're going to need at least a hundred million dollars to start a hardware company. And as a result, there's no innovation in our field. I talk to CEOs and CTOs of hardware companies, and they don't want my center doing anything that would help produce startups, because they can't remember a time when big companies bought little startups to acquire their technology; they can't remember that anymore. Talk to an executive at Microsoft or Google and it's the exact opposite: they want a flourishing startup community so they can acquire technology, because they know they can't grow all their own technology. Because of this really expensive cost, there's no design diversity: the number of designs in the world is steadily decreasing over time. This is ASIC starts over time, the number of unique designs started in a particular year. Back in the late '90s there were about 10,000 designs started per year, and then with the dot-com bust there was a huge drop-off, and we've been on a pretty steady decline since; today we're on roughly a three
percent decline per year in new ASIC starts, and we were at about 2,000 in the last year. 2,000 ASIC starts. A huge planet, billions of people, and 2,000 new designs that anyone is attempting to take to market. 2,000: it's nothing. How many software startups were there last year? Probably in the millions. All right, so let's say you don't believe me; maybe Mother Nature can convince you. In biology there's a theory called r/K selection theory, where r is the number of babies you have and K is the amount of quality raising you put into those babies. I've got a lot of kids, and I can tell you that when you have more kids, you spend less time with each of them. Now, r/K selection theory says that when the environment is unstable, when everything around you is changing, the species that produce more babies do better than the species that have fewer and put more time into them. Why? Because with more babies, there are more chances that someone will adapt to the quickly changing environment. In a stable environment, the K-selection species tend to thrive, because they're raised to compete very effectively in a world where everyone's competing for resources very efficiently. So, for example, on the safari, the elephants are very efficient at competing for food, and they raise few babies and train them very well. The analogy here is that having babies is like design; that's how nature designs new things, through procreation. In unstable environments we know the rats are going to do great, and in stable environments we know the elephants are going to do great. What kind of environment is the silicon world today? Is it a stable environment? I would claim it's highly unstable: costs are changing dramatically, reliability issues are rising, we're losing Dennard scaling. Extremely unstable. But what kind of designs do we do? If you're sinking a hundred million dollars into a design, you're really putting some care and
love into your child. The kind of design that would look more like r-selection would be $500,000 designs: many, many cheap designs, kind of like they have over in the software world; much different from what we have. So really, we're like a bunch of elephants in an unstable environment; we're like a bunch of elephants walking into a zombie apocalypse, and it's not going to end well for the elephants. It's really a bad match. We want to be able to design very cheaply, so we can have many potential solutions to overcome the challenges in silicon: reliability, new applications, et cetera. So let's talk about what I would claim are better remedies, places where we need to focus as a community. For me, it's all about accelerating innovation, and system innovation in particular. I don't care about microarchitectures anymore: we've done microarchitectures, we've perfected the microarchitecture, there are only small tweaks left to do. Today it's about system architecture: how do we take these components that we've built and perfected over many generations and compose them together to create better systems? And where we need to create new components is when they're demanded by the applications they serve. And how do we make it really cheap, so that anyone can do it anywhere? Well, anyone, anywhere, with a little bit of resources: not a hundred million, but let's say, if you had one rich uncle, you could do a hardware startup. That's the world I would like to see; I think that world would be a dramatic change for all the problems we have in hardware design. So I've got five proposals on things we can do, and I'll have a slide on each of these. One, we need to expect more from architecture. Two, we've got to reduce the cost of designing custom hardware. Three, we need to embrace open source. Four, we need to widen the applicability of custom hardware. And five, we need to reduce the cost of hardware manufacturing. Now, most architects go, "hardware manufacturing? What do I have to do with that?" I'm going to show you
that I think the key to reducing the cost of hardware manufacturing doesn't come from the technology side; it comes from the architecture side. All right: expect more from architectural innovation. One of the ways we're going to close the Moore's Law gap is by just doing better as architects. I grew up in a time when, if you wanted to get an ISCA paper, you needed fifteen percent, and you could get an ISCA paper. I remember people quoting the fifteen percent: you've got to have fifteen percent speedup. It's going to be tough to close that 10x with fifteen percent. It's even tougher if you're at Intel; at Intel they always taught us, I need one percent speedup for one percent area. Oh my gosh, that's a long way to go to get that 10x out of one percent increments. Now, why did they want this rule? Because it allowed performance to track density changes: a perfect one percent for one percent. Since I started C-FAR, I've been saying to everybody: your idea has to deliver 2x or more, or let somebody else pay for it; you can do that work, just don't do it in my center. And I've really seen a dramatic shift: I now have lots and lots of ideas that are up there beyond the 2x. So let me show you a couple. Here's one of my favorites; this is from David Brooks at Harvard, the HELIX-UP work. It appeared in HPCA about a year ago. Now, I love this work because it's a compiler optimization that provides 2x. In the hardware world we have Moore's Law: we expect density, and traditionally performance, to scale at a rate of 2x every 18 months. In the compiler world they have what's called Proebsting's Law, which says that the performance of a compiled program will double with compiler technology every 18 years. They don't expect a lot from the compiler in terms of performance. But here's a way you can get your compiler to give you a 2x. This is a parallelizing compiler, this HELIX-UP compiler: it takes a serial program and decomposes it into threads. One of the challenges there is
trying to find, at compile time, where all the dependencies between these threads are. So what this does is profiling: it tries to find where there are no dependencies between possible points in the program, as many other parallelizing compilers do. And then it does one other thing that other parallelizing compilers have not yet done: it tries to eliminate dependencies that it sees, and then looks at the impact that has on the quality of the program's output. The question it's asking is: are there dependencies which are slowing my program, but which, when I eliminate them, have minimal impact on the correctness of the result? And the answer is, there are a number of such dependencies, and when you eliminate them you get roughly a 2x speedup over parallelizing compilers, with about a four percent reduction in the quality of the output. So if you set the bar higher, people will do more dramatic things and generate more understanding of how to achieve scalability through innovation. Here's some other work, by Kevin Skadron at the University of Virginia, where he wants to do efficient data mining algorithms, and he's doing that by integrating them onto a DRAM chip, so he gets massive parallelism and reduces the latency of accessing his very large data sets. He did this by working with Micron: they integrated finite state automata into the row buffers of a DRAM, and they compute with those FSMs to implement the data mining algorithms. They get 90x over a CPU, but they get 2x and greater over the best known algorithms on CMPs and GPUs. Projects like this have driven a resurgence in near-memory computing; there's another opportunity to provide a lot of value. The second recommendation I have is: let's reduce the cost of designing custom hardware. That's really all about high-level synthesis, which is a technology that has been around for a long time, but now we've got to get serious about it; serious in the sense that we invest in
high level synthesis serious in the sense that architects start using high level synthesis here's an example of a project that's getting traction on both sides this is the aladdin project again from david brooks at harvard where he'll take c code descriptions of the algorithms combined with a variety of architectural constraints that this particular accelerator has to live under and then this will do a design space search analysis to find the Pareto optimal designs under those constraints it does almost as good as hand-built designs and very soon these technologies will surpass what humans can do just like that happen in the compiler world in addition to this you're also going to need if you want to design effectively and inexpensively we're going to need good benchmarks to help us understand what are tomorrow's killer applications and a great example this is the cortex suite from mike taylor at UC san diego the cortex week brings together a lot of next generation algorithms you'd normally associate with humans doing by hand these are starting to move into future next generation algorithms things like learning and and and selection language processing computer vision computational photography a variety of very cool and next generation algorithms Michael's doing a really great job maintaining this both in that he tries to make it as simple as possible to allow people to refactor these codes and to wrangle them in a way that it works well with your particular designs as well as keeping it evergreen in the last two years he said he's added four new algorithms to this suite and he'll continue to keep this evergreen a really great benchmark suite I recommend people take a look at the third thing I want to encourage our community to do is to start thinking seriously about embracing open-source concepts we need to have hardware IP that is free when I talk to my colleagues in industry this is really an anathema to what they see as the future of computing they want their IP 
to be locked down and sold; that's how we're going to make our money, selling IP. Well, what I want to tell you is that a successful design community is more sophisticated than that. It says that there's a line in the sand, where everything behind that line is stuff we don't have to pay for anymore. Why do we have to pay for a USB 3.0 interface in hardware? That's old technology that everybody spends money on perfecting again and again. Why can't we just have one that we built and share, and if we want to make changes to it, we can do that? So we need a line where everything below that line is free, and everything above that line is advancement and deserves to be paid for. This is what they do in the software world, and it works really well for them: you can build a lot of infrastructure starting with a significant amount of free and open-source intellectual property.

OK, so let's do a little thought experiment to see how bad things are in the hardware world today compared to the software world. Let's say you want to build the next great smartphone. You're going to need a hardware SoC, and then on that SoC you're going to need software to run. So what are we going to do? We're going to build our hardware SoC. This is my big SoC: I've got a couple of ARM CPUs in here, I've got a video coder-decoder, I've got a GPGPU, I've got some audio processing, I've got some encryption, I've got some screen drivers in there as well. Now what I'm going to do is mark everything that we need to build from scratch or purchase from another vendor in red, and everything that's free and open source in green. And let's see: yeah, everything we've got to pay for or build ourselves. Nothing is free; no IP is free, none, zero.

Now when we get done with this SoC, we're going to slap some software on it, so let's use Android; that's a really popular target. Android has applications, the application framework, the libraries, the runtime, the Linux kernel. All right, let's see what we have to build ourselves. Well, we're going to have to build a couple of drivers for stuff that's specialized on our board, and we're going to put our special sauce on there: maybe we've got our own storefront, or we're selling into the government market and we've got our own encryption on the internet, or whatever; there's some little special something you might add there. And let's see what's free. Oh, massive amounts. Millions and millions of hours of effort in this IP: free, shared, everyone contributes, it's below the line. We don't have to pay for mail tools, we don't have to pay for kernels, we don't have to pay for Java runtimes, we don't have to pay for Java just-in-time compilers; it's all free. We're building on top of that and creating IP on top of that. Hardware needs to move into that world. We need to decide what needs to be free and what doesn't need to be free; we need to at least ask that question.

And there are people doing this. Open-source hardware is growing. We've got the Arduino now; it's open source in the system-level sense, and it's gained a lot of traction by having that open, free definition that people can build to and compete to sell you. We even have open-source architectures, like OpenRISC. Unfortunately OpenRISC is GPL, and companies hate the GPL; they're afraid of it, because it has this copyleft provision where anything it touches could potentially fall under that same license, and I think that license in particular has really hurt the adoptability of the OpenRISC architecture. From Berkeley there's RISC-V, which is supported by Krste Asanović, who is part of C-FAR, and this is an open-source, free architecture: the instruction set is open source and free, as are a variety of its implementations, a Linux port, a GCC compiler port, etc. I've got a lot of hope for this technology, and it's really seriously being considered by a lot of upstarts in the hardware business, like China and India; they've done a lot to invest in it. So there is some hope that we may have some free and open-source infrastructure in the hardware world, and I think we definitely do need it.

Fourth, we need to widen the applicability of customized hardware, and that seems like a conundrum, right? How do we make something custom that we can reuse over and over and over again? Well, we've already seen this happen, with things like GPGPUs. Originally GPUs targeted graphics, and then some very brilliant person said, hey, why don't we make it a GPGPU and target linear algebra? Now all of a sudden it makes sense to use this over very wide domains, and the trick there is moving from application to algorithm. That's the key to widening the applicability of customized architectures. This is Krste Asanović's project at Berkeley called ESP, Ensembles of Specialized Processors, where they take applications and then, using compiler tools, identify the computational patterns, the algorithms that are inside of these codes, and then map those algorithms to a set of computational resources that don't target a specific application but target application classes: things like dense linear algebra, sparse linear algebra, graph analysis, ILP extraction, homogeneous parallel architectures, whatever. They have at this point identified 17 application classes that cover a very broad set of applications. With this there's a hope of putting an upper bound on the number of accelerators we need, and so now design doesn't require creating these individual components, but instead allows us to focus on how many of the components we need, how we compose them together, how they communicate, how we manage storage, and how we map software efficiently to those components. And then we're back to
cheaper hardware design. But one of the great challenges with hardware design is you always end up going to the fab, right? And when you go to the fab, they're going to want 20 million dollars to produce the masks for your 20 nanometer chip. So how do you get out of that? I think the key is that we need to give silicon its personality not in the fab but in the assembly plant. Martha Kim from Columbia wrote her thesis on this topic; I worked with her on it. She created what she called brick-and-mortar silicon. Brick-and-mortar silicon is a form of silicon design where the way you create your design is you purchase components in silicon form and place them on a two-dimensional surface: for example, I've got a CPU, a memory block, an FPGA block, a DSP, a specialized graph processor, etc. These components are delivered not only as physical hardware; they're also delivered with software. These are mature components. Then, in the third dimension, I create a polymorphic interconnect which allows them to communicate efficiently with each other. Now what you end up with is something that's about ten percent slower than an ASIC, but you didn't have to design all this stuff. You were able to purchase it and put it together without ever even talking to a fab; you're just doing assembly. You've got some robotic capability, or you farm it out to someone who does, and you've built yourself a custom piece of silicon that's pretty close to what you could have done with an ASIC, at a minuscule fraction of the cost.

Now how do we get over those huge 20-million-dollar fab costs? Well, those are paid for by whoever designed these individual components, and they can amortize that 20 million not just across this design but across any design in the world that utilizes that component. And you create a market for competition on components: who has the best graph processor, which one is most energy-efficient? That's the one I want to integrate into my particular design. So in this world you're creating value in the form of system architecture, using components that are really defined by what the architects saw on the previous slide: what are the composable components that allow us to build the maximum number of applications? And if there's something you just don't know how to build, you can throw your FPGA block on there, or you can go full custom and pay your mask costs. It's a really different approach to design that will significantly reduce its cost. Fortunately, we're already starting to see technologies that enable this: interposers, 3D integration. The problem is that those technologies today are very boutique: they're expensive, they're unreliable. So one of the areas where we need investment is in how we make those technologies reliable and inexpensive, and I fully believe that architects can do that, no problem.

So to conclude, here's the problem: we've got this Moore's law scaling gap that exists because we're not innovating effectively to close it, and we've got this dark silicon problem, a lack of instruction-level parallelism, and transistors that aren't getting any faster because Dennard scaling has left us. Silicon is not helping as much. How do we close that gap? We're going to close it with heterogeneous parallel designs, customized parallel architectures. The problem, though, is we can't afford that in today's world. You can't build a customized architecture and then pay 110 million dollars to bring it to market; you're never going to make your money back, and no one's ever going to give you that third round of funding to get to market. We need to figure out how to get to market for half a million dollars. So: first, let's build designs that are more valuable; second, let's reduce the cost of doing that design; third, let's get some open-source concepts happening in our community so we don't have to keep paying for every
single piece of silicon again and again for the next 50 years; when we're designing, there should be stuff that's just plain free. Fourth, we need to widen the applicability of customization by targeting not applications but algorithms, and once we do that we can lower the cost of manufacturing, because now we can create modularity at the hardware level and compose a system architecture out of cheap physical components. The NREs that are killing us will be embedded in those components, and they'll be amortized over all the systems in the world that might want that particular component. In the end you've got a good chance to restart nano-diversity, a good chance to create a lot more designs. Those designs are going to solve the problems of the Moore's law gap, but they're also going to create more jobs, more companies, and more students. Remember those CS and CE enrollment curves from before: enrollments are steady, even slightly dropping. If it's cool to build hardware, if you can make money and do exciting innovation in hardware, students will come back in droves. Ultimately what we want to do is not just scale our systems; we want to scale innovation. That's what we want to do, so that in the end we don't end up with just two forms, CPU or GPU; we end up with lots of diversity that we can handle. Any questions?

[Audience question about just growing caches]

Right, I've seen people say that one of the solutions to the dark silicon problem is to grow memory faster than you grow logic. I'll tell you, that's a real sleepy architecture world if we're just building bigger caches, and it is not going to fix the dark silicon problem: memory is going to go dark too. I gave a talk at DARPA a few weeks ago and there was an Intel exec there, Ian Young, a technology VP, and afterwards he gave his talk and basically said Intel is going to solve the dark silicon problem: they're going to restart the scalability of power as they shrink devices, again through better materials, TFETs and a couple of other things they know will be very energy-efficient. And you know what? If you embrace the ideals in this talk, your goal in life is to produce more efficient architectures, and if you're producing more efficient architectures and the dark silicon problem goes away, that just means you get more and more computation for your efficient little algorithms; you can go wider for everything you do. So I think that if the dark silicon problem goes away, that's only good for everybody, and we'll be well positioned to take advantage of it. I don't really see it going away, though.

[Audience question about whether "dark silicon" is the right label]

It's a good label. You're not going to be doing Bluetooth while you're doing Wi-Fi... well, actually, there are scenarios where we buy functionality: I do Bluetooth and Wi-Fi at the same time; I was watching YouTube videos in my car the other day, or my kid was, and I had it on the stereo.

[Audience question about whether this change will come from academia or industry]

Oh, I think it has to come from academia; I think it'll almost entirely come from academia. I have a science advisory board for C-FAR, composed of CTOs and high-level people within the companies, and it's very hard for me to communicate this vision to them. They love the idea of less competition. I've got a slide on this: look at the number of companies producing at each technology node, and it drops like Moore's law. They love the idea of being the last one standing, which is really bad, because there's no innovation when there's no competition. Having one fab in the world means that, at least on the technology side, everything dies. We might be able to regulate how much they charge for that resource, but there'll be no need to innovate anymore once they've captured the market. They do like one part of the brick-and-mortar idea, more people wanting to manufacture silicon; that's good for them, because Intel knows that at 10 nanometer they can't produce enough silicon themselves to amortize the cost of the fab. They have to have outside silicon going through that fab, which they naturally will, because IBM has gotten out of the market, so people will go looking for places to manufacture their silicon. But on the other hand, they don't like the idea of startups; they don't like the idea of creating technology that makes it easier for small companies to survive. That mindset seems very prehistoric to me. Are you so convinced you can create every technology you'll ever need that you're never going to need startups?

And then take the open-source stuff. I had a conversation with Chris Rowen, the guy that did Tensilica, talking to him about open source, and he thinks it's ridiculous: "we're going to make all our money selling IP." So I tried to explain: make free the bunch of IP you don't deserve to make money on anymore, because it's just too old, and then concentrate on trying to make money on new stuff. They don't get that at all. They're trying to outlive the other companies until they're the only one left that can sell IP to the two or three teams still designing, and that's crazy; they just don't get it. That's why I've been going to DARPA a lot, trying to convince them: you need to support this, because industry is never going to. I wrote a big NSF grant and did the same thing in my proposal; I talked about how the will to make this change will not come from industry; it'll definitely come from academia, and it's really our job to look past that.

I would also recommend people go look at Stephen Forrest's talk on the end of Moore's law; I think that's a really great talk as well. It's on YouTube; just search for Forrest and Moore's law. He paints an even more dour picture than I do. He thinks it'll pretty much ruin the US economy when Moore's law stops: one-third of our economy is based on Moore's law, on our ability to market electronic goods, and if we can't create value, then we risk a third of our GNP, which would be a big, big impact on the US. We'd have to start at least two more wars and drain Alaska of all of its oil until the end of time; imagine what they would do to try to replace that in our economy. It would be really bad. So I've been trying to get some economists interested in this problem; I think it's a really interesting problem to work on and postulate about in the U.S.
[Audience comment about Samsung's fabs and where design ends up]

Yeah, it's interesting. Mike Taylor gives a great talk (he calls it "the suck of SoC") about how the number of companies doing SoC design is dwindling very quickly, so pretty soon Samsung is going to determine what your SoC is. There's only one counterexample, one counter-trend, and that's nation-states. Nation-states are the only things that can afford to build fabs anew; that's China and India. Taxing every Indian would give you enough resources to build a fab; taxing every Chinese citizen would give you enough resources to build a fab. And once China and India get involved, those are two traditions of manufacturing that really focus on efficiency and cost, while the rest of the world is focusing on high margins. It's interesting to compare the way the marketing and protection of IP works in the microprocessor industry to the high-end fashion purse industry. It's very similar: we don't want people to counterfeit or overproduce our chips, because there's such a massive margin on the silicon that people who overproduce can resell at a fraction of the cost and still make a ton of money. High-end fashion purses are the same thing. So what do they do? They put really expensive zipper handles on them so you can identify them very quickly. What do they do on microprocessors? They put laser holograms on the outside of the chip so you can identify them very quickly in a market. It's sort of a captured market where the margins are ridiculous, and when China and India come in, it's going to devastate them, because they'll have so much room.

But look at my vision here: I love China and India driving those prices down, because I don't care about components anymore. Components are last century's architecture. The future is all about systems: system architecture, mapping software effectively to that system architecture, and creating value by composing components in a way that's efficient and cheap. That means design tools; that means getting applications people to work with architects, and creating new opportunities to sell systems. This idea is also difficult when I go to DARPA; they ask how that works, because they don't remember a time when you could do that. There are a few examples, from Samsung SoCs and Intel, but there's never been a time in our community when people just took components, put them together in different ways, and then tried to create new applications for those components. It's never happened in our lifetime, simply because it was too expensive to do design. If you can embrace technologies like assembly-time customization, and if you can perfect composable, algorithmic customization, then you could see a world where you would have cheap components, and you yourself could put them together in a way that creates a significant amount of value, in, say, a medical imaging application or an augmented reality application, and you wouldn't need millions and millions of dollars, hundreds of millions of dollars, to manufacture it. You'd just need one rich relative and maybe an angel round of funding, and you could get to a place where you could show somebody a piece of hardware that would be super impressive.

[Audience question: why not build at an older node like 130 nanometer, where the foundries have already recouped their setup costs and could accept smaller margins? And isn't there a trade-off, since any kind of composability sacrifices peak performance?]

Yeah, that's a really good thing to think about. You're basically saying, what if we could go back two or three generations? There's a trade-off there: losing speed by going back a generation versus sacrificing customization. That's interesting. So suppose we went back to, say, 65 nanometer; your point is that we've refined those technologies, so we could probably drive the cost down quite a bit, since they're bulldozing those old fabs anyway. I would still argue: don't do that, because you're still going to be tempted to design your own full-custom thing. What we really need to do, along with getting these costs in line (and only some of those costs are masks), is ask how we deal with the other costs. Well, the software costs can be addressed quite a bit with open source, and the hardware costs can be addressed with open source plus a market where I can buy components rather than having to build them from whole cloth all by myself. So I would still want this idea of modularity to happen even if we went back two or three technology generations. And something I think you're missing there is time. Think of Instagram: how long has NVIDIA been around? How long has Xilinx been around? How long did it take to create Instagram? The time factor of modularity, of other people having already done the work, is also a benefit; it's not all about money, it's about money and time, and in the end time equals money.

Anyway, stuff to think about when you go back and work on your research. I know a lot of people are moving into the heterogeneous customized architecture space; it's a very interesting one. I think for every effort we put into designing something for application X, we need to consider: would our time be better spent thinking about how to build infrastructure that could do what we just did? Because the reality is we've been doing designs for applications X, Y, and Z for the last 50 years. How do we now move to a point where we don't have to do custom components, where we don't specifically target a custom design to one application, but can instead build a system architecture that attacks that space very efficiently? Anyway, thank you very much for your attention.
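The Pareto-optimal design space search described for Aladdin above can be sketched in a few lines. This is a minimal illustration of the general technique, not Aladdin's actual cost model; the design points and the two objectives (power, latency) are hypothetical.

```python
# Sketch of a Pareto-optimal design-space search, in the spirit of
# high-level-synthesis tools like Aladdin. The designs and objectives
# below are hypothetical illustrations, not Aladdin's real model.

def dominates(a, b):
    """Design a dominates design b if a is no worse in every objective
    and strictly better in at least one (minimizing all objectives)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_frontier(designs):
    """Return the designs not dominated by any other candidate.
    Each design is an (objective1, objective2) tuple, e.g. (power, latency)."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other != d)]

# Hypothetical accelerator configurations: (power in mW, latency in us)
designs = [(10, 50), (20, 30), (15, 40), (30, 30), (25, 20)]
print(sorted(pareto_frontier(designs)))
```

Here (30, 30) is discarded because (20, 30) draws less power at the same latency; the remaining points form the frontier a designer would choose from under area or power constraints.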
