3 New Groundbreaking Chips Explained: Outperforming Moore's Law


"We need bigger GPUs." Last week was an incredibly huge one in tech, and as a chip designer I'm beyond excited. So today we will look at the three hottest headlines of the week: the new Nvidia Blackwell GPU, and why Nvidia is going for larger chips and making some serious tradeoffs; then the new 4-trillion-transistor chip from Cerebras; and finally the new kind of analog chip we were all waiting for.

Nvidia is now at the top of the world. We've never seen such profitability from a hardware company, which is one of the reasons my investment portfolio looks so great. And now they've revealed their new Blackwell GPU: 208 billion transistors. If you look closely, you can see a thin line between two dies. This is the first time two dies have been joined together in such a way that they behave as a single chip. This new GPU provides four times the training performance and up to 30 times the inference performance of the previous generation, the Hopper GPU.

First of all, let's discuss how they managed to achieve this fourfold performance. As a first step, to double the performance, Nvidia had to double the area. That was an expensive decision, because the price of a chip essentially scales with its area, which in turn depends on the technology node and the volume. In fact, Nvidia had to keep using TSMC's N4P process. N4P is a refined version of N4 with a 6% (yes, just 6%) transistor density boost and 22% better energy efficiency over N4. Nvidia had to stay on this node because TSMC is currently struggling with its 3 nm process, specifically with achieving satisfactory yields, and this impacts not only Nvidia but also the roadmaps of AMD, Intel, and other chipmakers.

To maintain its competitive advantage, Nvidia had to introduce a dual-die design, packaged using TSMC's Chip-on-Wafer-on-Substrate (CoWoS-L) technology. This packaging technology integrates multiple dies side by side to achieve better interconnect density, enabling high-speed, high-bandwidth communication between the chips compared to conventional packaging methods. That's how they achieved what is effectively one piece of silicon.

Now, if we consider the dual-die design and the packaging, the fabrication cost of this GPU more than doubles compared to the previous Hopper GPU. So they will definitely not be getting their legendary 85% margins as they used to. They had to accept this painful tradeoff to maintain their competitive advantage because, as we will see, the competition is heating up.
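To see why doubling die area is so painful, here is a minimal back-of-the-envelope sketch in Python, assuming a simple Poisson yield model. The wafer cost, defect density, and edge-loss factor are illustrative assumptions, not TSMC figures; the point is only the shape of the curve:

```python
import math

def good_die_cost(die_area_mm2, wafer_cost, defect_density_per_mm2=0.001):
    """Estimate cost per good die on a 300 mm wafer using a simple
    Poisson yield model: yield = exp(-area * defect_density).
    All numbers here are illustrative assumptions, not foundry figures."""
    wafer_area = math.pi * (300 / 2) ** 2                    # ~70,686 mm^2
    dies_per_wafer = int(wafer_area * 0.85 / die_area_mm2)   # ~85% usable after edge loss
    yield_fraction = math.exp(-die_area_mm2 * defect_density_per_mm2)
    return wafer_cost / (dies_per_wafer * yield_fraction)

# Doubling a monolithic die more than doubles the cost per good die,
# because yield drops exponentially while the die count per wafer halves.
single = good_die_cost(die_area_mm2=800, wafer_cost=17_000)
double = good_die_cost(die_area_mm2=1600, wafer_cost=17_000)
print(f"~${single:,.0f} per good die vs ~${double:,.0f} -> {double / single:.2f}x")
```

At these assumed numbers, a monolithic double-size die roughly quadruples the cost per good die, which is exactly why stitching two reticle-sized dies together with advanced packaging can be the cheaper path, even though the overall cost still more than doubles.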

All the hyperscalers are now developing their own custom silicon: Amazon, Google, Meta, everyone is designing their own AI chips. As you know, AMD and Intel also want a piece of this pie, and startups like Cerebras and Groq have some solid alternatives too. So yes, Nvidia is definitely the leader in AI hardware and is making great efforts to stay that way, but the competition will not let them rest for a moment.

We've seen that doubling the silicon doubles the performance, but where does the second doubling come from? It definitely doesn't come from a new process node, but rather from a new number format: it comes from lowering the precision of the calculations. We can encode the same number in, say, 16 bits, 8 bits, or 4 bits; what changes is the precision. But for most calculations within a neural network, it's not essential to compute many digits of each number. The network can accomplish the same task at the same accuracy with a lower level of precision, and that's precisely the trick here. If we lower the precision of the calculation, say from 8-bit numbers to 4-bit numbers, we immediately save half of the memory, because smaller numbers require less energy to compute, less memory bandwidth, and the logic required to do the math takes up less silicon. The previous Hopper GPU used floating-point numbers down to 8-bit precision; with Blackwell, they've taken it one step further: in the new architecture, the matrix multiplication units do math with numbers just 4 bits wide. This is another area the performance improvement comes from. Honestly, 4 bits is quite low, and that makes me curious to see how well it's going to work for inference applications, for example. Let me know what you think in the comments.

To summarize: the improvement in performance comes from connecting two GPUs together, supporting the very low-precision FP4 format, a massive amount of high-bandwidth memory, and improved interconnect bandwidth. As simple as that.
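To make the precision trick concrete, here is a toy sketch of 4-bit quantization in Python. It uses a simple symmetric integer mapping with a single per-tensor scale; Blackwell's actual FP4 format is a floating-point encoding with finer-grained scaling, so treat this purely as an illustration of the memory math:

```python
import numpy as np

def quantize_to_int4(weights):
    """Toy symmetric 4-bit quantization: map float weights onto the 16
    integer levels [-8, 7] with one per-tensor scale. Real FP4 formats
    use floating-point encodings and block-wise scales; this is a sketch."""
    scale = np.max(np.abs(weights)) / 7.0            # largest weight maps to level 7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=4096).astype(np.float32)
q, s = quantize_to_int4(w)
w_hat = dequantize(q, s)

print("mean abs rounding error:", np.mean(np.abs(w - w_hat)))
# Two 4-bit values pack into one byte: half the footprint of 8-bit weights.
print("bytes at 8-bit:", w.size, "| bytes at 4-bit:", w.size // 2)
```

Two 4-bit values fit in one byte, so weight storage and memory bandwidth are halved relative to FP8, at the cost of a much coarser grid of representable values.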
This GPU, and the DGX SuperPOD supercomputer built out of it, will be available for sale later this year. In one of the interviews Jensen Huang said they're going to price it somewhere between $30,000 and $40,000, and I have my doubts about this: the H100 was selling for about $40,000 last year, so Blackwell is likely to be priced higher than that. For now, I'm really looking forward to seeing real-world benchmarks.

We are discussing different AI chips today, but just to give you a feeling for how high the demand for AI infrastructure is, here is a recent quote from TSMC's founder Morris Chang: we are not talking about tens or hundreds of thousands of wafers, but about building three, five, or ten fabs.

"But we need bigger GPUs." Now let's discuss the new, stunning 4-trillion-transistor chip from Cerebras. This one is pretty unique, and they are crushing Moore's law. Since the advent of the microprocessor in the early 1970s, the semiconductor industry has followed Moore's law, which states that the number of transistors on a chip doubles roughly every two years. As you can see from this plot, Cerebras seems to be outperforming this law, which many had believed was no longer applicable. Their previous chip was fabricated at 7 nm by TSMC, and the new one, the Wafer Scale Engine 3, is at 5 nm. The number of transistors on the chip has grown substantially over the previous generation (from 2.6 trillion to 4 trillion) thanks to the technology node upgrade. But as we know, a huge success of this chip is also a success for TSMC, which is able to fabricate such a gigantic chip at 5 nm with high yield.

One of the reasons Cerebras has been successful over the last years is that they do things differently. A silicon wafer can typically accommodate many chips, and that's what AMD, Nvidia, and Intel do: they cut a 300 mm (12-inch) wafer into, say, 65 GPUs. Cerebras instead takes the whole wafer and makes a single giant chip out of it. To give you a feeling for the scale: this is the new Cerebras chip next to the Nvidia H100 GPU. It's 56 times larger than the H100.

Amidst the ongoing AI boom there are many promising tech startups you may like to invest in, such as Cerebras, but the problem is that investing in private equity is generally not easy. Linqto removes these barriers, making access to private markets simple and open to everyone. Through the Linqto platform you can invest in some of the most promising AI tech startups I've discussed on my channel, such as Lightmatter, the photonics AI startup I discussed in the previous episode. In addition, you can invest in SambaNova, SparkCognition, and others; you can check out the full list of startups on their website. If you're interested in investing in the future of artificial intelligence, consider starting your private equity portfolio today using the link below. By using the code ANASTASI500 you will receive a $500 discount on your first investment; the code is valid for 30 days only. Thank you, Linqto, for sponsoring this video.

"The rate at which we're advancing computing is insane, and it's still not fast enough, so we built another chip. Hopper is fantastic, but we need bigger GPUs." Going for larger silicon is such a great idea, and it totally makes sense for today's AI workloads; Cerebras was doing it before it became mainstream. It's beneficial because many GPUs have to be used for a single AI task, and interconnecting them and distributing the load is complex and expensive. By having one giant chip you can significantly reduce that cost and complexity.

This new Cerebras chip features nearly one million AI cores (900,000, to be precise) and 44 GB of memory. And when it comes to memory, in this case it is on-chip memory, interleaved between the computing cores. This has exactly the same goal we've discussed in many of my previous videos: to keep the memory and the computing cores as close together as possible to reduce the bottleneck. That's another architectural difference from Nvidia and AMD GPUs, which have off-chip memory.

This new AI chip is designed to train the next generation of giant large language models with up to 24 trillion parameters. Just think about it: that's 10 times larger than OpenAI's GPT-4 and Google's Gemini. The next step is to connect 2,048 of these chips together to build an AI supercomputer, one capable of reaching a quarter of a zettaflop (a zettaflop is 10^21 FLOP/s). As one of my colleagues likes to say: oh dear. Such a machine could, for example, train a 70-billion-parameter Llama model from scratch in one day.
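We can sanity-check that claim with the common rule of thumb of roughly 6 FLOPs per parameter per training token. The token count and sustained utilization below are my assumptions, not Cerebras figures:

```python
# Back-of-the-envelope check of the "70B model in one day" claim,
# using the widely used ~6 * params * tokens estimate of training FLOPs.
params = 70e9            # 70 B parameters
tokens = 15e12           # ~15 T training tokens (a Llama-3-scale assumption)
train_flops = 6 * params * tokens            # ~6.3e24 FLOPs

cluster_flops = 0.25e21  # a quarter of a zettaFLOP/s = 2.5e20 FLOP/s peak
utilization = 0.4        # assumed fraction of peak actually sustained

seconds = train_flops / (cluster_flops * utilization)
print(f"~{seconds / 86400:.1f} days")        # ~0.7 days at these assumptions
```

At these assumptions the one-day figure is plausible, though the real number depends heavily on the token budget and how much of peak performance the cluster sustains.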
It's pretty clear that the trend is headed towards larger silicon, but here's the thing with larger silicon: whenever I talk about Cerebras, you always ask me about the yield and the defects, and you're totally right. The bigger the silicon gets, the greater the yield challenge, especially at small process nodes like sub-10 nm, because the transistor features become so fragile and so tiny that a single dust particle landing on a chip, or a single defect in the chip, can kill not just one transistor but a large part of the circuit. Can you imagine that? Obviously you cannot get 100% yield, and that would mean Cerebras would have to scrap every single wafer, which would have been a disaster. Yet Cerebras manages to sell every single chip they make, because whenever defects occur they have a workaround: a defective AI core can be bypassed in software and replaced with one of the redundant, so-called spare cores. This way you always get a configuration of 900,000 AI cores, with no wafers wasted.
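Here is a toy sketch of how such core redundancy can work in principle. Cerebras's actual test and routing scheme is proprietary, and the spare-core count and defect rate below are made up for illustration:

```python
import random

def build_core_map(logical_cores, spare_cores, defect_rate=1e-4, seed=0):
    """Toy model of wafer-scale defect tolerance: each physical core that
    tests as defective is mapped out and replaced by a spare, so software
    always sees the same logical core count. The real redundancy and
    routing fabric is proprietary; these numbers are illustrative."""
    rng = random.Random(seed)
    physical = logical_cores + spare_cores
    defective = {i for i in range(physical) if rng.random() < defect_rate}
    healthy_spares = (i for i in range(logical_cores, physical) if i not in defective)
    # Defective cores in the main array get a healthy spare; the rest map 1:1.
    core_map = {
        logical: next(healthy_spares) if logical in defective else logical
        for logical in range(logical_cores)
    }
    return core_map, defective

core_map, defective = build_core_map(logical_cores=900_000, spare_cores=10_000)
print(f"{len(defective)} defective cores; logical view still has {len(core_map):,} cores")
```

The key idea is that software only ever addresses logical cores, so a wafer with a few dozen dead cores still ships as a full 900,000-core part.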
Of course, Nvidia is facing the same challenge, which is a headache for TSMC, and that's the reason they didn't get to the 3 nm process: the yield is at, I don't know, 80%, so it's quite poor. Eventually they were able to find a tradeoff. Let me know what you think in the comments, and if you're enjoying this video, consider subscribing to the channel and sharing it with your friends and on social media; this helps the channel a lot. Thank you!

It's clear that AI is in desperate need of a hardware revolution, and everyone is looking for a type of architecture that can mimic the human brain, because our brain is still the most efficient engine for (non-artificial) intelligence. We've known for decades that analog can be much more energy efficient and area efficient than conventional digital chips. If so, then why haven't analog chips become mainstream yet? Because there is a plethora of problems; we've discussed them in my previous videos and we will also touch on them today. But the new EnCharge chip addresses most of them, and it also takes analog computing to a whole new level.

First of all, many computing tasks, and especially generative AI, require tons of memory to deal with the data and parameters of neural networks. These computing tasks are dominated by just a few basic operations that draw on memory, and the cost of accessing the memory can be orders of magnitude higher than the energy expended on the computing operation itself. Now, what if we could make these memory-intensive tasks more efficient, and by that make the overall system orders of magnitude more efficient? One of the emerging approaches addressing this memory bottleneck is near-memory or in-memory computing, which is usually implemented in an analog fashion. Analog means that instead of operating with digital signals, zeros and ones, on conventional transistors, analog chips work with continuous signals: a continuous signal can be anything between zero and one. We then use analog circuits consisting of, for example, resistors and capacitors. The new EnCharge chip takes this concept to a new level.

The key operation at the heart of AI programs is the so-called matrix multiply-accumulate operation; you may remember it from many of my previous videos. What happens is that the chip loads input values into memory and then multiplies these values by so-called weights. Many such multiplications are performed in parallel, and then the results are added up; this is known as the accumulate operation. There have been many attempts in the past to implement this operation in an analog way. For example, the Mythic chip, which I previously discussed, performs multiply-accumulate operations in an analog circuit using resistors and then sums up the currents at the output. However, various problems associated with noise, mismatch, and accuracy cropped up. Mythic has really struggled to solve these issues over the last years, and eventually they pivoted to a different application.

EnCharge's approach is different: their computing is carried out using charge-domain computation with metal capacitors, and I think it's a great idea. Let me explain. Instead of performing the entire matrix multiply-accumulate operation in analog, they perform the multiply operation in digital, with transistors, and then the accumulate operation is implemented in a very interesting way in analog, using capacitors. The trick is that instead of adding up currents at the output, they add up charge in a capacitor. They're basically accumulating charge, which is a great thing to do because it's easy and precise. Moreover, they're using capacitors that essentially come for free: the billions of transistors on a chip are interconnected with metal wires, which can be seen as a multi-level highway up to 10 or 20 layers deep, and in this chip they build the capacitors out of parts of this metal interconnect that sits on top of the transistors. The best part is that these metal capacitors are really easy to deal with: they have practically no temperature dependency, the component mismatch is low, and their size is very well controlled in a CMOS technology, so it's a good element in general. Better still, they're performing analog computing using digital CMOS technology, which is very advanced and easy to work with using all the EDA tools we have today.
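Here is a small behavioral model of this charge-domain idea, sketched in Python: multiply digitally, then accumulate by sharing charge across matched capacitors (Q = C * V). The capacitor mismatch figure is an assumption for illustration, not an EnCharge spec:

```python
import numpy as np

def charge_domain_mac(inputs, weights, mismatch_sigma=0.001, seed=1):
    """Behavioral sketch of a charge-domain multiply-accumulate: the
    multiplies are done digitally, each product drives a voltage onto its
    own metal capacitor (Q = C * V), and the accumulate happens by sharing
    charge across the capacitor bank. The 0.1% mismatch is an assumption."""
    rng = np.random.default_rng(seed)
    products = inputs * weights                                  # digital multiply
    caps = 1.0 + rng.normal(0.0, mismatch_sigma, products.size)  # normalized C_i
    # Charge sharing: V_out = sum(C_i * V_i) / sum(C_i), then scale back up.
    v_out = np.sum(caps * products) / np.sum(caps)
    return v_out * products.size                                 # analog accumulate

x = np.array([0.20, 0.70, 0.10, 0.90])
w = np.array([0.50, -0.30, 0.80, 0.10])
print("charge-domain MAC:", charge_domain_mac(x, w))  # close to 0.06
print("exact dot product:", float(x @ w))             # 0.06
```

Because metal capacitors match so well, the summed charge in this model stays very close to the exact dot product, which is the core of the precision argument, unlike current summing through resistors, where mismatch and noise hit much harder.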
They've already made a first prototype of this chip, which is reportedly showing a striking improvement in energy efficiency: it's capable of 150 trillion operations per second per watt, which is at least 20 times more energy efficient than previous analog chips like Mythic. On top of that, they've also built a software stack for it that manages all the access to memory, and their first commercial product is already coming later this year; I'm looking forward to it. As a first step they are targeting inference applications, which means taking an already pre-trained model and running it locally on the chip. Here the main goal is to make it more energy efficient, and that's exactly what analog computing is good for; it's so low power that you could put it into edge devices, for example into your phone. But afterwards, according to EnCharge, this approach can also be scaled to AI training. I really love this approach; when I read about it I thought, that's good, because in CMOS technology a capacitor is about the most reliable element you can get. In general, this approach takes the best of both worlds, analog and digital, and since it's based on digital technology it can also scale quite well. It's been a dark time, you could also call it a winter, for analog technology, but now it's getting warmer and spring is coming. As always, I'm looking forward to reading what you think about this technology in the comments.

I love this decade, the decade of technological acceleration, and I love making videos about it for you guys and building the community around this channel. Thank you for being a part of it. If you want to support me in creating these videos, you can check out the Patreon link in the description below, and also check out the sponsor. And if you want, let's connect on LinkedIn. Honestly, I never used it before, but I changed my mind, so if you want, you can scan this code and let's connect. Thank you so much, and I will see you in the next episode. Ciao!

2024-03-29
