New Disruptive Microchip Technology and The Secret Plan of NVIDIA



NVIDIA has introduced a new optical chip, marking a huge shift from electricity to light for moving data in data centers, and this is a very important technology to understand because it will define the next decade in AI. I've spent the past decade designing chips, and now I'm building my own startup, but when I was starting out in chip design, no one cared about chips. Finally, it's a bright day for photonics. In this video I will break down this new optical breakthrough: how it works, why it matters, what it has to do with the new state-of-the-art NVIDIA Rubin GPU, and finally why NVIDIA is going quantum. Let me shed some light on it.

Let's start with the problem. It turns out that new reasoning models disrupt all previous projections for GPU demand. In the early days of AI, large language models were trained to predict the next word in a sentence, but things have changed: we are now seeing the rise of a new class of models, reasoning models like OpenAI's o1 or DeepSeek R1. These don't just generate responses, they perform multi-step thinking, and it turns out that reasoning is expensive. It requires at least 20 times more tokens per inference request, which is a result of the model talking to itself while it reasons. These models hold more context and often simulate multiple solutions before answering, and this requires about 100 times more compute compared to traditional one-shot LLMs. This is exactly what is driving the surge in demand for compute. In fact, it's no longer enough to have fast GPUs; you need a whole infrastructure that can support massive computation at scale.

And now we are coming to the most interesting part: the bottleneck in AI is not compute anymore, it's moving the data between chips. Just think about it: when you connect thousands of GPUs together in a large GPU cluster, each GPU heavily depends on the data that comes from its neighbors, and it's constantly passing results forward.
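To see why each GPU depends so heavily on its neighbors, here is a minimal single-process sketch of ring all-reduce, a common collective pattern for combining results across GPUs. This is my own illustration in plain Python; real clusters use libraries like NCCL, and the source doesn't say which collective scheme NVIDIA uses. Each of the n simulated GPUs only ever talks to its ring neighbor, yet after 2(n-1) steps every one of them holds the full sum:

```python
def ring_all_reduce(data):
    """Single-process sketch: sum equal-length vectors held by n 'GPUs' in a ring."""
    n = len(data)
    c = len(data[0]) // n                      # chunk length (assume divisible)
    # chunks[r][k] = rank r's current copy of chunk k
    chunks = [[list(d[k * c:(k + 1) * c]) for k in range(n)] for d in data]

    def step(chunk_of, combine):
        # Every rank sends one chunk to its right neighbour at the same time,
        # so snapshot all payloads before applying any of them.
        sends = [(r, chunk_of(r), list(chunks[r][chunk_of(r)])) for r in range(n)]
        for r, k, payload in sends:
            combine(chunks[(r + 1) % n][k], payload)

    def accumulate(dst, src):                  # reduce-scatter: add incoming chunk
        for i in range(c):
            dst[i] += src[i]

    def overwrite(dst, src):                   # all-gather: adopt the reduced chunk
        dst[:] = src

    for s in range(n - 1):
        step(lambda r: (r - s) % n, accumulate)
    for s in range(n - 1):
        step(lambda r: (r + 1 - s) % n, overwrite)

    return [sum(chunks[r], []) for r in range(n)]  # flatten each rank's chunks

gpus = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
print(ring_all_reduce(gpus)[0])   # → [1111, 2222, 3333, 4444] on every rank
```

Notice that every step is blocked on data arriving from the left neighbor, which is exactly why even small interconnect delays compound across thousands of GPUs.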
So even the smallest delay here adds up fast. The problem isn't just the physical distance between these GPUs, it's the physics itself, because until now the standard was copper, and sending data through copper is like running a marathon through sand: every electron faces resistance and wastes a lot of energy as heat. In big GPU clusters where thousands of GPUs constantly swap data, this adds up quickly, especially when we are talking about petabytes of information flowing every second. This is why modern AI data centers are not only about single-GPU performance but about network performance, and that's why for a long time we've been betting big on photonics, a technology that replaces copper wires with light-speed optical interconnects. This is the technology that will define the next decade in AI.

In today's data centers, network switches connect to optical transceivers, which are a type of translator: they translate electrical signals into optical ones and send them across the data center. However, inside the rack, most of the connections are still electrical. The brutal truth is that traditional copper wire is slow and extremely power-inefficient, because copper resists the flow of electricity, slowing data down and generating a lot of heat. In fact, in a modern data center, about 70% of total power consumption is spent on moving data, far more than on the actual compute.

NVIDIA and TSMC are of course aware of this problem and have been working to solve it for a while, and now NVIDIA has finally introduced this new optical chip, Quantum-X, with so-called co-packaged optics. The idea, in simple terms, is to use light instead of electricity to shuffle data between GPUs. Now, just to enlighten you as to why light is so attractive: when we use light, we can transmit lots of data in parallel using different wavelengths, or if you like, different colors of light,
simultaneously. Let me explain. Light operates at extremely high frequencies, 400 up to 750 terahertz for visible light, extending into the near-infrared spectrum. This is the so-called bandwidth, the range of frequencies a signal can occupy, and here we are talking about terahertz-scale bandwidth, which gives us many more channels to transmit data compared to electrical signals. Moreover, there is no resistance as in the case of copper, so it's simply faster, generates less heat, and uses less power per bit. Here is how it was announced on stage at GTC: okay, so first of all, we're announcing NVIDIA's first co-packaged optics silicon photonic system. It is the world's first 1.6 terabit per second CPO. It is based on a technology called the Micro Ring Resonator Modulator. It is completely built with this incredible process technology at TSMC that we've been working with for some time, and we partnered with a giant ecosystem of technology providers to invent what I'm about to show you. This is really crazy technology.

Now let's break down how this new technology works. We start by encoding data into tiny beams of light, photons. These beams are generated by integrated lasers and then fed into the new optical chip. Inside it there are tiny optical modulators implemented with a technology called the Micro Ring Modulator: basically a tiny ring structure that, when we apply an electric field to it, changes its resonant frequency, which in turn changes the intensity of the light passing through it. That's how we encode information into the light. To simplify the concept, imagine communicating by changing the rhythm of a blinking flashlight, but billions of times faster. After encoding, the photons travel through microscopic silicon pathways called waveguides, carrying a lot of information simultaneously. At the other end, tiny photodetectors grab the light and convert it back into electrical signals, which are then read by the GPU.
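The micro-ring trick described above can be sketched numerically: shifting a narrow resonance dip past a fixed laser wavelength toggles the transmitted intensity, which is how bits become light. This is a toy Lorentzian model of my own, with made-up wavelength, linewidth, and shift values, not NVIDIA's or TSMC's actual device parameters:

```python
# Toy model of a micro-ring modulator. The ring's transmission has a narrow
# Lorentzian dip at its resonant wavelength; an applied voltage shifts that
# resonance, so a fixed laser wavelength sees high or low intensity -> bits.

LASER_NM = 1310.0        # assumed laser wavelength (illustrative)
RES_NM = 1310.0          # ring resonance with no voltage applied
SHIFT_NM = 0.08          # assumed resonance shift when the field is applied
FWHM_NM = 0.05           # assumed linewidth of the resonance dip

def transmission(laser_nm, resonance_nm):
    """Fraction of light that passes the ring (1 = all, 0 = none)."""
    detune = laser_nm - resonance_nm
    half = FWHM_NM / 2
    dip = 1 / (1 + (detune / half) ** 2)       # Lorentzian, equals 1 on resonance
    return 1 - 0.95 * dip                      # assume 95% extinction on resonance

def modulate(bits):
    """Encode bits as light intensity: a 1 shifts the ring off resonance."""
    levels = []
    for b in bits:
        res = RES_NM + (SHIFT_NM if b else 0)
        levels.append(transmission(LASER_NM, res))
    return levels

levels = modulate([1, 0, 1, 1, 0])
decoded = [1 if p > 0.5 else 0 for p in levels]  # photodetector + threshold
print(decoded)                                   # → [1, 0, 1, 1, 0]
```

The real device does this billions of times per second per ring, and many rings at different wavelengths share one waveguide, which is where the parallelism comes from.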
On to the most interesting part: let's have a look at the Quantum-X. It's a photonic package that includes the controlling Quantum-X chip with a specialized ASIC (Application-Specific Integrated Circuit) designed to manage signal processing, support network protocols, and basically handle control and routing. In addition, there are 18 photonic engines.

When we look inside, the real breakthrough behind this technology is in how these chips are actually built and packaged. TSMC has developed a technology called COUPE (Compact Universal Photonic Engine). It combines photonic and electronic circuits using advanced 3D packaging, layering them one on top of the other. With this new technology, TSMC managed to fit it all into one package, but the tricky part is the manufacturing: TSMC has developed an entire foundry process that integrates a photonic chip built on a more mature process node with an electronic chip built on a state-of-the-art node. If we take a closer look, at the top we've got the electronic chip in 6nm, featuring 220 million transistors; think of it as the control center. Right at the bottom sits the photonic layer, a 65nm silicon chip loaded with about 1,000 devices, including Micro Ring Modulators, waveguides, and photodetectors.

Anticipating your questions about 65nm: it actually doesn't make sense to go to a few nanometers when we are dealing with optical components like waveguides, because, as we discussed in many previous episodes on photonics on this channel, photonic elements are inherently constrained by the wavelength of the light they manipulate, which is measured in hundreds of nanometers, so shrinking the transistor node buys you nothing there. So we have the photonic chip, and then the electronic chip that sits on top, on a CoWoS (Chip-on-Wafer-on-Substrate) 2.5D interposer, and these two layers are tightly integrated, just a few micrometers apart, so the signals zip between them very fast with almost no loss. Actually, I was very lucky to
get an opportunity to discuss this huge innovation with the amazing Gilad Shainer, Senior VP of Datacenter Networking at NVIDIA. What does it mean, what does it enable? So first, it reduces power: there is a 3.5x reduction in power consumption. So now we can actually bring more GPUs into the infrastructure, drive more compute, enable more outcome and more tokens, and all the greatness of what the GPU can bring; 3.5x is a big reduction in power consumption. The second thing is that now we don't need those transceivers on the scale-out network, so I'm saving millions of transceivers that I would have had to install, and I can get my data center fully operational much faster. Every month, every day actually, of an idle large data center costs a lot of money, so saving a month or two is an amazing benefit. And this co-packaged optics technology is very important because we have a limited power budget, so we can get more compute out of it.

Let me know your thoughts in the comments section below, and if you are among those 70% of people who are watching this video but are not subscribed to the channel, and in case you are enjoying it, consider subscribing: it makes me and my team very happy.

The first generation of TSMC's COUPE is set for mass production in the second half of 2026, and NVIDIA as well as AMD will be the first adopters. In fact, the new NVIDIA Rubin Ultra GPU, which is very interesting and which I will break down in a moment, is likely to be the first to debut with TSMC's COUPE technology, and it is entering mass production at the end of 2026. So the future looks bright, literally.

Now, before we break down the new NVIDIA Rubin Ultra GPU and why NVIDIA is going quantum: have you ever wondered how much of your personal information is floating around online? Your name, home address, phone number, even information about your family members: it all gets out there
thanks to data brokers that collect and sell your private information without you ever knowing. This exposes you to the risks of data breaches and threats to your personal security. We've all seen headlines where databases with millions of user records were leaked or sold online, and sadly this is happening more and more frequently. That's where Incogni, the sponsor of today's episode, comes in. Incogni helps you take control by removing your personal data from the databases that brokers rely on. I tried it myself and was surprised how simple it was: first you sign up and authorize Incogni to act on your behalf, and they send data protection law compliance requests to these companies, forcing them to remove your information from their databases. And the best part: you can track every step of the progress in real time, right from your dashboard. As someone who values privacy a lot, I highly recommend you try out Incogni; it's a simple way to put an end to unwanted spam emails and robocalls and keep your data off the grid. Check it out with my link below and use code INTECH. Thank you, Incogni, for sponsoring this episode.

Now, the Quantum-X and Spectrum-X chips are just the beginning, and soon co-packaged optics will become the new normal. I think in the next five years the innovation we discussed today will enable huge scale-out, scaling to multi-million-GPU AI factories. The next potential leap can be achieved with new materials, for example replacing silicon in modulators with lithium niobate or indium phosphide (it's impossible to remember this stuff). As for where it's all headed: of course, we eventually want to bring optics inside the GPUs themselves, for inter-chiplet communication, because this will enable the next big leap in performance. I'm personally a little bit obsessed with this topic, so when we had a small-group Q&A with Jensen, I could not let it go. I cannot wait to see this happening. In fact, many companies, like Broadcom, and
startups like Lightmatter and Ayar Labs, are working on bringing this technology to life. What's even more interesting, Lightmatter went one step further and started working on a photonic interconnect for 3D packaging; that's a topic for one of the next episodes, so subscribe to the channel to enjoy it in the future.

Now let's break down the next state-of-the-art AI GPU. The new Rubin GPU is named after Vera Rubin, the astronomer who found the key evidence for the existence of dark matter, and dark matter, as you may know, is believed to constitute over 80% of the universe's mass; space is one more topic I'm obsessed with. The new NVIDIA Rubin GPU is a double-die design, similar to the Blackwell GPU. It will be manufactured by TSMC on the N3P (3nm-class) process node and will feature two compute dies linked by IO chiplets. Here we expect a huge leap in performance, because the Rubin GPU is expected to deliver 50 PFLOPS of FP4 compute. FP4 is a 4-bit floating-point precision format that is becoming widely adopted in AI and machine learning workloads because it reduces memory and power requirements. And 50 PFLOPS is impressive: it's more than triple the performance of the latest Blackwell B300 GPU, and five times the performance of the latest AMD MI accelerator, which delivers roughly 10 PFLOPS.

Now, where does this boost in performance come from? As always, part of it comes from the process node upgrade: they are moving from N4 to N3, which gives a decent improvement in logic scaling and minimal improvement in memory scaling. The rest, of course, comes from architectural upgrades, but here NVIDIA hasn't shared enough details yet. I was very privileged to discuss what to expect with the amazing Shar Narisimhan, Director of Datacenter GPUs at NVIDIA. A lot of those improvements are architectural advancements, and we're not yet at the point to go into all of those details. Some of
that benefit comes from having a much larger NVLink domain. In the newest Rubin designs you can see it goes up to 576 GPUs, all fully interconnected in a single NVLink domain, and that allows all of these GPUs to talk to each other at very high speeds and very low latencies. So there's innovation on the NVLink side, there's innovation in the silicon design as well, and you also saw the NVIDIA Dynamo announcement, so we'll continue to innovate at the libraries level. We bring all of this together to deliver the type of performance improvements that you're seeing when it comes to new GPUs.

The Rubin Ultra GPU was definitely the most interesting announcement. If we take a look at the Rubin Ultra chip, it's an even larger design: here they are moving from two GPUs per package to four GPUs per package, and from 144 GPUs per NVLink domain to 576 GPUs per NVLink domain, so they're clearly scaling up before scaling out. The Rubin Ultra features four reticle-size GPUs linked by two IO chiplets and co-packaged with 16 high-bandwidth memory stacks using TSMC's CoWoS (Chip-on-Wafer-on-Substrate) packaging technology, and with that they get to 100 PFLOPS of FP4 compute. Knowing how much NVIDIA struggled with the Blackwell GPU at the interposer, it will be very interesting to see how NVIDIA and TSMC are going to address all the thermal and power challenges that come with the Rubin Ultra GPU. As Shar explained, there are many challenges. One of the biggest challenges that we solved, and you first saw this with the Blackwell architecture, is having a high-bandwidth interface that connects both of these adjacent dies together. That interface allows us to exchange data at 10 terabytes per second, so it's a very fast movement of data across the dies, and you really want to be in a situation where that dual-die design actually performs the same as a single die. So part of that is knowing intelligently which compute core is going to do the processing for the next step
in the calculations of the neural network. So we have intelligence baked into our libraries and our algorithms that allows us to bring data close to the adjacent compute cores. It's also what we call cache-coherent, so you have the right data sitting in the right memory location at the right time, just before it's actually going to be used in a subsequent calculation. These present enormous challenges: being able to have algorithms that predict what the next calculation is going to be, prefetching that data, and putting it in the right spot, so that you waste very little energy moving it into the appropriate compute core for that next calculation. And lastly, we also thank our partners at TSMC, who have helped with the manufacturing. There are obviously significant challenges when it comes to fabricating and packaging such a large die; there are many known issues when it comes to growing dies, and they have done an excellent job solving all of those, so we're very appreciative of their efforts as well.

So get ready for huge demands for power. In fact, we've already gotten used to seeing a massive surge in power densities from generation to generation. Just think about it: one Rubin Ultra rack will consume 600 kW of power, and to cool it down, NVIDIA engineered a special Kyber rack architecture. We are already beyond what air cooling can handle, and liquid cooling is becoming the new standard; it's just far more efficient, and it's built directly into the rack itself. Here NVIDIA is using advanced cold plates, which pull heat directly from the chip and transfer it to the circulating water. As Shar put it: there are many innovative techniques when it comes to actually conducting the heat away from the die itself. We've now gone to a direct liquid cooling architecture, where we have a plate directly on top of the die itself. We've also made a lot of other innovations when
it comes to liquid cooling that allow us to bring the entire rack into a very tight design. Like you've already seen in the Blackwell single-rack architecture, we're now putting 72 GPUs right next to each other in a single rack. So there's innovation in how we pump the coolant itself, being able to bring it directly on top of the die and carry the heat away, and in being very efficient about how we actually use compute. In a lot of cases we're building more efficient algorithms and processing: for example, our Transformer Engine lets us take calculations that the industry would historically have done in FP8 or FP16 and downcast them all the way to FP4, which on its own saves a lot of memory and compute. So not only are we innovating at the cooling level, introducing liquid cooling and making these incredibly dense racks, we're also innovating in the algorithms, so that you don't have to use the silicon in the same brute-force way as before.

And this is just one of many upgrades, as they're rethinking every layer of how we build and scale the data centers of the future. What's interesting is that NVIDIA's roadmap for the next few years is much more than just a list of GPUs; it's a layered plan for building entire AI systems at industrial scale. For now, they are still shipping Blackwell GPUs, but by the end of 2026 we will enter the Rubin phase, and here the GPUs are also getting the next generation of high-bandwidth memory. Finally, the Feynman generation, projected for 2028, will bring a next-generation GPU design and introduce the eighth generation of the NVLink switch, hinting at a possible architectural shift. It seems like every layer gets an upgrade every year, so customers have a reason to upgrade every year instead of the typical five-year GPU lifespan.

AI is becoming a general-purpose technology, and when we talk about the estimate that
datacenter CapEx will surpass $1 trillion by 2028, we might see it as investing in healthcare, manufacturing, finance, and energy all at the same time, with the ripple effects touching every aspect of the economy and society. Here, companies like TSMC, NVIDIA, Broadcom, Marvell, Google, and OpenAI, along with many startups, will capture a massive share of this outlay.

What's even more interesting is that NVIDIA has also turned an eye towards quantum computing, especially for tasks like simulating molecules or optimizing complex supply chains, where quantum approaches may offer a significant edge. As a first step, NVIDIA is opening a Quantum Research Center in Boston, and this looks like a long-term strategy to build a common quantum ecosystem: they will focus on quantum error correction and work on CUDA libraries for quantum algorithms. So why are they doing this? Quantum computing technology is still in the making, definitely a couple of breakthroughs away, but their idea is to build an ecosystem in advance, so that when quantum is ready, it can be seamlessly integrated into the existing NVIDIA infrastructure without disruption, just like they had everything ready right at the beginning of the AI boom. In my opinion, quantum computers will not replace classical computers but rather complement them for particular tasks, and this will require tight integration with GPU-based supercomputers, which is exactly what NVIDIA is getting ready for.

Now I want to wrap up this video with some of my key takeaways from the GTC conference and behind the scenes. I was lucky to attend GTC in person in San Jose, and honestly, it was beyond my expectations: it's one of the most interesting AI events I've ever attended, and it's very different from what we are used to at technical conferences like ISSCC. The first thing I want to mention: I find it really beautiful that technology is becoming so popular; the queue to the keynote was like seven miles long. I
love the fact that technology is no longer in the background, and that it's not only actors and singers but scientists, engineers, and tech executives like Jensen Huang who are becoming the new rock stars and inspiring millions. In fact, the GPU business today is not just about building chips anymore; it's about building AI infrastructure, and the key metric of success is performance per watt: how many tokens you can generate per second per watt for your users.

Let me know your thoughts in the comments, and let's connect on LinkedIn; I write there two to three times a week. And remember to check out the sponsor to support the channel. Thank you for watching till the end. Love you guys. Very tired of talking. See you in the next video. Ciao!
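As a closing footnote, the tokens-per-second-per-watt metric mentioned above is simple arithmetic. All numbers below are made-up illustrations, not measured figures for any real system; only the 600 kW rack budget comes from the video:

```python
# Performance-per-watt in one line of arithmetic. The throughput figure is
# hypothetical; the 600 kW figure is the quoted Rubin Ultra rack budget.

def tokens_per_second_per_watt(tokens_per_s, power_w):
    return tokens_per_s / power_w

rack_power_w = 600_000          # quoted Rubin Ultra rack power budget
throughput = 1_200_000          # hypothetical tokens/s for the whole rack
eff = tokens_per_second_per_watt(throughput, rack_power_w)
print(f"{eff:.1f} tokens/s per watt, {rack_power_w / throughput:.2f} J per token")
# → 2.0 tokens/s per watt, 0.50 J per token
```

The inverse, joules per token, is often the more intuitive way to compare systems: every watt saved on data movement is a watt available for generating tokens.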

2025-04-06 22:00

