NVIDIA has introduced a new optical chip, marking a huge shift from electricity to light for moving data in data centers, and this is a very important technology to understand because it will define the next decade in AI. I've spent the past decade designing chips and am now building my own startup, but when I was starting out in chip design, no one cared about chips; finally, it's a bright day for photonics. In this video I will break down this new optical breakthrough: how it works, why it matters, what it has to do with the new state-of-the-art NVIDIA Rubin GPU, and finally why NVIDIA is going quantum. Let me shed some light on it. Let's start with a problem: it turns out that new reasoning models disrupt all previous projections for GPU demand. In the early days of AI, large language models were trained to predict the next word in a sentence, but things have changed. We are now seeing the rise of a new class of models, reasoning models like OpenAI's o1 or DeepSeek R1, and these don't just generate responses, they perform multi-step thinking. And it turns out that reasoning is expensive: it requires at least 20 times more tokens per inference request, which is a result of the model talking to itself during reasoning. These models hold more context and often simulate multiple solutions before answering, and this requires about 100 times more compute compared to traditional one-shot LLMs. This is exactly what's driving the surge in demand for compute. In fact, it's no longer enough to have fast GPUs; you need a whole infrastructure that can support massive computation at scale. And now we are coming to the most interesting part: the bottleneck in AI is not compute anymore, it's moving the data between chips. Just think about it: when you connect thousands of GPUs together in a large GPU cluster, each GPU heavily depends on the data that comes from its neighbors, and it's constantly passing results forward, so even the smallest delay here adds up fast. The problem
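The 20x token multiplier just quoted can be turned into a quick back-of-the-envelope sketch. The rule of thumb that decoding one token costs roughly 2 FLOPs per model parameter is a standard approximation; the model size and baseline answer length below are illustrative assumptions, not figures from the video.

```python
# Back-of-the-envelope cost of a reasoning request vs a one-shot request.
# Rule of thumb: decoding one token costs ~2 * N FLOPs for an N-parameter
# model. Only the 20x token multiplier comes from the video; the model
# size and baseline token count are illustrative assumptions.

PARAMS = 70e9            # assumed model size: 70B parameters
BASELINE_TOKENS = 500    # assumed one-shot answer length
TOKEN_MULTIPLIER = 20    # reasoning models emit ~20x more tokens

def inference_flops(tokens: float, params: float = PARAMS) -> float:
    """Approximate decode FLOPs: ~2 FLOPs per parameter per token."""
    return 2.0 * params * tokens

one_shot = inference_flops(BASELINE_TOKENS)
reasoning = inference_flops(BASELINE_TOKENS * TOKEN_MULTIPLIER)
print(f"one-shot:  {one_shot:.1e} FLOPs")   # 7.0e+13
print(f"reasoning: {reasoning:.1e} FLOPs")  # 1.4e+15, 20x more
```

Longer contexts and simulating multiple candidate solutions push the real factor further, toward the ~100x compute figure mentioned above.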
here isn't just the physical distance between these GPUs, it's the physics itself, because until now the standard was copper, and sending data through copper is like running a marathon through sand: every electron faces resistance and wastes a lot of energy as heat. Just think about it: in big GPU clusters, where thousands of GPUs constantly swap data, this adds up quickly, especially when we are talking about petabytes of information flowing every second. This is why modern AI data centers are not only about single-GPU performance but about network performance, and that's why for a long time we've been betting big on photonics, technology that replaces copper wires with light-speed optical interconnects, and this is the technology that will define the next decade in AI. In today's data centers, network switches connect to optical transceivers, which act as translators, converting electrical signals into optical ones and sending them across the data center. However, inside the rack, most of the connections are still electrical. The brutal truth is that traditional copper wire is slow and extremely power-inefficient, because copper resists the flow of electricity, slowing data down and generating a lot of heat. In fact, in a modern data center, about 70% of total power consumption is spent on moving data, much more than on the actual compute. NVIDIA and TSMC are of course aware of this problem and have been working to solve it for a while, and so now NVIDIA has finally introduced this new optical chip, Quantum-X, with so-called co-packaged optics, and the idea here, in simple terms, is to use light instead of electricity to shuffle data between GPUs. Now, just to enlighten you as to why light is so attractive: when we use light, we can transmit lots of data in parallel using different wavelengths, or if you like, different colors of light, simultaneously. Let me explain: light operates at extremely high frequencies, 400 up to 750 terahertz for visible light, extending
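The "many parallel channels" idea above is wavelength-division multiplexing, and it can be sketched with simple arithmetic. The 4 THz usable band and 100 GHz channel spacing below are illustrative assumptions, roughly in line with the telecom C-band convention, not figures from the video.

```python
# Sketch of why optical bandwidth buys parallel channels (WDM).
# Assumed numbers: a 4 THz usable band divided into 100 GHz channel slots.

C = 3.0e8  # speed of light, m/s

def frequency_thz(wavelength_nm: float) -> float:
    """Carrier frequency of light at a given wavelength."""
    return C / (wavelength_nm * 1e-9) / 1e12

usable_band_ghz = 4_000     # assumed usable optical band, GHz
channel_spacing_ghz = 100   # assumed spacing per wavelength channel
channels = usable_band_ghz // channel_spacing_ghz

print(f"1310 nm carrier: {frequency_thz(1310):.0f} THz")  # ~229 THz
print(f"parallel channels in band: {channels}")           # 40
```

A copper trace, by contrast, carries one baseband electrical signal whose bandwidth tops out in the tens of gigahertz, which is why light wins so decisively on parallelism.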
into the near-infrared spectrum. This gives us so-called bandwidth, the range of frequencies a signal can occupy, and here we are talking about terahertz-scale bandwidth, which gives us many more channels to transmit data compared to electrical signals. Moreover, there is no resistance as in the case of copper, so it's simply faster, generates less heat, and uses less power per bit. "Okay, so first of all, we're announcing NVIDIA's first co-packaged optics silicon photonic system. It is the world's first 1.6 terabit per second CPO. It is based on a technology called Micro Ring Resonator Modulator. It is completely built with this incredible process technology at TSMC that we've been working with for some time, and we partnered with just a giant ecosystem of technology providers to invent what I'm about to show you. This is really crazy technology." Now let's break down how this new technology works. We start by encoding data into tiny beams of light, photons. These beams are generated by integrated lasers and then fed into the new optical chip. Inside the new optical chip there are tiny optical modulators implemented with a technology called the Micro Ring Modulator. Basically, this is a tiny ring structure that, when we apply an electric field to it, changes its resonant frequency, which in turn changes the intensity of the light passing through it, and that's how we can encode information into the light. To simplify this concept, imagine communicating by changing the rhythm of a blinking flashlight, but billions of times faster. After encoding, these photons travel through microscopic silicon pathways called waveguides, carrying a lot of information simultaneously. At the other end, tiny photodetectors grab the light and convert it back into electrical signals, which are then read by the GPU. Now to the most interesting part: let's have a look at the Quantum-X. It's a photonic package that includes the controlling Quantum-X chip with a specialized ASIC, an Application-Specific
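The micro-ring modulation principle described above can be sketched with a toy model: a ring resonator's transmission has a Lorentzian notch at its resonant frequency, and shifting that resonance with a voltage changes how much laser light gets through. All the numbers below (laser frequency, linewidth, resonance shift) are illustrative assumptions.

```python
# Toy model of a micro-ring modulator: the ring carves a Lorentzian
# notch into the transmission spectrum at its resonant frequency.
# Applying a voltage shifts the resonance, changing the transmitted
# intensity at the fixed laser frequency -- that change is the bit.

def transmission(laser_f_ghz: float, resonance_f_ghz: float,
                 linewidth_ghz: float = 10.0, depth: float = 0.95) -> float:
    """Fraction of laser light transmitted past the ring (0..1)."""
    detune = (laser_f_ghz - resonance_f_ghz) / linewidth_ghz
    return 1.0 - depth / (1.0 + detune * detune)

LASER = 193_000.0  # fixed laser frequency in GHz (~1550 nm, assumed)

# bit 0: ring on resonance -> light absorbed -> low output intensity
low = transmission(LASER, resonance_f_ghz=LASER)
# bit 1: voltage shifts resonance by 50 GHz -> light passes -> high output
high = transmission(LASER, resonance_f_ghz=LASER + 50.0)
print(f"bit 0 intensity: {low:.2f}, bit 1 intensity: {high:.2f}")
```

This is the flashlight analogy in miniature: the photodetector on the far end only needs to distinguish the low and high intensity levels to recover the bit stream.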
Integrated Circuit, designed to manage signal processing: it supports network protocols and basically does control and routing. In addition, there are 18 photonic engines. And when we look inside, the real breakthrough behind this new technology is in how these chips are actually built and packaged. TSMC has developed a technology called COUPE, Compact Universal Photonic Engine. It combines photonic and electronic circuits using advanced 3D packaging, layering them one on top of the other. With this new technology, TSMC managed to fit it all into one package, but the tricky part here is in fact manufacturing: TSMC has developed an entire foundry process where they integrate a photonic chip built in a more mature process node with an electronic chip built in a state-of-the-art node. If we take a closer look, at the top we've got the electronic chip in 6nm, featuring 220 million transistors; think of it as a control center. And right on the bottom sits the photonic layer, a 65nm silicon chip loaded with about 1,000 devices, including micro ring modulators, waveguides, and photodetectors. Anticipating your questions about 65nm: it actually doesn't make sense to go to a few nanometers when we are dealing with optical components like waveguides, because, as we discussed in many previous episodes on photonics on this channel, photonic elements are inherently constrained by the wavelength of the light they manipulate. So we have the photonic chip, and then the electronic chip that sits on top, on a CoWoS, Chip on Wafer on Substrate, 2.5D interposer, and these two layers are tightly integrated, just a few micrometers apart, so the signals zip between them very fast with no loss. Actually, I was very lucky to get an opportunity to discuss this huge innovation with the amazing Gilad Shainer, who is Senior VP of Datacenter Networking at NVIDIA: What does it mean, what does it enable? So first, it reduces power: there is a 3.5x reduction in power
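The wavelength argument for the 65nm photonic die above can be made concrete with one line of arithmetic: feature sizes in a waveguide are set by the wavelength of light inside silicon, not by transistor scaling. The refractive index and wavelength below are approximate, commonly cited values, used here only for illustration.

```python
# Why a 65 nm node is fine for the photonic die: waveguide dimensions
# scale with the wavelength of light in silicon, not with transistors.
# n_si is an approximate refractive index; numbers are illustrative.

WAVELENGTH_NM = 1310.0   # typical datacom wavelength (assumed)
N_SI = 3.5               # approximate refractive index of silicon

wavelength_in_si = WAVELENGTH_NM / N_SI
print(f"wavelength inside silicon: {wavelength_in_si:.0f} nm")  # ~374 nm
# A single-mode silicon waveguide core ends up a few hundred nm wide --
# orders of magnitude larger than 6 nm-class transistor features, so a
# mature 65 nm process resolves these structures comfortably.
```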
consumption, so now we can actually bring more GPUs into the infrastructure, and we can drive more compute and enable more outcome and more tokens to be used, and all the greatness of what the GPU can bring, and 3.5x is a big reduction in power consumption. The second thing is, now we don't need to use those transceivers on the scale-out network, so I'm saving millions of transceivers that I would need to install, so I can get my data center to be fully operational much faster, and every month, every day actually, on a large data center costs a lot of money if I don't use the data center, so a month or two, it's, you know, an amazing benefit. And this co-packaged optics technology is very important because we have a limited power budget, right, so we can get more compute out of it. Let me know your thoughts in the comments section below, and if you are among those 70% of people who are watching this video but not subscribed to the channel, and in case you are enjoying it, consider subscribing; this makes me and my team very happy. The first generation of TSMC's COUPE is set for mass production in the second half of 2026, and NVIDIA as well as AMD will be among the first adopters. In fact, the new NVIDIA Rubin Ultra GPU, which is very interesting and which I will break down in a moment, is likely to be the first to debut with TSMC's COUPE technology, and it is entering mass production at the end of 2026, so the future looks bright, literally. Now, before we break down the new NVIDIA Rubin Ultra GPU and why NVIDIA is going quantum: have you ever wondered how much of your personal information is floating around online? Your name, home address, phone number, even information about your family members. It all gets out there thanks to data brokers that collect and sell your private information without you ever knowing. This exposes you to the risks of data breaches and compromises your personal security. We've all seen headlines where databases with millions of user records were leaked or sold online, and sadly this is happening
more and more frequently. That's where Incogni, the sponsor of today's episode, comes in. Incogni helps you take control by removing your personal data from the databases that brokers rely on. I tried it myself and was surprised how simple it was: first you sign up and authorize Incogni to act on your behalf, and they send data protection law compliance requests to these companies, forcing them to remove your information from their databases. And the best part: you can track every step of the progress in real time, right from your dashboard. As someone who values privacy a lot, I highly recommend you try out Incogni; it's a simple way to put an end to unwanted spam emails and robocalls and just keep your data off the grid. Check it out with my link below and use code INTECH. Thank you, Incogni, for sponsoring this episode. Now, the Quantum-X and Spectrum-X chips are just the beginning, and soon co-packaged optics will become the new normal. I think in the next five years the innovation we discussed today will enable huge scale-out, scaling to multi-million-GPU AI factories, and the next potential leap can be achieved with new materials, for example replacing the silicon in modulators with lithium niobate or indium phosphide (it's impossible to remember this stuff). And as for where it's all headed: of course, we eventually want to bring optics within the GPUs themselves for inter-chiplet communication, because this will enable the next big leap in performance, and I'm personally a little bit obsessed with this topic, so when we had a small group Q&A with Jensen, I could not let it go. I cannot wait to see this happening. In fact, many companies like Broadcom and startups like Lightmatter and Ayar Labs are working on bringing this technology to life. What's even more interesting, Lightmatter went one step further and started working on a photonic interconnect for 3D packaging; this is a topic for one of the next episodes, so subscribe to the channel to enjoy it in the future. Now let's break down the next
state-of-the-art AI GPU. The new Rubin GPU is named after Vera Rubin, the astronomer who found the key evidence for the existence of dark matter, and dark matter, as you may know, is believed to constitute over 80% of the universe's mass, and space is one more topic I'm obsessed with. The new NVIDIA Rubin GPU is a double-die design, similar to the Blackwell GPU. It will be manufactured by TSMC on N3P, a 3nm process node, and will feature two compute dies linked by IO chiplets, and here we expect a huge leap in performance, because the Rubin GPU is expected to deliver 50 PFLOPs of FP4 compute. FP4 is a 4-bit floating-point precision format which is becoming widely adopted in AI and machine learning workloads because it reduces memory and power requirements, and 50 PFLOPs is impressive: it's more than triple the performance of the latest Blackwell B300 GPU, or five times the performance of the latest AMD MI accelerator, which is roughly 10 PFLOPs. Now, where is this boost in performance coming from? As always, part of it comes from the process node upgrade, as here they are moving from N4 to N3, which gives a decent improvement in logic scaling and minimal improvement in memory scaling; the rest, of course, comes from architectural upgrades, but here NVIDIA hasn't shared enough details yet. I was very privileged to discuss what to expect with the amazing Shar Narisimhan, who is a Director of Datacenter GPUs at NVIDIA: A lot of those improvements, we are making architectural advancements; we're not yet at the point to go into all of those details. Some of that benefit comes from having a much larger NVLink domain, so in the newest Rubin designs you can see it goes up to 576 GPUs, all fully interconnected in a single NVLink domain. That allows us to have all of these GPUs talking to each other at very high speeds and very low latencies. So there's innovation on the NVLink side, there's innovation in the silicon design as well, and you also saw the NVIDIA
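The FP4 format mentioned above is small enough to enumerate completely. The sketch below assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) with a subnormal at exponent zero, following the OCP microscaling-format convention; the video itself does not spell out the encoding, so treat this as an illustration of a typical FP4, not a confirmed NVIDIA spec.

```python
# Sketch of a 4-bit FP4 (E2M1) format: 1 sign bit, 2 exponent bits,
# 1 mantissa bit, subnormal at exponent 0, no infinities or NaNs.
# Assumed OCP-style convention, shown for illustration only.

def fp4_value(sign: int, exp: int, mant: int) -> float:
    """Decode a 4-bit E2M1 code into its real value."""
    if exp == 0:
        magnitude = mant * 0.5                      # subnormal: 0 or 0.5
    else:
        magnitude = (1 + mant * 0.5) * 2 ** (exp - 1)
    return -magnitude if sign else magnitude

# Enumerate every representable value (15 distinct, since +0 == -0).
values = sorted({fp4_value(s, e, m)
                 for s in (0, 1) for e in range(4) for m in range(2)})
print(values)  # 15 values spanning -6.0 ... 6.0
```

With only 15 distinct values, a tensor in FP4 takes a quarter of the memory and bandwidth of FP16, which is exactly why it is so attractive when moving data, not compute, is the bottleneck.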
Dynamo announcement, so we'll continue to make innovation at the libraries level, and we bring all of this together to deliver the type of performance improvements that you're seeing when it comes to new GPUs. The Rubin Ultra GPU was definitely the most interesting announcement. If we take a look at the Rubin Ultra chip, it's an even larger design: here they are moving from two GPUs per package to four GPUs per package, and from 144 GPUs per NVLink domain to 576 GPUs per NVLink domain, so they're clearly scaling up before scaling out. The Rubin Ultra features 4 reticle-size GPUs linked by 2 IO chiplets and co-packaged with 16 high-bandwidth memory stacks using CoWoS, Chip on Wafer on Substrate, packaging technology from TSMC, and with that they get to, well, 100 PFLOPs of FP4 compute. Knowing how much NVIDIA struggled with the Blackwell GPU at the interposer, it will be very interesting to see how NVIDIA and TSMC are going to address all the thermal and power challenges that come with this Rubin Ultra GPU: So there are many challenges. One of the biggest challenges that we solved, and you first saw this with the Blackwell architecture, is having a high-bandwidth interface that connects both of these adjacent die together. That interface allows us to exchange data at 10 terabytes per second, so it's a very fast movement of data across the die, and you really want to be in a situation where that dual-die design is actually performing the same as a single die. So part of that is knowing intelligently which compute core is going to do the processing for the next step in the calculations of the neural network, so we have intelligence baked into our libraries and our algorithms that allows us to bring data close to the adjacent compute cores. It's also what we call cache coherent, so you have the right data sitting in the right memory location at the right time, just before it's actually going to be used in a subsequent calculation. So these present enormous challenges: being able to have algorithms that
allow you to predict what's the next calculation that's going to take place, and going and prefetching that data and putting it in the right spot, so that you're wasting very little energy in moving it into the appropriate compute core for that next calculation. And lastly, we also thank our partners at TSMC, who have helped with the manufacturing. There are obviously significant challenges when it comes to fabricating and packaging such a large die; there are many known issues when it comes to growing die, and they have done an excellent job solving all of those, so we're very appreciative of their efforts as well. So get ready for huge demands for power. In fact, we've already gotten used to seeing a massive surge in power densities from generation to generation. Just think about it: one Rubin Ultra rack will consume 600 kW of power, and to cool it down, NVIDIA engineered a special Kyber rack architecture. We are already beyond what air cooling can handle, and now liquid cooling is becoming the new standard; it's just far more efficient, and it's built directly into the rack itself. Here NVIDIA is using advanced cold plates, which pull heat directly from the chip and transfer it to the circulating water: There are many innovative techniques when it comes to actually conducting the heat away from the die itself. We've now gone to a direct liquid cooling architecture, where we have a plate directly on top of the die itself. We've also made a lot of other innovations when it comes to liquid cooling that allow us to bring the entire rack into a very tight design; like you've already seen in the Blackwell single-rack architecture, we're now putting 72 GPUs right next to each other in a single rack. So it's innovation in terms of the cooling mechanism itself, being able to bring that directly on top of the die and remove the heat, and being very efficient about how we actually use compute itself. In a lot of cases we're making more efficient algorithms and
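The 600 kW rack figure above makes the case for liquid cooling on its own: a first-order energy balance, Q = ṁ·c_p·ΔT, tells you how much water has to flow through those cold plates. The 10 K coolant temperature rise is an assumed design value, not a figure from the video.

```python
# Rough sizing of the liquid cooling loop for a 600 kW rack: the water
# mass flow needed to carry the heat away, from Q = m_dot * c_p * dT.

RACK_POWER_W = 600_000      # Rubin Ultra rack power, from the video
CP_WATER = 4186.0           # specific heat of water, J/(kg*K)
DELTA_T = 10.0              # assumed coolant temperature rise, K

flow_kg_s = RACK_POWER_W / (CP_WATER * DELTA_T)
print(f"required water flow: {flow_kg_s:.1f} kg/s")  # ~14.3 kg/s
# That is roughly 860 liters per minute through a single rack -- a heat
# flux no practical airflow could match, hence the move away from air.
```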
processing; for example, our transformer engine allows us to take calculations that the industry in general would historically have done in FP8 or FP16, and we downcast that all the way down to FP4. That on its own saves a lot of memory and compute. And so not only are we innovating at the cooling level, where we're introducing liquid cooling and making these incredibly dense racks, we're also innovating in terms of the algorithms, so that you don't have to use the silicon in the same brute-force way as you might have had to before. And this is just one of many upgrades, as they're rethinking every layer of how we build and scale the data centers of the future. What's interesting, NVIDIA's roadmap for the next few years is way more than just a list of GPUs; it's a layered plan for building entire AI systems at industrial scale. For now they are still shipping Blackwell GPUs, but by the end of 2026 we will enter the Rubin phase, and here the GPUs are also getting the next generation of high-bandwidth memory. Finally, the Feynman generation, projected for 2028, will bring a next-generation GPU design and introduce the eighth generation of NVLink switch, hinting at a possible architectural shift. And here it seems like every layer gets an upgrade every year, so customers have a reason to upgrade every year instead of following the typical 5-year GPU lifespan. AI is becoming a general-purpose technology, and when we talk about the estimate that data center CapEx will surpass $1 trillion by 2028, that's like investing in healthcare, manufacturing, finance, and energy all at the same time, with the ripple effects touching every aspect of the economy and society, and here companies like TSMC, NVIDIA, Broadcom, Marvell, Google, and OpenAI, and many startups, will capture a massive share of this outlay. What's even more interesting is that NVIDIA has also turned an eye towards quantum computing, especially for tasks like simulating molecules or optimizing complex supply chains, where quantum approaches may offer a significant
edge. As a first step, NVIDIA is opening a Quantum Research Center in Boston, and this seems like a long-term strategy to build a common quantum ecosystem: to begin with, they will focus on quantum error correction and work on CUDA libraries for quantum algorithms. So why are they doing this? Quantum computing technology is still in the making, definitely a couple of breakthroughs away, but their idea is to build an ecosystem in advance, so that when quantum is ready, it can be seamlessly integrated into existing NVIDIA infrastructure without disruption, just like they had it ready right at the beginning of the AI boom. In my opinion, quantum computers will not replace classical computers but rather complement them for particular tasks, and this will require tight integration with GPU-based supercomputers, and this is exactly what NVIDIA is getting ready for. Now I want to wrap up this video with some of my key takeaways from the GTC conference and behind the scenes. I was lucky to attend GTC in person in San Jose, and honestly it was beyond my expectations; it's one of the most interesting AI events I've ever attended, and it's very different from what we're used to at technical conferences like ISSCC. The first thing I want to mention: I find it really beautiful that technology is becoming so popular. The queue to the keynote was like seven miles long. I love the fact that technology is no longer in the background, as it's not only actors and singers but scientists, engineers, and tech executives like Jensen Huang who are becoming the new rock stars and inspiring millions. In fact, the GPU business today is not just about building chips anymore; it's about building AI infrastructure, and the key metric of success is performance per watt: how many tokens you can generate per second per watt for your users. Let me know your thoughts in the comments, and let's connect over on LinkedIn; I write there two to three times a week. And remember to check out the sponsor to support the channel. Thank you
for watching till the end. Love you guys. Very tired of talking. See you in the next video. Ciao!
2025-04-06 22:00