New Chinese GPUs and the Truth about DeepSeek. NVIDIA is out?

The release of the Chinese DeepSeek R1 model caused a really big splash on the stock market, and this week the discussions around it continued. Meanwhile, new models and new Chinese GPUs made headlines. So in this video I want to focus on the impact of all of this on the GPU market, and why it's in fact a huge opportunity which may not happen again. I will also break down some very interesting new Chinese GPUs that are on the way right now. So if you want to know where the market is shifting, watch till the end.

In December 2024 the Chinese AI company DeepSeek released their V3 base model, which was extremely efficient, but no one paid any attention to it. At the end of January they released the reasoning model R1, which they claimed achieved performance comparable to OpenAI's o1, and this release just exploded. The first reason was its hardware training cost: what many people concluded is that the best NVIDIA GPUs may not be needed to make big strides in AI, and I think it's very important to discuss what's going on here.

First of all, High-Flyer, the hedge fund that founded DeepSeek, used to be one of the biggest NVIDIA customers in the Chinese market, purchasing tens of thousands of A100 and H100 GPUs, so there is no way they are a threat to NVIDIA. Still, the $6 million training-cost figure made all the headlines, because it's very low compared to OpenAI's estimated $100 million for a similar model, and then we all witnessed how the media, as always, messed up the whole story. If we look at the bigger picture, DeepSeek reportedly has access to roughly 50,000 GPUs. Among them are older A100 GPUs, but mostly different adaptations of the H100 Hopper GPU for the Chinese market, namely the H800 and H20 versions. What's interesting is that if we look at the specs, the H800 almost matches the peak performance of the H100 in most metrics; the biggest differences are in memory and NVLink bandwidth. In practice this means slower data movement between memory and the processing cores, as well as between GPUs, as discussed before. By now the H800 is also not allowed and only the H20 is permitted, which is quite funny, because the H20 is in fact preferable: it has more memory, and in 2024 NVIDIA sold roughly one million H20 GPUs to China. The next NVIDIA GPU to come to the Chinese market is the B20, a derivative of the B200 Blackwell GPU, but its exact specs are yet unknown.

When I saw the NVIDIA stock dropping, I thought: shopping time. Long term, I think this DeepSeek drama will only increase NVIDIA's valuation. Export controls will get stricter, and it will all come down to the fact that Chinese companies will have to move to the domestic options, which are getting better and better.

And now we are at the most interesting part: let's discuss which options they have and what is yet to come. In fact, DeepSeek-R1's reduced requirements on the compute side open the door to a lot of domestic hardware. Before the restrictions took effect, NVIDIA's share of the Chinese market was roughly 90%, but over the last few years Chinese companies have been working on getting a share of this pie, including Huawei, Alibaba, Moore Threads, Biren, Tencent, Enflame, Hygon and many more.

Among them, the most interesting story is Huawei. Their Ascend 910B is the most powerful GPU designed and manufactured in China, and it's in very high demand now. If you look at the official specs, its peak performance at 8-bit precision is 512 TeraFLOPS, so theoretically it has higher FLOPS than NVIDIA's H20 GPU. Huawei is now ramping up DeepSeek's R1 model on Huawei Cloud, which is partially built out of Ascend 910B GPUs. Huawei is challenging NVIDIA with a new chip for artificial intelligence: according to the Wall Street Journal, Huawei has reportedly told potential clients that the chip is comparable to NVIDIA's H100. At the same time, the new Ascend 910C GPU is in development; they've already manufactured the first samples and plan to ramp up mass production this year.

If you previously watched this video, you know that SMIC, the Chinese semiconductor manufacturing giant, is currently struggling with yield in its N+3 process, which is roughly at 20% now, and this number is far off from what is typically required to bring a product such as this GPU to mass production. If you want to know more details on this, subscribe to the channel now and watch that video right after this one.

Now, talking of the 910C: it is manufactured in SMIC's N+3 process node, which is roughly equivalent to TSMC's 6nm (N6) process, and it's rumored to be a doubled-die design, meaning two copies of the 910B silicon. This is very interesting for many reasons. First of all, it follows the general industry trend of building larger GPUs, because larger chips can handle more data simultaneously. It resembles the idea behind the latest NVIDIA Blackwell GPU, where we have two large dies which contain the core logic, linked by a very fast interconnect bridge; through this bridge one die communicates with the other, and every die is surrounded by four memory stacks. To package something as complex as this, NVIDIA uses an advanced Chip-on-Wafer-on-Substrate-L (CoWoS-L) packaging technology available from TSMC. Manufacturing this doubled-die design with such complex packaging is very challenging, because you have to align many, many pins, and you may have heard about NVIDIA delaying the release and shipment of their Blackwell GPUs due to manufacturing and thermal challenges. The secret sauce here is this special packaging and TSMC's huge experience, and they were eventually able to nail it down. This kind of advanced packaging, however, is not available at SMIC; in fact, SMIC does not support any of the advanced packaging technologies, including CoWoS. It will be interesting to see how SMIC is going to handle this. If instead the doubled-die design is done on a single piece of silicon, then no doubt they're going to struggle with yield, as they already struggle with yield even for smaller designs. Let me know what you think in the comments.

In fact, Huawei's GPUs, as well as those of many other Chinese domestic hardware companies we will discuss in a moment, all rely on SMIC's fab, which first of all has pretty limited capacity, often prioritized for Huawei products, and which also struggles with yields. Manufacturing yield is the percentage of chips which are successfully produced without defects and are usable in the final product. According to the last available reports from the end of last year, SMIC's yield in the N+2 process node was roughly 30%, and this is a really bad number, because it means 70% of the produced chips are defective and have to be scrapped, while the Ascend 910C will be made in the N+3 process node, which means a potentially even lower yield.

Another big challenge for China is memory: to be self-sufficient they need to fabricate high-bandwidth memory domestically, and they have no HBM manufacturing at the moment, but ChangXin Memory Technologies and Huawei are trying to solve this. Probably the most interesting part of the Huawei story is that they're not just designing their own silicon and building their own EDA tools (Electronic Design Automation tools, which support engineers in designing those chips); they are now buying manufacturing equipment, securing wafer manufacturing and memory manufacturing, and basically trying to cover the entire supply chain. This will help them achieve self-sufficiency, reduce reliance on SMIC's yields and capacity, and reduce dependency on foreign suppliers. But we all know that this is challenging to achieve, because many critical technologies and critical tools still rely on foreign suppliers.
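To make these yield numbers concrete, here is a toy calculation. The ~30% (N+2) and ~20% (N+3) yields are the figures from the reports discussed above; the wafer cost and dies-per-wafer count are invented assumptions purely for illustration.

```python
# Toy illustration of how manufacturing yield inflates the cost of each usable chip.
# The ~30% (N+2) and ~20% (N+3) yield figures come from the reports discussed above;
# the wafer cost and dies-per-wafer numbers are hypothetical.

def cost_per_good_die(wafer_cost, dies_per_wafer, yield_rate):
    """Cost carried by each defect-free die when only yield_rate of dies survive."""
    good_dies = dies_per_wafer * yield_rate
    return wafer_cost / good_dies

WAFER_COST = 10_000   # assumed cost of one processed wafer, USD (hypothetical)
DIES_PER_WAFER = 60   # assumed count of large GPU dies per wafer (hypothetical)

for label, y in [("N+2, ~30% yield", 0.30), ("N+3, ~20% yield", 0.20)]:
    print(f"{label}: ${cost_per_good_die(WAFER_COST, DIES_PER_WAFER, y):,.0f} per good die")
```

At 20% yield, each good die effectively carries the cost of five dies, which is why yields in this range are considered incompatible with mass production.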
Let me know your thoughts on this in the comments. Next, we will discuss the rest of the Chinese domestic GPU market and what's coming, and also the tricks DeepSeek used and why Mark Zuckerberg started it all.

But before this: as you may know, I'm building my own startup now, so I'm traveling a lot, meeting investors and customers, and when I travel I use public Wi-Fi that lacks security controls, making it easy for anyone to access it and potentially steal your private data, including sensitive information like login credentials, banking details and personal messages. As we saw recently, someone can just hijack your session and access your accounts without needing any credentials, and this is scary. That's where Surfshark VPN has been really helpful for me: it encrypts all the information sent between your devices and the internet, making it significantly harder for bad actors to mess with your personal data. The best part: Surfshark comes with Antivirus and Surfshark Alert, which notifies you immediately if your data has been compromised. I recommend you try out Surfshark VPN; it's an easy and affordable way to strengthen your online security. Go to surfshark.com/intech for 4 extra months of Surfshark. Thank you, Surfshark, for sponsoring this episode.

As you will see now, there is no shortage of NVIDIA competition in China, including many government-backed startups like Hygon, Moore Threads and Intellifusion. A very interesting player among them is Moore Threads; I made quite some effort inviting them to the channel, not successful yet, but stay tuned. Moore Threads is a Chinese startup that has been developing gaming and data-center GPUs. Their latest GPU, the S4000, is designed for AI acceleration in data centers; its peak performance is 200 TeraFLOPS at 8-bit precision and 100 TeraFLOPS at 16-bit precision, so it's not super impressive when we compare it to NVIDIA GPUs or Huawei GPUs,
but it might be just enough for a model with reduced compute requirements. By now they've already built multiple computing clusters with tens of thousands of their GPUs and used them for training, for example, of the 70-billion-parameter Aquila2 model. Their hardware also supports training and fine-tuning of all the mainstream models like Llama 3 and Qwen from the Alibaba group, and it already supports the distilled version of the DeepSeek-R1 model.

Now, how did DeepSeek manage to build a model which requires significantly less computing resources for both inference and training? In fact, they implemented several interesting tricks. The main tricks are reasoning and their clever implementation of the mixture-of-experts architecture, which allowed them to reduce the GPU compute requirement by a third. The idea is that the model is divided into sub-networks, so-called experts, and each of them is trained for a particular task on a particular data set: for example, one expert focuses on syntax while another specializes in semantic meaning. This is just like our brain might work: the frontal lobe is responsible for planning and decision-making, while the temporal lobe processes auditory information, and then we have the fusiform face area, which is great at recognizing faces. In a mixture-of-experts architecture, the equivalent would be an expert trained for facial-recognition tasks. This mixture of experts is connected to a so-called gating network, which takes an input and decides which expert is the most relevant to activate for this particular task, and this is in fact how they managed to significantly reduce the computational requirements.

This is the big difference from the Llama 3 model, which does not implement a mixture-of-experts architecture: it's a 405-billion-parameter model, meaning that for each token prediction it activates all 405 billion parameters. In comparison, DeepSeek-V3 has roughly 671 billion parameters, but for each token prediction they managed to activate only roughly 40 billion parameters.
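The gating idea described above can be sketched in a few lines of numpy. This is a toy illustration only: the expert count, dimensions and top-k value are arbitrary assumptions, and DeepSeek's real implementation is far more sophisticated.

```python
import numpy as np

# Toy sketch of mixture-of-experts routing (illustrative only; sizes are
# arbitrary assumptions, not DeepSeek's). A gating network scores all experts
# for an input, and only the top-k experts are actually run, so most
# parameters stay inactive for each token.
rng = np.random.default_rng(0)

N_EXPERTS, DIM, TOP_K = 8, 16, 2          # toy sizes (hypothetical)

experts = [rng.standard_normal((DIM, DIM)) for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((DIM, N_EXPERTS))

def moe_forward(x):
    scores = x @ gate_w                    # gating network scores every expert
    top = np.argsort(scores)[-TOP_K:]      # keep only the k most relevant experts
    w = np.exp(scores[top])
    w /= w.sum()                           # softmax over the selected experts
    # Only the chosen experts' weights are touched; the other 6 stay idle.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.standard_normal(DIM))
active = TOP_K * DIM * DIM                 # parameters actually used per token
total = N_EXPERTS * DIM * DIM              # parameters the model contains
print(f"activated {active} of {total} expert parameters ({active / total:.0%})")
```

At DeepSeek-V3 scale the same ratio shows up as roughly 40 billion of 671 billion parameters activated per token, as discussed above.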
So now just imagine: for each token prediction, for each forward pass, they activate roughly 10 times fewer parameters, and this is where the huge saving in compute comes from. This is very clever, but it's not entirely new; other AI labs have been implementing it as well. DeepSeek was just the first to combine all the tricks and implement the training of a model based on this architecture that efficiently.

Another trick was training the model at 8-bit precision from the very beginning: when you use fewer decimals in calculations, this helps you reduce training time, computing resources and memory usage. Again, not entirely new, many other labs have been doing it as well, but all these innovations coming together allowed them to reduce GPU resources. With that, they managed to train a model which is comparable to OpenAI's o1 and, on many benchmarks, similar to Gemini Flash 2.0, which was released just a week before but got no attention, because geopolitical aspects play a big role here.

The second thing which made DeepSeek so attractive is the open-source part. They released the open weights, which are essentially the output of the training process, and they are open and usable: you can download and modify them yourself, and the paper is very detailed; I will link it below. This immediately puts pressure on OpenAI, Anthropic, Google and the other AI labs. What I find interesting here: Mark Zuckerberg, love him or hate him, in fact disrupted the industry. You remember back in 2023 Llama 1 was leaked, and we all know these sorts of leaks, right? Starting with Llama 2 he officially open-sourced it, and he has kept doing it ever since. What Meta did with Llama was indeed disruptive and shifted the industry, and after that, DeepSeek was just a matter of time. Meta's strategy is to make LLMs a complement to Meta's products, so Mark basically decided to make them a commodity, and this is a
very smart move. Meta stays fully focused on their own core product, keeping users on Instagram and Facebook as long as possible, and their products benefit from LLMs, while for OpenAI and Anthropic, LLMs are at the core of their main product and business. It's a business strategy whereby you make complements of your core business a commodity. It appears counterintuitive, but essentially, reducing the price of a complement typically increases demand for your core product. NVIDIA did exactly the same with CUDA and all the models and software around their core product, around GPUs, and this drives up the value of that core product.

This whole story is actually about Google and OpenAI, as they are clearly in a red ocean, competing on LLM-based products. Google released its Deep Research feature, an AI feature to conduct comprehensive research on complex topics, and weeks later OpenAI released Deep Research and called it the same thing, so they are directly competing with each other. The general trend is that LLMs are getting better and better, cheaper and cheaper, reducing the gap between the free and the paid product. DeepSeek was inevitable, and considering the Chinese "Six Tigers", there is more to come. So where are we heading with all of this? It seems like LLMs are actually becoming a commodity; let me know what you think in the comments.

If we go back to NVIDIA: NVIDIA is under pressure, but not from DeepSeek; it's coming from their CUDA moat, because it's not clear how long it's going to last. If you're not familiar with CUDA: it is an entire ecosystem that allows AI researchers to program GPU clusters less as a distributed system and more like one giant GPU. CUDA is NVIDIA's moat; it's a complement to their hardware which drives the value of the hardware higher, and we don't know how long this moat will last unless they reinvent themselves, like they're now trying to do with Cosmos.

In fact, another trick the DeepSeek team used is that instead of using the high-level NVIDIA framework for GPU configuration, they used a lower-level, assembly-like language, PTX (Parallel Thread Execution), to reconfigure those GPUs. With that they managed to improve data compression and decompression, and they implemented a bunch of other configuration tricks for inter-GPU communication, which allowed them to further improve the overall efficiency of training. And just remember: DeepSeek was highly motivated to squeeze every bit of performance from the GPUs they have access to, because long term, the scarcity of resources makes maximum GPU utilization a necessity.

In any case, long term those premium hardware margins will have to come down, and compute will keep getting cheaper. And if you believe in this trend of throwing more and more compute at the problem, long term the one who can innovate and get access to lots of cheap energy will win. When we look at China, the cost of energy is lower than in the US, about 8 cents versus 13 cents per kilowatt-hour, but looking at their energy split, it's not looking good: still around 50% of the energy comes from burning coal and oil, which is very polluting, although the stated long-term strategy is to switch to renewables.

I think long term this is not about access to semiconductor manufacturing, EUV tools or talent; it's about access to cheap energy, and we will need tons of it. That's why companies like Meta are building natural-gas plants, and the next obvious step is nuclear power plants, but here we have to keep in mind how long it takes to build one: like a decade. I'm sure this was just the first release that got so much attention, but there is more to come. There are many interesting players in the Chinese AI market; at the moment the race is dominated by Alibaba and ByteDance, and then there are the Six Tigers, which are considered to be the leading AI labs in China, and as competition heats up, we can expect more breakthroughs from these players.
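To put the energy-cost comparison from earlier in rough numbers, here is a back-of-envelope sketch. Only the ~$0.08 vs ~$0.13 per kWh rates come from the discussion above; the cluster power draw and run length are invented assumptions purely for illustration.

```python
# Back-of-envelope electricity cost of a big training run at the two rates
# quoted above (~$0.08/kWh in China vs ~$0.13/kWh in the US). The cluster
# power draw and run length are hypothetical assumptions.

CLUSTER_POWER_MW = 20    # assumed average draw of a GPU training cluster (hypothetical)
TRAINING_DAYS = 60       # assumed duration of the run (hypothetical)

energy_kwh = CLUSTER_POWER_MW * 1_000 * 24 * TRAINING_DAYS   # MW -> kW, days -> hours

for region, rate_usd_per_kwh in [("China", 0.08), ("US", 0.13)]:
    cost = energy_kwh * rate_usd_per_kwh
    print(f"{region}: ${cost:,.0f} in electricity for the run")
```

Even with these made-up cluster numbers, the per-kWh gap alone produces a meaningful difference in cost per training run, which is the point about cheap energy as a long-term advantage.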
We can also expect strong responses from US- and EU-based AI labs, and let's hope for the best outcome for the whole world. Now, I'm looking forward to reading your comments, and if you watched this far, consider sharing this video with your friends and colleagues and on social media, and subscribe for more content like this to stay up to date with what's next in technology; it's free but makes me very happy. And a little update: I'm hiring a researcher onto my team, and the job description is in the description box below. Have a look, and if you feel you are a good fit, feel free to apply. Thank you, see you in the next episode, ciao.

2025-02-24 10:22
