NVIDIA DGX Spark: Your Personal AI Supercomputer | NVIDIA GTC 2025 Session

It is my pleasure to introduce Alan Bourgoyne. He is the Director of Product Marketing here at NVIDIA, and he's going to be talking to us about Project DIGITS. A warm round of applause for Alan. Thank you, Seppy. That $20 IOU is on the way; I'll give it to you after the show. So, I know it's Thursday afternoon, it's late. How many people are ready for a drink? I am. I could go for a couple of drinks, so hopefully we'll have an entertaining show today. Let's take a

look: DGX Spark, from Project DIGITS to personal AI supercomputer. At CES back in January, Jensen did a keynote and announced Project DIGITS, the Grace Blackwell AI supercomputer for your desk. It has a thousand AI teraflops, 128 gigs of memory, and runs our AI stack. It's very tiny; you can see the picture there, and it fits on your desktop. Project DIGITS was an homage to a system we released back in 2015, the DIGITS box. It was our first AI training box: a workstation with four GPUs and a big software stack, and it was the first box really dedicated to training. A year after that, we released our first DGX system. So, DIGITS was kind of the precursor to DGX, and the name was that homage. How many people saw the keynote?

How many of you saw it at the SAP Center? All right, if you didn't see it at the SAP Center, you missed something. There was a glitch in the stream, and I'm going to show you what you missed. Thirty million software engineers: there are 30 million of them around the world, and in the future, 100% of them are going to be AI-assisted. I'm certain of that. 100% of NVIDIA software engineers will be AI-assisted by the end of this year. So, AI agents will be everywhere. How they run, what enterprises run, and how we run it will be fundamentally different. We need a new line of computers, and this is what started it all. This is the NVIDIA DGX-1, with 20 CPU cores, 128 gigabytes of GPU memory, one petaflop of computation, and it cost $150,000, consuming 3,500 watts.

Let me now introduce you to the new DGX. This is NVIDIA's new DGX, and we call it DGX Spark. DGX Spark. You'll be surprised: 20 CPU cores. We partnered with MediaTek to build this for us. They did a fantastic job. It's been a great joy working with Rick Tsai and the MediaTek team.

I really appreciate their partnership. They built us a chip with chip-to-chip NVLink from CPU to GPU, and now the GPU has 128 gigabytes. This is fun: one petaflop. So, this is like the original DGX-1, shrunk with Pym particles. You would have thought that's a joke that would land at GTC. Okay, well, here's 30 million. There are 30 million software engineers in the world, and, you know, 10 to 20 million data scientists. This is now clearly the gear of choice. Thank you, Janine. Look at this. In every bag, this is what you should find. This is the development

platform of every software engineer in the world. If you have a family member, spouse, or someone you care about who's a software engineer, AI researcher, or just a data scientist, and you would like to give them the perfect Christmas present, tell me this isn't what they want. So, ladies and gentlemen, today we will let you reserve the first DGX Sparks for the attendees of GTC. Go reserve yours. You already have one of these, so now you just got to get one of these. You can imagine how excited we are. All of us have been spending months and months working on this project, and of course, when our moment to shine comes up, the stream glitches and you don't see that part. So, what can you say? But when Jensen started that talk, he said he doesn't have a script, no script, and he's not lying. The one part he definitely wasn't lying about is you

can reserve DIGITS. You can actually go online and reserve those. We're going to send you a notice next week to remind you, but you can go online and reserve yourself a system today. It is a new class of computers, right? It's designed from the ground up to run AI software. You saw the comparison; those numbers weren't an accident. What you saw from that DGX-1 to what we have today, the memory, the flops: it's designed for AI from the ground up. It runs our full AI-accelerated

software stack. Everything runs there. It's going to be available from OEM partners: ASUS, Dell, HP, Lenovo. You can actually go around the show and look at the ASUS, Dell, and HP boxes; you can see their prototypes. It's up on the NVIDIA Marketplace right now, and I just happen to have one of the prototype systems right here for us to take a look at. Maybe somebody will get a selfie after,

but this... I feel like the Lion King. There should be really dramatic music playing right now. Do we have that? Not even? Okay, I guess only the keynote gets that kind of production value, but thank you. There you go. There's one right there. So, why do we build these things? You saw a

lot of this in the keynote, right? AI is evolving rapidly. AI agents are kind of the big thing now. We want AI to help us do things, right? We want AI to collaborate with other AIs to do things for us. For example, a healthcare agent should be able to help the doctor diagnose things and pick out medicines. It's more than one model doing these things. Take travel: most of us are traveling this week. Say you have to extend a day. What do you have to do? You've got to call the hotel, the rental car place, and the airline. It'd be great if agents just did that for you, right? And helped you make those decisions and made those things happen. AI agents are getting very popular,

and businesses are using them because they're useful: they help people do their jobs more efficiently and more effectively. Beyond single-purpose point AIs, the rise of reasoning AI is significant. Jensen talked a lot about this: all of those new models where the AI not only goes through the model once but goes through it multiple times to try to find the best answer. That's DeepSeek, that's the reasoning models. And a lot of times, you can see them think. If you use Perplexity or some of the new models, you can actually see it thinking, and it's

generating a lot more tokens than before. So, when Jensen talked about the scaling laws, the axis on the bottom is very important. The more intelligent I make the model, the more it can reason, the more compute I need. Those scaling laws are out there, and so when we're making these kinds of systems, we need more powerful desktop systems. If you look at our local systems today: how many of you do your AI workloads at your desk? You can do it on your laptop, I'll give you that, or on your PC, but sometimes things don't work there. Maybe the model's too big, the workload's too big. I'm

trying to make an agent, and I want to run three or four models at the same time. That's a little bit too much for my laptop, so I have to go out to the cloud. Or maybe I want to use some software that doesn't run well there, or maybe it's not actually ported to my device. So, a lot of times,

you have to go off-device, and that's why we created this new class of AI computers. We want systems that are powerful enough to help you offload work and literally create your own personal cloud, your own personal data center, right on your desk. With systems like DGX Spark and DGX Station, our full accelerated software stack and all of your favorite tools out there, like PyTorch and Jupyter Notebook, everything just runs. So, your environment just comes over, and now you've got your own personal cloud. If we look at the software stack now, before

anybody cuts me to shreds: this is my marketing version of the software stack. I know, all you engineers, it's not 100%, but I tried to fit it on one slide so we can look at it a little closer together. Down at the hardware level, that's our GB10 Superchip. It's got the CPU, the GPU, encode and decode engines, our optical flow accelerator, and our ConnectX chip for networking. We've got our DGX OS. This system runs the same OS that our DGX is running in the data center, the exact same software. All of that's there: all the containers, everything we do to accelerate that kernel. And of course, the CUDA libraries; that's where a lot of the magic happens, right? That's how we get to the hardware and provide you the libraries and toolkits you need to accelerate all of the higher-level frameworks and all of your favorite tools that are out in the market today. And,

of course, the great tools we provide, like NVIDIA AI Blueprints and AI Workbench, are all supported thanks to the beauty of our software stack that just runs everywhere. This was kind of the design goal: take that software stack and leverage it. Look at that, I do have production values; I made the arrows move. You can write it once and move it wherever you want to go. You can work on your Spark, you can move to DGX, you can move to the cloud: basically, any accelerated infrastructure you've got. A tiny sketch of what that looks like in practice follows below.
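To make that concrete, here is a minimal sketch of ours (not from the session) of what device-agnostic PyTorch looks like; the same script runs unmodified on a Spark, a DGX node, or a cloud instance because it targets whatever CUDA device is present:

```python
import torch

# Pick up whatever accelerator the box exposes; nothing here is
# hard-coded to a particular machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)

with torch.no_grad():
    y = model(x)
print(f"ran on {device}: output shape {tuple(y.shape)}")
```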

And it's ARM-based, so all the ARM code is there. It runs; we develop and work through all the tools for you. So, if you're working toward a DGX, we all know how hard it is to get access to that system and that data center time. It's great that you can prototype, work out all the kinks, and test. That way, when you finally get your slice of time in the data center, you're a lot more confident. You don't have to experiment quite so much, because you used that Spark system on your desk, and it truly is your personal cloud. You can take all these workloads, run them on your Spark,

and have your systems connect to it. It's going to be really easy to install. Our out-of-the-box plan is to let you install it basically like you would install a thermostat in your house. You're going to hit its network and give it the information: what's the network I'm on? It has a wireless network and a wired network. You give it that information, it restarts itself, and now you've got a network-connected compute device. Really easy, simple. It's a full computer;

it's got ports so you can plug a mouse and  keyboard into it. It's got Bluetooth, so you   can wirelessly connect your mouse and keyboard.  It's got an HDMI port on the back, so you can   plug into it and use it as a standalone device  if you'd like to. But the beauty of it is,   either way, you've got your own personal cloud  that you can send all of your bigger jobs to.  I mentioned the ConnectX chip in there, and  that's a really important feature of this box.   It lets you connect two of these systems  together to basically form a little cluster. So,  

you've effectively doubled the memory and doubled your compute performance. Have it on the network next to your desktop system, and now you've got an even more powerful system. You can work with models of over 400 billion parameters with the combined memory from those two systems. I think I showed you the back of that thing; you can see those ports. They're the big giant

ports. If you go around to the back of the system, you'll see where those ConnectX cables connect. They're kind of big cables that plug into the back, and that's what gives you much higher performance than using the network. So, we'll allow you to form your own little mini cluster on your desktop if that's what you want to do. As far as workloads, it really is for AI developers: model prototyping and development. You can do your development work there. Maybe

you want to develop AI-augmented applications, create AI chatbots, or create all the agents we talked about earlier; maybe use some of our Blueprints to create those. Fine-tuning: you could fine-tune a model up to 70B on a single one of these Sparks. We're going to show a demo a little bit later; we actually did a little fine-tuning up front, and we'll show an example of inference. It's great for inference: you've done a little work on a model and you want to see if it does what you want it to do. Maybe you want your own

co-pilot. Maybe you want to train it with your own codebase, put it on your desk, and now you've got your own personal code co-pilot. The data is safe and secure; it never has to leave the building. I know companies would get very upset if you borrowed some cloud time and sent all your source code up there to train a model. The data stays there; it's local and doesn't have to go anywhere. You saw Jensen mention data science. Our full stack runs there, including our accelerated RAPIDS and cuDF libraries (a two-line taste of cuDF is sketched below). Everything is there. We also run all of our other libraries, so you can run Isaac to train robots, all our computer vision models, the VLM models, and more.
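As a taste of what that accelerated data science stack looks like, here is a small cuDF sketch (the file and column names are hypothetical, not from the session); the pandas-like API runs the load and the groupby on the GPU:

```python
import cudf  # part of RAPIDS

# Hypothetical dataset; cuDF mirrors the pandas API, so an existing
# pandas workflow often ports with little more than the import change.
df = cudf.read_csv("telemetry.csv")
summary = df.groupby("sensor_id")["reading"].mean()
print(summary.head())
```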

We're trying to install as much of this on the system as we can, but we want to keep the download palatable, so we can't put everything there. There's always that trade-off: how long you want to wait for an update versus how much we put on there. But the good news is, everything's there, and it's all free. You can join a developer program, sign up for NGC, download everything, and try it out on your own. A unique thing about Spark: if you look at all of our Grace Blackwell solutions in the data center today, they have Blackwell GPUs and Grace CPUs, but those Blackwell GPUs are really maximized for compute. In the data center, people don't care about

things like RT cores for ray tracing, or a couple of the other things we include for graphics. In Spark, they're actually there. You've got a full Blackwell GPU in a Spark, so it's got the RT cores, Tensor Cores, and CUDA cores. If you like, you can visualize on it as well. A lot of times, data

scientists will want to visualize on it. It's good  for compute. Maybe you want to do some simulation   there. A lot of the simulations now done in  molecular biology and earth sciences use AI to   create those simulations and run them. This box  can be used to help develop those applications,  

and you can test it out right there. You can visualize it; the code is right there. We're trying to bring up a few applications in the lab. We brought up a protein-folding workflow that uses a couple of AI models, and we were actually able to run the protein-folding model and look at it right on the box. People are going to use it for that. There's a lot of interest in using it for visualization as part of the workflow.

A lot of people ask how it compares and stacks up to PCs and workstations. There are a lot of words on this chart, but I'll show you one with arrows in a minute, which is probably a little easier to grok. You can see across the top line that Spark's got about a thousand teraflops. (Nothing like a box of water in the afternoon; it's really good.) We've got about 3,300 teraflops on our highest-end PC GPUs. Workstations are up to about 4,000 teraflops; we just introduced our RTX PRO 6000 high-end GPU with 96 gigs of memory, and it's a very powerful GPU. If you look at the memory footprint,

out of 128 gigs, you've got maybe 100 gigs, plus or minus, to work with on Spark after you take out some room for the OS. You've got 32 gigs on the PC, 96 gigs on the Pro, and data center GPUs can go up to 141 gigs. But they've got NVLink and can scale out to look like ridiculously large single-GPU instances. If you look at model sizes, you can run

about a 200 billion parameter model on Spark. Only 64 billion on a PC with a single GPU. You get a little higher, 198 billion, on a workstation, but that GPU probably costs way more than the whole Spark system. If you've already got a laptop, it's a question of whether you want to invest in that or not. And then multi-GPU: well, I'm not really sure anybody can get four of the 600-watt Pro cards, or the 500-plus-watt cards, so those are probably more like two. By stacking two of those together,

you can get a lot more performance. When people ask me which one should I buy,   if it fits in GPU memory, the discrete cards are  probably going to be faster. You've got more raw   computing performance there. But if your challenge  is that it just doesn't fit on my system, it's too  

big, too many models, or the software stack's not there, then the Spark is going to be the way to go. So, that's a good rule of thumb. If you want to look at this a little more visually, you can see how it all fits. On a PC, about a 51 billion parameter model; this is all FP4. I allowed about 20% of memory for runtime overhead, which is probably par for the course; use a better optimizer and maybe you can squeeze a little more out of that. About 153 billion on a workstation with one of the new GPUs, 200 billion for Spark, and 405 billion if you stack two of them. If you can manage to get four of those GPUs in a workstation, you can probably get a little bigger than that. Of course, DGX Station is still in the works, but it's going to be very large. DGX Station is going to fill that gap between desktop and data center, so it's really set up for very large, very demanding workflows. More than likely, it might even be a shared compute resource for a couple of engineers, since it's so powerful. But it gives you an idea of how all these lay out.
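For reference, here is our back-of-the-envelope version of that chart's math, as we read it from the talk: FP4 weights take half a byte per parameter, and roughly 20% of memory is set aside for overhead. The memory figures are the ones quoted on the slide:

```python
# FP4 stores weights at 0.5 bytes per parameter; reserve ~20% of memory
# for runtime overhead (KV cache, activations, and so on).
BYTES_PER_PARAM_FP4 = 0.5
OVERHEAD = 0.20

def max_params_billions(memory_gb: float) -> float:
    usable_gb = memory_gb * (1 - OVERHEAD)
    return usable_gb / BYTES_PER_PARAM_FP4  # GB / (bytes/param) = billions

for name, gb in [("PC GPU", 32), ("Workstation GPU", 96),
                 ("DGX Spark", 128), ("Two Sparks", 256)]:
    print(f"{name} ({gb} GB): ~{max_params_billions(gb):.0f}B parameters")
# Prints ~51B, ~154B, ~205B, ~410B, which lines up with the
# 51/153/200/405 billion figures on the slide.
```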

With that, I want to take a look at a demo. We wanted to bring a system here, but it's just not really ready for that yet. We're very early in the development process. We've got some development boards in-house, and networking's not great in the building. I've noticed some of the videos are kind of laggy, so we might notice some of that here. But we wanted to try to fine-tune a model, and we've got a few videos

here. The time frame on the tuning is actually compressed. It took maybe five hours or so to tune, and I don't think everybody wants to hang around that long to watch it, so we'll compress that boring part. But I think we want to step through and just show you what's going on on the box. First, we're going to fine-tune it. We're going to take a DeepSeek-R1 distilled Qwen 32B model and use a data set we created from NVIDIA source code. It consists of 500 question-and-answer pairs. So, that's our goal. We want to train this and then ask it some questions to see if it can help us write some code. First, we're going to set up our model. Here's the model you can see up there, and you can see the code base, the NVIDIA code. We're going to do 4-bit, and we're using QLoRA to help us fine-tune this, with data we created using Hugging Face's TRL. So, that's our goal: fine-tune this model on our DGX Spark. Now we've got that set up, so let's take a look at our model. There's our model right there. We're setting up our code, and you can see the NF4 quantization. We're going 4-bit here to make a model that gives us good performance. You can see all the other parameters there, setting up for attention. (Something along the lines of the sketch below.)
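The session didn't show the exact code, but a QLoRA setup along those lines would look roughly like this sketch (the hyperparameters and target modules are our illustrative assumptions, not the demo's actual values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

# 4-bit NF4 quantization keeps the 32B base model small enough to
# fine-tune in Spark's 128 GB of unified memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)

# QLoRA trains small adapter matrices on top of the frozen 4-bit base;
# the rank and attention-projection targets here are typical choices.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```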

Here's our data prep. We're going to walk through this. Some of my videos are not playing, but I'm going to back up one and see if I can play some of these anyway; the clicker doesn't seem to like them. Of course, with the monitors, you expect them to go right to left, but that's not always the case. Okay, so we'll go through; I'll drive it from here. So, here we go. We'll set up the model, which we saw, and then we've got our data set. We're going to set it up here. I'll let these stop at places so you can see where we're

setting things up. You can see where we set up a prompt. We want to do question-answer pairs; you can see the part that's called out in red, where we set up how we want to train the model. (Roughly the shape of the sketch below.)
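A plausible version of that data prep, with two made-up rows standing in for the real 500 internal Q&A pairs:

```python
from datasets import Dataset

# Stand-in examples; the demo's actual pairs were built from NVIDIA
# source code and are not public.
qa_pairs = [
    {"question": "How do I launch a CUDA kernel from C++?",
     "answer": "Use the kernel<<<grid, block>>>(args) launch syntax ..."},
    {"question": "What does cudaMemcpyAsync do?",
     "answer": "It copies memory asynchronously on a CUDA stream ..."},
]
dataset = Dataset.from_list(qa_pairs)

def format_example(example):
    # One consistent template per pair, so the model learns a stable
    # question -> answer structure during fine-tuning.
    example["text"] = (
        f"### Question:\n{example['question']}\n\n"
        f"### Answer:\n{example['answer']}"
    )
    return example

dataset = dataset.map(format_example)
```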

Now that we've got our data set up and ready to go... it seems to want to get stuck on the same slide over and over again, so I'm going to see if I can advance it manually. There we go. Now we're ready to start the training. We've got a notebook set up, and you'll see it go to the training here. We're going to open up TensorBoard so we can watch what's going on in a minute. You can see everything getting going. There's our TensorBoard. We're going to bring that up and set it up to run. Then we'll stop it right before it jumps past that; see if I can get it to stop there. All right, here's our TensorBoard. So, here's what we're going to watch. Of course, we're going to watch our training epoch, with all the data going through for that epoch. We're going to look at our gradient norms to make sure we're not too high or too low; we want to be somewhere in the middle.

If it's skewing too far one way or the other, we're not getting good training out of it. Our learning rate: we want to see that go up quickly and then taper off as the model learns all it can. Training loss: we want to make sure the actual

data is fitting right. We want to make sure our model is actually being trained and we're not just randomly changing weights here and there for no good reason. Finally, we'll look at our token accuracy. We want to check how well the model is answering the questions during testing; we should start to see that converge and get good response rates. Now we're going to just let it roll. Again, this is compressed from a few hours' worth of training down to about a minute's worth of video so that we don't sit here all day and can go get some drinks. You can see it's running now, and you can

see all those arrows were kind of up high. You'll  see our training is starting to work. The learning   rate is starting to come down on that middle  one. So, you can see it's kind of learning what   it's supposed to do. Our training loss is going  down, we're getting good responses, and our token  

accuracy is going up; the one on the bottom is getting better and better as we go. Now we're done with it, and that was kind of fast, but I paused it so we can actually see what happened at the end. We can look at our charts. The one up on the far top, I guess on your left, that's just us running through our epochs. So, we run through that, and it increases. Our training gradient did kind of what we wanted it to do: you saw it was up high, we were doing lots of changing of the weights, but then it came back down to more normalized.

That's good. If it had stayed up high, that would have been something to be concerned about. Our learning rate did what I wanted to see: it went up, and we saw a nice curve. We taught it a lot, and then as it went through the data set, it started to do much better and knew what it needed to learn. Our training loss: we had

a lot of misses early compared to what we asked it to do, but you can see it converged. Finally, our token accuracy was pretty good. It went up, had a little dip at the end, but that was good: we were getting good answers, and the model was doing what we were asking it to do. (For reference, the training call itself looks something like the sketch below.)
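Continuing the earlier sketches, the training call with TensorBoard logging might look like this (a sketch assuming a recent TRL; the demo's actual hyperparameters weren't shown):

```python
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="spark-code-copilot",
    dataset_text_field="text",        # the field built during data prep
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",       # ramps up, then tapers off
    warmup_ratio=0.03,
    logging_steps=5,
    report_to="tensorboard",          # epoch, grad norm, LR, loss, accuracy
)

trainer = SFTTrainer(
    model=model,                      # the NF4-quantized base from earlier
    args=config,
    train_dataset=dataset,            # the formatted Q&A pairs
    peft_config=lora_config,          # only the LoRA adapters are trained
)
trainer.train()
# Watch it live with: tensorboard --logdir spark-code-copilot
```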

Now the only thing left to do is to try it and see what it does. Here, I'm going to pause the video in a few places so we can see what's going on. First, you can see the question. We asked, "Can you help me write a PyTorch training loop using the Transformer Engine library with FP4 precision? The loop should include model initialization, loss computation, and optimizer setup. A minimal working example would be ideal." So, this is something you might

do if you had a code co-pilot on your desk. You'd ask it to help you with some code so you can go on and do other work, and let this thing do some of the lifting for you. Here, you're going to see it again. This is not sped up, this is not slowed down. I just want to qualify:

these are engineering development boards. They're not optimized; we're not running at final clocks. We still have work to do here, and it's not final software, but I wanted to give you an idea even at this very early stage. Hopefully, my laptop is not going to introduce a lot of overhead; let's see what happens. You can see it starts to think, and it's going to give you a response.
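For reference, querying the fine-tuned model would look something like this sketch, which loads the LoRA adapter from the hypothetical output directory used above:

```python
from peft import PeftModel

# Attach the fine-tuned adapter to the quantized base model.
tuned = PeftModel.from_pretrained(model, "spark-code-copilot")

prompt = (
    "### Question:\n"
    "Can you help me write a PyTorch training loop using the "
    "Transformer Engine library with FP4 precision?\n\n"
    "### Answer:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = tuned.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```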

Again, we should have had some dramatic music playing for this. Gotta get some production values next time. Need a budget. And I think it's about done. So, you're some of the first people outside of NVIDIA to actually see a Spark system take a model, fine-tune it, run it, and answer a query. That's it. You actually saw it live, some of the first people on the planet to see it outside the lab. [Applause] Just to recap what we saw, I know we looked at a lot there. There were a lot of words, a lot of things on the screen, and it might have been hard to see from where you're sitting, but fine-tuning is definitely a computationally heavy load. You

need a lot of memory, you need a lot of flops, and there's a lot of data. That's where the teraflop performance and the memory come in. Our stack has already been ported to ARM64, so a lot of the tools and things we use on our DGX just work. If you're in a DGX environment, everything just kind of works there, and we've got full support for all the dev kits. You saw us running quite a few different pieces of code. I always forget everything that's in there, but I was smart and wrote some notes. We were using Hugging Face TRL, bitsandbytes for some

of the quantization work, PyTorch with CUDA, and Jupyter as the notebook. So again, third-party tools that run in the environment; it wasn't just a straight NVIDIA stack. Of course, we had all our CUDA-optimized tools there, the CUDA-X tools, running. So, you really can't

overstate the importance of the NVIDIA ecosystem.  That is just a huge advantage when you're doing   AI work. We want to make sure that all runs  on Spark and that it makes it easy for you to   seamlessly move through these different workflows. If you look at some of the advantages when we talk  

about this being for developers, this is really  it. You can run large models and large workloads   right there on your desktop. We've got the NVIDIA  software stack with the tools you need to build   and run AI. It's very important to remember  that the data stays local. We didn't have to   take any of our source code or anything out of  the building. It stayed there on the rack in the  

environment we put it in. This is very important if you've got private data. I know some people who tell me they spend time creating synthetic data because the real data they want to run on can't leave the building. That takes time and effort, and you can skip that step. Keep it local. It's your own personal AI cloud. You don't have to fight contention; you don't have to beg

for time on the clusters or go get more money for  cloud resources. It runs there, and it's yours.   If you want to change the software, you can do  it. How many have tried to change software in   a cluster before? How many have been successful?  That's right—it's hard to do. They really protect  

those systems and resources. This is your box; you can do with it as you will. You can connect two of these together to really scale up and expand into some workloads. As Jensen mentioned, these are really part of a new class of computers created for AI. People ask me, "Oh, can I use it for this? Can I do that?" It's really an AI box. Think about the fine-tuning we did: if you had to lock up your laptop for five or six hours doing that training run,

that's a lot of time you're not in meetings, not  reading your email, not doing all the other stuff   you've got to do. So, it's a heavy lift. This is  a great way to offload your desktop systems and   not have to go to some of those other resources.  The software makes it easy to migrate to DGX, DGX   Cloud, or any of our accelerated infrastructures.  It's going to be available from OEM partners. We   showed the ones up front from ASUS, Dell, HP, and  Lenovo. So, if you've got a particular favorite   vendor you like to buy from, you'll be able to  get it from them. You can go check out the models.  

ASUS, Dell, and HP have their versions of this in  their booths. You can go get a picture and go get   a selfie with it, probably if you ask nicely. You  can reserve your system, so you can go to NVIDIA's   marketplace or nvidia.com and reserve the system.  Think of these gold systems as founders editions.   We're going to make some of them and sell them,  but it's going to be like a founders edition.   Once they're gone, they're gone. We really  want our partners to go out and sell these,   and that's where the volume and mass production is  going to come from. If you want to get a cool one,  

reserve one that's just like this; it looks cool on your shelf. The 4 TB storage option is available for $3,999. You can save a little money with the 1 TB option, which will be cheaper; we announced it at $2,999 at CES early in the year. So, it just depends on what you want to do. There are plenty of USB-C ports on there, so you can plug more storage into it if you like and save yourself a little money. You might want to reserve yours today.

So, with that, I think we'll just turn over  to questions. Yeah, go ahead to the mic,   and I'll have some more box of water while you  go there. Thank you for the presentation. So,   some quick questions about the hardware:  How hot does it get? How much power does   it require? And can we change the drive inside? Sure, good questions. So, three questions there:  

one is how much power, how hot does it get, and can we change the drive? How hot it gets, I don't know yet. It should run cool; it's going to have a fan in it, and it's a very small, low-power device. Power: we won't know final power budgets until we get the systems in-house. They've got to lock the clocks on the CPUs and GPUs, and that will determine the power. It's going to easily plug into a wall outlet. I still think 200-ish watts, give or take, but we still have a little testing to do before we have that. It's not really intended for you to go in there and

change the drives or any of the components. It's  not going to be something where you can easily go   in and swap things out. We're not really building  that kind of a system. It's meant to be a little   all-in-one and built that way. Now, what our  partners do, I can't comment on. So, you can   go and ask Dell, HP, and Lenovo if they have plans  and maybe they have something different than ours. 

To build on the power question: are you planning on having a separate power brick for the 200 watts? Because I think the pictures of the prototypes didn't show a power plug. Yeah, there's no power plug. It'll plug in through one of the USB ports, and it's going to be an external power brick to help keep the size down. It makes it easier for multiple countries that way; it's kind of hard to do it internally and still ship globally. So, you're going to be limited by the power profile that USB PD can provide. The other question is, since NVIDIA is also a switch company, have you thought about having a companion switch with, say, NVMe storage, so that I can truly simulate a cluster environment where I have object storage through the switch, and I can hook up maybe two or three of them together through whatever it is, presumably 200 gig, on the switch output? Yeah, so, like, a little NAS-type device that goes with this, with your cutesy branding, right? Have the storage available with it, provide object storage, and, for extra fun and bonus points, have a way for me to put a Slurm controller or login node on it. Then, again, I have my mini cluster, all in a nice little stack. Yeah, we've had some discussions about whether we

make a switch for these, a ConnectX switch. I  think, in most cases, it's probably way more   expensive than the box is for a switch. But for  now, we're just going to let you connect the two   together. I can't speak for what our partners want  to do longer term and how they want to support it,   but that's kind of our plan. Again,  this is like a founders edition, so  

we're not really in the system business. We don't build systems, so we don't intend to do that. But our partners may do something different. And, by the way, it's a standard DAC cable for the cross-connection. If you go look, you can see one plugged into the back of one of the little stacks in the booth downstairs. Thanks. I had a question about memory bandwidth, if you can tell us anything about it. Yeah, there's actually, I think, one more slide here. Take a

picture of that. Cool. The memory bandwidth is up there: it's 273 GB per second. In my experience, that's really important for inference speed, along with the size of the model. Yeah, so I was wondering if you did a bunch of optimization on the software side to make it run much faster. Yeah, we're always looking at optimizations and things like that. We showed that chart: if it fits in memory, it's going to be faster on a discrete GPU. You've got more flops and more memory bandwidth; that's just the way it's going to be. The sacrifice you're making here is that we give you a lot more memory in a very tiny, little device. The memory on there is LPDDR5X, so it's power-efficient. It's not like GDDR7, where you've got something like two terabytes per second of bandwidth but need a lot more power. We couldn't make something this power-efficient with that kind of memory; it's just not there yet. We can only hope, but today, we're not there. (To see why bandwidth matters so much for inference, a rough estimate is sketched below.)
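Here is a rough, memory-bound estimate of why that 273 GB/s figure matters (our sketch, not NVIDIA's numbers): for single-stream decoding, every generated token has to stream the model's weights through memory once, so token rate is roughly bandwidth divided by model size:

```python
BANDWIDTH_GBS = 273  # DGX Spark's LPDDR5X memory bandwidth, GB/s

def est_tokens_per_sec(params_billions: float, bytes_per_param: float) -> float:
    # Memory-bound approximation; batching, KV-cache traffic, and
    # software optimizations all shift the real number.
    model_gb = params_billions * bytes_per_param
    return BANDWIDTH_GBS / model_gb

print(f"70B @ FP4 (~35 GB):   ~{est_tokens_per_sec(70, 0.5):.0f} tokens/s")
print(f"200B @ FP4 (~100 GB): ~{est_tokens_per_sec(200, 0.5):.1f} tokens/s")
```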

Okay, cool. Thank you. And we've got the specs live on the web page. The web page has been refreshed, so all of these are up there. If you didn't get a picture, you can go check out the web page; all the specs for the system are there. Maybe I missed the details: how do you connect the two Sparks together? You connect them on the back. I'll get the little model again; I like carrying it. It's kind of like "Wheel of Fortune." What's the bandwidth between those? There are two large connectors on the back that connect the systems; you would plug one of these into the other. What kind of connector is it? It's ConnectX-7, and it's a fairly large connector. Okay,

it's a nice, good old-fashioned copper cable. Great stuff. When are you going to ship them?   That's the $60,000 question. We're saying they'll  be shipping this summer, early in the summer. So,   if you go on and reserve it, hopefully, you  can have some time by the pool this summer and   enjoy your Spark. It should be available.  Thanks. Our partners will have their own   schedules, so they'll provide those later. I think I'm looking at the red clock there,   and I'm going to get flagged in a  minute, but we have one more. Oh,  

I have a question about the storage. You're using one or four terabytes of NVMe M.2 SSD; do you have any preference on PCIe generation for that? It's all NVMe M.2; I don't know what the generation is. Is it Gen 5? So, PCIe Gen 5. Okay, one more. You've got two minutes left; probably a minute and a half by the time you get up there. So, what kind of clock speeds are you targeting, and what kind of GPU am I getting? I only see the Blackwell generation up there. How many cores do I get? Yeah, we'll provide some more data once we

lock those down. We don't know what the final clocks are going to be until we get the boards back. They do a lot of testing and validation, and then they'll lock them down. Then we'll have the core counts and everything and provide those for you. But I guess core counts

are locked in by now, more or less. We haven't  shared the core counts yet, but we will soon.  Outstanding. Thank you so much for coming,  everybody. This brings us to the end. The   rest of the details will be available on our  website and from our partners. Thanks again!
