NVIDIA DGX Spark: Your Personal AI Supercomputer | NVIDIA GTC 2025 Session

It is my pleasure to introduce Alan Bourgoyne. He is the Director of Product Marketing here at NVIDIA, and he's going to be talking to us about Project DIGITS. A warm round of applause for Alan. Thank you, Seppy. That $20 IOU is on the way; I'll give it to you after the show. So, I know it's Thursday afternoon, it's late. How many people are ready for a drink? I am. I could go for a couple of drinks, so hopefully we'll have an entertaining show today. Let's take a

look: DGX Spark, from Project DIGITS to personal AI supercomputer. At CES back in January, Jensen did a keynote and announced Project DIGITS, the Grace Blackwell AI supercomputer for your desk. It has a thousand AI teraflops, 128 gigs of memory, and runs our AI stack. It's very tiny; you can see the picture there, and it fits on your desktop. Project DIGITS was an homage to a system we released back in 2015, the DIGITS box. It was our first AI training box: a workstation with four GPUs and a big software stack, and it was the first box really dedicated to training. A year after that, we released our first DGX system. So, DIGITS was kind of the precursor to DGX, and the name was that homage. How many people saw the keynote?

How many of you saw it at the SAP Center? All right, if you didn't see it at the SAP Center, you missed something. There was a glitch in the stream, and I'm going to show you what you missed. Thirty million software engineers: there are 30 million of them around the world, and in the future, 100% of them are going to be AI-assisted. I'm certain of that. 100% of NVIDIA software engineers will be AI-assisted by the end of this year. So, AI agents will be everywhere. How they run, what enterprises run, and how we run it will be fundamentally different. We need a new line of computers, and this is what started it all. This is the NVIDIA DGX-1, with 20 CPU cores, 128 gigabytes of GPU memory, one petaflop of computation, and it cost $150,000, consuming 3,500 watts.

Let me now introduce you to the new DGX. This is NVIDIA's new DGX, and we call it DGX Spark. DGX Spark. You'll be surprised: 20 CPU cores. We partnered with MediaTek to build this for us. They did a fantastic job. It's been a great joy working with Rick Tsai and the MediaTek team.

I really appreciate their partnership. They built us a chip with chip-to-chip NVLink from CPU to GPU, and now the GPU has 128 gigabytes. This is fun: one petaflop. So, this is like the original DGX-1, shrunk with Pym particles. You would have thought that's a joke that would land at GTC. Okay, well, here's 30 million. There are 30 million software engineers in the world, and, you know, 10 to 20 million data scientists. This is now clearly the gear of choice. Thank you, Janine. Look at this. In every bag, this is what you should find. This is the development

platform of every software engineer in the world. If you have a family member, spouse, or someone you care about who's a software engineer, AI researcher, or just a data scientist, and you would like to give them the perfect Christmas present, tell me this isn't what they want. So, ladies and gentlemen, today we will let you reserve the first DGX Sparks for the attendees of GTC. Go reserve yours. You already have one of these, so now you just got to get one of these. You can imagine how excited we are. All of us have been spending months and months working on this project, and of course, when our moment to shine comes up, the stream glitches and you don't see that part. So, what can you say? But when Jensen started that talk, he said he doesn't have a script, no script, and he's not lying. The one part he definitely wasn't lying about is you

can reserve DIGITS. You can actually go online and reserve those. We're going to send you a notice next week to remind you, but you can go online and reserve yourself a system today. It is a new class of computers, right? It's designed from the ground up to run AI software. You saw the comparison; those numbers weren't an accident. What you saw from that DGX-1 to what we have today, the memory, the flops: it's designed for AI from the ground up. It runs our full AI-accelerated

software stack. Everything runs there. It's going to be available from OEM partners: ASUS, Dell, HP, Lenovo. You can actually go around the show and look at the ASUS, Dell, and HP boxes; you can see their prototypes. It's up on the NVIDIA Marketplace right now, and I just happen to have one of the prototype systems right here for us to take a look at. Maybe somebody will get a selfie after,

but this... I feel like the Lion King. There should be really dramatic music playing right now. Do we have that? Not even? Okay, I guess only the keynote gets that kind of production value, but thank you. There you go. There's one right there. So, why do we build these things? You saw a

lot of this in the keynote, right? AI is evolving rapidly. AI agents are kind of the big thing now. We want AI to help us do things, right? We want AI to collaborate with other AIs to do things for us. For example, a healthcare agent should be able to help the doctor diagnose things and pick out medicines. It's more than one model doing these things. Take travel: most of us are traveling this week. Say you have to extend a day. What do you have to do? You've got to call the hotel, the rental car place, and the airline. It'd be great if agents just did that for you, right? And helped you make those decisions and made those things happen. AI agents are getting very popular,

and businesses are using them because they're useful: they help people do their jobs more efficiently and more effectively. Beyond single-purpose point AIs, the rise of reasoning AI is significant. Jensen talked a lot about this: all of those new models where the AI not only goes through the model once but goes through it multiple times to try to find the best answer. That's DeepSeek, that's the reasoning models. And a lot of times, you can see them think. If you use Perplexity or some of the new models, you can actually see it thinking, and it's

generating a lot more tokens than before. So, when Jensen talked about the scaling laws, the axis on the bottom is very important. The more intelligent I make the model, the more it can reason, the more compute I need. Those scaling laws are out there, and so when we're making these kinds of systems, we need more powerful desktop systems. If you look at our local systems today: how many of you do your AI workloads at your desk? You can do it on your laptop, I'll give you that, or on your PC, but sometimes things don't work there. Maybe the model's too big, the workload's too big. I'm

trying to make an agent, and I want to run three or four models at the same time. That's a little bit too much for my laptop, so I have to go out to the cloud. Or maybe I want to use some software that doesn't run well there, or maybe it's not actually ported to my device. So, a lot of times,

you have to go off-device, and that's why we created this new class of AI computers. We want systems that are powerful enough to help you offload work and literally create your own personal cloud, your own personal data center, right on your desk. With systems like DGX Spark and DGX Station, our full accelerated software stack and all of your favorite tools out there, like PyTorch and Jupyter Notebook, everything just runs. So, your environment just comes over, and now you've got your own personal cloud. If we look at the software stack now, before

anybody cuts me to shreds: this is my marketing version of the software stack. I know, all you engineers, it's not 100%, but I tried to fit it on one slide so we can look at it a little closer together. Down at the hardware level, that's our GB10 Superchip. It's got the CPU, the GPU, encode and decode engines, our optical flow accelerator, and our ConnectX chip for networking. We've got our DGX OS. This system runs the same OS that our DGX is running in the data center, the exact same software. All of that's there: all the containers, everything we do to accelerate that kernel. And of course, the CUDA libraries; that's where a lot of the magic happens, right? That's how we get to the hardware and provide you the libraries and toolkits you need to accelerate all of the higher-level frameworks and all of your favorite tools that are out in the market today. And,

of course, the great tools we provide, like NVIDIA AI Blueprints and AI Workbench, are all supported thanks to the beauty of our software stack that just runs everywhere. This was kind of the design goal: take that software stack and leverage it. Look at that, I do have production values; I made the arrows move. You can write it once and move it wherever you want to go. You can work on your Spark, you can move to DGX, you can move to the cloud: basically, any accelerated infrastructure you've got. A tiny sketch of what that looks like in practice follows below.
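To make that concrete, here is a minimal sketch of ours (not from the session) of what device-agnostic PyTorch looks like; the same script runs unmodified on a Spark, a DGX node, or a cloud instance because it targets whatever CUDA device is present:

```python
import torch

# Pick up whatever accelerator the box exposes; nothing here is
# hard-coded to a particular machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)

with torch.no_grad():
    y = model(x)
print(f"ran on {device}: output shape {tuple(y.shape)}")
```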

And it's ARM-based, so all the ARM code is there. It runs; we develop and work through all the tools for you. So, if you're working toward a DGX, we all know how hard it is to get access to that system and that data center time. It's great that you can prototype, work out all the kinks, and test. That way, when you finally get your slice of time in the data center, you're a lot more confident. You don't have to experiment quite so much, because you used that Spark system on your desk, and it truly is your personal cloud. You can take all these workloads, run them on your Spark,

and have your systems connect to it. It's going to be really easy to install. Our out-of-the-box plan is to let you install it basically like you would install a thermostat in your house. You're going to hit its network and give it the information: what's the network I'm on? It has a wireless network and a wired network. You give it that information, it restarts itself, and now you've got a network-connected compute device. Really easy, simple. It's a full computer;

it's got ports so you can plug a mouse and  keyboard into it. It's got Bluetooth, so you   can wirelessly connect your mouse and keyboard.  It's got an HDMI port on the back, so you can   plug into it and use it as a standalone device  if you'd like to. But the beauty of it is,   either way, you've got your own personal cloud  that you can send all of your bigger jobs to.  I mentioned the ConnectX chip in there, and  that's a really important feature of this box.   It lets you connect two of these systems  together to basically form a little cluster. So,  

you've effectively doubled the memory and doubled your compute performance. Have it on the network next to your desktop system, and now you've got an even more powerful system. You can work with models of over 400 billion parameters with the combined memory from those two systems. I think I showed you the back of that thing; you can see those ports. They're the big giant

ports. If you go around to the back of the system, you'll see where those ConnectX cables connect. They're kind of big cables that plug into the back, and that's what gives you much higher performance than using the network. So, we'll allow you to form your own little mini cluster on your desktop if that's what you want to do. As far as workloads, it really is for AI developers: model prototyping and development. You can do your development work there. Maybe

you want to develop AI-augmented applications, create AI chatbots, or create all the agents we talked about earlier; maybe use some of our Blueprints to create those. Fine-tuning: you could fine-tune a model up to 70B on a single one of these Sparks. We're going to show a demo a little bit later; we actually did a little fine-tuning up front, and we'll show an example of inference. It's great for inference: you've done a little work on a model and you want to see if it does what you want it to do. Maybe you want your own

co-pilot. Maybe you want to train it with your own codebase, put it on your desk, and now you've got your own personal code co-pilot. The data is safe and secure; it never has to leave the building. I know companies would get very upset if you borrowed some cloud time and sent all your source code up there to train a model. The data stays there; it's local and doesn't have to go anywhere. You saw Jensen mention data science. Our full stack runs there, including our accelerated RAPIDS and cuDF libraries (a two-line taste of cuDF is sketched below). Everything is there. We also run all of our other libraries, so you can run Isaac to train robots, all our computer vision models, the VLM models, and more.
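As a taste of what that accelerated data science stack looks like, here is a small cuDF sketch (the file and column names are hypothetical, not from the session); the pandas-like API runs the load and the groupby on the GPU:

```python
import cudf  # part of RAPIDS

# Hypothetical dataset; cuDF mirrors the pandas API, so an existing
# pandas workflow often ports with little more than the import change.
df = cudf.read_csv("telemetry.csv")
summary = df.groupby("sensor_id")["reading"].mean()
print(summary.head())
```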

We're trying to install as much of this on the system as we can, but we want to keep the download palatable, so we can't put everything there. There's always that trade-off: how long you want to wait for an update versus how much we put on there. But the good news is, everything's there, and it's all free. You can join a developer program, sign up for NGC, download everything, and try it out on your own. A unique thing about Spark: if you look at all of our Grace Blackwell solutions in the data center today, they have Blackwell GPUs and Grace CPUs, but those Blackwell GPUs are really maximized for compute. In the data center, people don't care about

things like RT cores for ray tracing, or a couple of the other things we include for graphics. In Spark, they're actually there. You've got a full Blackwell GPU in a Spark, so it's got the RT cores, Tensor Cores, and CUDA cores. If you like, you can visualize on it as well. A lot of times, data

scientists will want to visualize on it. It's good  for compute. Maybe you want to do some simulation   there. A lot of the simulations now done in  molecular biology and earth sciences use AI to   create those simulations and run them. This box  can be used to help develop those applications,  

and you can test it out right there. You can visualize it; the code is right there. We're trying to bring up a few applications in the lab. We brought up a protein-folding workflow that uses a couple of AI models, and we were actually able to run the protein-folding model and look at it right on the box. People are going to use it for that. There's a lot of interest in using it for visualization as part of the workflow.

A lot of people ask how it compares and stacks up to PCs and workstations. There are a lot of words on this chart, but I'll show you one with arrows in a minute, which is probably a little easier to grok. You can see across the top line that Spark's got about a thousand teraflops. (Nothing like a box of water in the afternoon; it's really good.) We've got about 3,300 teraflops on our highest-end PC GPUs. Workstations are up to about 4,000 teraflops; we just introduced our RTX PRO 6000 high-end GPU with 96 gigs of memory, and it's a very powerful GPU. If you look at the memory footprint,

out of 128 gigs, you've got maybe 100 gigs, plus or minus, to work with on Spark after you take out some room for the OS. You've got 32 gigs on the PC, 96 gigs on the Pro, and data center GPUs can go up to 141 gigs. But they've got NVLink and can scale out to look like ridiculously large single-GPU instances. If you look at model sizes, you can run

about a 200 billion parameter model on Spark. Only 64 billion on a PC with a single GPU. You get a little higher, 198 billion, on a workstation, but that GPU probably costs way more than the whole Spark system. If you've already got a laptop, it's a question of whether you want to invest in that or not. And then multi-GPU: well, I'm not really sure anybody can get four of the 600-watt Pro cards, or the 500-plus-watt cards, so those are probably more like two. By stacking two of those together,

you can get a lot more performance. When people ask me which one should I buy,   if it fits in GPU memory, the discrete cards are  probably going to be faster. You've got more raw   computing performance there. But if your challenge  is that it just doesn't fit on my system, it's too  

big, too many models, or the software stack's not there, then the Spark is going to be the way to go. So, that's a good rule of thumb. If you want to look at this a little more visually, you can see how it all fits. On a PC, about a 51 billion parameter model; this is all FP4. I allowed about 20% of memory for runtime overhead, which is probably par for the course; use a better optimizer and maybe you can squeeze a little more out of that. About 153 billion on a workstation with one of the new GPUs, 200 billion for Spark, and 405 billion if you stack two of them. If you can manage to get four of those GPUs in a workstation, you can probably get a little bigger than that. Of course, DGX Station is still in the works, but it's going to be very large. DGX Station is going to fill that gap between desktop and data center, so it's really set up for very large, very demanding workflows. More than likely, it might even be a shared compute resource for a couple of engineers, since it's so powerful. But it gives you an idea of how all these lay out.
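For reference, here is our back-of-the-envelope version of that chart's math, as we read it from the talk: FP4 weights take half a byte per parameter, and roughly 20% of memory is set aside for overhead. The memory figures are the ones quoted on the slide:

```python
# FP4 stores weights at 0.5 bytes per parameter; reserve ~20% of memory
# for runtime overhead (KV cache, activations, and so on).
BYTES_PER_PARAM_FP4 = 0.5
OVERHEAD = 0.20

def max_params_billions(memory_gb: float) -> float:
    usable_gb = memory_gb * (1 - OVERHEAD)
    return usable_gb / BYTES_PER_PARAM_FP4  # GB / (bytes/param) = billions

for name, gb in [("PC GPU", 32), ("Workstation GPU", 96),
                 ("DGX Spark", 128), ("Two Sparks", 256)]:
    print(f"{name} ({gb} GB): ~{max_params_billions(gb):.0f}B parameters")
# Prints ~51B, ~154B, ~205B, ~410B, which lines up with the
# 51/153/200/405 billion figures on the slide.
```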

With that, I want to take a look at a demo. We wanted to bring a system here, but it's just not really ready for that yet. We're very early in the development process. We've got some development boards in-house, and networking's not great in the building. I've noticed some of the videos are kind of laggy, so we might notice some of that here. But we wanted to try to fine-tune a model, and we've got a few videos

here. The time frame on the tuning is actually compressed. It took maybe five hours or so to tune, and I don't think everybody wants to hang around that long to watch it, so we'll compress that boring part. But I think we want to step through and just show you what's going on on the box. First, we're going to fine-tune it. We're going to take a DeepSeek-R1 distilled Qwen 32B model and use a data set we created from NVIDIA source code. It consists of 500 question-and-answer pairs. So, that's our goal. We want to train this and then ask it some questions to see if it can help us write some code. First, we're going to set up our model. Here's the model you can see up there, and you can see the code base, the NVIDIA code. We're going to do 4-bit, and we're using QLoRA to help us fine-tune this, with data we created using Hugging Face's TRL. So, that's our goal: fine-tune this model on our DGX Spark. Now we've got that set up, so let's take a look at our model. There's our model right there. We're setting up our code, and you can see the NF4 quantization. We're going 4-bit here to make a model that gives us good performance. You can see all the other parameters there, setting up for attention. (Something along the lines of the sketch below.)
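The session didn't show the exact code, but a QLoRA setup along those lines would look roughly like this sketch (the hyperparameters and target modules are our illustrative assumptions, not the demo's actual values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

# 4-bit NF4 quantization keeps the 32B base model small enough to
# fine-tune in Spark's 128 GB of unified memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)

# QLoRA trains small adapter matrices on top of the frozen 4-bit base;
# the rank and attention-projection targets here are typical choices.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```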

Here's our data prep. We're going to walk through this. Some of my videos are not playing, but I'm going to back up one and see if I can play some of these anyway; the clicker doesn't seem to like them. Of course, with the monitors, you expect them to go right to left, but that's not always the case. Okay, so we'll go through; I'll drive it from here. So, here we go. We'll set up the model, which we saw, and then we've got our data set. We're going to set it up here. I'll let these stop at places so you can see where we're

setting things up. You can see where we set up a prompt. We want to do question-answer pairs; you can see the part that's called out in red, where we set up how we want to train the model. (Roughly the shape of the sketch below.)
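A plausible version of that data prep, with two made-up rows standing in for the real 500 internal Q&A pairs:

```python
from datasets import Dataset

# Stand-in examples; the demo's actual pairs were built from NVIDIA
# source code and are not public.
qa_pairs = [
    {"question": "How do I launch a CUDA kernel from C++?",
     "answer": "Use the kernel<<<grid, block>>>(args) launch syntax ..."},
    {"question": "What does cudaMemcpyAsync do?",
     "answer": "It copies memory asynchronously on a CUDA stream ..."},
]
dataset = Dataset.from_list(qa_pairs)

def format_example(example):
    # One consistent template per pair, so the model learns a stable
    # question -> answer structure during fine-tuning.
    example["text"] = (
        f"### Question:\n{example['question']}\n\n"
        f"### Answer:\n{example['answer']}"
    )
    return example

dataset = dataset.map(format_example)
```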

Now that we've got our data set up and ready to go... it seems to want to get stuck on the same slide over and over again, so I'm going to see if I can advance it manually. There we go. Now we're ready to start the training. We've got a notebook set up, and you'll see it go to the training here. We're going to open up TensorBoard so we can watch what's going on in a minute. You can see everything getting going. There's our TensorBoard. We're going to bring that up and set it up to run. Then we'll stop it right before it jumps past that; see if I can get it to stop there. All right, here's our TensorBoard. So, here's what we're going to watch. Of course, we're going to watch our training epoch, with all the data going through for that epoch. We're going to look at our gradient norms to make sure we're not too high or too low; we want to be somewhere in the middle.

If it's skewing too far one way or the other, we're not getting good training out of it. Our learning rate: we want to see that go up quickly and then taper off as the model learns all it can. Training loss: we want to make sure the actual

data is fitting right. We want to make sure our model is actually being trained and we're not just randomly changing weights here and there for no good reason. Finally, we'll look at our token accuracy. We want to check how well the model is answering the questions during testing; we should start to see that converge and get good response rates. Now we're going to just let it roll. Again, this is compressed from a few hours' worth of training down to about a minute's worth of video so that we don't sit here all day and can go get some drinks. You can see it's running now, and you can

see all those arrows were kind of up high. You'll  see our training is starting to work. The learning   rate is starting to come down on that middle  one. So, you can see it's kind of learning what   it's supposed to do. Our training loss is going  down, we're getting good responses, and our token  

accuracy is going up; the one on the bottom is getting better and better as we go. Now we're done with it, and that was kind of fast, but I paused it so we can actually see what happened at the end. We can look at our charts. The one up on the far top, I guess on your left, that's just us running through our epochs. So, we run through that, and it increases. Our training gradient did kind of what we wanted it to do: you saw it was up high, we were doing lots of changing of the weights, but then it came back down to more normalized.

That's good. If it had stayed up high, that would have been something to be concerned about. Our learning rate did what I wanted to see: it went up, and we saw a nice curve. We taught it a lot, and then as it went through the data set, it started to do much better and knew what it needed to learn. Our training loss: we had

a lot of misses early compared to what we asked it to do, but you can see it converged. Finally, our token accuracy was pretty good. It went up, had a little dip at the end, but that was good: we were getting good answers, and the model was doing what we were asking it to do. (For reference, the training call itself looks something like the sketch below.)
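Continuing the earlier sketches, the training call with TensorBoard logging might look like this (a sketch assuming a recent TRL; the demo's actual hyperparameters weren't shown):

```python
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="spark-code-copilot",
    dataset_text_field="text",        # the field built during data prep
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",       # ramps up, then tapers off
    warmup_ratio=0.03,
    logging_steps=5,
    report_to="tensorboard",          # epoch, grad norm, LR, loss, accuracy
)

trainer = SFTTrainer(
    model=model,                      # the NF4-quantized base from earlier
    args=config,
    train_dataset=dataset,            # the formatted Q&A pairs
    peft_config=lora_config,          # only the LoRA adapters are trained
)
trainer.train()
# Watch it live with: tensorboard --logdir spark-code-copilot
```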

Now the only thing left to do is to try it and see what it does. Here, I'm going to pause the video in a few places so we can see what's going on. First, you can see the question. We asked, "Can you help me write a PyTorch training loop using the Transformer Engine library with FP4 precision? The loop should include model initialization, loss computation, and optimizer setup. A minimal working example would be ideal." So, this is something you might

do if you had a code co-pilot on your desk. You'd ask it to help you with some code so you can go on and do other work, and let this thing do some of the lifting for you. Here, you're going to see it again. This is not sped up, this is not slowed down. I just want to qualify:

these are engineering development boards. They're not optimized; we're not running at final clocks. We still have work to do here, and it's not final software, but I wanted to give you an idea even at this very early stage. Hopefully, my laptop is not going to introduce a lot of overhead; let's see what happens. You can see it starts to think, and it's going to give you a response.
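For reference, querying the fine-tuned model would look something like this sketch, which loads the LoRA adapter from the hypothetical output directory used above:

```python
from peft import PeftModel

# Attach the fine-tuned adapter to the quantized base model.
tuned = PeftModel.from_pretrained(model, "spark-code-copilot")

prompt = (
    "### Question:\n"
    "Can you help me write a PyTorch training loop using the "
    "Transformer Engine library with FP4 precision?\n\n"
    "### Answer:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = tuned.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```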

Again, we should have had some dramatic music playing for this. Gotta get some production values next time. Need a budget. And I think it's about done. So, you're some of the first people outside of NVIDIA to actually see a Spark system take a model, fine-tune it, run it, and answer a query. That's it. You actually saw it live, some of the first people on the planet to see it outside the lab. [Applause] Just to recap what we saw, I know we looked at a lot there. There were a lot of words, a lot of things on the screen, and it might have been hard to see from where you're sitting, but fine-tuning is definitely a computationally heavy load. You

need a lot of memory, you need a lot of flops, and there's a lot of data. That's where the teraflop performance and the memory come in. Our stack has already been ported to ARM64, so a lot of the tools and things we use on our DGX just work. If you're in a DGX environment, everything just kind of works there, and we've got full support for all the dev kits. You saw us running quite a few different pieces of code. I always forget everything that's in there, but I was smart and wrote some notes. We were using Hugging Face TRL, bitsandbytes for some

of the quantization work, PyTorch with CUDA, and Jupyter as the notebook. So again, third-party tools that run in the environment; it wasn't just a straight NVIDIA stack. Of course, we had all our CUDA-optimized tools there, the CUDA-X tools, running. So, you really can't

overstate the importance of the NVIDIA ecosystem.  That is just a huge advantage when you're doing   AI work. We want to make sure that all runs  on Spark and that it makes it easy for you to   seamlessly move through these different workflows. If you look at some of the advantages when we talk  

about this being for developers, this is really  it. You can run large models and large workloads   right there on your desktop. We've got the NVIDIA  software stack with the tools you need to build   and run AI. It's very important to remember  that the data stays local. We didn't have to   take any of our source code or anything out of  the building. It stayed there on the rack in the  

environment we put it in. This is very important if you've got private data. I know some people who tell me they spend time creating synthetic data because the real data they want to run on can't leave the building. That takes time and effort, and you can skip that step. Keep it local. It's your own personal AI cloud. You don't have to fight contention; you don't have to beg

for time on the clusters or go get more money for  cloud resources. It runs there, and it's yours.   If you want to change the software, you can do  it. How many have tried to change software in   a cluster before? How many have been successful?  That's right—it's hard to do. They really protect  

those systems and resources. This is your box; you can do with it as you will. You can connect two of these together to really scale up and expand into some workloads. As Jensen mentioned, these are really part of a new class of computers created for AI. People ask me, "Oh, can I use it for this? Can I do that?" It's really an AI box. Think about the fine-tuning we did: if you had to lock up your laptop for five or six hours doing that training run,

that's a lot of time you're not in meetings, not  reading your email, not doing all the other stuff   you've got to do. So, it's a heavy lift. This is  a great way to offload your desktop systems and   not have to go to some of those other resources.  The software makes it easy to migrate to DGX, DGX   Cloud, or any of our accelerated infrastructures.  It's going to be available from OEM partners. We   showed the ones up front from ASUS, Dell, HP, and  Lenovo. So, if you've got a particular favorite   vendor you like to buy from, you'll be able to  get it from them. You can go check out the models.  

ASUS, Dell, and HP have their versions of this in  their booths. You can go get a picture and go get   a selfie with it, probably if you ask nicely. You  can reserve your system, so you can go to NVIDIA's   marketplace or nvidia.com and reserve the system.  Think of these gold systems as founders editions.   We're going to make some of them and sell them,  but it's going to be like a founders edition.   Once they're gone, they're gone. We really  want our partners to go out and sell these,   and that's where the volume and mass production is  going to come from. If you want to get a cool one,  

reserve one that's just like this; it looks cool on your shelf. The 4 TB storage option is available for $3,999. You can save a little money with the 1 TB option, which will be cheaper; we announced it at $2,999 at CES early in the year. So, it just depends on what you want to do. There are plenty of USB-C ports on there, so you can plug more storage into it if you like and save yourself a little money. You might want to reserve yours today.

So, with that, I think we'll just turn over  to questions. Yeah, go ahead to the mic,   and I'll have some more box of water while you  go there. Thank you for the presentation. So,   some quick questions about the hardware:  How hot does it get? How much power does   it require? And can we change the drive inside? Sure, good questions. So, three questions there:  

one is how much power, how hot does it get, and can we change the drive? How hot it gets, I don't know yet. It should run cool; it's going to have a fan in it, and it's a very small, low-power device. Power: we won't know final power budgets until we get the systems in-house. They've got to lock the clocks on the CPUs and GPUs, and that will determine the power. It's going to easily plug into a wall outlet. I still think 200-ish watts, give or take, but we still have a little testing to do before we have that. It's not really intended for you to go in there and

change the drives or any of the components. It's  not going to be something where you can easily go   in and swap things out. We're not really building  that kind of a system. It's meant to be a little   all-in-one and built that way. Now, what our  partners do, I can't comment on. So, you can   go and ask Dell, HP, and Lenovo if they have plans  and maybe they have something different than ours. 

To build on the power question: are you planning on having a separate power brick for the 200 watts? Because I think the pictures of the prototypes didn't show a power plug. Yeah, there's no power plug. It'll plug in through one of the USB ports, and it's going to be an external power brick to help keep the size down. It makes it easier for multiple countries that way; it's kind of hard to do it internally and still ship globally. So, you're going to be limited by the power profile that USB PD can provide. The other question is, since NVIDIA is also a switch company, have you thought about having a companion switch with, say, NVMe storage, so that I can truly simulate a cluster environment where I have object storage through the switch, and I can hook up maybe two or three of them together through whatever it is, presumably 200 gig, on the switch output? Yeah, so, like, a little NAS-type device that goes with this, with your cutesy branding, right? Have the storage available with it, provide object storage, and, for extra fun and bonus points, have a way for me to put a Slurm controller or login node on it. Then, again, I have my mini cluster, all in a nice little stack. Yeah, we've had some discussions about whether we

make a switch for these, a ConnectX switch. I  think, in most cases, it's probably way more   expensive than the box is for a switch. But for  now, we're just going to let you connect the two   together. I can't speak for what our partners want  to do longer term and how they want to support it,   but that's kind of our plan. Again,  this is like a founders edition, so  

we're not really in the system business. We don't build systems, so we don't intend to do that. But our partners may do something different. And, by the way, it's a standard DAC cable for the cross-connection. If you go look, you can see one plugged into the back of one of the little stacks in the booth downstairs. Thanks. I had a question about memory bandwidth, if you can tell us anything about it. Yeah, there's actually, I think, one more slide here. Take a

picture of that. Cool. The memory bandwidth is up there: it's 273 GB per second. In my experience, that's really important for inference speed, along with the size of the model. Yeah, so I was wondering if you did a bunch of optimization on the software side to make it run much faster. Yeah, we're always looking at optimizations and things like that. We showed that chart: if it fits in memory, it's going to be faster on a discrete GPU. You've got more flops and more memory bandwidth; that's just the way it's going to be. The sacrifice you're making here is that we give you a lot more memory in a very tiny, little device. The memory on there is LPDDR5X, so it's power-efficient. It's not like GDDR7, where you've got something like two terabytes per second of bandwidth but need a lot more power. We couldn't make something this power-efficient with that kind of memory; it's just not there yet. We can only hope, but today, we're not there. (To see why bandwidth matters so much for inference, a rough estimate is sketched below.)
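Here is a rough, memory-bound estimate of why that 273 GB/s figure matters (our sketch, not NVIDIA's numbers): for single-stream decoding, every generated token has to stream the model's weights through memory once, so token rate is roughly bandwidth divided by model size:

```python
BANDWIDTH_GBS = 273  # DGX Spark's LPDDR5X memory bandwidth, GB/s

def est_tokens_per_sec(params_billions: float, bytes_per_param: float) -> float:
    # Memory-bound approximation; batching, KV-cache traffic, and
    # software optimizations all shift the real number.
    model_gb = params_billions * bytes_per_param
    return BANDWIDTH_GBS / model_gb

print(f"70B @ FP4 (~35 GB):   ~{est_tokens_per_sec(70, 0.5):.0f} tokens/s")
print(f"200B @ FP4 (~100 GB): ~{est_tokens_per_sec(200, 0.5):.1f} tokens/s")
```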

Okay, cool. Thank you. And we've got the specs live on the web page. The web page has been refreshed, so all of these are up there. If you didn't get a picture, you can go check out the web page; all the specs for the system are there. Maybe I missed the details: how do you connect the two Sparks together? You connect them on the back. I'll get the little model again; I like carrying it. It's kind of like "Wheel of Fortune." What's the bandwidth between those? There are two large connectors on the back that connect the systems; you would plug one of these into the other. What kind of connector is it? It's ConnectX-7, and it's a fairly large connector. Okay,

it's a nice, good old-fashioned copper cable. Great stuff. When are you going to ship them?   That's the $60,000 question. We're saying they'll  be shipping this summer, early in the summer. So,   if you go on and reserve it, hopefully, you  can have some time by the pool this summer and   enjoy your Spark. It should be available.  Thanks. Our partners will have their own   schedules, so they'll provide those later. I think I'm looking at the red clock there,   and I'm going to get flagged in a  minute, but we have one more. Oh,  

I have a question about the storage. You're using one or four terabytes of NVMe M.2 SSD; do you have any preference on PCIe generation for that? It's all NVMe M.2; I don't know what the generation is. Is it Gen 5? So, PCIe Gen 5. Okay, one more. You've got two minutes left; probably a minute and a half by the time you get up there. So, what kind of clock speeds are you targeting, and what kind of GPU am I getting? I only see the Blackwell generation up there. How many cores do I get? Yeah, we'll provide some more data once we

lock those down. We don't know what the final clocks are going to be until we get the boards back. They do a lot of testing and validation, and then they'll lock them down. Then we'll have the core counts and everything and provide those for you. But I guess core counts

are locked in by now, more or less. We haven't  shared the core counts yet, but we will soon.  Outstanding. Thank you so much for coming,  everybody. This brings us to the end. The   rest of the details will be available on our  website and from our partners. Thanks again!
