Efficiently process complex AI workloads with ND MI300X v5 technology | BRKFP367


[MUSIC] Mahesh Balasubramanian: All right, good afternoon and good evening, folks. Hope you're having a good time at Ignite. I'm excited to be here and I'm excited to talk about how Microsoft and AMD are partnering in bringing AI solutions and GPU choice to the market.

My name is Mahesh Balasubramanian. I lead Product Marketing at AMD for the Instinct product line. And I'm here today with --

Locky Ainley: Hi, everyone. I'm Locky Ainley. It's nice to meet you all. I do Product Marketing for AI infrastructure at Microsoft. We have a huge focus and a great amount of excitement around the MI300X machines.

Mahesh Balasubramanian: Perfect. So for the talk today, we thought we'd have a bit of a conversation. We'll talk about how we see this market with AI and how adoption is going, and how Microsoft sees it, given the leadership position they're in and their deep engagement with enterprises. And then we'll transition into the GPUs and the joint work between Microsoft and AMD to bring these GPUs to market, to help enable a more pervasive democratization of AI with the larger enterprises. So, without further ado, let's talk about the state of AI adoption, especially on the enterprise side of things. We've heard quite a bit about how data is key to AI and to getting value out of AI.

And there's a massive amount of data in this world, some public, some enterprise. What this iceberg image represents is that there's a massive amount of enterprise data that is not public and has not seen the benefit of AI yet. But the good news is we're starting to see that happen now, especially in 2024. And we're projecting that in 2025, it's going to accelerate a lot more with regards to how enterprises leverage their data to actually find business value.

There have been a lot of surveys from the likes of MIT Sloan, Wharton, BCG, and McKinsey, which point out that about 80 to 90% of enterprise data today still remains underutilized and hasn't seen the value that AI can bring. But what we observed in 2024 is a rapid shift of these customers adopting AI, doing POCs, and moving from POCs to actual deployments. In fact, I think Satya mentioned today that about 78% of Fortune 500 companies have some form of AI deployment, where they are using AI on different workloads and use cases.

That is something we observe, and something we think is accelerating faster and faster as we go forward. In fact, we have observed that spend for AI is starting to increase more and more across these enterprises -- not just digital-native brands, but broader enterprises looking to adopt AI as well. And we do think this is going to accelerate rather fast. That's what we're observing from the AMD side as we talk with these customers. Locky, I'm curious to hear Microsoft's perspective on this. Locky Ainley: Yeah, absolutely. I think we really are at this precipice where a lot of the AI investment used to be reserved for smaller startups or AI innovators, people with venture capital money.

But now, as it's becoming more pervasive in a lot of the enterprises, they're looking to really manage those huge data sets they have and derive true AI value out of what they're doing. And that means they need the right compute to support that. Mahesh Balasubramanian: Yeah. Do you have some examples? Locky Ainley: Yeah. Mahesh Balasubramanian: That you can share? Locky Ainley: Coincidentally, there's a few on screen. Mahesh Balasubramanian: Yeah.

Locky Ainley: You look at these brands -- Telstra, I'm from Australia, so near and dear to my heart. Mahesh Balasubramanian: And Air India from India. Locky Ainley: Yeah, exactly. Mahesh Balasubramanian: What a coincidence.

Locky Ainley: And Volvo, obviously. Very traditional enterprise brands that are starting to adopt more and more use of AI across their enterprise -- to support business workflows, to support the work they're trying to do for their internal customers, if you like, their user groups. So how can AI benefit them most? And again, they're turning to a lot of the compute that's needed to support the innovation they want to drive across the enterprise, to stay competitive, stay ahead of the curve, if you like.

Mahesh Balasubramanian: It's really cool to see these customers leveraging Azure AI, which is just amazing -- you can see the non-digital-native brands, who are not traditionally at the early adoption stage, start to adopt AI. And this is where we feel we are moving from the early adopters to the early majority, where the larger mass of enterprises is starting to adopt AI and seeing value from it. One of the surveys I saw found that every dollar invested in AI pays back about $3.60. So that's a really good value you can see from using AI. Locky Ainley: Yeah, yeah, that's right. I mean, it's funny.

I don't know, maybe like most folks here that have been in the industry for a while -- I was a bit of an AI cynic to begin with. I thought, how long is this going to stick around for? But I think over the past couple of years, it's really come to every app, every person. We've really started to see that adoption.

And so this is a really big mechanism that I think we're investing in, and that a lot of enterprises are really starting to adopt as well. There are new use cases; every corner you turn, there's another application of AI that's super interesting.

Mahesh Balasubramanian: Yeah. So we know that AI is gaining traction; a larger majority of enterprise customers are starting to adopt it. So the question really is, what kind of technology and computational need powers the AI behind the scenes? At AMD, we take pride in the diverse selection of compute engines we provide for various computational needs, from cloud, to on-prem data centers, to edge, to client devices. In fact, we have deep collaboration with Microsoft across various products.

Locky Ainley: Yeah. Mahesh Balasubramanian: But if you look at it from a data center perspective, where my focus is and where your focus is, there's heavy use of CPUs -- the AMD EPYC processors -- right from classic ML to data processing, labeling, structuring, and processing data, through to the Instinct GPUs, which help with generative AI and all the training and inference required on that particular data set. So there's a broad set of compute engines that are needed, and all of this we work on jointly with Microsoft to bring to you. Locky Ainley: And I love this slide, because really for me, it's still such a great story. I hear a lot, I need GPUs, GPUs, GPUs, right? But there's really a long process and pipeline that happens with the input into the AI solution. It starts with the CPU.

It starts with this processing. So you don't always need the biggest, baddest GPU that's available. Sometimes you just need some simple general compute, perhaps some HPC at the right time. And obviously, when you need it, you've got the AMD Instinct MI300X. Mahesh Balasubramanian: That's right.
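As a rough sketch of that CPU-to-GPU pipeline (the data, column names, and threshold here are purely illustrative, not from the session): general-purpose CPU compute handles the data preparation, and the accelerator only enters for the tensor math.

```python
import pandas as pd
import torch

# Stage 1: general-purpose CPU work -- cleaning, labeling, structuring data.
df = pd.DataFrame({"text_len": [12, 87, 45], "label": [0, 1, 1]})
df = df[df["text_len"] > 20]  # simple filtering, done on the CPU

# Stage 2: hand the prepared tensors to an accelerator for training/inference.
features = torch.tensor(df["text_len"].to_numpy(), dtype=torch.float32)
labels = torch.tensor(df["label"].to_numpy(), dtype=torch.float32)
device = "cuda" if torch.cuda.is_available() else "cpu"  # on ROCm builds, an Instinct GPU shows up via torch.cuda
features, labels = features.to(device), labels.to(device)
print(device, features.shape, labels.shape)
```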

I'm excited about Instinct -- I'm the marketer for the Instinct GPU. So let's talk a little bit more about the GPU side. The first thing I want to introduce on the Instinct side is that the ecosystem is actually quite broad, and there's a ton of AI leaders and platforms that run on Instinct today. From the likes of Microsoft and OpenAI -- GPT runs on Instinct today -- to Meta, which is a huge AI customer. And in fact, there are quite a few exciting startups and ecosystem players, from the likes of Cohere, Stability, Reka, and Fireworks, to companies like Rapt and UbiOps and ClearML, who all support the Instinct platform, both the hardware and the software ecosystem.

So there's a pretty strong, broad reach we are building, ensuring there is support for all the needs of any enterprise or AI startup out there. So, what Instinct is, is the exciting part. What's the differentiation, right? From a performance and compute perspective, AMD has always taken pride in bringing leadership performance across its product line. And that continues to stay true in the Instinct product lineup as well. So when it comes to AI datatypes and computational needs with FP16 or FP8, Instinct has the greatest performance out there on a GPU in the current generation.

What that does is bring some really good performance for inference and training. So I have an example here of a competitor comparison. And what it shows is that the Instinct MI300X that's available in Azure VMs today provides about 1.4x better performance on the Llama 405B model.

It's a premium open-source model that's available and easy to benchmark. And you can see that there's some really good performance given the computation. The other big part about Instinct is its memory advantage. Instinct has about 192 gigabytes of HBM3 memory on each GPU.

That's more than 2x any other GPU out there in its class. And it provides massive value with regards to what you can do with the GPU. So, when it comes to larger models or larger data sets, Instinct can do more with less. You can train larger models, and more models, with fewer GPUs, given the memory advantage it has. Locky Ainley: It has that GPU efficiency, right? Mahesh Balasubramanian: That's right, exactly.
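A quick back-of-the-envelope check of that "do more with less" claim, using only the numbers cited in the session (raw weights only; a real deployment also needs memory for the KV cache, activations, and framework overhead):

```python
params = 405e9          # Llama 405B parameter count, as cited in the talk
bytes_per_param = 2     # FP16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9          # ~810 GB of weights

hbm_per_gpu_gb = 192    # MI300X HBM3 capacity, from the talk
gpus_per_server = 8     # one 8-GPU server / ND MI300X v5 VM
server_hbm_gb = hbm_per_gpu_gb * gpus_per_server     # 1536 GB per server

print(f"weights: {weights_gb:.0f} GB, server HBM: {server_hbm_gb} GB")
# -> weights: 810 GB, server HBM: 1536 GB: the full-precision model fits
#    in one server, with headroom left for KV cache and activations.
```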

And it provides a choice for customers on how they want to use it. Locky Ainley: Right. Mahesh Balasubramanian: And all of this is supported with ROCm as the software stack. And I'll talk a little bit more about it, but the key part about ROCm is that it's completely open-source. It's broadly adopted by the ecosystem.

And it's easy to leverage and transition from the existing ecosystems you might be familiar with. It's supported by the likes of PyTorch and Hugging Face. All of it enables people to come onto this Instinct ecosystem and take advantage of the performance as well as the memory. So, let's talk a little bit about the memory advantage.

What can you expect? How does this memory advantage translate for me and for my use case and workload? I have a couple of examples over here. One is image processing -- image generation is a really cool workload. The likes of Stability and Midjourney all use it.

And the memory advantage here shows you can generate larger images, with higher precision and higher quality, on a single GPU, compared with GPUs that have less memory. And similarly, for models like the Llama 405B that I mentioned, you are able to fit the entire model in a single server. An 8-GPU server can fit a 405B model with FP16 precision. So you're going to get a better quality result using the full precision on a Llama model.

Whereas with other GPUs, you might have to go to multiple GPUs. And that's where the TCO benefit comes in for Instinct. In fact, I think, Locky, you have something in your hand.

Locky Ainley: Yeah. Mahesh Balasubramanian: That shows what it looks like, right? Locky Ainley: Yeah. I mean, we're talking about the GPUs, so it's really quite amazing to actually physically touch it, hold it, the weight of it.

It's so interesting. Particularly from a cloud perspective, we talk about virtual machines. But we have some of these in the booth. And, I mean, I'll happily share this around if people want to feel it and pass it around. It's quite interesting to actually see it and feel it and touch it. Mahesh Balasubramanian: Yeah.

It's about 750 watts. So that big weight is all to handle the heat. Locky Ainley: That's right, that's right.

The GPU is just at the bottom there. The rest is really to keep it cool. Mahesh Balasubramanian: Yeah. All right, so that's the hardware and its benefits, the options and the choices it provides. So what about the software? That's always a big question: how easy is it for me to use the Instinct GPUs?

So, ROCm is the software stack that's provided with the GPUs. It has all the essential elements -- the compilers, the toolchains, the libraries, the runtime environment -- all provided within the ROCm environment that communicates with the AMD GPUs. But ROCm also has broad ecosystem support. Most of the AI models today are developed using PyTorch.

And ROCm is fully upstream with PyTorch -- one of only two GPU platforms that are completely upstream -- such that any model written in PyTorch can run out-of-the-box on AMD. But there's also day-zero support, right? Anytime PyTorch comes out with a new feature or function, a new release, it supports AMD GPUs on day zero.

So you're not going to miss out on any of those new features or functions. Similarly, Hugging Face -- there are a million-plus models on Hugging Face that run out-of-the-box on AMD GPUs, because we have a deep engagement. They run nightly CI/CD -- continuous integration and deployment -- to ensure any new model added to the repository runs out-of-the-box on AMD GPUs. Locky Ainley: I've seen a fantastic demo of that as well. I'd love to share that around. It's just drawing down Hugging Face models immediately, no code change, straight in, and it works.
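As a minimal sketch of that "no code change" flow: on a ROCm build of PyTorch, the GPU is still addressed through the familiar torch.cuda/"cuda" names, so Hugging Face model code written for other GPUs runs as-is. The tiny model ID below is an illustrative stand-in for any model on the Hub.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative stand-in; any causal LM on the Hub loads the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"  # an MI300X appears as a "cuda" device on ROCm
model = model.to(device)

inputs = tokenizer("Large memory per GPU means", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```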

Mahesh Balasubramanian: That's right. Locky Ainley: Which is amazing. Mahesh Balasubramanian: And in fact, we have another demo running right now in the AMD booth.

Please come and take a look. It also shows the ease of getting the most optimized model up and running and the performance you can see from it. So that's another option for you to come and check out. But beyond those two, there's deep engagement across the ecosystem. OpenAI Triton is another great example, which moves you away from proprietary software, so you have a unified compiler infrastructure that supports multiple hardware vendors. AMD is supported on that.

JAX is an up-and-coming framework which is really, really good for training environments. Locky Ainley: Yeah. Mahesh Balasubramanian: AMD supports that. So it continues to grow. The point is, there's never been a better time or an easier path to adopt AMD Instinct solutions, on a software stack like ROCm that supports a broad industry base.

Locky Ainley: Absolutely. Mahesh Balasubramanian: So having said that, maybe this is the time to transition into the joint work with Microsoft over here. Locky Ainley: Yeah. Thank you, thanks very much, Mahesh. We'll talk about, sort of, scaling.

So we passed around one of the GPUs, it's the AMD Instinct MI300X. We've seen the specs on it. So that's just one of them.

At the bottom there is where the GPU sits, and the rest is the heatsink. So what we do with Azure is we take the AMD universal baseboard and pop it into a blade to build out the ND MI300X virtual machine. Combining eight of these is where we get so much power. They're all connected with Infinity Fabric. And then we can connect tens of thousands of these together virtually to build our clusters and deploy them to customers. So, here's a little bit of a closer look at how we build the virtual machine.

Eight of those things that look like toaster ovens all combine together to deliver unbelievable power -- and the memory optimization we talked about before as well. These are best used for large language model training, large model training, and generative inferencing.
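As a minimal sketch of what eight GPUs in one VM looks like from software (the per-VM GPU count is from the session; the script itself is illustrative): a standard PyTorch all-reduce, launched with one process per GPU, runs over RCCL on ROCm and crosses the Infinity Fabric links between the GPUs.

```python
# Launch with: torchrun --nproc_per_node=8 allreduce_check.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group("nccl")   # on ROCm, the "nccl" backend is served by RCCL
    rank = dist.get_rank()
    torch.cuda.set_device(rank)       # one process per GPU on the single VM
    x = torch.ones(1, device="cuda")
    dist.all_reduce(x)                # crosses the GPU-to-GPU interconnect
    if rank == 0:
        print(f"GPUs participating: {dist.get_world_size()}, sum: {x.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```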

A lot of what we've done at Microsoft is use these ourselves, and we built out a platform that can deliver the inferencing we need. The high-bandwidth memory is an important part of that as well. Mahesh talked about it a little before. We run larger models with fewer GPUs on MI300X. And it's super important for the efficiency we get.

Being able to put a workload across fewer GPUs means the workload doesn't have to transfer as many times, which reduces the data movement between the GPUs and increases the performance -- or rather, removes a lot of the performance degradation from transferring between GPUs. So that has been a real gamechanger. We worked on that memory together a little; we'll show more in a bit.

But we've also built these out into super large clusters, at supercomputing scale as well. Mahesh Balasubramanian: That's right. It debuted within the top 15 of the Top500 supercomputers in the world. Locky Ainley: That's right, a couple of years ago.

And you guys just yesterday, or was it the day before? Mahesh Balasubramanian: Yeah, Monday, we announced that AMD powers the number one and number two supercomputers in the world. In fact, three of the top five supercomputers in the world. And it takes a certain skill from a company like Microsoft and a company like AMD to build these massive supercomputers. There aren't that many people who can deploy that large a cluster and scale a workload like HPL across the entire cluster. Locky Ainley: That's right.

Mahesh Balasubramanian: That's a testament to what Microsoft and AMD can do jointly. Locky Ainley: Yeah. On that Top500 supercomputing list, there are not too many cloud vendors, for a reason. I think the importance there is reliability at scale -- the ability to scale to the huge platform requirements that are needed at an enterprise level. So I think it's a real testament to how we build out, and how we were able to work together with AMD to optimize those clusters. We've talked a little bit about our product.

I think this is where you ask me about these. Mahesh Balasubramanian: I do, I do. So customers are where -- like, that's where the rubber meets the road, right? Locky Ainley: That's right. Mahesh Balasubramanian: What do customers say? What do they think about the Azure instances with MI300X? And I think you have some big customers already that are using them. Locky Ainley: Yeah.

I mean, Microsoft ourselves, we use MI300X to underpin a number of our AI services. And I think what's exciting there is just the ability to think about it -- if anyone's using Copilot in their processes today, as part of the M365 suite, you're likely being back-ended by MI300X and powered by those GPUs. Mahesh Balasubramanian: That is fantastic to hear.

So, people using Azure AI services, Microsoft 365 Copilot, and GitHub Copilot could be using Instinct to provide the answers back to them. Locky Ainley: Yeah, that's right. We process millions of those queries. Mahesh Balasubramanian: Millions -- that's volume at scale. Locky Ainley: Yeah. And so we're excited about building that out more and making sure we can optimize.

Again, it goes back to that high-bandwidth memory -- the ability to use GPUs efficiently, particularly when it comes to such a high volume of work like inference. Mahesh Balasubramanian: Yeah. And Black Forest Labs is another good example.

I showed an example earlier of how companies like Stability, with Stable Diffusion, can take advantage of the massive memory. And Black Forest Labs is another really good use case, where they're doing prompt-to-video. And the advantage that memory provides for them to process and generate this video is a pretty massive boost for their performance and their needs. Locky Ainley: I'll come back to that slide you showed earlier with the image. It was just a great example of how important memory is, particularly when it comes to image generation, or even video generation, perhaps in the future. Mahesh Balasubramanian: Yeah, perfect.

Great to hear that customers are seeing some good value with this. Locky Ainley: Yeah. Enveda is another one -- a pharmaceutical, drug-discovery situation.

They're training a model called Fusion. They're really excited about the adoption of MI300X. We expect to hear more from them soon as well.

Mahesh Balasubramanian: I think just this week, or maybe last week, they went into their first human trial. Locky Ainley: Yeah. Mahesh Balasubramanian: For one of their drugs that was discovered using AI.

Locky Ainley: Yeah, that was an exciting announcement made just last week, yeah. Mahesh Balasubramanian: Was it? Locky Ainley: Yeah. Mahesh Balasubramanian: Perfect.

It's so exciting to hear this. Locky Ainley: Yeah. So, people obviously think about AI infrastructure at Microsoft and AMD working together closely -- we've been at this for a while, right? Mahesh Balasubramanian: Yeah. Locky Ainley: Before this sort of renaissance of AI, we'd been working on a lot of this together.

Do you remember some of the early days? Mahesh Balasubramanian: Yeah. We talked about how we have deep collaboration on EPYC, and deep collaboration on the gaming side with Xbox as an example, or Copilot. But the Instinct codevelopment and coactivity -- the partnership -- started, like you said, with the MI50 back in 2020. In fact, the joint development of the software stack between Microsoft and AMD, as we saw AI development proceed, played such a big hand in bringing MI300X into volume production and ramping much faster. In fact, I think on the MI100s and MI200s, we published jointly on training for the Phi small language model.

Locky Ainley: That's right. Mahesh Balasubramanian: On those earlier generations. So all that joint work is what's enabled us to bring these products to market in a very, very efficient and fast manner.

Locky Ainley: And I think people think about ROCm, right? Microsoft's been there with ROCm for a long time. We've worked on optimizing the software stack. And I think there's been deep collaboration with a lot of our engineering teams on making sure we can get the best out of these GPUs on the ROCm platform. And we've seen a lot of success.

So when we talked before about the performance we're passing on to customers through the ND series with MI300X -- it's built on a lot of that experience, a lot of that optimization. We were able to do a lot of that work early and then deliver it through. Mahesh Balasubramanian: That integration between the AMD software stack and the Microsoft software stack is such a smooth process right now, it's amazing.

Locky Ainley: And then in 2023, we've got that supercomputer we announced -- sorry, the Top500 one. Mahesh Balasubramanian: Yeah. Locky Ainley: I feel like that was just the first step, with you guys going to the top just the other day. It would be great if we can do that with Azure. Mahesh Balasubramanian: That's the goal. Every generation, we need to be there.

We need to show that the partnership is strong. Locky Ainley: That's right. There is more to come, right? We're investing in building out more offerings with customers across the GPU solutions, adopting the next generations.

Mahesh Balasubramanian: That's right. Just last month, Satya was on stage in a conversation with Lisa, talking about the strong partnership that has brought MI300 to market today and the continuing partnership for the next couple of generations of products, including the MI355, which is public, as well as the MI400 series, where we're working jointly on how to bring it to market. Locky Ainley: And so we really are excited about more and more adoption of these solutions. We think our expertise with ROCm, our experience with ROCm -- we're able to package that up within the solution and deliver it to customers.

We've worked really closely across the stack. We've made sure ROCm is highly optimized on Azure. And I think it's genuinely part of the ecosystem now that we bring to market with customers. And so, it's really all the way up the stack. Mahesh Balasubramanian: It's really good to see, especially at the top line, where you can use AzureML Apps, AzureML Pipelines, all the tools available on Azure that people are extremely familiar with and have used for years -- and Instinct and the ROCm software stack are part of that, a seamless integration with it.

In fact, I have a question for you. Just this morning, I heard Satya's announcement about the new AI Foundry. Locky Ainley: That's right. Mahesh Balasubramanian: Is that part of the stack? Locky Ainley: Yeah. This is maybe a couple of clicks deeper than AI Foundry, but a reflection of it being really integrated with what we're bringing to customers, the optionality that we bring.

So we're excited about that, underpinning some of those Foundry exercises, or taking models as a service. We talked about Hugging Face before. Hugging Face is in the model catalog. So if you want to bring a model down and run it on the MI300X, you can; it's fantastic, we can do that. Mahesh Balasubramanian: Perfect. So we know that the market needs choice, and the market's looking to accelerate AI adoption.

AMD makes fantastic GPUs. They're in Azure. Locky Ainley: Which is really a testament to cloud, just so we're all clear. Mahesh Balasubramanian: Yeah.

And it's fully integrated and used in volume by Azure and OpenAI services and customers. Locky Ainley: Many customers wouldn't even -- they might be using it today already. Mahesh Balasubramanian: Yeah. Locky Ainley: Through the services side.

Mahesh Balasubramanian: So what's next? Locky Ainley: So, I think that's the big question, right? Learn more. We've got deeper details on the virtual machines -- the sizes, the specs, all of that, everything you need. Come and see us at the AMD booth.

I'll be there. I'm also at the Azure Infra booth. So we have a selection of -- Mahesh Balasubramanian: What are they going to see there? That's a big part. Locky Ainley: We've got one of the four blades on display.

If you haven't stopped by, come and see the full ND-MI300X VM blade. And you can just get a sense of the size and scale of it. Mahesh Balasubramanian: And if you come to the AMD booth, you get to see the demo of how easy it is to pull the models and run them on AMD GPUs. It's just a few clicks.

And in fact, there's a demo tomorrow run by AMD where you can see the process of pulling the most optimized models, and the memory advantage of fitting that 405B model within a single server. Locky Ainley: Start now. We are ready to go. I think it's a really unique and strong partnership we have with AMD. We've got a bunch of experts. So, talk to your account teams. We've got folks from ROCm.

We've got folks from the GPU side. We've got folks from Microsoft in our deployment services. So, we're able to really provide a white-glove experience. We're taking a lot of the experience from what we've done with other AI innovators and bringing it into the enterprise. Pilots, POCs, benchmarks.

The exciting thing with MI300X is we're really ready to go. Mahesh Balasubramanian: Yeah. It's available.

We are ready to engage. Let's talk. Locky Ainley: Thanks so much for having me. Mahesh Balasubramanian: Yeah, likewise. It was exciting.

Thank you so much, Locky. Locky Ainley: I hope no one's pocketed that toaster, wherever it ended up. Mahesh Balasubramanian: And we are ready for questions if you have any questions. Locky Ainley: Yeah, we would love to hear from you.

Mahesh Balasubramanian: All right, it seems like things were clear. People are ready to go and you have the right links. Again, guys, so happy to have you. Please stop by the booth, ask any questions, we are here to serve.

Locky Ainley: Yeah, absolutely. Cheers, everyone. Mahesh Balasubramanian: Thank you. [APPLAUSE] [MUSIC]
