Improving Developer Benefits Through AWS and Hugging Face Collaboration
Hello and welcome to Intel on AI. I'm your host, Ryan Carson from the Intel AI team. I'd like to welcome Jeff Boudier and Sudeep Sharma to the show today. It's going to be a great show. Jeff builds products at Hugging Face; previously he was the co-founder of Stupeflix, which was acquired by GoPro, where he served as director of product management, product marketing, business development and corporate development. Sudeep is a product manager at Amazon EC2, where he manages the portfolio of core Intel instances, or VMs. Prior to AWS, Sudeep was a principal software developer at MathWorks, the makers of MATLAB and simulation software.
Sudeep holds an MBA from Carnegie and a master's in electrical engineering from the University of Texas. It's lovely to have you both here, and I'm looking forward to our conversation. So let's get started. Everybody is building on the cloud, and I'm curious, what are the key challenges your team, specifically EC2, faces in meeting customer demands? And how did the introduction of the Gen 7 Intel-based instances help address those? So at EC2, as part of the core infrastructure team, we work on developing VMs, or instances, for our customers. We have a wide variety of workloads and use cases from different industries: financial, software, gaming, health care, travel, automotive, semiconductor, virtually any kind of workload that you could think of running on AWS.
The main challenges that we face include customers' constant demand for the latest and greatest technology and improved price-performance benefits. And these customer workloads are constantly evolving, requiring new features and capabilities to be supported. The most recent example that comes to my mind is support for BFloat16 and INT8 data types, which is required for running AI and ML workloads. The rapid pace of software development, accelerated by AI, puts increasing pressure on us to deliver the right features at a reasonable cost, and customers are looking to optimize cost and maximize utilization, specifically of the CPU resources that we deliver to them through AWS instances.
Our goal is to consider all the customer challenges and unblock them to use the EC2 instances. We typically resolve these challenges by introducing newer generations of products. For example, last year, in the second half of 2023, we introduced our Gen 7 Intel-based instances. These instances are powered by Intel's Sapphire Rapids processor, which has built-in accelerators and Intel AMX support to help with AI and ML workloads. Awesome.
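For context, here's a minimal, hedged sketch of what taking advantage of that looks like in code: on a Sapphire Rapids based instance, PyTorch's CPU backend can route BF16 matrix math to the AMX units with nothing more than an autocast context. The model ID below is just an example, not something mentioned in the conversation.

```python
# Hedged sketch: BF16 inference on a 4th Gen Xeon (Sapphire Rapids) CPU.
# AMX is used automatically by the oneDNN backend when BF16 matmuls run
# on hardware that supports it; no AMX-specific code is needed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

inputs = tokenizer("CPU inference can be surprisingly fast.", return_tensors="pt")

# Run the forward pass in bfloat16 via CPU autocast.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(**inputs).logits

print(model.config.id2label[int(logits.argmax(dim=-1))])
```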
So Jeff, I've been a big fan of Hugging Face for a while now, and it was really exciting to see you roll out compute workloads in the cloud and do inference right on the endpoints there. How have you developed this solution and thought about the product and serving customers better as you continue to innovate? Yeah, thanks, Ryan. Well, you were asking about challenges earlier, and you know, we make it easy for people to use AI to build their own AI, and maybe most people know about the most famous open model right now, which is Llama, the latest model from Meta. But we host over a million models today on Hugging Face. -Just a few. -Yeah! On the Hugging Face Hub,
and that's like a million models to classify text, to complete your sentences, to write summaries, to transcribe speech into text, to draw great images or understand them, to do anything really with machine learning. And so a key challenge for people is to really access and understand what is the right model for the job, and then build with these things. And that's what we're trying to do with our platform. In addition to making all of these models easy to use, to access, to build upon, we also wanted to make it as efficient as possible for companies to use them. And that's the approach that we took when we built Hugging Face Inference Endpoints, which is our service on the Hub to easily take a model and turn that into a service, or an endpoint. And so we wanted to enable that for any kind of model.
And we want to give customers choices so that they can select the best possible hardware for efficient serving of those models. That's why we were so excited to be invited to test, before they were widely available, the latest EC2 instances based on the 4th Gen Intel Xeon CPUs, the Sapphire Rapids CPUs. We were able to test these on quintessential Hugging Face use cases, and not just inference, actually, we also tested them on training. -Absolutely. -What we were able to find
was that we were able to get massive improvements in throughput, both for training and for inference, both for Transformers models and for Diffusers models. I think we have two or three blog posts up today on the Hugging Face blog. One shows how we clocked an eight times speedup. -Not bad. -When training a small language model,
and then we served that model and we had a 3x speedup, which is really remarkable. We also tried that on Stable Diffusion and got more than a 6x speedup on Stable Diffusion serving. So really impressive. And all of this is thanks to AMX, Advanced Matrix Extensions, and the ability to use new data types like BF16 and INT8. So that's another important part of the mission, right? The mission of Hugging Face.
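To make that concrete, here's a rough, illustrative sketch of running Stable Diffusion on a Sapphire Rapids CPU in BF16. This is not the exact benchmark code from the blog posts; the checkpoint and prompt are examples.

```python
# Illustrative sketch: Stable Diffusion on a Xeon CPU, with the denoising
# loop run under BF16 autocast so AMX can accelerate the matrix math.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe = pipe.to("cpu")

with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    image = pipe("a watercolor painting of a data center",
                 num_inference_steps=25).images[0]

image.save("sample.png")
```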
We want to democratize good machine learning. Part of it is making things easy to use and to access. And the other thing is to make it cost effective so that companies can actually use them. So going back to Hugging Face Inference Endpoints today, if you go over there and deploy a model, you can select from these Sapphire Rapids based EC2 instances. Under the hood, those are the C7i instances that Sudeep was talking about.
They're super easy to access, and I think it starts as low as $0.03 per hour to use. So that explains why they're so popular on Hugging Face Inference Endpoints. It's so affordable, and this is one of the reasons that I joined Intel, because I believe that the more choice people have for compute, the better the world will be. Right? And this is why I'm a big fan of how AWS and Hugging Face have teamed up to say let's make this more affordable, let's make it more accessible, and let's make it faster.
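As a rough illustration (not an official recipe), deploying a model onto one of those CPU-backed endpoints can be scripted with the huggingface_hub client. The endpoint name, region, and instance identifiers below are placeholders; the exact Sapphire Rapids instance type string should be taken from the Inference Endpoints catalog.

```python
# Hedged sketch: programmatically creating a CPU-backed Inference Endpoint.
# Requires a recent huggingface_hub and a valid HF token; the name, region,
# instance_size, and instance_type values are illustrative only.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "sentiment-on-intel-cpu",  # hypothetical endpoint name
    repository="distilbert-base-uncased-finetuned-sst-2-english",
    framework="pytorch",
    task="text-classification",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x2",         # check the catalog for valid sizes
    instance_type="intel-spr",  # placeholder for a Sapphire Rapids CPU option
)

endpoint.wait()  # block until the endpoint is ready
print(endpoint.client.text_classification("Deploying on CPU was painless."))
```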
So it's exciting to see this happen. So Sudeep, as AWS started to work with Hugging Face to roll this out and make it really accessible, how did you resolve the issues and the challenges and the excitement of doing that? Yeah, so Amazon is a customer-obsessed company, right? Everyone knows about it, right? So when working on designing new instances, we collaborate with our customers to understand their requirements, and this shapes the new features that we are going to add to our instances, right? We work backward from our customer needs to meet those requirements, which is very important and quintessential for us. This could include incorporating new features into our instances and also adding new capabilities.
Intel AMX is a great example, which Jeff and you just spoke about, right? While working together, the desired outcome is to ensure that our customers are able to continuously innovate to solve real-world problems. And it's great to hear Jeff say that they got 8x and 6x improvements for some of the models they've been able to run on our C7i instances. That was a particular reason why we introduced a new set of instances with Gen 7: compute-optimized, general-purpose, memory-optimized, and also a new category of instances called Flex instances, which we provide to customers like Hugging Face to run their applications on these systems and optimize their workloads. It's interesting, I think everybody assumes they need GPUs for these workloads, and it's really refreshing to see that there are unbelievably affordable and performant enough options to train and do inference with CPUs. You know, Jeff, as you were thinking about the solution, what you were going to offer as a product, how did you decide to go down this route of exploring, you know, the CPU option and partnering with AWS and Intel on that? Well, I think one thing that's very important for us is that we want to offer choices to companies so that they can deploy the right kind of models for their use case. We don't really believe in a world where there is one single gigantic model that can be applied to everything.
We believe more in a world where use cases require companies to build their own custom, efficient, maybe sometimes smaller models. Right? And so if you need to classify emails, you're not going to need a 100 billion parameter model. You can probably use another type of model. And so for each type of model and each type of use case, you're going to have different requirements. Maybe latency is super important to you, maybe you're going to send like a big batch of requests once a day.
Maybe throughput is super important to you, and depending on all these parameters, it's really not obvious what the best possible compute resources are to put behind that model. And so the same way that we want to support a wide diversity of use cases, of libraries, of model architectures, of model sizes, we also want to offer a diversity of compute resources. So today, if you go to Hugging Face Inference Endpoints, you will find a whole bunch of different CPU instances, all specialized for AI workloads and actually all Intel based. You will find various GPU instances, and you will also find AI accelerators that are built purposefully to support AI workloads.
And we have worked closely with the teams at Intel on their Gaudi line of accelerators for many years. We make them easy to use through a custom library, Optimum Habana, and we actually have a great blog post today that shows you how you can put these things together: an Intel Xeon instance powering an embeddings generation model, together with a large language model powered by a Gaudi accelerator, working together in a RAG system as you build a complete solution for customers.
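As a loose sketch of that kind of split, with embeddings computed locally on a Xeon CPU and generation handled by a separately hosted LLM endpoint (for example, one backed by Gaudi), here is what the wiring might look like. The endpoint URL is hypothetical, and the embedding model and documents are examples only.

```python
# Minimal RAG sketch under the assumptions stated above.
import numpy as np
from sentence_transformers import SentenceTransformer
from huggingface_hub import InferenceClient

docs = [
    "C7i instances are powered by 4th Gen Intel Xeon (Sapphire Rapids) processors.",
    "Intel AMX accelerates BF16 and INT8 matrix math for AI workloads.",
    "Hugging Face Inference Endpoints can deploy models on CPU instances.",
]

# Embeddings run locally on the CPU.
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # example model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

question = "What accelerates BF16 math on Sapphire Rapids?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]

# Retrieve the closest document by cosine similarity (vectors are normalized).
context = docs[int(np.argmax(doc_vecs @ q_vec))]

# Hand the retrieved context to the generator; the URL is a placeholder.
llm = InferenceClient("https://my-gaudi-endpoint.example.com")
answer = llm.text_generation(
    f"Context: {context}\n\nQuestion: {question}\n\nAnswer:",
    max_new_tokens=100,
)
print(answer)
```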
Absolutely. And again, this is why I think we're all excited. We're excited about offering more choice, more accessibility to everybody, because, you know, a rising tide will lift all boats. I also think that there are AI trends on the CPU side as well, which we are seeing, right? As customers seek to harness the power of AI and ML, the demand for efficient and scalable solutions to handle these workloads continues to grow, right? While GPU based instances are the preferred customer choice for most AI and ML workloads, what we are seeing is that customers are looking for opportunities to use CPU based instances for their AI and ML workloads as well. There are certain classes of AI and ML workloads, such as, you know, automatic speech recognition to figure out who is speaking, converting speech to text, sentiment analysis, you know, that are well suited for CPU based instances.
And what we have heard from customers is that if their latency requirements are met, then they prefer running workloads on CPU based instances, specifically inference workloads. -So there is a... -Why not? It's cheaper, it's just as performant. So it's exciting. So on that front, Sudeep, I know that AWS and Amazon have been investing a lot of time and money training folks, and I'm curious, you know, how is that going? What are you investing more in? Tell us about that program and where you're going next.
So we want our customers to continuously innovate to solve real-world problems, right? And to achieve this, it is important that we focus on building innovative programs that have a lasting and positive impact on the communities in which we operate. Designing STEM and skills training programs is central to this approach. So we intend to continue increasing access to skills training, to give everyone who wants to further their cloud skills the tools to achieve this. In 2020, Amazon made a commitment to invest to help 29 million people grow their tech skills with free cloud skills training.
The 29 million goal was set to be achieved by 2025, but in July 2024 we announced that we have not only met our goal but exceeded it. So far we have been able to provide free training to 31 million learners, and counting. Wow. 29 million was impressive, and beating it is even more impressive. So I'm very thankful that Amazon is investing in that and making those resources free for folks. I know Hugging Face prides itself on being accessible and easy to use and friendly as well.
How are you thinking about training and education over at Hugging Face? I'd love to hear your thoughts on that. Yeah, I mean, we think that it's part of the mission, right? Our mission of democratizing good machine learning. So we invest a lot of time and resources into it. And I want to give a shout-out to our open source team and our developer evangelism team; they put out some amazing resources that you can find there, free and easy to access.
If you go to hf.co/learn, you will find free online courses to apply machine learning to natural language processing problems, to computer vision problems. We even have free courses on using machine learning for games, and even now a new course on applying machine learning to robotics.
So yes, education is super, super important to us, and we really want to build a world where everybody, every single company, is able to build their own models and build their own AI features, not having to rely on one single gigantic closed model. Absolutely. Yeah. I think the future is bright and I'm very excited about it. So Sudeep, there's a lot of competition out there, and I'm curious, how does AWS think about differentiating itself from that competition? Yeah, so AWS has been partnering with Intel for the past 18 years, the longest of any cloud vendor. AWS and Intel's leadership teams, engineering teams and many of the cross-functional teams have been engaged ever since AWS launched its first EC2 instance in 2006. And in those 18 years of collaboration, AWS has introduced more than 400 Intel based instances, including the fastest Intel instances in the cloud, across different AWS Regions and different categories of instances that are used by customers to run a wide variety of workloads in the cloud.
Our journey of innovation with Intel has continued with the latest, fourth generation Sapphire Rapids based instances. So last year we rolled out instances with the Sapphire Rapids processor. These instances were built using custom Intel CPUs, which were available only to AWS at that point in time.
This custom CPU was developed in partnership with Intel, and the combination of the AWS Nitro System, which is the foundational technology that underpins our EC2 instances, along with the custom Sapphire Rapids processor, allowed us to offer 15% better performance versus other cloud providers using comparable Sapphire Rapids processors. Apart from this, you know, the AWS Nitro System, which offloads networking and EBS capabilities to dedicated hardware, also allows us to offer better performance than our competition. So that's how we try to differentiate from the competition, and that's what most of our customers say, that they get better performance at AWS. I love it. I just think it's such an exciting time to be alive, to be building products on the shoulders of giants. Right?
You know, AWS has really worked hard to make that possible, and Hugging Face is making it accessible. I'm just so excited for the builders of the future. Speaking of that, Jeff, what kind of interesting use cases or services or products are you starting to see Hugging Face customers build on top of these endpoints? Well, I think a trend that we have seen start with retrieval augmented generation, RAG, is sort of a transition from thinking of an AI feature as a model to a world in which we start thinking about AI systems, AI systems in which multiple models work together to achieve that user experience. So an example of that is HuggingChat. HuggingChat is like an open source ChatGPT, using open models and open source software, and initially it was really a way for Hugging Face users to interact with the latest open model, so you can select: I want to talk to Mistral, I want to talk to Cohere Command R Plus.
I want to talk to Meta Llama 3.1. But it's evolved into a much bigger system, where you have the RAG system where an embeddings model is generating embeddings of your inputs so that we can look for relevant search results on the web. And then there is a summarization model that will summarize your conversation so that you can refer to it easily. And now we have agents and tools, so you can add an image generation tool, you can add a text summarization tool, etc. You can actually add any Hugging Face Space as a tool for your chat experience. And so here you don't have one model in your chatbot experience.
You maybe have ten models working together to give you a better, safer answer to your questions. I think we're going to see a lot more of that. And I see that sort of thing come up in product experiences, products that I use every day in my work, where multiple models are working together to power these new AI features. Absolutely. It's so exciting to think about folks using the right model at the right time for the right task, affordably and effectively, right? And not deploying a Ferrari when they really just need a bike, or when they need a Ferrari, using the Ferrari.
But most of the time you just need a good electric bike. So that's the right model for the right task, which is exciting. And I'm very thankful that we have, you know, heavy hitters like AWS supporting, you know, massive innovation from folks like Hugging Face. So it's been a delight to chat with you both, and I am truly grateful for the hard work you're both doing to enable millions of people to really build the future and change the world. So thank you for taking time to be on the show. Jeff, where can folks go to learn a bit more? Well, thanks Ryan, thanks so much for having me.
You know, we've been working with Intel for quite some time. We've built a lot of great software together. If people want to learn about how they can easily use the latest Intel features with Hugging Face, we have a library that's just for that. It's called Optimum Intel.
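As a small, hedged example of what that looks like in practice (the model ID is illustrative, and this uses the library's OpenVINO integration, which mirrors the transformers API), getting started can be as simple as:

```python
# Illustrative sketch: swapping in Optimum Intel's OpenVINO model class
# while keeping the familiar transformers pipeline workflow.
# Requires `pip install optimum[openvino]`; the model ID is an example.
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum Intel keeps the familiar transformers workflow."))
```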
So I think that would be a great, great resource for folks who want to learn more. Perfect. Thanks, Jeff. And Sudeep, where should folks go to learn more as well? So on our EC2 instance webpages, you will find information about all the Gen 7 Intel instances that I was speaking about: the M7i, R7iz, M7i-Flex, C7i-Flex, C7i and R7i. They should also be displayed on your screen, so you should be able to find all the information about those instances on those webpages.
Wonderful. Well, thanks so much guys, and looking forward to seeing the continuing innovation and seeing what beautiful products people build. -So thanks and have a lovely day. -Thanks Ryan.
-Thank you. -Thanks for having us.