Build an intelligent application fast and flexibly using Open Source on Azure
- Welcome to the main event. These next 50 minutes are going to fly by. We have an exciting end-to-end demo of an intelligent app that was built using a combination of open source technologies developed by Microsoft and the community, as well as other services that run on Azure. Walking us through the demo is my teammate, Aaron Crawfis, Senior Product Manager on the Azure Open Source Incubations Team.
Aaron will be joined by friends to highlight some announcements and key technologies that are featured in the demo. And be sure to stick around because I'll be back to discuss building Web3 applications using Microsoft's developer tools, while applying DevOps best practices. Now sit back and relax. Here's Aaron. - Thanks Donovan.
Now, I'm sure we've all lost a pet or we've seen the flyers in our neighborhoods with a picture of a lost cat or dog. Now, imagine if there was a way to use the endless pet pictures that we all have in our phone's camera roll to be quickly reunited with our pets? With the new Pet Spotter app, we can do just that. Today I'm excited to be showcasing a brand new intelligent cloud native application that connects owners with their lost pets using fine-tuned machine learning. Instead of printing posters, we're gonna be using an advanced ML image classification model, fine-tuned by the images right from our camera roll.
With this trained ML model, when a pet is found, you can instantly snap a photo, then that will match with the model and connect you to the owner. Now, how can we make sure that our application is leveraging the latest and greatest technology and has a vibrant community of contributors and resources? Well, to make sure we're meeting those goals, we're gonna be leveraging a ton of great open source projects like Dapr, Bicep, KEDA and more. Stay tuned as we showcase these projects and more throughout this session. All right, let's dive right into our infrastructure. We're gonna be leveraging the open source Bicep project, which allows you to author infrastructure as code definitions of your services, and then deploy them into Azure. Here, we're looking at an infrastructure as code file, infra.bicep, which contains our Key Vault, our container registry, our Azure Machine Learning, Service Bus, Cosmos DB, Kubernetes Service, and our load testing.
Now, these are Bicep modules, which allow us to use reusable templates for each of our resources. So if we open up aks.bicep, you'll see a couple things. Right at the top, we have our parameters.
This is how you can customize and pass in information to the template. I've set up my location, my cluster name, the VM size, and I can also fine tune our parameters to set minimums, maximums, and allowed values. With those parameters, I can now specify my resource. In just 20 lines of Bicep code, I have my entire AKS cluster.
Here you can see my agent pool, which behind the scenes is a virtual machine scale set running Arm64 VMs. And I've also set up an add-on profile for HTTP application routing, to make it super easy to pass in traffic to my app. Now that I have my AKS cluster defined, let's showcase how easy it is to use Bicep to author new resources. So here I'm gonna be creating a storage account.
So I've already set up my parameters and now I'm gonna be creating a new resource, giving it the name storage, and I don't have to bring up the reference docs of all of the resources available to me. Bicep just tells me everything that's available, it guides me through that process. I can also define a new resource, which I'm doing now, or I can reference an existing one. And when creating a new one, again, I don't have to go back to those reference docs to know what fields are required, what do I need to set? I just click required properties and Bicep fills in everything for me. And now I can pass in information about the account name, the Azure region and location I'll be deploying into, and I now need to specify the SKU. And look at that, Bicep just tells me all of the available SKUs that I can use and the same thing with the kind.
So I'm gonna be selecting StorageV2. Now that I have my storage account set up, I need to be able to set up my blob storage container. So to do that in Bicep, I just create another resource.
This is gonna be for our blob services, so I call it blob. I select blob services from that dropdown, and then Bicep will tell me that the only required property for blob services is the name, and it even suggests calling this default. So I just select that using the Bicep IntelliSense.
And finally, one last resource, let me create the container, which is where we're gonna be storing all of the images for our pets. So I select containers, I again select a new resource with the required properties, and then I can pass in that container name, which in this case is a parameter. So in just those few lines of Bicep, I now have an entire storage account, my blob services, and then my container.
But how do the rest of my resources and my Bicep template leverage the storage account? That's where output parameters come in. I can specify all of the properties that I want to pass back into my application's infrastructure file. So here I have my storage account ID, which is a string where I can just say storage.id. I can do the same exact thing for the storage account name.
And as I'm filling this out, again, Bicep is just guiding me along the way. It's super easy just to specify that this is a string. I can choose from the dropdown of all of the available properties. And because my container is a child resource of that storage account, it's super simple just to say storage::blob and then access the container inside of it and get access to all of those properties. So super easy to specify my parameters, my resource, and its outputs.
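The parent and child nesting described here (storage account, then blob services named default, then the container) can be sketched outside Bicep as well. The following is a minimal Python sketch, not the template from the demo: the helper name and the account and container names are made up for illustration, while the resource type strings are the Azure Resource Manager types for these three resources.

```python
# Hypothetical helper: models how the three nested storage resources are
# named when a container lives under a storage account's blob service.

def storage_resource_names(account: str, container: str) -> dict:
    """Return the fully qualified resource name for each level of nesting."""
    blob_service = f"{account}/default"  # the blob service is always named "default"
    return {
        "Microsoft.Storage/storageAccounts": account,
        "Microsoft.Storage/storageAccounts/blobServices": blob_service,
        "Microsoft.Storage/storageAccounts/blobServices/containers": (
            f"{blob_service}/{container}"
        ),
    }


# Account and container names below are illustrative assumptions.
names = storage_resource_names("petspotterstorage", "images")
for resource_type, name in names.items():
    print(f"{resource_type}: {name}")
```

Each child name is prefixed by its parent's name, which is the relationship the nested accessor in the template walks through.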
And then finally, let's get that wired up. Let's go right back into our infra.bicep file. Let's drop in a new module called storage, and let's select that storage.bicep file that we just created, and Bicep will read that file and know that I'm required to set that location and a name. So let's go ahead and fill that out.
Now that my infrastructure file has been updated, I have a pipeline set up that will deploy this into Azure for me. So all I need to do is go into the version control, go ahead and stage and commit both the storage.bicep and my updated infra.bicep. Push those into my repo, and now I have my GitHub Actions workflow kicking off to deploy my resources into Azure.
So let's take a look at that pipeline. Here you can see that I am checking out the repository, logging into Azure, and then deploying my infrastructure file. With those three simple steps, I now can deploy my infrastructure straight into my Azure resource group. So let's go and see that. Yep, and that GitHub runner, which is running that pipeline, is actually also leveraging those Arm64 VMs that are being used in our AKS cluster. This ensures that our build and testing pipeline step matches the architecture and the expectations of our application.
And here you can see that our pipeline has completed, our infrastructure is now deployed into our Azure subscription, and if I pop over into my Azure resource group and hit refresh, I should now see that new storage account deployed for me. And look, there it is. If I hover over that, we'll see we now have a Pet Spotter storage account ready for me to begin uploading pet images to then train. So with that, we just saw how easy it was to create repeatable templates for our infrastructure that can be deployed as part of our pipeline. The combination of the Bicep infrastructure as code language plus GitHub Actions allows me to get up and running in Azure in minutes. All right, now that we have our infrastructure deployed and ready, it's time to write some code and deploy our application.
We're gonna be writing a microservices based application that we will be running on Kubernetes. Now you're probably hearing me say microservices and thinking, great, now I have to deal with service discovery, service to service invocation, state management, and all of the other requirements that come with microservices. Well, let's learn more about an open source project, Dapr, that will make this easy. Now let's take a look at the Dapr project. Now, Dapr, or the Distributed Application Runtime, is a set of API portable building blocks for microservices. And with these building blocks, you can do all of the common microservice tasks, such as service invocation, state management, publish and subscribe messaging and even more.
And the key here is that Dapr takes on all of the complexity and management of those different tasks. So let's open up VS Code and take a look at our .NET Blazor application, and here I have my pet model. I'm using the Dapr client as part of the .NET SDK, and I've set up my pet model, where we have things like the name of the pet, the type and the breed, and so forth.
And now I wanna be able to save the state of this pet. Now I don't need to know all of the complexities of interfacing with external resources. All I need to do is just type try and look, GitHub Copilot took care of the rest. I just hit tab and instantly I'm using the Dapr client to save my state into my Dapr state store. So here I'm setting the store name, the pet ID, and then it's actually just using the context of this pet model to save all of its attributes into my state store. So just by typing try and hitting tab, everything was taken care of for me.
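The demo saves state through the .NET SDK's Dapr client. Underneath, the sidecar exposes the same operation over HTTP, so here is a minimal Python sketch of the equivalent call against the Dapr state API. It only builds the request, it does not send it. The store name statestore, the pet ID, and the pet fields are assumptions for illustration and are not taken from the demo code.

```python
import json
import os
import urllib.request


def build_save_state_request(store_name, pet_id, pet, port=None):
    """Build (but do not send) a Dapr state API request that saves one pet."""
    # Dapr exposes the sidecar's HTTP port to the app as DAPR_HTTP_PORT.
    port = port or os.environ.get("DAPR_HTTP_PORT", "3500")
    url = f"http://localhost:{port}/v1.0/state/{store_name}"
    # The state API takes a JSON array of key/value pairs.
    body = json.dumps([{"key": pet_id, "value": pet}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Illustrative values only; the real pet model has more fields.
pet = {"name": "Winnie", "type": "dog", "breed": "retriever"}
request = build_save_state_request("statestore", "pet-123", pet, port=3500)
print(request.get_method(), request.full_url)
# With a sidecar running, urllib.request.urlopen(request) would send it.
```

The application only names a store and a key. Which database sits behind that store is decided by the Dapr component configuration, not by this code.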
Let's open up a terminal and use the Dapr CLI to go ahead and run this locally on my machine. dapr run will spin up a Dapr sidecar as a local process, and it'll go ahead and run dotnet watch to spin up an instance of our Pet Spotter app.
And there it is. Here we have our Pet Spotter homepage, where I can report a lost pet or report that I found a pet. But let's put this new Dapr state management to the test. Here I can fill out the form to report that I'm missing my dog, Winnie.
I love Winnie and I'm devastated that she's been missing. So I wanna find her as quickly as I can. Winnie is a Nova Scotia retriever and if someone were to find Winnie, I want to be emailed right away.
So I fill out the information about how they can contact me if there's a match. And now here's where we can upload all of the pictures from my camera roll that I've taken of Winnie. I have a ton over the past few months, and so I have different angles, different lighting, something that this ML model can be trained on. So let's select those images and finally, let's hit submit. And now we're leveraging that Dapr integration that we just showcased, to save my state into my Dapr state store.
And in this case, Dapr actually automatically provisions a Redis container for me as my Dapr state store. So if I open that up to the Redis Explorer, select this new pet value, here is all of that information I just put in as part of that pet model. Hit submit and it's persisted into my state store. So it was just that easy to persist my state into that state store. So now we need to be able to have our front-end talk to our backend container.
So that's where Dapr pub/sub comes in, and just like state management, publish and subscribe messaging can become very complicated. However, with Dapr, all I need to do is use the PublishEventAsync method as part of the Dapr client and pass in that pet ID. And now my backend, which is even a different language altogether (here I'm running a Python Flask server that is acting as my Pet Spotter backend), can subscribe to the different topics as part of Dapr. So here I am setting up a route for Dapr subscribe, and to subscribe to that lost pet endpoint, all I need to do is enter in a new object here. And again, GitHub Copilot is gonna make this super simple.
I just select the line where I want to begin filling out the information about my subscription, hit tab. GitHub Copilot brings up the next suggestion. I hit tab and same thing with that last line there. So the integration of Copilot plus VS Code plus Dapr means that most of the heavy lifting for creating this app is taken care of for me. So now to wire this up and take advantage of that Dapr subscription, I'm gonna be setting up my Flask route, and then here I'm gonna be using the cloud event, which Dapr provides to me. And so now I can say I need my event from the Dapr subscription and look, Copilot even takes care of that for me as well.
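To make the subscription side concrete, here is a minimal sketch in plain Python of the two pieces a Dapr subscriber needs: the list an app returns from its Dapr subscribe route, and a handler that unwraps the cloud event Dapr delivers. It leaves out Flask so it runs on its own. The component name pubsub, the topic and route names, and the petId field are assumptions for illustration, not the demo's actual values.

```python
import json

PUBSUB_NAME = "pubsub"  # assumed pub/sub component name


def dapr_subscriptions() -> list:
    """Body an app returns from GET /dapr/subscribe (programmatic subscriptions)."""
    return [
        {"pubsubname": PUBSUB_NAME, "topic": "lostPet", "route": "/lostPet"},
        {"pubsubname": PUBSUB_NAME, "topic": "foundPet", "route": "/foundPet"},
    ]


def pet_id_from_cloud_event(raw_body: bytes) -> str:
    """Read the pet ID out of the CloudEvent envelope Dapr posts to the route."""
    event = json.loads(raw_body)
    # The publisher's payload sits under "data"; "petId" is an assumed field name.
    return event["data"]["petId"]


# A trimmed example of the envelope Dapr wraps around a published message.
sample_event = json.dumps({
    "specversion": "1.0",
    "type": "com.dapr.event.sent",
    "pubsubname": PUBSUB_NAME,
    "topic": "lostPet",
    "data": {"petId": "pet-123"},
}).encode("utf-8")

print("Subscriber received:", pet_id_from_cloud_event(sample_event))
```

In a Flask app, the first function would back the subscribe route and the second would be called from the POST handler for each topic route.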
So I just hit tab and get that cloud event, and then I can get the pet ID from that event. And now I'm just gonna be printing that pet ID to the console. Stay tuned for the next section, where we talk about how to integrate that ML. But let's go ahead and test out our Dapr pub/sub as part of our backend. So if I switch my terminal over to the backend, I can use dapr run once again, this time running python app.py,
hit enter, that sidecar spins up and look at that. My backend application is subscribed to the lost pet endpoint, in addition to the found pet, and it's printed that the subscriber has received the ID of Winnie. So now we have our front-end integrated with our backend and we're able to show our app running locally on our machine. So let's go ahead and pop back over into the source control of VS Code and get this committed into our repo to go ahead and build those container images, because now that the code's running locally, I want to be able to get those container images and start to deploy them off into my Azure Kubernetes Service cluster. Now, when you're thinking about getting that deployed to Kubernetes, you're probably thinking of all of the endless YAML and Helm charts and all of that templating that you need to do with some custom scripts in between. But what if I told you that you could leverage the same Bicep language that you were using for your infrastructure to also model your application? Well, that's exactly what you can do with the new import Kubernetes statement.
This is using a new feature called Bicep extensibility, which allows me to import types other than Azure. So here I'm doing import Kubernetes, I'm passing in the kubeconfig from my Azure Kubernetes Service cluster, which is being passed in as a parameter, and I'm setting up that default namespace. And now just like before, creating a new resource, this time I'm calling it backend deployment.
And look, all of the resources now are Kubernetes types instead of Azure. I have daemon sets, deployments, replica sets, everything that I need. I'm gonna be selecting deployment, creating a new resource definition here, and the same exact tooling experience, which you're familiar with and leveraging as part of your infrastructure, you can now leverage as part of your Kubernetes objects. So I'm selecting those required properties, filling out the name for my metadata, and I can also go in and set all of the optional parameters such as labels, annotations, the namespace.
And this is where I can begin setting up things like my app label. I can go in and now specify the spec. Again, leveraging those required properties. So I no longer need to have multiple languages, multiple experiences with all of those different scripts that connect everything together. It's all Bicep, and I can easily pass in parameters as I need them with the same experience. Here under annotations, I'm gonna be adding in the three Dapr annotations for enabling it, specifying the app ID and specifying the app port, and this is how you can add Dapr into any Kubernetes deployment.
And now I also need to specify that container image that we just built for our backend. So here I'm specifying the name of my container backend, and I can use all of that rich Bicep tooling to build an image path, which is customized based on my container registry. It's customized with the tag, so that way I can create a generalized template, which I can reuse not just in my production environment, but also running in pre-production and canary and as I scale out into more and more environments.
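The deployment built up over the last few steps has two parts worth spelling out: the three Dapr annotations on the pod template, and an image path assembled from the registry name and a tag. The demo authors this in Bicep. The sketch below expresses the same shape as a Python dictionary so the structure is easy to see. The function name, the registry name, the tag, and the port 5000 are assumptions for illustration.

```python
def backend_deployment(registry: str, tag: str, app_port: int = 5000) -> dict:
    """Build a Kubernetes Deployment manifest for the backend as a plain dict."""
    # Parameterizing registry and tag keeps one template reusable across
    # environments such as production, pre-production, and canary.
    image = f"{registry}.azurecr.io/backend:{tag}"
    labels = {"app": "backend"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "backend", "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {
                    "labels": labels,
                    # Annotation values must be strings, including the port.
                    "annotations": {
                        "dapr.io/enabled": "true",
                        "dapr.io/app-id": "backend",
                        "dapr.io/app-port": str(app_port),
                    },
                },
                "spec": {"containers": [{"name": "backend", "image": image}]},
            },
        },
    }


# Registry name and tag are made up for this example.
manifest = backend_deployment("petspotteracr", "v1")
pod_template = manifest["spec"]["template"]
print(pod_template["spec"]["containers"][0]["image"])
print(pod_template["metadata"]["annotations"])
```

The annotations go on the pod template rather than on the Deployment's own metadata, because the sidecar is injected per pod.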
So here's how easy it is in Bicep to set all of that up for you. So just like that, we've now set up our backend as a Kubernetes deployment, and I can now get this deployed onto my AKS cluster. Let's also get that saved and take a look at our front-end definition as well, because for my front-end, I also wanna make this exposed to users. So if I switch over to that, we have the same import Kubernetes statement, as we just were using before. I have my deployment for my front-end, but I also have now modeled my service as well, opening up port 80 and targeting my front-end deployment. So let's go ahead and get that staged and committed and pushed up into my pet spotter repository.
And now that CI/CD pipeline that we were looking at before will go ahead and deploy this into my AKS cluster. But now let's talk a little bit more about infrastructure, because when I was running locally, I was using a Redis container, but now that I'm running in Azure, I wanna leverage scalable production-ready resources. So here I have Cosmos DB set up as my state store. I have Azure Service Bus set up as my pub/sub, and I'm gonna be using Azure Blob Storage to upload my pet images. So you can see how we've now just swapped out those Dapr components with zero code rewrites. I didn't need to change anything about my front-end or backend.
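The reason no code changes are needed is that the application only refers to a component by name, while the component definition decides what backs it. Here is a minimal Python sketch of that idea for the state store alone. The component name statestore and the environment labels are assumptions, and the connection metadata that a real component needs is left out.

```python
def state_store_component(environment: str) -> dict:
    """Return a Dapr state store component definition for an environment."""
    # Same component name in both cases; only the backing type changes.
    component_type = {
        "local": "state.redis",
        "azure": "state.azure.cosmosdb",
    }[environment]
    return {
        "apiVersion": "dapr.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "statestore"},  # assumed name used by the app
        "spec": {
            "type": component_type,
            "version": "v1",
            "metadata": [],  # connection settings omitted in this sketch
        },
    }


local = state_store_component("local")
cloud = state_store_component("azure")
print(local["metadata"]["name"], local["spec"]["type"])
print(cloud["metadata"]["name"], cloud["spec"]["type"])
```

Because both definitions share one name, application code that saves state to that name runs unchanged against Redis locally and Cosmos DB in Azure.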
Dapr takes care of all of that resource swapping for me, where I just specify what cloud resources I want to use. So if we pop back into our Pet Spotter app, now running on top of AKS, I can fill out the same form as before. I've unfortunately lost my dog, Winnie, the Nova Scotia retriever, with the email address that I want to get notified at.
And let's go ahead and choose those same files as before. This time around however, instead of being run on the local Redis container for all of my Dapr components, it's now going to all of that infrastructure, which we showed as part of our infrastructure file, and to prove that that's actually being uploaded into our Cosmos DB for our state storage, let me go ahead and open up the data explorer, click refresh, and there you can see all of the details about Winnie now persisted in my Azure Cosmos DB. And just like that, we've used Dapr to tackle all of the complexities of writing microservice applications by dropping in the building blocks that we need.
We've also leveraged swappable Dapr components to go from code to cloud without any app rewrites, and we were able to deploy to Kubernetes using the same Bicep language that we use to declare our infrastructure, leveraging the all new Bicep extensibility. Let's go over to Ahmed to learn how easy it is to write your applications on Azure. - Hi everyone, I'm Ahmed Sabbour, a senior product manager on the Azure Kubernetes Service team.
You've seen how the infrastructure was deployed using Bicep templates to run on an AKS cluster, along with the other infrastructure components. We've also looked at how we built a microservices app using Dapr and Visual Studio Code to simplify connectivity and state storage configuration. Here are some of the highlights of what you saw in the demo and what's coming next to help you get your cloud native apps from code to cloud faster. Bicep, which defines the infrastructure you want to deploy to Azure, now also has an extension for AKS that can be used to deploy Kubernetes manifests along with the other infrastructure deployment components. Visual Studio Code developer extensions for Kubernetes help you generate Dockerfiles and Kubernetes manifests for your apps, so that you can spend more time working on your app and less time on writing boilerplate code.
Bridge to Kubernetes runs your code natively in your development environment while connected to a Kubernetes cluster. This enables a faster developer inner loop, so that you don't have to constantly build and deploy container images to the cluster just to test your changes. Automated deployments is a feature in the Azure portal that creates CI/CD pipelines using GitHub Actions to build your source code into container images and deploy them to AKS clusters.
Azure Load Testing lets you quickly test the scalability of your apps. You can also build that into your CI/CD workflows using GitHub Actions and Azure DevOps. Finally, Playwright provides cross-browser test automation through a single API. It's also available as a Visual Studio Code extension, with codegen providing test recording capabilities. Now let's get back to the demo for more exciting features using Hugging Face machine learning models. Over to you, Aaron.
- Thanks, Ahmed. Now on to the fun part. Let's talk about adding an advanced machine learning model to our application.
Now I'm not an ML expert myself and thinking through all of the concepts and technologies to integrate machine learning into our application is starting to make my head spin. Well, there's no need to worry. Today we're gonna be leveraging some of the new integrations between Azure Machine Learning and the Hugging Face ML community to easily pick a model, drop it into an ML fine tuning pipeline, and host our own model, so that way our application can use it. So if you're not familiar with Hugging Face, Hugging Face is an AI community where you can upload, share and consume different AI models. Now here I am in the Hugging Face homepage, where I can take a look at all of the models, data sets, and all of the different community aspects that Hugging Face offers. I'm interested in some of their models.
Now here in Hugging Face, you have a wide variety of different computer vision, natural language processing, audio, and other different ML models. I'm gonna choose image classification. And here you can see a bunch of different models uploaded by Microsoft, Google and other community members. So I'm gonna select the Microsoft Image Classification Model, and here you can get an overview of the model, what it does, a description, you can even deploy it into different inference endpoints or right here in the browser I can test it out to see how it performs.
Now let's say Winnie was found and the person who found Winnie has a new image that was not used as part of our training. So now let's upload Winnie and see that it's detecting a dog. There's a couch in the background, but what I really want this model to do is detect that this is in fact Winnie, the dog who was reported lost. So to do that, we're gonna be using Azure Machine Learning's built-in fine tuning capabilities with Hugging Face.
So let's pop over into the Azure Machine Learning Studio to see how we can make this possible. So here in Azure ML, I have a couple different things available to me. I am gonna start off by going over into our pipeline to see the pipeline, which will take our base model and then fine-tune it with the images of reported lost pets. So opening up this pipeline, you'll see a couple things.
You'll see the data that will be coming in from our Pet Spotter application, and then you can see this pipeline component, which will be fine-tuning our model. So let's go ahead and open this up. Inside of our fine-tuning component, you'll see that this is compatible with PyTorch, as well as MLflow, and it can also take in all of our training data with the output being a new model, which is fine-tuned on our data. So here in our fine-tuning component, you'll see that there are a bunch of parameters that we can use to customize this component and we can even get right into that Python code to customize it even further. But right out of the box, Azure Machine Learning gives you all of the tools that you need to take that base model and then fine-tune it even more. So let's open up VS Code and here you can see the same Python backend that we were looking at before, and we're gonna be adding in that Azure Machine Learning integration.
So underneath our lost pet method, let's open this up, you'll see a couple new things. First, we're taking that pet ID, which we got from the Dapr subscription, and then using it to get the state of our pet, so that way we'll know its name, its breed, its type. So that way we can make sure to set all of the parameters inside of our fine tuning pipeline appropriately. And now that we have our pet model loaded, we have the ability to go down and use a train model method, which will take all of that data that we've added to our pet and do a pipeline run to produce that output model. So if we switch over into our petspotter.py file, we can scroll down and see where exactly we're integrating with Azure ML.
So if I go down into our class and open up train model, you'll see that we're leveraging the Azure ML Python packages to set up all of our different parameters. We're grabbing those images that were uploaded to Azure Blob Storage from the Dapr output binding, we're creating our request and setting up all of that training data, and most importantly, here's where we're setting up our pipeline component for fine tuning with the base model name, which is that Microsoft image classification model, which is of type Hugging Face image family, and we're gonna be doing an image single-label classification. So once this has all been configured here within our Python file, we can make sure that our pipeline has been created, and now we can create a new pipeline job. And with this, we're submitting a new run to our pipeline to do that fine tuning. So if I go back into our Pet Spotter app and report one more time that Winnie has been lost, we'll enter in all of the information, select those same images.
You'll note that there's going to be nine images that will be uploading as part of this training. I'll select those, open those, and once I hit submit, just like we already showcased, the state's being persisted through Dapr, the front-end is using Dapr to talk to the backend, but now we're kicking off a new Azure ML pipeline job. So that should only take a few seconds. If we go back over to the Azure ML Studio, return back up to our pipelines, we'll now see a new pipeline run that took six seconds to complete.
So I'll open this up and we can see that the output of this is again, that new fine-tuned model. So if we go over into the models section of Azure ML Studio, we'll see the new pet match model, and this is what our application uses to match pictures of found pets to the reported pets. So if we open this up, we can actually use Azure ML Studio to deploy this model as a hosted real-time inference endpoint. And this is what allows us to take any image, send it over to this endpoint, and it'll score that image on how similar it is to any of the pre-trained images and pets.
So here's the REST endpoint that we're gonna be now integrating back into our backend. So if I return back over to VS Code, we can take a look at how we now set up that found pet endpoint. So if I go over to app.py and scroll down to our found pet endpoint, you'll see that it's very similar to lost pet. Instead of kicking off that job, though, we're now just getting that pet ID and that pet image that was reported as found, predicting, using that inference endpoint, how similar it is to any reported pets, and then if there's a match that it detects, we'll alert the owner. So returning back over into our pet class, let's scroll down and take a look at that predict image method. So opening that up, you'll see it's very similar: we're grabbing that image out of the blob storage account.
We're setting up our payload, which is the image data, as well as some of the parameters that we want to use, and then we're gonna be using that hosted inference endpoint. So in just a few lines, we're now sending a request over to Azure ML to then return a prediction score, and if that's above our score threshold, we'll return true. So in this case, if it returns true, we'll then alert the owner. We're gonna be leveraging Dapr output bindings, which make it super easy to integrate with external services such as SendGrid for sending email.
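The predict step boils down to packaging the image, reading back a score, and comparing it with a threshold. Here is a minimal Python sketch of that logic with the network call left out. The payload and response shapes, the threshold value of 0.8, and the function names are all assumptions for illustration, and they are not the schema of the Azure ML endpoint used in the demo.

```python
import base64
import json

SCORE_THRESHOLD = 0.8  # assumed value; the demo does not state its threshold


def build_payload(image_bytes: bytes) -> bytes:
    """Package the found-pet image as a JSON request body (assumed shape)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded}).encode("utf-8")


def is_match(response_body: bytes, pet_id: str,
             threshold: float = SCORE_THRESHOLD) -> bool:
    """Return True when the score for this pet is above the threshold."""
    # Assumed response shape: a JSON object mapping pet IDs to scores.
    scores = json.loads(response_body)
    return scores.get(pet_id, 0.0) > threshold


# Stand-in bytes and a stand-in response, in place of a real endpoint call.
payload = build_payload(b"fake image bytes")
response = json.dumps({"pet-123": 0.87, "pet-456": 0.12}).encode("utf-8")

print("alert owner:", is_match(response, "pet-123"))
```

Keeping the threshold as a parameter makes it easy to tune how confident the model must be before an owner is emailed.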
All right, let's test out the entire app end to end. So I'm gonna select dog. I know that this is some sort of retriever. Someone who might have found Winnie might just know that it's a retriever and then they have an image that was not used for training.
This is a new image that they can open and submit to the Found Pet page, and now they'll hit submit and now it'll hit the inference endpoint on Azure ML. It will predict whether or not it's a match. And given that this is a picture of Winnie, we hope that everything works end to end and I should be receiving an email telling me that Winnie has been found. So if I open up a browser to Outlook, we should now see a new email that's been delivered from Pet Spotter telling me that there's a new match, Winnie's been found, it has an ML confidence score of 0.87,
and then I can confirm that, yep, this is Winnie, and it will then match me with the person who reported finding Winnie. And just like that, look how easy it was to add intelligent and customized machine learning experiences to our application, leveraging Azure ML along the way. In just minutes, we were able to pick a Hugging Face model, fine-tune it with our own data, and then host that fine-tuned model on Azure.
Now I'd like to invite Takuto from the Azure AI marketing team to share more about the integration we just saw between Hugging Face and Azure Machine Learning. Takuto, can you highlight some of the benefits that users can expect to see in this integration, as well as maybe shed some light on how the term Hugging Face came to be? - Sure, hello everyone. My name is Takuto Higuchi and I work as a product marketing manager in the Azure AI marketing team.
I know Hugging Face is an interesting name, isn't it? It was actually named after the idea of a computer model being able to understand and respond to human emotions in a meaningful way, like a hug. Our foundation models in Azure Machine Learning provide a seamless experience for data scientists to take advantage of pre-trained AI models from Hugging Face and then utilize them for their deep learning workloads. This feature supports easy deployment and also fine tuning of these foundation models for various language tasks, like text classification, named entity recognition, summarization, question answering, and also translation. All of this can be achieved within Azure Machine Learning, making it a hassle-free experience for our users.
- That's amazing! Now, I've heard there's a couple other really exciting announcements in this AI space. Could you tell us a little bit more about those? - Sure. I am thrilled to share with you the latest developments in the open source AI space.
AI has seen rapid progress in recent years, leading to the creation of powerful foundation models that are trained on massive amounts of data. These models can be easily adapted for various applications across industries, providing enterprises with a remarkable opportunity to integrate them into their deep learning workloads. As you saw in Aaron's demo, we're making foundation models in Azure Machine Learning available to all machine learning professionals, allowing them to build and operationalize open source SOTA models from Hugging Face at scale.
Previously, to use Hugging Face models in Azure Machine Learning, users had to write or adapt scoring scripts, manage infrastructure and dependencies, and then also optimize models for inferencing. They also had to handle data pre-processing and adaptation of training scripts if they wanted to fine tune the models with their own data. With these new features, users can now fine tune and deploy foundation models from Hugging Face with ease, using Azure Machine Learning components and pipelines.
You will no longer have to worry about the infrastructure management, as curated environments are provided. This service will provide you with a comprehensive repository of popular language models from Hugging Face through the Azure Machine Learning built-in registry. Users can not only use these pre-trained models for deployment and inferencing directly, but they will also have the ability to fine tune for supported language tasks, using their own data.
Model evaluation components are also available in the same registry. All of this is possible with our AI supercomputing infrastructure, which is best in class and chosen by companies, like OpenAI, for training state-of-the-art AI models. While Azure Machine Learning is ideal for an organization with many machine learning professionals, what if you're not an expert in machine learning but still want to use these large AI models in your business? This is where Azure Cognitive Services comes in. We're pleased to announce the release of the next generation of Cognitive Services for Vision, powered by the Florence large foundation model. This new Microsoft model is a state-of-the-art computer vision technology that offers improved image captioning and also groundbreaking customization capabilities with few-shot learning. Traditionally, model customization required large data sets with hundreds of images per label, to achieve production level quality for vision tasks.
But Florence is trained on billions of text-image pairs, allowing custom models to achieve high quality with just a few images. This reduces the barrier for creating models that can fit challenging use cases, where training data is limited. These large AI models are so powerful that they could dramatically transform your business, but how do we ensure responsible development and use of these powerful models? As an industry leader, Microsoft is committed to putting responsible AI principles into practice.
In December 2022, we announced the Responsible AI Dashboard within the Responsible AI Toolbox, a suite of open source tools for a customized responsible AI experience. Today, we're announcing the addition of two open source tools in the toolbox, the Responsible AI Mitigations Library and the Responsible AI Tracker. The Responsible AI Mitigations Library allows practitioners to experiment with different mitigation techniques more easily, while the Responsible AI Tracker uses visualizations to demonstrate the effectiveness of different mitigations for more informed decision making. When used with other tools in the Responsible AI Toolbox, they offer more efficient and effective means to help improve the performance of systems across users and conditions. These capabilities are open sourced on GitHub, or you can also use them within Azure Machine Learning.
Now, back to you, Aaron. - Thanks Takuto. Now that our application is running in the cloud and leveraging the latest in customized ML, let's make sure that it can hold up to production workloads because I want to be sure that not only does my application work for one user, but it can work for thousands, as traffic begins to pick up.
And not only do I need it to scale, I need to test my app to see how well it scales and where it begins to buckle under pressure. Now, I don't know about you, but I don't have a thousand friends that I can call up and ask them to hammer my website with traffic all at the same time. So let's take a look at how we can add scaling and more importantly, testing to our application.
We are gonna be leveraging the Keda open source project. Keda, or Kubernetes Event-driven Autoscaling, is a project that allows you to scale your Kubernetes objects, not just based on CPU and memory utilization, but also external resources, because often with scaling, you're scaling based upon the symptoms, which are, how is my Kubernetes cluster responding to that load? But really, we should be scaling based upon things like queue depth, or kind of the reason that your Kubernetes cluster is being hammered with traffic in the first place. With Keda, they have dozens of scalers that you can choose from that span AWS, Azure, Google Cloud, or many other popular open source projects. And each of them is customized to be able to scale, based upon things like queue depth.
So here in Azure Service Bus, I can connect to my namespace and my topic because that's what we're gonna be leveraging for our Dapr pub/sub, and when the front-end sends a lot of requests to our backend and that queue depth starts to build up, we can go ahead and use Keda to tell our Kubernetes cluster to horizontally scale our backend container. So here, I'm using Headlamp, one of my favorite open source Kubernetes dashboards, and here I have my scaled object. And if we take a look at the scaled object, you'll see that we have our Azure Service Bus trigger, for which I've set my subscription, my topic and my namespace. I've also said to begin scaling on a queue depth of five, and once that trigger is hit, it goes ahead and scales up my backend deployment. And I've also set it up to have a minimum replica count of one. However, with Keda, you can scale all the way to zero, so that way if your application isn't being used, you can go ahead and set all of the containers to spin down completely, and then once traffic picks back up and that queue depth picks up, it will then scale your containers up.
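[Editor's sketch] The scaled object described here might look roughly like the following, written as a Python dictionary that mirrors the shape of a Keda ScaledObject manifest. The resource names (`backend`, `petspotter-bus`, `lostpet`) are placeholders rather than the ones used in the demo, and the authentication reference a real cluster would need is left out.

```python
import json


def build_scaled_object(deployment, namespace, topic, subscription,
                        message_count=5, min_replicas=1):
    """Return a dict shaped like a Keda ScaledObject for a Service Bus topic."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-scaler"},
        "spec": {
            # The Deployment that Keda scales horizontally.
            "scaleTargetRef": {"name": deployment},
            # Set this to 0 to allow scale-to-zero when idle.
            "minReplicaCount": min_replicas,
            "triggers": [{
                "type": "azure-servicebus",
                "metadata": {
                    "namespace": namespace,
                    "topicName": topic,
                    "subscriptionName": subscription,
                    # Target queue depth per replica; manifest values are strings.
                    "messageCount": str(message_count),
                },
            }],
        },
    }


# Placeholder names, assumed for illustration only.
scaled_object = build_scaled_object("backend", "petspotter-bus", "lostpet", "backend")
print(json.dumps(scaled_object, indent=2))
```

Dumping the dictionary as JSON gives something that could be converted to YAML and compared against what Headlamp shows for the real object.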
So now that we have Keda set up, we also need to be able to make sure that our infrastructure can scale as well, and with Cosmos DB, it's super simple to set up auto scaling using the built-in throughput auto scaling. Here, I've set up a maximum RUs per second of 5,000, and what this does is it tells Cosmos DB to auto scale between 5,000 all the way down to 10% of my maximum, which is 500. And this gives me not only the cost control of like, getting no unexpected bills at the end of the month, but it also then lowers my RUs per second when my application is not being used perhaps, in the middle of the night.
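[Editor's sketch] The autoscale range described here is simple arithmetic: the floor is 10% of the configured maximum. The snippet below is a simplified model of that behavior, not the Cosmos DB API; the function names are made up for illustration, and it assumes demand above the maximum is simply capped.

```python
def autoscale_range(max_ru):
    """Return (minimum, maximum) RU/s, with the minimum at 10% of the maximum."""
    return max_ru // 10, max_ru


def provisioned_ru(demand_ru, max_ru=5000):
    """Clamp the demanded throughput into the autoscale range."""
    low, high = autoscale_range(max_ru)
    return max(low, min(high, demand_ru))


print(autoscale_range(5000))   # (500, 5000)
print(provisioned_ru(0))       # 500: an idle app still sits at the floor
print(provisioned_ru(2100))    # 2100: within range, throughput follows demand
print(provisioned_ru(9000))    # 5000: capped at the configured maximum
```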
So now that I've set up my Keda scalers to scale my application and have my Cosmos DB auto scaling, we're gonna be using Azure Load Testing to make sure that our application scales as expected. With Azure Load Testing, I can either get started with a quick test right in the portal or I can use the open source JMeter script standard to customize my own load testing script. So here I'm opening up the Apache JMeter UI, and here you can see I've authored an entire test that will take a mock pet entry, fill it out in our backend, which will then go ahead and begin scaling our app. So let's go ahead and open this test up in the Azure Load Testing UI. I'm gonna be running a test. In this case, I'm just gonna be testing production scale, so I'll fill out the description, and then with Azure Load Testing, it allows me to customize all of the different environment variables and parameters as part of my test.
So I can override, for example, how many threads per engine I want to run. So once I go ahead and start this test, Azure Load Testing spins up all of the backend infrastructure to simulate those virtual users and hit my Pet Spotter application with a ton of load. So that's off provisioning right now, spinning all of that up. Let's go ahead and open up our test. It should be in the executing phase now, and we can see that now we are starting to get all of the client side metrics, how many virtual users we're seeing as that's beginning to scale up from hundreds into thousands of virtual users.
We're seeing the response time of our application begin to show a little bit of that load, and we're seeing how many requests per second that we're hitting on our app. We can also see any errors that were reported along the way as well. But more importantly, I can take a look and see all of the server side metrics, all of the resources that comprise my app, from the AKS cluster to the Cosmos DB to the Azure Service Bus, all here in a single view to let me know how that infrastructure is handling those requests.
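[Editor's sketch] To make the client-side metrics concrete, here is a small local simulation of virtual users submitting mock pet entries. It is not a JMeter script and does not call Azure Load Testing; `submit_pet_entry` is a hypothetical stand-in for the backend that sleeps briefly instead of using the network, and the user and request counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def submit_pet_entry(entry):
    """Hypothetical stand-in for the backend call (no network involved)."""
    time.sleep(0.001)  # pretend the backend takes about a millisecond
    if not entry.get("name"):
        raise ValueError("missing pet name")
    return {"status": "accepted", "name": entry["name"]}


def virtual_user(user_id, requests_per_user):
    """Send several mock entries and record per-request timings and errors."""
    timings, errors = [], 0
    for i in range(requests_per_user):
        entry = {"name": f"pet-{user_id}-{i}", "type": "dog"}
        started = time.perf_counter()
        try:
            submit_pet_entry(entry)
        except Exception:
            errors += 1
        timings.append(time.perf_counter() - started)
    return timings, errors


def run_load_test(virtual_users=20, requests_per_user=5):
    """Run all virtual users concurrently and summarize client-side metrics."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        results = list(pool.map(
            lambda uid: virtual_user(uid, requests_per_user),
            range(virtual_users),
        ))
    elapsed = time.perf_counter() - started
    timings = [t for user_timings, _ in results for t in user_timings]
    return {
        "requests": len(timings),
        "errors": sum(errors for _, errors in results),
        "avg_response_s": mean(timings),
        "requests_per_s": len(timings) / elapsed,
    }


summary = run_load_test()
print(summary)
```

The summary carries the same kinds of figures the dashboard shows on the client side: request count, errors, average response time, and requests per second.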
And as we expect, we're starting to see our requests rise and trend upwards. We're seeing our active connections, our messages, all begin to grow as this test progresses. So if I go back into Headlamp and take a look at our workloads, we should expect to see our backend beginning to scale and yep, just as we were seeing, as our scale goes up, we now have 64 pods and it just even now scaled up to a hundred, where we can see that Keda is ramping up our backend to keep up with all of that demand that our front-end and the Azure Load Testing is hitting. So as this backend is now scaling up to a hundred, our Cosmos DB has also been autoscaling to meet that demand as well. So if I pop back over into the browser and hit refresh, we now see that we've gone from our minimum 500 RUs per second, up to that nearly 2,100 RUs per second, to keep up with all of that simulated load. And so now we've seen just how easy it is to add in both scaling as well as testing into our application.
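[Editor's sketch] The pod counts seen here follow from how a queue-depth target translates into replicas: roughly the queue depth divided by the per-replica target, rounded up and clamped between the minimum and maximum replica counts. The function below is a simplified model, assuming a target of five messages per replica, a minimum of one, and a cap of 100 to match the count seen in the demo. The real autoscaler also applies tolerances and stabilization windows that are ignored here.

```python
import math


def desired_replicas(queue_depth, target_per_replica=5,
                     min_replicas=1, max_replicas=100):
    """Estimate the replica count for a given queue depth (simplified model)."""
    if queue_depth <= 0:
        return min_replicas
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


for depth in (0, 5, 12, 320, 10_000):
    print(depth, desired_replicas(depth))
# 0 1
# 5 1
# 12 3
# 320 64
# 10000 100
```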
And now we've gone through the entire process of deploying our infrastructure with Bicep, writing our application with Dapr, integrating an advanced machine learning model with Azure Machine Learning. We've scaled everything with Keda and we've even load tested it with Azure Load Testing. So many amazing open source projects that made the entire process effortless. To learn more about some of the other scaling offerings available to you, I'll hand it over to Deborah Chen and Varun Shandilya to cover what's new and exciting in the space of scaling. - Thank you, Aaron.
Hi, I'm Deborah Chen, Principal Product Manager for Azure Cosmos DB. You just saw auto scale in action, as Cosmos DB automatically adjusted the throughput to match the traffic hitting the app's database. Auto scale is great for these variable workloads because we don't have to worry about managing capacity manually, saving time and hassle.
We also only pay for the throughput that we actually use, which helps us avoid over-provisioning. Finally, we get great app performance by taking advantage of a 10x auto scale range, which instantly reacts to spikes in traffic, so we don't get rate limited. All we have to do is set the maximum throughput that we want to scale to and Cosmos DB will automatically scale between that max value and 10% of the max value. Since this was a new workload, we did have to make an estimate of the max throughput to start with, but once we know the actual traffic patterns of our app, we can adjust as needed.
Developers using MySQL can now gain similar benefits for their critical applications with the general availability of IOPS autoscaling in the Azure Database for MySQL flexible server Business Critical tier. With this new capability, the server automatically scales IOPS depending on the workload needs, saving you both time and money. Rapid low code app development is now even easier with the integration of Power Apps in all tiers of Azure Database for MySQL flexible server, now in public preview, and you can visualize your MySQL data with a new Power BI Desktop integration, now generally available in all tiers of Azure Database for MySQL flexible server. The RedisJSON module is now in preview on active geo-replicated caches of Azure Cache for Redis Enterprise.
This simplifies development and minimizes downtime by enabling developers to read, write, and store JSON documents using a single atomic operation, while also simultaneously synchronizing all data across all active regions. The JSON compatibility also enables RediSearch, which makes querying, secondary indexing and full-text search possible. Finally, Azure Active Directory and customer managed keys are now generally available in Azure Database for PostgreSQL flexible server in all regions worldwide. This enables you to build more secure apps by centrally managing your database user identities and access, and it puts you in full control of your keys' usage, permissions, and lifecycle. Next up, we have Varun to talk about Azure infrastructure and scale. - Thank you, Deborah.
Hi, my name is Varun Shandilya. I'm product lead in Azure Core. Today I'm gonna cover how Azure is innovating in compute, storage and AI, to help developers get more out of the platform. Let me first start with compute at scale.
VMSS Flex enables customers to deploy highly available applications at scale. It lets you easily manage infrastructure at scale using a variety of SKUs and management options. You can automatically and seamlessly scale out your applications to meet your workload needs and reduce operational overhead and costs.
I'm happy to announce the general availability of mixing spot VMs and on-demand VMs in VMSS Flex. It enables greater workload flexibility, expanding infrastructure cost optimization options. The new Bps virtual machine series is the latest generation of Azure burstable general purpose VMs and the first Arm-based Azure burstable VMs. These VMs provide a baseline level of CPU performance and are capable of expanding to higher processing speeds, as workload volume increases. This is ideal for applications such as development and test servers, low traffic web servers, small databases, microservices, servers for proof of concepts, build servers and code repositories.
In addition, the new series of burstable VMs include Bsv2 and Basv2, running Intel Xeon and AMD EPYC processors, respectively, for x86 workloads. The new Dlsv5 VM sizes provide two gigabytes of RAM per vCPU and are optimized for workloads that require less RAM per CPU than standard VM sizes, such as low to medium traffic web servers, virtual desktops, application servers, batch processing, analytics, and more. These VM sizes are optimal for reducing costs for non-memory-intensive applications.
Now let's talk about storage options for your cloud native workloads. Introducing replica mounts on Azure Disk persistent volumes to deliver the best experience and performance when running business critical stateful workloads on AKS. These mounts automatically pre-create replica attachments, ensuring rapid availability during pod failovers, and are optimized for pod placements to maximize uptime for stateful apps.
The latest Azure Disk CSI driver offers the ability to fine tune performance and increase reliability at scale and is available on Premium SSD, Standard SSD and Ultra Disks. Azure offers a unique capability of mounting Blob storage as a file system to a Kubernetes pod or application using BlobFuse or NFS 3.0 options. This allows use of stateful Kubernetes applications, including HPC, analytics, image processing, and audio or video streaming. Previously, you had to manually install and manage the lifecycle of the open source Azure Blob CSI driver, including deployment, versioning, and upgrades. The Azure Blob CSI driver is now a managed AKS add-on with built-in storage classes for NFS and BlobFuse.
This reduces the operational overhead and shortens the time to value. Azure Elastic SAN, a new block storage option for larger Kubernetes workloads, is now available in preview. It offers consolidated storage management, helping to reduce the number of required compute resources, and supports AKS through the use of the open source iSCSI driver. With fast attach and detach capabilities and low TCO through dynamic performance sharing, Elastic SAN is a versatile and cost effective option for businesses looking to optimize their infrastructure. We know security is important for you. That's why in Azure we continue to offer best-in-class technologies to help you encrypt your data at rest and in transit.
And with Azure Confidential Computing, we are helping you to protect data while it's in use. We continue to expand our Azure Confidential Computing portfolio. In fact, we've added extra protection to your serverless containers on Azure Container Instances, called confidential containers. These containers run in a trusted execution environment, where all data in use in memory is protected using encryption keys generated by CPU firmware. We also provide customers attestation that their containers are running in this highly secure environment, as well as an open source sidecar to support secure release of secrets.
Next, I wanna talk about our innovation in the AI space and how Azure is leading in AI infrastructure. Our design philosophy is to provide the optimal infrastructure configuration of CPU, compute, storage, and the latest GPU architecture from Nvidia, all interconnected with Nvidia Mellanox InfiniBand to allow unprecedented scale, with this cloud surrounded by first-in-class AI platform services. Our focus is on delivering the best performance and scale, but also on providing our customers choice at any scale.
We do this by offering multiple VM SKUs and series with configurations from mid-tier to small scale AI workloads, even allowing you to deploy virtual machines using one eighth of a GPU. Large AI models, especially natural language processing, or NLP, models, today train more than a trillion parameters and continue to evolve, often requiring longer training and massive scale data sets, along with using a mixture of experts. Customers already have an insatiable need for performance at scale, and the reality is the complexity and compute demand for training is expected to grow by 10-50 times over the next decade.
Azure AI Infra was designed for these challenges. High-end training is very sensitive to performance at large scale, with a single job running synchronously across thousands of GPUs. The Azure ND series, based on Nvidia GPU hardware, is highly optimized for AI workloads to ensure the best GPU throughput, with a high CPU core count and the fastest, lowest latency InfiniBand network optimized for all-to-one communication patterns.
This is why customers, like OpenAI, Meta, Nvidia, and others, are choosing Azure. We have worked with these leading AI companies to truly understand the challenges of sophisticated AI workloads and are committed to supporting these needs well into the future.