Bring your own model to Windows using Windows ML | BRK225


Okay, good afternoon everyone. Thank you. This is BRK225, Bring your own model to Windows using the Windows ML runtime. Today we're going to talk all about the Windows ML runtime. We're going to dive into it and understand why we've been designing Windows ML the way we have and how it's put together. We're going to do a live coding demo, so we've got lots of exciting content to cover. We also have six different machines up here, which is continuing to fulfill my forever goal of making the KVM switch cry. So, all right, let's jump in.

I am Ryan Demopoulos, a product manager on the Windows development platform team. Right now I work on Windows ML and AI technologies. In the past I've worked on things like WinUI and the Windows App SDK; I helped ship the v1 of those as well, so some of you may recognize me from that. And with me today is my amazingly talented partner in crime, Shiaoi. Thanks, Ryan. Hey, everyone. I'm Shiaoi Han, a software engineer working on the Windows AI APIs and Windows ML. I've been at Microsoft for about eight years and have worked on various projects including Windows Subsystem for Android, PC Game Pass, and a couple more.

Today Ryan and I are very excited to not just tell you, but also show you, some significant improvements we've made to the AI story on Windows. Our focus today will be on Windows ML, which serves as a key component for many of the capabilities we're providing for you, the AI developers on Windows. Awesome. Okay, let's jump right in; we've got a full 60 minutes, so we'll get going. Today, Windows powers the vast majority of desktop and laptop PCs all around the world. That alone is sometimes staggering to think about, at least when I think about it. And when you apply it to the revolution in AI that we are seeing all across the world, it really means that the AI revolution is going to unfold on Windows more than any other place. One of the greatest advantages, one of the powers of the Windows ecosystem, is the incredible diversity of hardware that is available to all of your customers.

It ranges all the way from $300 laptops up to powerful $3,000 desktop PCs bristling with performance and memory, and if you build your PCs like I like to build mine at home, you can go even above and beyond that. We know that when you're writing software for Windows, targeting that diversity of hardware can sometimes be a little bit challenging, and that can be especially true when it comes to AI, with all of the differences in silicon that are available. So our goal as a development team, and with Windows ML, is to help you do three key things. The first is to write code that scales across all of that diversity of hardware, and to make it easier to write that code. The second is to make sure that you are maximizing the performance of every individual PC your customers are running. And finally, to make your lives easier with dependencies: figuring out what dependencies you need, procuring them, including them in your installers, updating them, managing them, that whole type of thing.

It's a lot of work, especially in a diverse ecosystem, and we really want to help you with that. So, with all of that in mind, we're going to do a very quick demo and make the KVM cry. All right. This is a Surface Laptop 7. It is a Qualcomm-based device, and on it we have a small app we've written called ResNetBuild.exe, because we are absolutely amazing at naming apps. What ResNetBuild.exe does is take an image and inference it against the ResNet-50 model.

All things considered, this app is only about 2 megabytes in size. I'm going to run it now. This isn't a heavy workload; it's going to put the work onto the Qualcomm NPU on this device, and you're going to see the NPU very briefly spike. Actually, let me first show you what we are classifying. We're going to classify this cute little puppy. All right, what is this thing? We're about to find out. So let's run it, and you'll want to take a look at the NPU spike. Here we go. The app ran.

You can see a little spike on the NPU, a very quick little bit of work. What this is doing is taking that AI workload and running it locally on this device, on the NPU hardware that's in it. And with almost 100% confidence, it believes that it is a golden retriever. I'm going to switch next to this device. This is an Intel-based Surface laptop, and on this device we have the very same app: same files, same size, same code. We are going to run the ResNet build demo against the same cute picture of the puppy.

You'll see the NPU spike here. Every time we've run it, it's been a fairly small spike, but it is there; it's a really light workload, but you can see a little spike running on the NPU. Again, it's put that workload on this Lunar Lake NPU to determine that this is a golden retriever. I'm going to switch now to this AMD machine and do the exact same thing: run it here on the AMD NPU, and you can see a spike there; it's putting the workload there.

And finally, we will switch to this one, which does not have an NPU in it. I'm seeing fuzzies here, which probably means I really did make the KVM cry. Excellent. All right. This is a Dell Alienware with an RTX 4090 in it, and we're going to put the workload on that 4090. I'm not going to show you a spike on the 4090, because this workload is like a speed bump for a 4090, and besides, the 4090 is already under load drawing the very desktop pixels that we've got. Instead, I'm going to show you that this app activates the GPU engine and puts a memory load on it, so you can watch that in Task Manager.

Let's run it here. And there it is. You can see the GPU engine is engaged, it's got a memory load, it completes, and then it's done. So what we've just shown you across all four of these machines is the same app and same code running on pretty diverse silicon. We've got three different NPUs, each from a different manufacturer with different chip designs, and we're running on a GPU as well, which is a totally different piece of hardware.

And the thing that makes all of this possible is a new, built-from-the-ground-up version of Windows ML. So let's talk about that. Up until now, our approach has been to make DirectML your one-stop shop for local, on-device AI inference. We've been listening to your feedback on DirectML, and we've also been building AI experiences into Windows ourselves, for example with Click to Do. We've dwelt on your feedback and analyzed our own experiences, and we've come to the realization that we need something in the platform that's faster, faster in two different ways. First, just wall-clock faster: something that can strip away the layers and the abstractions between your app code and the silicon that is running on your customers' machines. But we also need the platform to be able to move at the speed of AI innovation when it comes to silicon. We're seeing a flurry of activity in the NPU space where new chips are coming out.

These chips are very different; they're designed very differently, and we really need the platform to be able to keep up, to the extent that when new hardware hits the market, the platform is ready to go for your apps on day one. After a lot of contemplation around that feedback and that experience, we've designed a new Windows ML centered around those goals. This new Windows ML is going to serve as the AI nucleus of the Windows AI Foundry. If you didn't get a chance to catch the excellent talk from Tucker and Deianne earlier today, you should definitely go back and check it out. They talked all about the Windows AI Foundry. You can think of it as an umbrella term and a set of technologies to help you do local, on-device AI in your apps for Windows. What this means is that if you are, for example, using Foundry Local to tap into the ready-to-use Foundry catalog models, or if you're using the built-into-Windows AI APIs to do things like LoRA for Phi Silica or text intelligence, you'll be using Windows ML under the covers, potentially without even knowing it. But if you want to bring your own model, or if you want to take a really high degree of fine-grained control over how AI inference works on the device, then you can use Windows ML directly via its public APIs. This new Windows ML is built around the ONNX Runtime as its underlying tensor engine.

If you're not familiar with the ONNX Runtime, it's a fast, open, mature inferencing runtime that I'm sure some of you in the room have either kicked the tires on or are even using in your apps today. And because it's built around the ONNX Runtime, it's also designed to work with ONNX models. The ONNX model format is an open, standard format designed to give really high inference performance; in many cases we see 20% or better inference performance after converting from a source format into the ONNX format. It's also designed to be very convertible itself. What this means for you is that if you have a PyTorch model, whether you've trained one or obtained one, you can convert it into the ONNX format and then run it with Windows ML for your local, on-device workloads. Windows ML talks to silicon via execution providers. Execution providers are kind of like a translation layer between the runtime and the variety of silicon that the runtime sees on the device. We have taken the execution providers that we ourselves developed for the various experiences we've shipped in Windows, like Click to Do, and refreshed them in partnership with our hardware partners. We've added new dedicated GPU execution providers to target the wide range of GPU hardware.

And then we've combined that with ONNX Runtime's great support for CPU and bundled all of it up into the new Windows ML. What this means is that if you bring an ONNX model and run it with Windows ML, it's going to scale across all these different types of hardware intelligently, so you don't have to focus too much on that and can really focus on your workload. Each type of processor is a first-class citizen with its own dedicated execution providers to make sure that things run well. Also, really importantly, you should take note of this: Windows ML will work on any Windows 11 PC. This obviously includes the growing list of Copilot+ PCs that have been shipping in the market for some time, machines with dedicated NPUs, but it goes beyond that.

So if you have, for example, this guy with a powerful discrete GPU, you can run Windows ML on it as well. In fact, you can even run Windows ML on PCs with only a CPU; just make sure you keep the workload reasonable for a CPU, and there are lots of AI workloads that are very reasonable on a CPU. So you can use Windows ML for that as well. Shiaoi, why don't we talk a little bit about dependencies? Sure. Let me do this. All right. So let's discuss how Windows ML can simplify the task of obtaining, deploying, and managing app dependencies.

We know you want to focus on your app code, because that is the value you are trying to bring to your customers. But the status quo today is that you also need to grab an AI runtime out of a sea of complex options, and that runtime becomes part of your app, where the burden of maintaining it falls on you. These AI runtimes can also require additional add-ons for talking to specific types of AI hardware, such as the ONNX execution providers, further bloating your app. And of course you need to include the models themselves. It is quite normal to have multiple copies of the same model, where each copy is tuned to work specifically well on a specific type of processor from a particular hardware manufacturer. What this means is that you either have to deploy multiple copies of the same model to your customer's machine, which eats up valuable disk space, or you have to write complex logic in your installer to figure out which model will work on the PC the app is being installed to. All of this is your responsibility to find, deploy, and manage. It's just the norm, and it sucks. It does suck. Well, Windows ML makes all of this much simpler.

To get started, all you need to do is add a NuGet package to your app. This NuGet package includes the Windows ML runtime headers, and it sets up the dependencies between your app code and the Windows ML runtime automatically. You just need to call our provided bootstrapper API from your app installer to install the runtime. And since the runtime is packaged as a Store-managed package, that means we, Microsoft, handle servicing fixes for you, so you don't need to worry about updating the runtime anymore. Once your app is installed and initializes Windows ML, we will scan the current hardware and download any execution providers applicable for that device. This is great for a couple of reasons. First, you don't need to find your own execution providers, include them in your app, and maintain them. And also, since we only download the execution providers necessary for that specific PC, your downloadable dependencies are very few and the size of your app can stay as small as possible.

You don't need to worry about what PC your app is currently running on; all of that is handled by Windows ML. And suppose you've trained a custom model and have multiple variants that work on AMD, Qualcomm, Intel, and NVIDIA. If you upload your app to the Microsoft Store and reference these models as resource packs, with metadata indicating which hardware each model is targeting, then Windows ML will make sure that only the correct version of the model is downloaded when your app is installed. This means you don't need complex logic in your app installer to determine which environment and hardware the app is running on, and you won't need to worry about having multiple copies of your model on your customer's machine. The goal of all of this is to minimize the burden you have in managing app dependencies across Windows' wide array of hardware.

All right, let's briefly talk about performance. Windows ML introduces the concept of device policies. Device policies are basically a way for you to describe what outcome you want when you run your AI workload with Windows ML. For example, if you set the max performance device policy, Windows ML will attempt to get the most power out of that machine and achieve the fastest wall-clock speed possible. So if it sees a discrete graphics card, like in a machine like this, it will select it and run that workload there as lightning fast as it can. Similarly, let's say you're in a different situation where you've got ambient AI running in the background. It's long-running AI, maybe analyzing something the user is doing while that user is interacting with their software, and you don't want to chew up all the battery life. You can set the min overall power device policy and Windows ML will try to respect that; it will try to sip as little power as possible. If there's an NPU available, it will select that NPU and try to run the workload there. If you don't set a device policy at all, or if you set the default policy, then Windows ML will just run things on the CPU.

The CPU gives the greatest range of compatibility: every device has a CPU, it runs the widest range of models out there, and it's going to give the best inference accuracy across that wide range. And finally, if you don't want to use any of those policies and you just know exactly what you want, say you really want to run on the GPU or really want to run on the NPU, you can do that as well. For example, you can say prefer NPU, and Windows ML will respect that: if the machine has an NPU, it will select that device and run the workload there. Eventually, in the not-too-distant future, we also want to add what we call workload splitting, so you can have a single AI workload split across multiple different types of processors to get even greater performance and maximize what you can get out of the hardware your app is running on.

When we do that, we'll work it into our device policy system. Okay, so we've spent a lot of time talking about the vision of what we're trying to do. We've talked about some of the design choices we've made, how things are put together, and some of the capabilities it's got. I think a natural question on some people's minds is: when can I try this? The great news is that you can try this today. So I'm happy to announce, yes, we're announcing Windows ML 2.0 Experimental 1. If you're familiar with the nomenclature of the Windows App SDK, or how we name things, an experimental release basically means it's not yet ready or meant for production apps. So please don't use it in your production apps.

Instead, we're sharing early and open bits with you so that you can try them out, download them, give them a spin, integrate them into your apps, and let us know what's working for you and what isn't, so that we can hone the APIs, hone the vision, and add the capabilities you need, and so that we can ship a stable GA later this year. With all of that said, Shiaoi is now going to be very brave and fearless: she is going to take Experimental 1 for a spin with live code from the bottom up. Go for it, Shiaoi. Thanks, Ryan. Well, before we do that, a little bit more talking. Let's go over the API surface of Windows ML first. Windows ML has two layers of API. The first is called the ML layer.

You can think of this as the main set of APIs that you use in the new Windows ML. One set of APIs handles initialization of the runtime: it exposes an infrastructure object that you use to download all the applicable execution providers and to make sure the runtime is up to date. The initialization APIs are exposed in WinRT, but we've also provided flat C wrappers with managed projections as a convenience, so you don't need to learn and write WinRT. Another set of APIs in this layer is the generative AI APIs, which are designed to help you more easily write generative AI loops with language models. And since we're built around the ONNX Runtime, there is also a runtime layer, which exposes all of the standard ONNX Runtime functions via flat C APIs with managed projections. This is great for two reasons. First, if you've already written app code against the ONNX Runtime, it is very easy to port it to Windows ML, because all the APIs you're used to are already available here. And also, since we're exposing the full ONNX Runtime API surface, you can take very fine-grained control over how on-device inference happens, especially if you want to do things that aren't exposed via the ML layer yet.

All right, so with all that in mind, let's see this in action. So, can you put that on? Yep. Gotcha. All right. This is the quick ResNet build demo app that Ryan just showed, and today I'm going to show you how to write it from scratch. The first thing we always want to do is get the model. You can obviously obtain an open-source model from PyTorch or Hugging Face and convert it to ONNX yourself, but if you are using a model like ResNet, I highly recommend using the AI Toolkit VS Code extension, because we have just added a new capability to it that makes this much easier.

This is the AI Toolkit for VS Code extension; you can see it on the Extensions page. We'll go to the AI Toolkit tab. Here, under the Models section, you'll see a new conversion tool in preview, and it was just released today. What this tool does is provide a streamlined experience for getting a model from Hugging Face, converting it to ONNX, and optimizing and quantizing it, so it works really well with Windows ML. A model that has been optimized this way has very fast inference, small size, and low power consumption. These are not that important if you are doing cloud AI, but for local AI they are the key characteristics you are absolutely searching for. So let's take a look at this. I'm going to create a new project, and here you'll see a list of 11 popular models, including ResNet.

For these models, we have worked with our IHVs to really optimize them and make them work well on Windows ML. But if you want a model that is not in this list yet, you can still convert it to ONNX using Olive or other tools you like, and it should work with Windows ML as long as it works with ONNX today. It might not be as optimized as these ones yet, but this is just a preview and we're still working closely with our IHVs to bring more models here. All right, so let's select Microsoft ResNet-50, click Next, and I'll save it in my default folder and name it ResNet. Again, an excellent name. I mean, straightforward, right?

So we'll go back to our AI Toolkit page and click on Conversion again. Here you'll see the workflow we just created. It has three options: convert for QNN, convert for the AMD NPU, and convert for the Intel NPU. These are the three workflows that produce the models that are best optimized for the Qualcomm NPU, the Intel NPU, and the AMD NPU. We are working with them so that in the near future we have one recipe that works on all three; then you won't need three models anymore and just one will work for all, but we're still working on that. All right. And since I'm demoing on the Surface Laptop 7 with a Snapdragon NPU, I'm going to go with the convert-to-QNN flow. I'm going to keep all the default parameters and just click Run. I'm going to make this slightly smaller. Actually, no, I'll just grow it.

All right, you can see that we just kicked off a run and it's converting the model to ONNX. On a machine like this, it normally takes about 30 seconds to a minute to convert to ONNX, assuming you have already installed all the dependencies, so we won't spend our time waiting for that. Instead, I'm going to open Visual Studio and create our app. I'm going to create a new console project and, of course, name it ResNet build demo. Yes.

All right, and keep all of that. We can't name it something different; it's the same app that we showed, so, yeah, we're just keeping our promise. All right. The project's created. The first thing we want to do is change the .NET settings, so I'll open the project properties and change the target OS to Windows, and the target OS version to 10.0.26100.0. The next thing we want to do is install our NuGet package.

So I'll bring up the NuGet Package Manager and search for our package, which is Microsoft.Windows.AI.MachineLearning, and I'm going to install this version. This NuGet package contains the Windows ML runtime package, which you need to install. I think I selected the wrong one, so I'm going to reselect. It contains the Windows ML runtime package, which you need to install from your app installer. It also contains the ONNX Runtime bits and the ML layer APIs that download and provision the execution provider packages for you.

Now that that's installed, before I move on, I want to show you this. If any of you here have worked with ONNX before, you will find these NuGet packages familiar. These are the execution provider NuGet packages that you used to have to include in your app if you wanted to target those providers, and they were compiled into your app code, so they would be deployed to your customer's machine, which might not even support them, and that makes your app size larger than you want, right? With Windows ML, you no longer need any of those; we're just going to download them at runtime for you. And that's it. Let's go back to our install list. This is the only NuGet package we'll need for this demo. Now, if you were building an app using a large language model, you might need the generative AI NuGet package that we mentioned briefly earlier. It has a bunch of great helper methods that I won't be able to demo today, but we do have links to documentation at the end of the session, so please check it out later. All right. So we have prepared all the dependencies. Let's start writing our app. I'm going to delete all that. Oops. Go back here.

I'm going to bring in some namespaces that are going to make my app look better. I'm going to create a program class with a Main method; this is the entry point of our console app. Now, the first thing we want to do is initialize the ONNX Runtime environment. This controls the logging level of ONNX Runtime and is required by the runtime, so we'll do that. The next thing we want to do is initialize Windows ML. We create the infrastructure object, which needs to stay alive for this entire app process. Then we call DownloadPackagesAsync on the infrastructure object, which will scan this hardware and download any missing, applicable execution provider packages; for example, on this machine it's going to download the QNN execution provider. Then we call RegisterExecutionProviderLibrariesAsync, which registers the EPs with Windows ML so it knows what to use when we deploy the AI workload. All right, that's all we need for initialization, and from here on it's going to be standard AI code.
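To make that flow concrete, here is a minimal C# sketch of the initialization sequence, using the type and method names mentioned in the demo (an Infrastructure object with DownloadPackagesAsync and RegisterExecutionProviderLibrariesAsync). Treat the exact namespaces and names as assumptions against the experimental package rather than a final, documented API.

```csharp
// Sketch only: assumes the experimental Microsoft.Windows.AI.MachineLearning NuGet package
// and the API names mentioned in the session; exact names may differ in shipping bits.
using System;
using System.Threading.Tasks;
using Microsoft.ML.OnnxRuntime;               // ONNX Runtime managed projection
using Microsoft.Windows.AI.MachineLearning;   // Windows ML "ML layer"

internal class Program
{
    private static async Task Main()
    {
        // 1. Initialize the ONNX Runtime environment (controls ONNX Runtime logging; required).
        OrtEnv ortEnv = OrtEnv.Instance();

        // 2. Create the Windows ML infrastructure object; it must stay alive for the whole process.
        var infrastructure = new Infrastructure();

        // 3. Scan the current hardware and download any missing, applicable execution provider
        //    packages (on a Snapdragon machine this would pull the QNN execution provider).
        await infrastructure.DownloadPackagesAsync();

        // 4. Register the downloaded EP libraries with ONNX Runtime so sessions can use them.
        await infrastructure.RegisterExecutionProviderLibrariesAsync();

        Console.WriteLine("Windows ML initialized; execution providers registered.");
    }
}
```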

We need to prepare a couple of paths. First we need the path to our model file, which I don't have here yet because I'm hoping the AI Toolkit has finished converting, which, yes, it did. You can see it has nice evaluation results, and you can also run the inference sample in Python, but here I'm just going to copy the model path and paste it in. Hopefully I didn't make any mistakes. Next we have a path to our label file, which contains the thousand labels that ResNet uses to classify images. And lastly, we have an image file showing the puppy that Ryan classified before; in case anybody forgot what it looks like, it looks like this. It's a ball of fur. All right, I'm going to close that. The next thing we want to do is create the ONNX session. We need a SessionOptions object, and for now I'm keeping everything at the defaults, so it's going to run the workload on the CPU. Now we create the inference session with the session options we just created. And now I need to load the image that I just showed you. Here I've written a couple of helper methods to do that, so I'm going to bring them in with a code snippet. All they do is read the image from a file path into a SoftwareBitmap and then process that SoftwareBitmap so it has the correct width and height, correct size, correct format, and correct normalization required by ResNet. These are standard image preprocessing steps; you can do it my way or use a helper library like OpenCV, so I won't go into too much detail here, but all of this code will be uploaded to GitHub tonight, so you'll be able to check it out as well. All right, the image is loaded.
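The helper itself isn't shown on screen, but standard ResNet-50 preprocessing looks roughly like the following C# sketch: resize to 224x224 RGB, scale to [0, 1], normalize with the usual ImageNet mean and standard deviation, and lay the pixels out as a 1x3x224x224 NCHW tensor. The PreprocessToTensor name and the packed-RGB byte input are assumptions for illustration, not the demo's exact code.

```csharp
// Assumed helper for illustration: converts 224x224 packed RGB bytes into the
// normalized NCHW float tensor that ResNet-50 expects.
using Microsoft.ML.OnnxRuntime.Tensors;

static DenseTensor<float> PreprocessToTensor(byte[] rgbPixels, int width = 224, int height = 224)
{
    // ImageNet normalization constants commonly used with ResNet-50.
    float[] mean = { 0.485f, 0.456f, 0.406f };
    float[] std  = { 0.229f, 0.224f, 0.225f };

    var tensor = new DenseTensor<float>(new[] { 1, 3, height, width });
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int i = (y * width + x) * 3;                // packed RGB input
            for (int c = 0; c < 3; c++)
            {
                float value = rgbPixels[i + c] / 255f;  // scale byte to [0, 1]
                tensor[0, c, y, x] = (value - mean[c]) / std[c];
            }
        }
    }
    return tensor;
}
```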

Now let's try to run the inference. I'm going to bring in my inference helper, which puts the image input we just loaded into a tensor, feeds that tensor into the ONNX Runtime session, and runs the inference. And finally we want to show the output, so we get the results, extract the output name and the result tensor, and print them in a user-friendly way. Let me bring in my print-results helper as well. Here we're just loading the labels into a list of strings. Then, before we print anything, we need to apply softmax to the results from ResNet, because they're raw logits and softmax transforms them into probabilities. Then we just sort those probabilities and print the top five classes with the highest confidence. So let's try to run that. Hopefully it builds. Sounds like it did. And hey, you're seeing the same results that Ryan just showed before.
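Put together, the inference and print helpers described above amount to something like this C# sketch: feed the tensor to the session, apply softmax to the raw logits, and print the five labels with the highest probability. The RunAndPrint name is a stand-in; the session, input tensor, and label list come from the earlier steps.

```csharp
// Sketch of the inference + print-results helpers described in the demo.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

static void RunAndPrint(InferenceSession session, DenseTensor<float> input, IReadOnlyList<string> labels)
{
    // Bind the preprocessed image tensor to the model's (single) input.
    string inputName = session.InputMetadata.Keys.First();
    var inputs = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor(inputName, input) };

    using var results = session.Run(inputs);
    float[] logits = results.First().AsEnumerable<float>().ToArray();

    // Softmax: turn raw logits into probabilities (subtract max for numerical stability).
    float max = logits.Max();
    float[] exps = logits.Select(l => MathF.Exp(l - max)).ToArray();
    float sum = exps.Sum();
    float[] probabilities = exps.Select(e => e / sum).ToArray();

    // Sort and print the top five classes with the highest confidence.
    foreach (var (prob, index) in probabilities
                 .Select((p, i) => (p, i))
                 .OrderByDescending(t => t.p)
                 .Take(5))
    {
        Console.WriteLine($"{labels[index]}: {prob:P2}");
    }
}
```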

Yay, from scratch. All right, so it still thinks it's a golden retriever. There's some chance that it could be a doormat, which, I don't know about that, but I'm going with tennis ball, even though the chance is pretty low. The ball looks good. Pretty good. Yep. All right, so that's good. We've just shown this app running on the CPU. Now, I mentioned that this machine has an NPU and a GPU, so in real life you probably want to run your workload on those devices. Let's see how to get max performance out of this. I'm going to modify my session options, and all we need to do is set the EP selection policy to max performance, which on this machine translates to using the GPU if a GPU is available. So let's try that. Oh, and before I do that, I need to add a sleep here so the app doesn't go away too fast.
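Here is a sketch of that one-line change, hedged because the exact enum and method names are taken from the demo narration rather than final documentation: an EP selection policy is set on the session options before the session is created.

```csharp
// Sketch only: the EP-selection-policy API names are assumptions based on the demo.
using Microsoft.ML.OnnxRuntime;

var sessionOptions = new SessionOptions();

// Ask Windows ML / ONNX Runtime to pick the fastest available device
// (on this machine that translates to the GPU).
sessionOptions.SetEpSelectionPolicy(ExecutionProviderDevicePolicy.MAX_PERFORMANCE);

// Alternatively, favor battery life; this shifts the workload to the NPU when one is present:
// sessionOptions.SetEpSelectionPolicy(ExecutionProviderDevicePolicy.MIN_OVERALL_POWER);

using var session = new InferenceSession(modelPath, sessionOptions);  // modelPath from earlier
```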

All right, let me also bring up Task Manager and make sure it's showing the GPU correctly. And I don't need Console; I need the ResNet build demo. All right, I think we have the correct setup, so let's run it. Here you see that it is actually engaging the GPU. So, by just changing that one property, we moved our workload. Now, if I were to take this app to that powerful Alienware, it would pick the TensorRT for RTX execution provider instantly and load the workload there. All right. And what if we care about power consumption? Then we can change this to min overall power.

I don't need that space. And this is going to shift the workload to the NPU. So let's take a look. Let me close that. All right, F5. Come on. All right. Here you can see a little spike on the NPU, so we have moved the workload to the NPU properly. And again, if I take this app right now and put it on my AMD machine or my Intel machine, they're going to be able to use the Vitis AI or OpenVINO NPU execution providers properly. So we have really made this much easier, and you don't have to specify a device kind or anything like that in your session options. Thanks. Yeah. Awesome.

And let's take a look at the output of our app. Here you'll see a bunch of output written by our execution providers; these are actually doing the model compilation. Now, ResNet is a small model, so it doesn't take long to compile. I also forgot to mention that before you load any model onto the NPU or GPU, ONNX Runtime has to compile it to that execution provider's specific requirements so it can be loaded properly, and if the model is huge, a couple of gigabytes, then the compilation step can take a few minutes, which you definitely don't want to see on every run, right? To make this better, we introduced a model compilation API in ONNX Runtime which can pre-compile the model and save it to disk, and you can just load the pre-compiled version the next time you want to run it. So let's see that. All we need to do is make a change before we create the session. Let me do this here. All I'm doing is defining the path where I want the compiled model to be, and if it's missing, I know it hasn't been compiled yet. So I create a model compilation options object using the session options we just created; this way it's going to inherit the device selection policy and use the correct EP. Then we set the input model path to the raw model and the output path to where I want it saved, and just call compile model. It's pretty straightforward. All right. And the last thing I want to do is switch the path I use to create the session over to the compiled path. And let's give it a go.
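As a rough C# sketch of that pre-compile step (the OrtModelCompilationOptions type and its method names are assumptions matching the demo narration, and the file paths are hypothetical), the flow is: if no compiled model exists yet, compile the raw model with the same session options, then create the session from the compiled file.

```csharp
// Sketch only: compile once, then reuse the compiled model on subsequent runs.
using System.IO;
using Microsoft.ML.OnnxRuntime;

string rawModelPath = @"C:\models\resnet50.onnx";                 // hypothetical path
string compiledModelPath = @"C:\models\resnet50.compiled.onnx";   // hypothetical path

if (!File.Exists(compiledModelPath))
{
    // Created from the session options so it inherits the device policy / EP selection.
    using var compileOptions = new OrtModelCompilationOptions(sessionOptions);
    compileOptions.SetInputModelPath(rawModelPath);
    compileOptions.SetOutputModelPath(compiledModelPath);
    compileOptions.CompileModel();
}

// Later runs load the pre-compiled model and skip the expensive compilation step.
using var session = new InferenceSession(compiledModelPath, sessionOptions);
```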

And I forgot to kill the app, but you can see that the output is refreshed and it says the model compiled successfully, and you see the same inference results. Now, if we run this app again, you'll see that all of that compilation output is gone; it's using the compiled version. It's slightly faster, though with ResNet it's not very noticeable. But again, if you're using a large language model or something like that, this is going to make your life much, much better. And I think that is it for my demo; I hope you find it useful. Again, all the code will be uploaded to GitHub tonight, so you'll be able to try it out. We will have links to documentation for all of our APIs, and we'll have the links posted at the end of the session, so you'll be able to check that out as well. That's awesome. [Applause] That was amazing.

You wrote that all the way from scratch. That's so good. Awesome. Okay, just before we move on from that demo, I'll add one other point. The code that Shiaoi wrote, or any code that you write against Windows ML, won't just work on the hardware up here today, or even just the hardware that's in market right now. It'll actually work on future hardware that comes to market later, or even hardware that hasn't been designed or conceived yet. Earlier, Shiaoi mentioned that Windows ML will always, oh shoot, thank you, Shiaoi, yeah, I think we're good here. Earlier, Shiaoi mentioned that Windows ML will always download the execution providers for the hardware it sees on whatever device it's running on.

What makes that possible is a deep level of collaboration, and a new certification program that we've stood up, with all of our major IHVs: NVIDIA, AMD, Intel, and Qualcomm. The way this works is that as these manufacturers bring new hardware to market, at the same time they will either refresh or create new execution providers to coincide with that new hardware. They will submit those new execution providers to us at Microsoft. We will test those execution providers, certify them to make sure there aren't regressions in inference accuracy, make sure everything looks good, and then we'll put them on our servers and make them available for Windows ML to download when it sees new hardware on a device. And you don't have to change any of your app code. So basically, you write to Windows ML, future devices hit the market, and it just works.

You didn't have to ship some sort of an update. That's effectively the vision of the certification program and this collaboration. These hardware partners have just been fantastic to work with. They've really bought into the vision. They understand that the power of the Windows ecosystem is all this hardware diversity, and we have to make it so that all of you who are writing software can target all of that hardware. They've been fantastic. You've heard me talk a lot about what we're doing here; I want to take about 60 seconds so that you can hear it in their words.

Let's take a listen. In the PC ecosystem, as we all know, there are multiple IHV platforms with their own architectures and SDKs. What ISVs need is a way to develop and deploy apps such that they are platform independent. The Windows ML runtime offers exactly that, by auto-installing the runtime and the execution provider specific to the hardware it's running on. ISVs do not need to select EPs at compile time. Windows ML helps NVIDIA and other hardware vendors deliver optimizations to ISVs and users while reducing the barriers to adoption.

It empowers vendors to productize software innovations faster and allows app developers to unlock interesting new use cases. Windows ML's serviceability infrastructure will automatically update the AI inference engine so that your applications can adapt as hardware capabilities emerge. Windows ML is going to be a standardized way of landing workloads on these platforms across different engines, be it NPU, GPU, or CPU. Through leveraging these deliverables, which go through the quality certification process with Microsoft, ISVs and customers know that the workload is going to land in the right place, on the right accelerator, with the right balance of performance and battery life. So, as I look out into the audience, I see a few of you IHVs out there who we've been working with. It's been a real pleasure meeting with you, a pleasure and a lot of work, meeting multiple times a week, in person and over Teams, trying to stand this up. Thank you so much for that.

[Applause] Now, it's not just our IHV partners that we've been partnering with; we've also been very busy partnering with a number of development partners as well. All of the folks whose logos you see on the screen, we've been sharing early builds of Windows ML with them: getting feedback, helping to hone the APIs, helping to hone the vision, and finding out what works for them and what doesn't as they take it, prototype it, and integrate it into their apps. And similarly to how you heard it in the words of our great hardware vendors, I think it's worthwhile listening to what these folks have to say about Experimental 1 and their experiences trying it out. Let's take a quick listen. Premiere Pro and After Effects are leading professional video editing applications that often handle terabytes of video footage. Our goal is to adopt the new Windows ML once it matures enough to handle the heavy ML workloads required by our video editing apps. A reliable API that ensures consistent performance and accuracy across varied compute devices would remove significant obstacles.

We look forward to replacing model load and inference code for multiple SDKs with Windows ML. This will simplify our code and testing whenever runtimes update. We're also very excited that Windows ML could truly deliver build-once, work-anywhere models, and we can't wait to verify and provide feedback on this possibility. Windows ML is built upon a specialized version of the ONNX Runtime, for ISVs like BUFFERZONE which focus on Windows devices. This is significant, since research teams can train AI models in a Python environment that contains various packages and dependencies, while ONNX enables us to forget about those Python dependencies and use .NET and C++ easily, without the complexities of Python dependencies. The simplicity amazed me. Following Microsoft's easy approach, get an ONNX model, add it to your app, and integrate it into your code, we converted a complex AI feature to Windows ML in just three days.

We've already adopted Windows ML to help us prototype on a number of silicon platforms, and it's going well. It wasn't a big lift from ORT, and it gave us the accuracy and processing performance we expected. Powder is an early adopter of Windows ML, and it has enabled us to integrate models three times faster, transforming speed into a key strategic advantage. Okay, so earlier you saw Shiaoi's demo, which was a fairly simple console app. Why don't we take a look at something a little bit more sophisticated using Experimental 1? If you could all please join me in welcoming Bart, CEO and co-founder of Powder. [Applause] Should be good. Should be good.

We are Powder. We build AI tools for gamers, powered by Windows ML. What is Powder? We transform gameplay into highlights automatically. How does it work? You just play; we record, find great moments, and package them for sharing. Under the hood, we use Windows ML to keep things smooth across hardware. It's the kind of AI-native experience that is just starting to be possible.

So, this is Powder running on a Windows 11 Copilot+ PC. It's an ASUS Z13 powered by an AMD Strix Halo. I'm going to start a recording, and we're going to have an AI model running live to analyze the gameplay. Let me check the game first. Okay, this is a quirky indie game created by Tom, who is in the audience, one of our developers. The AI model has been trained to detect level completion, which I just did one of, and I'm going to do another one. In the background there is a vision model running to detect events. All right, enough digging; I'm going to finish the recording. So let's see what Powder found.

Okay, so we can see a clip with two events in it: level completed once, twice. Perfect. That's exactly what we were shooting for. For game developers, there is no SDK and no integration; we build strictly on top of our semantic understanding of the audio and the visuals of the game. And if you're a developer, we didn't deploy this AI model for one NPU in particular. We use Windows ML, and it enables us to deploy it across silicon. We do the work once and it works everywhere. Now, thank you, and thanks to the Windows ML team, of course. Let's switch to something a little bit more cinematic, perhaps.

It's a game called Skull and Bones, from Ubisoft. You can see about 40 minutes of gameplay, and in it we found a number of player eliminations, of ships being sunk, really. And all of that was detected live on the NPU. And if you like to create great videos but you don't like the hassle of video editing, the Powder auto montage is where it all comes together. Let me show you. Powder is like Memories, but for PC gamers. We take the raw footage and turn it into great stories that you can then share with your friends and your community on YouTube, X, TikTok, whatever fits your world. Gamers get great footage and great content, publishers get great UGC, and everyone wins. And the best part: this wasn't a heavy lift at all.

We are a small team of 15 people, 10 of them developers, and the promise of Windows ML will help us achieve our goal of deploying our AI models across all silicon. If you're a developer, this is your platform; it helps you deploy serious AI on device without needing deep silicon knowledge. We're proud to be early, but we're 10 times more excited to see what you create with it. Thank you, Bart. That's awesome. Thank you so much. It's just been a real pleasure working with Powder. Thanks, guys. It's been a real pleasure working with you. You know, this is on Experimental 1.

We can't wait to see you guys ship this and other features once we hit stable. It's going to be great. Thank you. Yep. Thanks, Bart. Yeah. I mean, I'm a big gamer myself, even if I'm not really good at it, but I think I really need this to catch all of the precious moments I'm actually good at and then show them off to my friends. So I can't wait to see this in market and using our great Windows ML runtime. All right, so all of that is super exciting, and what is even more exciting is our release roadmap. As you've seen, we've released Windows ML Experimental 1 today.

It already contains a lot of the capabilities we've discussed today, including support for all types of processors, automatic execution provider provisioning that downloads the execution providers for you, and an early version of the device policies for directing AI workloads. It also includes the full ONNX Runtime API surface to make it easier for you to port your app to Windows ML. Note that this is an experimental release, meaning you should not use it in production yet. Please definitely try it out; your feedback is very valuable to us, but we strongly recommend against shipping any app on top of it. All right. And later this year, we will ship Windows ML 2.0 stable. The exact date is going to depend on the stability of our bits and also the feedback we receive from Experimental 1. This will be the first fully supported launch, it will be ready for production use, and it will have a lot of improvements, especially based on your feedback today.

And, yeah, we really hope that you'll find this useful and that it will help you build great AI experiences on Windows. All right, with that, I'm going to wrap up our talk today. Again, please, sorry, right, that's okay, I'm here too. All right, I'm out. Go back, please. All right. Please try it out. We have an aka.ms try WinML link that points to our documentation.

And again, all of the code for the sample app that I wrote today, and also other sample apps in Python and C++, will be uploaded to the aka.ms WinML Build repo tonight; it's not ready yet because I need to clean up my code. Also, if you want to talk to us later, you can find us in the Hub; there will be an expert meetup booth where Ryan will sometimes be, I will be there, and a lot of other experts in Windows AI will be there, so come and chat with us. And if you want to file feedback, which you definitely should, the first thing you can do once our GitHub repo is live is file GitHub issues there. If you have general feedback about the Windows AI Foundry, you can email Windows AI info at microsoft.com. And of course you can also use Feedback Hub to file feedback.

These are also some of the related Build sessions. There was a great overview of the Windows AI Foundry earlier today, and also a more detailed talk about AI Foundry that has already happened; if you missed them, they were recorded, so please check out the recordings. And tomorrow, from 3 to 4 in this room, there will be another talk about the Windows AI APIs, so if you are interested, please check it out. And that's it. I hope you enjoyed our session. Yeah, thanks very much. I hope it was helpful to you. Thank you.
