OpsRamp Tech Talk: Optimize your AI deployments to maximize business value


Hola, Barcelona! That's the extent of the Spanish I know. Good afternoon, and thank you for being here. For those of you in the back, if you move forward it will be easier to see the demonstration, but it's your choice; don't be a stranger, please come forward.

My name is Tuna Gandhi, I run product and technical marketing for OpsRamp, and today we are here to talk about how you manage your AI deployments. For the folks in the audience: how many of you have AI projects spinning up that you are responsible for? Let's see a show of hands. How's that going?

The reason we started investing in OpsRamp to manage AI deployments is our thesis that once you get the infrastructure up and running, you'll spend the majority of your time in the operations console: looking at how to optimize those deployments, how to assure performance, how to make sure all those dollars and euros you're spending with NVIDIA are being put to good use. That's the basis of this session.

One of the things AI deployments have done is start a resurgence of the data center. A lot of you with AI deployments are doing them on your private cloud, because if you put large-scale AI deployments on the public cloud, not only is it very expensive, you also have issues with latency, data gravity, sovereignty, and security. The other thing we see happening in the market, with AI deployments becoming front and center, is a convergence of all these operations roles. On top of that, you have the ever-exploding volume of observability data. These AI workloads are not only data hungry, they also generate a lot of operational data; look at the NVIDIA gear you're deploying, or the applications running on top of it, and they generate a lot of data. So what do you do with it? How do you make sure you're using your infrastructure as efficiently as possible, and taking advantage of the observability data you're getting, to make sure it is performant and efficient?

This is where we started investing in OpsRamp: to help you make sense of this world and go from being reactive to proactive and predictive, taking a page out of AI itself to manage it. Think of it as AI managing AI. What do you have to do? You have to make sense of all the information you're getting, you have to have a system that constantly learns and relearns, and you have to have actionable insights coming out of that system that can be automated fully, partially, or kept under your control.

So we invested in OpsRamp to observe all of the data coming in, not just the AI infrastructure data, but your AI infrastructure in context with the rest of your data center and workloads. We apply some of the same techniques you're using in your foundational models, the transformer-based models, the neural nets, to analyze this data and provide actionable insights you can automate. In terms of what you get with OpsRamp, it's unified observability, AI-powered analytics, and intelligent automation that still keeps you in control.
In terms of what it delivers, putting it all together: for unified observability we have built not just the integrations on the NVIDIA side but 3,000-plus integrations, so you're looking holistically at your entire environment and can take policy-based actions on the data you're getting. These integrations run up and down the stack: across your private cloud, third-party infrastructure components, and public clouds, across multiple runtimes, whether that's VMs, containers, or bare metal, and across multiple application types.

We put all of this through a machine learning and AI engine we have built. If you think about monitoring tools, everybody did time-series data and called it machine learning, but we know it's statistical analysis: pattern matching against things you've seen in the past. You now also need to take advantage of AI techniques; the very AI projects you're planning to deploy use similar techniques. That means things like training a model on observability data. A lot of the models out there are built for things like "put a picture of my dog on the moon"; that's not observability data. So we're building foundational models that take advantage of metrics, logs, traces, and network flows so we can make sense of your data center infrastructure. Then we go up the stack, look at your application types, and give you a full view from app to infrastructure. Finally, we use that data for the basic things you have to do in your data center, and those same principles apply to your AI deployments, whether that's automating routine jobs or auto-remediating issues.

What you get is a command center across your entire environment, your entire enterprise infrastructure and applications, so you can simplify your operations management. And going back to the earlier point: once the infrastructure is up and running, you've got to make sure those GPUs are performant. It's interesting: in the old days of virtualization, the rule was don't let the load exceed a certain percentage; we'd put a buffer in and say do not go over 60% or 75%. It's the exact opposite with GPUs: you don't want the load to fall, because they're so expensive to deploy. You're also watching power and cooling, as well as cost and performance, while you do this. We're putting it all together to give you a centralized console for managing your entire infrastructure.

In terms of what OpsRamp can do, it has a whole workflow that goes from discovery to resolution, and it works across your virtualized environment, containers running on VMs, containers running on bare metal, and your AI-native workloads. This is critical, because it's the same tool: it cuts down on the learning curve as you go from one application type to another, and it makes sure you're not building yet another silo, which is what got us into trouble to begin with. Our goal is that you get this holistic picture across the board, no matter what your deployment topology is, no matter what your runtime is, and no matter what your application type is.

With that in mind, we started building these integrations with the NVIDIA infrastructure. We support the NVIDIA GPUs: you can see performance, load, utilization, capacity, and power consumption, all brought forward in a consolidated dashboard for you.
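For a concrete sense of the raw signals behind a dashboard like that, here is a minimal sketch that reads per-GPU utilization, memory, and power using NVIDIA's NVML Python bindings (the nvidia-ml-py / pynvml package). This illustrates the kind of data a collector gathers, not OpsRamp's actual integration, and the 40% utilization floor is a hypothetical threshold chosen to show the inverted alerting logic described above:

```python
# Minimal sketch: read GPU utilization, memory, and power via NVML.
# Illustrative only; not OpsRamp's integration. Requires nvidia-ml-py
# (pynvml) and an NVIDIA driver on the host.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu/.memory, percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .used/.total, bytes
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports mW
        print(f"GPU {i}: util={util.gpu}% "
              f"mem={mem.used / mem.total:.0%} power={power_w:.0f} W")
        # Inverted threshold versus classic CPU alerting: flag *under*-utilization,
        # because an idle GPU is the expensive failure mode here.
        if util.gpu < 40:  # hypothetical floor; tune per workload
            print(f"  WARNING: GPU {i} utilization below floor")
finally:
    pynvml.nvmlShutdown()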
We also integrated with the NVIDIA InfiniBand and Ethernet fabrics, and we integrate with their cluster manager. And this is where we'll put it all together in a demo and show you what it really looks like in the product. So with that, let me invite Ron over to the stage. Hey Ron, good morning, good afternoon. Thank you.

All right, thank you for that. As Tuna mentioned, I'm Ron Singler; I lead our solutions architecture team here at HPE OpsRamp. I've been here since April, and I'm loving every minute of it. She asked at the beginning who has AI projects; anybody supporting developers who are building cloud-native applications in your environment? Perfect. As you know, 90-plus percent of any AI workload or application is developed in a cloud-native architecture: they're utilizing Python, Kubernetes, serverless architectures.

Let me ask this first: anybody familiar with Private Cloud AI, or PCAI for short? They announced it this week. Awesome. If you haven't, stop by the booth; I would definitely check it out. OpsRamp is included as part of that solution set, and it obviously helps streamline your AI projects along the way.

Within OpsRamp we have what we call service maps. A service map simply lays out your application or workload, in this case an AI inference job, with the services and the latency between those services at that layer of the map. We're monitoring and observing all the actions within each layer, and we can send you alerts. You see some numbers next to the CPUs and GPUs annotating how many we're actually monitoring or observing at that layer. Moving down the stack into the Kubernetes layer, you can see all of the pods, and which service or workload is tied to which container within a pod. Below that is the physical infrastructure, showing again the CPUs and GPUs, and then, moving further down, the network and storage environment.

The other way we show this in the solution is through dashboarding, again moving top to bottom through the full stack. Here we have the model requests, along with the latency by each model revision, the total requests, and any 500 or 400 errors we may be seeing in the environment, plus CPU and memory utilization for that specific model. Moving down the stack, you get into your Kubeflow metrics: Kubeflow is probably running in a Kubernetes cluster across a number of nodes and pods, and you want to see how this model might be affecting the overall cluster, which is probably servicing many other models in that environment. Your actual Kubernetes cluster metrics sit below that. Next you get into GPU monitoring: how much utilization are you getting out of those GPUs? As Tuna said earlier, you want to maximize your utilization; you're spending a lot of money on GPUs and you want to use them as much as possible. And then we continue moving down the stack and the dashboard through network and storage.
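Dashboard panels like model requests, latency by revision, and 400/500 error counts imply the serving layer is instrumented. The talk doesn't say how those metrics are emitted, but as a hedged sketch of what that instrumentation could look like, here is a version using the Prometheus Python client; the metric names, labels, and handler are hypothetical:

```python
# Sketch: instrument an inference endpoint so a dashboard can chart
# request counts, latency by model revision, and 4xx/5xx errors.
# Assumes the prometheus-client package; names are hypothetical.
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("model_requests_total", "Inference requests",
                   ["model", "revision", "status"])
LATENCY = Histogram("model_request_seconds", "Inference latency",
                    ["model", "revision"])

def handle_inference(model, revision, run_model, payload):
    start = time.perf_counter()
    try:
        result = run_model(payload)
        status = "200"
        return result
    except ValueError:
        status = "400"   # bad input from the client
        raise
    except Exception:
        status = "500"   # server-side failure
        raise
    finally:
        # Runs on every path, so counts and latency stay consistent.
        REQUESTS.labels(model, revision, status).inc()
        LATENCY.labels(model, revision).observe(time.perf_counter() - start)

if __name__ == "__main__":
    start_http_server(9100)  # expose a /metrics scrape endpoint on :9100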
So I'm going to move over to the demo now and put on my glasses so I can see what we're talking about. All right, perfect. Let me just refresh real quick to make sure we're good to go.

As I mentioned earlier, the vast majority of AI workloads or applications are written in a cloud-native architecture, so I'm modeling that here through a unified observability dashboard. We have an e-commerce system running, and it looks like we have some problems with availability: ten of our subservices are up, but we've got two that are down. Our infrastructure resources look like they're still up and running, so no issues there. As I scroll down, I can see that one change in the environment has been detected; maybe somebody made a change without going through the right change-control processes. We want to know about that. Because we have an alert, an issue within the environment, we've automatically created a Jira ticket on the back end, so we can see that reflected here. We can also see our ticket history for specific services in the environment that we may want to take action on or check out.

If I come back up to the top, my application response time doesn't look too horrible; it's hovering right around 5 milliseconds with a few spikes. But I can see I've got a couple of services taking a second or longer to respond: my grocery store and my shopper service. I've got six open alerts with two correlated alerts; we'll come back to those in a second. I can also see all of my alert history, how often this has happened, and where I'm at right now on the alerts. I've got a heat map for the nodes in the environment: the darker the red, the more utilized that specific resource is, whether that's CPU, memory, or your GPU for that matter. Coming back up here, my request count is pretty low right now; we're at 2.9 and it should be spiking higher, and my store request count is pretty low too. So I can see that latency is an issue in the environment: my store latency is up around 4 seconds, which is obviously horrible, and my latency-issue alert is pretty active as well.

Let's take a look at these correlated alerts. OpsRamp has two correlated alerts. This one shows multiple related alerts occurring within 20 minutes of each other on ubuntu-203; the first metric was e-commerce. It wants us to review the correlated alerts and the recommended action in the activity log. If I click on the activity log, then because we've seen these patterns before, OpsRamp recommends, based on our first-response policy, running the process called "stop latency test". So we already know how to solve this problem, based on the ML seeing the patterns and making a recommendation. But let's go figure out the why.
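As a rough stand-in for what's happening behind that screen (the demo doesn't show OpsRamp's ML-driven correlation in detail), the basic pattern of rolling related alerts on one resource into a single incident within a 20-minute window can be sketched like this:

```python
# Rough sketch of time-window alert correlation: group alerts that share
# a resource and arrive within 20 minutes of each other into one incident.
# A simplified stand-in for the behavior described, not OpsRamp's algorithm.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=20)

@dataclass
class Alert:
    resource: str   # e.g. "ubuntu-203"
    metric: str     # e.g. "store_latency"
    at: datetime

@dataclass
class Incident:
    resource: str
    alerts: list = field(default_factory=list)

def correlate(alerts):
    incidents, open_incident = [], {}
    for a in sorted(alerts, key=lambda a: a.at):
        inc = open_incident.get(a.resource)
        if inc and a.at - inc.alerts[-1].at <= WINDOW:
            inc.alerts.append(a)   # same resource, still inside the window
        else:
            inc = Incident(a.resource, [a])   # start a new incident
            incidents.append(inc)
            open_incident[a.resource] = inc
    return incidents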
We'll click on the other correlated alert as well, then click into the actual correlated alerts and look at the individual alerts. Here I have three different metrics or services: my grocery store request, my inventory request count, and my high store latency, and this last one is actually a trace. Let's dive into the trace by clicking on its time range. We'll let this pop up, and at the top we can immediately see the latency each time this request is made; we can see where it ran fine and where we now have an issue. We'll sort by duration and pick the longest one, which looks like about 4 seconds. We can click on it, click view details, and it shows a waterfall view of that trace and all of the spans contained within it, so all of the different API calls or service calls within that specific process are laid out here. We can see that our products and our inventory requests are taking about 2 seconds each. In a web app, or really in the vast majority of applications, a two-second response time is pretty bad.

So let's go back to the dashboard and take a look at the logs for this specific spike. I'll go back to the dashboard, click view logs, and it drops me right into the log view. I already have a log view saved called "application server logs"; all I've done there is filter by the host ubuntu-03 and look for the string "application server" within every log, over the last four hours. Obviously you can add any number of filters and save these however you like, if there's something you need a shortcut to, or of course you can browse all of the logs that were on the previous screen, prior to the filter.

It looks like, from our logs, our developers have left the application server latency test running. Most developers building cloud-native applications want to try to break the application before it goes into production; evidently our developers, for some reason, forgot to turn this specific test off. We've seen this pattern before, and we've created a first-response alert because they tend to be forgetful, so we can respond when it happens. If I go back to my correlated alert screen, I can again click into my activity log and, as it suggests, click run process. We already have the "stop latency test" process in there; we give it some comments, we'll put in "Ron demo", click run process, and it's running. It's going to go off and fix this problem automatically.

Now you're probably thinking to yourself, is that autonomous IT operations? Well, no, because we did it manually for the demo. However, you can absolutely automate that entire process, so that if a developer leaves a test running, one you're familiar with and that has caused problems in the past, you can create a ticket, alert somebody, email, page, whatever, and then automatically take action, or wait for approval to take action, however you would like to automate it. So you can get complete autonomous IT operations for your cloud-native applications.

Let's refresh this; it takes a minute or two. We can already see the number of correlated alerts has gone down and our open alerts have gone down, and over the next minute or two we'll see this continue to drop as it fixes itself. And that's the meat of the demo: we now have twelve subservices up and running, nothing is down, we're fully available, with zero open alerts and zero correlated alerts. Our latency issues are back down to zero, and our number of requests is starting to creep up for both our inventory and store services. It's all about observing patterns within the data and helping you understand how to take action to solve those problems and get to a resolution much quicker.
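The demo ran the fix manually, but the automated first-response pattern Ron describes, match a known alert signature, then either remediate or hold for approval, might look roughly like the sketch below. The process names, the pkill command, and the policy table are hypothetical examples, not OpsRamp's API:

```python
# Sketch of a first-response policy: on a known alert pattern, run a
# remediation process automatically or park it for human approval.
# All names and commands here are hypothetical illustrations.
import subprocess

FIRST_RESPONSE = {
    # (metric, substring in alert message) -> (process name, auto-approve?)
    ("store_latency", "latency test"): ("stop-latency-test", False),
}

def stop_latency_test(host):
    # Example remediation: kill the forgotten load test on the host.
    subprocess.run(["ssh", host, "pkill", "-f", "latency-test"], check=True)

PROCESSES = {"stop-latency-test": stop_latency_test}

def on_alert(metric, message, host, request_approval):
    for (m, needle), (process, auto) in FIRST_RESPONSE.items():
        if metric == m and needle in message:
            if auto:
                PROCESSES[process](host)         # fully autonomous path
            else:
                request_approval(process, host)  # ticket/page, act on approval
            return process
    return None   # unknown pattern: leave for a human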
So as we come to a close, you're probably asking yourself: how do I get this great technology? We have a number of solutions and packages that OpsRamp is available with. I mentioned PCAI earlier. OpsRamp obviously comes as a standalone solution if that's what you want, but we also bundle it as part of our managed services offering; if you want, or are currently using, HPE Managed Services, they're running OpsRamp on the back end, giving you access to all of these dashboards, and they'll build them for you and make that happen for you. There's the Complete Care Service, part of the IT Ops solution, with two different versions packaged there. We're also included as part of the HPE GreenLake Flex Solutions, and OpsRamp is being embedded as part of Aruba Central, I think starting very soon; that was announced back in September. We've got OpsRamp in Private Cloud AI, which I mentioned earlier, and then OpsRamp in private cloud disconnected. That's something a number of our customers have been asking about for many, many months: OpsRamp running on premises, in a vault, in a completely disconnected environment. That's now a solution we're offering our customers as part of what we call PCD.

So there we go. Check us out; we're in the booth, and I'll be there the rest of the day, with a number of my folks there to answer any of your detailed questions. We've got the big wall demo at 3:00; I'm sorry, at 1:30 we've got our hands-on lab if you want to get your hands on the product and actually use it. It's a great walkthrough to get familiar with the product. And please follow us on all of our socials.
