Accelerating data science on the C3 AI Platform, and how we achieve that. My name is Mehdi Maasoumy, and I'm joined by some of my colleagues here today. I'll ask them to stand up and show their faces so you can find them and ask questions if you like. These are some of the data science team members at C3 AI, along with our ML/AI product team, so please feel free to catch us for the remainder of Transform and we'll be happy to work with you on any questions you might have.

I'd also like some brave volunteers to introduce themselves, so we get a sense of who is in the session. If you're a customer or a prospect, could you please say your name, the organization you're from, your role in the organization, and whether you're an existing customer or a prospect? Anyone? Yes, please.

"I'm going to fit into a new category. I'm Steve Meyer, one of the co-founders of KUNGFU.AI, a prospective partner for services extending the C3 AI platform, and also on the technology side. We have a lot of customers and a lot of things in common, so I'm excited to be here."

Super, thank you, Steve. Anyone else? Back in 2018, that's right. Anybody else? Yes, please. "We've been working with C3 since 2020." That's right, fantastic, thank you. Maybe one more? Anyone else? Yes please, back there. Fantastic, thank you. And thank you, Simon.

For the folks who were brave, I have one more question, especially for prospects of the C3 platform. I'd like to understand the data science practices in your organization, the success stories from your teams, and especially the pain points you might be solving. Why are you here? There must be some pain points you're facing today, whether in model experimentation, putting models in production, or scaling out AI applications. Anything on the top of your mind that you want to share with us: what brings you here, and what are the pain points? Yes, please.

"Decision-making capacity for the organization, and what role AI can play in it. We are very clearly a dashboarding, BI kind of company today. So how do we go much further, in terms of predictions, in terms of the ability to optimize in real time, in terms of really bringing that experience to the decision makers, with much better and much more uniform quality of decision making across the organization? That's the rationale for ML/AI, and the pain points can be anywhere, whether supply chain or manufacturing."

Thank you. Jim Snabe also mentioned that, right? You want to look ahead not only in the boardroom, but across the rest of the organization too. Any other prospects with pain points in scaling out their AI applications who want to talk about them? Has testing of your whole ML system been a problem? Does data drift ring a bell for you at all? Model management? It does for some people.

All right, so we're going to be talking about those. But to set the stage, let's look back at the last 40 years of information technology and computer engineering in general. There have been these defining moments, defining milestones: from the x86 instruction set architecture back in the 80s, to the introduction of Java, to the advent of Hadoop and Mesos, which was essentially a precursor to Kubernetes, and most recently Spark. What's the common theme here? Just asking the crowd: what do you see as the common theme? "Change is constant." Definitely, yes. What else?
"Abstraction." Wow, brilliant, spot on: abstractions. Abstractions are very important. Why? Back in the 80s, the x86 instruction set architecture was an abstraction over machine code. Java is an abstraction for interoperability across operating systems. Hadoop is an abstraction for distributed file-based processing. Mesos is an abstraction for workload management. Spark is an abstraction for in-memory compute. And we will probably need more abstractions going into the future.

But why abstractions? What's the point of them? Abstractions allow for separation of responsibility, so that you can independently innovate in certain areas. You abstract the complexity away so you don't have to deal with it as you progress on your journey through computer engineering history. And what's the benefit? What do we get from introducing abstractions? "Increased productivity." Brilliant, wow, we have a good audience here. Increased productivity is the result of abstractions, and this is indeed what we have shown in the past. You should think of C3 AI as an abstraction, and I'll go into the details of why, and of how much of a productivity increase we see from it sitting on top of infrastructure-as-a-service cloud providers: we've shown between 18x and 26x improvement in productivity. Can you imagine if today you had to train a large language model using punch cards, and how difficult that would be? Why can we do it? Thanks to abstractions. Abstractions are indeed the enablers of innovation in computing history. The reason we can have ChatGPT today is due to abstractions.
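To make that separation-of-responsibility point concrete, here is a minimal Python sketch; it is not C3 code, and all names in it are hypothetical. An interface abstracts storage, so application logic and storage backends can evolve independently.

```python
from abc import ABC, abstractmethod
from typing import Dict

class BlobStore(ABC):
    """The abstraction: application code depends on this interface,
    never on any one storage backend."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(BlobStore):
    """One concrete backend; it could be swapped for an object store or a
    distributed file system without touching code written against BlobStore."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def get(self, key: str) -> bytes:
        return self._data[key]

def save_model_artifact(store: BlobStore, model_id: str, payload: bytes) -> None:
    # Written once against the abstraction; storage backends can now
    # innovate independently underneath it.
    store.put(f"models/{model_id}", payload)

store = InMemoryStore()
save_model_artifact(store, "churn-v3", b"...weights...")
```

The application function never changes when the backend does; that independence is where the productivity gain comes from.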
So, you've all been in the general area and seen this huge banner: Enterprise AI. Have you wondered what Enterprise AI is, what we mean by it? Let me take you through a journey to introduce it, and I'll use a video game analogy to draw parallels between the video games some of us have played, and still play (we won't say who), and Enterprise AI.

In a video game journey, level one is easy. It's teaching you the rules of the game, how to play; you're learning, you get quick wins, and you go to the next level. You know there's a next level, and you know it's going to be harder. You're learning as you go: new tactics, new strategies. You're expected to fail at each level before you advance; you fail many times, you learn, and you move on. This is all familiar to us. The other thing specific to video games is that you can't see what's next until you actually get there. In level one you don't know what level two looks like; in level two you don't know what levels three, four, or five look like; sometimes you don't even know how many levels there are. And with each new level, the strategies that worked before stop working. Whatever strategy got you to level three will likely not take you to level four. You have to learn new things, adapt, learn from past mistakes, and adopt new strategies to win the game.

With that, let's look at the Enterprise AI journey. Just like video games, we have levels. You train a model: easy, we're learning the rules of the game, quick wins, life is good. But we know there are levels ahead of us, multiple levels, maybe six or seven. So: now that you've trained the model, deploy it, implement batch inference, put the predictions of this model on top of a Tableau dashboard, and show it to the end user. That's level one.

Level two: you may start to have multiple models, and to manage the runtime for different models you may need to containerize them. You may be required to provide them as REST APIs, and version control and explainability might be needed by the end users. Then there are levels five, six, and seven, and the question always is: is there another level? At level five you might have on-demand inference; you might have ML ops and model ops as a requirement, plus drift detection. At level six you might have to support shadow models, champion/challenger, A/B testing. This is getting really hard now, right? We're playing the video game at level six, and we have to design a system that meets all these requirements; you may have dozens of models in production. Do you think there's a next level? We're pretty advanced at this point, right? But remember, in the video game we always have a blind spot: we don't know what the next level is. Do you think we have a blind spot here? Probably there is one more level, seven, and maybe levels eight, nine, ten. It's getting difficult.

At level seven, whatever system you designed up through level six will probably not be good enough. You now have model lineage, you have to support model risk management, and you may need to scale to hundreds of models in production. If we thought one model in production was difficult, how about a hundred? At level eight you may want to introduce model training templates, a whole new concept you didn't even know existed, along with model monitoring and retraining within the application: imagine an AI application where the end user can go to the UI and train a model. Generative AI comes in right around level eight as well, and users want generative AI as part of the UI. And at level ten you may have hundreds of ML/AI applications. So we go from hundreds of ML models to hundreds of ML/AI applications in production; imagine that complexity. You may have an industrial methodology for AI and ML, and tens of thousands of models. We have customers playing at level five, customers playing at level eight, and customers playing at level ten. Guess who is playing at level ten?

With Shell, we announced (this article is a bit old, from 2022) 20 billion rows of data being processed weekly, three million sensors, and, as of that time, 11,000 machine learning models in production. Now it's more; Richard can tell us what the number is: 16,000.
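Since level six comes up so often, here is what shadow models and champion/challenger routing can look like in practice: a minimal Python sketch with hypothetical names, not a C3 API.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict

Predictor = Callable[[Dict[str, float]], float]

@dataclass
class ChampionChallengerRouter:
    """Shadow a challenger model behind a live champion.

    A `challenger_share` fraction of requests is also scored by the
    challenger for offline comparison, but the champion's answer is what
    gets returned (a "shadow" deployment). Set `serve_challenger=True`
    to turn the shadow into a live A/B test.
    """
    champion: Predictor
    challenger: Predictor
    challenger_share: float = 0.1
    serve_challenger: bool = False

    def predict(self, features: Dict[str, float]) -> float:
        champion_pred = self.champion(features)
        if random.random() < self.challenger_share:
            challenger_pred = self.challenger(features)
            # Stand-in for a metrics pipeline feeding drift/performance dashboards.
            print(f"champion={champion_pred:.3f} challenger={challenger_pred:.3f}")
            if self.serve_challenger:
                return challenger_pred
        return champion_pred

# Usage with two toy "models":
router = ChampionChallengerRouter(
    champion=lambda f: 0.8 * f["temp"],
    challenger=lambda f: 0.75 * f["temp"] + 1.0,
)
print(router.predict({"temp": 70.0}))
```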
In the last year, and this is very interesting, we've added 5,000 new models in production. That article is from March 8th, and what is today? Exactly one year later. So 5,000 models trained and put into production within a year. We are playing at different levels, right?

Let's talk about two foundational papers, which I personally found very interesting and really encourage you all to read if you haven't. They both happen to be out of Google, the hidden-technical-debt-in-ML-systems paper and the ML test score paper, and they make very interesting assertions about the requirements of ML/AI architectures and systems.

The first assertion is that only a tiny fraction of the code in many ML systems is actually devoted to learning or prediction; much of the remainder may be described as plumbing. Sounds familiar? So 95 percent of the code we write for an ML system is often there to handle monitoring, feature extraction, configuration, and data collection, and only five percent is truly devoted to the learning piece, which is the core capability of our systems.

Assertion two: machine learning software systems are fundamentally different from traditional software systems. You might wonder how. In a traditional software system you have code, and a system you run the code on. To test the whole thing, you need unit tests for your code, system monitoring for the system running the code, and integration tests. But look at what ML systems add. What do you see as the new components here? We see data, and we see model training, and as a result of introducing these two, we have to do far more testing. On top of unit, functional, and integration tests, we need data tests, skew tests, data monitoring, ML infrastructure tests, model tests, and production monitoring. This speaks to the complexity of ML/AI systems and Enterprise AI.

One last assertion: consider ML training as analogous to compilation, where the source is both code and training data. By that analogy, training data needs testing like code does, and a trained model needs production practices like a binary does, such as debuggability, rollbacks, and monitoring. These all speak to how the beast we're working with is something else, and hence the need for a new abstraction layer. We believe C3 AI is that abstraction layer, the one needed for designing, developing, deploying, and operating large-scale AI applications.

Let's look at the current approach in most of today's enterprises. We all have clouds; we're all working on one or more of Google, Amazon, and Azure, possibly hybrid, and they're great. We love them; without clouds we wouldn't be here, and without the cloud C3 doesn't exist. But let's walk through what happens when you start running experiments, running projects, and writing applications directly on these clouds. For each application you start with a notebook for data pre-processing: you load your data, transform it into a better version, and dump it for future use. Someone else picks that data up in another notebook and does feature engineering. Another person, or you yourself later, has another notebook for EDA, and another notebook for model training. How do we test each step of the way? How do we test the different features in the model in a best-practices way, as is established in software engineering? Even for one application, we end up with notebook_final, notebook_final_final, notebook_Final, notebook_final_final_vF. We've all been there. As you can see, this is not the best way to build robust Enterprise AI applications.
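As a concrete illustration of the kinds of tests those papers call for, here is a minimal pytest-style sketch. The data, thresholds, and function names are toy, hypothetical stand-ins, not the platform's actual testing machinery.

```python
import numpy as np
import pandas as pd

def load_feature_batch() -> pd.DataFrame:
    # Toy stand-in for the output of a real feature pipeline.
    return pd.DataFrame({
        "sensor_temp": [71.2, 69.8, 70.5],
        "valve_open_pct": [0.43, 0.41, 0.40],
    })

def test_schema_and_nulls():
    """Data test: the features the model expects exist and are populated."""
    df = load_feature_batch()
    assert {"sensor_temp", "valve_open_pct"} <= set(df.columns)
    assert not df.isnull().any().any()

def test_training_serving_skew():
    """Skew test: serving-time feature statistics stay near training-time statistics."""
    df = load_feature_batch()
    training_mean, tolerance = 70.0, 5.0  # stats captured at training time
    assert abs(df["sensor_temp"].mean() - training_mean) < tolerance

def test_model_beats_baseline():
    """Model test: a candidate model must outperform a trivial baseline."""
    y_true = np.array([1, 0, 1, 1])
    y_pred = np.array([1, 0, 1, 0])
    baseline_acc = max(y_true.mean(), 1 - y_true.mean())  # majority-class accuracy
    model_acc = float((y_true == y_pred).mean())
    assert model_acc >= baseline_acc
```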
So what do we do? C3 comes in and sits on top of the clouds. We provide these abstractions: an abstraction for data unification, an abstraction for AI and machine learning, an abstraction for continuous data processing, and one for elastic compute. The result is a set of applications, either C3 AI applications, your custom AI apps, or data science products, built on top of this capability.

But how do we do that? How do we not end up with that spaghetti of notebooks and models? C3's approach has been completely different. We provide data pipelines that are reusable, and feature pipelines that are metadata-driven, reusable, persisted, and searchable. Every component along the way, from data transforms to feature engineering to model training and model deployment, is a metadata-driven type that is designed, documented, tested, and upserted, so you can collaborate on those artifacts with your colleagues within the team and across different teams. So: full reusability, and a cohesive experience provided by this architecture. Version control. Composability: the composability of each of these components, from a feature to an ML pipeline to the pre-processing and post-processing of pipelines, is a big enabler of increased efficiency. And logging, monitoring, and debugging capabilities are super important: remember that in the new ML systems we don't only have unit tests for code, we have tests for data and tests for ML infrastructure, so debugging needs to be easy, debugging and monitoring and logging need to be there, along with all the testing we talked about.

So let's talk about what the world looks like with and without C3: the current state of things versus the future state. Today, your data engineering and data science teams spend the majority of their time on data preparation and feature engineering, and on model governance, scheduling, and app integration, and the least amount of time on where most of the value is: model experimentation and model development. With C3 AI, data scientists spend more time on the value-add activities, that middle part, and less time on tedious tasks such as data wrangling and scheduling. Why? Thanks to abstractions: those tasks have been abstracted away and simplified. C3 AI provides out-of-the-box capabilities that DS teams are essentially designing manually today, such as scheduling and API endpoints. We unblock the promotion of thousands of models with robust model operations and governance, a centralized registry, and pre-packaged model ops for users. These are the ways we let users spend more and more time on model experimentation. C3 AI enhances your cloud platform's data stores by providing data unification and virtualization, and we enable artifact testing and reusability, as we discussed. So we can go into the details of each step of the way and how we enable this reduction and reallocation of the time spent on tasks.
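Here is one way to picture "features as metadata-driven, composable, searchable objects" in plain Python. The registry and class below are hypothetical stand-ins for illustration, not the C3 type system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import pandas as pd

# In-memory stand-in for a searchable metadata catalog.
REGISTRY: Dict[str, "Feature"] = {}

@dataclass
class Feature:
    """A feature declared as a versioned object instead of ad hoc notebook code."""
    name: str
    version: int
    fn: Callable[[pd.DataFrame], pd.Series]
    depends_on: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        REGISTRY[f"{self.name}:v{self.version}"] = self  # persisted and searchable

    def compute(self, df: pd.DataFrame) -> pd.Series:
        return self.fn(df)

# A base feature: rolling mean of a raw signal.
temp_avg = Feature("temp_avg", 1,
                   lambda df: df["temp"].rolling(3, min_periods=1).mean())

# A composed feature, built from the registered one, so colleagues can
# discover and reuse both definitions rather than re-deriving them.
temp_anomaly = Feature("temp_anomaly", 1,
                       lambda df: (df["temp"] - temp_avg.compute(df)).abs(),
                       depends_on=["temp_avg:v1"])

df = pd.DataFrame({"temp": [70.0, 71.5, 90.0, 70.2]})
print(temp_anomaly.compute(df))
print(sorted(REGISTRY))  # ['temp_anomaly:v1', 'temp_avg:v1']
```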
I won't go through every single line here, but for example, on the data modeling part: if a C3 type is remapped to a different origin source, all the downstream analytics and applications are updated automatically, and there is a single fetch command no matter which persistence technology your data lives in, thanks to that abstraction layer. In terms of feature engineering, which is one of my favorite parts, the expression engine library has very rich out-of-the-box capabilities for outlier removal, for interpolation, and for other logical operations, and all features are automatically stored as searchable metadata. You might be familiar with the concept of a feature store, where you store a version of the data; here, what we've done is go one step back and store the definition, the metadata associated with each feature. What are the benefits of that? Features are designed as objects that are composable; you can test them and persist them, and your colleagues can search for and access them and compose them into more complicated features, which opens the way for collaboration.

On the model training side, we can compose existing models into multi-step ML pipelines that are, again, objects persisted in the database: versioned, metadata-driven, and accessible to your colleagues to work with, enabling innovation and a culture of working together. We can quickly define model experimentation setups and train large numbers of models using the model deployment framework. Model explainability is provided off the shelf on these ML pipelines, and trained models are automatically registered for model management. These are the ways we enable that increased productivity our colleague here mentioned, thanks to the abstractions.

On the last chunk of the work, we have scheduling, monitoring, and application integration, where today data science and data engineering teams spend a lot of time. We are again abstracting away a lot of the complexity there. Jobs are backed by easily configurable resources managed via a declarative language, and job history and statuses are logged for easy monitoring. Model deployment segments allow deploying and monitoring models against the full population, a segment of the population, or per asset. These are some examples of how the increased efficiency and productivity is gained via the abstractions of the C3 platform.
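To make the operations side concrete, here is a small Python sketch of a declarative job spec and segment-scoped deployments. The keys, IDs, and resolution rule are illustrative assumptions, not the platform's actual configuration language.

```python
from dataclasses import dataclass
from typing import List

# A declarative job spec: schedule and resources are configuration,
# separated from the engine that realizes them.
RETRAIN_JOB = {
    "name": "retrain-valve-models",
    "schedule": "0 2 * * 0",          # cron: weekly, Sunday 02:00
    "resources": {"cpu": 4, "memory_gb": 16},
    "max_retries": 2,
}

@dataclass
class SegmentDeployment:
    """A model deployed against the full population, a segment, or one asset."""
    model_id: str
    segment: str  # e.g. "all", "region:gulf-coast", "asset:valve-0042"

DEPLOYMENTS: List[SegmentDeployment] = [
    SegmentDeployment("anomaly-v7", "all"),
    SegmentDeployment("anomaly-v8-candidate", "region:gulf-coast"),
    SegmentDeployment("anomaly-custom", "asset:valve-0042"),
]

def model_for(asset_id: str, region: str) -> str:
    """Most specific deployment wins: asset overrides region overrides 'all'."""
    for scope in (f"asset:{asset_id}", f"region:{region}", "all"):
        for d in DEPLOYMENTS:
            if d.segment == scope:
                return d.model_id
    raise LookupError("no deployment covers this asset")

print(model_for("valve-0042", "gulf-coast"))  # anomaly-custom
print(model_for("valve-0001", "gulf-coast"))  # anomaly-v8-candidate
print(model_for("valve-0001", "permian"))     # anomaly-v7
```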
So now I'd like to open it up for some discussion: any questions you might have, any best practices that C3 customers may want to share with the crowd in terms of how they're using these capabilities to achieve better efficiency and productivity in their teams, what the composition of the CoE teams they've set up looks like, and anything else you think might be good for the rest of the crowd to know. Existing C3 customers in the room, if you'd like to share, please feel free to enlighten us. Richard?

"So, we've been running for three years. We monitor control valves, around sixty thousand pieces of equipment, across multiple applications, including optimization and integrity and reliability. What we found when we initially started building up the capability in the CoEs, starting off with highly qualified data scientists, and maybe learning from C3 themselves, is that building up the skills and knowledge of the platform is an arduous task in itself; it's very foreign. So we found it worked to complement the development teams: we created pods, generally made up of around two data scientists and three or four developers for each application, so everyone stacked up and developed a deep understanding of the platform. Thankfully, I think partly through our experience, you've now got a professional set of documentation, a somewhat more performant platform, and tight integration with Visual Studio; these things didn't exist in the beginning, so it really was a long journey. It's about keeping that complement, and really also ensuring that the data engineering piece is done well, because without that, everything else is secondary. The data scientists aren't always the unicorns they think they are. We also worked pretty much in a DevOps methodology, because at the end of the day these apps have to be supported, and when something goes wrong the question is: is it a data availability issue, data corruption, an issue on the platform, or a model issue? Separating the complexity of the data science from the application matters too: our remote operations engineers can actually create models themselves, and these things, we find, have enabled us to accelerate our own growth."

Thank you, Richard. This goes back to 2018, with that version of C3; the latest versions are probably even better. The way we set up the CoE in the beginning, I think there were four or five C3 employees and about 10 to 12 Shell data scientists and engineers across two products, and at the same time we were working on two applications. So in the first couple of months, three to four months, we not only trained all the Shell developers and data scientists, but also built from scratch the two applications that Shell wanted. For me it was a very fulfilling experience, and we hope to do that with many more customers. Anybody else who wants to share some insights? Or any questions from the crowd? Yes, please.

"On acceleration: I would say the beginning was a lot of ups and downs; we ran into a lot of problems. There have been a lot of changes on the platform since then. Documentation is very, very important, and there was no forum you could go to and post something; we built up experience as we developed the applications, built up the code base and knowledge in-house, and slowly expanded, and we're now working with external partners to build up capability with contract resources as well. We've gone through, we've probably got some products in full production out of the items we investigated through a proof-of-concept, accelerator approach, some of which we shelved or failed, and there are probably another five or six in development at the moment. So it has gotten better and faster. We've also got a much wider internal Shell code base to refer to; we continue to use React as our front end, with a lot more integration with other systems as we progress with the UIs. And there's now a bigger support community in total as well, so we're not starting from scratch."
"Just after we finished training the first wave of people through staff development, we picked up work for the central team, and since then we've taken on a number of other application areas. You asked about the data virtualization piece: that was a problem we had in the early days, because our historian is a proprietary, closed data system, so we had to work out how to bring that data together; it was a separate piece of the program that we ran in parallel at the same time. Now we have that flowing through as live streams, rather than only using batch data as before, but that in itself was a journey. And in terms of best practice, we can contribute as well, rather than just take.

Speaking of abstraction, it's amazing, listening to that panel earlier and to your comments, how pretty much everything that was said is exactly the same experience we've had in five years with C3, apart from the price, you know. But no, I think it's reassuring; it's the same as the other discussions over the course of these last few days. It's very much the same stories, and they're probably the same in every industry."

Yes, exactly: you have to leave some room for improvement. All right, thank you so much, everyone, we are at time, and we have to go to the General Session area. Really appreciate your time. Thank you.
2023-09-10