Accelerating Data Science with the C3 AI Platform


...accelerating data science on the C3 AI platform, and how we achieve that. My name is Mehdi Maasoumi, and I'm joined by some of my colleagues here today; I'd ask them to stand up and show their faces so you can work with them and ask them questions if you like. These are some of the data science members at C3, along with our ML/AI product team. Please feel free to catch us for the remainder of Transform; we'll be happy to work with you on any questions you might have.

I would also like some brave volunteers to introduce themselves, to get a sense of who is in the session. If you are a customer or a prospect, could you please say your name, the organization you're from, your role in the organization, and whether you're an existing customer or a prospect? Anyone? Yes, please.

"I'm going to fit into a category now. I'm Steve Meyer, one of the co-founders of KUNGFU.AI, a prospective partner, for services extending the C3 AI platform and also technology. We have a lot of things in common with a lot of your customers, so excited to be here."

Super, thank you, Steve. Anyone else?

"...back in 2018, that's right."

Anybody else who wants to? Yes, please.

"We've been working with C3 since 2020."

That's right, fantastic, thank you. Maybe one more? Anyone else? Yes, please, back there. Fantastic, thank you. And thank you, Simon.

So, for the folks who were brave, I have one more question, especially for prospects of the C3 platform. I would like to understand the data science practices in your organization, the success stories from your team, and especially the pain points that you might be solving. Why are you here? There must be some pain points you're facing today, either in terms of model experimentation, putting models in production, or scaling out AI applications. Anything on the top of your mind that you want to share with us: what brings you here, and what are your pain points? Yes?

"[inaudible] ...decision-making capacity for the organization, and what role AI can play. We are very clearly a dashboarding, BI kind of company today. So how do you go much further in terms of predictions, in terms of the ability to optimize in real time, in terms of really bringing that experience to the decision makers, with much better quality and much more uniform quality of decision making across the organization? That's the rationale for me being here, and those are probably the pain points, and they can be anywhere, whether supply chain or manufacturing."

All right, thank you. Jim Snabe also mentioned that, right: not only in the boardroom do you want to look ahead, but across the rest of the organization. Any other prospects having pain points in scaling out their AI applications who want to talk about that? Has testing of your whole ML system been a problem? Does data drift ring a bell for you guys at all? Model management? It does for some people.

All right, so we're going to be talking about those. Just to set the stage, and in order to get there, let's look back at the last 40 years of information technology and computer engineering in general. There have been these defining moments, defining milestones that have been achieved: from the x86 instruction set architecture back in the 80s, to the introduction of Java, to the advent of Hadoop and Mesos (essentially a precursor to Kubernetes), and most recently Spark. What's the common theme here? Just asking the crowd:
What do you see as a common theme here? "Change is constant"? Definitely, yes. What else? "Abstraction"? Wow, brilliant, spot on: abstractions.

Abstractions are very important. Why? Back in the 80s, the x86 instruction set architecture was an abstraction for machine code. Java is an abstraction for interoperability across different operating systems. Hadoop is an abstraction for distributed file-based processing. Mesos is an abstraction for workload management. Spark is an abstraction for in-memory compute. And we will probably need more abstractions going into the future.

But why abstraction? What's the point? Abstractions allow for separation of responsibility, so that you can innovate independently in certain areas. Abstract the complexity away so you don't have to deal with it as you progress on the journey of computer engineering history. So that's the point. And what's the benefit of abstractions? What do we get from introducing them? "Increased productivity"? Brilliant; wow, we have a good audience here. Increased productivity is a result of abstractions, and this is indeed what we have shown in the past. You should think of C3 AI as an abstraction, and I will go into the details of why, and of how much increase in productivity we get as we sit on top of the infrastructure-as-a-service cloud providers: we've shown between an 18x and a 26x improvement in productivity.

Can you imagine having to train a large language model today using punch cards, and how difficult that would be? Why can we do it? Thanks to abstractions. Abstractions are indeed the enablers of innovation in computing history; the reason we can have ChatGPT today is due to abstractions.

You have all been in the general session area; you've seen this huge banner: Enterprise AI. Have you wondered what Enterprise AI is? What do we mean by Enterprise AI? Let me take you through a journey to introduce it, and I'll use a video game analogy to draw parallels between the video games that some of us have played, and still play (we won't say who), and Enterprise AI.

In the video game journey, level one is easy: it's teaching you the rules of the game, how to play, and you're learning. You get quick wins and you go to the next level. You know there's a next level, and you know the next level is going to be harder. You're learning as you go: new tactics, new strategies. You're expected to fail at each level before you go to the next one; you fail so many times, and you learn, and you advance. This is all familiar to us.

The other thing specific to video games is that you can't see what's next until you actually get there. When you're in level one, you don't know what level two looks like; when you're in level two, you don't know what levels three, four, or five look like; sometimes you don't even know how many levels there are. And with each new level, the strategies that worked before stop working: whatever strategy got you to level three will likely not take you to level four. You have to learn new things, you have to adapt, you have to learn not only from past mistakes but also adopt new strategies to win the game. So with that, let's look at the Enterprise AI journey.
Just like video games, we have levels. Level one: you train a model. Easy; we're learning the rules of the game, quick wins, life is good. But we know there are levels ahead of us, multiple levels. The next level could be: now that you've trained the model, deploy it and implement batch inference; use the model's predictions on top of a Tableau dashboard and show them to the end user. At the level after that, you may start to have multiple models, and you may need to containerize those models in order to manage the runtime for different models; you may be required to provide them as REST APIs; and version control and explainability might be needed by the end users.

There are levels five, six, and seven, and the question always is: is there another level? At level five you might have on-demand inference, with MLOps and ModelOps as a requirement, and drift detection. At level six you might have to support shadow models, champion/challenger setups, and A/B testing. This is really getting hard now, right? We are playing the video game at level six, and we have to design a system that works with all these requirements; you may have dozens of models in production. Do you think there's a next level? I think we are pretty advanced at this point, right? But remember, in the video game we always have a blind spot: we don't know what the next level is. Do you think we have a blind spot here? Probably there is one more level: seven, maybe another level eight, nine, ten. It's getting difficult.

At level seven, whatever you designed your system around up to level six will probably not be good enough. You now have model lineage; you have to support model risk management; you may need to scale to hundreds of models in production. If we thought one model in production was difficult, how about a hundred models? At level eight you may want to introduce model training templates, a whole new concept you didn't even know existed, and model monitoring and retraining within the application: imagine an AI application where the end user can go to the UI and train a model. Generative AI comes in right at level eight as well, with users wanting generative AI as part of the UI. And at level ten you may have hundreds of ML/AI applications: we go from hundreds of ML models to hundreds of ML/AI applications in production. Imagine that complexity; you may have an industrialized methodology for AI and ML, and tens of thousands of models. We have customers playing at level five, we have customers playing at level eight, and we have customers playing at level ten.
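For those who said data drift rings a bell: the platform's own drift-detection machinery isn't shown in this talk, but a minimal sketch of the underlying idea, comparing a live feature's distribution against its training-time baseline with a two-sample Kolmogorov-Smirnov test, might look like the following. The synthetic data and the 0.01 threshold are illustrative assumptions, not C3 AI's implementation.

```python
# Minimal data-drift check (illustrative sketch, not the C3 AI API):
# flag a feature whose live distribution has shifted away from the
# distribution it was trained on, using a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training-time feature values
live = rng.normal(loc=0.4, scale=1.0, size=2_000)       # recent production values (shifted)

statistic, p_value = ks_2samp(baseline, live)
ALPHA = 0.01  # assumed significance threshold; tuned per feature in practice
if p_value < ALPHA:
    print(f"Drift detected (KS={statistic:.3f}, p={p_value:.2e}); consider retraining.")
else:
    print(f"No significant drift (KS={statistic:.3f}, p={p_value:.2e}).")
```

In production, a check like this would run per feature on a schedule, with failures routed to retraining; that scheduling and routing is exactly the kind of plumbing the higher levels above are about.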

Guess who is playing at level ten? With Shell, we announced (this article is a bit old, from 2022) 20 billion rows of data being processed weekly, three million sensors, and, as of that time, 11,000 machine learning models in production. Now it's more; Richard can tell us what the number is: 16,000. So in the last year, and this is very interesting, we've added 5,000 new models in production. The article is from March 8th, and what is today? Today is exactly a year later. So: 5,000 models trained and put into production within a year. We are playing at different levels.

Let's talk about two foundational papers which I personally found very interesting and really encourage you all to read if you haven't so far. They both happen to be out of Google ("Hidden Technical Debt in Machine Learning Systems" and "The ML Test Score"), and they make very interesting assertions about the requirements of ML/AI architectures and systems. The first assertion: only a tiny fraction of the code in many ML systems is actually devoted to learning or prediction; much of the remainder may be described as plumbing. Sounds familiar? Ninety-five percent of the code we write for an ML system is often there to handle monitoring, feature extraction, configuration, and data collection, and only five percent is truly devoted to the learning piece, which is the core capability of our systems.

Assertion two: machine learning software systems are fundamentally different from traditional software systems. You may wonder how. In a traditional software system, you have code and you have a system that you run the code on; to test the whole thing, you need unit tests for your code, monitoring for the system that runs the code, and integration tests. But look at what ML systems add as new components: data, and model training. As a result of introducing these two, we have to do far more testing. On top of unit, functional, and integration testing, we need data tests, skew tests, data monitoring, ML infrastructure tests, model tests, and production monitoring, all in addition to the tests in a traditional system. This speaks to the complexity of ML/AI systems and of Enterprise AI.
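To make that extra test surface concrete, here is a minimal sketch of a data test and a training/serving skew test in plain Python with pandas. The schema, the value ranges, and the 10% tolerance are illustrative assumptions, not anything prescribed by the papers or by C3 AI.

```python
# Sketch of the extra tests an ML system needs beyond code unit tests:
# a data test (schema, ranges, nulls) and a training/serving skew test.
import pandas as pd

EXPECTED_SCHEMA = {"sensor_id": "object", "temperature_c": "float64"}

def check_data(df: pd.DataFrame) -> None:
    # Data test: expected columns and dtypes, plausible ranges, no null keys.
    for col, dtype in EXPECTED_SCHEMA.items():
        assert col in df.columns and str(df[col].dtype) == dtype, f"bad schema: {col}"
    assert df["temperature_c"].between(-50, 150).all(), "temperature out of range"
    assert df["sensor_id"].notna().all(), "null sensor ids"

def check_skew(train: pd.Series, serve: pd.Series, tol: float = 0.10) -> None:
    # Skew test: the serving-time feature mean stays close to training time.
    rel = abs(serve.mean() - train.mean()) / (abs(train.mean()) + 1e-9)
    assert rel < tol, f"serving mean deviates {rel:.1%} from training mean"

if __name__ == "__main__":
    train = pd.DataFrame({"sensor_id": ["a", "b"], "temperature_c": [20.0, 25.0]})
    serve = pd.DataFrame({"sensor_id": ["c", "d"], "temperature_c": [21.0, 26.0]})
    check_data(train)
    check_data(serve)
    check_skew(train["temperature_c"], serve["temperature_c"])
    print("data and skew tests passed")
```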
One last assertion: consider ML training as analogous to compilation, where the source is both code and training data. By that analogy, training data needs testing like code, and a trained model needs production practices like a binary does, such as debuggability, rollbacks, and monitoring. These all speak to how the beast we're working with is something else, and from that comes the need for a new abstraction layer. We believe C3 AI is that abstraction layer, the one needed for designing, developing, deploying, and operating large-scale AI applications.

Let's look at the current approach in most of today's enterprises. We all have clouds; we are all working on one or more of Google, Amazon, and Azure, or hybrid clouds. And they're great, we love them. Without clouds we wouldn't be here; without the cloud, C3 doesn't exist, and a lot of customers couldn't do their jobs. But let's walk through what happens when you start running experiments, running projects, and writing applications directly on these clouds. For each application, you start with a notebook for data pre-processing: you load your data, you transform it into a better version, you dump it for future use. Someone else picks that data up in another notebook and does feature engineering; another person, or you yourself later, has another notebook for EDA; you have another notebook for model training. How do we test each step along the way? How do we test the different features in the model, following best practices as established in software engineering? Even for one application, we end up with notebook_final, notebook_final_final, notebook_FINAL_final, notebook_final_vF. We've all been there. As you can see, this is not the best way to build robust Enterprise AI applications.

So what do we do? C3 comes in, and we sit on top of the clouds. We provide these abstractions: an abstraction for data unification, an abstraction for AI and machine learning, abstractions for continuous data processing and for elastic compute. The result is a set of either C3 AI applications, your custom AI apps, or data science products built on top of this capability. But how do we do that? How do we avoid ending up with that spaghetti of notebooks and models?

C3's approach has been completely different. We provide data pipelines that are reusable; feature engineering and feature pipelines are metadata-driven, reusable, persisted, and searchable. Every component along the way, from data transforms to feature engineering to model training and model deployment, is a metadata-driven type that is designed, documented, and tested, and essentially upserted, so that you can collaborate on those artifacts with colleagues within your team and across teams. So: full reusability, and a cohesive experience provided by this architecture. Version control. Composability: the composability of each of these components, from a feature to an ML pipeline to the pre- and post-processing of pipelines, is a big enabler of increased efficiency. And logging, monitoring, and debugging capabilities are super important: remember that in these new ML systems we don't only have unit tests for code, we have tests for data and tests for ML infrastructure, so debugging needs to be easy, debugging and monitoring and logging need to be there, along with all the testing we talked about.

So let's compare the world with and without C3: the current state of things versus the future state. Today, your data engineering and data science teams spend the majority of their time on data preparation and feature engineering, and on model governance, scheduling, and app integration, while the least amount of time is spent where most of the value is: model experimentation and model development. With C3 AI, data scientists spend more time on the value-add activities, that middle part, and less time on tedious tasks such as data wrangling and scheduling. Why? Thanks to abstractions, those tasks have been abstracted away and simplified. C3 AI provides out-of-the-box capabilities that data science teams essentially design manually today, such as scheduling and API endpoints. We unblock the promotion of thousands of models with robust model operations and governance, a centralized registry, and pre-packaged ModelOps for users. These are the ways we let users spend more and more of their time on model experimentation. C3 AI enhances your cloud platform's data stores by providing data unification and virtualization, and we enable artifact testing and reusability, as we discussed.

We can go into the details of each step by which we enable this reduction and reallocation of the time spent on tasks; I won't go through every single line here, but the idea is this. On the data modeling part, if a C3 type is remapped to a different origin source, all the downstream analytics and applications are updated automatically, and there is a single fetch command no matter where or in which technologies your data is persisted, thanks to that abstraction layer. In terms of feature engineering, which is one of my favorite parts, the expression engine library has very rich out-of-the-box capabilities for outlier removal, interpolation, and other operations, including logical operations, and all features are automatically stored as searchable metadata. You might be familiar with the concept of a feature store, where you store a version of the data; here we go one step further back and store the definition, the metadata associated with each feature. What are the benefits? Features are designed as objects that are composable: you can test them, you can persist them, your colleagues can search for and access them and compose them into more complicated features, which opens the way for collaboration.
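The expression engine and the C3 type system are proprietary, so the following is only a hypothetical sketch of the idea just described: storing the definition of a feature as a searchable, composable, testable object rather than as a materialized table. Every class, field, and feature name here is invented for illustration.

```python
# Hypothetical sketch of metadata-driven feature definitions (invented
# names, not the C3 AI expression engine): a feature is a stored,
# searchable definition that can be composed and evaluated on demand.
from dataclasses import dataclass, field
from typing import Callable
import pandas as pd

@dataclass
class Feature:
    name: str
    description: str
    expr: Callable[[pd.DataFrame], pd.Series]  # the definition, not the data
    tags: set = field(default_factory=set)

REGISTRY: dict[str, Feature] = {}  # stand-in for a searchable metadata store

def register(f: Feature) -> Feature:
    REGISTRY[f.name] = f  # upsert the definition so colleagues can find it
    return f

temp_clean = register(Feature(
    name="temp_clean",
    description="temperature with outliers clipped to the 1st-99th percentile",
    expr=lambda df: df["temperature_c"].clip(
        df["temperature_c"].quantile(0.01), df["temperature_c"].quantile(0.99)),
    tags={"sensor", "cleaned"},
))

def _zscore(df: pd.DataFrame) -> pd.Series:
    t = temp_clean.expr(df)  # composition: built from an existing definition
    return (t - t.mean()) / t.std()

temp_anomaly = register(Feature("temp_anomaly", "z-score of cleaned temperature",
                                _zscore, {"sensor", "derived"}))

df = pd.DataFrame({"temperature_c": [20.0, 21.0, 22.0, 95.0]})
print(temp_anomaly.expr(df))                                      # evaluated on demand
print([f.name for f in REGISTRY.values() if "sensor" in f.tags])  # searched by tag
```

Because only the definition is stored, a colleague can discover temp_clean by tag, compose it into a new feature, and evaluate either one against any dataset on demand; materialization becomes a caching decision rather than the unit of sharing.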
On the model training side, we can recompose existing models as multi-step ML pipelines, which again are objects persisted in the database: versioned, metadata-driven, and accessible to your colleagues, who can work with them, enabling innovation and a collaborative culture. We can quickly define model experimentation setups and train large numbers of models using the model deployment framework; model explainability is provided off the shelf on these ML pipelines, and trained models are automatically registered for model management. These are the ways we enable the increased productivity that our colleague here mentioned, thanks to the abstractions.

The last chunk of the work is scheduling, monitoring, and application integration, where today data teams, data scientists, and data engineers spend a lot of time; we again abstract away a lot of the complexity there. Jobs are backed by easily configurable resources that we manage via a declarative language, and job history and statuses are logged for easy monitoring. Model deployment segments allow deploying and monitoring models against the full population, a segment of the population, or per asset. These are some examples of how that increased efficiency and productivity is gained via the abstractions of the C3 platform.
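The C3 types themselves are proprietary, so as an analogy for what a recomposable, versioned, multi-step ML pipeline registered for model management might look like, here is a sketch using scikit-learn's real Pipeline API plus an invented in-memory registry; the valve_failure model name and the registry are assumptions for illustration.

```python
# Illustrative sketch (not the C3 AI types): a multi-step ML pipeline as
# a composable object, registered under a versioned key for reuse.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

MODEL_REGISTRY: dict[str, Pipeline] = {}  # stand-in for a central model registry

def register_model(name: str, version: int, pipeline: Pipeline) -> str:
    key = f"{name}:v{version}"  # versioned, searchable identifier
    MODEL_REGISTRY[key] = pipeline
    return key

# Pre-processing and the estimator are steps of one persisted pipeline object.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
pipe.fit(X, y)
key = register_model("valve_failure", version=1, pipeline=pipe)

# A colleague retrieves the same versioned artifact and scores with it.
reloaded = MODEL_REGISTRY[key]
print(key, "accuracy:", round(reloaded.score(X, y), 3))
```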
So now I would like to open it up for some discussion: any questions you might have; any best practices that C3 customers may want to share with the crowd about how they're using these capabilities to achieve better efficiency and productivity in their teams; the composition of the CoE teams that have been set up; and anything else you think might be good for the rest of the crowd to know. Existing C3 customers in the room, if you would like to share, please feel free to enlighten us. Richard?

"So, we've been running for three years [inaudible]. We've got control valves, sixty thousand pieces of equipment, across multiple applications including optimization and integrity and reliability. What we found when we initially started building up the capability in the CoEs was that we started off with very highly qualified data scientists, and maybe one of them was actually from C3. Building up the skills and knowledge on the platform, on C3, is an arduous task in itself; it's very foreign. So we found it worked to complement the development teams: we created pods, generally made up of around two data scientists and three or four developers for each application stood up in development, to build that deep understanding of the platform. Thankfully, I think through our experience, you've now got a professional set of documentation, you've got a more performant platform, you've got integration with Visual Studio; these things didn't exist in the beginning, so it really was a long journey. Keeping that complement matters, and so does ensuring that the data engineering piece is done well, because without that everything else is secondary; the data scientists aren't always the unicorns they think they are. We also worked pretty much with a DevOps methodology, because at the end of the day these apps have to be supported, and when something goes wrong the question is: is it data availability, data corruption, issues on the platform, model issues? A lot of problems. We've abstracted the complexity of the data science away from the application, so our remote operations teams can actually create models, and we find this has enabled us to accelerate our own growth."

Thank you, Richard. This goes back to 2018, with that version of C3; the latest versions are probably even better. The way we set up the CoE in the beginning: I think there were four or five C3 employees and about ten to twelve Shell data scientists and engineers across two products, and at the same time we were working on two applications. So in the first three to four months we not only trained all the Shell developers and data scientists, but also essentially built from scratch the two applications that Shell wanted. For me it was a very fulfilling experience, and we hope to do that with many more customers. Anybody else who wants to share some insights, or any question from the crowd? Yes, please, of course.

"On acceleration, I would say the beginning was a lot of ups and downs; we ran into a lot of problems [inaudible]. There have been a lot of changes on the platform. Very important: documentation; there was no source forum you could go to and look things up. We built experience as we developed the applications and built up the codebase and knowledge in-house, slowly expanding, and we're now working with external partners to build up capability with contract resources as well. We've got several products in total in production, besides the items we investigated through a proof-of-concept or accelerator approach that we shelved or failed.
There are another five or six in development at the moment, so it has gotten better and faster. We've also got a much wider internal Shell codebase to refer to. We continue to use React as our front end, with a lot more integration of other systems as we progress. And there's now a bigger support community in total as well, so we're not starting from zero. We were something like two months into that, just after we'd finished training the first wave of people and starting development, when we picked up the next piece of work, and since then we've gone into a number of other application areas.

On the data virtualization piece: that was a problem we had in the early days. Our historian data comes from OSIsoft, which is a proprietary, closed data source, so we had to work out how to bring it together; it was a separate piece of the program that we ran in parallel at the same time. Now we have that flowing through as live streams; we don't only use that data as before. That in itself was a job [inaudible], so we can contribute as well, rather than just take.

In terms of best practice, speaking of abstraction: it's amazing, listening to that panel earlier and to your comments, how pretty much everything that was said matches exactly our experience over the five years, apart from the specifics [inaudible]. Across the discussions over the course of these last few days it's very much the same stories; they're probably the same in every industry."

"[crosstalk] You have to leave some room for improvement, you know."

All right, thank you so much, everyone; we are at time. We have to go to the General Session area. Really appreciate your time. Thank you.
