How to build a Machine Learning strategy - Eyes on Enterprise
Stephanie: Hi everyone, welcome to Eyes on Enterprise, where I'll be bringing on Googlers to talk about how technology landscapes are changing and how enterprises adapt, modernize, and scale. My name is Stephanie Wong, developer advocate, and today I have Yufeng Guo, developer advocate for machine learning, on the show. Welcome to the show.

Yufeng: Thanks so much, Stephanie. I'm really excited to be here and chat about machine learning.

Stephanie: So first off, just thinking about some examples of it, do you have any that you've encountered of specific ML use cases that have come in and kind of swept a company off its feet a little bit?

Yufeng: Well, the first kind of big examples around these came from image recognition type use cases. On the one hand you have common examples like Google Photos, really changing how we think about storing and searching for pictures, and other kinds of recognition tasks where you can auto-tag images. And then of course medicine is now seeing a rapid shift in diagnosis in radiology, where they're applying this technology to find things more accurately, more quickly, and more consistently than doctors can, because machines don't fatigue and they're always consistent. They're not going to be upset because they were late for work and stuck in traffic earlier that morning.

Stephanie: Yeah. And one thing I was thinking about earlier was machine learning versus AI. I think there's a lot of confusion there about which term is technically accurate, so what's your opinion on that?

Yufeng: Classically, or traditionally, AI was referring to the broad field of just general artificial intelligence. It was about systems that were artificially intelligent, and anything really could apply to that. Back in, you
know, decades ago in the '80s, a lot of these were rule-based systems. So there were things that behaved like an intelligence would, at least how folks back then thought about it, but they were just rule-based systems: if statements and things like that. If this happens, then we'll do this. Not terribly complex, right? But there were lots of different approaches to doing AI, so AI is kind of the broad umbrella. Machine learning then becomes one set of tools to try to achieve AI, and it's possible that we'll ultimately achieve, or do better in, AI with some other set of tools that isn't even machine learning, or maybe evolves from machine learning. And so machine learning is really about some mixture of statistical inference and training on some set of data, creating a model, and doing these predictions.

Stephanie: Yeah, that's great. I think they're used interchangeably a lot of the time, but it's a great clarification to have because they really are quite different in what they encompass.

Yufeng: Absolutely.

Stephanie: All right, so let's talk about a typical use case for an enterprise, for a company that's just looking to get started and dip their toes into machine learning for the first time. What
would you describe as the workflow for them to get started?

Yufeng: Yeah. So, assuming that they have some kind of system they're trying to model, some kind of task they have in mind, the first consideration then becomes checking to make sure they have the data, and if not, collecting that data. For example, if they want to figure out consumer preferences, say they're a retail store, they'll collect data around purchasing patterns and things like that. Many retailers these days probably have that data already, sitting around somewhere. They've got to figure out how to get their hands on it, because it might be siloed off in different parts of the organization, and people might be holding on closely to it. So getting access to that in a way where you can aggregate data together is important. Then, once you have that data, you can do analysis on it, so that becomes the next piece: looking at what data you have. Is it useful? Are there signals in it? From there you can do the traditional machine learning training piece, where you take that data, train up a model, and then deploy it to make predictions. But we're not quite done yet. Just because you've created a model and put it in production doesn't mean that you can now forget about it and just let it sit there, just as you wouldn't leave production code out on your website or a system and say, "I'm done, we don't need software developers anymore, the software is done, it's just going to sit there." That's not going to work. And so models need to be updated over time as well, as purchasing patterns change with the seasons, with the years.
That information should feed back and inform updated models.

Stephanie: Right, so constant evaluation of the model, and then as things change you need to incorporate that into training.

Yufeng: Exactly. So in a lot of ways it's just like software development: you maintain your model over time.

Stephanie: So I want to dive into that a little bit more, because I know that you had a video that was very popular a couple of years ago on the seven steps to machine learning, and that included gathering your data, data preparation, choosing a model, training, evaluation, hyperparameter tuning, and then prediction as well. So can you tell me about each of the steps in a little bit of detail? Who do you hire for each of those, and what are the trade-offs?

Yufeng: Sure. So we have these seven steps, so to speak. You know, there's no real seven steps, six steps, ten steps; you can break it out however you'd like. It was just really one way to delineate it. A lot of times different steps can be collapsed into one job role, or could be one week's worth of work, where you gather data and you do that data preparation, cleaning, and things like that. Those might go together. You might end up making a pipeline where, as new data comes in and you know how you would like to transform it, you can set up an ongoing streaming job where that data coming in gets transformed and then deposited in some kind of data warehouse. Then, looking onwards to things like training and model development, as well as, as you alluded to, hyperparameter tuning, this is really the meat of it, where a lot of people spend a lot of their mental energy thinking about modeling. In a lot of ways, especially for known use cases and solved problems that aren't bleeding-edge research situations, you can use a lot of existing state-of-the-art models and adapt them for your particular data set, customize them, and get really pretty
much state-of-the-art results for your data set.

Stephanie: So I want to go back to a point that you mentioned earlier about model evaluation. Can you talk about why it's so important for an organization to consistently evaluate their model? Are there any use cases that showcase why it's so crucial, or failures that you've seen?

Yufeng: Yeah, so we can take kind of a toy example and work through it. The
data that you are predicting on needs to come from the same world, so to speak, that you've trained the model on. In statistical terms, this means your training, evaluation, and test data all need to come from the same distribution, essentially. And furthermore, that distribution also needs to reflect the reality of the world. So there are two pieces to it. For example, let's say we go and train a model that can recognize pictures of lawn furniture, so chairs. You're a chair company, you make lots of different kinds of chairs, so you're going to make a model. Maybe you make an app where people can hold up their camera to the chair and be like, "Yes, it's this kind of chair, now I can go buy one for my lawn as well." And so you set up a big studio and you bring in all your latest, hottest chairs for the next season, you put them on nice backdrops with good lighting, and you take a bunch of pictures and videos and do a whole shoot. Then you take that content, all those images, and you train a machine learning model that can recognize these images. This is great if the task you're trying to solve is a model that can recognize pictures of your products in, say, your catalog, because those are basically the same pictures, and they'll be in the same situations and lighting and all that. But it doesn't really help our actual use case, where the chairs are going to be in your customers' homes, with the backdrop of grass, the patio, and things like that. They'll be in the context of other things. So you've got to capture your data in the same world that you're going to be predicting on. And furthermore, and this is kind of the ongoing evaluation piece, what happens when winter comes and that green grass turns white, and your model's never seen snow before? It's going to start having problems. And so this
is why you need a diversity of training data, in all sorts of scenarios, that is representative of how you expect your model to be seeing these images during prediction time.

Stephanie: So hence constant evaluation. And then it's just as important to collect data points with labels that are the prediction you're looking for, but also examples where it's not that.

Yufeng: Absolutely, yeah. In this particular scenario it would also mean having images of these environments with your chair removed, so just a normal lawn. Maybe there are some leaves on the lawn.

Stephanie: So of course the question arises: what if the validation performance
of the model just isn't up to par?

Yufeng: Yeah, I mean, poor model performance can come from all sorts of factors. It includes things like the wrong modeling technique, maybe you don't have enough data, maybe you're overfitting on that data. There are lots of possible causes, and this is where expertise and experience really become valuable. It's easy to follow the golden path of just do this and do this and do this and it'll all work, but then when something goes wrong, fixing problems, that's where expertise comes through.

Stephanie: Yeah. And overfitting, can you just describe what that is?

Yufeng: So overfitting is essentially a way to describe the fact that the model has memorized the training data. This is bad because it is going to struggle with real predictions, because all it knows is the exact training data. So in an example with images, it's basically memorized these pictures and said, "I will only classify this as a lawn chair if I see this exact image, or these ten exact images."

Stephanie: So let's say I have my problem, I've collected the data and prepared it. What do I do to set it up, and what tooling exists out there? What's the general developer experience for machine learning?

Yufeng: Yeah, so for developing your own model, there are a lot of libraries out there today. Things like TensorFlow, being a prominent one in industry that's really seen a lot of adoption. Of course there are things like scikit-learn, which, as I mentioned earlier, is machine learning but not technically deep learning; it's deep learning adjacent, but also a machine learning toolkit. And of course there are also things like Keras and PyTorch and other deep learning tools that can also be employed to model a given data set. And then in terms of a language like R, which is more targeted toward statisticians, who have a lot more experience with that sort of thing, you can also use it for structured
data. Sometimes language models will show up there. But by and large, the industry has seen this real shift to using Python as the language of choice for doing machine learning.

Stephanie: So I want to talk about this by problem type, because as you said, a lot of industries have their own styles of data sets and they need to approach it in the right way. Can you talk about how you would do that?

Yufeng: Yeah, so I think the first question you have to ask yourself when you want to do some kind of machine learning or data science task is: is it a descriptive problem, is it a predictive problem, or is it a prescriptive problem? I know those are kind of long words and they all sound similar, so let's go into each of those a little bit. A descriptive problem is really your traditional, classic data analysis problem. A lot of times when people think they need a machine learning solution, really they're just asking, "Given some data, can you help me find some patterns in it? I don't really know what I'm looking for. Can you just describe some interesting things?" So dashboarding, putting up some nice visualizations, cleaning up the data so you can do that. That is in a lot of ways a good first step to even doing other types of machine learning tasks. So that's descriptive. And then there's predictive, which is much of what we classically think of as machine learning. You have your training data, there's something you're trying to predict, and then you're
trying to model it and make these predictions. So that's something where you have inputs in the real world and then you say, "I predict today this will be a good outcome or not," and things like that. Or, given a tweet, is it a positive tweet or a negative tweet? And then thirdly we have the prescriptive tasks, and these are a little more uncommon at this point, because this is where you're saying, "I want to build a system that will tell me what to do next." That has some element of, almost, if we're playing a game of chess or something: what is the next move I should make? This system should be thinking about what my opponent is playing and what options you have, and kind of gaming it out. So there's a little bit of game theory in there, things like that. This is why it's a little harder to do in the real world, because such a system would need so many inputs. You can't just model the entire known universe and be like, "Ah, the perfect move."

Stephanie: Exactly. All right, so I want to talk a little bit about processing, because a lot of companies are looking towards the cloud to do training and production. Is this the only viable option to really do it at that scale, or is there an opportunity for a hybrid approach?

Yufeng: When you look at it from the perspective of your data set, if you have a huge data set and you don't have any way of really storing it on-prem, and there's more data coming in all the time, there's only one answer, right? But in a lot of situations today, maybe you're only interested in doing machine learning on a certain slice of data, on a certain subset, and there it might make sense to do training locally and then deploy that to the cloud, alongside all of your other applications. There are a number of different ways to slice and dice the different workflows in the cloud, but it basically boils down to self-service and managed
services. A self-serve situation would be your classic: make a VM, configure it yourself, put whatever you want on it, and run your training. Maybe you put it in a Kubernetes cluster if you need more machines, things like that. On the other hand, you have managed services, which are more along the lines of putting up a Python package, or putting up your code, and then having this managed service take care of the provisioning, setup, and teardown of these machines after they've finished running your training job and output that exported model file. In terms of thinking about which one to do, I think the big one that I've seen come up is: why would you want to serve a machine learning model locally? That answer typically is along the lines of response time and latency. Sometimes you have a model that goes on a mobile device, and it needs to work either more quickly or in situations with no connectivity. So then you would shrink that model down, put it on your phone, and embed it into your application. On the other side of things, why would you keep it on a local server in your own data center? Maybe you
need to be in close proximity, close to wherever you're communicating with. Typically that shows up in financial situations, where you need to make millisecond decisions and these predictions are only good for the next slight window of time before the markets change.

Stephanie: Oh yeah, so it makes me think of the stock market, and then on the mobile side, IoT use cases.

Yufeng: Absolutely, yeah.

Stephanie: So one thing I've been curious about is how long training needs to take, and how many iterations a team needs to go through before achieving an optimal model that they can deploy in production.

Yufeng: Yeah, this is kind of the quintessential question about machine learning. You can train a small model on a moderate-sized data set in seconds, or a low number of minutes. But if you have a big data set and you have really high requirements for accuracy or performance, it might take weeks, even, to train up a model. Some of these huge research models that are really pushing the state of the art are literally training on hundreds of GPUs, pushing petaflops of power, and doing this for weeks on end. And then at the end you have a model and it's like, "Oh, I hope that works. I just burned all of this compute and time hoping to get something out of that." But yeah, it really comes down to what you are optimizing for. How much do you care about those accuracy metrics? In a situation where maybe you're making recommendations on a website, you're shopping and it says, "Hey, here are some other products that are similar to this product," if one of them is a little bit off or looks weird...

Stephanie: Yeah, who cares is the real question.

Yufeng: Exactly. But on the other hand, in situations like a self-driving car or a medical diagnosis, these are literal life-and-death decisions, and so having a model that is reliable, performant, and accurate is way, way more important, and the stakes are much higher. The
self-driving car use case is another one where latency matters, and so that's why you are not going to see cars signaling back up to the cloud, like, "Well, should I turn left?"

Stephanie: Right, you might run into a wall at that point.

Yufeng: Exactly.

Stephanie: So what are our options at Google to move from dev to prod and have it work reliably in production at that scale?

Yufeng: One of the services we have is the Cloud Platform's prediction service, a kind of managed service, and all you really need to do is take your exported model, say "here it is," give it a name, and you're basically done. It's pretty magical in that sense, because it's an autoscaling service, so you don't have to worry about provisioning infrastructure or responding to spikes in demand, maybe during the holidays, during weekends, things like that. It'll automatically scale up and then automatically scale back down when traffic dies off. All you have to do is finish training the model and kind of toss it over the wall, in some ways. I know it's said a lot that you don't want to just toss the model over the wall from your data science team to your productionization team, but if your model is being deployed in production by a service, and the data scientists can basically just do that themselves, it's like one command, then it kind of changes the game in terms of where you can spend your time. You can focus on getting good data and getting good training outputs, and then once that's done and you're ready to push it to production, it's not this onerous task that
you then have to get another team to handle and coordinate with the modeling team, and things like that.

Stephanie: Yeah, I'm seeing that as a commonality for cloud in general. We talked a little bit about managed services versus custom modeling, and I know we have varying options at Google as well. Can you dive into that?

Yufeng: Yeah, I mean, it all comes down to your comfort with building, running, and maintaining these systems yourself, and the expertise you have in-house, as well as whether you want this to be an investment in the long term, or whether it's something where you want it to just work and be good enough. Managed services, things like AutoML, are necessarily going to be a little bit more constrained, because you can't have them be fully customized. That's the nature, by definition, of this comparison.

Stephanie: I think there's a key point here: for those who don't necessarily have the right skill sets, or don't have the time, a great option is to utilize machine learning APIs. And then even for those who have experience with TensorFlow, and I know TensorFlow 2 is out now, we have libraries that people can leverage instead of reinventing the wheel and creating a model from the get-go.

Yufeng: Exactly.

Stephanie: Okay, so I want to talk about some of the amazing use cases, because I know you've worked with many companies in the past. So, industry use cases in the medical field or IoT that are really utilizing the advancements in machine learning and some of the Google tooling.

Yufeng: Two that come to mind: one is the type of advancements on the medical side. Diabetic retinopathy is a leading cause of blindness, and Google has made huge strides in improving the diagnosis of diabetic retinopathy, which helps prevent blindness as a side effect of diabetes. And so that
effort is ongoing. With the work they've done and the publications, for me it was the first time I'd seen a computer science paper get published in JAMA, the Journal of the American Medical Association. Like, when does computer science get to publish in a medical journal? So that was really cool. On the other use case, I got to work a little bit with Rainforest Connection. They're a nonprofit that helps prevent illegal logging in the Amazon, and they have these devices called Guardians out in the forest, and they're listening for the sound of trees being cut down. So they have all of this audio data coming in, and they need to process that and pick out and recognize those sounds, and then they can notify law enforcement, who can go out and prevent that.

Stephanie: That's really cool.

Yufeng: Yeah. And the other cool thing about it is that now they're taking that data and realizing what else is in this data set. They're hearing sounds of animals, endangered species, and they can start using that information to find movement patterns of animals throughout the jungle, which is just wild.

Stephanie: Yeah. Okay, so we just covered a lot of surface area when it comes to machine learning. Companies need to think about having enough compute and storage, the data sets, the right personnel, and then constantly reevaluating their models as new data comes in. So how
can you kind of digest all of this?

Yufeng: Yeah, it's a real culture shift in terms of making a company and business data-aware and using data in an intelligent fashion. It's not going to happen overnight, but I think getting started is really important. It's going to take time, but it will really transform how a business can take advantage of the information that it already has.

Stephanie: So with all that being said, what can people do to get value out of ML right now?

Yufeng: Yeah, I mean, it really depends on where you're at in terms of your experience and what you already know. You can think about it in terms of: do you want to use managed services and APIs, where you can get started just understanding and digesting some of these high-level concepts about machine learning, bringing data in and seeing what comes out the other side, so to speak? And if you already have some familiarity with that, then diving into the tools directly, whether you're using TensorFlow or Keras, and just trying it out. There are tons of amazing materials out there today in terms of tutorials, guides, workshops, and videos. Another area, if you're looking to just play around with ML, so to speak, at a conceptual level: Google has a set of AI Experiments which are just a ton of fun. You can try them out in your browser.

Stephanie: Also, you have your show, AI Adventures, so I'm encouraging everyone to check that series out, because he's going to go into a lot more detail on some of these topics. Overall, please check out the blogs and the videos. We're going to have the links to what he just mentioned in the description below. I want to thank you so much for being on the show today. I think I learned a ton.

Yufeng: Thanks so much, Stephanie. It's been a blast.

Stephanie: And also, please comment with your thoughts on the show, what ML projects you're working on, the tools you're using, and your thoughts about what we discussed today. Thanks
again for checking out Eyes on Enterprise.