Tackling high-value business problems using AutoML on structured data
Hello, Singapore, and thanks for staying with us today. I hope you all have had a great, fun and informative day. As the last session, I have a great session for you to finish and wrap up our day. To touch on many of the points that George covered earlier, I'm going to go a bit deeper into our new solution, Cloud AutoML Tables. I'm very excited to present it; it's a new solution we launched this year, and I'm happy to share it with you all. My name is Miko. I'm a cloud customer engineer here in Singapore.

Before I go into the product details, let me give you a bit of a backdrop: what is AutoML? AutoML came out of the image recognition research that has been going on in the AI industry over the last 10 years or so. Coming from things like AlexNet, GoogLeNet and ResNet, going from eight layers of neural network to now a hundred and sixty layers with ResNet, it became clear very quickly that creating these kinds of complex models for complex datasets is very time consuming, and sometimes it is not a very feasible job for people, even AI scientists, to know what kind of model works best with what data.

Out of this research, Google Research published a paper called AutoML, which presents a model that can be applied to machine learning itself, where machine learning creates the machine learning model. You don't have to manually bring different layers and different filters together to decide what the best model is; instead we have a concept, or a platform, that builds that kind of model specifically for your custom dataset.

Over the last few years we launched a number of AutoML solutions for unstructured data, some of the things that George mentioned, and now this year we launched AutoML for structured data. What do I mean by structured data? Stuff that lives in databases: your SQL databases, where you have columns, rows, all that. And we can then empower
AutoML to do predictions and build machine learning models directly on this kind of table structure. In a table you have the columns, and you need to choose one column: this is the one I want to predict.

In a study on this concept, it was found that structured data is obviously where all the enterprise value that
can be harnessed by AI lives. So structured data is what we are talking about today, as some of the unstructured stuff is already available through Google AI solutions.

Machine learning: you have probably heard lots about that today. There are so many applications in every industry, whether you're talking about customer understanding, customer conversion or churn; looking at the financial sector, whether it's risk analysis or credit scoring; or going into industry, looking at applications like predictive maintenance or analysis for IoT devices. The numbers are there, and we are happy to talk to all of you about which particular ones we are seeing in your industry, so talk to us and we are happy to share.

Going back to machine learning and the main challenge: if you look at this slide, it's probably what your data scientist thinks about day in, day out. If you have a machine learning or AI challenge and you want to start a new project, you need to go through at least these six stages: data preparation, feature engineering, choosing the right model (and more often than not it's not choosing the right model but trying multiple models), then hyperparameter tuning, evaluating the model, and finally deploying that model into something that can be utilized by a business application or otherwise.

More often than not this is a continuous cycle: if you don't get the accuracy that you need, you need to start from the beginning and go again. The challenge we see is that machine learning projects, while they are all cool, can take weeks and months to complete. I had one customer saying years, because maybe the accuracy isn't there and you haven't figured out the right model yet. So it's all very time consuming, and we know data scientists are a rare breed to come by and their time is very precious.
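To make the "trying multiple models" stage of that cycle concrete, here is a minimal Python sketch of what a data scientist repeats by hand: fit several candidate models, score each on held-out data, keep the best. The two toy models, the function names and the numbers are illustrative assumptions, not anything shown in the talk or used by AutoML Tables.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Second candidate: least-squares line y = a*x + b on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def select_model(candidates, train, valid):
    """Fit every candidate on train, score it on valid, return the best name and all scores."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up toy data: (features, targets) for training and validation.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])
best, scores = select_model({"mean": fit_mean, "linear": fit_linear}, train, valid)
print(best, scores)
```

In a real project each pass through this loop also involves feature engineering and hyperparameter tuning, which is why the cycle can stretch to weeks or months.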
Introducing AutoML Tables. All that cycle of six steps we can do in one hour. We basically automate everything: we can take the data in and have a model built, customized for your dataset. Going from the left: you upload the data; we give you some feedback on how we see the data; we launch the training service that does everything spoken of in the earlier steps, whether it's feature selection, model selection, hyperparameter tuning or evaluating the model; and then we give you a model that you can either have automatically deployed on Google Cloud, or take home as a container and run within any of your other workloads.

AutoML Tables is integrated with BigQuery, and it speaks all of the data types that BigQuery does. Out of the box it handles all the typical data types, whether it's numbers, text or strings. So you can have a whole host of strings there, and you don't need to figure out how to start feature engineering an open-ended sentence, which could be, for example, a product description; the system will think about that for you. It also supports the more advanced data types supported in BigQuery, like nested fields or lists.

Data scientists also need to deal with data quality: how do you make sure that the data is as clean as it needs to be? We also have guardrails in the system to make sure that the model you get out still manages things like missing data or imbalanced tables, manages to ignore things like outliers, and that the model overall performs well.

And the cool thing about AutoML is that a big
part of that machine learning model development cycle is trying to understand what model is the most relevant for your data. If you have worked on a machine learning project, you may know that this selection is sometimes off the cuff, or you may have a good idea of what it may be, but you don't really have confirmation that this is the best model until you have tried something else, and something else, and something else. So what we did: we have a model zoo, if you will, that includes all of the typical models that you may know, anything from linear regression to neural networks, deep and wide neural networks, gradient boosting, all the typical ones. On top of that we do ensembles of these networks as well, so we combine multiple network types into one bigger network to provide you a very tailored model for more complex datasets.

Google also works with our research team, Brain we call them, which is active in developing machine learning and sharing its findings with the industry in general. So we also incorporate more complex models that come out of that research. If you know, for example, the Transformer for machine translation, which has been very popular as a model: we bring the models that come out of that research into Cloud AutoML Tables and have those available for you.

So that's all good, but the most critical thing we think about in how we design this product is having the best model quality. So how do we compare against other solutions out there? It may be a bit small to read, but the AutoML solution is the blue bar, and the top is the best solution available. AutoML typically scores within the top 25 percent, against models handcrafted by data scientists, typically over projects of weeks or months. We are using data from a machine learning competition website for data scientists called Kaggle, which you
may know if you are active in the field, where data scientists compete against each other on public datasets to see who can build the best model for a dataset. We then pitted AutoML against the data scientists in a few popular competitions.

One of these was a price suggestion challenge. We have a dataset where, for example, users upload a product to be sold on an e-commerce website, and the system needs to recommend a price for it. It has input data, for example the product description or the category, and it needs to output a suggested value.

Looking at this graph, it's basically all of the thousand-plus people who competed in this particular competition, ordered from left to right, so right is the best: the lowest score wins. We can see that typically there's 50% of the data scientists who are probably still picking up skills, and there is a good learning curve happening there; then the top 50% are already very seasoned and have fairly similar performance; and there are a few winning entries who probably spent a lot of time creating their model. Actually, this competition was won by a person who submitted 99 different versions and probably spent months doing that.

It's a bit small, but we ran AutoML on this dataset, and after one hour the model coming out of AutoML was already better than 60% of the competition. Then we let it run a bit more, and after 24 hours we had AutoML scoring in the top 25 percent. So, with almost no time spent on handcrafting models, we can now have machine learning models available for structured data in less than a day.

The slide says save money; I like to think it's save time, but time is money. So AutoML Tables
is great because it gives you, your developers and your whole team velocity. You can move faster and have projects turn around quicker with meaningful results. And here are some of the customers that we worked with during the beta phase, and we are hoping to have all of you here next year.

A few words to note. AutoML is great, but you need to have data in a dataset. We are talking about supervised learning, so you need to have a fairly clean bit of data. If you have concerns about data cleansing or preparation, we do have complementary solutions that can work with AutoML Tables. And you need to give it at least one hour of training time. That's pretty much the starting point. The data size, I think, we have relaxed: we now work with anything from a thousand rows of data to a hundred million rows of data. Obviously we always like to say more data is better, but AutoML can pretty much scale from small to fairly large.
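The prerequisites just listed (labeled data for supervised learning, and between a thousand and a hundred million rows) can be checked before uploading anything. The sketch below is a hypothetical pre-flight check written with the Python standard library; the function name `check_dataset`, the column names and the sample data are assumptions for illustration, and only the two row limits come from the talk.

```python
import csv
import io

MIN_ROWS = 1_000          # lower bound quoted in the talk
MAX_ROWS = 100_000_000    # upper bound quoted in the talk

def check_dataset(csv_file, target_column):
    """Return a list of problems found; an empty list means the basic checks pass."""
    reader = csv.DictReader(csv_file)
    if target_column not in (reader.fieldnames or []):
        return [f"target column {target_column!r} not found"]
    rows = unlabeled = 0
    for row in reader:
        rows += 1
        # Supervised learning needs a label on every training row.
        if not (row.get(target_column) or "").strip():
            unlabeled += 1
    problems = []
    if rows < MIN_ROWS:
        problems.append(f"only {rows} rows; fewer than {MIN_ROWS}")
    if rows > MAX_ROWS:
        problems.append(f"{rows} rows; more than {MAX_ROWS}")
    if unlabeled:
        problems.append(f"{unlabeled} rows have no value in {target_column!r}")
    return problems

# Made-up sample: 1,000 rows with an 'age' feature and a 'deposit' label.
lines = ["age,deposit"] + [
    f"{20 + i % 50},{'yes' if i % 20 == 0 else 'no'}" for i in range(1000)
]
print(check_dataset(io.StringIO("\n".join(lines)), "deposit"))
```

A check like this only covers the size and label requirements; data cleansing itself is a separate step, as noted above.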
So, just to make things more real, I'm going to quickly show you how AutoML Tables looks from a user's point of view in the cloud console, if I can have my console here. Those of you familiar with the console can find AutoML Tables in the Artificial Intelligence section, and it's called Tables.

There are pretty much six steps to it. We have 20 minutes of time and I'm only going to take ten; it's so fast you can really go through all of this, having a machine learning model in production in ten minutes or so, excluding the training time.

As I mentioned, we integrate firstly with BigQuery, so if you are using BigQuery, you can today just select which table you want to analyze and off you go. If you don't have BigQuery, we do have an option for you to upload a CSV: from any database you have, you just export a CSV and load it via Cloud Storage, or if you have a smaller file you can even upload it from your PC.

Once the data is uploaded, we confirm with you that this is the data we read in and the schema we understood. You can tweak it a bit; most of the time there is no need. You can say, for example, how to treat some of the low-cardinality numbers, whether those should be treated as numeric or categorical. Or, if you are unsure that you will have all the data available at prediction time, you can also make things nullable, so that you can still make predictions even if not all the input parameters are available.

However, there is one thing you need to choose on this page, which is the target column: you choose what you want to predict. In this case we've loaded a public financial marketing dataset, which looks at whether a user, given their profile, would make a fixed deposit if we targeted them in a marketing campaign. We try to understand which profiles would respond by making a deposit, so that we can run a more targeted campaign.
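The schema choices mentioned here (numeric versus categorical for low-cardinality numbers, and nullable columns) can be illustrated with a small heuristic. This is a sketch under assumptions: the function `suggest_column_type`, the threshold of 10 distinct values and the sample columns are made up, and the rules AutoML Tables actually applies are not described in the talk.

```python
def suggest_column_type(values, max_categories=10):
    """Suggest a (type, nullable) pair for one column of raw string values."""
    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    present = [v for v in values if v not in (None, "")]
    nullable = len(present) < len(values)   # any missing value makes it nullable
    distinct = len(set(present))
    if present and all(is_number(v) for v in present) and distinct > max_categories:
        kind = "numeric"       # many distinct numbers: treat as a quantity
    elif distinct <= max_categories:
        kind = "categorical"   # few distinct values, numeric or not
    else:
        kind = "text"          # many distinct non-numeric strings
    return kind, nullable

ages = [str(18 + i) for i in range(60)]     # made-up high-cardinality numbers
children = ["0", "1", "2", "", "1", "0"]    # made-up low-cardinality numbers, one missing
print(suggest_column_type(ages))
print(suggest_column_type(children))
```

A column such as "number of children" is numeric on paper but has so few distinct values that treating it as categorical can be the better choice, which is the kind of tweak the schema page allows.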
With the schema set, we select the column and we are ready to go into Analyze. Analyze then gives you feedback on the data that was read in. You can see, for example, whether there are any missing values; you may want to check whether the data you loaded was correct or whether you have more correct data available. There is cardinality, and correlation: there's a simple algorithm that provides the correlation with the target column, to understand which features are most related before we actually go into the training phase.

You have simple tooling also to confirm that, for example, the data distribution is as you expect it to be. So before you hit the training button, it may be useful just to take a quick peek: that, for example, you have the whole year's data, and that you haven't just uploaded, say, one month's data when you wanted to train on the whole year's data. These are really just tools to help you confirm that what you uploaded is what we have received at our end.

Then, Train. It's really even simpler. We have one parameter to give: what is our training budget? The training budget is how long we give the system to try all those different combinations of networks and hyperparameter tuning, and then come up with the best model possible for your dataset. The minimum is one hour. We also give a bit of a ballpark: if you have more than 10
million rows of data, you may give it a bit more, but if you have a medium or small dataset, one hour can be good enough to give you a sense of whether this data will give good accuracy, so that we can start looking at whether more time would do more for the accuracy.

When we launch the training, just to give a picture, we actually launch 92 VMs in the background. All of the training happens in parallel, so you don't need to wait for one model to complete before we move to the next. We parallelize as much as possible and make sure that the turnaround time to a good-quality model is as smooth as possible for you.

Here we give it one hour. Not to worry, I'm not going to make you sit here for one hour waiting for it to complete. There are really no other options you need to think about. There are a couple of optimization criteria, like ROC, precision and recall; if you are a data analyst or scientist, you can decide which makes more sense for you. But most of the time you can just hit go, go have lunch, and come back and your model is ready.

Once the training has completed, we give feedback on the model quality that we achieved: things like precision, recall, ROC and accuracy. These can mean different things depending on your use case: for some use cases 90 percent accuracy may be great, for some you need more. But you can also drill into the numbers. This is a binary classification, so we have two labels, and for each label we can give your accuracy scores, F1, false positives, etc., and you will have your standard confusion matrix available as well. In this case we have marketing information, and obviously, for bank marketing, most people probably don't answer positively, so we can see maybe 5% of people responded positively.
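Because only about 5% of people respond positively, overall accuracy and per-label scores can tell very different stories. The short Python sketch below computes the standard metrics from a binary confusion matrix; the counts are invented for illustration (1,000 people, 50 of them positive) and are not the numbers from the demo.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, plus precision, recall and F1 for the positive label."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Invented counts: 50 true responders, half of them found,
# and half of the predicted responders are correct.
metrics = binary_metrics(tp=25, fp=25, fn=25, tn=925)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts the model is 95% accurate overall while precision and recall on the positive label are both 50%, which is why the per-label view matters on imbalanced data.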
For the people who responded positively, we have about 50 percent accuracy in correctly labeling them. But if you look at a marketing campaign, if you told the marketing department, "I can have a campaign that will have a 50% positive response ratio," they'd be ecstatic.

You'll also have a few details like feature importance: which feature we learned is the most relevant for this dataset. That gives you an idea, and you need to decide whether this model is good, or whether you need to go back and give it some more training time to get an even higher-quality model out.

But if you're comfortable that this is what you need, you can then also use Google Cloud to host the model for you and predict with it. We do batch prediction and online prediction. If you want to basically fill all of your customer table with a prediction, you can again just provide a BigQuery dataset and have predictions written directly into BigQuery, or you can upload a CSV and have the system fill in the prediction column.

For online prediction, here's a little tool where you can just try out whether this person will give us any money, and predict. And the result is: no, this person is not going to give us any money, and we're pretty confident of that. It's a typical prediction response: you'll have all the labels with a confidence, saying we are pretty confident that this person is not someone we would want to target in a campaign.

You have a few other ways to work with this. Once you deploy this setup to the cloud, we expose a REST API that you can integrate with your applications and use like any other Google AI API, but
with your own dataset and your own model. You can have multiple models, and each model will have its own REST
API endpoint.

Alternatively, you can take the model home: download a container, or publish it into the Google container repository. You can then take a look at what's inside this container. If you're really interested in what the system put together for your model, you can take it home, take the code out, look at what kind of model was put together, and maybe even extend it or play with it if you like.

And that's pretty much it. We went from importing data to a model that was 90% accurate, in this case, in one hour or so. How cool is that? Essentially, AutoML Tables is something you should look at if you have structured data and you want to get started with making predictions, whether the use case is specific to your industry or you just want to try it out.

Just to let you know, pricing is very straightforward. We charge just by training hours: it's 20 bucks an hour. So depending on how many hours you train, it's a fixed cost, and that fixed cost covers trying out all of those networks and the hyperparameter tuning; everything is included, and you are ready to go.

That's pretty much what I had to share about AutoML Tables. Hopefully you found that interesting, and if you have any questions, I will be available later at the AutoML booth, which we have just across the hall.
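The pricing arithmetic quoted above is simple enough to sketch. The rate of 20 dollars per training hour and the one-hour minimum are the figures from the talk and may no longer be current; the function name `training_cost` is a made-up helper, and the 24-hour case is shown only because that budget was used in the competition comparison earlier.

```python
RATE_USD_PER_TRAINING_HOUR = 20.0   # "20 bucks an hour", as quoted in the talk
MIN_TRAINING_HOURS = 1              # minimum training budget mentioned in the talk

def training_cost(budget_hours, rate=RATE_USD_PER_TRAINING_HOUR):
    """Estimated training cost in dollars for a given training budget."""
    if budget_hours < MIN_TRAINING_HOURS:
        raise ValueError("training budget must be at least one hour")
    return budget_hours * rate

for hours in (1, 24):
    print(f"{hours} h of training: ${training_cost(hours):.2f}")
```

Under these assumptions the cost scales linearly with the training budget, so the budget chosen on the Train page is also the cost decision.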
2020-02-24 01:39