Scaling Machine Learning on Industrial Time Series with Cloud Bigtable and AutoML (Cloud Next '18)
My name is Geir Engdahl. I'm CTO and co-founder of Cognite. With me I have Carter Page, and we're going to talk about how Cognite is using Google Cloud technologies to enable machine learning on industrial data. In particular, we'll talk about time series data and Bigtable as two of the key technologies and problems that we're solving.

So, a little bit about Cognite. We're a young company, less than two years old, and we just crossed a hundred employees. We're working with asset-intensive industries, which basically means large industrial companies that have big machinery that costs lots of money: a lot in the oil and gas vertical, also in shipping. Our mission is to liberate industrial data from silos and piece that data together to form a model of industrial reality, so that humans and machines can make better decisions and take better actions. It's a model that's real-time and historic, so you have both the present state, what's going on, and all the previous data too, which is important if you want to try to predict the future.

So what exactly is an industrial reality model? I'll try to show a little bit. It depends on how you view it; there are many angles from which to look at this model. This is a typical operator view. If you're in a control room in an industrial plant, this is very close to what you typically see. This is data now streaming in live from the North Sea, with about one to two seconds of delay. It is data concerning a single tank outside on an oil platform, or inside, actually. The tank is called 20VA0002, and a typical oil platform will contain anywhere from ten thousand to a hundred thousand sensors, time series like this. So here you have a handful, but it's just a tiny piece of a huge machinery. This is the real-time view of what's happening right now. You also want
to see, as an analyst, what has happened in the past. Each one of these squiggly lines represents about one gigabyte of data; this is one year of data. I really like this chart. It's my new hobby to play with it. It's kind of like Google Earth, in that you can zoom in and view the data at any resolution. Given that this is about ten gigabytes of data, you've probably noticed by now that the Next Wi-Fi doesn't support downloading all that data this fast, so you need a back end that can quickly crunch the data and give you the data at the resolution that you want. And you can go all the way down to the raw data points, which will pop up when you zoom in enough.

Of course you can view this in different ways. This is a view that humans tend to like: a three-dimensional view. We imported the entire CAD model of the oil platform and connected it with all the other data. So, for instance, if we want to see the tank that we just viewed data from, which is aptly named 20VA0002, you can see exactly where that is, what it looks like, and what it's connected to, and browse the data in three dimensions. This is just to give you a little impression of what we mean by an industrial reality model. I want this model to be up to date and contain the data for today and for what happened in the past.

By the way, the charting library that I just showed you: we couldn't find one like it, so we had to make it, and we open sourced it. So if you're interested, you can use it. It's not tightly coupled to the Cognite back end, so if you have any kind of provider that can give you data at different resolutions, you can use it. It uses React and D3.

The scale of data to be ingested is huge, and it's growing very fast. Cognite handles lots of different data types to build the model of industrial reality.
That can be ERP data, it can be maintenance logs, it can be 3D models like you saw. But if you look at the data by volume, 99.7 percent of the data that we have is time series data. That's really where the huge data is. Even though the 3D model was large, about ten gigabytes, the time series data dwarfs that, and it's exploding. All of that time series data needs to go somewhere: it needs to be stored, it needs to be processed, and it needs to be queryable.
So how do we do that? What is under the hood? When we started out building Cognite, we started with a few principles. One of them is impact. It kind of seems strange to say this, but you have to write it down for it to matter. It's easy as a technologist to build technology for technology's sake, because it's cool, and I've been guilty of that in the past. We've been lucky to have large, demanding customers very early on to guide us in finding out what the real use cases are that create real value. Then there's speed: we want to show something as fast as possible, so you can iterate and get feedback. Those two put together have a consequence, and that is that you want to use managed services wherever you can, especially for anything that is stateful. Handling a stateful storage service that has to scale up and down, that has to have backups, redundancy, logs of who accessed the data, and so on: all of that is just painful to implement, and it's going to slow you down.

So we recognized very early that we needed a time series database, and our hypothesis was that we could get this as a managed service too, that there would be something that supported this out of the box as an API. Our requirements were that it would be robust and durable, which means that we don't drop data; no data point should be dropped. It would have to support a huge volume of reads and writes, writes in particular, since you always get new data in. It needs low latency, so that you can show the real-time version of what's going on. You want to see data at any time scale, so you want the zoomed-out view and the zoomed-in view that I showed you. You want to be able to backfill efficiently.
If you're onboarding a new customer, and that customer has a million data points per second being generated and you can handle two million, then backfilling is going to take a long time. If they have a year of data for you to backfill, it's going to take another year before you're done with that, because you spend one million of your capacity on the new data, and then one million per second will be old data. And you want to be able to efficiently map over data in order, so sequential reads, for training models, for instance.

So we experimented with OpenTSDB at the beginning. It's a great piece of software, and the cool thing is you can use Bigtable, which is a managed storage back end. So you can use OpenTSDB with Bigtable as the back end, which is very nice. But it had a few shortcomings. For instance, it's not durable. That means if you send a data point to it, it will acknowledge that it got the data point before it's written, which means you can actually lose data if you're scaling it up and down. And it used the front-fill path for batch backfills, which made backfilling very inefficient. There were a few other things as well.

So we chose to build our own time series logic on top of Bigtable. Bigtable is a fully managed service, which, as you know, we really wanted. It supports a huge number of reads and writes per node. It's been tried and tested on very large, user-facing distributed systems at Google. And it has this property which most time series databases don't, or rather, most key-value stores don't have, which is that you can scan forward efficiently: the keys are stored in order. A lot of key-value stores will hash the keys so that you get the load distributed evenly, but Bigtable doesn't do that. That means you don't have to jump around when you're reading sequential data.
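The backfill arithmetic from earlier in this passage can be made explicit. This is a minimal sketch with a hypothetical helper name (`backfill_duration` is not part of any library); it assumes that live ingestion always gets priority and that only the spare capacity goes to historical data.

```python
def backfill_duration(history_seconds, live_rate, capacity):
    """Seconds needed to backfill `history_seconds` worth of old data.

    Assumptions (for illustration only): the history was produced at
    `live_rate` points/s, live data keeps arriving at that same rate,
    and only the capacity left over after live ingestion is spent on
    the backfill.
    """
    spare = capacity - live_rate
    if spare <= 0:
        raise ValueError("no spare capacity: the backfill never finishes")
    return history_seconds * live_rate / spare


YEAR = 365 * 24 * 3600

# The example from the talk: 1M points/s live, 2M points/s capacity,
# one year of history to load.
years = backfill_duration(YEAR, 1_000_000, 2_000_000) / YEAR
print(years)  # 1.0
```

The formula also shows why a dedicated, cheaper batch path matters: the duration only shrinks if the spare capacity grows faster than the live rate.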
The flip side of that is that you can run into situations where you get hotspotting, so you need to write your code around that. But for us it's a price that's been worth it. I'll hand it over to Carter Page here, who is a senior engineering manager for Cloud Bigtable.

Thanks, Geir.
I'll talk a little bit about Cloud Bigtable, how it's a good fit for IoT, and why we're seeing more customers coming to it. I do want to say I'm particularly excited to be presenting with Cognite. I think the stuff that they're doing is really neat. I think his point about doing impactful things, doing a comprehensive story of IoT, is very exciting: the idea of not just connecting the devices and getting the data, but, once you've got literally tens of thousands of devices, way more than a single human could actually monitor, thinking about how you extract data, react to it, and manage very high-risk asset situations. He's going to get into some really cool stuff after this, but let me talk a little bit about Cloud Bigtable and how it is a good fit for these types of use cases.

A quick show of hands, just to get a sense of the audience: who's familiar with distributed databases like Cassandra, HBase, things like that? OK, all right, so this is not going to be rocket science to most people. The main thing, particularly for large IoT use cases where people are looking at collecting massive amounts of metrics, is being able to handle this really large-scale traffic. A couple of years ago, for example, we did a load test with a company where we basically simulated the entire US trade markets all being processed together. We processed about 25 billion records in about an hour, and that's possible just due to the scale of Cloud Bigtable and how it works. We were peaking at about 34 gigabytes per second and about 34 million operations per second on Bigtable, on a single instance. The reason this works is that Bigtable was built for very, very high scalability, and you essentially get linear characteristics way out on the curve. So Bigtable
was initially designed by Google as a backing store for our crawler, so it was built to keep a copy of the World Wide Web. It has expanded internally and been used for a lot of other products as well, so we've put about 14 years of engineering into finding new upper limits and breaking them. With every distributed system, you have this straight line that goes out, and eventually it flattens out. Everything eventually hits a bottleneck, or you might hit something where, if you've got an HBase cluster with a thousand machines, you're just going to hit probabilistic machine failures, and there's an overhead for your operations and things like that. So we will eventually flatten out, but pretty far out, and it would take a lot of work for you to get up to the kind of scale where you would notice.

The reason linear scaling is important from a business perspective is that it gives you predictable cost of revenue. When you're thinking about building a system and you're like, I've got a terabyte of data and now this has to go to ten terabytes, to petabytes, usually, if you're building this on top of your own home-managed Cassandra system, you're going to have to rethink each time you hit one of these new tiers: all right, now how am I going to deal with this? I've got a lot more machines, I'm going to need a new on-call rotation, I'm going to need new strategies to deal with this. Here it's just a matter of cranking up your nodes, and the number of nodes you need is proportional to the throughput that you need.

To give a quick overview of how Cloud Bigtable works: you have clients that basically talk to a single front-end point that load-balances to the nodes, so you don't need to think about address mapping or talking to individual nodes themselves. I'll take that layer away and talk a little bit about what's going on underneath the covers. So the
data itself is being stored durably in an underlying file system called Colossus, and the Bigtable servers themselves are actually not storing any data. They are taking responsibility for serving the data, and every row is assigned to only one node, so the entire key space is basically balanced across these different nodes. Having a single node be responsible for an individual row allows for atomicity of operations on it, and it allows for read-your-own-writes. The advantage of having the storage dissociated from the serving nodes is that it allows us to do some clever things in terms of being able to rebalance workloads very aggressively.
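One way to picture "every row is assigned to only one node" is a sorted list of split points over the key space. The sketch below is only an illustration under assumed names (`Router`, `node_for`, and `reassign` are hypothetical, not Bigtable APIs); the real service manages this internally. Because the data lives in the shared file system, handing a key range to another node changes only the routing table.

```python
import bisect


class Router:
    """Maps contiguous row-key ranges to serving nodes (illustrative only)."""

    def __init__(self, split_points, owners):
        # split_points are sorted keys. Range i covers the keys in
        # [split_points[i-1], split_points[i]), so there is one more
        # range (and owner) than there are split points.
        assert len(owners) == len(split_points) + 1
        self.split_points = list(split_points)
        self.owners = list(owners)

    def range_index(self, row_key):
        return bisect.bisect_right(self.split_points, row_key)

    def node_for(self, row_key):
        return self.owners[self.range_index(row_key)]

    def reassign(self, row_key, new_node):
        # Rebalancing: hand the whole range containing row_key to another
        # node. No data is copied, because storage is separate from serving.
        self.owners[self.range_index(row_key)] = new_node


router = Router(split_points=["g", "p"], owners=["node-1", "node-2", "node-3"])
print(router.node_for("sensor-42"))  # keys >= "p" are served by node-3
router.reassign("sensor-42", "node-1")
print(router.node_for("sensor-42"))  # the same range is now served by node-1
```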
You may have a customer that has changed their underlying workload, which then impacts how they're using Bigtable. Or you may have diurnal patterns; you may have different things coming on and going off during the middle of the day. That changes which tablet servers, or nodes, are getting more or less activity. What we'll do is identify these changes in patterns and just reassign areas of data to different nodes. This allows a couple of things. One is that you don't have to worry too much about per-node hotspotting. You can hotspot individual rows, which is a problem, but in terms of getting unlucky and having one server that's hotter than the rest, we'll balance that out. It also means higher utilization. By keeping things balanced, you're not having to provision for the hottest node; we're trying to keep all the nodes fairly well balanced, and that actually means a cheaper service relative to running this on an HBase cluster or a Cassandra cluster.

In addition to rebalancing, you can resize up and down fairly trivially. We have some customers who might have ingestion workloads which only need a few nodes, and they might run a batch at the end of the month or the end of the week, and they might want to scale up to, say, 300 nodes. You can do that fairly instantaneously. If you've got a really large data set, it may take 10 to 20 minutes for the data to rebalance onto the nodes just added, but it can be a good way to make those batch jobs you run once a week run really fast. When you're done, you scale it back down again.

The basic data model is key-value, but it has more dimensions to it. You have a single index, which is your row key, and then your data is stored in columns, and the columns are a tuple of, basically,
a column qualifier and a column family. The column family is defined in your schema, and the column qualifier is defined at insertion time. The table is sparse, so any column family and column qualifier tuples that you don't fill in for a given row don't count against your actual space. The database is also three-dimensional, so under each of those cells is an arbitrary number of versions. You can keep them there pretty much indefinitely, as long as your row doesn't get beyond a couple of hundred megs, or you can set up garbage collection and say, all right, I want you to wipe out any data that's over a week old, or just keep the last five versions, or something like that.

Yesterday we announced that we have replication now. Between two Bigtable clusters in a region, we will replicate the data between them. This has a few advantages. The first is your failure domain: you're no longer limited to the failure domain of a single zone; you've got two zones as failure domains, which gives you higher availability. Another advantage that people use, particularly because the replication is asynchronous, is workload isolation. Some customers may have their critical low-latency serving loads on one cluster and then be doing batch loads on another, and by doing it on the other one, they're not interfering with each other, essentially. The effective result of this on the high-availability side is that you get an extra nine on your SLA if you're using our high-availability application policy. I won't get into those right now, but you can go read up on them: you have application policies that define how you want your traffic to be routed, and if you use the high-availability one, you'll get automatic zero-touch failover. So if there are any problems in the middle of the night, you don't have to get paged; it'll just fail over
for you.

Being a large data tool, a database that's designed for terabytes, petabytes actually, you need really powerful tools to be able to make the most of your database, and so we've got deep integrations with BigQuery, with Dataproc, and with Dataflow. With BigQuery, the queries are not as fast as if you're going against native BigQuery, because BigQuery is an offline store, and there are certain optimizations in Bigtable to be online, to be able to get single-digit
millisecond latency, which makes the BigQuery queries on it a little slower. But it's nice because you can do ad hoc queries on your data without having to write a MapReduce job. If you do want to write a MapReduce job, you can use Cloud Dataproc, or you can use our internal replacement for MapReduce, which is housed inside Dataflow. Also this week, we're announcing that we have a deep, first-order integration with TensorFlow, which just got put onto GitHub, so people can start playing around with that. I'll hand it back to Geir here.

Thank you. So with that background, I'll go into a little bit of detail on how Cognite is using Cloud Bigtable to store its data, and then we'll move on to machine learning, and eventually you'll see what the windmill can do.

Carter talked about the data model of Bigtable. This is our basic data schema. The first thing that is very important is how you choose your row keys, because that's the only thing that you can look up fast. Our row keys consist of a sensor ID plus a timestamp, or a time bucket. That means you can very quickly look up the values for a particular sensor at a particular point in time. You can also scan through all the values of a sensor in order, which is nice if you want to train a model, and that's a very inexpensive operation. Inside each row we store more than one value; it's not just one value per row. We stuff a lot of data points inside each row, typically around a thousand data points per row. So you have the timestamps, which are just Unix timestamps, and then you have the values, which are floating points. Of course it's binary; it's just drawn out in readable numbers here, but it's all binary to save space. And actually Bigtable does compression for you too, so what you'll see, if you stop ingesting
new data into your Bigtable instance, is that the total size of it goes down, which can be scary at first if you don't know what's going on (where's my data going?), but it's actually a good thing.

Here's how we architected our data ingestion pipeline. Every step along this path is autoscaling. Carter talked about how easy it is to scale Bigtable, and we have a service which looks at the load and then scales it up and down. It's pretty simple logic. It starts with Cloud Load Balancing, and then an API node, which is a Kubernetes service that handles authentication and authorization. Then it will put the data point onto a Pub/Sub queue, and then it will say to the client: we got your data, we're not going to lose it.
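The row layout described a moment ago (sensor ID plus time bucket as the row key, roughly a thousand binary-packed points per row) could be sketched as follows. The bucket size, the key format, the sensor name, and the function names are assumptions for illustration; the talk does not give Cognite's exact encoding, and the dictionary merely stands in for a table.

```python
import struct

BUCKET_SECONDS = 1000  # assumed: 1 Hz data gives about 1000 points per row


def row_key(sensor_id, ts):
    """Sensor ID plus zero-padded bucket start, so keys sort by sensor, then time."""
    bucket = ts - ts % BUCKET_SECONDS
    return f"{sensor_id}#{bucket:012d}".encode()


def pack_points(points):
    """Pack (unix_ts, value) pairs as big-endian int64 + float64, 16 bytes each."""
    return b"".join(struct.pack(">qd", ts, value) for ts, value in points)


def unpack_points(blob):
    return [struct.unpack_from(">qd", blob, off) for off in range(0, len(blob), 16)]


# Group made-up points into rows keyed by (sensor, bucket).
rows = {}
for ts in range(1_532_000_000, 1_532_000_000 + 2500):
    rows.setdefault(row_key("sensor-a", ts), []).append((ts, 20.0 + ts % 7))

# Sorted keys keep one sensor's buckets adjacent and in time order,
# which is what makes a forward scan over a sensor cheap.
for key in sorted(rows):
    print(key.decode(), len(pack_points(rows[key])), "bytes")
```

Zero-padding the bucket is what keeps lexicographic key order equal to time order; without it, a plain string comparison would sort bucket 999 after bucket 1000.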
So Pub/Sub is another component that we use a lot. It's a very nice component in the way that it scales to whatever you ask for. If you look at the documentation for Pub/Sub, it will say the limit on the number of publish operations is unlimited, and subscribe operations, unlimited. That's kind of a bold statement. We haven't run into the limit there. Once it's on the queue, it gets picked up by a subscriber to that queue, which will package the data and write it to Bigtable. That's where our time series writing logic lives. Once it's been written, there's a new job put on a queue to compute aggregates. Those are roll-ups that we use to be able to answer queries about any arbitrary time scale efficiently. That's what we need in order to do the dynamic zooming that you saw at the beginning.

Your KPIs for this kind of pipeline are throughput and latency; that's typically what you look at. The data that comes in is queryable after 200 milliseconds in the 99th percentile, and we regularly handle around a million data points per second, and I'm pretty sure it could do much more than that.

Querying is much simpler. This is a synchronous operation, so it goes to the API node and then straight to Bigtable. One of the optimizations that you want to do if you want to transfer a lot of data here is this: typically, API developers like to have JSON data, and for most applications, like dashboards, that makes sense. If you're writing a Spark connector to your API and you want to run machine learning on that and transfer lots of data, then the JSON serialization and deserialization becomes an issue. So what you want is something like Protocol Buffers or another binary protocol for that. It's not really the size that matters, because if you gzip the JSON it's very small anyway, because it's very repetitive, but it's the memory
overhead of doing that serialization. So that's a nice optimization.

I want to talk a little bit about cleaning of industrial data, because
it's something that's often overlooked. If you want to make a useful application in the industrial IoT space, it's not enough to have time series and AI. A lot of people are running around saying time series plus AI is going to solve everything. But a very simple question is: if you have a hundred thousand time series coming from an oil rig and you want to make a predictive model for this one tank that we saw, which time series will you pick? How do you pick those? Are you going to manually go over all the diagrams? It's going to take you a lot of time. Typically we see these data science projects, and they're really about finding the right data. That's about 80% of the time they spend, and then at the end of the project you have a wrap-up where you try to model something.

If you want to truly understand what's going on in the industrial world, you need to be able to get data from a lot of sources: the metadata of the time series; the equipment information, like who made it and when it was replaced; the failure logs from previous work orders; 3D models; and the process information, like how things are physically and logically connected, which component is upstream of this one. And it's not enough to have all this data in one place; it needs to be connected. The hard part is connecting it, and the glue that holds this together is the object in the physical world. The unfortunate thing is that the same physical object has a different name depending on which system you ask. So we spent a lot of time on this contextualization, figuring out how we map the IDs from one system to a unique ID for each asset, for each physical object. If you look at the cleaning pipeline here, there's this thing called an asset mapper, which we spent a lot of time developing, and which will assist experts and do automatic mapping, in many cases, of IDs
from one system to another, to be able to make this connected, contextualized model.

You're probably wondering now what this windmill does and why it's here. I'm just going to say a little bit more and then I'll get to it.

Predictive maintenance: you've probably all heard about this. There is a great business case for predictive maintenance. We have seen cases where a single equipment failure on a piece of subsea equipment costs a hundred million dollars to fix, so obviously you want to prevent that. But this is also why it is so hard to do: because the failures are fairly rare, there is not a lot of labeled data. Imagine what it would cost to get enough labeled data to validate your model, let alone train it. So you typically start with unsupervised approaches. For anomaly detection, what we've seen is that there are two classes of approaches. One is forecasting-based. That means you take a set of sensors, you hide one of the values, and you try to predict it using the others. If your prediction is far from the actual value, then you flag that as an anomaly.
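A minimal sketch of the forecasting-based approach: fit a model that predicts one sensor from another on normal data, then flag points where the prediction misses by a wide margin. The single-input least-squares model, the 4-sigma threshold, the made-up data, and all names are assumptions for illustration; the talk does not say which model Cognite uses.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one input sensor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def train(xs, ys, n_sigma=4.0):
    """Learn the normal relationship and a residual threshold from training data."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    threshold = n_sigma * pstdev(residuals)
    return a, b, threshold


def is_anomaly(model, x, y):
    """Hide y, predict it from x, and flag it if the prediction is far off."""
    a, b, threshold = model
    return abs(y - (a * x + b)) > threshold


# Made-up training data: power output roughly proportional to rotor speed.
speeds = [float(s) for s in range(10, 60)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2]
power = [2.0 * s + noise[i % 5] for i, s in enumerate(speeds)]
model = train(speeds, power)

print(is_anomaly(model, 30.0, 60.1))  # close to the prediction
print(is_anomaly(model, 30.0, 20.0))  # far below the prediction
```

With real data, the hidden sensor would typically be predicted from several others, and the threshold would be tuned on held-out normal operation rather than fixed in advance.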
The other approach is that you take your sensor data, your set of sensors, and plot them in an n-dimensional space, and you see which points are close to each other. They form clusters, and those clusters typically represent different operational states. So you'll have a cluster around the running state, a cluster around the idle state, and clusters for powering up, powering down, and maintenance. If you have a new point and it doesn't fit, if it's far from any of these cluster centers, then that is an anomaly.

Now let's look at this live. This demo is as live as it gets. It has a lot of moving parts, literally. Everything that you see here is live. There's no pre-trained model. There is no pre-generated data. The data is going to be created right here, right now, and we'll train the model and we'll see if it works. So, are you excited? OK.

Let's see if we get data from this wind turbine now, and it looks like we do. This is a different view of the wind turbine. It's kind of a 3D view into what's going on here; those are the sensor values. I can turn this knob, I can increase the speed, and you'll see it will start to produce more energy. It's producing a lot of energy for a wind turbine this small.

So let's go into Jupyter. I'm sure those of you who are working with data are familiar with this. We're going to interact with the Cognite API via our Python SDK, which is also open source. First we're just going to log in here, we're going to select the right data, and then we're going to plot it. This live plotter will show the analyst view of this. You see, if I adjust the speed, it'll go down. If I pick it up, it's going to go up a bit. What you see on the screen is going to be our training data, so I'm going to give it a little bit more time so
it has seen a little bit of normal operation of this wind turbine. We painfully brought this wind turbine here for you. It's 3D printed; it looks very homemade. We got it through airport security. I was taken aside by security here at Google Next. They were wondering what this base is, because when the wind turbine isn't mounted on it, it has wires coming out of it and this red, scary light on it. So I've done a lot of explaining to bring this here.

Now I need to stop this plotting to move on. I'll create an anomaly detector for this, and I'll select a time range for it. This is pushing the training operation up into the cloud, so it doesn't actually happen on my computer. The SDK will just do an API call to train the model, and then we get the job ID and we can query for the state of that job. It's not a lot of data right now, so it took very little time to train. Now we can create another plotter. It's going to plot live data from this wind turbine again, but this time it's also going to plot the output of the anomaly detector.

So how can we introduce an anomaly here? Well, I'm going to use brute force. I'm going to hold it back here. It takes a little bit of time for the data to appear in this Python thing, and now you'll see it's detected an anomaly; you see the red background there. And it should go back to normal again once I've let it go. There you have it: live anomaly detection. Now it's back to normal.

Operators can use this to monitor. If you want to monitor a hundred thousand time series, you can't do that manually; you can't put it up on a screen and have people watch it. Now operators can be alerted to strange conditions, they can look into what's going on, and hopefully prevent the next hundred-million-dollar failure before it happens.
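The demo does not reveal which of the two approaches the hosted detector uses. For the clustering approach described before the demo, a minimal sketch could look like this: find cluster centers in normal data, then flag new points that are far from every center. The tiny k-means, the hand-picked initial centers, the distance threshold, and the data are all assumptions for illustration.

```python
import math


def nearest(point, centers):
    """Return (index, distance) of the closest center."""
    dists = [math.dist(point, c) for c in centers]
    i = min(range(len(centers)), key=dists.__getitem__)
    return i, dists[i]


def kmeans(points, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)[0]].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def is_anomaly(point, centers, max_distance):
    """A point is anomalous if it is far from every known operating state."""
    return nearest(point, centers)[1] > max_distance


# Made-up (rotor speed, power) samples from two operating states.
idle = [(0.1 * i, 0.05 * i) for i in range(10)]
running = [(30.0 + 0.2 * i, 60.0 + 0.3 * i) for i in range(10)]
centers = kmeans(idle + running, centers=[idle[0], running[0]])

print(is_anomaly((30.5, 61.0), centers, max_distance=5.0))  # inside "running"
print(is_anomaly((30.0, 20.0), centers, max_distance=5.0))  # fits neither state
```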
Another very useful thing that we do: once you've detected this anomaly, your first question is going to be something like, has this happened before, or has something similar happened? So we implemented something called similarity search in the API, which is not really machine learning, but it is a very useful thing, and it is very computationally expensive to do. You can take a time period of a set of sensors and look for that pattern in a different set of sensors, or in the same one, and find the similar portions.

This talk also has AutoML in the title. So, AutoML. On oil rigs you have hydrocarbons, and if there is a leak and you have a spark, then it's potentially very dangerous. So they're always looking to see if there are faulty wires; faulty wires can be very dangerous. Typically what they do is they have these regular inspections, every six months or every year, where they go over everything. But these damaged
wires don't appear over time. They're typically the result of someone stepping on something, or something mechanical that happened at a particular point in time. So what we tried to do is build a model which can detect faulty wires, so that you can wear a camera on a helmet, or use CCTV or other ways of gathering images, and in the background always look for this kind of failure. We're not trying to replace those inspections; we are trying to augment them and make them even better.

We trained a model using TensorFlow to detect faulty wires and also tell us where in the image the failure is. This is a project that we spent three months on, and we were able to get to a particular accuracy, about 95.5 percent on precision and recall. Then in June we got access to AutoML, so we figured we'd upload the data set there and see how well that would do. Training a model with AutoML is simple: if you can upload a bunch of images onto a web page and you can edit a CSV file, then you can do it. So we did that, and in five minutes I was able to get a model with 96.1 and 96.2 percent precision and recall, so that was about half a percentage point up. But we didn't give up. We tweaked our own model a little bit more, so we were able to get that one even higher: we got to 95 percent precision and 99 percent recall on our own model, which is better on one metric but not the other. But the big difference, I think, is that we spent three months doing it, and it took five minutes to train with AutoML.

Let's see what that can do. On my way here, on the first day at Moscone, I found this wire outside, and it looked dangerous. So I ran AutoML on it, and it was not part of the training set. This
is kind of a new hobby for me, to go around and look for faulty wires. But yeah, it really caught that quite well. And on my way to the rehearsal, which I did back in June, I also found this faulty traffic light wire. This is a button that you push to make the pedestrian light go green, and it also catches that really well. So I think it is possible to do this, and I think there is a class of problems that you can solve in industry, like rust and leaks, using camera feeds as a sensor. That can be very powerful, and it's instrumental in getting people out of harm's way.

So I'll hand it over to Carter to wrap it up.

Thanks, Geir. We'll have time for a couple of questions after this; I'm sure Geir will be able to join us. So, a quick review. Cognite basically built the system on top of GCP, focusing on the scalability and the throughput of Cloud Bigtable and
focusing on the capabilities of AutoML. He was able to take a training exercise which took three months with TensorFlow and do it in five minutes with AutoML, which is really exciting for companies that want to tap into this type of technology without having to hire an army of data scientists. Not everyone can hire a Geir in their company.

Also, one of the things, and I don't know if it was clear when things were switching around when he did the windmill: you saw some code, but he didn't actually program the training model for that. He just highlighted the part that he had done before and said, this is bad. So when they put this out in the field, they don't need to have programmers. They can have engineers who know what looks bad on the graph highlight that, and then expand it out to tens of thousands of metrics and quickly identify things that are going wrong. It's really exciting to be able to scale these things out to industry.

We have replication now in Cloud Bigtable, which is going to provide higher availability and some great features around workload isolation, and it provides a great basis for large-scale data analysis and machine learning. If you want to learn more about the technologies you saw here, here's a handful of links. There is the Cloud Bigtable main product page at cloud.google.com/bigtable; there's a bunch of material underneath that. We also have the various machine learning pages up there, so you can learn about machine learning in general under the products page, and there's a master page for AutoML. At the bottom, there is the TensorFlow integration with Bigtable that we launched this week. So that's all we have.