GeoTab - Data Management Platform with GCP Technology
What's cooler than a petabyte of data? Using the data to improve your customers' business, your own business, and even the world. Hello, I'm Felipe Hoffa, a Google Cloud developer advocate, and today I'm in Toronto, Canada, with my good friends at Geotab. They are the world's largest provider of telematics, tracking more than a million vehicles on the seven continents, and we're going to meet their senior data scientist, who's going to show us how they manage all of this data and even how you can use it to understand how the world moves around your house. So let's get started. Join me.

Here with us is Geotab's senior data scientist.

Hi Felipe, welcome to Toronto.

I'm so happy to be here. So, what does Geotab do?

Geotab is a telematics company. We provide an open-platform fleet management solution to businesses of all kinds and all sizes. We now have over 20,000 customers, and those customers use Geotab devices to connect their vehicles to their business so they can manage and monitor the productivity, the efficiency, the safety, and the compliance of their drivers and vehicles. We now have over 1 million devices in running vehicles, and from those vehicles we're collecting over 2.1 billion data points per day.

What does the device look like?

This is the Geotab GO device. You can see it's a small box.

Beautiful.

And there is a whole bunch of magic happening inside the box, collecting the data. It's actually very easy to install; it's plug and play.

Even I can plug it in?

Yeah. We have a harness already installed on the truck, and the only thing you need to do is plug the device in. After a few seconds you will hear a ping, and now it has started to work.

And now it's sending data to your platform?

That's right.

How does the data get to the platform?

Once the device has done the filtering
and calibration of the data, the device itself ensures that we are sending the data that carries the most information with the least network cost. Then within, I believe, several seconds, the device data goes into the customers' databases as well as our big data environment for analysis.

You are a data scientist. What's your job?

As a data scientist, one part of my job is playing with the data. I'm very happy to have this large volume of data to work with. From it I can create a whole bunch of analyses from different angles to help internal users diagnose any potential risks, flaws, or bugs. On the other hand, I also work as a big data engineer, so as an engineer I need to ensure that I design and develop a reliable system.

You have to create the pipelines, store the data securely, and make sure
analysis is possible for everyone.

Yeah, and as the business grows I need to make sure my applications survive for a long time, grow with it, and can adapt to future change.

How do you scale this platform as it grows more complex, collects more data, and more people are interested in looking at it? How long ago did you start?

I started at Geotab in 2014.

2014, four years ago. And in 2014 you chose BigQuery because you were finally able to put all of this in one place?

Actually, we grew with BigQuery. We tested several cloud providers, and the winner was Google BigQuery. We were very happy with the performance of the BigQuery API and also with its capability to run very big queries over large-scale data. We were quite happy with that, so we went with BigQuery.

So what does it look like now?

We have 2.1 billion data points per day. When I joined in 2014, there were around 300 million data points per day, so it has grown by almost six times.

And you have encouraged other teams to also push their data and monitor their work in BigQuery?

Right now, if you go into the Geotab office, you'll be surrounded by dashboards and monitors. The whole company is using data to monitor its work. We're a data-driven company.

Can you show me one of these dashboards?

Yes. As I mentioned, we use these dashboards to see how much data comes in and how quickly it arrives in BigQuery.

Wait a second, this dashboard is a Google Slide?

Yes, and it's a live dashboard. It is refreshed every five minutes.

Wow. And what can you see here?

From the top you can see the total number of records. What we hold right now is over one petabyte.

More than a petabyte.

Right, and just for today we already got over 2.1 billion rows. Compared with our reports from a year ago, it has almost doubled. And from these gauges you can see that, per
second, we are receiving over 16,000 GPS records, and GPS records are less than one third of our total data points.

So this device is not only recording the GPS position?

No, GPS position is just one third of it. We
also get all kinds of engine diagnostic data from the vehicle's engine computer, such as whether the driver has buckled the seat belt, the fuel consumption, and whether any diagnostic signal is coming out of the computer. All kinds of things.

And also the temperature?

Also temperature.

So you're tracking part of the inside of the truck thanks to your connection?

Yes.

And then this data comes out of the device, through your platform, and into BigQuery, right? And you're tracking the time that it takes to do all of this?

Yes, it's only seven seconds.

Seven seconds from device to BigQuery. And you have the chart of the latency?

Right, monitoring the performance of our servers.

So tell me more about this. What's happening in the background?

In the background we're running queries against the data in BigQuery. We use Jupyter notebooks hosted by Google Cloud Datalab. We write queries in the notebook and produce all these pictures and gauges. We push these up to Google Cloud Storage, and this Google Slide can pull those files from Google Cloud Storage and produce this beautiful dashboard. Once the jobs are set up, the Google Slide can actually refresh itself every five minutes.

That's a really creative use of Slides. And then, to orchestrate these queries?

Oh yes, because queries do have dependencies, we are using Apache Airflow to orchestrate those dependencies, and it's easy for us to know whether anything is going wrong with any of the dependencies.

So the queries run through Airflow. What else?

We're also doing machine learning projects, so we're using TensorFlow as a framework. We deploy applications and pipelines onto Google Cloud Dataflow and also onto Kubernetes clusters.

Can you show me one of your current projects?

Yes. I'm very excited to show you a website we just launched. It's called data.geotab.com. Geotab sees the
value in big data, and we want to show that the value of big data can definitely empower business. We're not just about telematics, and telematics is not just about GPS position. We have the engine diagnostics from the vehicle's engine computer, and we have all kinds of other data that comes from the auxiliaries. Geotab provides an open platform so businesses can use the data, and even marry it with other data sources they have, to create tools, reports, or any intelligence out of that data. As a starting point, we produced a set of intelligence datasets to show people the kinds of things that we can do with the data.

Aggregating data across multiple customers?

Yes, across multiple customers, because we want to make sure there are no privacy or confidentiality issues, so that customers and users don't need to worry about their data.

And we can see what's happening within cities?

That's right. Either the government or the people can leverage this data to build solutions for the smart city. For example, we analyze which areas have a high risk of incidents.

So you're empowering smart cities, governments, and drivers to see where the dangerous areas are. Can I see the website?

Yes, of course: data.geotab.com. Within the website you can find the data solutions, which are the ones I mentioned, the intelligence datasets we created as a starting point. We
have three categories of those reports. One is weather. Believe it or not, the vehicles are actually collecting the temperature and the barometric pressure. We can use this data to report real-time and hyperlocal temperatures.

You're getting hyperlocal weather from all of these trucks?

Yes, at a granularity that a regular weather station might not be able to provide. We also have urban infrastructure; the high-risk driving areas are one of those datasets. The third category is location analytics: for example, where are the most popular gas stations, and what are the gas station types? Maybe some gas stations are exclusively for trucks and others are for passenger cars. We have that data and can help users locate the proper gas station. And also parking lots: interestingly, you can see things like where drivers circle looking for parking.

Show me a little more about the urban infrastructure. What kind of data are you collecting?

There is one dataset that I produced, which is the hazardous driving areas. As I mentioned, we're collecting accelerometer data. Our firmware engineers calibrate and clean the data, and we also have a whole bunch of talented engineers who do lots and lots of experiments in real cars. The purpose is to find a threshold for the acceleration so that we can detect that, right now, the driver is doing hard braking, harsh acceleration, or harsh cornering.

And then you're counting the number of events?

Yes. And, as we know, the bigger the city is, the more incidents there will potentially be. So we count the traffic flow within the area and use that to normalize the incident count, and then we come up with a score to measure the level of risk.

Can we take a look at the data?

Yes. As a user, you can give your email address to register on this website.
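The scoring idea described above (flag harsh events with an acceleration threshold, count them, then divide by traffic flow) can be sketched roughly as follows. This is only an illustration under stated assumptions: the transcript does not give the threshold values, the units, or the exact formula, so `HARSH_BRAKE_G`, `HARSH_CORNER_G`, `Sample`, and `risk_score` are all hypothetical names and numbers.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical thresholds in g. The transcript says thresholds were found
# through in-vehicle experiments but does not state their values.
HARSH_BRAKE_G = -0.5
HARSH_CORNER_G = 0.45


@dataclass
class Sample:
    forward_g: float  # longitudinal acceleration (negative means braking)
    lateral_g: float  # side-to-side acceleration


def is_harsh(sample: Sample) -> bool:
    """Return True when a sample crosses the braking or cornering threshold."""
    return (
        sample.forward_g <= HARSH_BRAKE_G
        or abs(sample.lateral_g) >= HARSH_CORNER_G
    )


def risk_score(samples: Iterable[Sample], vehicles_through_area: int) -> float:
    """Harsh samples per vehicle that passed through the area.

    Dividing by traffic flow keeps a busy area from looking dangerous
    merely because more vehicles drive through it.
    """
    if vehicles_through_area <= 0:
        return 0.0
    incidents = sum(1 for s in samples if is_harsh(s))
    return incidents / vehicles_through_area
```

A real pipeline would also merge consecutive samples that belong to one event; this sketch counts each flagged sample separately to stay short.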
And then, within maybe several minutes, you will get a response email. Just follow the instructions and you can get into our system to get the data.

So anyone who registers and agrees to the license of this data can take a look at it and find the most dangerous areas?

That's right.

What would a query look like?

In the white paper and on the website there is an introduction to what this dataset does and how people can use it to produce a whole bunch of insights. We listed several examples here. As the first example, we're looking at the cities prone to hazardous driving in Ontario, Canada.

But I live in California. What if I wanted to see the most hazardous areas in California?

That would be interesting. Suppose I'm already in the BigQuery dataset.

You already have permission to look at this table.

Let's run it. Within seven seconds we get the results.
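The query itself is not shown in the transcript. As a rough sketch of the aggregation it appears to perform (filter to one state, sum incidents per city, rank the cities), here is the same logic over in-memory rows. The field names and every value in `ROWS` are made-up placeholders, not the published schema or real results.

```python
from collections import Counter

# Placeholder rows standing in for the published table. Field names and
# numbers are invented for illustration only.
ROWS = [
    {"city": "City A", "state": "State X", "incidents": 100},
    {"city": "City A", "state": "State X", "incidents": 50},
    {"city": "City B", "state": "State X", "incidents": 40},
    {"city": "City C", "state": "State Y", "incidents": 900},
]


def top_cities(rows, state, limit=5):
    """Sum incidents per city within one state and return the highest first."""
    totals = Counter()
    for row in rows:
        if row["state"] == state:
            totals[row["city"]] += row["incidents"]
    return totals.most_common(limit)
```

In BigQuery the same shape would be a `WHERE` on the state, a `GROUP BY` on the city, and an `ORDER BY` on the summed incident count.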
And you can see that the top city is Los Angeles.

Because it's one of the biggest cities, too.

Yes, followed by Long Beach, San Francisco, San Diego, and Oakland.

And if we wanted to find the points that were the most dangerous?

Okay, if you want to look at Los Angeles...

How much data is this query going over?

The dataset itself is not very big. It's megabytes.

So the size of this data is not what's impressive. These tables are about ten megabytes, but what's impressive is how you produce them.

Right. In order to produce this small table, we actually run analysis based on at least one year of GPS data, one year of acceleration data, and maybe other datasets we need. So in the background there will be over 50 terabytes of data supporting this.

So you're able to do everything within BigQuery: you can analyze those 50 terabytes, publish the new tables, and make these tables available for anyone who wants to access them.

That's right. So here are the columns. We have the average latitude and average longitude; this is the average location of the incidents happening within the scope of the area. The size of the area is around 150 meters by 150 meters.

A 150-meter square.

Yes, and it also gives the boundary points of this square; we give four points. And also the city, the county if it's in the States, the state or province name, and the country name or country code. Then there is the severity score that I mentioned, which is the measurement we came up with, and the incident total, the total number of incidents.

So you have one count. The first row in this case is the area with the maximum severity score?

Yes, that's right. We also break down the incidents by vehicle type, such as passenger car, heavy truck, on-road truck or on-road vehicle, off-road vehicles like tractors and construction vehicles, or electric
vehicles, and even natural gas vehicles. So in the other columns we are capturing all kinds of more granular data.

So you are able to see how traffic conditions vary depending on the type of vehicle?

Yes, and people may also be interested in which types of vehicles are creating the most incidents in certain areas, so they can base their analysis on more granular data.

And in the first row you have the most hazardous one, and in the second row we have one that has had more incidents. What if we wanted to look at that?

Yeah, let's copy the incident position and open Google Maps. With Street View we can jump to the exact place. It looks like this. Let's turn around.
A distribution center or a construction area, with a driveway onto the road without any traffic light control.

Okay. Show me a little more about what's happening behind the scenes to produce these tables.

As I mentioned, we use notebooks to explore, to write queries, to look at the results, and to actually run the query jobs. First we need to calculate the traffic flow to normalize the data. We need to count the incidents from the raw accelerometer data that comes from the device. Also, a very important step is that we remove the identifiable information from this data, and then we aggregate. And we set up the jobs to run in the background, month by month and day by day, to collect data and run those reports.

That's how you end up with a query like this one.

Yeah. We're able to leverage the power of BigQuery. We can write user-defined functions, UDFs, to set the custom date that we want to look at. Because these queries run as jobs, we cannot manually set a date, so we use a UDF to get today's date. And with these subqueries you can make the queries look beautiful and easy to manage.

And then for each geohash, for each hour, you're getting the data points?

Yes. We were able to use things such as array aggregation functions, so we can easily combine and manipulate many data points.

How much data does this query go through?

For one day of GPS data, we'll probably have over 15 terabytes.

Fifteen?

Yes. And these queries can finish within minutes.

Minutes?

Yeah.

It's really amazing, everything you're doing. It's really good. And that's how you are able to handle the complexity: you run this within Airflow?

Yes, that's right. There are lots of such queries running in the background, and we manage them in a dependency diagram in Airflow so we can monitor them.

And what are the plans for the future?
Definitely more intelligence datasets are going to come out.

More intelligence data, nice.

And also, as you saw, these jobs are batch jobs that run the analysis based on historical data. For the future, right now I'm working on streaming analytics. We're hoping that in the near future we can really see the city map in a real-time fashion. For example, you'll be able to see what the traffic flow is right now on the roads.

So you will be able to produce streaming analytics with Dataflow?

With Dataflow, yes, and Apache Beam and Kubernetes. And also my coworkers are working on a machine learning project to learn the driving patterns of the fleet. Once we know them, we can help users compare fleets; we can kind of benchmark the work to improve productivity and efficiency, and also safety and compliance.

So with machine learning you will be able to extract more knowledge, produce actionable advice, improve safety, and improve productivity.

Yes. We expand our insights. We always did it, and we did it well, but with machine learning and big data we're going to have the capability to do it with deeper insights.

You know, I'm really impressed by how you went from this device, to disparate databases, to bringing all of this together in one place, and by how the platform has grown over the last four years. It's impressive. I'm so glad you wanted to share this data and analysis with me.

Yeah, because Geotab believes in management by measurement, and we want to help other businesses do the same thing. Our decisions are driven by data, and we're going to help people do that too.

That's really awesome. So thank you for joining us, everyone. You can follow me at Felipe Hoffa, and stay tuned for more videos. Stay cool.
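The conversation mentions that the query jobs have dependencies and are orchestrated with Apache Airflow. The core idea, running each job only after the jobs it depends on, can be sketched with the Python standard library alone. This is not Airflow's API and not Geotab's actual job graph: the job names in `DEPENDENCIES` are hypothetical, loosely following the steps named in the interview (traffic flow, incident counting, scrubbing the data, aggregation).

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key maps to the set of jobs it depends on.
DEPENDENCIES = {
    "traffic_flow": set(),
    "count_incidents": set(),
    "scrub_data": {"count_incidents"},
    "aggregate_by_area": {"traffic_flow", "scrub_data"},
    "publish_table": {"aggregate_by_area"},
}


def run_order(dependencies):
    """Return one valid execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for job in run_order(DEPENDENCIES):
        print(job)
```

A scheduler such as Airflow adds what this sketch leaves out: timed runs, retries, and a view of which dependency failed.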
2018-10-19 14:04
Awesome!
Haha she's my junior high and high school classmate:)
Wow, this cool hat at the beginning of the video