Run your Business with Real-time Insights from Data (Cloud Next '18)
I'm really excited to be here to talk about a topic I'm personally quite passionate about: taking data as it's being produced and deriving insights from that data in real time to drive business decisions and actions.

Let me set the agenda for the next 45 to 50 minutes. This is a 100-level, introductory talk on real-time stream processing, with a focus on the business use cases it enables. We'll cover common stream processing architectural patterns, and for each pattern I'll pair it with a few real-world use cases that leverage it, so this is a very pragmatic talk: pragmatic patterns and use cases. Let me also say what we will not cover. We won't go under the hood and look at how the various stream processing technologies actually work; we'll look at how to use them for your use cases. We also won't be looking at snippets of code; we'll stay at the level of architectural patterns.

The second half of this talk is the part I suspect will be more compelling to you. We're fortunate to have Sharif from GoJek join us to talk about various stream processing business use cases implemented at GoJek; many of these were actually implemented by Sharif, who is a data engineer there. GoJek is a fascinating, fast-growing company based in Indonesia, with a presence in other parts of Southeast Asia as well. They have a broad set of offerings, and I'll let Sharif talk about GoJek in more detail, but as a preview, most of their offerings are in ride-hailing and in logistics and delivery. So I'm really excited to have Sharif here to talk about how they've implemented business use cases that leverage real-time stream processing, and toward the end we'll leave a few minutes for Q&A.

Before we dive into the architectural patterns, let's take a step back and look at the big building blocks of most end-to-end real-time stream processing applications. At the very left we have the sources of data: the applications that are generating the continuous streams of data we'll be processing. The first thing you want to do is take those streams of data and write them to a reliable, scalable, high-throughput streaming data channel; think of this as the big pipes through which streaming data moves. The two most common options are Apache Kafka, a popular open source option that is available on Google Cloud, and Pub/Sub, which is Google's fully managed and serverless offering.

After you've ingested the data, that data is often in a raw format that isn't amenable to analysis, so you want to transform it into a format that is much better for analysis. You do that in a real-time stream processing engine, and on Google Cloud the two recommended options are Dataproc and Dataflow.
Dataproc is Google's managed offering for Apache Spark, the popular open source distributed processing engine. Dataflow is Google's cloud-native solution for both stream and batch processing, and it is a fully managed and serverless offering.

After you've transformed the data, it's ready to be analyzed. You can do the analysis in Dataflow or Dataproc itself; both offer fairly flexible programmatic SDKs, so you can do arbitrary analysis and processing within them. But another common pattern is to take the transformed data and write it into BigQuery. BigQuery is our highly scalable, fully managed, SQL-based data warehouse in the cloud. Once the data is written to BigQuery, you can issue SQL queries against it and analyze the data.

So at this point we have analyzed the data and likely derived the insight that's of interest. But the whole point of real-time stream processing is to make that insight, or the results of the analysis, actionable in real time, which invariably means there has to be some end-user-facing application where the insight or the results are served. To do that, we write the results to data stores that can serve the data. Common options here are NoSQL stores such as Bigtable and Datastore; Cloud Spanner, a transactional database that is increasingly popular for stream processing use cases; and of course BigQuery, because when a SQL query is issued to BigQuery, say through the JDBC client, it immediately serves the results. So you need some system to serve the results to the end-user applications, and that happens through BigQuery, Bigtable, Datastore, or Spanner. There are other options as well, such as Cloud SQL, but in stream processing these four tend to be the common ones.

I do want to highlight that on Google Cloud these tools are seamlessly integrated, so you can focus on your application logic; the APIs are neatly integrated for you to read data from one system and write it to another. The other thing I want to highlight is that most of these tools are managed, and some of them are serverless. A managed offering is one where you as a user do not have to worry about spinning up clusters, managing clusters and VMs, or scaling up and down. A serverless offering is one where you can forget there's a cluster in the background at all: you just submit your workload, and we take care of spinning up VMs on your behalf, scaling them up and down, and so forth. So it's really compelling, because these tools are all seamlessly integrated, and many of them are managed and serverless. With that, let's dive into some specific patterns.

First, let me set some context. For simplicity, when I talk about a pattern, I'll show an architectural diagram that focuses on just that pattern. To keep things consistent, we'll use Pub/Sub as our default streaming data channel and Dataflow as our default stream processing tool of choice.
I do want to highlight that in real-world use cases you will actually be combining a number of these patterns, but here, just for simplicity and so the architectural diagrams are easy to read, I'll highlight one pattern at a time. I want to repeat that real-world use cases will clearly be a combination of these patterns, and when you see Sharif's talk that will become very apparent.

Before we move on, one piece of advice: always archive the raw data. With stream processing, mistakes will happen, and often mistakes happen because of errors in user code. If you have not archived the data, it can become difficult to recover; but if you have archived it, you can always backfill and recover by running a batch job. So as a rule of thumb: be safe and archive the raw data.
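As a concrete illustration of that advice, here is a minimal sketch of an archival pipeline using the Apache Beam Python SDK: it reads the raw Pub/Sub messages, groups them into fixed windows, and writes them untouched to Cloud Storage. The subscription, bucket path, and window size are hypothetical placeholders, not values from the talk.

```python
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

# Hypothetical names, used only for illustration.
RAW_SUBSCRIPTION = 'projects/my-project/subscriptions/raw-events'
ARCHIVE_PATH = 'gs://my-archive-bucket/raw/'

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | 'ReadRaw' >> beam.io.ReadFromPubSub(subscription=RAW_SUBSCRIPTION)
     | 'BytesToText' >> beam.Map(lambda msg: msg.decode('utf-8'))
     # Five-minute fixed windows so archive files are finalized regularly.
     | 'Window' >> beam.WindowInto(window.FixedWindows(5 * 60))
     | 'WriteArchive' >> fileio.WriteToFiles(
           path=ARCHIVE_PATH,
           sink=lambda dest: fileio.TextSink(),
           shards=1))
```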
Of course, you don't need to keep the archived data forever; keep it for a few months, and once you know things have been working fine you can discard it. Archiving the data usually costs very little.

All right, with that, let's dive into the first pattern, and the very first pattern I want to talk about is very simple. It just entails reading the events as they come in and writing them directly into BigQuery. BigQuery has an API called the BigQuery streaming API; it's a high-throughput streaming API, and as soon as you write an event into BigQuery through this API, any SQL query issued to BigQuery after that will return results that incorporate the new events. So in many cases, to do real-time processing of data, all you have to do is stream the data into BigQuery as it comes in.

Of course, that won't always work. This works when your data comes in in a JSON format and the fields of the records match the columns of your table. There are non-trivial use cases where that is the case: you have JSON data coming in that maps to the fields of your table, you write it in, and you're good to go. Often, real-time stream processing is as simple as that, and I did mention this is a pragmatic talk; I'm not always going to get into sophisticated stream processing, because in many cases this is all you need.

But of course that's not going to be true all the time. In most cases you will need to do some processing of the data, because it comes in in a raw format and you want to apply some transformations to convert it from whatever raw format it arrived in into a slightly more structured format that is amenable to analysis. In many cases you will get malformed data, and you want to filter out the malformed records. As you do the transformation and cleaning, you might convert data from strings into more structured fields such as date/time fields, ints, and floats. You do that in Cloud Dataflow, where you can take the incoming data and process each event one by one. Here we're talking about something fairly simple, where you're not looking at multiple events in aggregation; you're looking at each event in isolation.
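Here is a minimal sketch of that per-event pattern with the Apache Beam Python SDK: read JSON events from Pub/Sub, drop malformed records, coerce a couple of fields to typed values, and stream the rows into BigQuery. The subscription, table, and field names are hypothetical, shown only to illustrate the shape of the pipeline.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

SUBSCRIPTION = 'projects/my-project/subscriptions/orders'  # hypothetical
TABLE = 'my-project:analytics.orders'                      # hypothetical

def parse_event(raw_bytes):
    """Parse one event; return None if it is malformed so it can be filtered."""
    try:
        event = json.loads(raw_bytes.decode('utf-8'))
        return {
            'order_id': str(event['order_id']),
            'amount': float(event['amount']),   # string -> float
            'created_at': event['created_at'],  # ISO-8601 timestamp string
        }
    except (ValueError, KeyError):
        return None

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | 'Parse' >> beam.Map(parse_event)
     | 'DropMalformed' >> beam.Filter(lambda e: e is not None)
     | 'StreamToBQ' >> beam.io.WriteToBigQuery(
           TABLE,
           schema='order_id:STRING,amount:FLOAT,created_at:TIMESTAMP',
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
```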
Of course, in many cases the events coming in will not have all the information you need to do your analysis, so what you often need to do is join the incoming data with relevant metadata that lives in other data stores. Common stores that hold this metadata might be a BigQuery table, Bigtable, Spanner, or even other relational databases like Cloud SQL. You join the incoming data with relevant metadata from those stores, and now you have enriched your data. At this point you have your transformed, cleaned, filtered, and enriched data; you write it into BigQuery and you're ready for analysis.

I do want to highlight that systems like Dataflow provide SDKs with very easy-to-use operators, so you merely invoke an operator to do transformations, filtering, or joins, and you typically only have to provide a small snippet of code containing the custom logic that determines what you want to do with the events. The SDKs make all of this very easy, and because these systems are integrated, moving data from one place to another is also incredibly easy thanks to the pre-built APIs.
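As a rough illustration of the enrichment step, here is a sketch, again with the Beam Python SDK, that joins a stream of events with a small metadata table loaded from BigQuery and passed in as a side input. The table, query, and field names are hypothetical; for very large or fast-changing metadata you would typically do per-element lookups against Bigtable or a similar store instead.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

SUBSCRIPTION = 'projects/my-project/subscriptions/orders'                 # hypothetical
METADATA_QUERY = 'SELECT product_id, category FROM analytics.products'    # hypothetical

def enrich(event, product_categories):
    # Attach the product category looked up from the metadata side input.
    event['category'] = product_categories.get(event['product_id'], 'UNKNOWN')
    return event

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    # Small, slowly-changing metadata read from BigQuery once.
    metadata = (p
        | 'ReadMetadata' >> beam.io.ReadFromBigQuery(
              query=METADATA_QUERY, use_standard_sql=True)
        | 'ToKV' >> beam.Map(lambda row: (row['product_id'], row['category'])))

    (p
     | 'ReadEvents' >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'Enrich' >> beam.Map(enrich,
                            product_categories=beam.pvalue.AsDict(metadata)))
    # ...followed by WriteToBigQuery, as in the previous sketch.
```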
Now let's talk about business use cases. When it comes to writing data into BigQuery in real time, this is applicable across industries: it applies in any situation where you're using metrics and dashboards to run your business, which increasingly is most businesses. In many cases you just stream the data in, and now, when you issue a BI query, the results reflect what happened just now, not what happened yesterday. I want to contrast this with what happens traditionally, where you might have a daily ETL job (ETL is the term commonly used) that takes your raw data, does a whole bunch of transformations in batch, and then loads it into your analytics data warehouse. Now, don't get me wrong: there will still be use cases where you need that batch processing, because the analysis you're doing in that job may not be amenable to a streaming pattern. But if you look at your ETL jobs, you'll find that a significant chunk of them can be converted to streaming jobs, and if you convert them, the dashboards powered by that data reflect the state of the world as of now, not as of yesterday or last week.

Very cool. So now let's look at a slightly more sophisticated pattern, where you're actually looking at a whole bunch of events together. In many scenarios, particularly when you're doing real-time analysis, what you're interested in is what happened in a very recent, or the most recent, window of time: what happened in the most recent 30 seconds, or the most recent 5 minutes, or maybe even the last couple of hours. The key point is that you want to know what happened in a recent window of time, and as time proceeds, you want to constantly analyze what happened within the most recent window. To do that you use fixed or sliding windows. There's a subtle difference between the two: fixed windows don't overlap, so one window ends and the next window begins, whereas with sliding windows you have a larger window that moves forward a little bit at a time, so there is overlap across subsequent windows. Depending on your use case, you use one or the other.

Essentially, you pick the type of window you want, you invoke that window within your code, and you provide another snippet of code that determines how you want to process the events that fall into a particular window. The framework handles the rest: it handles the creation of the windows and the collection of the events that fall into each window, it keeps track of time, and as soon as a window completes it calls your snippet of code to analyze the events that fell into that window. So the framework simplifies the windowing process significantly.
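To make the mechanics concrete, here is a minimal sketch in the Beam Python SDK that counts events per key over a sliding window; the subscription name, key field, window sizes, and output table are hypothetical.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

SUBSCRIPTION = 'projects/my-project/subscriptions/search-events'  # hypothetical

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'KeyByTerm' >> beam.Map(lambda e: (e['search_term'], 1))
     # A 2-hour window that slides forward every 10 minutes, so consecutive
     # windows overlap. Swap in window.FixedWindows(600) for non-overlapping
     # 10-minute windows instead.
     | 'SlidingWindow' >> beam.WindowInto(
           window.SlidingWindows(size=2 * 60 * 60, period=10 * 60))
     | 'CountPerTerm' >> beam.CombinePerKey(sum)
     | 'Format' >> beam.Map(lambda kv: {'search_term': kv[0], 'count': kv[1]})
     | 'Write' >> beam.io.WriteToBigQuery(
           'my-project:analytics.popular_searches',   # hypothetical
           schema='search_term:STRING,count:INTEGER',
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```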
So now let's look at the use cases this enables, and there are a whole bunch of them. Let me start with retail and e-commerce, because those tend to be use cases that are easy to understand and empathize with. With e-commerce, let's say you want to keep track of popular items: you want to know which items are becoming popular as soon as that trend starts to surface, and you may not really care about the popularity of an item last week or yesterday, because you want to catch trends early. So you keep track of the searches people are issuing on the e-commerce website, and you analyze the searches that came in within, say, the past couple of hours; then you know which products are popular now, and you can match that against your inventory. If demand is higher than the inventory can keep up with, you can do dynamic pricing.

A similar use case is product quality issues: a new batch of products came in with some defects, so you're now seeing a high number of complaints or returns, and you want to catch that very quickly. Again, you're not really interested in the number of returns or complaints that came in last week; you want to know what happened in the past few hours. Fixed and sliding windows enable that. There are similar use cases in IT operations, where you're tracking metrics and watching how a metric is moving over a recent window of time, and you can contrast that with historical information. It's also particularly common in finance, where many stock trading strategies look at multiple windows of time simultaneously to see how a stock is moving. So there are a bunch of use cases that leverage fixed and sliding windows.

Let's cover one more use case before we move on. There are certain use cases where you're not necessarily interested in what happened in the most recent window; instead, you want to do some smoothing and denoising. This is particularly true in IoT, where you have a lot of cheap sensors sending information, and the data coming in is often very noisy. You don't want to be making big decisions based on the value of a single sensor reading. An easy way to deal with that is to define a reasonable window of time and simply average out the sensor values within that window; that essentially serves as a simple denoising and smoothing technique.
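For example, here is a small sketch of that smoothing idea in the Beam Python SDK: sensor readings are keyed by a sensor ID and averaged over one-minute fixed windows. The subscription and field names are hypothetical.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

SUBSCRIPTION = 'projects/my-project/subscriptions/sensor-readings'  # hypothetical

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'KeyBySensor' >> beam.Map(lambda r: (r['sensor_id'], float(r['value'])))
     # Average each sensor's readings over non-overlapping one-minute windows
     # to smooth out noisy individual readings.
     | 'OneMinuteWindows' >> beam.WindowInto(window.FixedWindows(60))
     | 'MeanPerSensor' >> beam.combiners.Mean.PerKey()
     | 'Format' >> beam.Map(lambda kv: {'sensor_id': kv[0], 'avg_value': kv[1]}))
    # ...the smoothed values would then be written to BigQuery or a serving store.
```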
This kind of smoothing is common in IoT use cases, and it's applied in other, similar settings as well.

Very cool, so moving on. Now let's talk about a fairly distinct but very popular type of windowing called session windowing, which is useful for consumer internet and mobile applications. Essentially, session windows are used for collecting your click logs: all the activity that users perform on your web or mobile application is streaming in, and what you want to do is isolate each user session so you can analyze what happened within that session. This is different from fixed and sliding windows because the windows are not of a fixed size, and you also have a separate window per user. All you have to do is tell the framework which field in your incoming events corresponds to the user ID, and the framework will automatically split the events based on that user ID. Then you pass another parameter, which is the duration of inactivity that signifies the end of a session; there are many ways a session can end, but the most common one is that the person is simply no longer active in your application, so you consider the session done. You specify these two parameters, the key that identifies the user and the duration of inactivity that ends a session, and the framework handles the rest: it takes the events, splits them by ID, detects when a window has ended, and as soon as it has, invokes your snippet of code.

No surprises here: this is used by consumer internet and mobile applications to understand how users navigate through the application. Because you get all the events that fell within a session, you can understand that the user came in, issued a search query, clicked on a certain button, stayed on a certain page for a certain amount of time, and so on. You analyze each session, write the results to BigQuery, and now you have an aggregated view of how users navigated through your application. This is a very common use case, particularly when you're deploying new versions of your application or doing A/B testing: you want to identify mistakes early, such as something you shipped that is leading to a significant drop in a usage pattern you care about. So as soon as you run your A/B test, you have a streaming job analyzing user sessions, identifying issues quickly, and maybe rolling back. It's a very common and fairly distinct use case that was worth calling out.
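Here is a rough sketch of session windowing in the Beam Python SDK: click events are keyed by a user ID field and grouped into sessions that close after 30 minutes of inactivity. The subscription, field names, and gap duration are hypothetical.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

SUBSCRIPTION = 'projects/my-project/subscriptions/click-events'  # hypothetical

def summarize_session(user_and_events):
    user_id, events = user_and_events
    # Custom per-session logic goes here; a trivial summary as an example.
    return {'user_id': user_id, 'num_events': len(list(events))}

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'KeyByUser' >> beam.Map(lambda e: (e['user_id'], e))
     # One session window per user, closed after 30 minutes of inactivity.
     | 'Sessionize' >> beam.WindowInto(window.Sessions(30 * 60))
     | 'GroupPerUserSession' >> beam.GroupByKey()
     | 'Summarize' >> beam.Map(summarize_session))
    # ...session summaries would then be written to BigQuery for analysis.
```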
So that was windowing; let's keep moving. Now let's talk about a topic that is likely to be popular at Google Cloud Next, since machine learning is clearly a theme of the conference: in many scenarios you want to apply machine learning within a streaming context. Actually, a quick show of hands, I'm curious to know how many folks in the audience are interested in applying machine learning to streaming data. All right, that's as I suspected; there's a lot of interest.

In this particular scenario I'm talking about users training their own custom models for their use case, and that training, the heavy lifting, happens in a batch format. It happens not within the streaming engine but in something like Cloud Machine Learning Engine (CMLE), which lets you train your own models.
In this particular diagram I'm using TensorFlow, because that was the most recognizable logo I could use to signify machine learning, but of course you can train other models with scikit-learn, XGBoost, or a bunch of other frameworks. The heavy lifting happens when you train the machine learning model, but once it's trained, you can leverage the trained model within your streaming application to do inference, or scoring, on the events as they stream in. Of course, before you can do the scoring, and I want to reiterate here that a real-world use case will employ a bunch of the other patterns we mentioned, you will typically be doing a whole bunch of transformations. Joining and enriching the data from external metadata stores is very common, because you want to fetch features; you do the transformation, create the feature vector, and pass it to the model for scoring.

Now, some of you may be thinking that this doesn't sound very remarkable, because it's rather simple to take a machine learning model and invoke it within the context of a streaming application, and that is true: as an architectural pattern it is not tremendously remarkable. But I wanted to reaffirm that this is happening, that it's a very common and very easy pattern to implement. Often, within your streaming application, you can constantly check whether a new model has been produced and pick it up when it is; customers even run A/B tests where they use several models to score different events and see what happens. I just want to reaffirm that this is a very easy pattern to implement with GCP tools, particularly when you're using something like Cloud Machine Learning Engine, and customers are doing it; it's an increasingly common pattern.

So let's look at a few use cases. In e-commerce and finance, real-time fraud detection is sort of the poster child for real-time stream processing plus machine learning, so I won't beat that horse any more and will move on. There are other use cases, for example digital advertising, where you have click and impression information streaming in all the time, and you can use it in real time to determine which ads have the highest likelihood of being clicked: as the data streams in, you apply the machine learning model and determine the right ad to show to subsequent users. Now, these are fairly well-known use cases, because digital advertising has been around for a while. It's particularly gratifying to talk about use cases in healthcare, where literally lives are being saved. A very commonly referenced use case is sepsis identification. Sepsis is a condition that can occur after fairly invasive operations; it's a type of infection, and historically nurses would have to keep checking in on the patient periodically to see how they're doing and whether they're at risk for sepsis.
But now, because patients are connected to various devices and their vital statistics are being monitored, as that data streams in you can apply a machine learning model that determines who is at risk for sepsis, and if a patient is identified, a pager alert can go out to a nurse. The nurses don't have to proactively keep checking, and you can offer more proactive care to post-operative patients. That was a particularly gratifying use case to talk about.
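To give a feel for what the scoring step can look like, here is a sketch of a Beam DoFn that batches events, builds simple feature vectors, and calls a model deployed for online prediction on Cloud ML Engine. The project, model, subscription, and feature names are hypothetical and error handling is omitted; it's only meant to show the shape of the pattern, not a production implementation.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class ScoreWithDeployedModel(beam.DoFn):
    """Scores batches of events against a model served by Cloud ML Engine."""

    def __init__(self, project, model):
        self._name = 'projects/{}/models/{}'.format(project, model)
        self._service = None

    def setup(self):
        # Build the API client once per worker, not once per element.
        from googleapiclient import discovery
        self._service = discovery.build('ml', 'v1')

    def process(self, batch):
        # Turn each event into the feature vector the model expects
        # (feature names here are purely illustrative).
        instances = [{'amount': e['amount'], 'merchant_id': e['merchant_id']}
                     for e in batch]
        response = self._service.projects().predict(
            name=self._name, body={'instances': instances}).execute()
        for event, prediction in zip(batch, response['predictions']):
            event['fraud_score'] = prediction
            yield event

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(
           subscription='projects/my-project/subscriptions/transactions')  # hypothetical
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'Batch' >> beam.BatchElements(min_batch_size=10, max_batch_size=100)
     | 'Score' >> beam.ParDo(ScoreWithDeployedModel('my-project', 'fraud_model')))
```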
In the previous example the user had to build the machine learning model, but that's not always necessary. At Next you may have visited a number of talks where we spoke about our machine learning APIs, which do the machine learning for you: you give us the data, the images or the speech and so forth, and we apply the machine learning and return the results. Invoking these APIs from a streaming application is again an increasingly common pattern. And again it may seem unremarkable, because the heavy lifting is happening in the machine learning APIs, and that is true; come to think of it, it's a fairly easy pattern to implement, but it's an increasingly common one. One piece of advice: when you're calling these APIs, for good performance you want to limit the amount of data going over the network, so what people commonly do is use a fixed window (or sometimes a sliding window, but usually a fixed one) to gather a group of events, and then call the APIs with a batch of data rather than calling the API for each individual event.

It's a very simple pattern, but it enables some fairly compelling use cases, and these are use cases customers have actually implemented. One is multiplayer online gaming sessions, where a bunch of folks are wearing their headsets and microphones and playing, say, a first-person shooter or World of Warcraft or what have you, constantly talking to each other as they play against each other. Sometimes these sessions turn verbally abusive; this is a common problem. Now, with the Speech-to-Text APIs, as that speech comes in you invoke the API to convert the speech to text, then apply NLP techniques to that text to identify when it's becoming abusive, and then you can take corrective action in that session. Another use case is finance, where for trading strategies you want to consume text news about various organizations of the world (Thomson Reuters, for example, serves a whole bunch of financial news). You consume that news, invoke the NLP APIs to identify the entities mentioned in the articles and the sentiment of each article, and that becomes a rich source of information for trading strategies. Another example, which is I guess the example du jour in today's world, is home security, where your security camera feeds are coming in and you can invoke the Vision APIs to identify entities in the image frames and use that to dynamically determine when there might be a physical threat outside your home. So, a very simple pattern, but a very useful one.
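Here is a sketch of that batching advice in the Beam Python SDK: chat messages are grouped into small batches before being sent for analysis. The `analyze_sentiment_batch` function is a hypothetical placeholder standing in for a call to a pre-trained API such as the Cloud Natural Language API; the subscription, field names, and threshold are also hypothetical.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

SUBSCRIPTION = 'projects/my-project/subscriptions/chat-messages'  # hypothetical

def analyze_sentiment_batch(texts):
    """Hypothetical helper wrapping a pre-trained NLP API (e.g. the Cloud
    Natural Language API); returns one sentiment score per input text.
    A neutral stand-in value is returned here to keep the sketch runnable."""
    return [0.0 for _ in texts]

class ScoreBatch(beam.DoFn):
    def process(self, batch):
        texts = [msg['text'] for msg in batch]
        # One API call for the whole batch instead of one call per message.
        scores = analyze_sentiment_batch(texts)
        for msg, score in zip(batch, scores):
            msg['sentiment'] = score
            yield msg

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'Batch' >> beam.BatchElements(min_batch_size=20, max_batch_size=100)
     | 'Score' >> beam.ParDo(ScoreBatch())
     | 'FlagAbusive' >> beam.Filter(lambda msg: msg['sentiment'] < -0.5))
    # ...flagged messages could then trigger a moderation action.
```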
Now, the last pattern I want to talk about is slightly more sophisticated than the others we've discussed so far; it's called stateful streaming. This is a pattern where, as the events flow by, you maintain some state information about the entities in your events, and you use that state to take some business decision. Again, the key thing here is that the framework really simplifies this: it gives you APIs where, based on a key such as a user ID, you can add state information, which is kept in memory; you can read that state and take action on it, or update it. So you as a user often just have to write small snippets of code to do the add, lookup, and update operations, and the framework handles the rest. The framework makes sure that this state information, which can be arbitrary (it's up to the application to decide what is maintained), is kept in memory and available in a performant fashion. These are often distributed applications, so it also makes sure the state is fault tolerant: if a VM goes down, for example, the state information isn't lost. You as a user just focus on your lookup and update operations.

Let me talk about one use case where this is used. In e-commerce, you have click events, events that tell you about the activity of a user on the e-commerce website. As these events flow by, you can keep information such as: the user searched for these products, added a certain product to their cart, removed a certain product from the cart, and so on. Based on that information, as the next event comes in, you can dynamically determine how to personalize that user's experience. For example, you might determine that the user removed an item from the cart but is still searching for similar items, so let me give them an offer, or let me dynamically adjust the recommendations made for similar products.
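Here is a rough sketch of what that can look like with a stateful DoFn in the Beam Python SDK: events are keyed by user ID, recent cart actions are accumulated in per-key state, and each new event is evaluated against that state. The field names, subscription, and "make an offer" rule are hypothetical.

```python
import json
import apache_beam as beam
from apache_beam.coders import StrUtf8Coder
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.userstate import BagStateSpec

SUBSCRIPTION = 'projects/my-project/subscriptions/click-events'  # hypothetical

class PersonalizeFn(beam.DoFn):
    # Per-user bag of recent cart actions, managed by the framework.
    CART_ACTIONS = BagStateSpec('cart_actions', StrUtf8Coder())

    def process(self, element, cart=beam.DoFn.StateParam(CART_ACTIONS)):
        user_id, event = element
        if event['type'] in ('ADD_TO_CART', 'REMOVE_FROM_CART'):
            cart.add('{}:{}'.format(event['type'], event['product_id']))
        elif event['type'] == 'SEARCH':
            removed = {a.split(':', 1)[1] for a in cart.read()
                       if a.startswith('REMOVE_FROM_CART:')}
            # Hypothetical rule: user removed an item but keeps searching for it.
            if event.get('product_id') in removed:
                yield {'user_id': user_id,
                       'action': 'MAKE_OFFER',
                       'product_id': event['product_id']}

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'KeyByUser' >> beam.Map(lambda e: (e['user_id'], e))
     | 'Personalize' >> beam.ParDo(PersonalizeFn()))
    # ...personalization actions would then be written to a serving store.
```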
So that's what stateful stream processing enables, and there are other use cases as well; for example, we have a customer in mobile gaming that dynamically customizes the difficulty level of the game when it identifies that a user is getting stuck. So that was a whole bunch of streaming patterns, a whirlwind tour of the common ones. Next, please join me in welcoming Sharif to the stage to talk about GoJek and how they're using real-time stream processing.

Thank you. Good afternoon, everyone. I'm Sharif from GoJek, a data engineer on the BI team, and today I want to share with you some real-world examples of GoJek using Dataflow for our streaming and batch data pipelines.

GoJek is an Indonesian technology startup that specializes in transport, food delivery, and payments. GoJek started in 2010 with only a call center and no application, offering a ride-hailing service for motorcycles in Indonesia. In 2015 GoJek launched an app with only three services: GO-MART, GO-FOOD, and the ride service itself. In 2016 GoJek expanded into 50 big cities in Indonesia and has been creating new services since. Today we have 18 products serving very different needs. To give you the picture nationwide: we currently have 1 million drivers completing 3 million trips per day, our application has been downloaded more than 76 million times, and we have more than 200 thousand merchants across 50 big cities in Indonesia.

Let's move to our data growth. In BI, this is our current state as of the second quarter of 2018. Month over month, our data is growing by more than 30 percent. From that data we create more than 700 multi-resolution datasets and more than 20,000 data points. On top of this, our data analysts build Metabase cards and Tableau sheets, more than 13,000 of them, and our users, meaning our analysts, number more than 2,000 at the moment. These numbers are increasing day by day, which means we need a bigger platform and more scale for the data.

So what were the challenges with our previous architecture? Previously we used PostgreSQL, Pentaho, and batch scripts. With that architecture we got reports at day plus one, so we couldn't identify problems at the moment they happened. We also had concerns about processing power, because everything ran in batch and the data was already big: we needed to process over two billion records in one shot, so we needed bigger machines with more memory and CPU, and it took a long time to process the data. Meanwhile, the business needs more real-time data insights.
Previously we used PostgreSQL, and as you may know, PostgreSQL needs maintenance daily, weekly, and monthly: you have to archive your data, you have to vacuum your data, and you have to index your data to keep it fast, and you also need to maintain the PostgreSQL server itself. We also used Pentaho and scripts for our ETL batch jobs, orchestrated by Airflow.

So what next? We moved to the GCP platform. What we want is high-performance capability with minimal operational maintenance, using Dataflow; more granular data, with high velocity and low latency, using stream processing; and the ability to solve these problems with real-time data insights. At the moment we use Stackdriver to monitor our jobs and machines, GCS for our data lake, BigQuery as our database for analytical purposes, Pub/Sub to deliver our streaming messages, and Dataflow for our streaming data pipelines.

Before we move to the use cases, I want to tell you about the GoJek goal. The GoJek goal is our North Star metric. A North Star metric is something you want the whole organization to align behind, so that the entire company knows what the common goal is; at GoJek, that is completed booking transactions. So what are the metrics that matter? If we want to increase completed booking transactions, we have to increase total bookings, we have to take care of allocation, and we have to reduce the cancellation rate. Total bookings are the bookings created by customers; allocation is how we match a customer with a driver; and the cancellation rate covers cases where the customer created a booking but the driver couldn't complete it. For this use case, I want to show you that when there is a mismatch in allocation between drivers and customers, completed bookings won't be reached. So how do we rebalance supply and demand, and how do we view it in real time?

This is the supply and demand use case. As a user, I want to know which locations have mismatched levels of supply and demand in real time, and I want to know particular drivers' positions with real-time data aggregation. If you look at the map in front of you: when I hover the pointer over one of the small dots, which is a driver, I can see that driver's information, the name, the plate number, and the rating. And I want to be able to notify drivers in a low-demand area to move to a high-demand area.

This is the pipeline we built for that. We have three big pipelines here: one is the streaming booking log pipeline; the second is the dimension pipeline, fed from BigQuery; and the third is the streaming driver location pipeline, which we get from the Pub/Sub driver location log, or driver ping. For your information, a driver ping is created every 10 seconds, which means per day we have more than 10 billion ping records.

From the streaming booking pipeline, we consume our Pub/Sub booking log messages and create a model grouped by the S2 cell ID and the booking status, and we use a sliding window; as I mentioned before, with the sliding window we only care about the current window, and we filter to only the 'created' status. In the dimension pipeline we query BigQuery for the location table, the service area table, and the driver profile; each of them becomes a model and is refreshed periodically per window. In the streaming driver location pipeline we consume from the Pub/Sub driver location log, create a model, and use a fixed window. After that, all of them are joined and stored in our in-memory store, and our API calls the in-memory store and builds the map from it.
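As a rough illustration of the booking-side aggregation described here, the sketch below (Beam Python SDK) keys created bookings by S2 cell ID and counts them over a sliding window. The subscription, field names, and window sizes are hypothetical placeholders, not GoJek's actual code.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

BOOKING_SUB = 'projects/my-project/subscriptions/booking-log'  # hypothetical

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    demand_per_cell = (p
        | 'ReadBookings' >> beam.io.ReadFromPubSub(subscription=BOOKING_SUB)
        | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
        | 'CreatedOnly' >> beam.Filter(lambda e: e.get('status') == 'CREATED')
        | 'KeyByS2Cell' >> beam.Map(lambda e: (e['s2_id'], 1))
        # 10-minute window sliding every minute: a continuously fresh view of demand.
        | 'SlidingWindow' >> beam.WindowInto(
              window.SlidingWindows(size=10 * 60, period=60))
        | 'CountPerCell' >> beam.CombinePerKey(sum))
    # demand_per_cell would then be joined with driver locations and served
    # to the map API, for example via an in-memory store.
```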
Now let's move to the implementation. For the implementation, we only had to focus on how we process the data: we just wrote the pipeline using the Apache Beam SDK and deployed it to Dataflow as a Dataflow job, and that's it. We didn't spin up any machines, clusters, or Stackdriver ourselves; all we have to do is push the code, and Beam and Dataflow do the rest, like creating the machines, building the pipeline graph for us, and wiring up Stackdriver by default.

This is the pipeline implementation running. As you can see, this shows the autoscaling capability of Dataflow, which means Dataflow can scale from one worker to several workers based on how big your data is and how heavy your processing is. And this is the map visualization; it's fast-forwarded here, but it actually refreshes every minute.

The insight we got here is that the red circles don't last long, which is good: bookings might spike in an area, but not for long, because drivers take the orders immediately, so the aggregated bookings that appear disappear again after several minutes. But that insight is not enough. What if, in a particular area, there are a lot of cancelled bookings happening that day? If there are many bookings, they can be completed or cancelled; how can we make sure that every booking is completed, not cancelled? That's why we created an extended use case.

This is our extended use case: the cancellation rate use case. Previously we got this chart a day later, at day plus one, and we could only say, "something happened yesterday, there was a spike in cancellations yesterday, but why did that happen?" That was too late, because it was yesterday, and maybe today it won't happen again. So, as a user, I want to be notified when there is a huge number of cancellations at that moment, so I can determine the root cause and fix it as soon as possible.

You already know this pipeline from our previous use case. With this use case I want to show you that in Apache Beam we can create a fan-out from the previous pipeline, so we don't build a new pipeline from scratch. All we have to care about is the streaming booking log pipeline; we don't need the dimension pipeline or the streaming driver location pipeline, so we drop those. After we create the model, which is grouped by booking status and S2 cell ID, we create a fan-out that sinks to a new Pub/Sub stream, and that stream is consumed by our cancellation pipeline.
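A minimal sketch of that fan-out idea in the Beam Python SDK is below: the same parsed booking stream feeds the existing aggregation and, in a second branch, republishes cancelled bookings to a new Pub/Sub topic for the cancellation pipeline. Names are hypothetical.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

BOOKING_SUB = 'projects/my-project/subscriptions/booking-log'      # hypothetical
CANCELLED_TOPIC = 'projects/my-project/topics/cancelled-bookings'  # hypothetical

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    bookings = (p
        | 'ReadBookings' >> beam.io.ReadFromPubSub(subscription=BOOKING_SUB)
        | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8'))))

    # Branch 1: the existing supply/demand aggregation consumes `bookings` here.

    # Branch 2 (fan-out): republish cancelled bookings on a new stream.
    (bookings
     | 'CancelledOnly' >> beam.Filter(lambda e: e.get('status') == 'CANCELLED')
     | 'Encode' >> beam.Map(lambda e: json.dumps(e).encode('utf-8'))
     | 'ToCancelledTopic' >> beam.io.WriteToPubSub(topic=CANCELLED_TOPIC))
```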
The result is what you see here in Stackdriver: for example, if cancelled bookings exceed a threshold of 10 percent of total bookings, it raises an alert. When the metric goes above the threshold, we get an alert like this: "cancelled bookings is more than 10 percent." But again, that's not enough. How do we know the reason after we get the alert? Without the reason, I can't do anything about it; we have to know the reason behind the cancelled bookings.

So there is another use case. As a user, I want to know the reason for cancelled bookings in real time, and I want to notify myself and the merchant if an item may be sold out, or the price differs between the app and the store, or the store is closed outside its schedule.

This is the new pipeline we created: we still consume from our Pub/Sub booking log, we filter by the cancelled status only, and we create a different model, grouped by the S2 cell ID and the cancellation reason, using a fixed window. Then we sink it to our downstream Pub/Sub topic, where we built a notification system, and this Pub/Sub stream can also be synced to BigQuery so our analysts can dig into the results further. This is the JSON format that we sink to the downstream Pub/Sub: it includes the S2 cell ID, the merchant name, the cancellation reason, and how many times it happened that day.
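Here is a sketch of what such a cancellation-reason pipeline could look like in the Beam Python SDK: cancelled bookings are counted per S2 cell and reason over fixed windows and published downstream as JSON. The field names, topics, and window size are hypothetical.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

BOOKING_SUB = 'projects/my-project/subscriptions/booking-log'   # hypothetical
DOWNSTREAM_TOPIC = 'projects/my-project/topics/cancel-reasons'  # hypothetical

def to_json(keyed_count):
    (s2_id, reason, merchant), count = keyed_count
    return json.dumps({'s2_id': s2_id, 'cancel_reason': reason,
                       'merchant_name': merchant, 'count': count}).encode('utf-8')

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(subscription=BOOKING_SUB)
     | 'Parse' >> beam.Map(lambda b: json.loads(b.decode('utf-8')))
     | 'CancelledOnly' >> beam.Filter(lambda e: e.get('status') == 'CANCELLED')
     | 'KeyByCellAndReason' >> beam.Map(
           lambda e: ((e['s2_id'], e.get('cancel_reason', 'UNKNOWN'),
                       e.get('merchant_name', '')), 1))
     | 'FixedWindow' >> beam.WindowInto(window.FixedWindows(5 * 60))
     | 'Count' >> beam.CombinePerKey(sum)
     | 'ToJson' >> beam.Map(to_json)
     | 'Publish' >> beam.io.WriteToPubSub(topic=DOWNSTREAM_TOPIC))
```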
So after these two use cases, I want to share three takeaways, the lessons we learned. With Dataflow we can build streaming data processing, and we were able to solve the same problems with real-time data insights. Dataflow gives us high performance and scalability with minimal operational maintenance; as you saw, Dataflow can scale from one to several workers based on how big your data is and how complex your processing is. And our reports are delivered in less than a day, so our analysts can get more insight from more recent data. That's all from me; I'll hand back over. Thank you very much.

Thank you, Sharif; please do stay on stage. It's always useful and quite fascinating to hear about real implementations from customers, and one thing I want to highlight, which was particularly fascinating, is that they built one complex use case and then found that with small tweaks they could convert it into a new use case. The tools on GCP make that easy. If you want to learn more, please point your browsers to this URL: cloud.google.com/solutions, under the big data category, streaming analytics. From that website there are links to deeper information you might want to access. With that, thank you for listening to our talk; we hope it was useful for you.