Serverless & AI/ML - Pittsburgh ML Summit '19
My name is Charles Bear. I'm a solutions architect here, based in this office, and I have a talk to give on some serverless and AI/ML solutions: one that I developed, and another that we developed jointly here as part of Google Cloud.

The first one is a project I worked on recently. There's a trade-off when you're deciding whether to build AI/ML yourself or to make use of existing services as part of your application, and the point of this project was to make it easy for one of our customers to ingest a whole series of different audio files. These were long audio files. The customer had a platform that accepted audio files from users, but they weren't able to categorize those files in an intelligent way — or they could only categorize subsets of them — because they relied on their users to do the categorizing for them. That wasn't optimal, so we came up with a solution that was pretty interesting.

The main problem, again: they've got a website that allows users to upload any of their audio files, which could be things like podcasts. That content would then be prioritized, categorized, and tagged in different ways to make it easier to surface and for other people to find. The problem is that they essentially relied on users to do that, so the content could be improperly tagged when somebody uploaded it — inadvertently or maliciously — and then you're relying on your users to be able to find it for you. That wasn't an ideal situation for them.
It's also not an easy problem to solve, because in order to do this you would have to go through and process all the audio. You can't easily listen to all of it, but you could write your own algorithm to process that data, translate each audio file into text, and then do something with it. There are a couple of problems with that. First, if you operate any kind of site at scale, you're not just going to have one or two uploads — you're going to have thousands, potentially hundreds of thousands, depending on the popularity of your site. This particular customer had a pretty big scale problem; they had a lot of audio submissions. So you need to be able to do this at scale. The second problem is performance: you need to do this relatively quickly, or you need to decouple it so that you have a whole queue of files that you work through, doing the transcription and categorizing them appropriately. How do you do this at scale while still having good performance? That was the other challenge.

The last problem is that the audio format itself is not conducive to this — you don't have a simple way of understanding it. You have to do a significant amount of translation from the actual hertz that are recorded, and the way that's represented in an audio file, to anything you can categorize. There are essentially two steps: convert the audio to text, understand what kind of text it is and what its context is, and then make a decision based on that about the categorization and what specific action to take. Those were the main problems I had to solve working with this customer.

We came up with this reference architecture — fancy name for a reference architecture — but the main point, if we walk through it, is that we made use of a couple of the machine learning APIs that Google provides. One of those is Speech-to-Text. This should be pretty straightforward: Speech-to-Text takes in either an audio file, or you can actually stream audio back and forth in real time and get responses, and it will convert that audio to text. It seems like a relatively straightforward thing to do, and as a developer you could decide to do it yourself. There are some benefits to doing it yourself — for example, if you're working in a problem or domain space with very specific jargon or terminology. Financial services might be one of those: if your recordings have very specific terms that mean something only in your domain, you may want to build your own model, because the Speech-to-Text API may or may not pick that up. In this case we were essentially focused on podcasts and similar general-domain content, so we didn't need a custom model.
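As a sketch of what calling Speech-to-Text this way looks like, here is the JSON body for the `speech:longrunningrecognize` REST method, built with only the standard library. The bucket path, file name, and function name are illustrative, and actually sending the request would still require an authenticated client or an `Authorization` header.

```python
import json

# Build the JSON body for a Speech-to-Text long-running recognition
# request that points at an audio file already sitting in Cloud Storage.
def build_recognize_request(gcs_uri, language_code="en-US"):
    return {
        "config": {
            "encoding": "FLAC",
            "languageCode": language_code,
            # Word time offsets let each phrase be traced back to a
            # position in the original audio.
            "enableWordTimeOffsets": True,
        },
        "audio": {"uri": gcs_uri},
    }

# Hypothetical bucket and object name, just for illustration.
body = build_recognize_request("gs://podcast-uploads/episode-42.flac")
print(json.dumps(body, indent=2))
```

The response to this method is not the transcript itself but a long-running operation name, which is what gets polled later in the architecture.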
Since we could make some assumptions about not being in a specific business domain, we decided to save a lot of the time and energy of developing our own model by just using the API. It's a really straightforward API: you send in an audio file and, over time, it will go through that audio file and return to you the text contained in it. Built into the platform and that API is the ability to handle that long-running kind of polling for your application. For us that made a really good case for using it, because we didn't have a specific reason to develop our own: performance, scalability, and our ability to meet the requirements were all well suited to Speech-to-Text.

The other API is natural language processing. Once you've got the converted text, how do you categorize it, pull out entities, or run sentiment analysis on it? Again you're faced with the same choice — you could build your own. You could build something like this with TensorFlow, and there are a couple of open-source models you could make use of as well, using transfer learning to train the last bit and serving that as part of your own API. For our purposes, what we wanted was to pull out entities as a way to help categorize, as well as sentiment — whether it was positive or negative — both overall and relative to individual entities. Our Cloud NLP API provides that for us and made it a really straightforward option, so again, weighing it against our requirements for scalability, performance, and the business case, we decided to use it. There are a couple of other things worth noting as I walk through this.
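A minimal sketch of that second step, assuming the Natural Language API's `documents:analyzeEntitySentiment` REST method: one helper builds the request body, and another turns a simplified entity list from the response into candidate tags. The salience cutoff and helper names are my own illustration, not part of the customer's system.

```python
# Body for the Natural Language API's documents:analyzeEntitySentiment
# method, which returns entities plus a per-entity sentiment score.
def build_entity_sentiment_request(text):
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }

# Turn a (simplified) entity list from the response into candidate
# tags, keeping only reasonably salient entities. Threshold is an
# arbitrary illustrative choice.
def labels_from_entities(entities, min_salience=0.1):
    return [e["name"] for e in entities if e.get("salience", 0) >= min_salience]

sample = [
    {"name": "Pittsburgh", "salience": 0.42, "sentiment": {"score": 0.2}},
    {"name": "thing", "salience": 0.01, "sentiment": {"score": 0.0}},
]
print(labels_from_entities(sample))  # only the salient entity survives
```

Filtering on salience is one simple way to keep the "useful" entities and drop the noise the speaker mentions seeing in the demo.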
The glue that stitches this all together is Cloud Functions, used asynchronously. Cloud Functions are very good for building an asynchronous, event-driven type of platform, because they decouple the real-time, synchronous requirements from an application and allow it to be driven by specific events. In this case, because we stitch several different APIs together, we take action when the results of those APIs become available, or on a schedule. That helps from a scalability perspective, because the number of Cloud Functions scales up automatically when we have a large load, and scales down when we don't. Nothing has to run constantly, so we can reduce our costs as well as our usage.

So, spending a minute walking through the architecture: we have audio files, and step one is that somebody uploads an audio file. This could be part of any of your processes, but in our case we assumed this was the entry point — the file is uploaded to our Cloud Storage, and that's the beginning of the business process. Step two: Cloud Storage has a handy feature where, if you put something into a given bucket, it fires a trigger, and that trigger allows you to notice the new object and take action on it. Our Cloud Function is triggered by that event.
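The trigger wiring in step two can be sketched like this, assuming the Python runtime's background-function signature for Cloud Functions; the function name, bucket name, and extension list are illustrative.

```python
# Sketch of the storage-triggered Cloud Function (steps 2-3). It fires
# on a google.storage.object.finalize event for the upload bucket.
AUDIO_EXTENSIONS = (".flac", ".wav", ".mp3")

def on_audio_upload(event, context):
    """Handle one newly finalized object in the upload bucket."""
    name = event["name"]
    if not name.lower().endswith(AUDIO_EXTENSIONS):
        return "skipped"  # ignore non-audio objects
    gcs_uri = f"gs://{event['bucket']}/{name}"
    # The real function would now call speech:longrunningrecognize with
    # this URI and publish the returned operation name to Pub/Sub.
    return gcs_uri

print(on_audio_upload({"bucket": "podcast-uploads", "name": "ep1.flac"}, None))
```

Because the trigger fires once per object, each upload gets its own short-lived function invocation, which is what gives the pipeline its automatic scale-up and scale-down behavior.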
Every time a file is placed in there — a user uploads their file as part of this process — it kicks off this Cloud Function. The Cloud Function then goes from step three to step four and actually calls the Speech-to-Text API. We take that file and send it to the Speech-to-Text API, and because we're sending in a file, it returns a job ID, which I can check over time.

What we did in steps five, six, and seven: once we got that Speech-to-Text job ID back, we needed to keep checking on it, so we would publish it to Pub/Sub. Pub/Sub is another part of the event-driven architecture that lets you build at scale, because you can send individual messages, and then other compute components — Cloud Functions, or really many of the other components of the platform — can listen to those messages. When we publish a message, whatever is listening on that topic gets triggered. So we published the job ID we got from the Speech-to-Text API, and then every ten minutes — this is configurable, of course — something called Cloud Scheduler kicks off a very simple Pub/Sub message to invoke one of your actions. The main point is that all of the job IDs that aggregate over those ten minutes or so in step five are then picked up, because Cloud Scheduler triggers another Cloud Function in step seven. That Cloud Function checks with the Speech-to-Text API and says: hey, are you done with my results, do you have my transcription for me yet? If it does, we store those results in step eight, and then we trigger two more Cloud Functions.
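A toy version of the scheduled checker in steps five through seven: given operation records shaped like the Operations API's responses (a `done` flag that appears once the transcription finishes), split them into finished jobs and jobs to check again on the next tick. The record shapes here are simplified for illustration.

```python
# Partition pending Speech-to-Text operations into finished and
# still-running, based on the "done" flag the Operations API returns.
def partition_operations(operations):
    finished, pending = [], []
    for op in operations:
        (finished if op.get("done") else pending).append(op["name"])
    return finished, pending

ops = [
    {"name": "op-1", "done": True},   # transcription complete
    {"name": "op-2"},                 # still running, check again later
]
done, waiting = partition_operations(ops)
print(done, waiting)
```

In the real pipeline, the names in `done` would have their results stored (step eight) while the names in `waiting` go back on the queue for the next Cloud Scheduler tick.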
Again, because we're storing a file in Cloud Storage, you can use those handy triggers, so we use a trigger here to kick off the NLP API. Using the NLP API is super straightforward: we grab the entities within the speech-to-text results — all the persons, places, and things within that text — so we have access to those. We also have access to the sentiment relative to those individual items, and the sentiment in general for the entire audio component.

The other thing we wanted to do was use something called the Perspective API, which essentially decides whether or not the text you're sending in is toxic to conversations. This is something that has been jointly developed, with Google contributing from the YouTube perspective as well. It's a third-party API, but it shows you a way to integrate your process to look at the content you have: is there something super negative in this discussion, or in this audio, that you may want to tag or otherwise know about? So when we put that file in there, both of those Cloud Functions call the APIs and then store the results in Cloud Storage.
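As a sketch, here is the request body for the Perspective API's `comments:analyze` method asking for a TOXICITY score on one transcript segment, plus a small flagging helper. The threshold helper and its cutoff value are illustrative choices of mine, not something the API prescribes.

```python
# Body for the Perspective API's comments:analyze method, requesting a
# TOXICITY score for one transcript segment.
def build_perspective_request(text, attributes=("TOXICITY",)):
    return {
        "comment": {"text": text},
        "requestedAttributes": {a: {} for a in attributes},
    }

def is_toxic(summary_score, threshold=0.8):
    # Flag segments whose returned summary score crosses the cutoff.
    # 0.8 is an arbitrary illustrative threshold.
    return summary_score >= threshold

req = build_perspective_request("thanks for the great episode")
print(req["requestedAttributes"], is_toxic(0.12))
```

In the demo UI, segments whose scores stay below the cutoff are the ones shown in green.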
Then we put a very simple UI on top, using App Engine. App Engine is essentially our platform as a service — great for running super-scalable or even simple web apps; you just write the code and deploy it. So we built a front end around that. This is essentially the reference architecture. I can spend a minute here — are there any questions on the architecture before we move forward? I've been talking for a few minutes; let's see if anyone has one. All right, OK — well, I can go through a quick demo of it.

Let me see if I can make this a little bit bigger. All right, a little better. This is a very simple UI that lists a series of files we've uploaded and then, based on the results of those files — and again, we ran these through previously, because the Speech-to-Text API takes about half the length of the actual audio to process, so I didn't want to do it live — we have the results here from the conversion. Let's see if we can load it. There we go. Again, a super simple UI: we have the full transcript, so this is the entire text we got back, and then these are essentially ratings of the types of comments and the individual comments. You see these are all green, in that none of them, from a negativity perspective, give any indication that they would be a negative comment. What's interesting is that when you click on an individual component that's highlighted here — the segmentation is provided by the Speech-to-Text API — it shows you the information we get back from the Perspective API. It also gives you the start and end time, which is super helpful if you want to go back and look at that individual content.
From the NLP perspective, it gives you a really nice sense of which entities are in this individual subtext. You can see some of the things it pulled out of here — some are useful, some not so useful — but it also gives you a sentiment: if you look at the color coding up here, it shows a sentiment for each entity. You can use these for categorization, because these are essentially the types of labels you can use to categorize the content you have. We built the UI just to surface those individual components; you could build them into your tagging if you were actually building a website.

OK, so that's it for the demo. Any questions on this particular architecture or the components themselves? Yeah — the platform. If not, I have one more section to go through as well, so let's go through that. All right.
Good. OK, so this is a different solution. The previous one is something that I wrote with one of our customers and our professional services team; this one is a solution that has been broadly put together by the product team. It combines a couple of different technologies for a very specific applied use case: Contact Center AI. The point of this one is that I think you've probably all had some experience with a contact center, either a chat bot or calling in with your phone, and oftentimes those experiences don't go terribly well — you have to repeat your information a lot. This is a way to help optimize that in two different ways. One is to provide intelligent chat bot functionality that can integrate into your back-office systems, pull a lot of data from your own systems, and be much more intelligent. The second is to provide a phone agent — essentially a virtual agent — that can do the same things as the chat bot but responds using our Text-to-Speech API. Then there's Dialogflow, if you've worked with that at all: this is essentially the Google Home style of back-and-forth using speech, honing in on the specific intent a user may have expressed via voice, with a whole framework provided by Dialogflow. If we think about this as a business problem, the problem is that these conversations you have with contact centers really aren't that great. They're not great for the user, but they're also not great for the company: they often don't produce a good outcome, or they require a human to get on the phone and speak with you about what might otherwise be a relatively straightforward or mundane question.

In order to enable a better automated conversation, there are really three main components, which you see here: perceive, talk, and interact. Perceive is our Speech-to-Text — you need to be able to take the audio you hear and convert it into text reliably. In order to respond, you also need to be able to take the text you get back from your back-office systems, or from your intents in Dialogflow, and respond to the user verbally; that's our Text-to-Speech API, and that's how we provide it. The last one is the whole framework around how you conduct an interaction via voice or via a chat bot, and that's what our Dialogflow product does.

The main components of where Contact Center AI fits in are essentially what I describe here. A customer comes in either by phone or chat — again, voice or chat — and typically those individual users are routed into a customer contact center. Companies have a lot of different types of solutions that they use here, across a wide range. There could be IVR systems; there are many, many different systems offered by many different companies.
Essentially, they all have hooks into third parties and external systems to allow them to integrate, because that's how call centers typically work. Where Contact Center AI from Google Cloud fits in is essentially right here, at the contact interface: we provide an API that allows you to call and respond, with two main pieces of functionality. The first is the virtual agent. This is the voice component: as I talked about, it uses Text-to-Speech to respond in an intelligent way, and it uses Dialogflow to go back and forth and answer questions within the context of the conversation you're having — versus just answering a specific question without any real context. That's the big difference here. The other thing that's kind of interesting is Agent Assist. Even if you do end up with a live agent, as part of the virtual agent we have this capability which sits in the background, essentially listening to the conversation, and as the customer or caller asks questions, it automatically goes out to a knowledge base — or whatever type of system you may have — to look up information. What's really cool is that as the user is asking questions, Agent Assist is pulling up information from that knowledge base, from the help center, and so on, and surfacing it for the agent, so the agent is better able to answer the types of questions being asked. It's a really pretty cool feature. Then what you see here is the back-end fulfillment: the integration with the third-party systems within your organization.
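To make the Agent Assist idea concrete, here is a toy stand-in for the lookup: score knowledge-base articles by word overlap with the caller's last utterance and surface the best match for the human agent. The articles are invented, and the real product uses Dialogflow's knowledge features rather than this bag-of-words heuristic; this only illustrates the "listen, look up, surface" loop described above.

```python
# Toy Agent Assist lookup: suggest the knowledge-base article sharing
# the most words with what the caller just said, or None if nothing
# overlaps at all.
def suggest_article(utterance, articles):
    words = set(utterance.lower().split())

    def overlap(article):
        return len(words & set(article["text"].lower().split()))

    best = max(articles, key=overlap)
    return best["title"] if overlap(best) > 0 else None

kb = [
    {"title": "Store hours", "text": "our store hours are nine to five"},
    {"title": "Billing", "text": "how to dispute a charge on your bill"},
]
print(suggest_article("what are your store hours today", kb))  # Store hours
```

In production the "utterance" would come from the streaming Speech-to-Text transcript of the live call, and the suggestion would appear in the agent's console.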
These could be your account system — you want to look up an account, or look up the hours for an individual location — a whole series of different APIs you can imagine being available for you to expose, and your knowledge base as well. If you're looking for information in the context of commonly asked questions, this is a great way to integrate that into Contact Center AI.
OK. So I think we talked through this: I mentioned we have a lot of technology partners, and the reason we have partners here is that these are typically systems companies already use. The way to understand this is as a way to drive intelligence into those systems, and to bring the data you already have in your organization into the conversation.

The virtual agent is the text-to-speech bit, and this uses the same style of API as Speech-to-Text, in the sense that where with Speech-to-Text you send in an audio file, here you do the reverse: you send in your text and it responds in any of a whole series of different languages. There are around 30 different languages and on the order of 180 different voices — a lot of different options available there, which is pretty cool. Agent Assist is what I talked about: being able to reach out to third-party systems and pull information back. Then there's a third part we're just working on now, the Insights API, which essentially looks at all the data you're generating by running the system and pulls out metrics and trends.

So what does it actually look like if you're using this? The first part is a customer call coming into a telephony center — usually one of the partners you just saw on the previous slide, like Cisco; this is typically a SIP-type system. That gets routed into our Cloud Speech-to-Text, which can take your streaming audio in real time and automatically return the text. In this case that text goes to Dialogflow.
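The "reverse" call the speaker describes can be sketched as the JSON body for the Text-to-Speech API's `text:synthesize` method. The voice name below is one example from the catalog, with the language code taken from its prefix; sending the request would still require authentication.

```python
# Body for the Text-to-Speech API's text:synthesize method, turning the
# agent's textual reply into audio that can be streamed back to the
# caller.
def build_synthesize_request(text, voice_name="en-US-Wavenet-D"):
    # A voice name like "en-US-Wavenet-D" starts with its language code.
    language_code = "-".join(voice_name.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }

req = build_synthesize_request("Your order shipped yesterday.")
print(req["voice"])
```

Swapping `voice_name` is all it takes to answer in a different language or voice, which is where the large voice catalog pays off.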
As I told you before, Dialogflow is essentially the context for how you have this back-and-forth conversation, and you can develop your Dialogflow flow around the specifics of your business case. In this case you've got the virtual agent: for the audio component, we use Cloud Text-to-Speech to respond directly to that user and stream the audio back to the call center. So essentially the user is interacting with the IVR, but really it's invoking this agent in Dialogflow, looking up some information, and then responding to that user with voice as well.

The second part of this is the natural language understanding: building out what kind of context you can pull out of what the user is saying. Again, it takes the Speech-to-Text API's results and looks them up in a database or other third-party system — it could be within an organization, their accounts and so on — looking up that data and then responding back. And then you have the full session of transcripts.
Those speech-to-text transcripts capture the back-and-forth with the customer, which you can use for training and other purposes. I think the key components of the platform — and why this would be difficult if you didn't have it — are that these components right here represent a pretty high barrier to actually providing this intelligence. If you already had the data, or the text, it wouldn't be that difficult to go and build the intelligence on top of it, but the APIs themselves provide a lot of value and make it really simple to integrate, and that's the whole point. If you use our APIs — between the authentication and the developer experience, they're relatively straightforward — then compared to what you would need to build for a comparable architecture, you start way ahead. You have some limitations, as I mentioned before, around domain-specific vocabulary; these are models that we maintain as Google Cloud, and you use them in a pretty straightforward manner, just like any other API, which is really the power of the platform. The other component is Dialogflow. In order to build something that has a conversational context, it takes a lot of effort. It's pretty straightforward to write if-then statements or switch statements as part of your application to handle different cases, but keeping the context of the conversation, and then using it later, is a different matter.
Answering questions or doing searches with that conversational context is something that really adds a lot of value, and if you were to develop it yourself it would certainly add a lot of time. So those are the three components that, as we built this solution, seemed to offer a lot of flexibility and a lot of value as you use them. Again, it comes back to the developer experience.

Here's a very quick example from an architecture perspective, if any of you are interested in the architecture. It's generally the same type of flow: you have various customer channels that come in, either via chat or via voice, and this is how we fit in. Nothing I really want to highlight here other than that we integrate with different partners.

All right. OK. And this is the interesting thing about Insights: if you have access to the conversations your users have been having with your call center or your virtual agents, you can actually do something interesting over time. All call centers do this — they keep track of metrics and then take measurable actions on them. So one of the things we did for the Insights API is that it essentially provides a framework to hook into, and this is a very similar type of architecture to what you saw earlier, from a serverless perspective. We use a whole set of serverless components like Cloud Functions and our Speech-to-Text API. We also have a Data Loss Prevention API, so as the chats flowed through here, we ran all of them through this particular flow — again, this is serverless — and we stored the results in BigQuery. BigQuery is our serverless data warehouse, if you haven't worked with it before, so we used it as the data-warehouse component.
Then we built a whole bunch of different dashboards using an app on top of App Engine. The point here is that we were able to gain a lot of insights: the durations of the calls, what types of calls they were, the percentage of calls the automated agent was able to resolve versus having to bring in a human, the extent to which customers rated the experience as satisfactory afterwards, the amount of engagement with chat bots versus in-person or voice calls. We were able to pull all of this together, and again, what we built is pretty straightforward: it's an event-driven architecture as well, and it's also serverless, which makes it really easy to spin up and spin down without worrying about consumption or ongoing costs.
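Two of those dashboard metrics can be sketched as a toy calculation over conversation records like the ones landed in BigQuery; the field names here are invented for illustration, and in practice these would be SQL aggregates run in BigQuery rather than Python loops.

```python
# Toy versions of two dashboard metrics over conversation records.
def containment_rate(calls):
    # Fraction of calls the virtual agent handled without escalating
    # to a human agent.
    automated = sum(1 for c in calls if not c["escalated"])
    return automated / len(calls)

def average_duration(calls):
    # Mean call length in seconds.
    return sum(c["duration_s"] for c in calls) / len(calls)

calls = [
    {"escalated": False, "duration_s": 120},
    {"escalated": True, "duration_s": 300},
    {"escalated": False, "duration_s": 90},
]
print(containment_rate(calls), average_duration(calls))
```

Tracking these over time is what lets a call center see whether the virtual agent is actually reducing the load on human agents.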
That was also a pretty interesting component of it. So I think that's all I had to talk about today: mainly the two projects I worked on, from a serverless perspective and from an ML and AI developer perspective.