Your Apps Can Talk! Introducing Cloud Text-to-Speech, Powered by WaveNet Technology (Cloud Next '18)

Good afternoon. My name is Luis Carlos Cobo. I lead the WaveNet team at DeepMind for Google in Mountain View, and I oversaw and worked on launching WaveNet for the Google Assistant, first in October last year; it was later launched in Cloud as well. I'm going to talk a little bit about what WaveNet is, how it works, why it is better than previous approaches to text-to-speech, and how it compares with the other methods.

Before we had WaveNet, previous approaches to text-to-speech basically fell into two categories. Usually the highest-quality one was concatenative. In this type of approach, we take a large corpus of speech from one speaker and divide it into small slices of a few milliseconds, and then the problem of text-to-speech is just the problem of finding the right pieces of speech to reconstruct what you want to say. With parametric approaches, on the other hand, we have a linguist build a model, a mathematical function that approximates how the human vocal tract works, and then we learn, sometimes using machine learning, how to drive that model to say the things we want to say and to resemble the speaker we want to match.

Both approaches have come a long way, and they are good for many applications, but they have some limits. For concatenative methods, the main issue is the amount of data you need for it to sound good: you need several thousand hours of data, so that you have enough pieces to reconstruct anything you might want to say. If you don't have enough speech, you will have to use suboptimal slices in some situations, and that will sound bad; there will be noticeable glitches in the audio. The other problem is scaling: concatenative synthesis doesn't let you generalize.
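The talk describes concatenative (unit-selection) synthesis as finding the right slices to reconstruct an utterance but doesn't detail the search. A common textbook framing, not necessarily what Google's system did, gives each candidate slice a target cost (how well it matches what we want to say) and each adjacent pair a join cost (how audible the seam is), then picks the cheapest sequence with dynamic programming. This minimal sketch assumes hypothetical slice labels of the form "recording:phone" and, as a simplification, treats joins within the same recording as free.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    join_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Choose one slice per position, minimizing total target + join cost (Viterbi search)."""
    # For each candidate at the current position: (cheapest total cost so far, path achieving it).
    best = [(target_cost(0, unit), [unit]) for unit in candidates[0]]
    for t in range(1, len(candidates)):
        new_best = []
        for unit in candidates[t]:
            # Cheapest way to reach this unit from any unit kept at position t - 1.
            cost, path = min(
                ((prev_cost + join_cost(prev_path[-1], unit), prev_path) for prev_cost, prev_path in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(t, unit), path + [unit]))
        best = new_best
    cost, path = min(best, key=lambda item: item[0])
    return path, cost


# Hypothetical slices labeled "recording:phone" for the two phones of "hi".
TARGET = {"rec1:h": 0.5, "rec2:h": 0.0, "rec1:i": 0.0, "rec3:i": 0.1}


def target(t: int, unit: str) -> float:
    return TARGET[unit]  # how well the slice matches what we want at position t


def join(a: str, b: str) -> float:
    # Simplification: slices from the same recording join seamlessly; others leave a seam.
    return 0.0 if a.split(":")[0] == b.split(":")[0] else 1.0


path, cost = select_units([["rec1:h", "rec2:h"], ["rec1:i", "rec3:i"]], target, join)
# path == ["rec1:h", "rec1:i"], cost == 0.5
```

The cheapest path accepts a worse-matching first slice to avoid an audible join. With too little recorded speech, every path contains expensive joins, which is where the glitches come from.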
It can't generalize across speakers or do transfer learning between them: if you have a corpus of speech for one speaker and you want to add another one, you need as much data as you needed for the first. The same applies to expressive speech: if you want your speaker to be able to produce speech in a range of emotions, you almost need to duplicate the whole dataset for every emotion you want. On the other hand, it sounds pretty natural, because you're using real speech, but it has these limitations.

With parametric synthesis, we can usually get away with less data, because we already have a good approximation of how the human vocal tract sounds. The problem here is that the quality of the speech produced is limited by the quality of the model you have, and these models are never perfect, so the speech ends up sounding somewhat noisy, robotic, and unnatural.

This is where WaveNet came in. WaveNet is like a parametric model on steroids, where everything, including the human vocal tract, is learned in an implicit neural-network representation from the data you have. Therefore, the quality of your speech is no longer limited by the model, but by the amount of data you have, the quality of that data, and maybe the computing power you have to process it. First, this allows us to produce speech of even higher quality than unit selection for the same amount of data. It also allows you to do transfer learning across expressive styles and across speakers: if you have a large corpus for one speaker in a language, you don't need as much data to introduce a second speaker, because the model can learn a lot about the generalities of how humans speak from the first dataset, and you just need a little bit of data to fine-tune it and learn the particularities of another speaker. This allows us to build more voices and provide more speakers and more emotional or specific styles for Cloud customers.

So, demo time; we'll see if this works. We can hear some real examples from Google production systems. First, we're going to hear the unit-selection sample:

"A single WaveNet can capture the characteristics of many different speakers with equal fidelity."

As you can hear, it's pretty natural. Maybe it's a bit hard to tell without headphones, but there are a few glitches in there. Compare that with WaveNet:

"A single WaveNet can capture the characteristics of many different speakers with equal fidelity."

Now we don't have those glitches. It may be a bit difficult to tell the difference without headphones (we spent a lot of time evaluating these models with headphones on), but there's a real improvement in quality that you can measure, in ways we'll see later. With respect to parametric, the difference is much clearer; I think it will be evident even through these speakers. This is one example of parametric:

"A single WaveNet can capture the characteristics of many different speakers with equal fidelity."

And WaveNet:

"A single WaveNet can capture the characteristics of many different speakers with equal fidelity."

These were two models trained on the same speaker with the same amount of data, and you can hear that the WaveNet one sounds much more natural. In another language, Japanese:

[Japanese audio sample, played twice for comparison]

All right, now you know how to say WaveNet in Japanese.

So what is WaveNet? WaveNet is a neural network that produces speech. The main thing you have to know is that before WaveNet came around, it was thought that it was not possible to generate speech audio sample by audio sample directly with a neural network, because the time dependencies in audio at a high sample rate, like 24,000 samples per second, span too many samples for the model you would have to train. So how did we do it? WaveNet is an autoregressive neural network with dilated convolutions. What does this mean? Autoregressive means that every sample you produce goes back as an input to the system to produce the next sample; that way you create continuous sound that sounds natural. Convolutional means that instead of connecting all neurons from one layer to all neurons in the next, you connect each neuron to a small number of neurons from the previous layer, and you reuse the weights of those connections in all neurons that have the same relationship. That makes the model small enough for us to learn it efficiently. Finally, the secret sauce is the dilated part: as we go up through the layers of the network, the neurons that each neuron connects to in the layer below are spaced further and further apart, exponentially. What this gives us is that the time range that can influence a sample grows exponentially with the number of layers, not linearly. So, keeping a small model with a moderate number of layers, we get a very large receptive field, which allows the model to learn the long-term relationships.
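To make the receptive-field argument concrete, here is a minimal pure-Python sketch of one stack of causal dilated convolutions with kernel size 2, assuming dilations that double per layer. It leaves out everything else in WaveNet (gated activations, residual and skip connections, the output distribution) and uses all-ones weights purely for illustration; the 24 kHz sample rate is the figure from the talk.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_i weights[i] * x[t - i * dilation], treating samples before t = 0 as zero.

    Only the current and past samples are used, so the layer is causal.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            k = t - i * dilation
            if k >= 0:
                acc += w * x[k]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples that can influence one output sample of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


doubling = [2 ** layer for layer in range(10)]  # dilations 1, 2, 4, ..., 512
undilated = [1] * 10

rf_dilated = receptive_field(2, doubling)   # 1 + (1 + 2 + ... + 512) = 1024 samples
rf_plain = receptive_field(2, undilated)    # 1 + 10 = 11 samples
ms_at_24khz = 1000 * rf_dilated / 24_000    # about 42.7 ms of audio context

# An impulse at t = 0 passed through three doubling layers reaches exactly 8 outputs.
signal = [1.0] + [0.0] * 15
for d in [1, 2, 4]:
    signal = causal_dilated_conv(signal, [1.0, 1.0], d)
influenced = sum(1 for v in signal if v != 0.0)  # 8 == receptive_field(2, [1, 2, 4])
```

Ten layers with doubling dilations see 1,024 past samples while ten undilated layers see only 11; repeating such blocks extends the context further without growing the per-layer cost.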
That long-range context is what we need for the speech to sound natural. The paper describing this work came out in October 2016. It made a splash, because it sounded much more natural than previous systems, but it was terribly slow, mostly due to the autoregressive nature of the model: you have to run a fairly complex neural network just to get one sample of audio, and you need to do that 24,000 times to generate one second of audio. There's a lot of optimization you can do, but it is not feasible with current hardware to run it fast enough.
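The cost problem follows directly from the autoregressive loop. The sketch below uses a hypothetical ToyModel as a stand-in for the network (a real WaveNet outputs a distribution over the next sample and samples from it); the point is only that every output sample needs one full, sequential network evaluation, because each step consumes the previous step's output.

```python
from typing import Callable, List, Sequence


def generate_autoregressive(
    predict_next: Callable[[Sequence[float]], float],
    num_samples: int,
    context: int,
) -> List[float]:
    """Generate audio one sample at a time, feeding each output back in as input."""
    if context < 1:
        raise ValueError("context must be at least one sample")
    audio: List[float] = []
    for _ in range(num_samples):
        window = audio[-context:]           # only the receptive field is visible
        audio.append(predict_next(window))  # one full network evaluation per sample
    return audio


class ToyModel:
    """Hypothetical stand-in for the network: halves the previous sample."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, window: Sequence[float]) -> float:
        self.calls += 1
        return 0.5 * window[-1] if window else 1.0


model = ToyModel()
one_second = generate_autoregressive(model, num_samples=24_000, context=1024)
# model.calls == 24_000: one sequential evaluation per sample of a single second at 24 kHz
```

No iteration can start before the previous one finishes, so faster hardware helps only by shortening each of the 24,000 steps, not by running them side by side.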

A real-time text-to-speech system usually needs to run orders of magnitude faster than real time. This motivated a new line of research that resulted in a new paper, roughly a year later, describing the work that is now in production, called Parallel WaveNet. Once we train the original WaveNet model, we use it as a teacher for a second neural network that is much faster. This network just takes in a vector of noise and transforms it into the speech we want to generate. That generated waveform is then passed to the already-trained WaveNet model, which scores how likely it is to be speech from a human being, or from the particular speaker we want to replicate. At the beginning this network produces just random noise, but little by little it learns to produce audio that pleases the original WaveNet network; basically, it learns to imitate the original network. The reason we go this roundabout way is that the new network is feed-forward, which means that in one single pass you can generate the whole utterance; you don't have to go sample by sample anymore. It's not only faster, it can also be parallelized: you can chop an utterance into many pieces, send them to different processors or different computers, and assemble the result. That's what allows us to get the latency we need to make this available in production systems. This allowed us to increase the speed of WaveNet by three orders of magnitude: we went from generating 20 milliseconds of audio in one second, significantly slower than real time, to generating 20 seconds of audio in one second, so 20x real time. That was enough for us to use this in production.
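The speed figures quoted above reduce to simple arithmetic. This sketch computes the real-time factor before and after, and shows the kind of naive split into independent chunks that a feed-forward generator permits. The helper names are hypothetical, and the chunking ignores how a production system handles context at chunk boundaries, which the talk doesn't describe.

```python
from typing import List, Tuple


def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of compute; above 1 means faster than real time."""
    return audio_seconds / compute_seconds


original = real_time_factor(audio_seconds=0.020, compute_seconds=1.0)  # 0.02x real time
parallel = real_time_factor(audio_seconds=20.0, compute_seconds=1.0)   # 20x real time
speedup = parallel / original                                          # 1000x: three orders of magnitude


def split_into_chunks(num_samples: int, num_workers: int) -> List[Tuple[int, int]]:
    """Split sample indices [0, num_samples) into contiguous (start, end) ranges, at most one per worker."""
    if num_samples < 1 or num_workers < 1:
        raise ValueError("need at least one sample and one worker")
    step = -(-num_samples // num_workers)  # ceiling division
    return [(start, min(start + step, num_samples)) for start in range(0, num_samples, step)]


# Three seconds of 24 kHz audio spread over four hypothetical workers.
chunks = split_into_chunks(3 * 24_000, 4)
```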
Even though we gave up the autoregressive nature, which can have an effect on quality, Parallel WaveNet still closes 70% of the perceived naturalness gap between synthesized and real speech. Here you can see mean opinion scores (MOS): we run blind tests in which we send audio samples to people, and they score how natural they sound from 1 to 5, 5 being the maximum. Usually you don't get a 5, because people always think you might be trying to fool them; real speech usually gets around 4.5. We were able to close the gap between synthetic and real speech by 70%, and we believe further improvements will soon allow us to close that gap fully. Thank you so much. With this, I'll pass it to Dan Aharon, who is our PM for Google Cloud TTS. Thank you.

Thanks. Hi, I'm Dan. I'm product manager for Cloud TTS, as Luis said, and a couple of other products in Cloud AI. I want to tell you a little bit more about Cloud Text-to-Speech. First, what is this technology good for? We see three main use cases. One is in call centers, for automated voice responses. A lot of IVRs (interactive voice response systems) today need to pre-record all the prompts so they can play them when people call in. With TTS, they can now generate them automatically and don't need to pre-record them. They were forced to pre-record because synthesized speech was not really that good until now, but now it has suddenly become good enough. The other benefit is much more flexibility in language: you can insert entities that change, instead of having one script that was recorded three months ago and that you can't deviate from.
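For the IVR case, the flexibility comes from building the prompt at request time instead of recording it. A minimal sketch, assuming a hypothetical ivr_prompt_ssml helper: it fills a trusted template with changing entities, XML-escapes them, and wraps the result in an SSML speak element, which Cloud Text-to-Speech accepts as input in place of plain text.

```python
from xml.sax.saxutils import escape


def ivr_prompt_ssml(template: str, **entities: str) -> str:
    """Fill a trusted prompt template with per-call entities and wrap it in SSML.

    Entity values are XML-escaped so characters such as '&' or '<' cannot break the markup.
    """
    filled = template.format(**{name: escape(value) for name, value in entities.items()})
    return f"<speak>{filled}</speak>"


ssml = ivr_prompt_ssml(
    "Hello {name}. Your order from {store} ships on {day}.",
    name="Ana",
    store="Smith & Sons",
    day="Friday",
)
# '<speak>Hello Ana. Your order from Smith &amp; Sons ships on Friday.</speak>'
```

The template itself is assumed to be valid SSML written by the developer; only the per-call values are escaped.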

Second, similarly, in IoT: if you want to talk to devices and have them talk back, it's a very useful thing to have, so that you can have conversations with your users. And last but not least, media: a lot of written media can now find a new form in audio, and you're going to hear more about that from Deanna shortly.

Cloud Text-to-Speech was introduced three or four months ago. It's part of our Conversation group in the building blocks and part of our Cloud AI portfolio, so if you haven't already, I definitely recommend checking out some of the sessions for other products; there are a lot of really cool products in Cloud AI. As I mentioned, Cloud TTS was introduced in late March, and it gives everyone the power to use the same TTS that Google does, including WaveNet voices. We are fortunate to have the ability to run on GPUs and other hardware, so we can offer a machine-learning-based speech synthesis API at scale. It's really easy to use, as you're going to see in a little bit, and it's pretty flexible: you can use text or SSML, and do all these other things.

We have a few new things for you today. First, WaveNet has until today only been available in US English. We now have seven more languages available. That's pretty big; it's been one of our biggest requests from users, so we're very excited about it. It's now available in English, German, French, Japanese, Dutch, and Italian. French is not live yet; I think it will be live maybe next week, or pretty soon. The second thing is audio profiles, and we're going to talk a little bit more about that, so I'll come back to it.

This is now our full portfolio of voices: 14 total languages and variants, with 30 standard voices and 23 WaveNet voices, so across them you get reasonably global coverage, with a few missing pockets that we're working on covering.
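To see which voices are available, the Python client's TextToSpeechClient.list_voices() returns voices with a name field. The sketch below works on plain name strings so it runs offline; it assumes the naming convention language-REGION-Type-Variant (for example en-US-Wavenet-D), which matches the voices discussed here but is a convention, not a documented guarantee. The summarize_voices helper is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_voices(voice_names: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Count voices per (language code, voice type), parsing names like 'en-US-Wavenet-D'."""
    counts: Counter = Counter()
    for name in voice_names:
        language, region, voice_type, _variant = name.split("-", 3)
        counts[(f"{language}-{region}", voice_type)] += 1
    return dict(counts)


# Sample names in the assumed format; a live list would come from list_voices().
summary = summarize_voices(["en-US-Standard-B", "en-US-Wavenet-A", "en-US-Wavenet-D", "de-DE-Wavenet-B"])
# {('en-US', 'Standard'): 1, ('en-US', 'Wavenet'): 2, ('de-DE', 'Wavenet'): 1}
```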

Okay, so the second thing we're introducing today is audio profiles. Up until now, text-to-speech produced a single WAV file, and as a developer you could play that WAV file anywhere you wanted: on a tiny speaker, on headphones, on a phone line, or anywhere else. What we found is that the attributes of the speaker can have a pretty big impact on the quality of the sound that comes out. So if you aspire to the best quality, you should actually send a different waveform to each type of speaker. Starting today, you can provide an audio profile: you tell us whether the audio is going to be played on a handset, on a home entertainment device, or on a phone line, and we'll make the proper adjustments.

Here, for example, is an example WAV file, and here is how it looks on a phone line. You can see all of this area on the left and all of this area on the right: you don't actually hear those on a phone line, so when you play this WAV file over the phone it's going to sound distorted, because you're missing a lot of the information that doesn't get carried across. What we're doing with audio profiles, in this phone-line example, is compressing it from the sides into the middle. Now the waveform looks like this, and you get much more information in the middle, which sounds better. If I were to play it on my laptop it would actually sound worse, but that same WAV file sounds better when you play it over the phone.

Okay, so with that, let me go to the codelab and the demo. Let's start with the demo. This is Text-to-Speech; I'm just going to go to the news. Let me make my screen bigger. Let's use this article: let's just copy this text and paste it into our new Cloud Text-to-Speech API. I'm going to first play a regular voice so you get a sense of how it sounds:

"It's official: Carmelo Anthony is now a member of the Atlanta Hawks, for now. The three-team trade sending Anthony, Justin Anderson, and a 2022 lottery-protected..."

You can hear it's a little robotic. Now let's play the exact same thing in WaveNet:

"It's official: Carmelo Anthony is now a member of the Atlanta Hawks, for now. The three-team trade sending Anthony, Justin Anderson, and a 2022 lottery-protected first-round pick via OKC to the Hawks, Dennis Schröder and Timothé Luwawu-Cabarrot to the Thunder, and Mike Muscala to the 76ers is official."

It doesn't sound a hundred percent human, but if I weren't telling you this was speech synthesis and you were just listening to it, you might imagine it was an NPR reporter or something like that, especially as you get to the second paragraph here.

[The WaveNet sample plays again.]

Okay, so let's go to the codelab. What we're going to see next is how to take an audio file that I have here, transcribe it with Speech-to-Text, translate it to a different language, and then turn it into WaveNet speech and play it. There are a lot of things that could go wrong, and I'm not a very good Python developer, so work with me here and let's try to do it together; hopefully we'll get through it. So this is the simple text-to-speech example that's on our website. Let's leave it for now and come back to it.
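An audio profile is chosen per request. Here is a minimal sketch of a JSON body for the v1 text:synthesize REST method, assuming the current v1 field names: audioConfig.effectsProfileId takes a list of profile IDs such as telephony-class-application or handset-class-device. The synthesize_request helper is hypothetical; since the feature was brand new at the time of the talk, check the current API reference before relying on these names.

```python
import json
from typing import Any, Dict, Optional


def synthesize_request(text: str, voice_name: str, language_code: str,
                       effects_profile: Optional[str] = None) -> Dict[str, Any]:
    """Build a JSON body for the Cloud Text-to-Speech v1 text:synthesize REST method."""
    audio_config: Dict[str, Any] = {"audioEncoding": "LINEAR16"}
    if effects_profile:
        # Audio profile: asks the service to optimize the waveform for a class of playback device.
        audio_config["effectsProfileId"] = [effects_profile]
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": audio_config,
    }


body = synthesize_request(
    "It's official: Carmelo Anthony is now a member of the Atlanta Hawks.",
    voice_name="en-US-Wavenet-D",
    language_code="en-US",
    effects_profile="telephony-class-application",
)
print(json.dumps(body, indent=2))
```

The response carries the synthesized audio as base64 in its audioContent field; the same text rendered with handset-class-device would be tuned for a phone speaker instead of a phone line.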
Let's add some code that does transcription. This is the Speech-to-Text sample on our website. I'm going to copy over these import statements. We already have argparse, so we just need io. Then let's copy all of this. Okay. And instead of client, it's called speech_client. Now let's give it a path: /Users/dan/Documents/audio/. Let me play this file:

"Hi, I'd like to help..."

Sorry, not this one; I wanted this one:

"Welcome everyone to the Google Cloud Next session for text-to-speech. Hope you have a great day."

Okay, I'll put in this one. Then we don't need the sample rate, because it auto-detects it, and the language is en-US. Let's add punctuation. Do we have punctuation here? No, it's probably in the beta snippets. Yeah, here it is: enable_automatic_punctuation = True. I'm going to paste it in here. Let's also use the video model. Are we using the beta speech or the regular one? Okay, we need this so it gets the beta; let me make sure we're importing that from google.cloud. Great. So now we have client.recognize, and then it's printing the response. Let's just run this and see that it's working correctly. Then here, let's make this LINEAR16 and call it output.wav. And then, instead of this text, we can use alternative.transcript. Okay, we can do that later; let's try and run this now. Let's go to the text-to-speech directory first and run it with Python. Let's just give it a text. Okay. It says there's no model field, so it's probably not using the beta. We can probably do it without the model, but let's see. Oh, yeah: it should be speech_client. Let's try again; if not, we can remove the model. Okay, let's just remove the model. Maybe I have a typo there or something.
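The transcription step in the demo amounts to one speech:recognize request. Here is a sketch of the v1 REST body with a hypothetical recognize_request helper: sampleRateHertz is omitted because the service can read it from a WAV header, and enableAutomaticPunctuation is the setting the demo was reaching for (at the time it lived only in the beta API, which is why mixing beta and non-beta code failed).

```python
import base64
from typing import Any, Dict


def recognize_request(audio_bytes: bytes, language_code: str = "en-US",
                      punctuation: bool = True) -> Dict[str, Any]:
    """Build a JSON body for the Cloud Speech-to-Text v1 speech:recognize REST method."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "languageCode": language_code,
            # No sampleRateHertz: for WAV files the service reads it from the header.
            "enableAutomaticPunctuation": punctuation,
        },
        # Inline audio must be base64-encoded inside the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


request = recognize_request(b"RIFF")  # placeholder bytes, not a real WAV file
```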
Okay, let's try this again. Okay, let's skip that and just use the stuff that's not in beta. Line 48 here: something's wrong with the audio. Version? Which version? Yeah, it's because I'm mixing the beta with the non-beta, so let's just use the regular one. We don't really need the beta. Okay, let's try this now:

"welcome everyone to the Google Cloud Next session for text to speech hope you have a great day"

It doesn't have punctuation, so the speech we produce will not be as good, but that's fine for now. Okay, pick a language, everyone. What are you feeling today, from one of the ones with WaveNet support? German? Okay, let's do German. Let's go to Translate and look at the code sample here, on GitHub. We'll add these imports, and then translate_text: here there's the translate client and the result. So we've done the recognition; now let's translate. We don't need this stuff. Let's translate it to German. The transcript equals response zero, dot alternatives zero, dot transcript. Hopefully I got that right; let's just print it to be sure. Then translate_client.translate, with the transcript instead of the text. This is the translation result. Then let's pass that text into text-to-speech.
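The glue between the three services is mostly indexing into responses, which is where the demo stumbled. Here is a sketch using plain dictionaries shaped like the REST responses: the transcript lives at results[0].alternatives[0].transcript, the Translation API (v2) returns the text under translatedText, and the Text-to-Speech request must name a German voice and language. The helper names and sample values are hypothetical.

```python
from typing import Any, Dict


def first_transcript(recognize_response: Dict[str, Any]) -> str:
    """Top hypothesis from a speech:recognize JSON response: results[0].alternatives[0]."""
    return recognize_response["results"][0]["alternatives"][0]["transcript"]


def german_tts_request(translation: Dict[str, Any], voice_name: str = "de-DE-Wavenet-A") -> Dict[str, Any]:
    """Text-to-Speech request for translated text; the voice must match the new language."""
    return {
        "input": {"text": translation["translatedText"]},
        "voice": {"languageCode": "de-DE", "name": voice_name},
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


# Made-up responses shaped like the real ones.
stt_response = {"results": [{"alternatives": [
    {"transcript": "welcome everyone to the Google Cloud Next session for text to speech", "confidence": 0.9}
]}]}
transcript = first_transcript(stt_response)
translation = {"translatedText": "Willkommen", "input": transcript}  # placeholder translation
tts_body = german_tts_request(translation)
```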

We should tell it that it's now doing German instead of US English. Okay, let's try that. It says it cannot import; I think I need to go and install the Cloud client library. Let me find the command. Okay, let's try that again. Whoops. Let's try the pip command again. Okay. Where is it? Okay. So the transcript is response zero... Oh, no, it's response.results, yeah, exactly. Thank you. response.results[0]. Let's try this again now. All the things that could go wrong in a live coding session! Okay, so we got it; now there's just this translated-text thing. I think it's square brackets, right? Yeah, it's result['translatedText']. Does anyone here speak German, by the way? How can we test whether it's actually working? Yes? Okay. So this is the moment of truth. We're in the text-to-speech cloud client directory, and we should have this output WAV file. Let's play it.

[German audio of the translated greeting plays.]

Is that right? Okay. Thanks, everyone. We did it together, with your help. If I can do it with my poor Python programming skills, that really is a sign that anyone can, so I recommend that you play around with it and see what you can do. With that, let me welcome Deanna. Thank you very much.

Thanks, everybody, for still being here; I appreciate that. I'm really excited to share with you some information about how we can use this amazing technology. My name is Deanna Steele, and I'm the CIO at a company called Ingram Content Group. Has anybody heard of Ingram Content? Well, yes, I know some of you in the front have; thank you. For the rest of you: Ingram Content Group connects books with readers. We are the global distributor of book-related content.
That includes physical books, ebooks, and audiobooks. It also includes providing metadata to our customers, who tend to be retailers across all channels, and ingesting publisher content, such as publisher metadata. Because we have this ecosystem that relies on publishers and retailers, we provide analytics back to publishers. We deal with the big publishing houses and with small independent publishers, as well as independent authors. Our customers are retailers, direct consumers, libraries, and educators.

I'll go through this quickly. There are three key themes we're seeing that make this technology really reasonable and relatable right now in the marketplace, and where we think there's a huge opportunity. First, business trends have lent themselves toward this technology coming to fruition and making a big difference. Second, the opportunities we see span accessibility and other areas. Third, we'll talk a little bit about how we'll deploy the technology.

Sorry, I'll move here. Our innovation: in 2017 we distributed 217 million audiobooks and ebooks around the world. We print a new book every six seconds, and we print about 500 books an hour on our high-speed HP printers. If we were to lay out all the physical and digital content we've produced, it would span the globe 1.2 times.

The business trends: barriers to entry have fallen away. Several years ago, if you wanted to publish your own book, your own memoir, or your own educational book, you would have had to write the outline and shop it to publishers or other agencies, numerous times, and suffer rejection (hopefully not a lot, but it happened). The average time to get a book accepted and published would have been six to nine months; today it can take weeks.

We have worldwide distribution capability: we support 28 facilities, either offices or distribution centers, and have access to about 220 countries.

The business trends: direct reach and discovery. We work with publishers on strategy and ideation. We've dealt with publishers and retailers through channels for a long time, and that's our sweet spot, so we help publishers, especially small and independent publishers, and independent authors find their way to direct distribution of their content. We provide analytics: we have advanced analytics platforms that allow data visualization and some degree of predictive analytics (a different topic for a different time, but we're getting into data science in that area), and we have hundreds of publishers that rely on those platforms to understand and manage their business. We provide discoverability: because we deal with publisher metadata, and you're all familiar with metadata, we allow publishers to ingest their content, and we make recommendations to them on how to make their books more marketable and discoverable. That's a big deal: publishers, or even independent authors, very often aren't really aware of how to be successful in this business; they just know they want to produce that bestseller, and we can help them do that. Finally, we sell the metadata we ingest. We've talked about monetizing data in previous sessions; we do monetize data, and that's a very important part of what we do.

Okay, so let's talk a little bit about business trends. 181 million adults in the US read a book per year. Who here has read a physical book in the last year? Okay, not surprising for this audience, and that reflects what we've seen. The United States population includes about 326 million people, and of those about 55%, or 181 million, read. It's interesting: about 74 percent of the population reads books in some format. What we heard, and what we thought would happen over time, is that ebook distribution would eclipse physical books, and that hasn't happened. What we have seen, though, is that listening to audiobooks has increased significantly, and we're working with partners to ingest even more audiobook content. Why? For a few reasons.

The business trends include accessibility. For us, accessibility is very important when we think about how we provide access, whether through ear or print; those are the two key methods, of course. What we're seeing with accessibility is that the US Census identified about nine million people in the United States alone who were either visually impaired or blind, and it's a challenge, because only thirty-nine thousand Braille books had been printed. That covers a very small percentage of that population, and not everybody has access to Braille, nor do they all have access to voice readers. What that means is that the share of the population we can potentially provide access to is significant, not only the visually impaired but also the learning impaired: people with dyslexia who have a hard time reading, kids who need that information translated for them, and people for whom English is a second language and who want to be able to quickly put the audio and the text together. So we find that accessibility is key for us.

I'm going to give you a snippet of what we believe to be really important, and of the potential for us to make text-to-speech resonate. There are a few things to think about here. First of all, obviously, the success of text-to-speech, as you've heard, has to do with understandability and the ability to sound natural. In the past, the text-to-speech we've heard has been very robotic. Bell Labs and MIT have been working on this technology for decades, but what Google is doing is really uncovering that natural language sound, so we're very excited about that. We believe that book discoverability is critical, and in fact Gartner says that by 2020 about 30% of all searches will be done via voice, so they're going to be screenless. That's really important as we think about enhancing book discoverability.

The demo we're going to give you here is a snippet of a book by Leif Enger called Virgil Wander. Grove Atlantic will be publishing it in October of this year. Leif is a New York Times bestselling author, so we think this book is going to do really well. Imagine you're driving, you hear an NPR segment about this book, and you think, "I'd love to hear a segment." Before I demo it for you, I'd like to give you a little background and context on the book. It's about a gentleman living in the Midwest who owns a very old-fashioned cinema, and he still uses reel-to-reel projection, so you'll hear something about being unspooled. His life is a little bit unspooled right now, and what happens to him shows a little bit about how he picks it back up. We will play you a snippet of the first
Part, of. The book. Ingram. Content. Ingram. Content. Alright. Getting the test version of Ingram content. Hello. What would you like read to you Virgil. Wonder. Now. I think the picture was unspooling, all along and I just failed, to notice the obvious, really, isn't so at least it wasn't to me a Midwestern, male cruising, at medium altitude, aspiring, vaguely to decency contributing. To PBS, moderate, in all things including romantic. Forays and doing unto others more or less reciprocally, if I, were to pinpoint when the world began reorganizing. Itself that is when my seeing of it began to shift it would be the day a stranger, named room blew into our bad luck town of greenstone Minnesota, like a spark, from the boreal, gloom it. Was also the day of my release from st. Luke's Hospital down, in Duluth so I was concussed and more than a little adrift. The. Previous, week I'd driven up Shore to a popular, lookout to photograph, a distant storm approaching, over Lake Superior, it. Was a beautiful, storm self-contained, as storms often are hunched far out over the vast water like a blob of blue ink but it stalled in the middle distance and time just slipped away there's.

There's a picnic table up there where I've napped more than once. What woke me this time was the mischievous gale delivering autumn's first snow. I leapt behind the wheel as it came down in armloads. Highway 61 quickly grew rutted and slick. Maybe I was driving too fast. U2 was on the radio, "Mysterious Ways," I seem to recall. Apparently my heartbroken Pontiac breached a safety barrier and made a long, lovely, some might say cinematic, arc into the churning lake.

Thank you.

So we can go on to purchase the book.

Okay, we'll go ahead and move forward. Let me spend a little bit of time telling you about the technology. The technology is a combination of Google's WaveNet and Ingram Content's CoreSource application. CoreSource is our ebook content distribution system, and it has over 18 million titles in it.

There are two components, as y'all know, around audio text-to-speech: the first part is the ingestion, and the second part is the distribution. The way this works is that we ingest publishers' content. We do that today; again, we do ebook distribution. We bring that book content into Ingram's CoreSource and then push it out to Google Cloud Storage and Cloud Functions. Step four includes three components, including translating the text to a WAV file and then storing it in Google Cloud. This is for all new and changed content; if you think about it, book content doesn't change that much unless it's new or unless new editions come through. The Cloud Functions then pull that new or changed content through Cloud SQL into two of our technologies. One is Aerio, a tool that allows for written previews of books; publishers host it on their sites, and it allows them to complete purchases. The second is our ipage application, which is a business-to-business solution.
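The ingestion step described above (new or changed book content is turned into a WAV file by Cloud Functions and stored in Google Cloud) can be sketched roughly as follows. This is a minimal sketch, not Ingram's implementation: it assumes change detection by a SHA-256 fingerprint per title, a hypothetical ISBN-keyed `seen` map, and the public Cloud Text-to-Speech REST request shape for `text:synthesize`. Authentication, the HTTP call itself, and the Cloud Storage write are left out.

```python
import base64
import hashlib
import json

# Assumed voice: "en-US-Wavenet-D" is one published WaveNet voice name, but the
# available voices change over time, so treat this choice as an assumption.
DEFAULT_VOICE = {"languageCode": "en-US", "name": "en-US-Wavenet-D"}


def content_fingerprint(text: str) -> str:
    """SHA-256 digest of an excerpt, used to detect new or changed content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_synthesis(isbn: str, text: str, seen: dict) -> bool:
    """True if this title is new or its excerpt changed since the last run.

    `seen` is a hypothetical map from ISBN to the last synthesized fingerprint.
    """
    return seen.get(isbn) != content_fingerprint(text)


def build_synthesize_request(text: str, voice: dict = DEFAULT_VOICE) -> str:
    """JSON body for POST https://texttospeech.googleapis.com/v1/text:synthesize.

    LINEAR16 audio is returned with a WAV header, so the decoded bytes can be
    stored directly as a .wav object.
    """
    body = {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }
    return json.dumps(body)


def decode_audio(response_json: str) -> bytes:
    """Extract the audio bytes from a text:synthesize response body."""
    return base64.b64decode(json.loads(response_json)["audioContent"])
```

In use, a function triggered for each ingested title would call `needs_synthesis` first, send the body from `build_synthesize_request` only when it returns `True`, store the bytes from `decode_audio`, and then record the new fingerprint in `seen`. Skipping unchanged titles reflects the point above that book content rarely changes outside new releases and new editions.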
The second part of it is also where the secret sauce happens. The first part is the text-to-speech: translating that text to speech and developing that WAV file. The second part is when the sample is collected, much like the sample you just heard. We bring it into Dialogflow Enterprise Edition, and the system parses that data, the title being the key, and passes it through Cloud Dataflow into Cloud SQL. Dialogflow then pulls that sample out of Cloud SQL and passes it through to Cloud Functions, and then, if a transaction is to happen, we pull it back into Google Express for the conclusion of the transaction. So we can bring it all the way through, from the sample and listening to that sample, to the conclusion of a transaction, and we think that's going to be really powerful.

So in conclusion, I don't want to get between you and dessert or what have you. We talked a little bit about some of the business trends and what's happening right now in the market. We talked a little bit about the changes from physical to e to audio. We talked about opportunities around accessibility and closing the loop between people who have needs and the systems and technology available today, and what we can do with what Google's done with natural language. And we talked a little bit about the technology and how we support it. I think we have about 23 seconds, so I'll ask if there are any questions for any of us.

All right.
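The sample-lookup path described above (Dialogflow Enterprise Edition parses the request with the title as the key, and the sample is then fetched from Cloud SQL) could be sketched as a small fulfillment handler. This is a minimal sketch under stated assumptions, not the system shown in the talk: it uses Python's built-in `sqlite3` as a stand-in for Cloud SQL, a hypothetical `samples` table and a hypothetical `book_title` parameter, and the Dialogflow v2 webhook JSON shape (`queryResult.parameters` in, `fulfillmentText` out). The Cloud Dataflow loading step and the Google Express purchase hand-off are omitted.

```python
import json
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """In-memory stand-in for the Cloud SQL table of synthesized samples.

    The table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (title TEXT PRIMARY KEY, audio_uri TEXT NOT NULL)"
    )
    return conn


def normalize_title(title: str) -> str:
    """Case- and whitespace-insensitive key, so 'Virgil  WANDER' matches."""
    return " ".join(title.lower().split())


def handle_webhook(request_body: str, conn: sqlite3.Connection) -> dict:
    """Answer a Dialogflow v2 fulfillment request with a sample lookup.

    Assumes the agent extracts the title into a parameter named 'book_title'
    (hypothetical). The 'payload' contents ('audioUri') are custom data.
    """
    params = json.loads(request_body)["queryResult"].get("parameters", {})
    key = normalize_title(params.get("book_title", ""))
    row = conn.execute(
        "SELECT audio_uri FROM samples WHERE title = ?", (key,)
    ).fetchone()
    if row is None:
        return {"fulfillmentText": "Sorry, I couldn't find a sample for that title."}
    return {
        "fulfillmentText": "Here is a sample.",
        "payload": {"audioUri": row[0]},
    }
```

Keying the table on a normalized title mirrors "the title being the key" above; a production catalog would more likely key on an ISBN and map spoken titles to it, which is an assumption beyond what the talk describes.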

2018-08-02 13:57



This is great and all, but I couldn't wrap my head around finding a way to implement Google Cloud Text-to-Speech in an Android app.

Take note of the "cloud" part.
