# Stanford CS230: Deep Learning | Autumn 2018 | Lecture 6 - Deep Learning Project Strategy


…is a sensitive device for recording these very, very high-frequency changes in air pressure, and the audio plots you see just show the air pressure at different moments in time. So, given a 10-second clip like this, suppose this is the 3-second section where they said "Robert, turn on." What you'd like to do is build a lamp that can sit here. The lamp stays off, off, off, and at the moment they finish saying "Robert, turn on," you turn it on. So this is really the output, the target y. For the trigger-word system, what you want is that pretty much at the moment they finish saying "Robert, turn on," your learning algorithm outputs a 1. That's your target label y, saying "yep, I just heard the trigger word." For all other times you want it to output 0, because the 1 marks the moment you decide to turn on the lamp. To collect the dataset, here's something you can do: collect 100 audio clips of 10 seconds each. When I'm prioritizing my own work or my team's work, I really look at numbers like these and think them through. Say you're actually doing this, running around Stanford, and you want to collect 100 audio clips: maybe 10 people with 10 clips per person, or maybe 100 different people. I would actually estimate it. If you go to a Stanford cafeteria, how long does it take to get one person? At a busy place like a Stanford cafeteria you can probably get one person every minute or two, so you can probably get this done in 100 to 200 minutes, roughly two to three hours. That's not that bad, so you can get this done quite quickly.
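A minimal sketch of that labeling scheme, assuming the model emits one output per fixed time step (the 10 steps per second below is a hypothetical choice, not something from the lecture) and that you already know when the trigger phrase ends in each clip. `make_target_labels` is an illustrative helper name.

```python
from typing import List, Optional


def make_target_labels(clip_seconds: float,
                       trigger_end: Optional[float],
                       steps_per_second: int = 10) -> List[int]:
    """Per-time-step 0/1 targets for one clip.

    Every step is 0 except the step at which the trigger phrase finishes,
    which is 1. `trigger_end` (in seconds) is None for clips with no trigger.
    """
    n_steps = int(round(clip_seconds * steps_per_second))
    labels = [0] * n_steps
    if trigger_end is not None:
        # round() guards against float products landing just below an integer;
        # min() keeps a trigger ending at the very end of the clip in range.
        idx = min(int(round(trigger_end * steps_per_second)), n_steps - 1)
        labels[idx] = 1
    return labels


if __name__ == "__main__":
    # Hypothetical clip: 10 seconds long, "Robert, turn on" finishes at 6.2 s.
    y = make_target_labels(10.0, 6.2)
    print(len(y), y.index(1))  # 100 62
```

All the lamp logic needs at run time is this single 1 per positive clip; everything else is 0, which is also why positives end up so scarce.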
So let's say you collect 100 audio clips. Actually, for the purposes of today, let's say you collect 100 audio clips to use for training, 25 for your dev set, and zero for the test set. It's actually not that uncommon, if you're building a new product, not to have a test set: in the early prototyping phases of a project your goal is just to build something that works, and sometimes I don't bother with a test set. If it goes into a paper, then of course you need a rigorously collected test set, but if you're just building a product and don't need a rigorous evaluation, sometimes you can just get started without a test set. That's perfectly fine for getting started. All right, so, taking that audio clip from above, here is one thing you can do to turn this into a supervised learning problem. The phrase "Robert, turn on" can be said in less than three seconds, so let's take three seconds as the duration of audio. Say here is where "Robert, turn on" was said. What you can do is clip out different three-second audio clips. Here's one audio clip: this is x, and the target label is 0, because "Robert, turn on" was not said. You can take a different random three-second clip, and that clip also has a target label of 0. And for this one, a three-second clip that ends right at the last part of the "on" sound, you would have a target label of 1.
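The windowing step can be sketched in plain Python. This is an illustrative helper, not code from the course: the name `clip_windows`, the evenly spaced window placement, and the rule "label 1 only for the window whose end is nearest the end of the trigger phrase, and only if it is within half a window step of it" are assumptions chosen to mirror the description above. With 30 windows per clip, 100 clips become 3,000 labeled examples.

```python
from typing import List, Optional, Sequence, Tuple


def clip_windows(audio: Sequence[float],
                 sample_rate: int,
                 trigger_end: Optional[float],
                 window_seconds: float = 3.0,
                 num_windows: int = 30) -> List[Tuple[List[float], int]]:
    """Cut `num_windows` evenly spaced fixed-length windows from one clip.

    Returns (window_samples, label) pairs. The single window whose end lies
    closest to the end of the trigger phrase is labeled 1, provided it ends
    within half a window step of it; every other window is labeled 0.
    """
    if num_windows < 2:
        raise ValueError("need at least two windows")
    win = int(round(window_seconds * sample_rate))  # window length in samples
    last_start = len(audio) - win
    if last_start < 0:
        raise ValueError("clip is shorter than one window")

    step = last_start / (num_windows - 1)  # spacing between window starts, in samples
    starts = [int(round(i * step)) for i in range(num_windows)]

    positive = None
    if trigger_end is not None:
        t = trigger_end * sample_rate  # end of the trigger phrase, in samples
        nearest = min(range(num_windows), key=lambda i: abs(starts[i] + win - t))
        if abs(starts[nearest] + win - t) <= step / 2:
            positive = nearest

    return [(list(audio[s:s + win]), int(i == positive)) for i, s in enumerate(starts)]


if __name__ == "__main__":
    sr = 100  # hypothetical, deliberately tiny sample rate to keep the demo light
    clips = [[0.0] * (10 * sr) for _ in range(100)]  # placeholder silence, not real audio
    dataset = [ex for clip in clips for ex in clip_windows(clip, sr, trigger_end=6.2)]
    print(len(dataset), sum(label for _, label in dataset))  # 3000 100
```

Note how lopsided the result is: one positive window per clip against 29 negatives, which sets up the class-imbalance problem discussed later in the lecture.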

So, And when. You learn about sequence, models there are and ends you learn a better method than this, explicit, clipping but for now let's say you take. These on audio, clips and turn it into please. So take a ten second clip and by, clipping around different windows. You can take. Your, let's. Say 100. Clips. And. Because. For each 10, second clip you can take different windows, you, could turn this into let's, say, 3000. Training. Examples, right so here I took a 10-second clip and -. And, show you, know totally, three different. Three-second. Windows but if you take thirty three second windows then. Each 10-second, audio clip becomes thirty examples, and now you've. Turned the problem into a binary classification, problem where you need to train a neural network that inputs. A three second clip and they, build it as, either 0. 1 right those, mean this and so this, is an example of. The. The more, complex, pipelines. You might have if you're, building a learning algorithm to take. A continuous. You, know audio, detection problem to turn into the binary classification problem which, you've learned how, to build barriers near networks for right, and again we learn about our own ends you learn about other ways to process sequence data or temporal data okay. So. Um, go. Ahead. Oh. Is. This manual even yes. I I would yeah actually if you have a hundred examples. It's. Not that hard to just listen to it you know on your laptop with some audio playing, software. To figure out when, when. They finish saying or Robert, turn on and then at that moment to. Put a 1 in the target label because this is really when you want the lamb to turn on right. So. Any. Other questions actually few fideos clarifying, questions yeah go ahead oh I, wonder if this is gonna cost a problem that ones. Are tools bars, oh sure. Let me get back to that. And. Things. If. There are specific reason, we only train them with three seconds. I. See. 
Yeah, why do we use three seconds rather than five seconds? That's another hyperparameter you can test. You'd have to say it really slowly for it to take three seconds: "Robert... turn... on." Right. So okay, this is a design choice. All right, so let's say you do this: feed it to a supervised learning algorithm and train a neural network. And let's say that when you run this algorithm, you end up with 99.5% accuracy, but you find that the algorithm has zero detections. What I mean is that whatever audio you give it, it…
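The "99.5% accuracy, zero detections" situation is easy to reproduce with made-up numbers: when positives are rare, a model that always outputs 0 scores high accuracy but detects nothing, so its recall is 0. The sketch below uses hypothetical counts (15 positives among 3,000 windows) chosen only so the arithmetic matches the quoted 99.5%; they are not from the lecture. It also includes `widen_positive_labels`, a hypothetical helper illustrating the quick-and-dirty fix the lecture turns to next: marking roughly the half second after the trigger phrase as positive too.

```python
from typing import List, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def recall(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of true positives the model detects (0.0 if there are none)."""
    positives = sum(labels)
    hits = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    return hits / positives if positives else 0.0


def widen_positive_labels(labels: Sequence[int], extra_steps: int) -> List[int]:
    """Also mark the `extra_steps` time steps after every 1 as positive."""
    widened = list(labels)
    for i, y in enumerate(labels):
        if y == 1:
            for j in range(i + 1, min(i + 1 + extra_steps, len(labels))):
                widened[j] = 1
    return widened


if __name__ == "__main__":
    # Hypothetical counts chosen to match the 99.5% figure: 15 positives in 3,000 windows.
    labels = [1 if i % 200 == 0 else 0 for i in range(3000)]
    always_zero = [0] * len(labels)
    print(accuracy(always_zero, labels), recall(always_zero, labels))  # 0.995 0.0

    # Per-step labels for one 10 s clip at a hypothetical 10 steps/s; trigger ends at step 62.
    clip = [0] * 100
    clip[62] = 1
    # Half a second at 10 steps/s is 5 extra positive steps.
    print(sum(widen_positive_labels(clip, extra_steps=5)))  # 6
```

Widening the labels turns one positive step into six here, which is the "more labels of ones" effect, at the cost of the labels no longer marking a single exact moment.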

I would probably sample… Another thing you could do, in the interest of speed, even if it's not the most mathematically sound thing to do, is to change the target labels to be a bunch of ones after that point. This is a hack; it's not formally rigorous. But if you've already implemented the rest of this code, it might be a reasonable, if slightly hacky, thing to do, and it might work well enough. I don't know if I'd want to write an academic research paper with this method; if you tried to publish a paper with it, academic reviewers might raise their eyebrows. Maybe it's okay. But if you want something quick and dirty that just works, I think changing the labels to 1, so that a clip here that ends just a little bit after "Robert, turn on" is still labeled 1, would be pretty reasonable. It's saying that anywhere within maybe a 0.5-second period after "Robert, turn on" finishes, it's okay to turn on the light: you want to be turning on the lamp any time within, say, half a second after "Robert, turn on" has been said anyway. And this is a way to just get more labels of ones in there. [Student: How does that translate to deployment? You're not going to see "Robert, turn on" as often; something like one out of 1,000 might reflect what you'd expect to see.] Yeah. Right, so this is really a dev set and evaluation question. A couple of the metrics people often use when actually working on this are, first: when someone says "Robert, turn on," what's
the chance that it wakes up and the lamp turns on? And second: if no one is saying anything to the lamp, how often does it randomly turn on by itself without you having said anything? Those are the two metrics people actually use, and sometimes you could also try to combine them into a single-number evaluation metric. I think you would define a dev set to measure both of these things and then hopefully find a way to combine them into a single real number, which I think is one of the approaches covered in the videos as well. Great, does that make sense? But I think the question is really: what satisfies the user's need? And just one thing about the straightforward way of rebalancing: if you don't do this, your whole dataset has very few positive examples. So if you throw away negative examples until you have exactly equal numbers of positives and negatives, you've actually thrown away a lot of negative examples. Does this make sense? So one problem with the straightforward way of rebalancing is that in each of your ten-second audio clips, collected by running around Stanford, you have one example of "Robert, turn on," and so if you want exactly, perfectly balanced positive…