# Minigo: Building a Go AI with Kubernetes and TensorFlow (Cloud Next '18)

After AlphaGo, their second paper was called AlphaGo Zero, and AlphaGo Zero describes an algorithm that started from nothing, from random noise, and then went on to teach itself how to play Go. So mixing in the AlphaGo Zero paper plus my friend Brian's implementation of MuGo, we get Minigo. And if you're curious about why Minigo's logo is a happy-looking robot falling off of a ladder, all will become clear; I will explain why he looks completely at ease with this situation where he's falling off of a ladder.

All right, so let's talk about what the game of Go is. How many people have played Go? Anybody here? Wow, that's a lot of people. How many people like to play Go? Yes, that's exactly what I like to hear. I love Go; I've been playing for a long time and I really like it. So we're going to do a quick demo of what Go is. This is what it looks like at the beginning: people take turns putting stones down on the board, trying to surround territory. When we talk about capturing, capturing looks like this: when stones are completely surrounded, they get taken off the board. That also works on larger chains of stones, so you can see that groups that are connected orthogonally share their fate, where they will hang together or hang separately, as it were.

The winner is decided as you try to divide up the board: whoever has more territory wins. That's it. It's like you're drawing lines on a map, carving it up: I get this, you get that. You can see in the right diagram there that White has surrounded all the triangle points and Black has surrounded all the square points. This means that Go is not really an absolutist game where you have to capture the enemy king or completely destroy the enemy; it's more like you're negotiating an agreement, where you just want to get a little bit more than the other person.
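The capture rule described above can be sketched with a simple flood fill: a chain of orthogonally connected stones is removed when it has no adjacent empty points (its liberties). This is my own illustrative sketch, not Minigo's board representation:

```python
# Sketch of the Go capture rule: a chain of orthogonally connected stones
# is captured when it has zero liberties (empty adjacent points).
# The dict-based board here is illustrative, not Minigo's actual code.

def chain_and_liberties(board, start, size=9):
    """Flood-fill the chain containing `start`; return (chain, liberty count)."""
    color = board[start]
    chain, liberties, frontier = {start}, set(), [start]
    while frontier:
        r, c = frontier.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue  # off the board
            stone = board.get((nr, nc))
            if stone is None:
                liberties.add((nr, nc))            # empty point: a liberty
            elif stone == color and (nr, nc) not in chain:
                chain.add((nr, nc))                # same color: part of the chain
                frontier.append((nr, nc))
    return chain, len(liberties)

# A black stone at (0, 0) in the corner, surrounded by white on both
# adjacent points, has no liberties left:
board = {(0, 0): "B", (0, 1): "W", (1, 0): "W"}
chain, libs = chain_and_liberties(board, (0, 0))
print(libs)  # 0 -> the whole chain is captured and taken off the board
```

The same flood fill works unchanged for larger chains, which is exactly the "connected groups share their fate" point from the talk.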
And this is an example of what a whole game might look like. This is Minigo in action: you can see it sketching out territory, starting in the corners. And this pattern that you see happening right there is called a ladder. Ladders are an interesting example of why Go is hard. This is a fairly straightforward pattern, and you can see it develop again; it's a really obvious pattern, a pattern that, you know, toddlers can probably follow and predict.

But it's an example of how Go has a very long horizon effect: the result of that ladder could decide the outcome of the game, and it may require looking 80 to 90 moves ahead. So why is that hard? A ladder is this great example of a position where, even with a branching factor of only two moves at each step, you're already looking at two to the 80th, about 10^24, possible positions. So that very high branching factor makes things really difficult. The games are really long, and the end condition is really hard to describe. With chess, when the king is checkmated, the game is over, and everybody can see and agree that the game has ended. With Go, the game is only over when both players agree that there's nothing left on the board worth contesting. This is pretty difficult if you're trying to teach a computer when the game is over; in fact, just scoring the board, even just knowing that it is time to score the board, is a really hard problem. And then lastly, and this is possibly the most important part, it's really hard in Go to determine who's winning in the middle of the game. This problem, who's winning, who's going to win, is, along with dealing with the branching factor and the sheer number of possible moves at every point, one of the really difficult problems that we need to solve. So let's take a step back and talk about how we're going to approach this with machine learning. This is five slides about machine learning, and obviously it's not an exhaustive explanation. I should also mention, this is probably a good time, that none of us on the Minigo team have PhDs, so there are probably people in this audience who understand this better than I do. But bear with me; I hope that this will be enlightening.
For folks who have maybe not done any machine learning before at all: real quick, inference for neural networks. The basic idea is we're going to put in an input and get out an output, and that thing in the middle is the model that we talked about. We don't really want to worry too much about what it is, except we need to know a couple of things about it. The first is that it's a bunch of math that is differentiable, or close enough to differentiable, and the second is that it's really slow, like slow on the order of milliseconds to evaluate. And why does that matter, that it takes milliseconds? That seems fast. Well, because you maybe need to do thousands of those evaluations before you can decide on a move to play. So we're going to consider neural networks for inference, inference meaning this forward path where we start at our input and we get out our decision. That's what inference is; it's also called a forward pass. We need to know that it's slow and that it's differentiable. All right, so how do we create that model, that thing in the middle? Well, what we're going to do is we're going to try and quantify the error, where we have inputs that we know and outputs that we know.
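The two ideas above, inference as a forward pass and training as quantifying and reducing error on known input/output pairs, can be shown with a deliberately tiny toy "model" (just a weighted sum, nothing like Minigo's actual network):

```python
# Toy illustration of inference vs. training. The "model" is just
# differentiable math: a weighted sum of the inputs. This is a teaching
# sketch, not Minigo's network.

def forward(weights, inputs):
    """Inference (the forward pass): input in, decision out."""
    return sum(w * x for w, x in zip(weights, inputs))

def train_step(weights, inputs, target, lr=0.1):
    """Quantify the error on a known input/output pair, then nudge the
    weights downhill along the gradient of the squared error."""
    error = forward(weights, inputs) - target
    # d(error^2)/dw_i = 2 * error * x_i, because the model is differentiable
    return [w - lr * 2 * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(50):
    weights = train_step(weights, [1.0, 2.0], target=1.0)
print(round(forward(weights, [1.0, 2.0]), 3))  # 1.0 -- the known output
```

The differentiability requirement is what makes `train_step` possible at all: it's what lets us compute which direction to nudge each weight.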

We were about 40 times slower than that, which is okay for a solution that was Python and for our accelerators, which were not TPUs. But 40 times slower meant 40 times slower: they were able to play five million training games in three days, and if we were 40 times slower, I was looking at that taking three months. So that's a little challenging; I needed to find a way to do this faster. Before we get to that, let's take a little step back. As I was setting this up on GPUs and realizing it was going to take months to run, I really wanted to find ways to verify that everything was working. With containers it was really easy for me to make variations on the jobs I was running and to run other sorts of evaluation matches; it was really easy to use the Kubernetes Engine API to spin up jobs that were variants, so I could test different versions of models and make sure that I was making progress. This was a really important thing that I'm going to come back to later. And when I say measuring performance, I am measuring performance on my task, which is: are my models actually getting better at playing Go? That's the real question. Early on, I knew that this was going to take three months. At the point that Cloud TPUs were being developed, I said, hey, you know, this might work really well if I could try running this on Cloud TPUs, and the Cloud TPU team was pretty enthusiastic; they said, yeah, sure, go right ahead. But they were so much faster that it meant really rewriting my pipeline. For reference, the GPUs that I had been using were a few generations old, so I'm deliberately avoiding making any sort of direct numerical comparison. But suffice to say that what was previously fine in Python was now no longer fine: using a Cloud TPU, there was no way that the code was going to be fast enough.
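The pattern of spinning up variant evaluation jobs through the Kubernetes API can be sketched by generating Job manifests programmatically. Everything here, the image name, bucket paths, and flag names, is hypothetical, standing in for whatever Minigo's actual evaluation harness used:

```python
# Hypothetical sketch: generating Kubernetes Job manifests to play
# evaluation matches between two candidate model versions. The container
# image, GCS paths, and CLI flags are illustrative placeholders.

def make_eval_job(black_model: str, white_model: str, games: int = 100) -> dict:
    """Build a batch/v1 Job manifest that pits two models against each other."""
    name = f"eval-{black_model}-vs-{white_model}"
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "evaluator",
                        "image": "gcr.io/example/minigo-eval:latest",  # hypothetical
                        "args": [
                            "--black", f"gs://example-bucket/models/{black_model}",
                            "--white", f"gs://example-bucket/models/{white_model}",
                            "--games", str(games),
                        ],
                    }],
                }
            }
        },
    }

job = make_eval_job("v17", "v16")
print(job["metadata"]["name"])  # eval-v17-vs-v16
```

Because each variant is just a slightly different manifest, container orchestration makes "try model A vs. model B for 100 games" a one-liner, which is what made the progress-measurement question ("are my models actually getting better at Go?") cheap to keep asking.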
Which meant that if I was going to be able to use these TPUs effectively, I would need to seriously rethink how this pipeline was going to run. When you're planning on something taking three months and now you're looking at maybe a week or two, you have some very different constraints about how long you can take to pre-process your input data, how long you can take to, you know, lazily push out the results of the new models, all that sort of thing. So what we had to do is rewrite for TPUs. This code is Monte Carlo tree search; this is the short pseudocode for Monte Carlo tree search, and I'd like to draw your attention to the line that says neural_net.evaluate(leaf.game_state), because that is the one that suddenly needed to go in parallel, and it needed to go in parallel a lot faster than it could. The engine that would do this rapidly was the part that we needed to rewrite. So that's what it looks like in a single-threaded version; this is pretty close to what the Python code actually looks like in Minigo today. And rewriting it for TPUs involved breaking this out into a multi-threaded version. A friend of mine volunteered to do the C++ rewrite, and he, he's probably not going to like me telling this story, but he was able to write this complete multi-threaded implementation.
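The single-threaded loop referenced above looks roughly like this. This is a simplified sketch in the spirit of the slide's pseudocode, with a stub standing in for the slow neural-net call; field names and the per-player sign handling are simplified from what Minigo actually does:

```python
import math, random

# Minimal sketch of AlphaGo Zero-style Monte Carlo tree search.
# The stub network and toy "game state" are placeholders; the point is the
# select -> evaluate -> expand -> backpropagate loop, where the network
# evaluation of the leaf is the slow step worth parallelizing.
# (Sign alternation between the two players is omitted for brevity.)

class Node:
    def __init__(self, state, prior=1.0):
        self.state, self.prior = state, prior
        self.children, self.visits, self.value_sum = {}, 0, 0.0

    def ucb_score(self, parent_visits, c=1.4):
        q = self.value_sum / self.visits if self.visits else 0.0
        return q + c * self.prior * math.sqrt(parent_visits) / (1 + self.visits)

def neural_net_evaluate(state):
    """Stub for the slow (milliseconds-per-call) network: (priors, value)."""
    moves = [0, 1, 2]
    return {m: 1.0 / len(moves) for m in moves}, random.uniform(-1, 1)

def search(root, num_simulations=100):
    for _ in range(num_simulations):
        node, path = root, [root]
        # Selection: walk down the tree by UCB score until we hit a leaf.
        while node.children:
            node = max(node.children.values(),
                       key=lambda ch: ch.ucb_score(node.visits))
            path.append(node)
        # Evaluation + expansion: this is the neural_net.evaluate(leaf.game_state)
        # line from the talk -- the call that must be batched/parallelized.
        priors, value = neural_net_evaluate(node.state)
        for move, p in priors.items():
            node.children[move] = Node(state=(node.state, move), prior=p)
        # Backpropagation: update statistics on every node along the path.
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Play the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)

best_move = search(Node(state=()))
```

Because each simulation blocks on `neural_net_evaluate`, a single-threaded loop leaves the accelerator idle between calls; the multi-threaded C++ rewrite exists precisely so many selections can be in flight and their leaf evaluations batched onto the TPU together.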

Make sure that you are able to isolate the parts of your system, and make sure that each of them is doing what you think it does. Does that kind of make sense? Great, awesome. And lastly, I want to talk about why I'm really excited, as a Go player, that all of these things are finally taking shape. I started Minigo after the AlphaGo Zero paper was published, which I think was November of last year, and since then Facebook has announced an open-source version and just released a model, and Tencent and other Chinese companies have been working on models which they've released with various degrees of openness. And an open-source project called Leela Zero has also been trying to crowdsource all of the GPU compute needed. It's been really excellent to have all of these different folks try to reproduce the paper with varying amounts of success, and it's really wonderful as a Go player to have access to these, essentially, oracles. Go players like to think of playing a game of Go as having a conversation with someone; we have a great proverb that playing a game of Go with someone is like living with them for a year. And in that case, we now have this new thing that is showing us new, creative ideas that we haven't really understood before. So if you are interested in learning more about the game of Go, definitely check it out online; there are a lot more resources, and hopefully it's going to be a lot easier to learn now that we have ways to understand it. We do try to hang stories on the moves that we play, and it's going to be a little bit easier for us to do that and to understand the game as we have better tools to dig into it. So thank you all very much, and a big thank you to the folks who have generously donated their time to work on Minigo: that's Tom, Seth, Brian, and Josh. They have all been instrumental in making Minigo possible.

2018-08-01 17:30
