Eric Han and Medya Ghazizadeh: “You're Doing Kubernetes Wrong”
All right, well, thank you, everyone. It's great to be here, and thank you, Lisa, for organizing such a great event, and Robert. It's my pleasure to be here with Medya. He and I have been talking about this with some anticipation, in the sense that we both have a lot of experience, either from customers or from our time at Google, or for Medya at Target, and we were talking about all the mistakes you can make. In some ways this is motivated by the fact that we have these great new schedulers and this immutable-style infrastructure that we're all striving towards, and we want to be cloud native. At the same time, we can all make simple mistakes, just like anything else in life, that really kind of ruin it for the rest of us.

In some ways I want to keep this a little bit more of a shared dialogue. I know I've been accused at times of drilling all the way into the weeds, so I'll keep this, hopefully, at a high level, but we'll stay afterwards for more conversation, as much as you guys would like. So please consider this a conversation. Although we have the mics, we'll ask you to raise your hands and share your thoughts, and in some ways think of this as an interactive Family Feud. The topic really is how not to, or, as one of my friends would say, "you're doing it wrong": how not to run with Kubernetes. There'll be some examples from my experience, some from Medya's, and we'll talk about it from both the IT and the developer's point of view, so you'll see us trade off back and forth on that.

This actually represents probably my college experience; hopefully it's not here today, in the sense that, please ask questions. I was told to make sure I run on time, which I'll do, but the more that we can share the better, because in some ways this is a whole new ecosystem. The more we are interacting, the more we're getting to the right answer, and the best part about where we are today is that the
best things are happening with customers in mind. I learn as much from my customers, and I'll share some of the more anonymized results here. So please keep it lively for us.

With that, I'll just briefly introduce myself and then turn it over to Medya. My name is Eric. I am responsible for product management at a company called Portworx. I will demonstrate something from Portworx at the end, but most of this is, at some fundamental level, about cloud native, Kubernetes, and some experiences that we've seen working with customers, both in my time here at Portworx and in my time prior. With that, I'd be very happy to introduce Medya.

Hi guys, my
name is Medya, and today I'm going to teach you how to take down a Kubernetes cluster. I am a senior at Google, and I'll be happy to share with you all the things I've seen that take down a Kubernetes cluster.

Actually, I started my career journey not at Google but at Target, and this is Black Friday at Target. It is actually a real picture. You may not know that the real reason we have cloud is retail: Amazon came up because of Black Friday. Black Friday is extremely important for all the retailers. You have no idea how much of their revenue all these retail stores make on Black Friday. It's extremely important, so we prepare really well for Black Friday at Target.

I assume everybody here knows a little bit about Kubernetes. Does anybody think they don't know anything about Kubernetes? Show of hands. Who here knows something about Kubernetes? Okay. Yeah, I was just testing if your hands work.

Kubernetes, they say, is like a Tetris game. If you remember the Tetris game, if you ever played it, a bunch of shapes are coming at you and you are supposed to fit them into the spaces in the best possible way. A lot of people think Kubernetes is like magic. All it does is schedule your stuff in the best place. You give it a container, and it says, "Of all these computers, this one is the best one to run it on," and when it doesn't work, it will reschedule it. Another thing, just the basics of Kubernetes: people think Kubernetes is about containers. That's not correct. You could actually run VMs on Kubernetes; you can make Kubernetes schedule VMs. So you can schedule anything. It happens to be scheduling containers.

At Target, when we started with Kubernetes, we had thousands of developers, and everybody was excited to be on Kubernetes, the new shiny thing, but they were not ready to be on it. So let's play this video and see what this one is. Oops.
Okay. This is actually a Target store in Houston, Texas. If you look at this, during the flood there, there's something going right in this store. It's like, surely, a lot of places were affected by that natural disaster, but in this one a lot of things are actually working. You can see the lights are working, the systems were all working, the emergency systems were all working, and nobody got electrocuted. This is the kind of plan you want to have for your Kubernetes cluster on Black Friday or in disaster mode: you want to be operating without electrocuting people, and keep the lights on. So that was the mentality, and I'm going to share with you some of the things that we did. I don't think it goes to the next one with this. Okay. Family Feud. Yeah.

So, I think Lisa and I were debating if anyone still knows Family Feud and who the host is. I think it's Steve Harvey. But at the same time, if we were to survey people, from the IT and the developer's perspective (for this part I'd like some participation), what's the first thing, or the most common thing, that people cite as one way to ruin their Kubernetes cluster?
Somebody shout out an answer. I heard etcd. What's that? Load balancer. I get that, the ingress controller, and we can do demos with it later. Any others?

So I'll give a different answer, and this is all non-scientific, in the sense that more and more of my examples will be just to illustrate the common points. As much as I want to say that it's etcd, or things like making sure that you understand the upgrade policy, I'll describe a different one. The most basic is putting all your resources in one basket, or in some ways creating a single point of failure. etcd could be an illustration of that, in terms of how many times etcd is being replicated. The other one that I see a lot, and it's probably most illustrative in the cloud, is how to handle the SLA. As you know, public clouds have SLAs that are written for two availability zones, so the onus is really on the user to say, "I'm going to deploy in one AZ and in the other." But the moment I have certain applications that become sticky, it becomes non-trivial. So deploying so that not all your eggs are in a single basket, so that if there's a brownout or the availability zone drops you can actually run your apps in the other AZ, becomes incredibly important.

The more interesting examples that I've seen take this point and re-examine it in different environments. On-prem, we've seen this with customers who want to run across racks; that becomes incredibly important. And even with hypervisors, so VMware, OpenStack, even KVM, they want to make sure that they don't put all their applications and all their data onto a single hypervisor, because that becomes a single point of failure. So in some ways I wanted to start with something trivial but illustrate the fact that it becomes important to your design that, as
Medya says, you could schedule anything, but how do you schedule it so that you have the availability and the reliability that you truly desire? Because everything else comes after you can ensure that your applications are up.

This next one is not so much an illustration as an interactive quiz. If I were to ask who is the best backup quarterback to reach the Super Bowl (and I'm going to cite a non-scientific source called ranking.com, so you could verify it later), who would say it's Nick Foles? Okay, I only have two. Who would say it's Kurt Warner? And I'm going to reverse; I only have four. And what about Tom Brady? I have about seven. Okay. We could choose different examples if you'd like, but in this example it was actually Nick Foles, so maybe there's a recency bias. I'll say that, in my opinion, it's not important to me, because I came from Illinois and all our quarterbacks have been bad, so I'm very tired of seeing other backup quarterbacks do much better than any Illinois starting quarterback.

But the point of the backup example is more to ask: what is your backup strategy? If you look through some of the examples, and all of us have this personal history and you guys can share it here, it could be configuration, it could be data. I don't want to scope it to a certain problem statement, but the mistake is not having a backup plan and sticking to it. The example that I thought was most illustrative, especially as a father of three whose kids love Toy Story, is that Pixar at some point almost lost all of its Toy Story rendering data while the movie was still being made, because someone did an rm *, I don't know, -rf, and it was actually deleting. You could say, "Well, did they have backups?" They had backups, but the backups had not actually run in a month. And in this video, which we can share later, they
go through the whole story and the people who were involved, and what ended up happening is that the backups hadn't
run in a month, and somebody happened to have a copy at home. But the whole point here is that you need a backup strategy, because at any point I need a way to recover both the data and the configuration, and that becomes something we have to do from the beginning. Either we learn painfully, or we share those lessons collectively, or we have to go through something like this.

The third one, and then I want to in some ways start to expand the topic: I think the greatest challenge that we're all facing right now is that there's a mismatch in expectations. When I look at applications and where people want to get to, we get very fancy at the top level, and we can start to describe canary-style deployments: how do I take an application and sample it? We did this when I was at App Engine, and everyone does this; it's very common on AWS as well. I will have a set of machines, and I can start to expose either pods or, using Istio, some set of applications, and canary test them. I could do it with blue-green, which is just deploying more of a bulk set. So I start to get very sophisticated and numerical when I look at applications. But oftentimes on the infrastructure, the analogy (and this is me bastardizing three analogies now) that people are using is that it's almost like they're burning the ships, in the sense of "we're going to take the strategy, we're committing to the strategy." Some of this is born out of the past legacies that we all come from, because in some ways we're trying to get to a better place. But we really should be expecting more from infrastructure, and at the end I'll demo from a different perspective. We want to be able to apply the same kind of numerical, quantitative methods against
our infrastructure. So I think the onus is on all of us to start to understand what that could be from an infrastructure point of view. Whether it's your configuration or anything below the application, how do I make it repeatable and testable? I think Tectonic did this well, in terms of having two upgrades going on at once and rolling back, but how do I bring that to all my infrastructure pieces? So in some ways I'm trying to share this collective blame with everyone.

All right. So, when I was working at Target... A lot of people don't see Target as a tech company, but it is a tech company, and it has thousands of application developers, all excited to be on Kubernetes. We had a running Kubernetes cluster about three years ago, and this was the first thing that took our cluster down. If you put a star in your ingress somewhere by mistake, maybe because your regex didn't work or something, Kubernetes is going to forward all the traffic from the whole cluster to your container, so one container receives all the traffic of the whole cluster. This was the first one that took the cluster down.

I'm going to show you a lot of these things. When I was at Target I authored k8guard, which detects all of these bad things, finds out who did them, and takes action on them. You could actually prevent a lot of these things with admission controllers, but these are the things that you need to be aware of. So these are the bad things you could do: a star in an ingress, and a large image size.

So, guys, Kubernetes is a scheduling system. If you don't want to be scheduled, do not be on Kubernetes. You can just run on your VM; it's okay. It doesn't have to be shiny. If you do not want to be moved around at any second, why
do you bother? Okay, so if you deploy, say, a hundred images and each of them is 15 gigabytes, and you do a kubectl get pods, the response is going to take maybe five seconds. But if each is like 46 megabytes, the response is going to take milliseconds; it's so fast. When you accumulate thousands of developers, each one deploying a 5 or 15 gigabyte image, it's going to slow things down. And you have no idea (I mean, maybe you do have an idea, I'm sorry) how easy it is to convert a big image to a small image. Use the builder pattern. Use it for all your images, in the Dockerfile, using a multi-stage build. So just build whatever you want, use the whole JDK, all the fancy things, and then copy the final jar or the final binary into your container from scratch. And this is not only a performance
issue and a stability issue; it is also a security issue, because the larger your image is, the more you ship with your application, the larger the attack surface, and the more security problems you may not be aware of. So just copy what you really need. And remember, you are aiming to be scheduled, moving around from this cluster to that cluster. You're playing Tetris. You don't want to send a big shape that doesn't fit in anywhere; you want a small shape. The smaller the better.

Another thing that took our cluster down was an externally hosted image. This happened in prod, and I'm sure you've seen it; probably a lot of people still do it. In your deployments you use an image hosted on Docker Hub, like you just say nginx. See? Terrible, terrible idea. Terrible idea, because Docker Hub is open source, I mean, it's a free service. Unless you pay them, they might rate limit you. Also, the image might get deleted, and the image might get compromised. So do yourself a favor and get a Docker registry yourself. GCRs are super cheap; anything is super cheap to host yourself. So you're deploying to prod and Docker Hub is not there for you. What are you going to do? This happened, and it took our cluster down.

Privileged containers: this one did not take our cluster down, but it compromises security. How many people here know about the problems of privileged containers? Okay, one. Anybody else? Two, three, four. Okay, that's a good amount. When you run a container in privileged mode (by default it is not in privileged mode), the bad thing that can happen is that your container can read other containers' namespaces. So one container can go and read almost everybody's processes and memory space. Not only can that be a security problem, it could also, by mistake, step on another
process's foot and take it down. So unless you really need it, you should not use privileged mode. Some people just say, "Oh, I'll just give it all the permissions." Do not do this. This one is very easy to prevent, because you just have to not do it; by default it is not privileged mode. So if you see somebody among your developers using privileged mode... some really do need it, but I have seen plenty of examples where people thought
they needed it and were talked into not having it. This one can be easily prevented.

Root containers: what's the difference between root containers and privileged mode? Say you run Docker on your computer. Let's say your user name is medya on your laptop, and you run Docker as medya, and then you have a script there running, say, nginx, and then you mount a file from your laptop that is owned by root into the container. Do you think the container has permission to write to that file? How many people say yes? One. Just one. Wow. So it does have permission; yes, you're right. Even though you're running Docker as medya, not as root, you are mounting a root-owned file into your Docker container, and guess what, you can write to it, because Docker checks the user ID, not the user name, and the user ID for root is zero. You can easily do that, and I don't know how people don't know this. It's extremely dangerous. Do not run your containers as root.

How do you prevent it? Easy. Go to your Dockerfile and add one line, just one line, that's all you do: USER, then a user ID or some name, whatever you choose. By default all the Docker code runs as root, with user ID zero. This can be easily prevented. I don't know why people are not freaking out about this. Each time I see it... it is super dangerous. Try it on your laptop. Do not do this.

Another one. Okay, this one is very interesting, because when we started Kubernetes at Target, nobody was writing anything in containers, because at the beginning we did not really have volumes. Then some people started having volumes in Kubernetes, which is okay. But there's one type of volume which should be avoided completely. The official docs on the Kubernetes website say that 99% of the time you do not need to do this. If you see somebody mounting a hostPath volume, it means they are mounting a folder on the master of the Kubernetes cluster to
their container. You
should be asking yourself why they are doing that. There is only one use case for that, and that is when you need to change the Docker socket because you think Docker is doing something wrong. It is extremely rare that you actually need to do that. But I have seen developers who, because they were lazy and didn't want to create a volume, just mounted a hostPath volume, the Kubernetes master's file system, into their container and did their things there, whatever they were doing. This is bad because two different containers can step on each other's feet, and they can take down the cluster. This took our cluster down, and it is one of the items that the tool I wrote, the open source k8guard, takes care of too.

It's, um, yeah, special. Yeah. And there are workarounds that you could use for sockets, but yeah, this would be the dangerous path.

Okay. I think this is the most common one that I've seen in my entire life that takes down a cluster. In my entire time working with Kubernetes, I've never seen anything else take down a cluster more than this. Let's say you have a cron job. It's very common to have a cron job; CronJob is a type in Kubernetes. Let's say the cron job fails, and that cron job is scheduled to run every minute. In three days, one failing cron job by default creates 4,320 restarting pods. It creates 4,300 pods and they're constantly restarting. Why? Because the default concurrency policy in Kubernetes for a CronJob is Allow, which runs jobs concurrently. I wish that we had a better default, because this happens so much. When your cron job fails, which sometimes happens, for example because it tries to write something into the database and the database is not there, it restarts.
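The arithmetic and the setting described above can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: `cronjob_manifest` is a hypothetical helper, the job name and image are made up, and `batch/v1` assumes a reasonably recent Kubernetes version. `concurrencyPolicy` accepts `Allow` (the default), `Forbid`, and `Replace`.

```python
# A job scheduled every minute fires 60 * 24 times a day.
MINUTES_PER_DAY = 60 * 24
runs_in_three_days = 3 * MINUTES_PER_DAY  # 4320, the figure quoted in the talk

VALID_POLICIES = ("Allow", "Forbid", "Replace")


def cronjob_manifest(name, schedule, image, concurrency_policy="Forbid"):
    """Build a CronJob manifest as a plain dict (hypothetical helper).

    The default here is "Forbid" rather than the Kubernetes default "Allow",
    so that failing runs cannot pile up next to each other.
    """
    if concurrency_policy not in VALID_POLICIES:
        raise ValueError(f"concurrency_policy must be one of {VALID_POLICIES}")
    return {
        "apiVersion": "batch/v1",  # assumption: cluster serves batch/v1 CronJobs
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": concurrency_policy,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [{"name": name, "image": image}],
                        }
                    }
                }
            },
        },
    }


# Hypothetical job: name, schedule, and image are placeholders.
manifest = cronjob_manifest(
    "db-writer", "* * * * *", "registry.example.internal/db-writer:1.0"
)
```

Serialized to YAML or JSON, a dict like this could be applied to a cluster. With `Forbid`, a new run is skipped while the previous one is still running; with `Replace`, the new run replaces the one still running.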
The next job comes, and the next job should replace the previous job. You should not have two jobs, with both jobs restarting forever, and then over time it becomes four thousand pods.

This problem not only took the cluster down, it cost a lot of money, because the cluster was on autoscaling. Guess how many CPUs the cluster I was looking at had by the time of the four thousand pods. Anybody want to guess how many CPUs it had? It was small and elastic; at the beginning it had, I think, four cores or something like that. Well, close. It was, off the top of my head, a few hundred virtual CPUs attached to this cluster. You can easily prevent that from happening. Please do: when you deploy a cron job, just make sure the concurrency policy is Replace or Forbid. The default is terrible, because all the failing ones accumulate and all restart at the same time.

Another thing: when you live in the Kubernetes ecosystem, you should learn how to crash like a pro. Say you have a pod of three containers, and one of them is waiting for another one, or waiting for a database, whatever. Unless you use an init container, when it fails, Kubernetes restarts it. That's what Kubernetes does. You should use the capabilities in Kubernetes, such as a readiness check and a liveness check. These two things make you crash like a pro.

So here's how I crash in my sample application. Let's say I want to connect to the database (thank you for laughing) and my database is down. Okay, what do I do? Do I want to just fail and let Kubernetes restart me? I could do that. If I do that, it will keep restarting, and Kubernetes will keep trying to talk to the database. Or I could have an exponential backoff: say, try
for up to five minutes, or I don't know how long, or for 30 seconds, and then crash. That way you prevent making things worse. Let's say the database is not there for five seconds. When you keep restarting and keep hitting it, you might take it down even worse. Just give it five seconds, okay? It will come back. So learning how to crash in Kubernetes is going to
save you downtime on your Kubernetes clusters. This is the most common one I've seen. And there's a bonus one, the last one.

I admit this is kind of a controversial one, because I've argued with a lot of people about this, and this is my opinion. Some people deploy a container with only one replica in their deployment. When you deploy, I think you should deploy with at least two replicas. You should never deploy anything with one replica, because if you are deploying with one replica, it means your application cannot run in parallel. If it cannot run in parallel, you want to ask yourself why. Is it writing something to a disk, which you should not be doing in Kubernetes? What is the reason? Some people tell me, "No, I know what I'm doing." Okay, but this is my opinion. Think about it: if your application has to run as only one replica and cannot run in parallel, there's a big chance you're not ready for Kubernetes.

If you want to learn more about these principles that I agree with, it's called twelve-factor app design, if you've heard about it. The colored ones are the ones that I like most. Config should be in environment variables. Stateless processes: it means the process should not write anything to the disk, and if you restart or kill the container at any moment, it should come back without any pain or losing any data. It should be stateless. Logs should never be written to the disk. And then concurrency, which is that it should be able to run in parallel, because you want to have highly available traffic. And then I have left some time for Eric to show us a cool demo.

All right. So in actuality there are a lot of things I could show, but one thing that I thought would be most illustrative is to narrow in on something that's more visual and show WordPress.
It could be any application, and I think Medya's points would apply here. Really, what I want to show is how things can be a lot easier if we just start challenging the status quo. What this is, is two clusters in EKS, and we're uploading simple images into the EKS cluster. What we can start to do with the EKS cluster is migrate the volumes, because in all ways we want a multi-cloud, portable world; we were promised that kind of portability. So in this demo we have two EKS clusters that can be across AZs. There's low latency, and I'm trying to protect myself and start to create what looks like a stretch or a DR cluster. But the simple question could be: how do I use it in a Kubernetes context? How do I pair those clusters? How do I run that application, manage the statefulness, and push it across? All I've done is take a WordPress container pod (excuse me), add some images to it, and then start to do things like applying the applications and migrating the volumes. Here we've actually migrated the data, and we can see it, and now we're going to recreate it on the second cluster. This is all trying to illustrate some of the points I was making at the beginning: how do I make it so that there are backups, how do I make things repeatable, and how do I make things that aren't reliant on a single point of failure?

What ends up happening here is that we have to wait a few minutes, and that's in some ways why I recorded it. You're waiting a few minutes while it's rerunning the Kubernetes deployments for the pods. We're looking at WordPress, and now we're going to go get the service, because I have to create an external load balancer in Amazon. I go and grab the URL, and now I can launch into that application and see
the same set of content. So in some ways what I'm trying to showcase here is the idea: if I had a Kubernetes cluster with WordPress, I write all the data, I have a backup strategy, and I keep it across AZs, or make it multi-AZ, then I can rely on infrastructure or other plumbing to start to create migration patterns, or DR patterns if you will, for moving the data. All I then have to do is substitute the PVCs and rerun it, and then I have my application with the data intact. This is, in some ways, what I think we are all trying to push for as a collective community. I realize that that was super fast; I was trying to keep within the time. Happy to stay afterwards; Medya and I will be here afterwards for quite some time. So thank you all. Thank you.
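Several of the pitfalls named in the talk (a star in an ingress, externally hosted images, privileged containers, root containers, hostPath volumes, a single replica) can be checked mechanically against a manifest. The following is a minimal Python sketch under stated assumptions. It is not how k8guard or any admission controller is implemented; `audit_deployment`, `audit_ingress`, and the registry prefix `registry.example.internal/` are hypothetical names chosen for illustration, and manifests are assumed to be already parsed into plain dicts.

```python
def audit_ingress(manifest):
    """Flag Ingress rules whose host is missing or contains a wildcard."""
    findings = []
    for rule in manifest.get("spec", {}).get("rules", []):
        host = rule.get("host", "")
        if not host or "*" in host:
            findings.append(f"ingress rule with catch-all or wildcard host: {host!r}")
    return findings


def audit_deployment(manifest, trusted_registries=("registry.example.internal/",)):
    """Return a list of findings for a Deployment manifest given as a dict."""
    findings = []
    spec = manifest.get("spec", {})

    # A Deployment without an explicit replica count runs a single replica.
    if spec.get("replicas", 1) < 2:
        findings.append("fewer than two replicas")

    pod_spec = spec.get("template", {}).get("spec", {})
    pod_sc = pod_spec.get("securityContext", {})

    # hostPath volumes mount a path from the node's own filesystem.
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            findings.append(f"volume {volume.get('name')!r} uses hostPath")

    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        image = container.get("image", "")
        sc = container.get("securityContext", {})

        # Anything not pulled from a registry you control counts as external.
        if not image.startswith(tuple(trusted_registries)):
            findings.append(f"container {name!r} uses external image {image!r}")

        if sc.get("privileged"):
            findings.append(f"container {name!r} is privileged")

        # Container-level settings take precedence over pod-level ones.
        run_as_user = sc.get("runAsUser", pod_sc.get("runAsUser"))
        non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False))
        if run_as_user == 0 or (run_as_user is None and not non_root):
            findings.append(f"container {name!r} may run as root")

    return findings
```

Calling `audit_deployment` on a parsed Deployment returns an empty list when none of the checks fire. A check like this could run in CI before anything reaches the cluster, which is the same idea as preventing these mistakes up front rather than discovering them in production.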