GitOps In A Regulated Sector
Hi, my name is Brennan Kemp, I'm a consultant for Container Solutions, and welcome to my talk on GitOps in a regulated sector. Just to give an overview of who I am: I've been a consultant for around five to six years now. The majority of my work has been around helping move systems to more resilient setups, easier deployments of software, and so on, and primarily at CS that means helping enterprise customers move from legacy setups to more cloud native operations, which in today's terms means I've been working a lot with Kubernetes. Starting out, however, I was a Python developer.

Today I'm going to take you through a use case at one of our clients, Deutsche Fiskal. I'm not going to go deep into what Deutsche Fiskal actually does; just understand that they're a fintech-type company and they're working with government regulations. In terms of the actual team we worked with at Deutsche Fiskal, it was an extremely small, talented team: two to three people who had to handle the infrastructure and operations for the work of around twenty to thirty application developers, and the majority of that work was around a greenfield microservices project.

At CS, one of the first things we do when engaging with clients is to spend two to three days in workshops with them, understanding what they need and what's important to them. After our workshops, these were some of the things we thought were extremely important in their ecosystem: auditable changes, so who made which change to the system; access control, so who has access to change the system; scalable infrastructure, because they didn't know how much they would be dealing with in terms of customers; security, since any regulated industry has an expectation of some kind of security; disaster recovery, so rollbacks and the like; collaboration, which we felt was a necessity and which I'll chat about in a bit; and also stability across all environments, not just production.

After looking at all of these, we thought this was a great opportunity to dive into a slightly newer methodology called GitOps. A lot of people have been talking about GitOps and its use cases, and we thought it was a very good fit.

So how does GitOps help solve all of those problems? For auditable changes: developers have been doing git commits for years, and every commit that gets pushed has your name against it and a little reason why the change was made. In our own experience, when companies come in and add extra change-release processes and change-release documentation, it just slows everyone down, so with GitOps there's no need to add any extra tools for auditable changes.

Access control is very easy: what you can do in git pretty much is your access. Who can create pull requests or merge requests, who can create tags, who can review pull requests and merge them, and so on; and only the pipeline actually makes changes to the infrastructure or the applications. Any of the branches that kick off those changes are protected, so you can't push directly to a branch that's going to make changes.
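To make that concrete, here is a minimal sketch of what such a pipeline could look like, assuming a GitLab-CI-style setup; the talk doesn't show the actual pipeline definition, so the job names, the Terraform image and the exact rules are illustrative only. The idea is simply that plans run on merge requests so reviewers can see the proposed change, and apply only ever runs from the pipeline on the protected default branch.

```yaml
# Illustrative only: a GitLab-CI-style pipeline where no human ever runs apply by hand.
stages:
  - plan
  - apply

plan:
  stage: plan
  image: hashicorp/terraform:1.5          # assumed image and version
  script:
    - terraform init
    - terraform plan                      # reviewers read this output on the merge request
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

apply:
  stage: apply
  image: hashicorp/terraform:1.5
  script:
    - terraform init
    - terraform apply -auto-approve
  rules:
    # only the protected default branch may apply, so only the pipeline changes the system
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
```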
So, collaboration for us was super important, specifically because of the size of the team. If you have two to three people, they are not going to be able to handle the number of tickets that twenty to thirty application developers are going to create. For instance, if an application developer wants to scale up the size of a Kubernetes cluster, or add a new collection to a MongoDB instance, or something like that, the process would generally be that the IT operations folks get a ticket of some kind, they have to go and delve into it, make the changes, deploy the changes, and then ask the application developers whether it's right, or some feedback loop has to happen. In our setup, what we envisioned was that application developers who have some idea of infrastructure, which nowadays is quite a lot of us, would be able to create a pull request with their changes, and all the operations team would need to do is approve those changes and merge them. This allows anybody within the company to become a potential operations engineer for the changes they need.

Security. Security for us wasn't difficult in this case. Generally, if you're working with a hosted git repo provider like GitHub or hosted GitLab, you need to worry about things like how they clean up the VMs that run CI/CD, how they handle their secrets, and so on. In this case, however, we were using an internal git repo, hosted internally on a closed network.

We were using the push model, specifically because we were handling infrastructure as well as applications. If you know a bit about GitOps, you know there's the pull model and the push model: the pull model is your system pulling changes from the git repo, and the push model is your CI/CD pipeline pushing changes to the infrastructure or the application. A pull model doesn't really work when you're handling infrastructure, because the infrastructure has to exist before anything can pull the changes, which creates a bit of a chicken-and-egg problem.

Another thing was secrets: database passwords and things like that. How were we going to store these? We decided to encrypt secrets at rest in the git repo, and the pipeline has the ability to decrypt them and apply them, or decrypt them and push them, and so on.
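The talk doesn't name the encryption tooling, so take this as a hypothetical sketch only, assuming something like Mozilla SOPS and a Kubernetes Secret manifest whose values are encrypted in the repo: a pipeline job decrypts the file and applies it, so the plaintext never lands in git.

```yaml
# Hypothetical job (extends the pipeline sketched earlier): the secret lives
# encrypted in the repo; the pipeline decrypts it and applies it to the cluster.
apply-secrets:
  stage: apply
  image: alpine/k8s:1.27.3                # assumed image with kubectl; sops assumed available too
  script:
    # decrypt the sops-encrypted Secret manifest and pipe it straight into kubectl,
    # so the decrypted values are never written to disk
    - sops --decrypt secrets/mongodb-credentials.enc.yaml | kubectl apply -f -
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
```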
Rollbacks. Rollbacks became super simple: reverting a change is as simple as reverting a pull request or a commit. You revert the commit, push it, and create a pull request, or you use the git host's revert-merge-request or revert-pull-request functionality, and this creates a new commit that you then take through all of your environments to test. At worst, this means you have to deploy to some kind of staging environment and then to production. That's how rollbacks are handled if you want to roll back changes in software.

As for the DR strategy: if your system fails, everything is in git; your source of truth is git. We did actually test this in terms of spinning up a new environment. All you have to do is switch out a project ID or a subscription ID, depending on which cloud provider you're using, and then, if you're using Terraform, run terraform apply or something like that. Database restoration isn't handled by this; database failover is always a strategy in its own right, because you need to know how to replicate data across regions and so on. So we didn't look at database restoration right off the bat, but rather relied on whatever the cloud provider's data recovery mechanisms are.
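As a rough illustration of that "switch the project ID and re-apply" idea, here's a hypothetical, manually triggered job. The variable names (project_id, DR_PROJECT_ID) are mine, not the client's, and in practice state, DNS and data need more care than this sketch suggests.

```yaml
# Hypothetical disaster-recovery job (extends the pipeline sketched earlier):
# rebuild the whole environment from git into a fresh cloud project.
recreate-environment:
  stage: apply
  image: hashicorp/terraform:1.5          # assumed image and version
  when: manual                            # a human pulls the trigger, the pipeline does the work
  script:
    - terraform init
    # everything else comes from the repo; only the target project changes
    - terraform apply -auto-approve -var="project_id=${DR_PROJECT_ID}"
```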
Right. Scalable, reusable infrastructure: Kubernetes, cloud, Terraform. I'm sure we've all heard about these, so I'm just popping through them quickly, because I want to get to our actual approach and the design of the system.

One thing to take into account, though, is stability. Stability for a lot of companies only means that production needs to be stable, but for us, we felt that all environments need to be stable. The reason is that in a normal setup, if you've got application developers and operations engineers all working on the same staging environment, what generally happens is that operations want to test out a change on staging, they bring staging down for a couple of hours, and the application developers then can't push or test their changes on staging. This creates a bit of disagreement between the two sides: operations are too scared to push any changes to staging because they don't want to affect the devs' work, and the devs get upset with ops because of the slow turnaround on changes they might require. Which, again, is a bit of a chicken-and-egg problem.

Our approach to this was very simple: we were going to split applications and infrastructure into two separate repos. Anything infrastructure related, and that includes a couple of things, goes into one repo, and anything application specific goes into its own repo. In the infrastructure repo we have any components or services provided by the cloud. There was a bit of discussion around handling databases at the application level, because we felt that applications should be able to create their own databases as needed; however, we didn't really have the design or the technology at that point to handle that kind of setup, so we decided that databases would be handled the same way as Kubernetes clusters, queues and networking.

For the infrastructure we took the same approach you would with an application: just as you'd base an application on a framework like Spring Boot or Django, we decided to base our infrastructure on a framework called Kubestack, and for the exact same reason: Kubestack has pretty much solved a lot of the problems that you hit straight out of the gate with GitOps, like setting up pipelines, setting up your git workflows, and so on. One thing Kubestack brings is the idea of an ops/apps pair. The ops/apps pair is nothing new, it's your staging and production setup, but here we're talking about it from an operations perspective: you have an operational shadow copy of every component, where the operations team can test things out, and then you have an apps environment where the actual applications get deployed and where they use those components.
Now, this is quite hard to keep in sync, making sure that the operations environment stays as close to the applications environment as possible, so we used another feature of Kubestack, which is how Kubestack sets up its Terraform modules to inherit configuration between environments. If we briefly look at the configuration Kubestack uses here, we have an apps block and an ops block. The apps block is pretty much the general settings, the setup of your environment; in this case we're looking at clusters. The apps cluster here has autoscaling enabled, with a min node count of 1 and a max node count of 10, a region of europe-west, and some node locations.
How we try to keep ops as close to apps as possible is with this idea of explicit differences. For ops to differ from apps, you actually have to explicitly set the specific settings that you want to be different; otherwise it inherits the apps properties. In this case, the ops block sets a max node count of 1, so the operations cluster effectively won't have autoscaling, because its max node count and min node count are both 1, and it has fewer node locations; but it still inherits the min node count from the apps properties, and it inherits the region from the apps properties. Any change you want to make, you have to be explicit about, and this allows for as little deviation between the two environments as possible.

Which eventually gives you something like this: at the top you have your application environments. This also brings in the idea that we split our application environments away from our operations environments. In this case, our applications have a staging, pre-prod and prod environment, all handled on the applications side, and operations has a direct copy of this. What this is great for is that the operations environment becomes like a smoke test for any change. In this very simple setup, and obviously we had a lot more components, you have the operations cluster, and you can deploy agents to it that test changes against, say, your databases: is there data loss, is there downtime, is there a speed-up or slow-down, and so on. That gives you a very good idea of whether your change is effective and does what it needs to do.

With our customer, the understanding here is that the pre-prod and prod environments are actually customer facing, so pre-prod is a place for our client's customers to test changes before they get into prod.

So what does this give us? It gives us a manual testing environment for any changes operations needs to make, without affecting application development, which in the long run means less downtime, because problematic changes are learnt about in the operations environment and not in any application environment. At the end of it, operations have more confidence to make changes and more confidence to experiment.

The downside of this is cost. I'll just note that this is infrastructure cost we're talking about: when you do the trade-offs, your infrastructure costs will increase. However, having your application developers sitting on the sidelines doing nothing, waiting to be able to deploy to an environment, is a cost as well, as is reputational cost. Specifically, as this is a greenfield project, the clients this company is trying to serve have a very specific reputation they need to uphold, and as soon as that reputation is damaged due to downtime or something like that, it can have a huge financial impact on the company.
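The talk doesn't show what those test agents in the ops cluster actually look like, so this is only a hypothetical sketch: a one-off Kubernetes Job, deployed to the ops cluster after a change has been applied there, that checks the shadow copy of the database is still reachable.

```yaml
# Hypothetical 'test agent' for the ops cluster: a one-off Job that pings the
# shadow MongoDB after an infrastructure change has been applied to ops.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-smoke-test
  namespace: ops-checks                   # assumed namespace
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: check
          image: mongo:4.4                # assumed client image, matching the MongoDB example earlier
          command: ["sh", "-c"]
          args:
            - mongo --host "$MONGO_HOST" --eval 'db.runCommand({ ping: 1 })'
          env:
            - name: MONGO_HOST
              value: ops-mongodb.example.internal   # placeholder hostname
```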
When we approached this, we decided on a very simplified git workflow. Taking into account that we're using Kubestack and the Terraform modules that Kubestack sets up, this is the design of our git workflow. Very quickly: you create a pull request to the master branch, and this spits out a Terraform plan so you can get some idea of what changes are going to be made to your setup. When you merge that pull request, it kicks off the pipeline to actually apply those changes to ops.

Now, applying those changes to ops might break. When you're working with cloud providers you get concepts like service tiers, where certain settings are only available in certain tiers, and Terraform, when it spits out that plan, doesn't necessarily take these into account. We actually experienced this quite heavily with the difference between basic tiers and standard tiers: when we applied changes to ops, we would see that applying a setting to a basic tier doesn't work, because it's not allowed by the cloud provider, and you have to upgrade to a standard tier or something like that. That's a lesson learnt in ops that could otherwise cause downtime for databases and other parts of the setup.

Once we're happy with the changes we proposed in our merge request and tried out in operations, the master branch also spits out a Terraform plan for the actual application environments, so for the apps production side. That plan gives us a good indication of any differences or changes that would be made to the apps environments, and when we tag the commit for release, so we have it version controlled, every time we're happy with the changes we tag the commit and it applies the changes to the apps environments.
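On top of the plan and apply jobs sketched earlier, promotion to the apps environments is tag driven. The talk doesn't show the pipeline itself, so this is a hedged sketch; selecting the environment through an ENVIRONMENT variable is purely illustrative and not necessarily how Kubestack does it.

```yaml
# Illustrative promotion job: merging to master applies to ops (see the earlier
# apply job); only an explicit release tag applies the change to apps.
apply-apps:
  stage: apply
  image: hashicorp/terraform:1.5          # assumed image and version
  variables:
    ENVIRONMENT: apps                     # hypothetical way of selecting the target environment
  script:
    - terraform init
    - terraform apply -auto-approve
  rules:
    # a version tag on the commit promotes the already-tested change to apps
    - if: '$CI_COMMIT_TAG =~ /^v[0-9]/'
```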
Now, on to the 11th repo. The 11th repo is an idea that I believe came from Weaveworks and their GitOps material; they're very big in the GitOps space. The 11th repo is just a collection of all the microservices. The name comes from the number of the original microservices: we had ten microservices to start off with, so this was the eleventh repo, but the name quickly falls apart when you add a new microservice, because then you have eleven microservices and "11th repo" doesn't make sense any more.

So what does this 11th repo do? Because we're using Kubernetes, we've got all these YAML configurations that deploy the applications, and the 11th repo collects the current deployment configuration for all the microservices together. This gives us an exact recipe of microservices that work together. If we take it one step further and tag all the microservice images with the git commit that was used to build each image, we get a very good understanding of what is currently running in production, or staging, or pre-prod.

We had a bit of pushback on this, because there's this idea that microservices are supposed to be independently deployable, and why would you want to manage the versions together, because microservices shouldn't have dependencies. On the first point: just because the YAML for the microservices is collected in the same repo doesn't mean they aren't independently deployable. Every commit deploys changes, so just because one microservice deploys a change doesn't mean the next commit can't be from a different microservice deploying its own changes. On "microservices shouldn't have dependencies": anybody who has been working with microservices for any amount of time understands that microservices have contracts with each other, and sometimes those contracts get broken, because we don't keep track of service A talking to service B, and we make a change to the contract of service B that service A can't handle. That's one kind of dependency that often causes problems, as is upgrading services internally, like Postgres versions. The next level of dependencies is your library dependencies: any microservice uses libraries, so if you're using Python, your pip packages, if you're using Node.js, your node modules. You might update a node module that breaks the service, or breaks another service, and sometimes you don't have full visibility into those changes. Oftentimes developers are stressed, they're pushing changes, they're upgrading packages, they're not reading changelogs, and these things happen. What the 11th repo allows for is managing exactly this: if it ever does happen, you just roll back to the previous 11th-repo state, and your contracts are restored because the images are rolled back.
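As a rough picture of what one entry in such a repo could look like, here is a hypothetical Kubernetes Deployment; the service name and registry are invented, but the point is that the image tag is the git commit that built it, so the repo always describes an exact, reproducible recipe of what runs together.

```yaml
# Hypothetical manifest in the 11th repo: the image tag is the commit SHA of the
# microservice repo that produced this build.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service                    # invented service name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-service
  template:
    metadata:
      labels:
        app: orders-service
    spec:
      containers:
        - name: orders-service
          image: registry.example.internal/orders-service:9f2c1e7   # tag = git commit of the build
          ports:
            - containerPort: 8080
```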
Now, this idea becomes super tricky when you're using Helm. We first proposed Kustomize, because GitOps tends to favour more declarative setups, whereas Helm, and don't get me wrong, I like Helm, I like Helm as a tool and Helm 3 is amazing, is not declarative. You still have all this templating, you're injecting values in, and you're injecting those values at runtime; the values are stored separately, outside the templates, so it's always very difficult to understand what's going on in CI/CD with Helm. There are tools around this, I know Weaveworks has a tool, Flux CD I believe, or something like that; however, we started hitting bumps very early on when using Helm.

Another thing is your git workflow. The applications in this case used Gitflow, and there's nothing wrong with Gitflow, but even the guy who brought us Gitflow has commented that if your team is doing continuous delivery of software, he would suggest adopting a much simpler workflow. In our view you have three branches at most, two if you can get away with it; Gitflow had quite a lot of branches and quite a lot of merging here and merging there, and this made it difficult to map exactly what needed to go into which environment.

So the end result looked something like this. You have a git repo where your microservices are continuously updating YAML manifests in this 11th repo, and the 11th repo is just a collection of YAML where the images are based off git commits of what each microservice is at that time. These manifests continuously get deployed to the staging environment, and once you're happy with this recipe of microservices working together, you tag the git repo and that deploys further. You can use regexes on tags, so a tag like preprod-v1-something deploys to pre-prod, and prod-v1-something deploys to prod. This just gives you a very nice, continuously deployable setup.

So, my final thoughts and lessons learnt. Specifically when starting with GitOps, start with something basic, start with something that's been done before, and build on top of it. We chose a framework to build on top of and started building the customer requirements on top of that. Every system requires some compromises, and your ecosystem might have some gaps, but overall we were able to achieve our goal. A couple of thoughts: always try to use declarative tools as much as possible; keep your git workflow simple; start simple and build on top. And, more generally, this might not work for everyone. GitOps is not a silver bullet; don't use it just because it's a new, trendy word, only use it if the shoe fits.

Finally, a couple of resources: I wrote a blog post on this specific use case, there's Kubestack, which I definitely recommend checking out, and Weaveworks has a lot of great information on GitOps. I hope you enjoy the rest of DockerCon. Thanks for listening.