How I learned to stop worrying and embrace continuous deployment


Well, good morning, good afternoon, good evening, good night, whatever greeting is appropriate to wherever you're tuning in from. Thank you very much to Thoughtworks for hosting this little event. Thoughtworks, just to briefly explain who our hosts are, is a consultancy that does strategy, design, and engineering.

But probably if you know about us, you know about us because we have an ethos of sharing our work in public. So, you might be familiar with our colleague, Martin Fowler, who has a very famous blog. You might know about the Tech Radar where we share what we're looking at with new tech.

You might know about books like Zhamak Dehghani's Data Mesh book from a few years ago, which made a big impact in the data space. So what we're doing today is introducing and discussing another really awesome book by a Thoughtworks colleague. I'll introduce myself first before embarrassing Valentina by getting her to introduce herself. My name's Chris Ford, and I've worked with Thoughtworks for more than thirteen years in various capacities. Right now, my focus is on retail and travel.

But, yeah, I've done many different things, and I'm interested in the full breadth of the stuff we do. Valentina, do you want to introduce yourself, and then we can get going with the content? Yeah. Sure. I'll try to be very brief. My name is Valentina, and, of course, I'm also working for Thoughtworks. Right now, I'm a lead software developer. And you might be able to tell from my accent that I'm Italian, but I'm based in the Spain office. I've also had some short detours in the Southeast Asia offices.

So, yeah, I've been working for Thoughtworks for a while. And before that, I was at another consultancy in Italy, looking very much at topics like agile and extreme programming. So, yeah, I've had my fair share of experience with this. Yeah. Absolutely. And we'll get into it in a moment. But, yeah, agile and extreme programming, those kinds of principles and ideas are very much infused in your book, Valentina, which I think will reward reading for people who are into those topics. Yeah.

But could I just start by asking you to say a bit more about your professional background? You're a consultant, right? What does that mean? Right. So as you said, I'm a consultant, and Thoughtworks is a consultancy. Before that, the company I used to work for was also a consultancy, which means I've been extremely lucky to be able to basically hop around from company to company without really having to do the whole changing-jobs thing. So this has kind of allowed me to see and compare different companies and, especially, all the different practices for, you know, getting code from a developer's laptop to production. And, of course, I've gotten to know many more through my colleagues.

So, yeah, that's also the reason why I wrote the book. Amongst all the companies that I've seen, I've managed to work with processes that were maybe faster than others, and continuous deployment happened to be my favorite one, of course. That's why I wrote a book about it. I found that, amongst all the companies I've seen, those that were practicing it were very, very fast in shipping updates, in debugging, in recovering from failures. And I thought it was a topic that deserved a bit more attention and maybe a bit of a refresh in the literature, since we all, maybe, stopped with Continuous Delivery, which I think was written about ten years ago. And there are still some practices that need to be added on top of it, in my opinion, to perform continuous deployment, but I hadn't seen anybody talking about them.

So I had to basically explain them, maybe several times, changing from client to client, and so they all got condensed into this work. So now that you've written a book, rather than having to explain yourself each time, you can... Exactly. I can just point to the book and say, see what I wrote. But in reality, yeah, I expect I'll still have to explain myself many times.

That's fair enough. Well, I think you're right, because I'm about to ask you to do it now for people who haven't read the book yet. Can you actually let us know: what does continuous deployment mean to you? And I guess what it means to you is now the official definition, now that you've taken the time to write the book.

Yeah. That is, I guess, a privilege that comes with writing a book called Continuous Deployment. But, in all seriousness, to me continuous deployment is kind of an increment on top of, let's say, the traditional way of doing continuous delivery. In continuous delivery, you deploy to production on demand, and everything needs to be deployable, but it's not always deployed. Right? There are often manual checks that need to happen, usually in some sort of staging environment, usually by some stakeholders, who could be inside or outside of the team. And there are perhaps scheduled deployments, perhaps on-demand ones, but there is always some sort of manual interaction with the path to production that lets hours, days, or sometimes even months of work accumulate before you're able to perform that final deployment to production. So continuous deployment does away with that step, and it basically represents a continuous pipeline where, if you commit a change to main or master, that change will be automatically verified and deployed to production.

So there will not be a pause or a "go to prod" button. The pipeline will do the entire job of promoting the change. So the effect is that whatever you commit will be in front of users, and run by real users, within however long your pipeline takes to run. That's usually perhaps thirty minutes, maybe one hour.

But, yeah, there is no manual interaction in the path to production. So this means that, basically, your ratio of code commits to deployments to production is one to one, and you have very little, if any, inventory stuck in the path to production just waiting for a human being to manually click something. I know that maybe wasn't a short definition, but, in essence, that's what continuous deployment is: full automation from commit to deployment to prod, with no manual stakeholders in between. It's interesting, because I think often we almost feel like letting commits rest or cool or settle before we deploy them somehow helps, or lowers risk. But I think you point out that accumulated change that hasn't yet gone live actually makes it harder to know what's going on, because, you know, you mentioned a one-to-one ratio.

That means that the mental model you have to have of your system is just: what is live. You don't have to think, okay, there's this staging environment, there's this other environment, and do all that kind of complicated reasoning over time. It presents quite a simple picture. Yeah. In fact, I would say that not only does letting commits settle not lower risk, I think it actually increases it. Right?

Because I think it's only, let's say, a very human perception that letting a commit sit in staging for a week or two will make it safer once we go to production. I mean, sure, there will be manual testing, but in reality, what I think is happening sometimes is that we feel safe making the commit because we know it won't be deployed immediately, but we're actually just shifting the risk of that deployment to later. But maybe by then we're not at work. Maybe we're at six weeks of accumulated changes by the day it actually goes live. Yeah.

Yeah. And then it's going to production with many other, perhaps unrelated, commits. And by then perhaps we've already moved on to a separate task, and we have less context to figure out what's going on if the deployment goes wrong. So it's, I think, a very human thing to feel safer.

But I think by deploying immediately, we actually take much more ownership of the fact that our code will go to production immediately, and that forces us to consider all of the quality gates, all of the impact on production metrics like performance, the impact on security. So, yeah, I think it really empowers developers to make changes and to have production in their mind as a first-class citizen for every single line of code they write, because they know it's going live. So you say empowerment, but I guess also accountability.

But there's no kind of pretending, oh, okay, I didn't quite do this right, someone else will deal with it later, someone else will test it once it's in staging. You know, it doesn't become my responsibility; maybe we assume that some other stakeholder is surely going to call it out if something doesn't work.

In my experience, that doesn't really happen. But, yeah, it's an assumption that is reasonable to make. So we've got a couple of questions from folks watching.

Thank you very much to the people willing to do that, which I might ask now. So one is from Giacomo, who was asking: what kind of number of devs are we talking about in organizations where you've seen continuous deployment? Is it a three-person shop? Is it three hundred? Is it three thousand? What have you seen? So I'm gonna give the consultant answer: it depends. Yeah.

I actually talk about this in the book. Usually, I expect that a team practicing continuous deployment doesn't go over, let's say, the famous two-pizza rule. So I wouldn't like to see a team of over eight, ten, maybe twelve people performing continuous deployment, just because at that point you have so many commits going to production that it's kind of easy to lose track of, you know, what's in the pipeline at any given time.

So you need a reasonably sized team. Now, what do you do if your company is really big? Well, that's why architectures like microservice architectures are really helpful, because then every unit of software is independently deployable and can be overseen by one reasonably sized team. In fact, I wouldn't recommend continuous deployment to, say, a startup that has a very, very big monolith and is just starting to break it into pieces, because then you have a team that is simultaneously big and a unit of software that is not independently deployable. You might have some issues there. But in companies that are already established, or perhaps very small companies with just one team that is really small, that's usually a very fair scenario for continuous deployment.

Yeah. I mean, you talk about, basically, scaling up one system and one team at a time. I must observe, though, that more than ten years ago, I think Facebook talked about how they want people to commit to production on their first day, and they famously have a very large monolith. I guess they're a bit of a special case, but, yeah, it certainly seems more approachable to do it like you suggest: a two-pizza team, an understandable system, and then you can just deploy there and take accountability for the effect you're having on that system that you own.

Yep. I have another question, actually. Thank you as well to another audience member for writing in. So Armira asks, and I think it's a clarifying question: so instead of review before deployment, instead of the assurance being on staging or whatever, you have review as part of the commit process.

Is that right? That as you're actually writing the code, you would need to do that collaboration activity. Yeah. Exactly. And this is actually part of the benefits, for me, of doing continuous deployment: this shifting left, which is now a popular concept... Yeah.

Of quality assurance practices, code reviews being one of them. So we can't delay testing and manual reviews of the code until there's a staging environment with lots of new stuff in it, where things can get confusing. We have to do it during the development process. Same thing for thinking about the performance impact of the changes and the security impact of the changes. We can't just wait until they are almost about to go to production. We have to do it earlier.

For review in particular, I really like to advocate for pair programming when doing continuous deployment. And some companies even mandate it; especially in regulated industries, you want at least four eyes on a certain change. Pair programming maybe wasn't that popular in the last ten years, or, you know, more popular in some companies than others. But I would advocate that if you plan to adopt continuous deployment, it's a practice that perhaps you should reconsider if you haven't considered it before, because it gives you that assurance that every change will be seen by at least a couple of people.

Interesting. And, yeah, just for context for people who haven't read the book yet: you have case studies from online retailers for whom downtime is very expensive, you have a bank in there. The people who are adopting this practice are not non-serious companies. They're people who have a real mission and need things to work, and they found that they can do that with continuous deployment. Yeah. Exactly.

I have about seven case studies, and some of them are also part of regulated industries. So it's definitely possible. And, yes, it can comply with the auditing and all the regulations you can think of. Actually, Dave Farley has a great blog post about this called Continuous Compliance, if you're interested in those specifically, and I really highly recommend it. It shows how your pipeline itself and your process itself, even though it's leaner, can satisfy all of these regulatory requirements. Yeah.

I really like that you brought up pair programming, actually, because I think one of the things I quite enjoy about your book is that you're considering not just the code that flips bits into production, but also what's necessary in a team to do this well. So I wonder if you could just explain what kind of topics and sections your book covers, because I think there are bits in it that are maybe more broad than someone reading the title would realize. Yeah. That's right. So it is, yes, a technical book, but I think it has a bit of something for nontechnical roles as well, especially the first part. So, in the first section of the book, I talk about the theory of continuous deployment, and that's not technical at all.

It talks about the history of automation, how we got to this point, through a huge number of practices that were really instrumental in the industry. And about the benefits: how does this impact, for example, the four key metrics of delivering software in your team. And the prerequisites: practices like pair programming, good observability, alerting people at the right time, at the right moment, avoiding noise, and a myriad of others. So I would definitely recommend the theory part. And then I go into the practice, which is not just splitting your work into individual deployments, but also how you want to structure your backlog to make the most of it. I really enjoyed that bit, actually, because when I read it I thought, okay, this slicing of scope into stories so we can play them in the right order and at the right granularity...

That's totally a practice that's codependent with continuous deployment. But I think maybe in the literature on this, it hasn't been talked about as much. Yeah. Exactly. And, for example, it wouldn't make sense to do continuous deployment if you don't then leverage it. Right? With it, you can push the tiniest, tiniest increment of code into production. So you should slice your stories accordingly.

Perhaps put in a feature flag so that you can see what your new feature looks like before you want to release it. Maybe release it in much smaller increments than you normally would, which allows you a lot of granularity with user experimentation. That's something you can think about when you're slicing your backlog, and you have to think about it differently than if you were holding every change back. Then for the rest of the book, I obviously talk about the technical side. So you get a new user story on your plate as a developer, and you have to start writing code. Right? Every commit you make will become an independent deployment to production.

How do you not break collaborating systems? How... Yeah. Well, actually, sorry to interrupt, just to say that Ali at home is ahead of you, because they asked the question of, like, how do you resolve dependencies between components when you're doing this? Sorry, you're about to tell us, but, yeah, how do you make sure that if you're continuously deploying, you get all the pieces working well together? Right. So that's perhaps the core part of the book. As I was explaining, it's kind of in the middle.

So besides encouraging you to read it, maybe a short summary is what I recommend when introducing new changes. You can always preserve backwards compatibility by using feature flags. That to me is the core of releasing anything, but feature flags are not just a release tool. They're really also a development tool, because they allow you to hide any broken contract that you might have, or any in-progress change, in production, and still be able to test it in production while it's underway.
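As a rough illustration of the kind of flag check being described here, a minimal sketch in Python; the flag name, the in-memory flag store, and the checkout functions are hypothetical examples, not code from the book:

```python
# Hiding in-progress work behind a feature flag so every small commit can be
# deployed to production while the unfinished path stays invisible to users.

FLAGS = {"new-checkout-flow": False}  # off by default: safe to ship unfinished work


def flag_enabled(name: str) -> bool:
    # Real teams would read this from a config service or flag provider.
    return FLAGS.get(name, False)


def legacy_checkout(cart: list[float]) -> float:
    return sum(cart)


def new_checkout(cart: list[float]) -> float:
    # In-progress change, e.g. a new discount calculation still being built.
    return round(sum(cart) * 0.95, 2)


def checkout(cart: list[float]) -> float:
    if flag_enabled("new-checkout-flow"):
        return new_checkout(cart)   # testable in production once the flag flips
    return legacy_checkout(cart)    # existing behaviour stays the default


if __name__ == "__main__":
    print(checkout([10.0, 5.5]))  # 15.5 while the flag stays off
```

The point is that the commit containing `new_checkout` can go live long before the feature is released; flipping the flag later is the release.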

So I definitely recommend those. And then there's refactoring functionality that is already live. That's where I borrowed a pattern that already existed, but that I hadn't seen become really popular, and that has helped me a lot: the expand and contract pattern. Yes. So it's not something exclusive to continuous deployment, but with continuous deployment it becomes mandatory.

Let's say you're changing the contract between two services, say the type of a field. Now you arrive at a situation where, if the field used to be a string and it's now a number, you'd have to deploy those two systems simultaneously, because if you deploy one first, it won't be able to talk to the other one, and vice versa. Before continuous deployment, of course, you could do a simultaneous deployment. Still, it wasn't ideal, because no deployment is truly simultaneous. You're still opening up maybe a few seconds of incompatibility; it just used to be harder to notice. Now every commit goes to production in the order, and with the timing, that you wrote it and committed it.

The pipelines take perhaps different amounts of time, so you must preserve backwards compatibility between systems all the time. This is where the expand and contract pattern comes in to help, because it allows you to expand the interface of one system to support both types of calls. Then you can freely deploy the other, and finally you can contract the interface back down to the contract you wanted originally. And then I have a whole chapter about databases, where I explain how this also works with data persistence, where you also have to make sure that historical data gets synchronized at the right moment during continuous deployments. Yeah. There are a lot of worked examples and very specific, kind of nerdy, detail that I think developers will enjoy.
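A minimal sketch of that expand-and-contract sequence for the string-to-number example, again in Python; the payload shapes and function names are hypothetical illustrations, not taken from the book:

```python
# Expand and contract for a "price" field changing type from string to number,
# split into independently deployable steps so the two services never have to
# ship at the same instant.

# Step 1 (expand): deploy the consumer first so it accepts BOTH shapes.
def read_price(payload: dict) -> float:
    price = payload["price"]
    if isinstance(price, str):   # old contract: {"price": "12.50"}
        return float(price)
    return price                 # new contract: {"price": 12.5}


# Step 2: the producer switches to the new type in a later, separate deployment.
def build_payload(amount: float) -> dict:
    return {"price": amount}     # previously: {"price": f"{amount:.2f}"}


# Step 3 (contract): once no producer sends strings any more, a final commit
# deletes the isinstance branch, leaving only the contract you wanted.

if __name__ == "__main__":
    print(read_price({"price": "12.50"}))    # 12.5 - old producers still work
    print(read_price(build_payload(12.5)))   # 12.5 - new producers work too
```

Each step is a small, backwards-compatible commit, so the order in which the pipelines finish no longer matters.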

It got quite personal for me, actually, when you talked about those contract changes. I was just thinking back to ten years ago. I was working with a travel agency, and we had this issue where everything worked perfectly on staging, because we'd updated both the front end and the order management system to use a new contract.

And so we were all high-fiving that we'd finished the feature. Then we put the front end live, and everything broke. We were like, oh, we have to roll back.

Oh my god, what's going on? And we were like, oh, okay, we haven't updated the order management system. Cool.

So then we put the order management system live, and that also broke everything, because we'd basically created this impossible timeline that we couldn't deploy to. If only we'd had the wisdom that's in your book at the time, we could have maybe done something more incremental. But basically, you know, we thought we were just pushing the button and going to the pub with a change, and then we discovered we'd created a situation that was very difficult to get ourselves out of.

And the same happened to me, actually, not on staging but locally, where we had just switched to continuous deployment, and we had this refactoring to do, again between back end and front end. One of our developers made the two work perfectly together, but we realized while we were pairing that there was absolutely no way we could commit and push either code base, because one would inevitably go first. So, again, going back to shifting left, this made us realize that, with or without manual sign-off, there was no real order in which to deploy this change.

We had to take a step back. We eventually had to rework the whole code base to introduce a step of backwards compatibility. So that really, really helped bring the problem to the forefront. You could say that we didn't have to wait until it was all in staging and we tried to deploy; we realized it much earlier, which is yet another benefit.

A much less stressful time. So maybe, actually, that's a good segue to a question I wanted to ask, which is: what does it look like when a team is practicing continuous deployment? Is everyone kind of tense because the slightest change will break everything? What's the difference between a continuous deployment team and other kinds of teams? So that's a really good question. I would say that, yes, there's tension, maybe in the first week that you do it. And then deployments to production become so routine, and you practice them so often, that you basically don't really think about them so much anymore as... Yeah. ...these disruptive events that could affect all the users. So, like, you still have to pay attention to releases, but everything you commit and push is either under a feature flag that keeps it hidden, or part of an expand and contract pattern, or, in any case, covered by a whole lot of automated tests.

It becomes pretty straightforward to just deploy your code every day. And, actually, you get a very simple picture of your code base and production, because you know that they are always in line. They're always one to one. If your code works locally, it's gonna be in production minutes later, and you will have the proof that it works in front of users as well.

Like, you remove quite a lot of stress from the deployment-to-production event. I like the fact you say proof, because I always felt that continuous deployment is almost like reality-driven development. Like, you are just... Exactly.

Reducing that buffer between you and production, and the reality that'll tell you what's going on. And because of that, you would think, or maybe some people would think, that it might create a more tense environment. But in reality, it creates a less anxious environment, because that uncertainty and lack of traceability between actions and consequences is kind of wiped away. Yeah. Exactly. And if anything goes wrong, you really have the feedback minutes later.

Same thing if something goes right. For example, you do a performance optimization. There is almost no way to test those things outside of a production environment, sadly. I mean, even with the most sophisticated stress tests and load tests, you will never be able to replicate the exact load of production with the exact data of production. And you certainly don't want to move production data into preproduction environments most of the time.

So, yeah. Also, another thing I want to add about developer experience is that it becomes really, really straightforward to debug production issues. Not only because they are most probably caused by the very few lines of difference that you've just deployed to production, so it's only one commit's worth of changes, basically.

So it's very, very easy to trace what went wrong, if something goes wrong. But also because, if you're in an emergency situation, the lead time for your changes is incredibly short. Say, one time we had a bug in production that we needed to debug, and we needed some extra logs in order to understand which path the code was taking in a particular situation. In half an hour, we just added some logs, and we found them in our production observability, with users going through that path, minutes later. I think that, for me, is the best example of how nice and straightforward the developer experience is with a practice like continuous deployment.

So whenever the whatever hits the fan, everyone kind of reverts to continuous deployment, basically. You need to, like, make changes, see what's going on, and fix something. So, ideally, you're in an environment where that's the way you work and you have the disciplines and safety nets every day, because otherwise you're moving to this new way of working only under times of great stress, which is probably not the best time to try out a new way of working. Yeah. Exactly. I mean, I think you said it yourself. Often when there's a production emergency, you have, like, this two-track process. So... Yeah.

If you're a company that doesn't do continuous deployment, you suddenly find yourself skipping all manual approvals just to... Yeah. ...debug what's going on. And that's a really stressful time for everyone, so it's not exactly the best moment to try a completely new way of working, especially if you don't have the tests to back it up, the observability to back it up. Yeah. It is.

So would you do continuous deployment with a team containing junior people then, or is it kind of a seniors-only thing? I absolutely would, and I have done continuous deployment with teams where maybe about half the team was junior. I think part of the practice is also building sturdy quality gates and sturdy team processes where, honestly, failure and, you know, making bad changes is expected. The purpose of the practice is not that only senior people who are always perfect are allowed to make changes to the code base. The purpose, to me, is to build automation that is so efficient that it's incredibly easy to recover from failures.

So a bad change could be caught by the static code analysis. It could be caught by unit tests, acceptance tests, end-to-end tests. Then, if it makes it all the way to production, maybe it gets held back if you have a canary deployment system, or it is at least caught by the observability and raises an alarm really quickly. In which case, you can then exercise that same very fast path to production to do a quick revert or a quick fix forward. So a great place, kind of, to get started in your career and learn, and maybe have the freedom to make mistakes without horrible consequences. Yeah.

And not only because it's safe to fail, but also because you get to see what the quality gates are supposed to look like in a place where we can afford to deploy every commit to production without fear that we're deploying something bad that won't be caught. So it's a really good place to start for a junior person, because that, to me, is kind of the target state of any team that I work with, and you get to see it firsthand. Yeah.

Awesome. So I have a couple of questions that are maybe quite specific ones; folks are wanting to know how they should get started with it. So one question is from Dimitri, who asked about feature branches versus feature flags. Like, they're saying, you know, are feature flags the continuous-delivery-compatible one? So I guess he suspects what your answer might be. Right. Yeah. So it might not surprise any of the viewers that I don't think feature branches are particularly compatible with continuous deployment, because a branch is just yet another place to accumulate commits and inventory.

So... Yeah. Even if your pipeline deploys automatically, the point of continuous deployment is to keep deployments really, really tiny. And if you have a huge feature branch, then we can expect to find a lot of code in there. The moment you merge it, that's gonna be a huge change going through the pipeline and getting deployed all at once.

So it kind of goes against the principle of de-risking deployments by keeping them small. And that's why I recommend using feature flags and small commits, where you can always test what your in-progress work looks like in a production environment. Makes sense. I have a related question, actually, from Shakira, who says, okay.

So that means, if you've got a one-to-one between commit and deployment, does that mean you end up with a bunch of changes that relate to the same feature kind of spread out? Like, is that a problem? And, like, how might you deal with that? Yeah. So I don't think that's a feature of continuous deployment so much as of trunk-based development, if you're talking about, like, the git history. Yeah, it can be interesting to figure out all the commits related to a specific feature, but how we solve it, to be honest, is just by tagging commits appropriately. So every commit will have a ticket number formatted in a certain way, and then we have, you know, the Jira aggregator that shows us all the related commits. Of course, you have to be a bit diligent in doing that, but that's the same as being diligent in not putting unrelated stuff in a feature branch.

So... Makes sense. I guess the feature flag kind of serves to unify them as well. Right? Because anything the feature flag turns on is, by definition, that feature. And if you wanna remove dead code after you've deprecated the feature flag, then you've kind of got traceability that isn't related to textual change.

It's related to, like, the connections in the code. Yeah. Exactly. So on the Git history side, you can use tagging and then group commits by their tag. But if you're looking at the code, of course, as you said, you have your feature flag. So if you've done a good job with it, it's at the outermost point where the execution branches of the code diverge. So you can pretty much be sure that anything under that if statement represented by the feature flag is probably related to the new feature.

Of course, that is if you've applied it very diligently. Like, there could be scenarios where you have the feature flag in several places, but that perhaps is an anti-pattern in and of itself. Makes sense. Yeah. I think in the book you recommend, like, one point of control as the best way to do feature flags, if you can. But yeah. Yeah. Cool.
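To make the "one point of control" idea concrete, a small hypothetical sketch; the flag and function names are illustrations, not from the book:

```python
# The flag is checked exactly once, at the outermost point where the old and
# new code paths diverge, so everything reachable only via search_v2 belongs
# to the new feature and is easy to find, and later to delete with the flag.

def search(items: list[str], query: str, flags: dict[str, bool]) -> list[str]:
    if flags.get("new-search-ranking", False):
        return search_v2(items, query)   # all new-feature code lives below here
    return search_v1(items, query)


def search_v1(items: list[str], query: str) -> list[str]:
    return [i for i in items if query in i]


def search_v2(items: list[str], query: str) -> list[str]:
    # New behaviour; no further flag checks are needed anywhere inside it.
    return sorted((i for i in items if query.lower() in i.lower()), key=len)


if __name__ == "__main__":
    print(search(["apple pie", "Apple", "banana"], "apple",
                 {"new-search-ranking": True}))
```

Scattering the same flag check through several modules, as mentioned above, makes both the feature and its eventual removal much harder to trace.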

So one final question, because we're just approaching time. If people have been motivated by reading your book or listening to your talk today, what would you recommend as the route to getting started with continuous deployment? How do you approach it? Right. So let's say you're in a company where this is not really practiced.

Actually, in chapter three of my book, where I explain how to build up to it, I have a list of practices: observability, testing, all the different types of testing you should have in place, things like zero-downtime deployments, what types of them there are, which one could be the best for you. I recommend going through that list and figuring out which practices you're applying today, and, most importantly, what the manual approvals you have now before deployments are really catching. Like, are they completely redundant, just a formality, with everything getting caught by the automation anyway most of the time? Or are there actually some defects that you only see in demos, and why is that? Once you've found the type of defect that gets caught there, think back to what kind of automation you would need to catch it earlier. This way you can get to the point where approvals are just a formality. And when they are just a formality, that's when everybody will be glad that you've removed them. Right. And if you never get there, well, you've just strengthened your tests and improved the safety of your deployment process anyway.

So even if your company never allows you to go all the way, you've still made your pipeline so much better, automated a bunch of work that used to be manual anyway, and greatly improved the team's experience and the stakeholders' experience with your path to production, and, most importantly, the users' experience, because they will be subjected to far fewer bugs. So, yeah, it's definitely a worthwhile investment in the practices supporting it, and that's where I would recommend starting. Then, once you've reached that point of maturity and you're really confident in it, I think you just have to take the leap and try it out. You can always revert if it doesn't work and try again.

Makes sense. Very, very good advice on that point and everything else you've talked about today. So thank you very much, Valentina, both for writing the book so that everyone else can benefit from all the experience you've gathered, and also for coming on to this LinkedIn Live today to talk to everyone about it.

I really do recommend the book. I think, like, it's a really good way of summing up a whole bunch of different modern agile practices. You know, like Valentina says, she talks about how to slice stories. She talks about all sorts of things to do with teams.

So it's really a great way of getting a holistic view, I think, of a high-performing software team today. So thanks, everyone, for watching as well. Please, yeah, check out the book if you get a chance, and have a great day, everyone. See you around. Goodbye.

2024-10-08
