Access Logging Made Easy With Envoy and Fluent Bit


Hello, everyone, and welcome to DockerCon. My name is Carmen Puccio, and I'm a Principal Solutions Architect here at Amazon Web Services; I'm actually the Docker Partner Solutions Architect, and I've been their PSA for the better part of two years. In my four years here at Amazon I've worked in a variety of roles. I started out as part of our mass migrations team, working with our consulting and technology partners to figure out how to move customers over at scale, and for the last two years I've served as a Principal Solutions Architect working with our container partners, helping customers adopt container technologies. It's my privilege to talk to you today about access logging and how you can make it easy with Envoy and Fluent Bit. If you want to reach out or have any conversations, you can always hit me up on Twitter; you can see my handle down below.

With that said, I want to set the stage and talk about the benefits of microservices before we get into the technology portion of today's presentation. Microservices is not a new concept, so I'm going to do a flyby.

The first thing is agility. Microservices foster the organization of small, independent teams and allow them to take ownership of their services. Teams act within a small, well-understood context and are empowered to work more independently and therefore more quickly. The goal is to shorten the development lifecycle and accelerate the throughput of your applications as you move them to production.

The second thing is flexible scaling. Microservices allow each service to be independently scaled to meet demand for the application feature it supports. This enables teams to right-size their infrastructure, accurately measure the cost of each feature, and maintain availability if the service experiences a spike in demand.

Easy deployment is another benefit: think of it as the ability to enable continuous integration and continuous delivery, making it easy to try out new ideas and to roll back if something doesn't work. The low cost of failure enables experimentation, making it easier to update code and accelerate time to market for new features. We call this the innovation flywheel, and you've probably seen it on many different Amazon slides.

Technological freedom is another one. Because microservices architectures don't follow a one-size-fits-all approach, teams have the freedom to choose the best tool to solve their specific problems; as a consequence, teams building microservices can choose the best tool for each job.

Reusable code is another. Dividing software into small, well-defined modules enables teams to use functions for multiple purposes, and a service written for a certain function can be used as a building block for another feature. This allows applications to bootstrap off themselves, as developers create new capabilities without writing code from scratch.

And lastly is resilience. With microservices, think of each service as an independent resource, and think about how you can build an application that's resilient to failure. In a monolithic application, if a single component fails, it can cause the entire application to fail.

With microservices, applications handle total service failure by degrading functionality rather than crashing the entire application.

With that said, I think this picture says it all. If you imagine an environment with hundreds to thousands of microservices, hopefully this picture resonates with you. If you take a step back and think about traditional three-tier web applications, the communication model wasn't complex; it was essentially two hops, where you usually had a front-end web application communicating with a business layer and a data layer. Move to the present day, where organizations are building cloud-native applications via a microservices approach, with each microservice being a small, shippable service running in a group of containers that scales independently via an orchestrator, and the communication between these services becomes really complex. This is why customers are looking to service meshes. A service mesh is a software layer that handles all of the communication between these services. It provides the features to connect and manage connections between services, and it's independent of each service's code, allowing it to work across network boundaries and with multiple service management systems. If you think about the previous slide, where one of the benefits of microservices is that your teams have the freedom to choose the best tool for each job, hopefully one of the big challenges pops into your mind with that scenario, and that's what we're here to talk about today.

It's logging. One of the challenges of a polyglot microservices architecture is trying to correlate different access logs into a consistent format as they're sent to a centralized logging solution. Imagine trying to find a particular error or status code for different services that are interacting with each other, with no data consistency in your logs; moreover, imagine trying to find and maintain all of the different parsers you need to ingest that data into a logging solution. You really don't want your teams to waste cycles here, because it takes away from the innovation and the quicker time to market that microservices architectures are supposed to bring to the organization. In the past, what you would see is people SSHing into various servers trying to search for and diagnose issues, and while that worked in the old world where infrastructure was essentially a pet, it doesn't work in the modern world. With containers and microservices it's not feasible: an application running in a container can have a very short lifespan and can run across any number of nodes within a cluster, which means centralized logging becomes a mandatory component in any enterprise system. The benefit is that you not only have the ability to centralize all of your logs in one place, but you can also correlate logs across your application portfolio to get a great picture of exactly what happened in your environment. But the question is, how do I do this? How do I get my logs into a consistent, structured format, and what technologies can help?

The first one we'll talk about is Envoy. For those who don't know, Envoy is an open source edge and service proxy designed for cloud-native applications. It was originally built by Lyft and graduated as, I believe, the third project in the CNCF, following Kubernetes and Prometheus. To move from the incubation maturity level to graduation, projects have to demonstrate things like thriving adoption, a neutral governance process, multi-organization commitment, and community support in terms of sustainability and inclusivity. Envoy runs alongside every application and abstracts the network by providing common features in a platform-agnostic manner. It has the features you would expect from a mature proxy, like advanced load balancing and observability, with a robust set of APIs to enable configuration management. When all of the service traffic in an infrastructure flows through an Envoy mesh, it becomes very easy to visualize problems in terms of consistent observability, and it gives you the ability to tune your overall performance and add or change features in a single place. Now, you might be asking yourselves: OK, so why a sidecar proxy? How does that help me, and isn't that just another piece of code? The benefits of running a sidecar proxy are that you can install and upgrade it independently of your application code.

Because the sidecar isn't tied to your applications, your developers aren't forced to use a particular library or SDK, and the configuration of the proxy is independent of the configuration and business logic of your applications. The proxies have their own business logic and their own configuration, so separating things at that layer minimizes inconsistencies even if your teams and services are using various languages or different compute platforms; it doesn't matter. Having that same proxy running everywhere, down to the same specific version, enables you to have a consistent configuration throughout your services, independent of language or platform and regardless of what your different service teams are using.

Now I'm going to do a flyby on AWS App Mesh. This isn't a talk on App Mesh; I just want to highlight that we do use the Envoy proxy. For those who don't know, AWS App Mesh is a service mesh which gives you visibility between the various services in your environment, all while making it easy to monitor, control, and debug the communications between those services. Like I said, App Mesh uses Envoy as the proxy in front of your containers in a service mesh, and it allows you to generate access logs in a consistent format. A simple example, as you see here, would be two services, service A and service B, where service A is making layer 7 HTTP requests to service B. If I want to log the requests to service B, I could obviously just turn on logs (see the sketch at the end of this section). But what happens, if you think back to that earlier picture, when I have hundreds of services written in a polyglot fashion? How could I force the teams to achieve uniform logs? Standardized logging via an SDK is one way, but then you have the challenge of getting people to bring it into their app and keep it up to date; it's troublesome. The same challenge happens with metrics and tracing. Perhaps you're counting something like the number of requests that succeeded or failed and you want uniform metrics across all of those services, or perhaps you want to trace a request as it traverses all of those services. Even in a simple example like the one you see here, where one service is making a request to another, this could be difficult, because imagine service B has multiple endpoints and it's running as a Kubernetes service, or it's running on Amazon ECS or Fargate. There could be anywhere from tens to hundreds of containers that are part of service B, and if I'm load balancing across all those containers, I don't want to have to build load balancing into my app code, because then I have to write my own load balancing or, again, download an SDK, make sure all of my teams are up to date, and make sure everybody's on the same version. Ideally, you want load balancing done for you, so that when the service B team rolls out a net-new version of their service, all their clients start using that new version in a controlled manner, where only a small percentage of the traffic is routed to the new service B and the team can observe that net-new functionality in production.
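To make the "just turn on logs" point concrete, here is a minimal sketch of what enabling Envoy access logging for a mesh service can look like, assuming you're running the AWS App Mesh controller for Kubernetes and its VirtualNode custom resource. The names (`service-b`, the `demo` namespace, the port) are made up for illustration:

```yaml
# Hypothetical VirtualNode for service B. The logging stanza tells the
# injected Envoy sidecar to write its access log to stdout, so a
# node-level log agent can pick it up.
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: service-b
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: service-b
  listeners:
    - portMapping:
        port: 8080
        protocol: http
  serviceDiscovery:
    dns:
      hostname: service-b.demo.svc.cluster.local
  logging:
    accessLog:
      file:
        path: /dev/stdout
```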

The point is that otherwise, all of your teams end up tightly coupled to each other. It becomes difficult to do things like new releases, because you need to communicate with everyone and have them sign off. You want teams to be able to iterate independently and not have to worry about the heavy lifting of things like load balancing, traffic shifting, and logging; let them focus on what they're good at, which is the application code itself. That is the focus of App Mesh. And this is the last thing I'll say about App Mesh before we get into Fluent Bit and Envoy as a whole: App Mesh is essentially application-level networking that works across various compute primitives in AWS. I want to emphasize this, because if there's one thing you can take away from the presentation today, besides what I'm going to teach you next, it's that App Mesh works across our compute services and is not tied to a specific container orchestrator or compute primitive. It works with Amazon ECS, which is our native container orchestrator; it works with AWS Fargate, which is our serverless container platform; it works with Amazon EKS, which is our managed Kubernetes platform; and it even works with services running on EC2, in containers or even as an individual process. Lastly, if you're running your own Kubernetes cluster, it works there as well. So keep in mind that it works equally well across all of these services and is not tied to one container platform.

Now, to go another level deeper, let's talk about the Envoy access logs; this is essentially the guts of the talk. As you can see here, Envoy as a proxy is simply a container that's intercepting traffic to your application in the mesh, and regardless of the backend, Envoy has its own version of access logs, which gets you down the path of standardization that I was talking about. When it comes to access logs and the format of those logs, the Envoy proxy uses what are called format strings when generating access logs, which are essentially plain strings that include the details of an HTTP request. As you can see on the slide, the Envoy access log format is similar to what you've seen from other access logs.
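For reference, this is Envoy's documented default access log format string for HTTP requests; each `%...%` operator is expanded per request. It's emitted as a single line, wrapped here for readability:

```
[%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
%RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION%
%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
"%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
```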

You can also see here that the JSON is escaped into a single log entry, and in order to make it meaningful you need to unescape it and add structure around the data itself.

This is where Fluent Bit comes in. If you haven't seen Fluent Bit, let's talk about it at a high level, and I also want to mention Fluentd. Fluentd was created by Treasure Data and is an open source data collector for unified logging. It's very common to use Fluentd when handling logs in something like Kubernetes; we have a lot of customers doing exactly that model, using Fluentd to forward their logs out. Fluent Bit is also an open source project created by Treasure Data, and you can think of it almost as a sister project to Fluentd, so much so that Fluent Bit provides an output plugin to flush information to Fluentd. Think of Fluentd in terms of an aggregator, if you will; they can work together in your architecture almost as independent services. Here at AWS, we've begun to integrate Fluent Bit with our container services because of its lightweight nature, and while the Fluent Bit documentation has a comparison stating that Fluentd's scope is mainly servers and Fluent Bit's scope is mainly things like embedded or IoT devices, we've built integrations around services like Amazon ECS and ECS on Fargate where Fluent Bit handles things like ingesting, parsing, filtering, and outputting your logs. We're seeing customers adopt Fluent Bit on their Kubernetes workloads as well, and as you can see here, it has more than 50 built-in plugins available. We've also worked with our AWS partners to add support in their solutions, to make it easier to integrate and send log data for our customers.

Now I want to go into the Fluent Bit pipeline and how it works, and then we'll get into the logging format. This diagram shows Fluent Bit and its internal log processing pipeline architecture, and it's important to understand the internals of how this works as you plan your implementation. The first thing you have is obviously your input, and input plugins are used to gather information from different sources. Some of them just collect data from log files, while others can gather things like metrics information from the operating system. When we talk about gathering logs in this talk, though, it's going to center on the tail input plugin, which does exactly that: it tails a file, or a series of folders and files, for new log entries. Then you have the parser, and this is a crucial piece to understand. As log data is streamed from your applications, you need to parse it into a format that's valuable to you, meaning you want the data structured, with the information that's important to you pulled out of these log entries. Then you need to think about how you're going to filter. Think of filtering as how you match, exclude, or even enrich your logs with specific metadata; a very common use case here is Kubernetes deployments, where every pod log needs to get the proper metadata associated with it, and I'll go into that in more detail in the upcoming slides. You need to think about filtering as how you're going to enrich your logs and how you're going to send things to different places at the same time.
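To make the stages concrete, here is a minimal sketch of a Fluent Bit configuration covering the input and output ends of the pipeline; the tag and paths are illustrative, not from the talk:

```
# Tail container log files and print the records to stdout.
[SERVICE]
    Parsers_File  parsers.conf

[INPUT]
    Name  tail
    Tag   kube.*
    Path  /var/log/containers/*.log

[OUTPUT]
    Name   stdout
    Match  *
```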

Then you get to buffering, and buffering is essentially a way to handle either too much data coming in or a potentially unstable network, which might result in things like a delay in sending data. Fluent Bit uses a primary buffering mechanism in memory, and also an optional secondary one, which allows you to use the filesystem. Then you have routing, a core feature that allows you to route your data through filters and finally to one or more destinations. The router relies on concepts like tags and matching rules, which we're not going to go into today, but it's a really powerful feature. And lastly, you have outputs. Fluent Bit ships with a number of output plugins, and again, we've worked with AWS partners to make it easier to ship your logs to their solutions.

So that's the pipeline, and if you think about your logs coming in, you might be asking yourself: OK, what about Kubernetes? When it comes to Kubernetes, to dive in a little bit, there's this concept called cluster-level logging, and it's important to realize that your logs should have a separate storage and lifecycle, independent of your nodes, pods, or containers. This means you need to think about how you're going to get your logs to a centralized location where you can store, analyze, and query them, because Kubernetes does not provide native storage for long-term log data.

There are a couple of ways to approach this, but the most common and encouraged approach is what's called node-level logging, and customers achieve this by running a DaemonSet on their clusters, because it creates one agent per node and doesn't require any changes to the applications running on that node. However, node-level logging, as you can see here, only works for applications that are emitting over standard out and standard error. Anybody who has really dived into container best practices should know that you should always try to log to standard out and standard error anyway, because your containers then don't need to read and write to a log file, which results in a performance gain, not to mention you abstract away the management of those logs within the container itself. You don't want to store logs in a container; it's bad practice, because you never know when that container is going to terminate. When using node-level logging, you can see the logs are shipped to a well-known path on the node, the nodes in Kubernetes implement a log rotation mechanism, and when a container is evicted from the node, the corresponding log files are evicted as well.

When Fluent Bit is deployed in Kubernetes as a DaemonSet and configured to read the log files from the containers, whether via tail or one of the other input plugins, Fluent Bit has what's called a Kubernetes filter, which allows you to enrich your log files with Kubernetes metadata. The filter aims to perform the following operations. First, it analyzes the tag and enriches the record with the following metadata: the pod name, the namespace, the container name, and the container ID. Then it queries the Kubernetes API server to obtain extra metadata for the pod in question. This is because that information isn't available at the node level; you have to get it from the API server to pick up things like the pod ID, the labels, and the annotations, which is really important, especially if your application is spread across a bunch of different nodes. The cool thing here is that the data is cached locally in memory and appended to each record as the logs traverse your system.
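A sketch of what that filter stanza can look like in the DaemonSet's configuration; the tag is illustrative. `Merge_Log_Key` is what nests the parsed payload under a key (you'll see `log_processed` again in a moment), and `K8S-Logging.Parser` enables the per-pod parser annotation shown shortly:

```
# Enrich tailed container logs with Kubernetes metadata.
[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Merge_Log           On
    Merge_Log_Key       log_processed
    K8S-Logging.Parser  On
```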

So with all of this said, you might be asking how this helps with Envoy access logs. As you can see here, the data in the log message is Envoy's default access log, but the JSON is escaped, and in order to parse the log into something that's meaningful, we need to run it through a parser in Fluent Bit. Fluent Bit ships with some default parsers for formats commonly seen in applications, and at the same time you can obviously always write your own. I actually contributed the Envoy parser back to the project in February 2020, which means the official Fluent Bit image and the AWS for Fluent Bit Docker images already have it for you out of the box, and you can easily wire it up in your environments, which I'm going to show you how to do. As you can see here, all it is is a regex that matches the default Envoy format, and the goal is not just to parse the message; it's a stepping stone to put structure into our logs.

In order to use this parser with your applications, you can leverage what I think is actually one of the cooler features in Fluent Bit. In this example, we've defined our application in Kubernetes via a manifest, and you'll notice that in our application's deployment spec we've added an annotation with a value of fluentbit.io/parser: envoy. This annotation suggests that the data being processed should use a predefined parser called envoy. The fact that the parser exists in our Fluent Bit image means we don't have to do anything else in order to use it, and if we were writing our own custom parser, we could do the same thing very easily by writing it via something like a ConfigMap for our Fluent Bit DaemonSet deployment and then referencing it directly, as you see here in the manifest. Outside of the predefined parser suggestion, you could always wire up the parsing as part of the Fluent Bit pipeline that I described earlier, but this is by far the simplest way, in my opinion, to leverage this targeted parser.
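A minimal sketch of that annotation in a deployment manifest; the deployment name and image are placeholders, and the annotation only takes effect if `K8S-Logging.Parser` is enabled on the Kubernetes filter, as sketched earlier:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-b
spec:
  selector:
    matchLabels:
      app: service-b
  template:
    metadata:
      labels:
        app: service-b
      annotations:
        # Tell the Fluent Bit Kubernetes filter to apply the predefined
        # "envoy" parser to this pod's logs.
        fluentbit.io/parser: envoy
    spec:
      containers:
        - name: app
          image: example-registry/service-b:latest
```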
Now that we've walked through this setup, it's time to look at some of the useful information you get by leveraging Fluent Bit. First, here's the Kubernetes metadata for a log entry; it was way too much to fit on one screen, so I had to break it up. You'll notice the log you've seen before now has all of our Kubernetes context, and the lesson to take away is that any log processor you run in a dynamic environment in the cloud needs to know how to enrich your logs with metadata, otherwise you lose the context of what happened and where. Now you have access to things like namespaces, pod IDs, and the node it ran on: all the useful information your teams will need when doing any troubleshooting or analysis.

Here's the second part of the log (again, too much to fit on one screen), and this is the parsed log entry itself. It's the same log, but it's now sitting inside a key called log_processed. The reason for this is that when you turn Merge_Log on in your Fluent Bit config, the filter assumes the log field of the incoming message is a JSON string and turns it into a structured representation at the same level as the log in the map. Because I have this turned on, everything gets inserted under that key, so all of these key-value pairs are now nice and clean underneath the log_processed key inside our log.

And lastly, here's a view of the final implementation. Our applications are now fronted with Envoy and are communicating with each other as part of AWS App Mesh in this example. Those Envoy proxies are running as sidecars for our application pods, and their access logs are all being emitted over standard out and standard error. All of the node-level logging is being handled by Fluent Bit running as a DaemonSet, which in turn ingests, parses, and filters, and you're now ready to take advantage of any of the output plugins that come with Fluent Bit. As I said before, you could send things to something like Elasticsearch, you could send them to a partner tool, or you could send them to something like Amazon CloudWatch; there are all sorts of possibilities once you get to this model.
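As one example of that last hop, here is a sketch of an output stanza that forwards the enriched records to CloudWatch Logs via the cloudwatch_logs output plugin; the region and log group name are made up for illustration:

```
# Ship enriched Envoy access logs to a CloudWatch Logs group.
[OUTPUT]
    Name               cloudwatch_logs
    Match              kube.*
    region             us-west-2
    log_group_name     /demo/envoy-access-logs
    log_stream_prefix  fluent-bit-
    auto_create_group  On
```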

While I didn't demonstrate it today, one thing that's cool is that the AWS for Fluent Bit Docker image comes with some built-in plugins to interact with our native services. If you want to send things to Kinesis Data Firehose, CloudWatch Logs, or Kinesis Data Streams, that's all right there in the image we maintain; it's the upstream version of the Fluent Bit image, and we just put these output plugins in there for you to make it easy to wire up as part of your implementation. You also have the ability, if you're doing this outside of Kubernetes, to use AWS FireLens, which isn't really even a service; it's a layer inside of things like Amazon ECS and Fargate that allows you to leverage the Fluent Bit functionality natively by wiring it up in your task definition. You're still using the AWS for Fluent Bit Docker image, or you can even roll your own, but we make it very easy for you to write parsers or filters and have them be part of your application as you wire up your logs and send them to various destinations, whether AWS destinations or partner destinations.

In conclusion, I hope you've learned how easy it is to implement a consistent, structured log format for the access logs of your microservices applications. Again, this was geared around App Mesh, but in reality you could achieve this with just Envoy and Fluent Bit. As you can see, it doesn't really matter what container orchestrator or what languages your teams choose: leveraging the out-of-the-box functionality provided by Envoy access logs gives you the foundation to implement a consistent logging structure in your environment with Fluent Bit.

If you want to learn a little bit more, the premise of this talk was based on a blog I wrote about a month and a half ago. The link is here: Access Logging Made Easy with AWS App Mesh and Fluent Bit, super easy to find on the containers page on AWS. If you want to learn more about App Mesh, I have the link here as well. FireLens, again, is that abstraction that makes it easy to wire up Fluent Bit as part of your application's deployment into something like Amazon ECS, and we have docs here too. We also have some examples, so if you want to dive into Amazon ECS FireLens examples, or how you could actually put Fluent Bit into something like Amazon EKS and do it in Kubernetes, we have some great examples up on our GitHub pages. At the same time, our partners have contributed back to showcase how you can interact with their various tools as part of your Fluent Bit deployment inside AWS and how you can easily wire all of that up; that's all sitting inside the GitHub pages you see here.

With that said, I just want to say thank you. It's been my pleasure to give this talk. It's a shame I couldn't give it in person, but hopefully you found it useful. Again, if you have any questions, feel free to ping me on Twitter, and enjoy the rest of your conference. Thank you very much.

