Microservices in the world of Service Mesh CON003


[MUSIC] >> Good morning and good afternoon, guys. Thank you so much for joining this session on Service Mesh in the World of Microservices. Before we jump to the session, I would like to give a brief intro about the speakers. My name is Swami, Swaminathan Vetri, I work as an Architect at Maersk and I'm a Microsoft MVP under Developer Technologies. We also have three other MVPs joining us today.

First one is Ilyas, who is an MVP under Azure, and he works for ABB. We also have Karthikeyan, who is another MVP under Azure Technologies. We also have Augustine, who's a Microsoft MVP under Developer Technologies. Today, we are going to talk about how, in the world of microservices, the service mesh is going to be helpful, and what kind of problems it is going to solve. But before we jump into the service mesh aspects of it, let's understand the basics of Microservices, and then what kind of challenges we get in terms of Microservices Architecture, etc. Traditionally, we have all been building monolithic applications, where a single application would be composed of multiple modules; it could be 10, or 20, depending on the size of the application.

What we used to do is, irrespective of whichever module is having a change, we go and deploy the entire application to production and the different environments. That brings its own typical challenges on the regression side. If there's a change in module A and you want to deploy that, you need to do the regression of the entire application to ensure everything is intact. Also imagine, if this application is being built by multiple teams, then there's a huge amount of coordination and communication needed to come up with the proper release plan and then follow the release guidance, etc. That definitely slows down the time to market, which is not really a good situation to be in. That's one of the reasons people started moving towards Microservices.

With Microservices what would happen is, let's say, if an application is having 10 different modules, you could typically make them into 10 different Microservices if they fall under different bounded contexts, and then you can go and build, test, and deploy them independently of one another. This brings a lot of flexibility. You are bringing down your testing cycle, and you can also hit the market with your feature enhancements much faster, and time to market is one of the key aspects in a business.

That's how everyone started moving from monoliths to Microservices. Imagine you're migrating one application; one application could bring you maybe 10 Microservices. If you migrate 10 different applications, you can already imagine that you are getting a hundred Microservices. If you extrapolate it to an organization level, you could very well start getting into hundreds or thousands of Microservices. If you are in such a situation, it brings its own complexities.

Some of the complexities could be: how do we monitor all of these Microservices? How do we understand which Microservice is calling which other Microservice? How do we know if there is a problem in production, and how do you go and troubleshoot the issue? These are some of the core technical issues. Now, talking about the business side of it: now that you have the flexibility to deploy individual components, the product owners would always want them to go to production as fast as possible. I think we have all heard the phrase "I wanted it yesterday"; nobody waits for tomorrow or the day after tomorrow. In such cases, you need to ensure you are able to push your Microservice into production, do a bit of testing with a limited set of users, and then expand it for a further global rollout. Those are some of the business problems which you would also start having as you build large-scale Microservices.

What's the solution for this? This could be addressed in multiple ways, but one of the solutions could be using a service mesh, where you build an additional layer on top of an existing Microservices Architecture and offload some of these cross-cutting concerns, such as traffic management, observability, or security, to it. That layer takes care of these aspects, leaving the Development Team to focus more on building their own functionality. That's going to be the crux of this entire session. So we'll be hearing from the other experts on some of the high-level capabilities of service mesh, which are traffic management, observability, security, etc. We'll also try to see if there are any challenges in implementing service mesh on our existing workloads.

With that brief introduction, I would like to have Karthik start with the traffic management aspects of service mesh, with a bit of insight on other aspects too. >> Thanks, Swami. Thanks for the nice introduction of service mesh. Before trying to get into what traffic management is, I would like to talk about what problems the traffic management of a service mesh solves. It tries to solve two problems, where one is a technical problem and the other is a business problem. As Swami was saying, everybody wants [inaudible]. The business problem is that people want things to be in production as fast as possible.

Even if it's a bug fix, or a feature, or any other small enhancement we do, they want that to be in production as soon as possible, but that is not the case in real life. You have to test it, you have to make sure that production does not go down because of some small mistake or some small bug that has crept in. So that is a challenge that exists. Then, with respect to the technical problem it solves: the current way things are moving forward is polyglot. Polyglot meaning people want to have more technologies in their stack: some workloads need Python, some workloads need Node.js, some workloads need .NET Core, some workloads need maybe Rust.

All these polyglot technologies should be available on your production server, and we've solved that problem. How did we solve it? By using Docker. You build once and deploy everywhere, so it works on every machine; that is the problem it solved. But solving that problem has created one more problem: what if you want to handle some [inaudible] concern? Say I want to do a [inaudible], and I want to do some authentication, or some app monitoring, or perhaps some logging mechanism. So these are the two problems that exist in the current system, and once you move to a bigger platform like Kubernetes or Docker with a polyglot stack, or you want things faster, the service mesh solves the problem.

The first thing it gives you is the capability to do A/B testing, canary rollout, and staged rollout with [inaudible]. What are A/B testing and canary rollout? A/B testing is nothing but split testing: you can serve different pages behind the same link to your users.

The point is to figure out which one actually makes the customer buy more. For example, on an e-commerce site, you can test whether the color red makes them buy more, or green makes them buy more. That is the kind of A/B testing that can be done and tracked back to the same customer.
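To make the A/B idea above concrete, here is a minimal Python sketch. It is not how any particular service mesh does it; in a mesh this split would normally be expressed as routing configuration rather than application code. The function names, the experiment name, and the "red"/"green" variants are hypothetical, chosen only to mirror the button-color example.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("red", "green")):
    """Deterministically map a user to one variant.

    Hashing the user id (an assumption of this sketch) keeps the same
    customer on the same variant across visits, so purchases can be
    attributed to the page they actually saw.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def conversion_rates(events):
    """events: iterable of (variant, purchased) pairs -> {variant: purchase rate}."""
    shown = {}
    bought = {}
    for variant, purchased in events:
        shown[variant] = shown.get(variant, 0) + 1
        bought[variant] = bought.get(variant, 0) + (1 if purchased else 0)
    return {variant: bought[variant] / shown[variant] for variant in shown}
```

Comparing the rates returned by `conversion_rates` is what tells you whether red or green is making customers buy more.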

In the case of a canary rollout, say there is a new feature, a bug fix, or anything new that needs to be released to the customer. What we can do is send 20 percent, or 10 percent, or 5 percent of the traffic to the new release. Then we can roll it out slowly. When there is an issue with that release, we can roll back; otherwise we increase the percentage of traffic it receives.
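The canary rollout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the version labels "v1" and "v2", the function names, and the idea of a single health flag are hypothetical, and a real mesh would apply the weights in its proxies rather than in application code.

```python
import random


def pick_version(weights, rng):
    """Choose a version in proportion to its traffic weight.

    weights: e.g. {"v1": 95, "v2": 5}, where the values are percentages.
    rng: a random.Random instance, passed in so the choice is testable.
    """
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def shift_traffic(weights, canary, stable, step, healthy):
    """Return new weights: grow the canary by `step` percent, or roll back.

    If the canary is unhealthy, all traffic goes back to the stable version.
    """
    if not healthy:
        return {stable: 100, canary: 0}
    new_canary = min(100, weights[canary] + step)
    return {stable: 100 - new_canary, canary: new_canary}
```

Starting from `{"v1": 95, "v2": 5}` and calling `shift_traffic` after each healthy observation period gives the slow rollout; a single unhealthy period sends everything back to `v1`.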

So that is the main technical problem it solves. How do we solve this? Now we have a huge power to actually test things in production. This means you don't have to wait for the testing team to test for 10 days to make sure nothing goes down. We have the power to actually push it to production, do a little bit of testing, and then make sure that production actually works better. The textbooks came up with something for this which, I think, is called fail fast: failing faster so that you can actually move with the business, rather than IT pulling the business down.

So that is the main business problem that is solved. It solves it in a very out-of-the-box way. With a few configurations, you can do A/B testing and canary rollout within minutes, so that's the first thing. The other capability the service mesh provides is a circuit breaker, meaning it makes your application more resilient through circuit breaking, e.g. setting up [inaudible] and timeouts. It also helps you with dynamic service discovery. Dynamic service discovery is needed because, say, I have 100 Microservices, as Swami was saying.

All the Microservices I need to put somewhere, put them in a routing table, and then I have to know where each one is. I have to know how to talk to it. These are the big challenges that we face right now. The service mesh actually solves the problem: it gives you load balancing, with the usual [inaudible], it gives you proper [inaudible], it gives you [inaudible]. It gives you arithmetic. These are all out of the box, where you don't have to do much.
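The circuit breaker mentioned a moment ago is worth a small illustration. The following is a minimal Python sketch of the general pattern, not the implementation used by any specific mesh; the class name, the thresholds, and the injectable clock are assumptions made for this example. In a mesh, the equivalent behavior lives in the proxy and is switched on through configuration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The point of the pattern is that once a downstream service is clearly failing, callers stop waiting on it and fail fast until it has had time to recover.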

You don't actually have to change your Dockerfile or your application folder. Without changing that, you'll be able to get your business and technical problems solved, so that you can move faster in your business and also in the engineering aspects of things. So these are the main things I see with respect to traffic management. The main thing I always see is A/B testing and canary rollout, which is a huge thing because people want things to be as fast as possible. The next main thing is that all the cross-cutting concerns can be addressed without changing the code, which is another way of supporting a polyglot technology stack in your system.

That's the main aspect of service mesh, if you ask me, with respect to traffic management. >> Great. >> Back to you, Swami. >> I think these are great inputs, Karthik. I think the major part, which we are all interested in, is traffic management, and I think we've got good insights about that. The next part which we want to touch upon is security.

Security is often not treated as a concern by developers. We all think security is somebody else's responsibility. So with respect to enabling secure communication between Microservices, is there something which we could leverage from service mesh? Augustine, do you want to share some insights on that? >> Yeah, that is a good point. If you see, most of the service meshes are based on Kubernetes. If it [inaudible], if you look, most of these service meshes also piggyback right on those.

Now, what is the good part of that? It means the developer can just focus on the application and its business logic, and that's it. All the security aspects are offloaded to the [inaudible]. Now, one important aspect of security is that anything exposed [inaudible] means it's an option or an avenue for malicious activities. This is where most of the service meshes are trying to either implement or incrementally bring in mutual TLS. That means each part, the client, the server, or whatever, all the parts are talking to each other, but in an encrypted manner. But that means you'll have to have a rotation of certificates.

[inaudible] You can see a lot of things. This is where a lot of issues will come, because if you miss one little thing, the bad actor will find it, because that's what his job is. It's just to find one little loophole, and he will just run through and get in. Now, this is where most of the Service Meshes take that onus on themselves.

They do all the certificate rotation, track the expiry, send the certificates to the different nodes or whatever artifacts are there within that cluster, and they take care of it. The second aspect is, do you have access to a particular resource or service in terms of the Service Mesh? That's where Role-Based Access Control comes into play. You can have finely tuned access to a particular service based on the path or whatever you build.
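As a rough illustration of that kind of path-based access control, here is a small Python sketch of a deny-by-default check. It is not the policy engine of any real mesh; the `AllowRule` fields, the service identities, and the paths are hypothetical, and in practice the caller's identity would come from its mTLS certificate rather than a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    source: str       # identity of the calling service (hypothetical naming)
    method: str       # HTTP method the caller may use
    path_prefix: str  # path prefix the caller may reach


def is_allowed(rules, source, method, path):
    """Deny by default: a request passes only if some rule explicitly matches."""
    return any(
        rule.source == source
        and rule.method == method
        and path.startswith(rule.path_prefix)
        for rule in rules
    )


# Example policy: only the "orders" service may read product data.
RULES = [AllowRule(source="orders", method="GET", path_prefix="/products/")]
```

With `RULES` as above, `is_allowed(RULES, "orders", "GET", "/products/42")` is true, while the same call from any other service, or with any other method or path, is rejected.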

That kind of access control means we're now working on the basis of zero trust in this access domain. That means I'm now clamping down on anyone trying to enter maliciously or inadvertently do something which they are not authorized to do. That covers the third aspect. Now, the fourth one, which is most important, is the testing, observability and [inaudible].

That's a very good point for Ilyas to come in and talk about, because that's a very crucial aspect of any of this. >> Yeah, absolutely. I think building zero trust networks for communication is one of the primary focuses of the big enterprises [inaudible] operate. I think it's good to note that the Service Mesh is able to bring that governance out of the box, without the Developers doing any extra activities; it's all offloaded to the Service Mesh. It's a good laid question. Thank you.

The next part, which I think you touched upon, is observability. Yes, we've enabled the traffic and we've made it secure, but how are we going to monitor all of these Microservices, and how do we troubleshoot issues if something goes wrong in a world of 1,000 Microservices, or even 100 Microservices? How are we going to identify it? Ilyas, do you want to touch upon those?

>> Yeah. Before I start talking about the observability and monitoring capability of Istio, I'd like to thank Augustine for highlighting the SSL and mTLS part, because the moment he talked about that, everybody started smiling, simply because every Developer is now getting into full stack and getting some awareness about SSL from Infra and everything, but it is still Greek and Latin for many Developers. Hence the Service Mesh taking care of mTLS, as well as the overall SSL or TLS, is a great benefit for the Developers. Now, coming to the observability part: all of us would certainly agree that the monolithic architecture is being broken down into 10s or 15s or 100s of Microservices. Now, just imagine when there is an error.

An error could be due to latency or traffic or whatever it may be, or an application-specific error. It is always challenging. It is like finding a needle in a haystack: because you have so many services, so many APIs in front-ends and back-ends, and so on, it would be very hard for anyone, be it a Developer or an Ops guy, to go and figure out where exactly the problem is. I was recently working for a customer where they have lots of nodes and multiple containers running, and all that they know is the latency: when the number of requests per second goes to about, let's say, 10,000, the performance degrades to about two seconds or something. That's what we know, but other than that we don't know which particular API or which particular method is causing the delays.

Is it at the API layer or is it at the back-end layer? It was very challenging, simply because no monitoring capability or APM was implemented in that particular project. But this also highlights the importance of setting up a clear observability infrastructure for the applications as well as the infrastructure, so that when something goes wrong, we'll be able to understand where exactly it is occurring: is it at the application layer or somewhere else? It is very important for the overall state of the application. Within the SRE arena, people talk about the four golden signals for the observability of any system, be it a Microservices or non-Microservices based solution: latency, traffic, errors, and saturation. In that same case, the database was overloaded, but they were not able to figure out that the database was running out of space.
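To show what the four golden signals look like as numbers, here is a minimal Python sketch that derives them from a window of request records. The `Request` type, the choice of a nearest-rank 95th percentile for latency, treating HTTP 5xx as errors, and measuring saturation as used over total capacity are all assumptions of this example; a mesh would collect the raw data in its proxies and a monitoring stack would do the aggregation.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def golden_signals(requests, window_seconds, used_capacity, total_capacity):
    """Summarise latency, traffic, errors and saturation for one time window."""
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile of request duration.
    rank = max(1, math.ceil(0.95 * len(durations)))
    p95 = durations[rank - 1] if durations else 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests) if requests else 0.0,
        # Saturation: how full a constrained resource is, e.g. database disk.
        "saturation": used_capacity / total_capacity,
    }
```

In the database anecdote above, the saturation figure is the one that would have shown the disk filling up well before latency degraded.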

Now, since we are talking about Microservices: in fact, all the top players in the Microservices arena have got very good observability tools and technologies under their belt. If you talk about Istio or Linkerd, both of them are open source and they have very good tool chains integrated right into the platform. The moment you turn on Istio or Linkerd for your solution, you're going to get dashboards that will give you a lot of visibility into how your stack is performing, right from ingress to egress, and how the calls were passed from one service to another. If there's any error, it is going to pinpoint it for you. It is going to help you understand where exactly the error was introduced, or which particular service is taking more time compared to the others. There are a lot of benefits to utilizing a Service Mesh, and more specifically these open source technology-based solutions, which can bring a lot of visibility and transparency into the operation of your solutions.
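The idea of following a call from one service to the next and spotting which service is taking the time can be sketched in Python as follows. This is a simplified illustration, not the data model of any real tracing tool; the `Span` fields and service names are hypothetical, and the calculation assumes each span's child calls run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the first span of the request
    service: str
    start_ms: float
    end_ms: float


def self_times(spans):
    """Time spent inside each service, excluding time waiting on child calls.

    Assumes child spans of the same parent do not overlap in time.
    """
    child_total = {}
    for span in spans:
        if span.parent_id is not None:
            duration = span.end_ms - span.start_ms
            child_total[span.parent_id] = child_total.get(span.parent_id, 0.0) + duration
    result = {}
    for span in spans:
        own = (span.end_ms - span.start_ms) - child_total.get(span.span_id, 0.0)
        result[span.service] = result.get(span.service, 0.0) + own
    return result


def slowest_service(spans):
    """Name of the service with the largest self time in this trace."""
    times = self_times(spans)
    return max(times, key=times.get)
```

For a two-second request that passes through a gateway, an API, and a database, this kind of breakdown is what separates "the request is slow" from "the database call is where the time goes".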

>> Great, Ilyas. I assume when you said all of this brings an out-of-the-box capability, that means the developers need not worry about injecting any specific components or DLLs or any kind of JARs into their application. All this happens like magic for them. They just need to focus on their own regular functional development alone, right? >> Exactly. >> I think that's a really great benefit. Again, going back to Karthik's point about Polyglot, it really brings in a great observability experience for us, because building a common library targeting different languages is not an easy thing, but if it's all happening at a layer on top of our application code, it's really good.

Ilyas, another question for you. Since you've touched upon a couple of Service Mesh options, can you also talk about the other options you have experimented with or explored, which are out there and which people can go and explore more? I know you touched on Istio and Linkerd; are there any other popular open source Service Meshes? >> There are plenty. Although this particular space is relatively new, it's been about five to six years, there are several players coming into the market. So apart from Istio and Linkerd, there are players like Consul, which comes from HashiCorp, and AWS has their own App Mesh, and Traefik, the popular proxy tool, has its own service mesh, and there is Kuma.

So, a lot of them. In fact, Microsoft has now come up with Open Service Mesh, which is very similar to the rest of the open-source meshes as well. There are plenty. It is also becoming a challenge to pick the right one. But then there are a lot of articles and research in this area comparing which service mesh has which features, so it is easier for us to pick the right one.

>> Excellent. I think now we've got a good hold of the capabilities of a service mesh. Now, if someone wants to implement a service mesh in their infrastructure, what do they need to do? Is it applicable only for a brand new microservice-based application, or can they also apply the service mesh on top of existing workloads? How do you go about doing it? Karthik, do you have any experience and insights on that? >> Yeah. So with a microservice-based service mesh, you can do it for a VM or it can be for a [inaudible]. It can be on both.

It is not something which needs anything specific. The main point is that I have explored certain parts of Istio, so I'll be very comfortable explaining [inaudible]; that is what I was thinking about. But I think with every service mesh it's almost along the same lines. All you have to do is a couple of things. You need to make sure you install the service mesh into that system.

Let's take Kubernetes as an example, to not [inaudible] more confusion. You can actually replace the word with VM also. Install your service mesh. Before that, let me explain the sidecar pattern. Why did we go for Istio? It's basically for some cross-cutting mechanism we need to provide.

It might be authentication, it can be monitoring, or it can be TLS termination, whichever you want to do. So that is what you want to do. What you have to do is figure out the corresponding piece of Istio or the service mesh, then install that, and you'll have to enable it for your own pod. Meaning, say, I have a pod, say an audit pod or a product pod; I need to enable it on that pod. Having said that, it can be either a new system or an older application system.

Meaning, say, I have 50 microservices; what I would suggest is that you start with one service, implement the authentication or logging on that, and see how well it works. Then you keep on adding to that. Adding it to the YAML file of your existing pod is actually much easier than adding everything to every microservice at once. Add it to one, see how it works. Make sure that logging is better. You can monitor it as you enable [inaudible]. You can do that.

Once that is fine, you do it for 10, and then later 20. That's the way we have to do it. First you install it, then you make sure that you enable your corresponding [inaudible] for your pod, test the pod, and enable more and more. So that is the best way of doing it. Back to you. >> Okay, great. You mean take a piecemeal approach.
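The one-then-10-then-20 approach described above can be written down as a small Python sketch. It only illustrates the sequencing; the function names are hypothetical, and `enable` and `healthy` stand in for whatever mechanism actually turns the mesh on for a pod and verifies it, which this sketch does not model.

```python
def rollout_waves(services, wave_sizes=(1, 10, 20)):
    """Split services into successive waves; the last size repeats as needed."""
    waves = []
    start = 0
    index = 0
    while start < len(services):
        size = wave_sizes[min(index, len(wave_sizes) - 1)]
        waves.append(services[start:start + size])
        start += size
        index += 1
    return waves


def enable_incrementally(services, enable, healthy, wave_sizes=(1, 10, 20)):
    """Enable the mesh wave by wave, stopping at the first wave that fails.

    Returns (verified_services, failing_wave); failing_wave is empty on success.
    """
    verified = []
    for wave in rollout_waves(services, wave_sizes):
        for service in wave:
            enable(service)
        if not all(healthy(service) for service in wave):
            return verified, wave
        verified.extend(wave)
    return verified, []
```

For 50 services this yields waves of 1, 10, 20, and 19, so a problem found in the first service never reaches the other 49.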

Just identify the capability you want to apply, be it traffic management or observability, apply it on one of your microservices or a couple of them to understand the traffic modeling aspects, and then you can incrementally progress from there. That's a good call [inaudible]. I know we talked about how service meshes are going to offload a lot of things from the developers. That means somebody else needs to really manage the entire service mesh. Now, what challenges could that bring in? Augustine, do you want to talk about some of the challenges we might encounter, or the nuances people should be mindful of when they start working with service meshes?

>> Yeah. The first question is, do we really require a service mesh? I mean, that's the first question you need to ask. Because the elephant in the room is that most of the service meshes are actually installed on top of Kubernetes, which is itself a big beast. It is quite complex, with a lot of moving parts running on it, and now you add another layer on top of it, which is not small; it's complex and interesting. While they hide a lot of complexity from you as a developer, if something goes wrong, you really need to have the engineering skills to go and find the issue. You require a good grasp of networking concepts as well as monitoring tools that can pinpoint where something went wrong. Because [inaudible].

There's a lot of resiliency built into it, so by the time you know there's an issue, it could be quite late. So your time frame to [inaudible] financial loss and issues; there you really want to pinpoint to [inaudible]. You've got to be very well-versed in the networking stack, because a service mesh is really working at layer seven [inaudible]. So you have to think in terms of: okay, I have to look at the app level, and also sometimes lower in the stack, at the TCP/IP level. In terms of engineering, you have to be really good.

There has to be a certain skill level for you to adopt a service mesh, and once you have that skill, do you have the engineering bandwidth? Like the very final thought that this is non-conductive [inaudible]. Because one aspect of it goes to the network period, quite. Like, if we look at the Envoy proxy type of [inaudible], if we look at [inaudible], that is built on C++; [inaudible] Linkerd [inaudible] technology, Rust. [inaudible] Because the essence is, if anything goes wrong, you have to be really, really good at networking to find out the [inaudible].

>> Yeah, absolutely. I think that's a great point. First, identify whether you really need a service mesh, considering your scale of microservices, and then see if you have the engineering bandwidth to handle the service mesh itself. That's good input, Augustine. I think we are almost coming to the conclusion of this session, considering the time we have. I know service mesh is a very wide topic, and each of these capabilities, traffic management, observability, or security, could be talked about for multiple hours. But yes, what we'll try to do now is just wrap up the session.

So we started with a broad introduction to microservices. I hope you understood why you would want to go with microservices. Then, as you start scaling your microservices, what problems you would get. Then we also touched upon the different capabilities of a service mesh that could be applied to solve those business and technical problems.

Also, we touched on some of the challenges, such as the operational challenges it brings in for us. I hope that you got a good idea about what a service mesh is, where it fits in, where to use it, where not to use it, how to use it, etc. Thank you so much, experts, for sharing your knowledge. Thank you so much, everyone, for joining the session. Stay home and stay safe.

>> Thank you. >> Thank you. Thanks everybody. >> Thank you.

2021-05-30
