[Music] Danica Fine, Staff Developer Advocate, Confluent [Music]

Good morning, and welcome to day two of Current! How are we all feeling? All right. And we enjoyed that party last night, right? Awesome. I, for one, am thrilled to be here kicking off another day of technical breakout sessions, of networking, and maybe, just maybe, a little bit of swag from the expo hall floor — it's tradition, after all, right? As program committee chair of Current this year, I am so incredibly proud of the countless hours and hard work that our program committee has invested into making this conference possible for you over these two days, and I'm enjoying every minute of being here to experience it with all of you. Because without you — I'm pointing at all of you here — without all of you attending this conference, without all of you at home tuning in for the live stream, without our talented speakers coming to share their technical knowledge, and without the community supporting us, we wouldn't be here today.

Now, equally important to the people who helped make this conference possible are the technologies that are represented here. Current was founded with Apache Kafka at its core — it's the next generation of Kafka Summit, isn't it? But Kafka is just one technology, and I hope I'm not upsetting anyone by even saying that; it is a very, very important technology. And while a lot of the agenda we're seeing today is informed by trends in the Kafka community and the use cases around it, the rest of the agenda is informed by trends in the broader data streaming community and industry — what you and your companies and organizations are doing with these data streaming technologies each and every day. And so this year, with those trends in mind, our technical agenda has expanded to cover over 30 technologies in the data streaming space and broader ecosystem. But don't worry, there's still more than enough Kafka to go around. In this morning's keynote we're definitely going to cover all things new and exciting in the world of Apache Kafka, but I think it would be a missed opportunity if we didn't also check in on some of those other trends that we're seeing. And so, with that in mind, we're going to dive into another technology that's currently trending in the data streaming space: Apache Flink. To shed light on Apache Flink and where it sits alongside Kafka in this broader data streaming picture, I'd like to welcome to the stage two leaders in this community: Ismael Juma and Martijn Visser. [Applause]

Wow. So Ismael and I are super excited to be here today and talk to you about stream processing. Yesterday we already heard Shaun talk about the elements of a stream processor and the elements involved in stream, connect, process, and govern. When we're talking about processing, we're talking about the manipulation or transformation of data into a more meaningful form — examples are things like cleaning your data, filtering it, transforming it into a different format, and of course applying calculations on top of that. With stream processing, we're just making that real time and giving you those real-time insights. Kafka and Flink are the foundation for these types of things — for your streaming and for the processing — and they go together like peanut butter and jelly. Now, I'm actually curious, Ismael: why do you think that Flink and Kafka are such a great match together? That's a great question, Martijn. To help answer this, I will use an analogy between data at rest and data in motion. When we started building applications to store data,
all we had were file systems. File systems are a powerful abstraction, but they're pretty low level, so as we built more and more of these applications it became clear that we were writing the same code over and over again. This led to longer development cycles, but it was also error-prone. There was an opportunity to introduce a new abstraction that integrated storage and processing to make it dramatically easier to build data-centric applications. This is where databases come in — particularly relational databases, but also other paradigms — and this is the standard for how we build data-at-rest applications today.

So we've talked about the evolution of the data-at-rest stack. What about data in motion? Well, it's pretty similar. At the lowest level we have Kafka, the central hub where we write and read streams of data. It's a powerful abstraction as well, but also a little bit low level. You can write sophisticated applications directly on top of Kafka using the producer and consumer APIs, but it's a great deal easier if you use some higher-level abstractions that provide ready-made solutions for common patterns. When it comes to simple Java applications, Kafka Streams is a great option: it's a library, it can easily be embedded into the application, and it provides many of these core stream processing primitives and makes them readily available. It's very successful in this space, but it is not quite the database of the streaming layer. For that, I would look towards frameworks that try to do even more: they take control of the life cycle of the application and take care of scaling, failover, fault tolerance, and more of that good stuff. Flink is one such framework, and one of many. So Martijn, what makes Flink stand out in this space?

Oh, that is a good question. I'm a bit biased on this, but I think there are three elements involved here: the APIs, the runtime, and the community behind it. Looking at the APIs, Flink offers unified batch and streaming APIs for processing this data. It allows for different levels of abstraction, so it supports both common as well as specialized use cases. When you have created your business logic, it runs on Flink's unified runtime, and that is a streaming runtime, which basically gives you the ability to get really high throughput with low-latency results. From a community perspective, we see that Kafka and Flink have a similar growth path: the same set of companies that adopted Kafka first were also early adopters of Flink. And what we have seen in the Flink community is that it has been super active: there are 370 FLIPs, more than 150 committers, and more than 34,000 commits, and there's a Chinese mailing list for users as well. I really think it's the diversity of the community and its broad audience — where we have these great engineers, data analysts, and everyone in between — that has helped harden the technology and has resulted in the incredible progress that Flink has made over the years, and it has resulted in Flink becoming the de facto standard for stream processing. Later on I will tell you a bit more about the community's plans with Flink, but perhaps, Ismael, you could tell us the latest on Kafka? Sounds good, thank you Martijn. So, we all know Kafka is the foundation of the streaming world, and it has been maturing for many years. Does that mean that we're done? Not at all. There are still some very exciting foundational changes happening in the Kafka space. I'll be talking about what
we've done recently, what we're working on right now and rolling out, what's on the horizon — the next set of things the community has been discussing that we're quite excited about — and some more forward-looking future ideas that we think would make a big difference to the ecosystem.

So let's start with what we've done recently. By now I hope all of you have heard about KRaft, Kafka's new metadata layer built on a replicated log. This has been a multi-year project that involved a number of contributors from a variety of companies and organizations, and it brings significant benefits. It's easier to manage — you don't have to operate a ZooKeeper cluster anymore. It scales much better — the number of partitions supported is an order of magnitude higher. And it's more resilient — a number of edge cases have been addressed. KRaft has been in production for some time; in fact, we announced it at this very conference last year. But the big news is that you can now migrate your ZK-mode clusters to KRaft mode completely online, without any downtime. We have also officially deprecated ZK mode and intend to remove it in Apache Kafka 4.0 next year, so please start your migrations as soon as you can.

So what's next? This is another large re-architecture in the core of Kafka: the separation of compute and storage by supporting multiple tiers of data, also known as KIP-405. By moving the colder data to an object store like S3, Kafka becomes more elastic and easier to operate — operations like expanding the cluster or replacing a failed disk are much, much faster. It also provides an opportunity for using faster, cheaper, and also smaller hard disks or SSDs. This project was started by Satish Duggana and his team at Uber some time ago, but it has been truly a community project since then, with contributions from a number of companies like Confluent, Amazon, Aiven, and Red Hat, and many individual contributors. To tell us more about tiered storage, we have Satish with us today. Please welcome Satish to the stage. [Music]

To get us started, Satish, please tell us a little bit about your role and how Uber uses Kafka. Yeah, I'm the tech lead of the data and streaming infrastructure at Uber. We have one of the largest Apache Kafka deployments in the industry, serving trillions of messages per day and several petabytes of data per day. We have a Kafka platform which is the critical cornerstone for the entire Uber tech stack, offering several critical use cases. Some of those use cases include the real-time propagation of events across thousands of services in a pub/sub manner. It also enables our streaming analytics platform, which is powered by Apache Flink and Apache Pinot; that serves several critical use cases, including dynamic pricing, ETAs, and real-time data exploration. It also enables our observability platform, which is about logging and debugging across the whole of Uber's development stack. It also ingests a lot of online transactional change data capture logs, which are consumed by several thousands of services. And we have an ingestion pipeline, powered by Apache Spark, which pulls all this variety of data and pushes it into Uber's data lake.

That's really cool. So, moving on to the tiered storage KIP: can you tell us a little bit about the high-level architecture? Yeah. KIP-405 introduces the separation of processing and storage at the Kafka broker level. It is a fundamental core change that brings elasticity and scalability to the cluster. While building this solution we had a few principles in place. One is: no changes required
on the client side — that means there is no need to change the client library or to set any configs; you will still be able to access the data that is in remote storage. The second one is: no new replication protocol — the solution is built on top of the existing, battle-tested replication protocol. The third one is providing a very simple interface for plugging in remote storages. At a very high level, the solution can be divided into three parts. The first part is copying all the eligible segments from local storage to remote storage. The second is maintaining the log segment lineage and consistency semantics across the local data and the remote data. The third is fetching the data from remote; this is done through a specific purgatory and thread pool.

Sounds good. I think it's really cool that we managed to have such a large architecture shift without updating the clients or having any external protocol changes — a great achievement. And can you tell us a bit about running KIP-405 in production, and any interesting lessons or benefits from this? Yeah, so we have tiered storage in the majority of our production clusters — around 80 percent of the brokers — and on some of the large clusters we've been running it for the last nine to twelve months. Some of the major benefits: it improves the scalability and elasticity of the clusters, and the cost efficiency. In our on-prem clusters we saw they become 25 to 30 percent more cost efficient, and in our preliminary analysis of cloud object stores we saw the cost efficiency can range from 30 to 70 percent, depending on the set of use cases. The third point is rebalancing the clusters: as you all may know, it is a very complex process and it takes long durations. In one of the use cases we observed it takes around 18 hours to do a rebalance of the cluster; when we enabled tiered storage, it came down to six hours. The fourth one is rebuilding failed brokers: in one of the large clusters it took around eight to nine hours; when you enable tiered storage, it came down to three hours. We then worked on a new feature called follower fetch from the tiered offset, instead of the local log start offset, which brought it down further from three hours to one hour.

Awesome. I'm sure Apache Kafka users are really looking forward to these changes as well. When is tiered storage going to be production-ready, and what's next for tiered storage in general? Sure, yeah. We are going to have the early access release in 3.6 — we're almost finalizing the release; we've had a couple of release candidates and we're going to have one more. In 3.7.0 we want to have tiered storage production-ready and adoption-ready, so that all the known gaps are closed. Some of the new features that we are going to introduce in 3.7.0: better topic deletion life cycle management — this is already proposed in KIP-405, so it will be implemented in 3.7.0. The second one is remote read and remote write quotas, for better reliability across Kafka clusters and also on the remote storage. The third one is follower fetch from the tiered offset instead of the local log start offset, which I talked about earlier, which brings much more availability to the cluster. The final one is that we are also experimenting with dedicated remote replicas, where you can quickly bring up remote replicas and serve all the remote traffic only through those replicas.
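To make the operational side a little more concrete: once tiered storage is available in a cluster, opting a topic into it is mostly a matter of configuration. The sketch below is a rough illustration based on KIP-405 and the 3.6 early access; the topic name and retention values are made up, the broker additionally needs remote storage enabled (for example `remote.log.storage.system.enable=true`) plus a RemoteStorageManager plugin configured, and the exact property names may differ between releases.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateTieredTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Topic-level opt-in to tiered storage (KIP-405 / 3.6 early access).
            // Config names follow the KIP and may vary by release.
            NewTopic topic = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of(
                            "remote.storage.enable", "true",  // copy closed segments to remote storage
                            "local.retention.ms", "3600000",  // keep only ~1 hour of data on local disks
                            "retention.ms", "2592000000"      // total retention (~30 days), including the remote tier
                    ));

            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```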
That's awesome. I'm looking forward to 3.7 and using tiered storage in production. Thank you, Satish. Thank you. A round of applause for Satish, please. [Applause]

So let's talk about what's on the horizon — the next set of things we have been discussing with the community, and in some cases working on, that some of us are very excited about. When it comes to an open source project, it's a little bit difficult to talk about a roadmap; after all, innovation comes from everywhere, and anyone can contribute a proposal, an idea, or a feature. But we think these items give a good sense of what's coming.

The Kafka protocol is the core contract that Kafka exposes to the rest of the world. Since Kafka has become the de facto standard for streaming, the Kafka protocol is for streaming what HTTP is for REST. Given this, it's critical that we continue to evolve the protocol with compatibility in mind so that we set ourselves up for success over the next ten years. You may be wondering: if I'm using a client application, why do I care about the protocol? After all, we have a rich set of clients for a number of languages, and as a user you interact directly with those APIs, not the protocol. The challenge is that the protocol today pushes a bit too much work to the clients, and as a result it takes a significant amount of effort to make the clients really amazing. We have some of those, but we also have some clients that are, let's say, less amazing, and we want to make them equally good. How do we do that? In my opinion, the solution is to simplify the protocol, and this is what KIP-848 does. It moves a bunch of complex logic that today exists in the client into the server — things like metadata tracking, subscription management, and so on. In addition, KIP-848 also removes a global synchronization barrier that has been a real pain point in previous iterations, so it makes the consumer group protocol more scalable and more reliable.

Then there is a general problem when people are trying to integrate their applications with event streaming: you want to both maintain some state in a database and keep it consistent with your events in the event system. Let's say a user joins your organization and you write that data to the database, and you want to keep Kafka in sync. What do you do? Can you simply write to both of them in parallel and expect consistent and atomic behavior? Not today — but this is what KIP-939 attempts to solve. Anna McDonald and Gordon Tai will talk about this exciting KIP in more detail a little bit later in the keynote.

Development experience is critical for the success of many types of projects, but even more so for an open source project like Apache Kafka. One of the KIPs in this area that I'm really excited about is the one that allows the usage of GraalVM's native image to reduce startup time, memory usage, and the size of the Docker images. Initial results here are actually extremely exciting: the startup time has been reduced by almost 10x and the memory usage by 4x. I'm personally really looking forward to starting Kafka in 100 milliseconds. And last but not least, Queues for Kafka is a pretty elegant proposal on how to enrich the Kafka protocol and Kafka APIs to make it easier to satisfy some additional use cases — and I'm not gonna lie, this is somewhat of a controversial topic. We have here with us today the KIP author, Andrew Schofield, to tell us more about Queues for Kafka and even give us a demo. Please, a round of
applause for Andrew Schofield. [Applause] Thank you. [Music] Okay, thank you, Ismael. So, I'm not sure this is controversial — I think it's really just evolution. Let me tell you about the difference between a message queue and an event stream to start with. They're really at the two different ends of the same spectrum, I suppose. Message queuing originally started as a kind of point-to-point integration between two communicating applications, and event streaming has really turned into more of a foundation for a data streaming platform. But there's plenty of middle ground, and that's really what we're going to be talking about. So the question is: given that event streaming is a more general and more powerful paradigm, why didn't it simply take over from message queuing? And the answer is that a message queue is a really useful abstraction for certain kinds of use cases, and I'll give you an example. Imagine you were trying to do order fulfillment. A really natural way of thinking about it is that you put all of the orders onto a queue, and then you have a pool of consumers who process the orders one at a time and take them off the queue. That gives you a very good model for handling the orders as they build up, and also for processing them — and that's relatively difficult to do with an event streaming system and its standard API.

Message queuing systems are characterized by per-message state. When a message gets sent onto a queue, received from a queue, and acknowledged as being delivered, it goes through this sequence of states, and the message queuing system remembers all that information as part of the way its API works. Consumer position is handled very differently in an event streaming system: what typically happens is the consumer consumes a bunch of messages and then periodically updates its consumer offset. That's really not so convenient if you're writing an order processing application — you want to share more freely than that — so using a consumer group, specifically in Kafka, is not very good for that kind of model. What we'd really like to do is see if we can merge these two concepts to some extent and provide something more general. In the order fulfillment example, aren't the events really just a stream of events — the orders coming in? Well, I think they are an event stream, really, but it's a queue as well, so you should be able to view it in both ways, and that's what this KIP does.

And I'll give you an example of an improvement you get if you put your queues on top of Apache Kafka. Message queuing systems use a model called destructive consumption: you take a message off the queue and it's gone forever — there's no memory of it at all, it's just disappeared. But Kafka has much richer ways of handling storage: you can retain by time or by capacity, and with innovations like KIP-405 you can potentially have infinite storage. So if you put your queue on Kafka, you could wind it back in time to a different point, for recovery for example.

So let's have a look at what the KIP actually offers. It's a much-anticipated feature to be added to Kafka that makes it much easier for application developers to write queuing-style applications. It introduces a new concept called the share group, and a share group gives you easy and flexible sharing on top of Kafka topics. There's no partition assignment, so the number of consumers can exceed
the number of partitions. It also counts the delivery attempts, and the advantage of that is that if you have a message which is repeatedly failing to be processed, you can have it removed automatically from the queue so it doesn't block things up. This is how a consumer group looks today: I've got three partitions in my topic here, so only three consumers can be active at a single time consuming the data, and that limits your parallel processing. A share group looks very different: it's as if the data from the partitions has been merged into a single virtual queue, and then you can have as many consumers as you like on that queue. Now, inside, it doesn't work like that at all, but that's the model. The application style is very similar to today's, with a regular Kafka consumer: you replace the KafkaConsumer with a KafkaShareConsumer, and there's a new verb on the consumer called acknowledge, which basically says whether the message was processed or not. So that's what a share group consumer looks like — very, very similar for people who are used to writing normal Kafka applications.

So, in the spirit of show, don't tell, I'm now going to run a live demo. The orders we're going to be fulfilling are for hot dogs. What we've got here is three chefs who are going to be making the hot dogs; they join a share group and take one order at a time in order to make a hot dog. What I'll do is start up one of the chefs, and behind me there's a load generator that generates a hot dog order every few seconds, so you can see the progress. Now we can see that the top one is making hot dogs, and I'm going to start some more chefs, and what you'd expect to see is the messages being distributed across the chefs — and there we are, we've got one for the middle chef. Now, the interesting one is the bad chef: the bad chef unfortunately has no onions and is incapable of making a hot dog, so what he does is put it back on the queue, and then it gets reprocessed — there you can see attempt number two. So the attempts have been counted, you've got reprocessing, and all of this is being handled naturally using a Kafka API. And now I'll show you what it really looks like: it's a single-partition topic, each hot dog order is one message, and when you do reprocessing it doesn't put it back on the topic — it's just being redelivered by the share group. This is actually a modification I've made to Kafka, running live in front of you. Can we switch back to the slides, please?

So what's next for KIP-932? Well, there are some more decisions to be made — it's obviously a big piece of work — and the prototype is incomplete at the moment. So I'm going to enhance the prototype to validate some design choices, give it things like persistence, and worry about performance and that kind of stuff. Then we'll be in a position to finish the KIP off, prepare a vote in the community, and then start delivery of the code — and we aim to get KIP-932 into Apache Kafka in 2024. If anyone would like to talk to me about it, come and find me on the conference floor — I'll be happy to chat — and I will hand back to Ismael. Thank you very much. [Applause]
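To make Andrew's description a bit more concrete: based on the KIP-932 proposal (which may still change before it ships), a share-group version of the hot dog chef might look roughly like the sketch below. `KafkaShareConsumer`, `AcknowledgeType`, and the acknowledge/commit calls are taken from the proposal and are illustrative rather than a released API; the topic and group names are made up.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
// The two classes below are proposed in KIP-932 and are not part of a released Kafka client
// at the time of this keynote; they are shown here only to illustrate the programming model.
import org.apache.kafka.clients.consumer.KafkaShareConsumer;
import org.apache.kafka.clients.consumer.AcknowledgeType;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class HotDogChef {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "chefs"); // a share group rather than a classic consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Same style as a regular KafkaConsumer, but records are shared across all members
        // of the share group instead of being tied to individual partitions.
        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("hot-dog-orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> order : records) {
                    try {
                        makeHotDog(order.value());
                        consumer.acknowledge(order, AcknowledgeType.ACCEPT);  // processed successfully
                    } catch (Exception e) {
                        consumer.acknowledge(order, AcknowledgeType.RELEASE); // put it back for another attempt
                    }
                }
                consumer.commitSync();
            }
        }
    }

    static void makeHotDog(String order) { /* business logic goes here */ }
}
```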
As we look towards the future and think about what would make a material difference to the platform and the ecosystem, two things come to mind. The first is a mechanism to organize topics. As we talk to customers that are using Kafka at scale, it's very common that they have some complex mechanism to name topics: the name may contain multiple groups — things like a team, an application, and so on. When you think of data at rest, there have been a number of ways to help with the structure of tables and similar things; directories, tags, and labels are a few examples. I think something like this would be helpful for Kafka: an abstraction that helps us structure data, helps us understand what's private versus public, allows us to understand whether something belongs to a team or an application, and — equally important — allows us to evolve the structure, with something like symlinks or the ability to rename topics. We don't yet have a concrete proposal on how to do this, but it is something we'd like to work on with the community.

And last but not least, partitioning is an area of Kafka that is still a bit more complex than we would like. Users have to understand some internal details of Kafka to make the right choice here: choose too few partitions and the system may not scale appropriately; choose too many and it is unnecessarily inefficient. So how do we fix this? There are a few options: we could dynamically split and merge partitions, we could auto-scale them, or we could remove partitions from the system altogether and focus on key-based ordering instead. One or more of these options would dramatically simplify the foundation of the system, and when it comes to this core layer, that is really what's important: making the system as simple as possible while keeping its power, so that we can build a successful ecosystem around it.

We've gone through a rich and exciting set of things, and as we look through these items, the defining feature of the Apache Kafka project — continuous innovation — is on full display. As they say, many hands make light work. When it comes to an open source project like this, there's a huge number of contributors from all sorts of companies and organizations who build these features, who make sure that they are well documented, that they're well tested, that they work in every scenario and all possible edge cases, that they scale, and many more of these things — something that we can truly rely on. A huge thank you to all the people who make this possible.

And now, to put theory into practice, we have Tobias, product owner of the streaming platform at BMW. He's going to come and tell us how BMW uses Kafka. Please welcome Tobias to the stage. [Applause] Thank you. Welcome, Tobias. Can you tell us a little bit about how you use Kafka at BMW? Sure. So, we're a team of 20 people providing platform and infrastructure services for our internal users. This platform comprises around 25 Kafka clusters in Europe and in the US, used by application teams to transform more than three billion events every day. We're also planning some more locations, such as Mexico and South Africa, to get closer to our factories, and we're also partnering with our Chinese colleagues so that they can offer a similar kind of service in China as well. To make our users' lives a little bit easier, we're not only providing
Kafka, essentially, but also other services such as topic replication, schema management, or connectors. For example, we are currently hosting more than a thousand regular connectors and another hundred instances of our own custom-built SAP connector. A key component of our platform is our self-service portal, which is used to bring these services to our users so that it's easier for them to use them, taking away some of the time-consuming manual activities such as infrastructure provisioning, but also other things like generating interface contracts automatically and transferring the information into our configuration management database. The platform is used by around 500 application and product teams with a wide variety of use cases — from very simple ones, where they just have a connector to provide their data assets as an event stream, or even just use it more like a messaging solution, to large Kafka Streams topologies that they use to continuously generate new information all the time.

Awesome. I'm always very interested to hear and learn how businesses use Kafka. So what would you say are the areas of the business impacted by your work — how does it all translate to the customer getting into their car? So our idea is really to provide that platform to anyone within the BMW Group. Many of the services were initially established in car manufacturing, as they decided early on to make a strategic move towards Kafka, so there are lots of critical use cases relying on us, such as the transfer of just-in-time and just-in-sequence information to our production lines. There's also a fleet of autonomous transport systems that move parts within the factories on their own; they would not be able to do their job without Kafka. So while we don't have that one mega use case with, I don't know, trillions of messages every day — which probably some members here in the audience do — our manufacturing is really, really dependent on the reliability of our services. Since this was quite successful, the other business areas saw the huge benefits of the platform and then followed quickly. For example, R&D is now providing lots of valuable data streams about parts master data or the bill of materials, and that also includes a bigger project integrating our mainframe with Kafka. Sales and after-sales, on the other hand, use a lot of that information for keeping up to date with customer orders and offering the right products at the right time. So, to answer your question: basically, without Kafka, none of our customers would ever get a car.

Wow, there's a lot resting on Kafka's shoulders at BMW. I'm sure your team didn't hit the ground running from the beginning, though — you probably had to go through many stages to get to where you are today. So where did you start, and what was your journey? Yeah, it would be amazing to have such an advanced service offering right from the beginning, right? But no, you're absolutely right. We started very small, actually, in December 2017, where we just provided a building block for our OpenShift-based private cloud environment. So I bundled together a few services as templates and passed that on to application teams that were interested in running the Kafka clusters themselves. This was quite successful, which resulted in more and more use cases coming up, even in the critical environment, and that was also the reason why some of the teams asked me whether we
could not offer that as a central service for them, where we operate the Kafka clusters for them, since they were lacking the expertise to do so while guaranteeing the availability. So I sat together with a colleague, we enabled a provider to do the 24x7 operations, agreed on all the procedures needed around that, and then started that service early in 2019. This was also quite successful and stable from a Kafka perspective, but we were facing major issues in the underlying private cloud infrastructure, which really became a major roadblock in 2020. So we had to take a decision on how to move forward from there, and we just followed the natural evolution we had gone through so far and moved on to the public cloud, where we had our first productive go-live in July 2021. Since then, the offering has been expanded with more and more services and more users all the time.

But this massive growth — which was a bit unexpected in just how massive it was — has come with a number of challenges as well. As you know, using Kafka can be quite complex, and unfortunately there's not an infinite amount of expertise available. We as a platform team also have limited capacity, which means we cannot simply consult each and every team in depth to help them with their issues. So this really leads to lots of additional effort on our side that we have to cope with, helping them out of those situations. On the other hand, our community is also one of our biggest strengths, since we've come up with cooperative work where we have released features in a very short amount of time that we would not have been able to deliver on our own. The public cloud migration is one very good example of that, where we built a completely new solution together with the team from the production IT within just a few months.

Nice. It's nice to hear how Kafka grew along with the company over the years. The natural evolution is a testament to the open source community driving change within the technology as well. Yeah, absolutely right, and we can actually see this in some of the amazing work that the community has brought into Kafka. As a company that has existed for a very, very long time, we're often faced with the requirement to modernize business applications that have existed for decades, often based on highly transactional patterns. So improvements such as KIP-618, with the exactly-once support for source connectors, or also the ones you've just mentioned — queues for Kafka and two-phase commit — can be extremely good enablers to get those use cases on board to Kafka as a first introduction. But we as a platform team are also benefiting a lot from the different features that have been introduced to support operations at scale. I don't think I really have to mention KRaft — I think anyone involved in operating Kafka has that on the agenda anyway — but there are also other things like KIP-875, with the possibility to reset offsets for connectors, which is really one of the features we've been looking for, and it's also one of the reasons why we will go ahead and update our clusters even quicker than we usually do, just to get it out into our environments.

Fast upgrades — music to my ears. It sounds like your team is doing a lot to encourage adoption and allow Kafka to grow within the company. In the context of the natural evolution of these technologies at BMW, what's on the horizon for streaming for you — anything exciting? Yeah, well, there are always tons of ideas, right, and
sometimes it feels like a never-ending effort. But there is one use case that currently stands out to me in terms of technology stack and the possibilities for the future. This is another one from manufacturing, where we have a group of machines that work together to produce parts, and they can be configured to process any kind of different part; which ones are really needed depends on the car types moving through the production lines — and it's quite amazing how many different variants of a car you can produce on a single production line. However, the problem is that lots of the older manufacturing execution systems are built for linear production; they're not capable of doing that parallel scheduling of production. That's where the new solution comes into play, coming up with an optimized target instruction for each of the parts and sending that back to the manufacturing execution system. What makes this really interesting to me, though, is the usage of Apache Flink, which was chosen after looking into various frameworks. Flink provides exactly the features the team needed for this use case, with high performance, but also self-optimization of the jobs and the possibility to integrate systems that are not only Kafka but also others — to include, for example, machine parameterization. This use case is currently being rolled out in our very first plant for electric vehicles, but at the same time it is a blueprint for our global factories. So this is really a major boost for this kind of parallel production, and it will be very important for the production of our new Neue Klasse architecture-based vehicles, which were recently showcased with our BMW Vision Neue Klasse. As a platform owner, I'm often faced with requests from users for help with larger Kafka Streams topologies and issues operating them, and I really see a huge possibility here with Flink to provide another option that helps them focus on what they want to do and less on how to do it, while getting us closer to the business.

Sounds great. I'm particularly looking forward to the cars that come out inspired by the Neue Klasse. And thanks for sharing your perspective on how BMW uses Kafka. With that, let's welcome Martijn back to the stage to talk about Flink. Thanks for having me. Customers and companies are more digital than ever, and more data is being produced every day. Customer expectations have changed, and real-time services have become the standard. Yet preventing a fraudulent attempt to rent a car with a stolen credit card is still actually quite hard to achieve, and expanding such a use case to also send a notification when the fraud attempt has been stopped, or even to send a new card, just takes more time and resources than desired. Customer expectations are even bigger with all the buzz around AI, large language models, and machine learning, and integrating data with AI providers, or using that data for your feature store or as training data, is critical to get value out of all these technologies and out of artificial intelligence. Now, all of these real-time services rely on stream processing, and of course you can build them yourself — most of us start with the plain Kafka client to produce and consume our streams — but as use cases mature, requirements also change, and there's a need to abstract patterns via APIs to improve things like time to market. Things like scalability, life cycle management, and fault tolerance are
becoming more important to support these changed customer expectations, and that's why we have seen Flink emerge as the de facto standard for stream processing. Now, Kafka and Flink are used by the most technically sophisticated companies in the world. Like I said earlier, these are the same set of early adopters as for Kafka, and we also see that with Flink — we know that 80 percent of Flink users actually use Kafka. Flink is a framework that makes it easy to process streaming data in real time. It's built for streaming, and it treats batch as a special case of streaming in its unified runtime. There's a wide variety of unified batch and streaming APIs that range from SQL to Java and Python, and you have the ability to seamlessly switch back and forth between the different options you have there. Developers choose Flink because of its performance and its rich feature set: scalability and performance, fault tolerance, language flexibility, and unified processing. Let me dive into each of them. First off, Flink is renowned for its scalability. What basically happens is that when you submit a Flink job, it's parallelized into thousands or hundreds of thousands of tasks, and there are known use cases where Flink has scaled up to process 6.9 billion records per second.
Next, Flink contains a fault-tolerance mechanism of checkpoints: it periodically takes incremental and asynchronous snapshots of all the state and writes them into durable storage like S3. Next to that, Flink users can also trigger savepoints for backups or for manual operations. In case there's a failure, like the crash of a task manager that's executing those parallelized tasks, the job manager — which controls the entire life cycle — will detect that failure and make sure that the job gets restarted from its snapshots. That enables Flink to recover from failures in a timely and efficient manner.

Flink has layered APIs at different levels of abstraction. At the most declarative level there's Flink SQL, which gives you pure ANSI SQL to write your statements and get started easily. One level lower there's the Table API, which is the programmatic equivalent of SQL, so you can use it to write your Java or Python applications. And at the lowest level there's the DataStream API, which gives you a low-level, imperative API for direct access to building blocks like state and timers. Now, in a Flink job, data flows through a data processing pipeline which we call a job graph, or topology. What we see here on screen is an example of a SQL statement where we're writing into a results destination topic and reading from an events source — those could be Kafka or anything else. What happens is that when you're using the SQL solution or the Table API, the job graph is created for you: Flink validates and optimizes the statement and transforms it into a job graph. If you're using Flink's DataStream API, you have to define the entire job graph yourself. And last but not least, Flink has unified stream and batch processing. Flink is a general-purpose tool: you can use it for both common as well as specialized use cases, independent of the type of data you're using. If you have low-latency requirements, you would use streaming execution, which means that results are generated while your data is being processed; with batch execution, you get one final result after completion.
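As a rough illustration of that SQL-to-job-graph flow, here is a minimal Flink Table API program in Java in the spirit of the on-screen example: it reads from an events topic, filters and projects, and writes to a results topic, and Flink turns the SQL into a job graph for you. The table definitions, topic names, and connector options are assumptions for this sketch (they use the open source Kafka SQL connector); in a managed service much of this boilerplate would be handled for you.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EventsToResults {
    public static void main(String[] args) {
        // Streaming mode; the same program could run in batch mode on bounded input.
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Source table backed by a Kafka topic (connector options are illustrative).
        tEnv.executeSql(
            "CREATE TABLE events (" +
            "  user_id STRING," +
            "  amount  DOUBLE" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'events'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json'" +
            ")");

        // Sink table backed by another Kafka topic.
        tEnv.executeSql(
            "CREATE TABLE results (" +
            "  user_id STRING," +
            "  amount  DOUBLE" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'results'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'" +
            ")");

        // Flink validates and optimizes this statement and turns it into a job graph for us.
        tEnv.executeSql("INSERT INTO results SELECT user_id, amount FROM events WHERE amount > 100");
    }
}
```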
So Flink is absolutely a great tool, but it does come with challenges. Operating Flink on your own is actually quite difficult. There's a layer of deployment complexity involved: setting up Flink requires knowledge of Flink internals and things like resource management and resource allocation. Looking at management and monitoring, Flink has a wide variety of metrics available, but picking the right metric to get started can be overwhelming, especially if you're just starting with stream processing. And because Flink is a framework, it lacks pre-built integrations with systems for observability, metadata management, data governance, and security tooling. And of course, operating Flink on your own means you incur costs, both from an infrastructure perspective and from an engineering or resourcing perspective. That is why we at Confluent have been working on the first truly cloud-native Flink service, where you only have to focus on your business logic and not on infrastructure — the first serverless, zero-knobs, usage-based SaaS solution out there. Yesterday you already saw a variety of Flink features in action in the day one keynote, and today I want to highlight three Flink features that I think you should know when you're starting out: windows, because windows are at the heart of stream processing, where you're grouping streams based on time- or key-based criteria; temporal joins, for efficiently joining data across multiple streams — if you're thinking about doing a regular join on a streaming source, you should probably consider a temporal join; and pattern recognition, where you're doing complex event processing for pattern matching in streams.

First, let's look at windows. Like I said, they are at the heart of stream processing: they are snapshots of data at either regular intervals or after a certain number of events. Take an example where you're streaming in factory data and you want to use it to power a dashboard showing the number of errors being reported per minute. For a need like that, you would probably use something like a tumbling window. Flink has a wide variety of windows available out of the box: you can use a tumbling window, but also a sliding or hopping window, a session window, even global windows, and of course you can combine all these operations with other features like deduplication, top-N, or joins.

In yesterday's demo we had an example of a regular join, but Flink also has the concept of temporal joins. What you're doing there is joining two streams based on a time attribute — in other words, you're joining based on when the data was created or updated. What we have on screen is an example where there's a mobile application user and a separate stream containing customer lifetime value — let's say a stream that serves as a feature store — and you're joining them at the moment the time-based join has to be executed. One of the biggest advantages of a temporal join is that with a regular join you have to take into account that the result might change in the future, because new data keeps coming in, whereas with a temporal join you're only interested in the value of the data at the moment of the join.

The ability to recognize patterns and react to events in a timely manner is something you do with complex event processing. You could have a series of small transactions followed by a large one, which might be an indication of fraud, or you could have a double-bottom situation, where a price hits a low, briefly recovers, and then dips again. What you can do, in either SQL, Java, or Python, is define a pattern for that — you express it with regular-expression-like syntax — you can define measures that calculate aggregations for that pattern, and you can also define conditions for those patterns, like an ordering or the actual values occurring in the data. Now, hopefully this has given you some insight into why Flink is such a powerful tool and how you could actually use it.
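As a hedged sketch of the factory-dashboard example, a tumbling window in Flink SQL (run here through the Table API, matching the earlier snippet) could look roughly like this. The `machine_events` table, its `event_type` column, and its event-time column `ts` with a watermark are assumed to already exist; only the windowing pattern itself is the point.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ErrorsPerMinute {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Assumes a `machine_events` table (e.g. a Kafka-backed table like the earlier sketch)
        // with an `event_type` column and an event-time column `ts` that has a watermark defined.
        // The windowing table-valued function TUMBLE groups events into one-minute buckets;
        // results are printed to stdout as each window closes.
        tEnv.executeSql(
            "SELECT window_start, window_end, COUNT(*) AS error_count " +
            "FROM TABLE(TUMBLE(TABLE machine_events, DESCRIPTOR(ts), INTERVAL '1' MINUTE)) " +
            "WHERE event_type = 'ERROR' " +
            "GROUP BY window_start, window_end"
        ).print();
    }
}
```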
Let's have a brief sneak peek at what we think Flink will evolve into in the future. Technology is advancing, and Flink is too. The community is currently planning and working on Flink 2.0, which would be the first major version of Flink since 2016. There are three elements from that discussion that I want to highlight. First off, the mixed execution mode. Like I said, Flink has the stream and batch unified runtime, but one of the powerful things the community is planning is to actually use both modes in one single job. For example, you could start a backfill by reading from Parquet files on S3 and then, in the same job, seamlessly switch to more efficient streaming from Kafka once you're caught up on that backfill. Currently that requires a stop and switch; in the future Flink would just automatically choose the right execution mode. Second, there are evolutions planned for the SQL engine: to support things like streaming warehouses, it needs support for update statements, stored procedures, delete statements, and so much more. That's all foundational for supporting and improving on lakehouse technologies like Apache Iceberg, Apache Paimon, Apache Hudi, and many more — and with the streaming warehouse we combine the benefits of a traditional warehouse with being always up to date and containing the latest insights. The last item I want to highlight is the removal of deprecated APIs. There's a variety of APIs in Flink that originated from the batch world which will be fully removed, so that only the unified API layer remains, along with overhauling the configuration layer and looking at the source and sink interfaces. More importantly, we're looking at new abstractions for direct access to building blocks like state and timers — we want to make sure no internals are exposed there. So, all in all, the community is super excited to work on this, and I think it's going to be super interesting to see where Flink evolves further in the future.

One of the benefits of stream processing being so widely used out there is that it's great to listen and hear how others are using it, and to be inspired by how they see stream processing evolving. To hear more about that, I would like to welcome Anna McDonald and Vitesh on stage. What up, everybody — how's it going? I'm Anna McDonald, Technical Voice of the Customer at Confluent, and I'd like to welcome, from JPMC, Vitesh. Tell us a little bit about yourself. Thanks, Anna, I'm glad to be here. My name is Vitesh; I run the architecture and engineering for Chase's data platforms. These platforms run the core businesses in Chase, including deposits, credit cards, auto loans, home loans, and Chase Wealth Management. I'm also responsible for the engineering of core platforms such as data processing and stream processing, and I happen to also run the architecture and technology strategy for AI/ML at Chase.

Long time! And what makes you excited about Flink? Yeah, so for a long time we've been building applications that are message-driven. There was an era of message-driven applications that started with message queuing and other capabilities; we then evolved to building pub/sub-style applications, and these applications really gave us a lot of flexibility in terms of decoupling multiple systems. With Kafka, things turned up another notch: we started seeing events that are not just business events — technical events, observability, all sorts of events started flowing through the enterprise. And with Kafka you've got the producer and consumer model, but what ended up happening is that we as
developers ended up building a lot of infrastructure code, because we had to understand which brokers we connect to, what we do in case of failures, what we do in terms of switching brokers, and things of that sort. So there was a lot of infrastructure code we had to write, and a lot of companies ended up writing frameworks to deal with that — did we process every single message, and those sorts of concerns. Flink really takes this to another level, because you get the abstractions — the SQL API, the DataStream API, and others — which focus the developer on writing business logic. You can really solve the problem you need to solve, versus focusing on how the infrastructure is laid out, and that abstraction became a lot more achievable. This is very important in the kind of world we live in — the cloud world, where containers can literally die or spawn at any time — so that level of resiliency and decoupling is really important, and I think Flink has given us a way to manage that in a much more approachable way.

Yeah — or, as we like to say, use cases. When you look at the stream processing cornucopia — it's fall, I'm just saying — what are the top three use cases you see Flink driving in the industry? Yeah, there's a spectrum of use cases we should talk about. We just talked about all these streams moving into the organization. One is simple ingestion, where you take the data flowing in as data streams and business streams and essentially capture it into your lakes and warehouses, so that it's now available for analytics. You want to understand how things are moving — for example, how many accounts are you opening, did the promotion or the ad you just launched create the uptick — so that's simple ingestion, and that in itself is a lot of value, because you can now take the data that flows in and understand it. Next, move to another scenario where we're talking about enrichment. Now you have a basic message emitted by a system — let's say that system is an accounting platform for creating new accounts or managing account changes. In that scenario, that system knows the language of accounts, but you want the language of the customer to be the core of your enterprise — you want to be customer-centric — so you want to connect that and enrich it with customer information; you want to tag that message with customer-related information so it can be used in other areas. That's enrichment. Then you talk about going for next-best-action or event processing: you just created a new account — what do you want to do next? How do you want the customer to engage with that particular account? You want to start early engagement, and maybe you want differentiated behavior depending on the level of the account or the nature of the account. That's basic stream processing. Things then become very interesting when you try to combine multiple streams: you've got business events, like the creation of a new account or a change to an account, and you have interaction events, which are things like what a customer did in a call center, or what the customer did in their mobile app or on a website. And that's really
interesting, because now you can say: such-and-such customer started an application in one channel but never completed it, so when he or she goes to the next channel — which is maybe a call center — you may want to whisper into the call center and say this customer may want to look at that, as an example. So those are the areas. One particular area that I find fascinating about stream processing, and particularly with Flink, is what I consider future-proofing. What that means is that we in our enterprises still have a large range of batch applications; these have traditionally grown over the years, for different reasons, and in different shapes and sizes. You could take those applications and process those batches as streams — you can always do batch processing using streams — and then lay the foundation for how those applications can be evolved or transformed in the future. That is a very interesting way of doing it, and with abstractions like the DataStream API, which combines stream processing and batch processing, it becomes a lot simpler.

And just overall — you talked about a lot of use cases — what is the end outcome? What outcome are you looking for? I mean, at the end of the day, you want to build experiences that delight the customer, and you want the customer to feel that, whether they're going from one channel to another or one product to another, you look at the customer as a whole — you want continuity, a continuum of those experiences. Those are the values that you drive. But to do that, you really want to focus on the changes in the business logic: how do you detect these business events, how do you connect them with technical events and with interaction events on different channels, and so forth. That's really where the confluence of things happens. You need to focus on business logic, you need to focus on things that are valuable to a customer, and that's really where you can build experiences that matter.

Yeah. I don't think you're officially allowed to call anything a keynote unless you mention gen AI. So, being responsible for AI at Chase, can you share your thoughts on where you see the impact within financial services? So, I've been fortunate to work with language models for a while. In my previous lives I've applied language models to understanding what's in a document — at the end of the day, there are still areas where we use documents a lot, or different types of content. We have applied them to document categorization, information extraction, as well as translation — you can do real-time translations of documents and other things. Those are the areas that we have already, or at least I have already, been part of. With LLMs, things become more interesting, because now you can apply these in a much more scalable fashion. When we had to build these applications in the past, we had to spend a lot of time on training; now you can actually benefit from these large models and apply them with approaches like RAG to really enhance the experiences. So where I see gen AI and LLMs heading in the future is really taking experiences to another level: you think about it — you're a customer of a set of products with a company, and you want