AWS Summit SF 2022 - Modernize your applications with purpose-built AWS databases (DAT202)
("Surf The Orange Water") Hello and welcome to this session on modernizing your applications with AWS purpose-built databases. My name is Siva Raghupathy. I lead the worldwide database specialist (indistinct) team. I'll be co-presenting this session with Vlad Vlasceanu who'll come and join me during the course of the presentation.
So let's get started. Are you able to hear me? Okay. So let's get started. What does the AWS Solutions Architecture Team comprise? We're about 200-plus solutions architects around the world. We're database experts, and we work with customers on modernizing their applications on AWS.
So I'm gonna share some of the best practices that we learned working with customers around the world. Before I get started, a little bit of a background about myself. I've been with AWS for the past 13 years. When I started with AWS in 2009, for the first three years, I helped build a couple of services, Amazon DynamoDB, which is a NoSQL database service, and then later on Amazon RDS, which is a relational database service. I wrote the first version of the API specification for DynamoDB and also was a development manager for RDS. So after that, I moved on to the Solutions Architecture Team, and I started building the Solutions Architecture Team and also working with customers around the world and helping them migrate, including amazon.com.
So I'm gonna share some of my experience with both building AWS services and helping customers on their modernization journey. In terms of the agenda, we're going to go through some of the key attributes of a modern database application. Then we'll dive into the database architecture that supports modern database applications. And then I'll also dive into Amazon's modernization journey.
So for the past 13 years, I've lived the Amazon modernization journey as it happened. I'm gonna share some of the best practices that happened as Amazon, the retail company, modernized their infrastructure, leveraging AWS. Then I'm gonna discuss the never-ending mission of a pursuit for an ideal database and how it turned out to be purpose-built databases.
Customers often ask us, "Hey, is there a playbook for migration?" So I'm gonna walk through the three playbooks or blueprints that are kind of milestones along the way in your modernization journey. And then I'm gonna hand this off to Vlad, who is gonna dive deep into a classic modernization journey, diving deeper into purpose-built databases and how (indistinct) can leverage them as you modernize your applications. So we have a packed agenda, and we're gonna use the complete 55 or 60 minutes. After the presentation, I'll step out and wait for your questions. If you have any questions, please come talk to us.
We'll also share our emails, so if you have any questions you can email us as well. So let's get started. What are the key attributes of a modern database application? A modern database application should be designed for innovation and agility, it should perform without any limits on performance and scalability, and it should be highly available, easily managed, and cost effective. Now let's dive deeper into each one of these areas. What does building an application for innovation and agility mean? It means the architecture should allow you to innovate really fast. In a recent letter to shareholders, our CEO and president, Andy Jassy, compared iterative innovation to compound interest.
The iterative innovation really creates immense value for customers. And over time, it really snowballs and really exponentially increases your velocity of innovation, if you will. So in other words, your application stack should be able to support iterative innovation.
Performance and scalability are quite important. What does this really mean? If you're building a stock trading application or an ad tech system that's serving ads, these applications are very latency sensitive. In other words, any loss in latency will result in loss of revenue, and, therefore, your application needs to meet the performance characteristics of the system that you're building. And then there's scalability: how scalable should your application be? As an architect, when you're building an application that needs to meet, let's say, a certain performance criterion of x, I would advise you to think about 5x to 10x scalability, which is the typical headroom to design for.
If your application has to scale to 50x or 100x, you may have to completely redesign that application, but you can worry about that as the application evolves later. During the pandemic, a few customers had to scale up dramatically. I worked with some customers that had to scale 6x in a matter of weeks; this was the scale-up they had planned for the next three years, and it had to happen in a few weeks to support the workloads they received. On the other end of the spectrum, the airline industry and the hotel industry saw a massive downturn.
They had to scale down as well. So in addition to scaling up, your application has to scale down as well to support the needs of your business. And high availability is critical, which means your application needs to meet the RTO and RPO, recovery time objective and recovery point objective.
In addition to that, a lot of financial applications have to have cross-region DR, disaster recovery, scenarios. And our CTO, Werner Vogels, always reminds us that in large-scale distributed systems, things fail all the time. Everything fails all the time. So when you build your application, you should be thinking about blast radius. In other words, your application should be able to gracefully handle failures, if you will. In some cases, your choice of database will allow you to deal with that automatically.
For example, if you're using DynamoDB, the partition size in DynamoDB is 10 gigabytes, and we keep three copies of each partition across three AZs. Even if two go down, if you're doing writes, we may ask you to retry while the rest of the application just keeps working. So thinking about availability and blast radius is pretty important.
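To make the retry point concrete, here is a minimal sketch of letting the AWS SDK retry transient DynamoDB errors for you; the table name, item attributes, and retry settings are hypothetical, not something from the talk.

```python
# Minimal sketch: configure client-side retries so transient DynamoDB errors
# (for example, a briefly unavailable replica) are retried automatically.
# Table name, item attributes, and retry settings are hypothetical.
import boto3
from botocore.config import Config

dynamodb = boto3.client(
    "dynamodb",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

dynamodb.put_item(
    TableName="orders",                       # hypothetical table
    Item={
        "order_id": {"S": "o-12345"},
        "status": {"S": "PLACED"},
    },
)
```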
And obviously easily managed: automate everything, you should be able to automate everything and avoid manual processes. Last but not least, when building applications you need to be cost conscious. Often, halfway through a design review with customers, I pause, compute the cost, and ask the customer, "Hey, this is the cost profile of my design. Does it actually meet your needs?" We often come out with one of two answers. "Yes, two thumbs up, let's move forward." Or, in some cases, we may find out that the cost is really high. Having been a builder at AWS, I know that when we build these services, we build them to do their intended function really well.
When the cost is off, chances are you're not using the right service. So this has always been a guiding principle for me as I do design reviews: be cost conscious and evaluate my design decisions against what the monthly bill is going to be. Now, what type of modern data architecture will support these seven criteria that we laid out? On the left side, you have a traditional, I would call it monolithic, architecture, if you will, where you have web servers, application servers, and database servers, and the application tier is one big monolithic application talking to a database. In most cases, this is a relational database.
Even though these various layers allow you to scale independently, really scaling this application is difficult. More importantly, innovating fast on this application is gonna be very difficult because there's tight coupling between the various application tiers. And in addition to the application being tightly coupled, if you need to make changes in your application tier, you need to talk to other teams.
And most of these applications use a single database, if you will. If you're doing that, then when you make changes to the schema, or when your operating characteristics change, you're gonna inadvertently affect your neighboring team or neighboring functionality, if you will. So what is the answer? The answer is breaking the big monolithic application into microservices and using a decoupled architecture. On the right side, even though we still have three layers, if you look closely, the presentation layer comprises multiple pieces: you may want to expose a mobile experience to mobile applications, separate from the presentation layer that serves a desktop application really well. These days, for a lot of applications, the users of these applications are (indistinct) applications.
So what you're really presenting is an application API, an API interface. And building that API interface is very different from building a classic, very interactive mobile application. Thinking of these as personas that your application serves, and building them as separate microservices, allows you to present the right interface to each downstream application. As we move down into the business logic layer, instead of having one big monolithic application, you should think of that layer as multiple small microservices that do their intended function really well.
That allows you to implement those using different constructs. In this case, you could implement a microservice using AWS Lambda, which is our serverless compute service, or you could containerize that application and use our container services, Amazon ECS or Amazon EKS, the Elastic Kubernetes Service, or you could use Fargate, which is a serverless way to host microservices applications. As you separate these applications into small microservices, the microservices need to talk to each other as well, in other words, by sending events. And what happens there is you need event queues and message queues. For example, if you're using Redis Streams as a broker, one application can send a stream message and downstream applications can consume it.
Other classic scenarios use, for example, Apache Kafka or Amazon Kinesis as the message broker. If you're doing one-to-one messaging, you could use SQS and other queuing services as well. So the notion of queues comes into play. There's also another construct that's pretty important when you're building these kinds of applications, which is having a mesh.
For example, AWS has a service mesh, App Mesh, that gives you a mesh interface so your messages go through a proxy layer, if you will. Therefore, if you want to find out what messages are flowing between microservices, this is automatically instrumented for you. Even though this looks a little bit complex, you have various other services that you can leverage to build this.
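As a concrete illustration of the queuing piece, here is a minimal sketch of one-to-one messaging with SQS using boto3. The queue URL, event shape, and handling logic are hypothetical.

```python
# Minimal sketch: one microservice publishes an event to an SQS queue and a
# downstream service consumes it. Queue URL and message contents are hypothetical.
import json
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-west-2.amazonaws.com/123456789012/order-events"  # hypothetical

# Producer: the order service emits an "order placed" event.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"event": "ORDER_PLACED", "order_id": "o-12345"}),
)

# Consumer: a downstream service long-polls for events and processes them.
response = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
)
for message in response.get("Messages", []):
    event = json.loads(message["Body"])
    # ... handle the event, then remove it from the queue ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```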
And more importantly, each microservice can use its own database. For example, if your microservice is a shopping cart, then rather than using a relational database and sharing it with a downstream order management system, you could use a non-relational database that scales linearly with your workload. During Prime Day, for example, Amazon uses DynamoDB for that, and we receive about 80-million-plus requests per second going into DynamoDB. It would be very difficult to achieve that scale with a relational database. So these decoupled, microservice-based architectures enable you to handle any scale, with cost that follows your workload, if you will.
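To illustrate the shopping-cart example, here is a minimal sketch, with hypothetical table and attribute names, of how a cart microservice might model its data in DynamoDB so one cart's contents come back with a single query.

```python
# Minimal sketch: cart_id is the partition key and item_id the sort key, so
# listing a cart is a single Query. Table and attribute names are hypothetical.
import boto3
from boto3.dynamodb.conditions import Key

cart_table = boto3.resource("dynamodb").Table("shopping-cart")

def add_to_cart(cart_id: str, item_id: str, quantity: int) -> None:
    cart_table.put_item(Item={"cart_id": cart_id, "item_id": item_id, "quantity": quantity})

def list_cart(cart_id: str) -> list:
    response = cart_table.query(KeyConditionExpression=Key("cart_id").eq(cart_id))
    return response["Items"]
```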
So with that as a backdrop, I'm gonna walk you through a little bit of a modernization journey. As I mentioned before, having been here for 13 years, I was part of this journey when I transitioned from the service teams and became a principal solutions architect; Amazon was my customer. We had this program called Move to AWS (indistinct), as we used to call it. I joined this journey in 2010 or so, but the modernization journey at Amazon really started around the year 2000. Apparently, in the year 2000, the entire website was one big monolithic application talking to a relational database backend, and various teams shared the different pieces of that big monolithic application.
There was also a centralized DBA team maintaining these databases. Let's say the order processing subsystem had to change the schema; they had to talk to the central database admin team, submit that change, and wait for approval. And each of these various subteams was intricately connected with other teams.
So it was very difficult to innovate really fast and get new products and innovations, such as Prime, to market. Imagine, this was 1999, the dot-com boom happened. And immediately in 2000, the dot-com bust happened.
And then apparently Amazon stock went down about 80%. Even though our fundamentals were very strong, we had to really tighten up our cost structure and innovate really fast. So with that happening, we had an idea: "What happens if we break this monolithic application into smaller pieces and break this big application-tier team into smaller subteams?" Perhaps the order processing team could be its own subteam, the shopping cart team could be its own subteam, and so on. So, basically, we broke this thing into smaller subteams. And then we evolved this concept called two-pizza teams.
Now, you may be wondering about these two-pizza teams. What it really means is: what is the size of your team? The fundamental idea was that the team should be small enough that two pizzas can feed it. Well, is it 12 people, 24 people? It really depends upon the size of the pizzas that you order, but you get the idea: it should be a fairly small team. Now, mind you, this was way before all these microservices came into play. What we were trying to do at that point was find ways of innovating really fast.
That was really the business driver, if you will. Now, we found that these two-pizza teams were really fast and agile. They moved really fast because they had complete ownership of the stack.
So instead of talking to a central DBA team, if they used databases, they had their own DBAs, if you will. In some cases, they didn't need a DBA because they used a fully managed non-relational database service. If they were presenting a user experience, the team included a UX designer, so the whole team could build it.
And then sort of the DevOps culture was born from there. So we had this notion that you build it and you run it. When I was in the early RDS team, we were a team of about 24 people. I was in charge of the backend services. How this really played out in reality was, like, every day at 10 o'clock, we had a standup. We all used to go there, we used a kanban board with the sticky notes all over, and then we just said, "Look, these are the things that we could do, and these are the blockers, et cetera."
So it became a very nimble culture of innovation, if you will. And, obviously, it drove a lot of innovation. But doing that, you still had to collaborate with other teams, which means we had to put some governance in place. The first governing structure was having clean API interfaces. For example, the shopping cart system had a clean API that said, "Add stuff into the shopping cart or list the contents of the shopping cart." As long as those interfaces were well agreed upon, other teams didn't have to care what the shopping cart service used under the covers, or what database it used, as long as the service maintained its API interface.
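Purely as an illustration of what such a contract can look like (the interface and method names here are hypothetical, not Amazon's actual service), the idea is that consumers depend only on the interface, never on the database behind it:

```python
# Hypothetical shopping-cart contract: as long as this interface stays stable,
# the owning team can swap the database underneath without affecting consumers.
from abc import ABC, abstractmethod

class ShoppingCartService(ABC):
    @abstractmethod
    def add_item(self, cart_id: str, item_id: str, quantity: int) -> None:
        """Add an item to the customer's cart."""

    @abstractmethod
    def list_items(self, cart_id: str) -> list:
        """Return the current contents of the cart."""
```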
One of the things that we learned was if you build an API, you're gonna live with this for a long time. So please think deeply. This is where the one-way door, two-way door decision comes in. Some decisions you can change over time. Other decisions you're stuck with for the rest of your life. When we take one-way door decisions, we kind of halt, we call a bunch of engineers, really review the design, we call our downstream users and ask them a lot of questions, and we make sure that we actually do something that we can actually stand behind for a long time.
So establishing clean interfaces was really important. If you really think about it, this entire transformation was both a people and business transformation as well as a technology transformation. At the very top, what drove this transformation was our business need to move really fast.
Once we set that as the driving criterion, we had to put some governance into place, because we're dealing with a lot of customer confidential data. When you log onto amazon.com and give your credentials, if we lose that credential or somebody hacks it and deciphers your password, we immediately lose trust with customers. That is not something we can afford. So security teams put frameworks in place. There's clean data classification for the types of data your services have to deal with, and if you're dealing with personally identifiable data, there are certain criteria you have to go through.
In fact, when the retail team uses some of our AWS services, I often used to sit with their security teams, and the security team would ask me all the questions: "Does this database encrypt data at rest? How is data transferred within the database application? Is there any transfer happening over a non-encrypted channel?" Those are the governance rules that are put into place. One of the topmost governance mechanisms is what we call our leadership principles; you can look up the Amazon leadership principles. When all these independent teams are evolving fast, each really operating as an independent company or independent business in its own right, you need some strong governing rules. Once you put those into place, then you really liberate the teams.
The teams can make decoupled technology decisions; they can pick whatever database really caters to their application need. And then they can also share those data contracts and interfaces, if you will. One thing feeds into another, and it really created a virtuous cycle; the snowballing effect happened. So even though we're talking about modernizing applications, it's a business transformation, a people transformation, and a technology transformation. And your company values and the governance pieces really play into that journey as well.
Now, this is a favorite topic of mine. I've been in the database industry for a long time. Before joining Amazon, I spent 16 years at Microsoft, helping build SQL Server and working with customers and partners there on adopting SQL Server. I've personally been on the lookout for the ideal database. What is the ideal database in the world? Let's say you've subscribed to the idea of building microservices; the next question customers ask when I do a design review is, "What database should I use for my microservice?" And they always look for the ideal database.
In short, the ideal database should do everything well. It should support all data storage types and access patterns, and it should implement ACID transactions. ACID stands for atomicity, consistency, isolation, and durability. It should be strongly consistent, and it should scale without any limits. It should be highly performant and continuously available, yet it should be simple to use and cost effective. Has any one of you found this ideal database? Please raise your hand.
One person has found the ideal database, and the person happens to be on the right end. I would like to speak with you at the end because my experience has been this doesn't really exist in real life. Well, that might be a sad message. If someone says that this exists, I would like to really question them and have a conversation with them. But instead, what happened over the course of last 30 years is that we have an explosion of database solutions.
Starting with Oracle and SQL Server, on to MongoDB, Cassandra, HBase. The open source ecosystem has evolved tremendously, starting from MySQL, Postgres, MariaDB, and most recently NewSQL databases such as TiDB and CockroachDB. And Redis still happens to be the most loved database per the various developer surveys, Stack Overflow, et cetera, right? Obviously, we have Memcached, and Neo4j for graph. On the right side, you have all the Amazon innovations, starting with DynamoDB, RDS, Aurora, (indistinct) to DocumentDB with MongoDB compatibility, Keyspaces with Cassandra compatibility, and so on.
If you're a customer trying to decide which database to choose, it really sounds like a daunting thing. Which one do I pick here, right? Actually, if you look deeply into this, the ideal database has evolved into seven or eight different classes of databases. Obviously, starting with the relational databases. They've been around for 30-plus years. You model in terms of rows and columns, there are the various normal forms, first, second, third normal form. Everybody knows these things very well.
They support ACID transactions. I think of a relational database as a big Swiss Army knife: it has all kinds of tools.
But these things don't scale to the level that many of these platforms need these days. For example, I talked about DynamoDB supporting 80 million requests per second or more during Amazon Prime Day or during Black Friday or Cyber Monday. You can't get that scale in a relational database. Key-value databases are very good at that; DynamoDB is a great example. And document databases: developers love the JSON data type.
If you model something in JSON, a document data store is a great place to put it. Wide column stores, such as Cassandra or Amazon Keyspaces, even HBase, are pretty good at dealing with datasets that have a large row size, if you will. And graph databases are phenomenal for Customer 360 kinds of use cases, right? And most recently, there are emerging time series databases. If you're dealing with time series data, what happened in the last five minutes or five hours is a lot more important than what happened five or ten days ago, because (indistinct) probably look at that. If you stick all of that into a relational database, the database is going to have a lot of stale data, and that's going to slow down your entire database as you put more and more data into it. Using a time series database automatically deals with that problem.
Vlad will get into the details of time series databases as he goes through his portion of the talk. And most recently we have ledger databases; Bitcoin and all the other blockchain technology has been really popular, and people want a cryptographically verifiable log of things that happened. So ledger databases are also starting to get more prevalent these days.
So what this boils down to is that the pursuit of the ideal database turns out to be: use the right tool for the job, or use the right purpose-built database for your need. Now, with all that said and done about modernization using purpose-built databases, customers ask me, "Hey, so all this is good. Where do I really get started?" Well, as solutions architects, hundreds of us working with thousands of customers around the world, we've simplified this into three milestones or categories that you want to think about, depending upon where you are on this journey of database modernization. And the first thing we would like you to think about is moving to a managed database.
We call these playbooks. If you have a self-managed database, moving to a managed database allows you to delegate all the muck of managing databases, backing them up, et cetera, to us, so you can spend your precious time on innovation. And if you're using any commercial databases, a good chunk of the TCO comes from licensing costs. If that's the case, breaking free to an open source database on Amazon Aurora or RDS will save you a lot of money. And last but not least, modernizing your applications, breaking your monolithic applications into microservices and using the right purpose-built databases, is on the other end of the spectrum, if you will.
So why is this important? Both modernization and migration are important. Maybe this font is too small, so I'll read the x-axis here. On the left, you have on-premises systems, then a lift and shift to the cloud, then moving to a managed database. The next step is moving to a managed database and breaking free from legacy engines to open source databases.
And finally, modernizing with purpose-built databases. If you simply lift and shift your workload to AWS, you may not realize the whole power of the cloud, if you will. Obviously, you have the ability to quickly realize the elasticity of the cloud. But if you use a managed database, you can, as I mentioned before, delegate all the muck of managing the database to us. And if you look at the innovation velocity, it increases exponentially towards the (indistinct) as you break your monolithic applications into microservices and use the right purpose-built databases; that gives you the most innovation velocity. And your total cost of ownership also decreases as you move from left to right.
With that, I'm gonna pass it on to Vlad and have him walk through the modernization journey and get deeper into the services. Thank you. Hi, everyone.
Thank you, Siva, so much. My name is Vlad Vlasceanu. I'm a principal database solutions architect here at AWS.
I've been with AWS for about eight years, helping customers migrate workloads to AWS and deploy business-critical database applications on AWS using our managed services. So let's dive deeper into this modernization journey. You'll see that it actually looks pretty familiar. What happened here? A typical database modernization journey starts here.
And it normally starts with a relational database. Not always, but most of the time. The idea here is that you mostly have monolithic applications supporting business-critical use cases, where all of the data access patterns are crammed into this relational model.
Multiple components, business functions, and so on. So that's one part of the story. The other part of the story is agile startups and teams that are bringing products to market fast. They might be rapidly prototyping on their laptops, on a common code base; they might be using frameworks like ORMs and other tools like that that enable them to move quickly and bring a product to market with as little code as possible. And when they use these tools, they typically also land on a relational database, because those are the most prevalent.
These ORM frameworks support that very well. But there comes a time when they actually need to modernize. And the first step on that modernization journey is move to managed. I know Siva talked a little bit about this.
The goal here is to take the databases that you're operating on-premises or on your own self-managed infrastructure and move them to a managed database service, such as Amazon RDS or Amazon Aurora. The reason to do this is to reduce your operational burden. You no longer have to worry about the day-to-day operational activities: backups, patching, configuration, provisioning, all those things. So you gain agility, ease of management, and, most of the time, cost effectiveness. And in some cases, you actually gain scalability and performance, especially if you're coming from an environment where you're constrained by costs or budgets, or constrained by older hardware.
And to help customers with this first step, we offer Amazon RDS, the Relational Database Service. RDS provides a consistent management experience and automation layer around relational databases. And we offer the widest variety of database engines as part of this service. You've got two commercial database options, Oracle and SQL Server.
You got three open source database options. You can pick between Postgres, MySQL, and MariaDB. And we also offer our own managed cloud-native database engine, Amazon Aurora. I'll cover this in the next few slides.
But the key here is that no matter which engine you pick, you get a consistency of experience and features. Provisioning, configuring databases, setting up and doing day-to-day backup operations, monitoring the health of your databases. All of these are using the same APIs and the same construct, regardless of which engine you pick. Same thing with security features and high availability capabilities.
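As a rough illustration of that consistent experience, here is a minimal boto3 sketch of provisioning a Multi-AZ PostgreSQL instance; the identifiers and sizes are hypothetical, and switching engines is mostly a matter of changing the Engine value.

```python
# Minimal sketch: provision a Multi-AZ PostgreSQL instance on RDS. Names and
# sizes are hypothetical; other engines ("mysql", "mariadb", "oracle-ee",
# "sqlserver-se", ...) use the same API and parameters.
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="orders-db",       # hypothetical identifier
    Engine="postgres",
    DBInstanceClass="db.r6g.large",
    AllocatedStorage=100,                   # GiB
    MultiAZ=True,                           # automatic standby in a second AZ
    StorageEncrypted=True,
    MasterUsername="dbadmin",
    MasterUserPassword="change-me",         # use a secrets store in practice
)
```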
All of the actual work of setting that up is abstracted away into a couple of very simple choices. They're either parameters in API calls or they're consistent API calls. So that makes managing databases, relational databases, really, really easy. And your choice as far as which one to pick really depends more on your business needs.
What do you want to do? Do you wanna stay with commercial database engines? Do you wanna move to open source ones? Do you wanna move to Aurora? And all of those are business decisions. They're not necessarily done based on technical limitations anymore. Now, I mentioned Amazon Aurora before, and the reason we have Amazon Aurora is because customers asked us for a relational database service with enterprise-grade features, but delivered with the ease of use and cost effectiveness of open source databases. So that's why we built Aurora.
Aurora is a cloud-native database that is built upon an innovative purpose-built log-structured storage layer. And that storage layer provides a couple of key capabilities. One, it's automatically scalable. You don't have to worry about provisioning storage.
You just store data on it, and it scales automatically up and out and in based on that. Two, it's highly durable. So we're storing six copies of data across three availability zones. And we've designed it so it can sustain the loss of an entire availability zone plus an additional copy of data and still be operational. And, three, the storage is distributed.
You no longer have to worry about throughput to your storage system. Now, on top of this innovative storage layer, when you provision Aurora, you provision a cluster. That cluster contains a shared volume that is enabled by this innovative storage layer. And you get to pick whether you use a MySQL-compatible or PostgreSQL-compatible, drop-in database engine on top of that storage layer. And if you choose PostgreSQL, you can have additional compatibility with SQL Server using Babelfish for Aurora PostgreSQL. That allows Aurora PostgreSQL to effectively interpret T-SQL commands, just like SQL Server.
Now, with these capabilities, we also offer the ability to scale horizontally for reads. You can have up to 15 read replicas in a cluster, and they have very low replication latency because they're enabled by this shared storage layer. And you can also have globally distributed workloads.
You can create secondary readable clusters in up to five secondary regions, and data is replicated behind the scenes across all of them. Now, Aurora is designed for scalability, but over time we have made many improvements to boost this scalability further. If you look at a traditional database, storage and compute are very tightly coupled, even if you deploy it on RDS.
So with Aurora, the first step was to decouple the storage. That's why we have this innovative storage layer where you don't have to worry about I/O throughput, because the storage layer is distributed across many different nodes. You just start using it; you don't provision storage. It automatically distributes and scales. The next step in that evolution was Aurora Serverless, where we made the compute scale up and down based on demand. So if your workload sends more queries to your database, compute can scale up.
Less, it scales down. And you really only pay for the compute capacity that you use. So this is Aurora Serverless. And in preview today, we have Serverless v2 where the scaling of the compute is nearly instantaneous. It takes milliseconds. And with these capabilities, Aurora has become the relational database of choice for critical business workloads.
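A minimal sketch of what provisioning such a cluster can look like with boto3, assuming hypothetical names and capacity bounds; engine versions and the regional availability of Serverless v2 should be checked before relying on this.

```python
# Minimal sketch: an Aurora PostgreSQL cluster with Serverless v2 compute.
# Identifiers, credentials, and capacity bounds are hypothetical.
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="orders-aurora",
    Engine="aurora-postgresql",
    MasterUsername="dbadmin",
    MasterUserPassword="change-me",         # use a secrets store in practice
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 16},
)

# Instances in the cluster use the special "db.serverless" class, so their
# capacity scales up and down with load within the bounds above.
rds.create_db_instance(
    DBInstanceIdentifier="orders-aurora-writer",
    DBClusterIdentifier="orders-aurora",
    Engine="aurora-postgresql",
    DBInstanceClass="db.serverless",
)
```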
And one customer example is Reddit. They had a significant operational burden managing their own databases, and all sorts of other secondary issues coming from that. So they moved to Aurora PostgreSQL. They did that to optimize their operations and improve the productivity of their engineers along the way, so those people can focus on bigger and better things. Now, once customers reduce the overhead and the operational burden, we get to step two of the modernization journey.
This allows them to improve the performance and scalability of applications using in-memory workloads. Typical use cases here are caching, or offloading frequently changing data, counters, rankings, and so on, to data stores that are really designed for that: in-memory stores. Part of this process can actually be very, very easy, because a lot of customers use frameworks, ORMs, and so on.
And those actually have native support for these capabilities. It typically requires minimal refactoring, and it's an easy way to gain more performance out of applications. Now, in this solution space, we offer Amazon ElastiCache, which is the RDS equivalent for in-memory database engines; we provide it for Memcached and Redis. And we also offer Amazon MemoryDB, which is another database in this space that offers full durability. The choice customers make here is really based on how much durability you need for these datasets.
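To make the caching/offloading step concrete, here is a minimal sketch of the cache-aside pattern against a Redis-compatible endpoint such as ElastiCache; the endpoint, key format, TTL, and the stand-in database lookup are all hypothetical.

```python
# Minimal cache-aside sketch: read from the cache first, fall back to the
# database on a miss, then populate the cache with a TTL. Endpoint, key
# format, and the stand-in database lookup are hypothetical.
import json
import redis

cache = redis.Redis(host="my-cache.example.amazonaws.com", port=6379)

def load_profile_from_db(user_id: str) -> dict:
    # Stand-in for the relational query this cache is offloading.
    return {"user_id": user_id, "name": "example"}

def get_user_profile(user_id: str) -> dict:
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: in-memory path
    profile = load_profile_from_db(user_id)       # cache miss: go to the database
    cache.set(key, json.dumps(profile), ex=300)   # keep it warm for 5 minutes
    return profile
```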
With Memcached, it's an in-memory solution, it doesn't have any persistence, it's a relatively simple database, so it's highly suitable for caching key-value type lookups. If you need some level of persistence of your data, then you're sort of looking at Redis and/or MemoryDB as an option, depending on the level of persistence you need. And of course, Redis also provides a lot more query features in the database compared to Memcached. PayPay Corporation, which is one of Japan's leading mobile payment applications, they chose ElastiCache Redis to implement their QR code payment service in just three months. They picked ElastiCache Redis for the features that it offers, but also for the fact that it's compatible with what gets consistently ranked as one of the most popular databases in the world, which is Redis. But the choice they made was to enable workflows and applications and microservices that are in the critical path of all of their API calls to be serviced with submillisecond latencies.
And that is really the core of the reason why they picked ElastiCache. The idea there is that systems such as authentication, which get invoked with every API call from their front ends at any given point in time, needed that level of performance and scalability. Now, I mentioned MemoryDB before.
MemoryDB is a newer option in our offering, which I'd like to spend a few minutes on. It's a Redis-compatible database with full durability. It's actually the database in AWS that gives you the lowest query latency of any of our offerings. And MemoryDB does that by essentially separating the Redis-compatible compute execution engine from a transactional log that is multi-AZ durable. So when you write something to MemoryDB, you write it to the Redis execution engine node, which then waits for the data to be persisted in that transaction log, and only then replies back to you saying, "Yup, we're good."
And then from a scalability perspective, all of the replicas in that cluster will read from that transaction log and allow you to scale your reads horizontally for in-memory workloads. And you get the benefits of submillisecond reads, in-memory reads, scaled out for your workloads. Now, by this point in the journey, we've sort of done all the things that are easy. We've moved to managed, we have offloaded some of those in-memory workloads. So the next thing is to actually tackle that monolith, peeling away workloads and functionality out of it and implementing it as microservices (indistinct).
Now, the microservices then dictate what database solution you pick based on their access pattern, but also based on some very specific purposes that customers have. Those tend to be agility, scalability, and performance. Otherwise, you're not breaking up the monolith for the fun of it. You're doing it because you need more scalability, more performance, or you need to innovate faster.
And the choices that customers make for databases in this space are typically non-relational databases. They're geared towards more generic use cases: key-value lookups, document stores, wide column, and so on. So let's look a little more in detail at these solutions. The first one, and probably the most popular one, is DynamoDB. DynamoDB will give you consistent performance at any scale.
It's a serverless offering. You don't need to configure servers, you don't need to set up, configure database tuning parameters, you don't have to worry about versions, any of that. You simply go and tell AWS to create a table in DynamoDB, and it can be as simple as that. But no matter what scale you're operating on, no matter how big your dataset goes, it will scale and it will give you consistent performance. And this is really the key why customers pick DynamoDB.
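As a small sketch of how simple that can be (table and key names hypothetical), creating an on-demand table is a single call, and you never specify servers or throughput:

```python
# Minimal sketch: a DynamoDB table with on-demand (pay-per-request) capacity,
# so there is no throughput or infrastructure to manage. Names are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="sessions",                              # hypothetical
    AttributeDefinitions=[
        {"AttributeName": "session_id", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",                     # on-demand capacity
)
```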
You get enterprise-grade features with DynamoDB: encryption, backups, role-based access controls, all of those security controls. But a key use case for DynamoDB is essentially limitless scale. That's why Amazon relies on DynamoDB for Prime Day, and we're able to achieve, like Siva mentioned, 80-plus million requests per second with this system. That's why other customers pick it as well. The other core feature is global tables. If you have a globally distributed audience or a globally distributed workload, where data has to be synchronized across regions but needs to be accessed really, really fast from every one of those regions, that's when you pick DynamoDB with global tables.
Think about workloads that are core to your business, like authentication and access controls, where users from many different places in the world try to interact with that system. That system is core to your business and cannot fail. That's where DynamoDB comes into play. And it supports the largest, most critical internet-scale workloads out there, workloads such as the teleconferencing services provided by Zoom. It's no secret Zoom saw a significant increase in their usage during the COVID pandemic.
When we all started working from home and started interacting over teleconferencing services with our coworkers. Not only that, my family had countless family events on Zoom: birthdays, anniversaries, and so on. All of that was enabled behind the scenes by DynamoDB. Zoom chose DynamoDB global tables with on-demand capacity to handle both that growth over the last few years and any spikes in capacity they encountered along the way. But DynamoDB is not the only option.
If you need more flexibility, if you have workloads that need to leverage a MongoDB-compatible document model, with more complex document structures, that's when you use DocumentDB. And DocumentDB is a lot like Aurora in the sense that you have clusters. Those clusters can have multiple read replicas in them and are supported by that shared storage environment. So you get a similar operational experience to Aurora as well.
You get enterprise-grade features, you have security, you have global clusters if you need to distribute your workload across multiple regions. And DocumentDB is really for use cases where you need schema flexibility: content management systems, user profiles, product catalogs, or any use case where you accumulate schema-flexible or semi-structured events in near real time and then need to do some analytics with aggregation pipelines on them. So those are really the core reasons why you might pick DocumentDB. And the other option in this space is Keyspaces. Keyspaces is an Apache Cassandra-compatible database service. It's compatible with the CQL 3.11 API,
and it's serverless just like DynamoDB. You don't need to define any servers, you don't need to worry about versions, you don't need to worry about high availability, all of that is taken care of. And it's designed for wide-column use cases. So these are query patterns where you might need, for example, schema validation, even though they're non-relational, or you need to support multiple independent sorts on the same dataset.
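To illustrate the independent-sorts point, here is a minimal sketch of the wide-column approach using the open source Cassandra Python driver; the keyspace, table, and column names are hypothetical, and connecting to Amazon Keyspaces additionally requires TLS plus SigV4 or service-specific credentials, which are elided here.

```python
# Minimal sketch: support independent sort orders over the same dataset by
# keeping denormalized tables with different clustering keys. Names are
# hypothetical; Keyspaces connection details (TLS, auth) are elided.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra.us-west-2.amazonaws.com"], port=9142)  # plus TLS/auth settings
session = cluster.connect()

# Orders for a customer, newest first.
session.execute("""
    CREATE TABLE IF NOT EXISTS shop.orders_by_customer (
        customer_id text, order_ts timestamp, order_id text, total decimal,
        PRIMARY KEY ((customer_id), order_ts)
    ) WITH CLUSTERING ORDER BY (order_ts DESC)
""")

# The same orders, clustered by total, for a "largest orders" view.
session.execute("""
    CREATE TABLE IF NOT EXISTS shop.orders_by_total (
        customer_id text, total decimal, order_id text, order_ts timestamp,
        PRIMARY KEY ((customer_id), total, order_id)
    ) WITH CLUSTERING ORDER BY (total DESC, order_id ASC)
""")
```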
After this step, as you're on this journey of breaking up the monolith, the next and final step in this journey revolves around specialized datasets. What are specialized datasets? These are datasets or access patterns that have special requirements or some unique characteristics. Think about graphs or time series data. Now, sometimes you can actually reverse step three and step four. Sometimes it's easier to break the specialized datasets out of the monolith first, because they're a lot more obvious, if the data is organized by time, for example, and so on.
But what we do see in practice is that customers tend to adopt the generalized non-relational databases first, because it's a way for them to gain confidence and understand how to operate in a non-relational environment. And then they pivot over to these more specialized databases. So let's look at a couple of options here.
Probably the most interesting here is Amazon Neptune. Amazon Neptune is a graph database. So it allows you to query and store and process highly related data.
What do we mean by highly related data? Well, these are datasets where not just the data entities matter, but the relationships between them. The way to think about it is you're not just looking at what the data is, but also at the strength or quality of the relationships between entities. Neptune is a database that supports the open property graph and W3C RDF graph models, and you can query it using the Gremlin and SPARQL query languages. It is similar in operational experience to Amazon Aurora. You create clusters.
Those clusters can have read scaling, read replicas. It has similar enterprise-grade features: backups, high availability, security controls as well. And it's designed for workloads where you need to implement social networking graphs, or build recommendation engines where finding entities with similar relationships matters, or fraud detection use cases where you're evaluating the strength of those relationships.
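A minimal Gremlin sketch of the fraud-detection shape just described, using the open source gremlin-python driver; the endpoint, labels, and property names are hypothetical.

```python
# Minimal sketch: find accounts linked to a flagged account through a shared
# payment card, a typical fraud-detection traversal. Endpoint, labels, and
# property names are hypothetical.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection(
    "wss://my-neptune-endpoint:8182/gremlin", "g"   # hypothetical Neptune endpoint
)
g = traversal().withRemote(conn)

suspicious = (
    g.V().has("account", "flagged", True)
     .out("uses_card").in_("uses_card")             # accounts sharing a card
     .dedup()
     .limit(20)
     .valueMap("account_id")
     .toList()
)
conn.close()
```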
But, really, it's designed for those use cases where it can query billions of relationships with millisecond latency. And one good example of that is Dream11 fantasy sports. They're one of the largest fantasy sports platforms in the world.
And they needed to scale from tracking thousands of friends to millions of friends for any given new user. So they built a persistent social graph of all of their friends and followers and users in their system, and they stored it and built it on top of Neptune. And they used ElastiCache as a caching (indistinct) in front of it to store counters and sorted lists of common followers. This is a great example of how the caching pattern doesn't just apply to relational databases. You can use it with any of the other technologies that we've discussed.
The other two databases I'm gonna cover really briefly are Timestream and QLDB in the next slide. So Timestream is a system to process, store, and capture time-ordered data. This is data where the ordering in time matters. So these could be measurements from IoT devices and systems like that or real-time events. It provides schema flexibility and the ability to capture multiple measurements per record. And it's completely serverless.
You don't have to worry about any sort of capacity layer. You just start using it. Now, it does provide you with data tiering. So it has a fast memory tier and a slower magnetic tier.
The fast memory tier is really optimized for very quick ingest of high volumes of data, as well as point queries. And the magnetic tier is optimized for storing data long term, as well as analytical queries. With the mix of these two, you can set up tiering policies, and Timestream will automatically tier and move the data between them for you. So this is great for use cases where you need to do trend analysis or forecasting, anything that's based on time.
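A minimal sketch of writing a measurement and running a recent-window aggregate with boto3; the database, table, dimension, and measure names are hypothetical.

```python
# Minimal sketch: write one sensor reading into Timestream, then average the
# last hour per device. Database, table, and dimension names are hypothetical.
import time
import boto3

writer = boto3.client("timestream-write")
querier = boto3.client("timestream-query")

writer.write_records(
    DatabaseName="iot",                     # hypothetical
    TableName="sensor_readings",
    Records=[{
        "Dimensions": [{"Name": "device_id", "Value": "sensor-42"}],
        "MeasureName": "temperature",
        "MeasureValue": "21.7",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),
        "TimeUnit": "MILLISECONDS",
    }],
)

# Recent data like this is typically served from the fast memory tier.
result = querier.query(QueryString="""
    SELECT device_id, avg(measure_value::double) AS avg_temp
    FROM "iot"."sensor_readings"
    WHERE measure_name = 'temperature' AND time > ago(1h)
    GROUP BY device_id
""")
```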
And the last database I'm gonna touch upon is QLDB, Quantum Ledger Database. So this is a ledger.
It provides two key capabilities. It's immutable, meaning any change that you make is append-only, and you can track the history of all the changes over time. That's what the ledger is. And the second key capability is that it's verifiable, meaning all of the changes are cryptographically verifiable, so at any given point in time you can verify that a specific change is valid, accurate, correct. So if you have workloads where you need that, that's where you would use QLDB.
It's also a serverless offering, fully managed. You can query it with PartiQL, which is similar to SQL but more adapted to semi-structured data models. Really, it's designed for systems of record, for example for financial transactions, or audit databases, or systems that need to track state.
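A minimal sketch with the open source pyqldb driver; the ledger, table, and document contents are hypothetical, and the table is assumed to already exist.

```python
# Minimal sketch: insert a document into a QLDB table and read it back with
# PartiQL. Ledger, table, and document contents are hypothetical.
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="vehicle-registry")   # hypothetical ledger

def register_vehicle(txn):
    txn.execute_statement(
        "INSERT INTO Vehicles ?", {"vin": "1N4AL11D75C109151", "owner": "alice"}
    )

def find_vehicle(txn):
    cursor = txn.execute_statement(
        "SELECT * FROM Vehicles WHERE vin = ?", "1N4AL11D75C109151"
    )
    return list(cursor)

driver.execute_lambda(register_vehicle)
print(driver.execute_lambda(find_vehicle))
```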
The way to think about this is you don't only need to query the current data or interact with the current data that's stored in QLDB, but you also need to prove that that current data is valid or accurate. So this is the use case that QLDB will solve. Now, if you've gotten to this point, you probably no longer have that central database in the middle.
You've peeled off all of the workloads and datasets that really don't need to be relational out of that central database. You probably still have relational databases, and they probably still support core business functions, but at this point they just support what they're designed to enable: the core piece. So what happens next? Well, the next step is to go beyond transactional needs, to adopt a business-wide unification strategy that can enable you to take more value out of your data, using analytics and machine learning. And to do that, you typically deploy one or more data lakes as a centralization point that all of these databases feed into or out of. But this is probably a topic for a different discussion.
There are a couple of sessions here at the summit that dive deeper into this modern data architecture and what happens next, past the transactional workloads, so I recommend you check those out. But to summarize, modernizing with purpose-built databases is a journey, and one that you're probably already on.
Might not have realized it yet. But all of these steps are designed to enable some of those key seven pillars or attributes that Siva was mentioning at the beginning of the presentation. And you have different database solutions at every step of the way that would enable you to be more agile, to get more performance, more scalability, easier management, cost effectiveness, and high availability. I promised you a couple of follow-up suggestions for sessions. I'll leave this slide up for a few seconds so you can take a picture of it. And also, we offer many resources to help you boost your AWS cloud skills.
Get in touch with Training and Certification today and check out some of these resources that they offer. And with that, from both Siva and myself, thank you very much for participating in our session today. ("Surf The Orange Water")