Deploy Highly Available SQL Solutions & Solve Real-world Scenarios Azure SQL Bootcamp Session 3

Show video

Hi and welcome back to the azure, sql bootcamp. This is day three, of four of our boot camp and we're so glad to have you back, or have you joining us for the first time. My name is anna hoffman, and i'm a date and applied scientist, on the azure, sql team. And i'm also joined by bob ward a principal architect, on the azure data team, um you know we're pretty excited to be here right bob. Yeah thanks uh it's kind of, amazing, it's day three already it seems like it's really flown fast for the first two days. You know. We really delivered some amazing content in my opinion if you're out there wanting to learn about azure sql. You know day one it was all about what is azure sql, kind of fun facts about how it's evolved. Trying to describe. People understand what do we mean about a managed instance in a database. You know we've done deployment, configuration. Uh yesterday, deep dive no fluff right yesterday. Uh, deep dive, focus. Hyper focusedness. Hyper focusedness, that became some term i found on the internet yesterday. So you know security, and performance. Very very important concepts, for anybody wanting to get started in azure sql especially people that, are used to sql servers so yeah it's been a lot of fun to be part of this so far. Yeah i definitely agree with that bob and you know in addition, to what we've been doing in the mornings or evenings or whatever, time it is where you are watching this, uh we've also had, links, to lots of other resources, whether it's on github. Or. Microsoft, learn with that free azure sandbox, that you can use and actually try, all the demos that we've been, showing you and even then some and we've got some more coming today. As we turn to look at availability. And i also have some fun things planned, for the end of this module, some that you might even not, be aware are happening, soon. Uh-oh. There's no telling when anna's gonna come up today that she hasn't told me. I wanted to keep it fun and interesting, so, we'll see how it goes um, today similar to other days we will have a short break, uh we're gonna try to do that at the top of the hour. Uh give you about five minutes to get something to eat get something to drink.

Um And then come back with us to see some more of what we're talking about. Now before we get into, the content. I wanted to just give an overview, of what we're doing with microsoft, learn and learn live, if you haven't been joining us for the previous days, you can access, all the recordings. Uh there's a link in the the chat for that i believe. And, um, and, once, you, join, what you'll notice is on the left side, or sorry on the right side of your screen i think that would be over here, or below, your video screen there are, chat windows where you can actually chat with, other viewers, as well as get your questions, answered from members of our team we've got, great moderators. On there for the third day in a row really grateful for them, um answering, all your questions, to the best of their ability, and also kind of surfacing, some of those questions. Up to us, uh i also wanted to give a shout out to the learn live team especially rob gibbons, who's, running all of this for us in the back end making it all possible. Making us look good and not cut off and. All that good stuff. So thank you rob, and. The other thing i wanted to mention is there are hands-on, labs associated, with what we're doing today. That you can access. Today, tomorrow, next month whenever you want, um so definitely, check those out they're free, uh it's also on github and all the code we put on github, is free for you to use including, these slides or on github, as well, so you can come back to this or use it with your colleagues, or customers, or that sort of thing later. Okay so all that out of the way we can get started, and talk about. Availability. Uh this is something bob and i talked about of being like one of the really great value ads for azure sql. And just moving to azure, in general. Um and then we're also going to talk about some real world scenarios, we're going to put you to the test and we're going to do a fun challenge, and, and see how that goes. So without further ado let's get into. Availability. We're going to cover a lot of topics, that you, find. Probably, very familiar, from your experience, from hdr. To where did my backups, go how do i do point in time restore. Some availability. And consistency. Requirements. And things that we do on our end as well as how to monitor, all this once you get it set up. You know anna as we looked at, security, and performance. We kind of if you see this slide right here it's kind of maybe familiar, to those that attended yesterday. We always want to look at what those capabilities. Are what are the capabilities. For availability. For you, in your deployment of azure sql and then what are the tasks you know what things do you go through. Using these capabilities. To set up your solution to make it as highly available, as possible. And you just think. Right out of the gate you need to know about backup restore, for disaster, recovery.

And About availability. To keep things, going, all the time in case there are problems, and so, today i think you're going to find the story we're going to show you about, those two topics, are quite amazing, and. To me. Worth the weight in gold perhaps, you know for moving to azure but we're also going to tell you how to go further, something called availability. Zones. Geo-replication. Failover, groups, and then we'll do some core sql stuff near the end we'll talk about just database availability. Uh how we run recovery, those kind of type scenarios. And database, consistency, but let me give a shout out first. To sql server replication, we won't go into the details of that today, but you should know, that sql server replication, is fully available in managed instance you can set up a transactional, replication, system. Across your managed instances. And even people don't realize, that you can set up an azure sql database, as a subscriber. To a publication, that could be on premises that could be from managed, instance. And and in fact that might be interesting scenario for an online migration, type problem where you want to use replication. And make your azure database, as a subscriber. And then again you want to have these tasks you want to go and figure out based on those caseability. Capabilities. How do i pick the right deployment, option for my rto, and rpo needs and my availability. Needs, how do i configure, hdr. And how do i monitor, availability, i always want to throw an anna about monitoring, because people want to know all the time. We showed you how to monitor security, we showed you how to monitor performance, and we'll also talk about how to monitor availability. Near the end of the session today. Awesome, thanks bob and the first thing we're going to talk about is backups, now this is something we commonly, get asked about, when, people move to azure they say okay what's going on you guys are managing my backups, but what does that mean, so just to lay it out uh we do full database backups, once a week with log backups, every 5 to 10 minutes, and differential, backups, every 12, hours. Now these backup, files, are stored in azure storage, with, ragrs. Storage so what that means is it's read access. Geo-redundant. Storage. So right off the bat what this means is that you have the option, to, do a geo restore, now you might be wondering, what does that mean, so what this means is if a whole, region, goes down. You already, by default, without, doing anything, have, geo-redundant. Storage of your backups. So you can actually pick to restore. In another region, in the case that something you know really catastrophic. Happens, uh you can be sure that you'll be able to restore, that data. Another, thing, that we have available, for you is point in time restore. Now if you recall, in sql server, and, in order to do a point in time restore, oftentimes, you need, full recovery, mode enabled, and, if you recall from previous, modules, of this bootcamp, we actually talked about how azure sql. Really only allows you to run in full recovery, mode and one of the reasons. Is for this point in time restore. Now retention. For point in time restore, can range between, 1 and 35. Days this depends, on, the service, tier you select, as well as the deployment, option, as well as your, actual, choice, uh by default, it's it's every seven days i believe. Um now if 35, days isn't enough for you, uh certain deployment, options, also have the ability, for you to enable, long-term, retention. Or ltr. And this is going to allow you to retain, up to five years. Of backups. And this is really interesting because if you have ltr. And you, delete, a database. Not a logical, server not a managed instance but if you delete a database, or several databases. Maybe five years ago and then something happens, and you need to be able to restore, that database. You can actually go in and do that with ltr, technologies. We talked about geo restore, and i mentioned, the, the backups, of deleted, databases. With managed, instance you can even do a manual, copy, only. Uh to blob storage, for backup restore, with managed, instance uh so that's an awesome, capability. Too we're really kind of trying to take the stress, away from you and give you, the, security, and the confidence, that we're going to take care of your backups. Now bob when you look at this and you've been working with sql server you know i always make fun for just a little while just a few years. Um. You know what stands out and what's the significance. Of some of the things that we do here. Thanks anna you know i spent, a lot of my current support. And no one ever calls microsoft, support and says hey my backups are working and i can restore my databases, successfully, so everything's great. You know it's always a call to say.

I Can't restore my backups i can't find them or i don't have them or they're damaged or there's a problem. Just consider the fact that when you deploy, an azure sql database or a managed instance. You get, automation, that kicks in and you get redundancy. Built in remember we talked about the fact that like even in general purpose. Your database, and log files are in local redundant, storage so you got multiple, copies within the azure storage infrastructure, already. The backups are separate. They're in geo-redundant. Storage, so the, the probabilities. Of having like a damaged, file from a backup go to almost nothing. And you know people try to configure that themselves in their own environment, that's a task and a lot of work, but, declaratively. Getting to say i'd like to you know 10 years of backups and i here's my schedule, i'd like you to build on or the fact that i accidentally, deleted a database and i could just go click and get a new one and get it back from a restore. Or, that point in time restore, is just available by putting in a date time and just doing it, i mean this is pretty significant, stuff. I should also call out for folks that like a little more control and i mentioned about copy only backups with managed instance. You can restore, those copies, to another, managed instance, and in addition. We do allow a native restore, and managed instance. From backups, from on-premises, sql servers which is part of actually a migration, strategy you can use so, i think we give you. Backup restore options but we also give you just amazing, built-in. Backup restore capabilities, that is pretty difficult, for folks to set them up themselves. And to rely on because you know microsoft's, got their backing behind all of this. Definitely, and one other thing i wanted to call out because i do recall seeing something on the chat, about it yesterday when we talk about point in time restore, something that's different, maybe, than what you would do on premises. Is that in azure sql, you can't restore, on top of your existing, database. And we really do this to protect, you what you have to do is restore, to a new database, name. And then you can go through, and determine. If whatever it is we're trying to fix or go back to is still there, and once you're comfortable with that you can go ahead and rename, your, old database. And then rename your new database, to your current database, so you only have a minimal amount of downtime, and you don't actually have to change. Any connection, strings. And this is an example, that you'll actually get to do hands-on. If you go through the microsoft, learn. Labs we have a point in time restore. Lab for that, the other thing i wanted to ask you about bob is, you know we do back up integrity, checks, as a team can you share a little bit about. What we do in those. It's pretty interesting, we're going to use checksum as part of a backup option which is a native part of sql server that's been around for a while, so you're able to see, very quickly whether there's a problem with the media, of the backup you know putting the backup file, uh itself. But we do restores, for you we automate, process to randomly, look at and do restores, and do, testing of restores because sometimes that's the only way you know. That could something be restored so we're automating, things in the background, to check and make sure. That what we have available for you in terms of backups, can be restored, for you so. It's quite amazing, behind the scenes the automation, even from an integrity check point of view we're doing for you. Awesome yeah i think it's a great story and i'm looking forward to hear what, uh you guys say when you try out these labs or even in the comments related, to this. Now now that we've kind of talked about backup and restore, we wanted to shift to talking about. Other things that are built in related, to hadr. And we thought what better way than to kind of dive deep into each of the service tiers for vcore, model. And look at the architectures. And then you'll kind of see, what availability. Models we have set up from the get-go. And, the first service tier we're going to talk about. Is, general, purpose, high availability.

So Let's kind of walk through what you see, on the right hand of your screen to get set up, so what we have is we have an application. And we have some gateways, and we have a primary, replica. Now if you recall from an earlier module we talked about. Setting up connectivity. From your application. And how you know they're really two options you can go straight to the gateway, every time which will then point you to the database. Or you can go to the gateway, then get the connection, information, for the database, and go, directly. To the database, those, are called proxy, versus, redirect. So that's what you're seeing there is we have our gateways and our control ring, your application, is going to go there figure out where the database, is and then get pointed, to that and that's really across, all service tiers. But when we start to look at general purpose. What you'll notice is on the primary, replica, on the locally. Attached, ssds. We have the tempdb. Now the. Data, and log files, are actually stored in azure, premium, storage, which is locally redundant, storage. And your backup, files as we mentioned, in, for backup restore, are stored in azure standard, storage, which does have that geo-redundant. Storage. Now when you think about this you can think it kind of behaves. Like a failover, cluster, instance. And, if you want to think about, what would happen, in the case of failover. Would need to occur you might want to first back up and say, why, would a failover. Occur, or what would trigger this, uh now recall, back way the beginning, bob talked about how, really, azure is kind of built on this azure, service fabric, backbone. Now, if the server. Service, fabric, uh detects, something, is wrong, uh what it's going to do is it's going to, try, to initiate, a failover, and just like a failover, cluster instance, it's going to, first. Look for a node with spare capacity. Move the database, file and run recovery, uh so this can take some time depending, on, uh the capacity. And and how close it is that that is moving, uh but then we'll repoint. Everything, for you so again, as long as you're going to the gateway, and then the database, you're going to be redirected.

Automatically. Meaning. You can fail over, it only takes you know less than 30 seconds. And then you're back up and running. And what i want to, really stress, here, is that. You didn't have to set anything, up for this, all you did was deploy, a general purpose, database, and you just get this, right out the back for manage instance, or, single database. Yeah anna so. You're exactly right general purpose just feels to me like a payload, cluster instance shared storage, right temp db on your local storage. But what about business critical because we talked about the fact that business critical has some interesting, properties. And we even mentioned to you about faster recovery. Times, from failover, type problems, so how do we achieve that, so, you see in this diagram, here it looks a little, pretty much like what anna just showed you, except notice one difference here, the primary, replica. Still we have the gateways right the primary replica, has the, data, and log files on the local drive tempdb, is also there by the way, separately, we don't show that but it's always there, but notice the data and log on the local drive now. Not in azure storage. So. In order to, achieve, now, high availability. We're going to just we're going to actually deploy an availability. Group for you behind the scenes, with multiple. Secondary. Replicas. Where data and log are on local storage. Just like you would do in an always-on availability, group scenario. What's interesting about this is that, we always try to keep four replicas. Available, at all times one primary. And three secondaries. And if you kind of look and think about this and think about availability. Groups, but how are things synchronized. So what we do here is that your transactions, on the primary. Are allowed to commit. As long as one of these secondaries. Can acknowledge, they've hardened your log just like a synchronous. Availability, group scenario. In your environment so as long as when one has acknowledged, i've hardened the log change. We can go through and let the commit, uh proceed on the primary. So it's it's got that field availability, group right you're still connecting into the gateway though so you're kind of abstracted. From where you're going. And as anna just described, a second ago what if a failover, occurs in this environment. So we're still using, the service, fabric, logic. Just like we did with general purpose the service fabric for example is monitoring, is the rack down or host have a problem or the overall virtual machine. But sql server. Integrates, with that what we've developed in azure sql both for general purpose and business critical, so we can detect with service fabric or we can help signal the service fabric. We should initiate, a failover. So what if a failover, occurs, well in general purpose ana talked about you'd spin up another node like spin up another sql server and just point to your. Database, and files on storage, but remember we have a replica, design so we're going to fail over, to a replica, that is synchronized. Right, where the data and flog files exist. Now the backup files are still in geo-redundant. Storage just like general purpose so, just like in your environment, you would store your backup files in a separate storage, location, we do the same we make it geo-redundant. But the data log files this is a non-shared, storage solution. Just like an availability, group. Now what's interesting about business critical is a couple things, number one. For free. One of these secondary, replicas, you can use for read purposes, and you can use the same type of techniques. To, specify, your intent, for read only. When you connect. But just like what anna talked about, what i love about this is that there's no listener for you to just deploy. You simply use the gateways the gateways, are in charge of deciding. What replica, you go to where is your primary, where is your secondary, to be connected at this point so, again it feels like an availability, group because we're using that technology. But as anna talked about general purpose. You just deploy business critical, and you just get this this is what we provide, for you and so, you can see though in a failover scenario, since we're always redoing. Logs on your secondaries. We simply just have to run recovery. Using accelerated, database, recovery which we'll talk about later, and so failover can be quite fast compared to general purpose.

Nana, I know we talked about this we were building this training together and i'm like and i have some thoughts for you how to build a demo and you're like bob i got this let me show you something so maybe you should show the audience. Pretty nifty demo how to compare. Failover, times between these two options. Yeah definitely, bob and and like the audience, has heard um, bob taught me how to use, oshtress, and now, you'll see it everywhere, and all the things that i do. Let me just run a few things to get this set up, and, then i will explain, to you. What is happening, because there is a lot going, on. Um, here. Uh so, let's start, by looking. At. The. Top. Part of your screen. So what i have up here is azure data studio. And i have two. Powershell. Notebooks, one on the left, and one on the right, and the one on the left, and really everything on the left side of this screen, is related to a general, purpose, database, so remember similar, to a failover, cluster instance. Now everything, on the right side of my screen. Is related to a business, critical database. And, what i have. Running, in the. Powershell. Windows, below. Is a very simple command. Um so if we take a look at it i'm using oshtress, connecting, to the server. Just selecting, the account from a table. For the business, critical. Database, so that's what you're seeing on the right just the results, of that. And, similar, on the left side of the screen i'm running the exact same query, on, the same date the same database, content, but a different database. Uh except this time on a general purpose database. So all you're seeing is the results from that query again, left general purpose. Write business critical. Now this is just supposed, to simulate, an application. That has retry, logic which is very important. And we're just continually. Trying to basically connect, to it, so what you can do, is while service fabric, will do an automatic, failover. You can actually use powershell. To force a failover, now you can only do this once every 30 minutes. But what this allows you to do is kind of see, how fast these failovers, can be, so what i'm going to do is i'm going to try to click these buttons and run these commands to initiate a failover, at the same time, and then i want you to observe. What happens, to our connectivity. Remembering, that right is business critical, and left is general purpose so i'm gonna kick that off, and it's like the dot race you see at the, you know. At the ball game. Yeah, exactly. So immediately. What you see, on the right side as well as the left side shortly, after. Is that these databases. Become. Unavailable. As the failure, begins to occur, but what you've noticed, is. With business critical, it was very, fast you know almost immediately, maybe three to six seconds is how much i usually see this take. We have, gotten reconnected. And the reason for this is because we are running on that availability, group so we just have to kind of switch pointers, to a secondary. And we're almost, up and ready ready to go with general purpose it was still less than 30 seconds which is still pretty powerful, for just deploying, it and not configuring.

Anything. But the reason it does take a little bit longer is because we don't have a secondary, you know on standby. Ready to go. So that's kind of a fun, uh a fun demo that we like, and, really powerful, really shows you what i've been calling, the power of pass, or the power, of, platform, as a service, uh so definitely something really interesting. And something you'll get to play with in the hands-on, exercises. As well. Now bob this, is actually pretty historic, for us because this is where the hashtag. Anna that was awesome. Came from. Yeah that's actually a true funny story we were we were filming one of our videos for the youtube series azure sql for beginners. And i remember at the very end of this. Because i remember giving you some pointers about how to kind of build this and you went off and built this right. And. I maybe i don't know if i'd seen the full demo until we had recorded the video and i was like oh my gosh, that is awesome, so they caught it at the end of the youtube video so uh yes i think that has become a new hashtag, just like. Hyperfocusness. Hyperfocusness. Yeah i couldn't say it. And you know, uh, bob while we have a second anthony, asks a great question, he says. If business, critical has availability. Groups in the background. When or why would i ever do my own, availability. Groups. A great question anthony, that's really what platform of service is all about anthony so. We are going to deploy an availability, group for you with say azure sql database or managed instance with this business critical option. And you don't have the option of doing your own availability, group, and you're even going to see later, in our session today we're going to support, something that feels like a distributed, availability, group so, it's our belief that from a past perspective, we're kind of removed the need for you to ever have to do this. Perhaps, there's some sort of nuance about an availability, group that you want maybe you need more replicas, that we can provide but. Just wait because we're going to show you how to do multiple, replicas. Uh so that when you'd go to something like azure virtual machine and do your own configuration. But we've found, what we've provided in the business critical and what we're going to show you in hyperscale. Or things like auto failover, groups that we have a lot of that functionality, that you probably need including read capabilities. Those kind of type things, with our built-in deployment so we kind of think you don't need to do that, if you choose platform as a service. Thanks bob but, also thanks anthony i think it's a great call out and again a great, reason to, consider, a pass service. Um now the last service tier which we haven't really talked about yet and is only available. In azure sql database today. Is the hyperscale. Tier. You know anna. As we've evolved. Azure sql over the years we showed you that history the beginning of like we did managed instance cloud cloud lifter, and we did that for the purposes, of ensuring. That we could provide a bigger surface area for instance level features. Well customers came to us and said we need really large databases, for the cloud, and you need to find a way to make that work, and so we're thinking to ourselves how do we create a scalable, solution, that's high performant. And allows, for this large amounts of storage of databases. And provides some interesting characteristics. And so. A project was developed, called socrates, that led to something, called hyperscale.

And This is only today available for an azure sql database. And its unique architecture. Today, at least, is one you need to think about carefully because if you go to this architecture, you deploy hyperscale, you can't move back to a different one without migration. So it all starts with primary, compute, or primary, nodes. And on this is a sql server. And your primary is where you're going to do your read write operations. And what we're going to do a little bit different here though is we're going to have a cache a local ssd, cache. A little bit like buffer pool extension but slightly different. So, you have a buffer pool in your sql server, but if you can't fit it in there we're going to try to put it in this cache, which is a local, ssd drive so if you try to go. Read from data, we'll go to the buffer pool if not we'll try to go to this cache, so that's like one layer, of this model i'll talk about secondaries. In a second. So another, possible, layer. Though, is something called page servers. So another, set of sql, servers. Also with caching, buffer pools and caching. And so if we can't fit pages there, we'll go to these page servers, right and we actually have redundancy. Built into these servers and they're paired together so we have that all figured out as well. So you see this model right now, you're asking yourself. Where are my database, files. So the third layer, is where we keep your database files and we usually use azure, standard, storage, here, this is where we're going to keep your database, files. Which is going to allow us to do something very interesting. We're going to be able to use snapshot. Backups, because, you're primarily getting your data. From the upper layers, of this distributed. Caching, system. So we can afford to be able to do a snapshot, backup. And a restore. Actually uses a snapshot, backup, and anna this is where restoring, of terabytes. Multi terabytes of data, can happen in a very very short time frame so it's pretty amazing technology. So you're looking at this model. And you're like okay i kind of get it i see how you're doing multiple layers of caching of pages and so forth. That kind of type thing, what about the transaction, log how do you do logging here. Well as it turns out what we do is in the primary, compute, wherever doing your rewrite. When you need to flush, log records we still have the standard, log manager, in sql, server. We're going to redirect, those flushes to something called a log service which is on a separate node running it's redundant, as well. And we're going to use its own caching mechanisms. And something called a landing, zone with azure premium storage. And then, as necessary. Long-term. Log storage, for point in time restore capabilities. Is an azure, standard, storage you see how we've separated things here in this model. By the way those snapshots, are in separate storage as well, we've provided a very redundant, solution here. Now let's talk about those secondaries, like how does that work, we don't necessarily use availability, group, specifically. Here but we use the concept, of log change technology, from the log service. To feed. Secondary. Replicas. And to feed the page servers. So, as we make changes those page servers are going to get up to date because we know in the transaction, log how to do that, and then we know how to keep these secondaries, up to date. Because we're using the log service. It's a pretty amazing system, the auto failover, capabilities, though anna, are the same. We're still using service fabric we're still using sql server to decide. Do we need to fail over though, from a, primary. Um, and anna you asked me at one point in this architecture, because i was building the sly, and it used to say one to four secondary, replicas and i said actually that's not true.

It's Zero, to four i think, you asked me that right, oh yeah, why, what, why would i how, would i have zero. Yeah how can you have availability, with zero replicas. You should think of hyperscale. As a combination. Of, what we've done with general purpose and business critical, so imagine this diagram, and you had no secondary, computes you just had the primary. Remember now the data files though are not, they're in they're shared stores they're on the actual, azure storage, so. If you needed to failover, with zero replicas. We would spin up another compute. We would be able to refresh, the data from our caches and our data in the actual, uh azure storage. And we'd be up and running with our primary at that point right now the failover, time is going to be longer, because it's kind of like a gp scenario where we got to bring up a sql server, and in this case we got to feed it you know we got a feed sql of these cache scenarios. So, you can do zero if you want but here's the cool story you can do up to four, secondary, replicas, and they all can be read replicas. So now, in the case of bc we only had one read replica. We can give you four of these and these all these secondaries. Are fed again from changes, in the log so in those cases anna, you know your recovery time or fail of time is going to be much faster, because now it feels like a business critical. Because those are being fed from changes from the log service, pretty pretty. Amazing, architecture. Yeah definitely, and you know it continues, to amaze me how we really re-architected. Sql for the cloud. Um you know another funny thing i wanted to call out before we get into the knowledge check, is as we were going through this, uh, i, asked you bob, about, your typo, of sql server dot xe. So anna was really showing her age here or lack of age. So we i'm not this is not this true story we were filming this video. And i think, i was like i don't know if it was live or was in a side question she goes hey bob uh you know they made a typo in this diagram, sequel server.exe. Does is missing the e before the r. I said oh anna. Oh and are you sweet young person. Sql server xc if you notice it's 8.3. Which is the, file naming convention, for dos and os2, back in the day, and so we had to use 8.3, for sql server we just cut off the e. And by the way anna i don't think that'll ever change i can't even imagine. The product, what would happen if we tried to go in and change this to be like you know, greater than 8.3, it probably would break the entire product of sql but there's, there's your story of the legacy. Of sql server in 1993. Running on os2. Yeah, i love it, sql server xe. Um now we have a knowledge check but before we get into the knowledge check i'm going to. Ask a question, from david o'keefe. Uh he asked, what is the downtime.

Associated, With moving from standard. To business, critical. And then the second part of that is what about moving, between. Generations. Of. Compute, or hardware, i.e, gen 4 to gen. 5.. Yeah thanks david for that question. Uh so moving from standard business critical is kind of like a scaling, operation, in a way correct. Uh and same thing between, hardware hardware generations. So if you think about, you know gp we've got things on azure standard storage. And if you go to business critical we need to go set up those replica, systems, right, so we have seating, involved so the larger the databases, are the longer that operation could potentially, take but but be aware of this, one thing we're pretty smart about this, so when we're setting things up it's not like you go, change to business critical and immediately you're just down. We wait and we let you run to the longest possible, moment we have to switch over and put you in involved and again, you're just using the gateways, which is beautiful about from application, point of view but you can expect there's going to be a longer down time than say scaling from two to eight cpus and general purpose as we've shown you, uh in that operation because we got to see you know those replica, designs, same thing between. Like gen four to gen five if you're doing doing that in gp. That, could take longer depending us trying to find a node that has a gen 5 type processor, right, but in the gp, case. We just have to go find a new node and point to the database, files that exist so that switch. Within let's say a general purpose a service chair can be much faster. Is that anna you kind of agree with that, yeah definitely i think that's, that's correct, um now a lot of people have already started to answer these questions but just to kind of read them out and then we can go ahead and start talking about them the first question, is, which service tier, puts the data, and log files, on azure, premium, storage. And the second question, is which service tier has an always-on. Availability, group, behind the scenes. So, if i take a look at some of the answers we get in. Yeah let's let's say let's take this is it you you were a little tricky on this one actually. Uh if you ask me, that first one. I think people need to really carefully, read that sentence. Yeah and i think um. You know that shows in our answers. For the first question, i'm seeing d, a, c. D a c you're seeing okay you're seeing a mixture okay. And b, we're seeing a good mix of all, okay, this is actually good, this is good because, we're gonna we're gonna stop history, yeah. It looks like a lot of bees, are starting to come through. What about the second question what are you seeing on that one. The second question is you know the majority. Of folks are saying b, a few people have said, c and d but most people are saying b i would say there's much more consensus, on the second question. Okay, okay, you want to let them keep coming through you want to you want to tackle them no why don't you go ahead and, tell us. Well if you remember yesterday, we used the sherlock holmes method right we. Did an elimination. Process to to answer our questions and i i just encourage anybody i love answering questions i think that's the right way to go after things oh i taught my boys that how to take tests that way right so anyways, in the first one.

Which Service, tier, okay, this is a good thing they're all these three choices are service tiers i'm glad everybody, has probably resonated with that. Puts the data log files on azure premium, storage, so first of all, it can't be b. Because business critical if you remember in our diagram, puts, data and log files on a local ssd, storage so let's x that out, which means we can x out d. Right so it now comes between, a and c. And if you remember, very carefully in that last diagram. We actually put your, data, files. The data, files. On azure, standard. Storage. And the long-term, log files, on azure standard storage, right because we're trying to make it as cost effective as possible to deploy your solution. So if i thought about that, the data, and log files together the only solution that lines up for this is general purpose so that should be a i think. Yep there you go. Okay. Uh let's talk about the second one, so which server, has an always-on availability, group deployed behind the scenes. So it can't be a anna, because if you remember correctly and you look at the. The comment to the to the left here. If you put your files on azure premium storage. All of a sudden now you're not using. Or you're, using a shared storage solution, not a replica, based solution, so it can't be a therefore it can't be d. So if i look at this carefully i've got b and c now now i just mentioned that hyperscale. Uses a log, based approach to ship changes. You know to these tiers, but, if you remember correctly i said it doesn't use an availability, group specifically, to make that work so i think the answers got to be b. There we go. What do you know. Great so i think oops i think most people got number two i think we tripped some people up on number one but it gave us a great opportunity. Uh, to. To take a look, and dive a little bit deeper, uh looks like we have, a question. Uh, on the screen, from, duke, nukem. Who asked, do you really work with buck woody.

Oh My gosh i'm supposed to answer that and not laugh out loud. Unfortunately. Yes. Uh, i do work with, actually to be complete serious buck is one of the super, coolest people on, on earth. Um. But you you i will just make one funny story for you and i let you comment. That we will be, when we used to travel together we were at dinner. And we would call out some, thing and then buck tried to describe the history behind something like where it came from, and my comment was here we go, you know because here's going buck again right so we, all laugh together and use that comment here we go, so i don't know in about about how you want to say but love working with buck i know he's monitoring, us today. We do work with buck. Yeah and we're very grateful, for uh his work, i know he's very active on the chat answering y'all's questions, so. Awesome. Um. Now, we are going to continue. Talking, about availability. And. Really we've talked, about the different service tiers and you might be wondering, okay, how can i go to the next level or are there other ways i can go to the next level and we're going to explore a few other ways you can kind of go further. With availability. And the first thing we're going to talk about is zone redundant. Configuration. Now you might have heard this phrase used, uh there's lots of different words that we use to kind of describe, this you'll hear like, availability. Zones. You'll hear azs. Or multi-az. Or zone redundant. Zone redundant, configuration. So they all mean the same thing. But we want to talk a little bit more about, what it is so this is free today. In the business critical, service tier, it's actually, only available, today in azure sql database, business critical tier, um and it's really easy to set up you just go in the portal and it's just like a switch you flip on or off it is off by default, and we'll talk about why in just a few moments. But let's talk about. This architecture, you're seeing on the right hand side of your screen. Of what this means. Um so what you're seeing, is the business critical database, and this, should look familiar. Um, that always on availability. Group the way fail versus secure, all the same. Your azure standard, storage. For your backups, this is just. Business critical deployment. Now what's changed. Is you have these, multiple. Az, so az1. Ac2. Ac3. Each one has its own gateway. And each one has at least, one, replica. In it, so what does this mean what this means, is that in one region. We are going to spread, your availability, group across. Three, availability. Zones. Meaning, an availability. Zone, could be a different, data center, but in the same. Region, so you're not going to have everything. Right on one machine. In one region, in one data center you're going to have it spread, across. Multiple data centers, and this really helps you if there is something that happens. In one data center in a region, you're still safe, because you have, a, secondary, replica, at least one in each of the other availability. Zones. So this is a really great benefit for availability. This is how we're actually going to be able to provide, you the highest, sla, which we'll talk about. After this but the highest sla, is going to happen with zone redundant, configuration. Or what you'll hear a multi-az. Deployment. Now let's talk about a few of the other things that you see, one thing that you see is now we have, three, control rooms with gateways. As opposed, to just one, um so how does this work are you gonna have to set up azure traffic manager. No so what happens, is, we actually do all this work for you so you, just point your application, to your database. We're going to go to traffic manager, to figure out which availability. Zone to send you to, um, give you back the database, that you need, connection, so you can connect directly, to the database, depending, on which method, of connectivity.

You're Using, um so really again, similar to business critical, architecture, and general purpose architecture. You don't have to really set this up at all you don't have to worry about it you just check the box and, and you're good to go. Um now one thing, to be aware of and one thing we get questions, a lot about, is, why isn't this on. By default. Um bob you know you want to answer that question for our, viewers. Yeah, one one issue here just to just to let you know is the fact that this is not available, today in every single region, um, and. Because, we need to be able to have the ability to have zone redundancy. In a given region around the world so, there's other particular, scenarios, here where. This is a little more difficult, to create, the zone redundancy, across like a hyperscale, type deployment right so, and then for general purpose remember we have azure, shared storage to figure out across these zones, so there's a few, complexities, we've got to figure out behind this but the good news anna, i'm pretty confident here that you're going to see zone redundancy. Be a big part of what we do going forward with just about all the deployment, options we offer. Definitely. Um. This is definitely true so thanks bob um, bob one one other question i have is you say, or we say it could slow down latency, sensitive, oltp. Applications. What does that mean yeah yeah if you look at this diagram right here you can kind of see why that might be the case right so remember we talked about in business critical that we have to have a commit. Uh cannot be allowed to go forward unless one of the secondaries, at least one of them has said, i've got you back you know i've got your log hardened. So imagine, that. In this particular, case the secondary, and az3, for example. Is the one that has to come back to az1. Even though these are close there's still network connectivity. Between, these that is slower than just say in a single az. So because of networking, between the azs. For a secondary to come back and signal that i've. You know hardened a log it could be slightly longer, and it's it's really hard to say how longer, that is it just be maybe longer than you're used to, but let's say you're using like an internal tp, application. Where transaction, logs, logging, can be the bottleneck. That's something you got to take account of in a situation like this. Yeah. And you know one thing we've been talking about so far with all these different architectures. Is the promise we're giving you about something, but where does the rubber meet the road like what's the real promise. So if you're not aware we have a service level agreement, or sla, specifically, for azure sql. And if you look at this diagram. On the left hand side it usually. Deals with something, called nines maybe you've heard that in industry before which talks about uptime, and downtime, expectations.

Of Your deployment. You may be, in your data center today. Providing, a solution, to your internal, customers, with nine, promises, right well we do that, and in our case for our promises, these nines. Deal with promises, we make uh that deal with your subscription, so if we don't meet the sla requirements for your application. You can apply for a credit towards the azure account you're paying microsoft, for a subscription, which is pretty amazing right. And if you look at this there's a single. Zone sla this is dealing with cases where availability, zones you're not using the zone redundancy. Anna just talked about, and notice here, that you're going to get the same for gp and business critical in the single zone scenario, from an sla perspective. But hyperscale. Varies, because remember i talked about, if you have more than one replica, or have replicas. We're going to make that availability. Better for you. In the case of these nines, expectations. Notice in the multi-zone, right since bc, or premium as well is the only offering today. This is the highest, sla you have and anna. You know when we were building this together i've heard this term in my career, uh for a long period of time but i have to admit to you i never researched, what that meant like what does that even mean well it turns out. 99.995. Is called four and a half nines. And, technically, in computer science terms that translates, to about 26. Minutes of downtime, a year. So if you go to that highest. Level of sla, that's, your expectations, of how much up time you're actually going to get, across a calendar year period. Now the right hand side, we talk about recovery, point objectives, and recovery, time objectives. Now this is really interesting. Uh because we are the only. Database, a cloud, provider that offers, slas. On these concepts, not just being available. But having, times for, how long does it take to get back and how, quick and how much data lost you might have. So if you look at this, for business critical tiers and that's the only tier that we're going to offer this sla. Uh it's very very, it's amazing right those are amazing stories in terms of the times we provide. But here's what's important, you have to use, a geo replication, you have to use, other features of azure to achieve, these type of slas. And then i think it's important maybe that we talk about that like how do you go further past what we've talked about, into more of a distributed, way of doing availability. Yeah definitely bob i think it's it's it's a great call out and great lead in um to talking about some of these ways you can kind of create a geo-distributed. Type of application. Um and really you have two options for this you have geo replication, which bob, uh just mentioned and you have auto failover, groups and, we created this chart and kind of, highlighted some stuff blue to signify, what might be better, in a certain scenario, but it's really going to depend. On your scenario, so to call out a few things, a big difference between geo replication, and auto failover, groups, as the name might imply. With geo replication. You cannot, do an automatic. Failover, you are going to have to fail over. Manually, yourself. And in addition to that with geo replication. You actually, need to. Update, those connection, strings. After the failover, occurs. Um, additionally, geo replication. Isn't available, in managed, instance today, so if you're running in manage instance, really your only option, is to use, auto failover, groups. Another, call out is that, with geo replication. Sometimes, customers, want to have. Multiple. Replicated, servers or databases. Across, the globe. You can do that with geo replication. But with auto failover groups you really just get one, auto failover, group. And of course both of these, options support read scale that's a big part of the purpose. And additionally, with geo-replication. You can have a replica. That's in the same region as the primary, which is not something you can do, with auto failover, group so, kind of you know a lot of differences, to consider, but really. We're just giving you this information, so you can make the best decision. As it applies, to your scenario, and we kind of thought you know, let's dive, a little bit deeper, into the architectures. Here. You know i'm a visual person, so i love your table that you said but i'm trying to put my, head around, why it's different. And actually geo-replication. Is where we kind of started, with this concept. If you look at this diagram, right here you need to think in your my in terms, in your mind. Of a distributed, availability, group in sql server, which. Uses, sync, replication.

Within Its own environment. Along with async, replication. To another, environment, right, which is why anna talked about the concept, of. Failover. So in this particular case right here, you can see our primary logical, server. And we have read write traffic and we'll talk about, traffic manager and load balancer in a second but pretty much think in terms of what we talked about deployments, where you're connecting to a logical server for read write applications. You also can connect to. Read-only. Traffic, right using this secondary, concept, but if remember like even in a bc scenario, you can also have a secondary. On your local. Logical server so you have the secondary logical server. We're asynchronously. Sending, changes, to this secondary, logical server. And in this particular, case, if you failed over to this secondary, logical server again it's a manual concept that anna talked about where you be changing your connection, strings. Then you have the same concept of having read write and read only and you can have multiple of these right that's the concept here you can have multiple, secondary, logical servers that you're going to point to in this environment. Now you may be asking yourself if it's async, between, these two servers. What about data loss, and that's fair, because it is asynchronous. It is possible that's the case. But remember, this is a manual decision that you're making. So, you have the ability by the way for a through a system stored procedure, to force this to be synchronized. Before you make that manual, a failover, decision, so you do have that in mind, so that's kind of geo-replication. Available today with azure sql database remember managed instance does not offer this option. Okay, what about an auto failover, group as the name apply, implies. This is automatic, failover and this is the only option you have today for managed instance, you can do this with azure database, though. Now in this particular, case. We're going to use the geo-replication. Architecture, behind the scenes to implement this, with some differences, you kind of see some of them here. Notice in the case of the primary logical server, there can be multiple, databases.

As Part of this, failover, group kind of feels like always on availability, group in that case right, we're still using a distributed. Asynchronous. Operation, to go to this secondary, logical, server, but also notice in the middle here, some things called listeners and in fact, if you've got to compare, these side by side. You can kind of see some of the differences. Multiple, databases, allowed on your primary for auto failover, group again, auto, failover, being part of the, promise here. And then you have these listener, concepts and if you use availability, groups you know what a listener is it's an abstraction, layer right, so that's why anna said, that for an auto failure, group you don't change your connection strings, because you just point to, your, failover, group server, from read write purposes. Or your failover, group server with read intent and you get directed. To the appropriate. Primary, or secondary, read scenarios. So, this is kind of the difference now it is auto failover. So a little bit, architectural, again hyper focused nana right i'm going to give you the details here, so, if we detect. An automatic, failover needs to happen, and there is no data loss we'll just failure, over. If we think there might be data loss we have what's called a grace period, which is defined. Uh, by default as an hour, you can up that to 24 hours if you want to and that's the time we'll wait. To make sure things are synchronized, for that auto failover, technology, to take place, in addition, in auto failover. You can implement that system store procedure i talked about within your application. So in a way. You could force, synchronization. To occur as you commit each transaction, if you wanted to make sure, that there was zero data loss occurring here, so pretty cool and let's just also mention at the top here agit, traffic manager, and load balancers. You could deploy that yourself. Not we don't do that automatically. And again i'm not talking about the zone redundancy, thing that we talked about for traffic management that is done automatically. But you could put a traffic manager on top of these scenarios. And they could be used to do load balancing, and point to, the appropriate. Servers whether you're using the listener, or not using lister so we have some documents. How you would design that from an application, point of view, to kind of give your web apps. The ultimate resiliency. Along with the scenario because maybe you want your web apps to failover, correct, along with your database deployment you want that to be synchronized. And that's where traffic manager and load balancing would come into play so and the really cool architectures. Again, if you think about what we've talked about, we're using the power of what we've built in sql server with things like availability. Groups. And with things like distributed, availability, groups behind the scenes. But integrating, that into the azure, ecosystem. Yeah, thanks bob i i like the hyper focusedness. Hyper, focusedness. Not even really sure how to pronounce that i don't either i don't either. Um we have a question from the audience. Um. Do you also have a switch. Which pins, the database. To a selected. Data center or country, i'm not exactly, sure, what, what they're asking here but bob maybe you, have some ideas. Well if you remember, you know when you deploy. Azure, sql. Uh let's just talk about not doing geo-replication. You decide, where that's going to be, so you pick in terms of a region. And then that region defines, country in many cases right or data center, so, it's up to you to decide, you know where does that database. Reside. And you could have multiple databases. In different regions you could use this scenario, here.

With Replication, and failover, groups, again in the case of geo. The secondaries, could be in your same region or in different regions and then you could have your app, like from a read-only point of view, pointing to a specific, region so. As far as pinning it go, that probably just sounds like more of an application. Uh design, thing you would be doing where yo you know where the regions are of where you've deployed things, so for example, you may have a primary. In the united states. Uh in a particular. Region and that's where you want your app pointed to, but you want specific. Customers, that are read only to only look at. Data that's been replicated, in europe. So your app would have to know to go to that read server based on the logical server name for that region or country that's kind of my thought about it anna. Awesome, thanks bob, um another thing that might be interesting to folks, is if we take a look. So. Let me. Get this back up in my virtual machine, give me, one moment, to just. Resituate. All these windows. And, we'll, take a look. Let's see. Again, all these labs that you see. Are going to be, available, for you. Let me just make sure we're connected. Get this running, and then i will explain. Okay, so, while this is running, uh let me explain what you're seeing on the screen. So on the left side of your screen again similar to the previous demo i did, um, i'm using, azure data studio specifically. The powershell. Notebook, um big fan of these you'll see me use these a lot and. In the microsoft, learn learning path you'll actually have a choice, to use a notebook, or use something like the azure command line interface so you can kind of pick whichever, one you want to use. And, what, we're doing, here, is i basically, have this notebook. On how to set up a failover, group, now i'm not going, to, go through the process, but, what you can tell, is, at step, 7. I'm ready to initiate, a failover, so what that means is with less than six commands. Powershell, commands i'm able to. Deploy, another server. Deploy, another, of deploy an auto failover group get everything set up get all the networking setup, in six commands which is pretty crazy, because if you think about, the logistics, that would go into. Actually doing this, uh. In, you know on your own without, azure, uh i think it would take a little bit more than, six powershell, commands. Um so that's pretty cool right off the bat, um so now let's take a look at what's on the right side of your screen. What i have is a very simple, java, application. And what it's doing is it's connecting. To the failover, group so it's not connecting, to a specific. Server, uh it's connecting. Actually, to the, the, the listeners, right, so i have the primary. Connecting. To, the uh prime, the read, write. Listener, and the secondary, connecting, to the read only listener. So that's one thing and what we're trying to do is we're going to try to insert, a row on the primary.

And Then we're going to try to read that same, row, from the secondary. And see kind of how it works and again. We'll talk about some application, best practices, but here another thing i've implemented, is retry, logic, so that if something doesn't work my app doesn't break it's just this is going to fail but it will continue. To run, okay so now, um what i can do over here is i can actually get the status, of a secondary, server, so what you're seeing is i'm saying, git azure sql database failover group, the name of the failover, group, uh, the name of the server so in this case i'm grabbing the dr, server so that's just what i called the secondary. Server. And the replication, rule and we can see it's in the secondary, role or the read only, role. Now what i can do is i can go ahead and initiate, a failover. You know you do have the ability to do auto failovers, here but i'm just initiating, a manual one for demo purposes. And now what we're gonna see, is, uh i'm just saying, switch. And then i'm saying which one i want to become the primary, so i want the dr server to go from being the secondary. To the primary. And what you can see on the right. Scroll so it's more in the middle of your screen pretty amazing, we had one. Failed, entry so you see insert, failed. Read failed, but now, it looks like, we're kind of, up and running, already. And, so, that's pretty cool how quickly you can do that failover. And then i can check the status. Of the. Dr server just like i did before the replication. Role and see that it is, the primary, once again. So it was the secondary. Now it's the primary, so that means that, i didn't have to change, anything. Uh the auto failover, group, switched the primary, and the secondary, in the back end for me, um and since i'm just connecting, to the listeners. Uh i don't have to change anything i don't go down. So that's pretty powerful, and additionally, you know i can fail back. And see sort of the same thing again. Uh now another, thing that i like to call out about these auto failover. Uh groups which i think is interesting. Is, how many read replicas. Do you think you would get. With a business, critical database. In an auto failover, group, um this would have been a good question to be a knowledge check actually. Um so if you think about it with uh your primary. In an auto failover, group. Uh you for business critical you have a read only and a read write so that's one read only. Now, if you flip over to your secondary. It's still an always-on, availability, group so you do have technically. A, read write and a read only. But, for. The auto failover, group you can use, the readable, there, as well as the primary, that's being used as a readable, for the auto failover group so i, think that works out to like, three, readable, secondaries, does that make sense bob, yeah absolutely. Awesome, cool so again you can see, we were able to fail back with just, one. Failure, and this is another example, of a lab that you can actually. Try yourself. Uh, in the microsoft learn learning path or through github. Now we're right at the top of the hour we still have some exciting, stuff to cover in the next hour but i think maybe we'll take, about a five minute break, and and and see you back to talk more about things like availability, and consistency. Really cool topics like. Accelerated, database recovery. And that exciting, challenge i'm going to throw at bob right at the end. So, stay. Tuned. Do. Welcome back, um, before, we get int

2020-08-27

Show video