Cloud Spanner and Cloud Bigtable: Modern, Distributed Databases that Scale (Cloud Next ‘19 UK)
Welcome, everyone, and thanks for joining us at 5:30 on a Thursday evening to talk about in-depth topics on distributed databases. If that doesn't interest you, then you're in the wrong place, but welcome, and thanks for joining us. My name is Adam Levine, this is Sharon de shet, and we're going to do a little deep dive into Cloud Spanner and Cloud Bigtable. Before we get started, just a friendly reminder to please fill out the surveys. We love feedback, and those will open 20 minutes into the session.

As I mentioned, my name is Adam. I'm a product marketer based out of San Francisco. Sharon is a big data specialist based out of Tel Aviv, so we figured London was the most convenient place to get together, have a cup of coffee, and talk about databases.

What we're going to do today is talk a little bit about why managed databases; I promise to make that brief. Then we'll get into the depths of Spanner and how it works, and Bigtable and how it works. We'll do a little demo, and then we will talk a little bit about some of the migration and modernization options, particularly to Bigtable. If we have time at the end, we can do some Q&A. So let's get going.

A discussion on Spanner and Bigtable wouldn't be complete without a little bit of history to begin with. Google, as a company, has been building tools and services that help people in their lives for as long as Google has been around. One of the things that enables that is data, and data and the infrastructure to manage that data underpin all of what Google does. Google has been tackling big data problems for 15 years. Some of those innovations have resulted in famous open-source projects like MapReduce, HBase, things like that, and they've also resulted in commercial products that you have access to build on within Google Cloud. But it
all comes down to managing data at scale, and along the way we've learned a few important lessons. Before we talk a little bit about those important lessons, a quick story. When Bigtable was first
created and first put into production, it was handed over to developers within Google to manage and run themselves. So a developer had to spend time maintaining and managing Bigtable and then building the application that ran on top of it. What the Bigtable team soon discovered was that this wasn't efficient: all of these developers were spending their time operating this thing instead of adding business value and building code on Bigtable. So they decided to build a managed platform that developers could access and build on top of, but where the management would be centralized. This had two big effects. One was that developers could just build and focus on adding business value. The other was that the team managing this thing centrally was able to discover bugs and edge cases and solve them at a much higher rate, before they became bigger issues. A challenge discovered by one person over here was solved before it hit someone over there, and that was a really big advantage.

The same lesson holds true for all companies building and running applications in the cloud. By relying on managed services, you're able to reduce your operational overhead and toil. Instead of worrying about availability, upgrades, security, and hardware patching, you can let a managed service handle that for you, and you can focus your energy on higher-priority work, like adding business value.

That's not the only lesson in managing data at scale; there are a few other things. But the overarching point is to let specialized systems that operate at scale manage that data, so you can focus on adding business value.

Taking a quick step back, there are many databases out there. DB-Engines.com, if you're familiar with them, tracked 343 as of a couple of months ago. Each line on this slide represents
one of those databases, and I actually got tired of clicking to load them, so I don't even think this is all 343. So you have lots of choices. In addition, Gartner is saying that by 2022, 75% of all databases will be deployed on or migrated to a cloud platform. Managing infrastructure and the databases that run on it isn't a core differentiator for most companies, so whenever possible you want to take advantage of a managed platform. Then your database choice comes down to what your application needs, what your industry is, and really what your cloud provider offers.

On GCP we have a wide range of databases, including systems that are built and managed by Google; we're going to talk about Spanner and Bigtable in two seconds. I also wanted to point out that we have a range of options provided by partners, and you can really run anything you want on GCE, which you see over on the far side. Today we're going to focus on Cloud Spanner and Cloud Bigtable, so a few minutes in, let's dive into Cloud Spanner.

We have to start with a discussion of what's tricky with databases. Cloud Spanner addresses these challenges with a combination of scalability, manageability, high availability, transactions, and complex queries. It's a lot of "ands". I say combination because there are other systems out there that address this in sort of an "or" fashion.
So it's scale, or transactions, or replication, or high availability. But with Cloud Spanner we've created an "and" statement: we're taking the best aspects of relational databases and distributed databases and combining them. That's not to say that the 30 years of features in existing relational databases all exist in Cloud Spanner, but we've combined relational semantics with horizontal scale.

As we go deeper, we have to start understanding what the differences are between Cloud Spanner and traditional relational databases. One is that the application is in control of its data when you use Cloud Spanner. Traditional databases have a lot of complexity: they have stored procedures and other business logic inside the database engine. With Cloud Spanner, that's all pushed to the application, and that allows Cloud Spanner to scale really well.

So Sharon is going to talk through how Cloud Spanner works and how it's different. It's really helpful to understand how it's built and how it works in order to understand how to approach building on it.

Thank you, Adam. To reiterate what Adam mentioned about having the best of both worlds: Spanner is both relational and also highly scalable, like NoSQL databases. I have a personal story I would like to share with you. I started in the late 90s as an Oracle and SQL Server DBA. Back in the day, even as a developer who was not part of the production team, we had to do a lot of administration and maintenance tasks, and many things that were part of your source control, like exchanging partitions, vacuuming, and rebuilding indexes. Some of this stuff is really like house chores; you would rather concentrate on your business logic. So we did something like 60%
administration and 40% business logic. But I think the worst was the inability to scale. Whenever we wanted to introduce a new workload, or whenever a new customer joined the production system, whenever we needed to scale, there was very expensive hardware involved and long-term upgrade plans. This was a hassle, and it hampered both innovation and the business. And whoever could afford it
would buy a very expensive appliance or use a shared storage system, like some of you know; I will not mention names.

Moving forward to around 2010, when the big data disruption happened: Hadoop came, and a lot of NoSQL and columnar databases based on commodity servers were coming to the market. I remember me and my data team sitting over coffee and thinking: what if we could have both, a relational SQL system that can scale like NoSQL? It looked like a dream, really. So when we heard about Spanner a few years later, at a Google conference like this one, we were very pleased. This is like a full circle for me.

What are the building blocks of Spanner that make it scalable, highly available, and relational? They are these. The Google network: for those of you who don't know, we have a global, private backbone network that is very fast. TrueTime: TrueTime is our globally synchronized clock, and we are going to speak about it later, to understand how TrueTime can make us both relational and scalable. We made some optimizations to the Paxos algorithm, the famous Paxos algorithm, for two-phase commit. And on top of that, we have automatic rebalancing of the shards of the table. All of these explain why Spanner is highly available, performant, scalable, and relational.

Every big data talk has its CAP theorem minute, and we are no different here. In a distributed system we cannot have all three guarantees of partition tolerance, availability, and consistency. The traditional relational system would sacrifice availability for the sake of consistency, and the NoSQL system would sacrifice consistency for availability. But what about Spanner? Do we break the CAP theorem? The answer is no, we don't break it, but we minimize the chance of a network partition
because we have this highly reliable, redundant network, so we can have a very high SLA. But if we have to sacrifice something, we will sacrifice availability, because this is a relational system. A fun fact: Eric Brewer, who created the CAP theorem, is working at Google, and he has written an article about TrueTime, Spanner, and the CAP theorem, if you would like to check it out.

What we see here is a regional instance in Spanner. We have two configurations, regional and multi-regional. The regional one is under a four-nines SLA, and recently we announced that even with one node or two nodes we can have a four-nines SLA. This is a big improvement: you can start small with Spanner, do a POC, and then scale out.

What you see here is also something very interesting: we have a separation between storage and compute. A node in Spanner is a unit of throughput and compute; it does not have any storage. The storage is elsewhere; it's distributed.

Every Spanner instance can have more than one database. We can have up to 100 databases sharing the same configuration and the same instance, to enjoy the same resources. But as far as multi-tenancy goes, in Spanner we usually do not design a database per tenant; we use the primary key for that. Databases are for sharing the same resources.

In a multi-region configuration we have three types of replicas. In this example you see a Spanner instance across three continents, and the US is the main region. In the main region we have what we call read-write replicas, which are replicas we can write into, and we have something called a witness replica, which ensures a quorum even if a read-write replica is gone. In addition to that, in the other regions we have read replicas. This is very performant: the readers can
read close to their zone, and we have this global high availability.
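As a rough illustration of the replica roles just described, the sketch below models which replicas count toward a write quorum. The replica names and the five-voter layout are hypothetical, chosen only for the example. The one assumption carried over from Spanner's documented behavior is that read-write and witness replicas vote, while read-only replicas do not. It is a toy model, not how Spanner is implemented.

```python
from dataclasses import dataclass

# Replica kinds that take part in voting on a write (assumption stated above).
VOTING_KINDS = {"read-write", "witness"}


@dataclass(frozen=True)
class Replica:
    name: str            # hypothetical label, e.g. "us-a-1"
    kind: str            # "read-write", "witness", or "read-only"
    healthy: bool = True


def has_write_quorum(replicas):
    """Return True if a strict majority of the voting replicas is healthy."""
    voters = [r for r in replicas if r.kind in VOTING_KINDS]
    healthy_voters = [r for r in voters if r.healthy]
    return len(healthy_voters) > len(voters) // 2


def layout(down=()):
    """Build a hypothetical multi-region layout, marking some replicas as down."""
    spec = [
        ("us-a-1", "read-write"), ("us-a-2", "read-write"),
        ("us-b-1", "read-write"), ("us-b-2", "read-write"),
        ("us-c-w", "witness"),
        ("eu-1", "read-only"), ("asia-1", "read-only"),
    ]
    return [Replica(name, kind, healthy=name not in down) for name, kind in spec]


if __name__ == "__main__":
    # All replicas up, then one location down, then that location plus the witness.
    print(has_write_quorum(layout()))
    print(has_write_quorum(layout(down={"us-a-1", "us-a-2"})))
    print(has_write_quorum(layout(down={"us-a-1", "us-a-2", "us-c-w"})))
```

In this layout there are five voters, so losing both read-write replicas in one location still leaves three votes. The witness holds no data to serve, but its vote is what keeps a majority reachable in that case.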
Often, when we speak about Spanner, we speak about external consistency. External consistency means that the system behaves as if all transactions were executed sequentially, even though Cloud Spanner actually runs them across multiple servers. So it acts like a one-node database, but it is a distributed system. The reason is twofold. One factor is the very fast network, our synchronous replication, and the customizations we made to the Paxos protocol. The other factor is TrueTime.

TrueTime is often mentioned as one of the building blocks, and as what makes Spanner tick, pun intended, because TrueTime is a globally synchronized clock. In each one of the Spanner zones we have a combination of both GPS and atomic clocks, and each of these clock types compensates for the other's failures. In addition to that, we bring the uncertainty of the network into the timestamp we attach to every write and read. Even if we synchronize the local time with the reference time every 30 seconds, we can have drift in the local clock, and the uncertainty can be as much as 2 milliseconds. This is used as an epsilon, as part of the formula that Spanner computes when it has to attach the timestamp to the transaction.

So why am I telling you all this? It sounds complicated. This is how Spanner makes sure that one timestamp does not overlap with another, and this is what makes it sequential, so there are no collisions. The readers can read multiple versions, so it is also a multi-version database, and the readers can read a consistent version of the data without holding locks. This is rather revolutionary in distributed systems.

On top of TrueTime and the replication, we have automatic table splits. The keys in Spanner are ordered and split
among the nodes of the cluster in what we call key ranges, or splits. Each node can have one or more splits of the table; in our example of three nodes, each node has three splits, and we will get back to splits. Every split has one leader that is allowed to write into the split, and two replicas. This explains why we are so scalable, performant, and highly available: we have three replicas of each data unit.

Okay, so we talked about the wonders of Spanner, but at the end of the day this is a distributed system with networking in between. So we do have some best practices around primary keys, child tables, and indexes that you need to keep in mind when you start your adventure with Spanner.

First, how do you keep a parent-child relationship? You heard from Adam that we don't have many knobs to turn; it is not like the classic relational system with triggers and integrity constraints. But if you do want to co-locate the child key and the parent key on the same physical node, you should use the INTERLEAVE keyword, like in this example. Here, if we interleave the child table, Albums, in the parent, Singers, they will be co-located together on disk. This is how it looks when we don't use the INTERLEAVE keyword: we have two separate tables, Singers and Albums. And if we do use INTERLEAVE, it looks like this: the Albums rows are co-located with their Singer.

We have several types of indexes in Spanner. Automatically, every primary key has a unique index. We also have the ability to create an interleaved index or a non-interleaved index. We have other types of indexes, like null-filtered indexes, which are indexes without NULLs, because by default NULLs get indexed. And we also have covering indexes; whoever
has worked with MS SQL knows the term. A covering index helps us avoid lookups to the base table, when this is applicable.

And remember the splits we talked about in the scalability chapter? We used an example of a monotonically increasing primary key, and in fact this is an anti-pattern; we don't encourage it. With a monotonically increasing primary key, the latest records will all be appended to only one node. In this example that is split number eight, and we don't want one
leader to take all the heat from the new records. This is why it is recommended to distribute the keys by using a unique ID, or by using field promotion, or salting, or some other mechanism that gives us an even distribution across the splits.

And finally, we have some new features in Spanner. We talked about the SLA guarantees for one and two nodes. We have some new graphs in our monitoring system in the console. We introduced a JDBC driver, we introduced support for Hibernate, and we added some more security controls. And with that, let's get back to Adam to speak about Bigtable.

Right. I told you we had a lot to cover, and it's sort of a whirlwind deep dive, so we're moving right on from Spanner to Cloud Bigtable.

Cloud Bigtable is our scalable, high-throughput, low-latency data store. If you're familiar with the types of NoSQL databases, it's a wide-column or key-value data store. It's really good for low-latency random data access. It's often partnered with BigQuery for a lot of workloads, particularly around real-time analytics and doing machine learning and AI on top of lots of, say, log data.

The really nice thing about it is that performance scales linearly as you add nodes, and that's a completely online operation. The same is true with Spanner: any sort of scaling procedure, or with Spanner any schema change, is all online, so there's no such thing as planned downtime for these databases.

One other thing to note about Bigtable is that it's fully compatible with the HBase client. So if you have HBase that you're running yourself, it's relatively straightforward to move to Bigtable and use that. One use case that we'll talk about a little later is also moving
from Cassandra to Cloud Bigtable, because those can have similar data models and similar use cases.

One thing to note about Cloud Bigtable is that it's fairly well integrated with the rest of the GCP ecosystem. You can query Bigtable directly from BigQuery, it's integrated with Dataproc and Dataflow, and, most importantly, it's integrated with TensorFlow. There are a lot of people building ML models on top of Bigtable and using that to process and then serve personalization data.

So the most common use case for Bigtable is things that fall under a very broad umbrella of personalization. This is really high-throughput reads, low-latency writes, and that integration, where you are doing predictions on clickstream data or you want to create a unique user experience based on actions. You can use all that Bigtable offers to do this, and we see lots of customers doing this today.

The linear scalability of Bigtable makes it very sensible for ad tech, fintech, gaming, IoT, and other use cases where you just have tons of data coming in and you need a place to put it and then access it later. As I mentioned, Bigtable is often used with BigQuery, and this is the very high-level marketecture diagram that we see for this wide umbrella of personalization workloads.

That's just a quick run-through of Bigtable. Now Sharon is going to talk a little bit about what's going on under the hood with Bigtable and how it's able to achieve such throughput and performance.

Thank you, Adam. No personal story here, but we have a very nice demo coming. First, a little bit about the general terminology around Bigtable. We have an instance, and an instance is a container of clusters; we can have up to four clusters. We have them in various
zones and regions, so each one of these clusters is attached
to a zone. We have nodes, which are also called tablet servers; we are going to speak about what a tablet is. And you can attach storage that is either SSD or HDD. Of course, for production use cases we prefer SSD, which has a latency of a few milliseconds, but there are some use cases where you don't care about latency as much as cost, so you can use HDD as well.

In Bigtable, as in Spanner, the nodes are a unit of compute and throughput. They do not have storage of their own, because the storage is separated from the compute, and we will speak about it later. What is very nice about Bigtable is that we can scale easily by adding more nodes. Each node we add to the cluster is worth roughly 10,000 QPS, so you can scale up and down according to your requirements: your throughput, your performance, or your cost planning. And you can see here that it scales very easily.

We talked about the nodes being separated from the storage, but there is another wonderful thing happening here: automatic rebalancing. Every one of the nodes is a throughput unit that is responsible for writing to and reading from the storage of the Bigtable system. But once we see that one of these nodes is more loaded than the others, the router layer of Bigtable can automatically place a shard on another node that is less busy. Each one of these shards is called a tablet, and this is why the nodes are also called tablet servers. You can think of a tablet as a logical unit handled by only one node. For those of you who have handled Cassandra or HBase, it's very similar to regions or partitions.

Now a little bit about data modeling. In Spanner, we said there are some best practices everyone wants to know and needs to know. In Bigtable this is mostly around modeling the key, because, as you heard from Adam, this is a key-value system, so
the atomicity of a transaction is only in the context of one row; it does not cross rows. This is why it's very important to model the key properly. The only index in Bigtable is the key index, so if you need additional indexes, you would probably create additional tables or use some of the server-side filtering that happens after the blocks are retrieved from storage.

What you see here is the column families. Column families are a way to physically group together columns that have common characteristics, so you can co-locate them physically. Within each column family we have one or more columns, and the system is very sparse. What do I mean by that? In one row you can have 100 cells and in another you can have only 50 cells, so you don't pay for what you don't write. Each one of these cells has multiple dimensions: the cell is per column family and per column, but it's also versioned. So you can do upserts and write versions in the same cell, and there is a garbage collection that collects the older versions. You can play with the configuration and decide that you want to keep all versions, or keep only the last version. In addition to that, you can also configure a time to live, a TTL, at the column family level to control how historical data ages out. We see many systems like that in ad tech and in monitoring; they use both the garbage collection configuration and the TTL.

This is one of the most important design tasks you will have to do per table in Bigtable: deciding on the key. We try to avoid keys that create hotspots. For example, we have an IoT system here, and we would like to monitor metrics of a device, like memory, CPU, and storage. In this example, if we model the key around only the memory usage across
all the devices, we will create a hotspot. Adding the timestamp can alleviate some of this problem, but it also introduces another problem, sequential writes, which can create hotspots as well. So what we propose here is doing a field promotion, by adding the user (the user in this example is the device) to the key. I have learned from many customers that when they chose the wrong key, the performance was not as expected, and when they did the rethinking and designed the keys according to these best practices, they got the few-millisecond latency they expected.
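To make the field-promotion idea concrete, here is a minimal sketch of a row key that leads with the device, then the metric, then the timestamp. The `#` separator, the field order, the zero-padded timestamp, and the helper names are assumptions made for this illustration; they are not taken from the slides, and the sketch only builds byte strings without calling any Bigtable client.

```python
def row_key(device_id: str, metric: str, epoch_seconds: int) -> bytes:
    """Build a row key with the device promoted to the front of the key.

    The timestamp is zero-padded to 10 digits so that lexicographic order
    matches chronological order. Device IDs are assumed not to contain '#'.
    """
    return f"{device_id}#{metric}#{epoch_seconds:010d}".encode("utf-8")


def prefix_range(device_id: str, metric: str):
    """Return (start, end) keys covering every row of one device and metric.

    The end key is the prefix with its last byte incremented, so the range
    is the half-open interval [start, end).
    """
    prefix = f"{device_id}#{metric}#".encode("utf-8")
    end = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, end


if __name__ == "__main__":
    # Rows are stored in key order, so sorting imitates the table layout.
    keys = sorted(
        row_key(device, metric, ts)
        for device in ("dev-a", "dev-b")
        for metric in ("cpu", "memory")
        for ts in (1552300060, 1552300000)
    )
    start, end = prefix_range("dev-a", "memory")
    for key in keys:
        if start <= key < end:
            print(key.decode("utf-8"))
```

Because writes from different devices start with different prefixes, they land in different parts of the key space instead of piling onto one range, while all readings for a single device and metric stay contiguous and can be fetched with one range scan.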
This just summarizes what we have said so far. What we haven't spoken about is some best practices around the size of a single cell and the size of a single row. We recommend that a cell be no more than 10 megabytes and a row be no more than 100 megabytes. If you have a very large row, you will start to see some warnings in the monitoring system, as we will show you later.

These are the common API operations in Bigtable. The most used are put and get, to write and read a single key. Also very popular is the range scan. If we would like to create a time-based monitoring system, or anything else that is time series, we can use a range scan to read more than one key in the same API call, in a very performant manner.

And this is about writes. Reads are very fast in Bigtable, but writes are even faster. This is because every mutation, every update of a row in Bigtable, is written first into memory for consumption and only then flushed to disk, in the form of what we call an SSTable. An SSTable is the most optimized way to keep the keys ordered and the values in the same place. We also have a commit log: every row we write in Bigtable is written first to a commit log, to assure us that even if the node crashes we can still recover the transaction.

And finally, just before the demo (bear with us, because the demo will be interesting), this is about monitoring. Monitoring in Bigtable can be done using the console, where we have very useful graphs for CPU, storage, and throughput. We can also use Stackdriver; Stackdriver is the uber monitoring, tracing, and auditing suite we use for all GCP products. One of our customers, you may know them, Spotify, created an open-source project for autoscaling Bigtable programmatically, based
on metrics from Stackdriver. Whenever they detect a node that is overloaded as far as storage or CPU, they scale the cluster automatically. So this is the project by Spotify. We can also add some client-side monitoring, by using OpenCensus for example.

And finally, we have Key Visualizer, and this is a very interesting piece of engineering. It is a heat map that shows us, in a visual manner, a lot of dimensions of performance in the Bigtable cluster. Horizontally we can see the timeline, so we can see summaries over time of CPU, of reads and writes, and of row distribution. Vertically we can see the entire key space, by prefix; if the tool can make sense of the key, you get a hierarchy of prefixes on the vertical axis. And a heat map is a heat map: all the cold values are dark, and all the hot values are very bright.
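The heat-map idea, time on one axis and key prefix on the other, can be sketched in a few lines. This is only a toy aggregation to show what each cell of such a heat map represents; it is not how Key Visualizer is implemented, and the event format, the `#` separator, and the function names are assumptions made for the example.

```python
from collections import Counter


def heat_map(events, bucket_seconds=60, prefix_depth=1, sep="#"):
    """Count operations per (time bucket, key prefix) cell.

    events: iterable of (epoch_seconds, row_key) pairs, one per read or write.
    prefix_depth: how many leading key fields form the vertical-axis label.
    """
    cells = Counter()
    for ts, key in events:
        prefix = sep.join(key.split(sep)[:prefix_depth])
        cells[(ts // bucket_seconds, prefix)] += 1
    return cells


def hottest(cells):
    """Return the (time bucket, prefix) cell with the most operations."""
    return max(cells, key=cells.get)


if __name__ == "__main__":
    events = [
        (0, "dev-a#cpu#1"), (10, "dev-a#memory#2"), (59, "dev-b#cpu#3"),
        (60, "dev-a#cpu#4"), (61, "dev-a#cpu#5"), (62, "dev-a#cpu#6"),
    ]
    cells = heat_map(events)
    for (bucket, prefix), ops in sorted(cells.items()):
        print(bucket, prefix, ops)
    print("hottest cell:", hottest(cells))
```

Raising `prefix_depth` drills one level down the prefix hierarchy, which is the same move as expanding a prefix on the vertical axis to see which sub-range is actually bright.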
These are some of the common patterns we can see in Key Visualizer. For example, you can see periodic usage, where the entire key space is affected at once by something that is happening, normally a batch MapReduce or Apache Beam job. The diagonal pattern you see here at the bottom right is also interesting; it's indicative of a sequential scan or write, usually again by MapReduce or one of the processing frameworks.

So we can move to the demo. But first I have to tell you something about it. In the demo scenario we are going to walk you through how to monitor, troubleshoot, and pinpoint performance issues in the cluster, and we are also going to demonstrate how to scale the cluster and the effect of scaling.

We have an event table, and the event table holds the time-based trading events of a trading platform. It has a four-kilobyte average row size, and we begin with a six-node cluster. In this scenario, the trading company decided to start with a historical backfill, so they would fill the cluster with some historical data before it goes to production. When it becomes production-ready, they will add replication for high availability and load balancing, they will add application profiles, they will have an Angular real-time UI, and they will have streaming. But in our scenario we are focused on the historical backfill.

We have some readers in the system, data scientists and data engineers, who are starting to train on the data to do some prediction and time-based analysis. They are complaining about slowdowns and timeouts, and we wanted to know what happened. So we took a recording of the system after we asked the readers to stop reading, so we can see what is happening in the cluster.

Without further ado, let's look at the recording. What we see is that the cluster is starting to fill up with the historical data. We can see we have six nodes and
relatively high CPU, so we go straight to the monitoring pane to see what is happening. We can see very, very high peaks of CPU, above the recommended threshold, and we can see that these spikes are coming from writes, not from reads. So we know something is related to writes. We can also see this high throughput, which correlates with the peaks we saw.

Now we will use Key Visualizer to understand more. Usually you start with the monitoring console, and when you would like more insight, like what is happening in the key space, whether we have warnings about large rows, what is hot and what is not, you go to Key Visualizer. So we can go to Key Visualizer now. We can open it directly from Stackdriver, from the Resources menu. We have both daily scans and hourly scans of the system, and we go to the last hourly scan. What we see here is the Ops metric, which is the aggregation of reads and writes per key, and we see a periodic
pattern that is coming strictly from the CPU of writes, from the writes. We can also see very high latency, of a few seconds, which is not expected. So this is what we will try to tackle: lowering this latency, which can explain the peaks we saw in the CPU.

We can also look at the time-based summary on the horizontal axis, and on the key space on the vertical axis we have another metric showing the distribution of rows among the buckets of keys. Key Visualizer tries to divide the rows of the table evenly between what it calls key buckets, and these are mainly for visualization. In this example we don't have an even distribution yet, because this is a relatively new cluster and we don't have the entire key space yet, so this is rather normal.

We happen to know that we have Dataflow, our managed version of Apache Beam, ingesting the historical backfill, so this is most certainly the problem, because we saw that it is coming from writes. We also know that we have Airflow, or more precisely our managed version of Airflow, Composer, orchestrating the ingestion. So let's go to Composer, to the Airflow UI, and pick the last ingestion task to see the name of the Dataflow job. The purpose is to take the name and go to the Dataflow UI, to correlate what we see there with what we have seen so far.

So this is the name of the Dataflow job, and this is the Dataflow UI. We go to the Dataflow job and we see a throughput that matches what we saw in the Bigtable UI. And if we go to the last transformation stage, we can also see the number of rows we write and the storage we save in every one of these ingestion jobs. So this is pretty heavy, and it can explain why the clients are imposing too much stress on the server. So
Our conclusion at the end of all of that is that we need to scale the cluster: we need to scale Bigtable to be able to soak up all this pressure from the clients coming from Dataflow. And this is what we have done. We scaled the cluster from 6 to 12 nodes, and after about 20 minutes we started to see an improvement: the CPU went lower, the throughput went higher, and the readers started to work with the system without any complaint. This was done very easily, because scaling in Bigtable can be done programmatically, from the UI, or from the command line.
So let's look at the recording of the system after we fixed it. We can see in the graph that the CPU went down, and, in correlation with the CPU being back to normal, below the recommended threshold, we go to the throughput graph and see that throughput went up. So we can probably have more frequent writes now and more workloads in the cluster. If we look at this in the perspective of four days, it becomes even more apparent how much of an effect the scaling had, and we now have room to breathe in the cluster.
What is happening in Key Visualizer now? Again we go to the last hourly scan. We don't expect this periodic pattern to go away, because we are still ingesting, but we do expect latency to go down, so let's look at the latency. The max latency should be around a few milliseconds, and it is around a few milliseconds. So now we are not choking, and this is what we expect of the system. And this was done with only a few minutes of troubleshooting and a few clicks. This concludes the demo.
We have a few more things to say about integration in Bigtable and migration paths. Bigtable inspired all the systems you see here, most noticeably HBase, which is one of the major parts of the Hadoop ecosystem, and also
Cassandra, which was open sourced by Facebook in 2010. Cassandra was inspired by more than one system, but mainly by Bigtable.
So you must be wondering why we would want to move from a system like Cassandra, for example, to Bigtable. The main reason has to do with the operational burden because, as I have seen myself, it has a lot to do with tuning: tuning consistency levels, tuning memtables, tuning bloom filters. We don't have all this burden in Bigtable. We have very low-touch replication, we don't have to deal with a lot of discovery and network topology, and you can scale up and down very easily and save cost according to your throughput and performance needs.
Finally, we have interesting platform integrations in GCP, like Adam showed you. For example, in most of the recommendation and personalization products we use a combination of BigQuery and Bigtable, and all this integration makes this even more powerful.
So I think we concluded before time. Yeah, so you can go on. Thank you, thank you.