Under the hood of Google Cloud data technologies: BigQuery and Cloud Spanner


Hi, and welcome to our session where we're going under the hood of Google Cloud's data technologies. I'm Leah Jarrett, a developer advocate here at Google Cloud focused on data analytics, and today I'm joined by my colleague Derek, who focuses on operational databases. We're going to be talking about two technologies: first Spanner, on the operational database side of things and the left-hand side of our screen, and then BigQuery, our data warehouse for analytical processing. These are just a few of the pieces in the broader Google Cloud data ecosystem; integrations between these two products and the other products that support data fabric, artificial intelligence, and business intelligence all go into making it simple for data practitioners and business users to power data-driven innovation on Google Cloud.

The story of both Spanner and BigQuery starts with the evolution of the separation of storage and compute. Historically, database systems have been architected with tightly coupled storage and compute: there was a dependence on locally attached storage, because attaching storage over a network would result in higher latency. More recently these constraints have been lifted, largely thanks to faster networks, which we'll talk about momentarily. With architectures like BigQuery and Spanner we now have a separation of storage and compute, and this allows us to seamlessly scale to meet your high-throughput data needs. New compute nodes can easily be added to handle large workloads, and we don't need to rebalance the data or ensure that a local replica is stored on the new nodes.

So let's take a closer look at how our distributed storage system, Colossus, is also fast, reliable, and cost effective for you. Colossus is Google's next-generation distributed file system; it manages, stores, and provides access to all of your data for both BigQuery and Spanner. Its design enhances storage scalability and improves availability. To handle massive growth in data needs, Colossus introduced a distributed metadata model that delivers a more scalable and highly available system. With all of this data stored remotely, if a single node fails, no data is lost or even made temporarily unavailable. Not to mention, remote storage is dramatically cheaper than local storage, and the proof is in the picture. Not only does Colossus handle your first-party data stored in Google Cloud, it also stores the information needed to run vital Google systems that are used by so many people all over the world.

Like we mentioned earlier, the separation of storage and compute would not be possible without the evolution of networks, which increases the speed at which compute clusters can pull data from storage and reduces latency. Just like Colossus, Google Cloud services like BigQuery and Spanner leverage the same network that powers the underpinnings of Google's systems, and we call this network Jupiter. Jupiter is made up of in-house custom network hardware and software, and it connects all of the servers in our data centers together, powering our distributed computing and storage systems.

So we've discussed our file system and the network that connects storage and compute together. Leading all the operations in our data centers is Borg, Google's internal cluster management system and the predecessor to Kubernetes. Borg provisions the compute resources needed to handle workloads for Spanner and BigQuery, running hundreds of thousands of jobs and making computing much more efficient, which allows our data centers to run at really high utilization.
So now let's see how these pieces come together throughout the life of your data. Let's take a look at Cymbal Games, our fictional gaming company that uses Spanner to support transactional, or OLTP, workloads. A new user might log into the game and enter their information. This data is then sent to Colossus for storage, in Spanner's file format. Next, this data is requested by a query sent through Spanner's compute engine, and finally the results of the query are returned to the application to populate the new user's profile when they load the page. Spanner storage and compute are both optimized for those transactional workloads, and Derek is going to talk more about that in a minute.

Even though Cymbal is using Spanner for transactional workloads, they also want to perform analytics on this information to understand how users are actually playing their game. For analytical workloads they're going to instead use BigQuery. Once the user's data lands in Spanner, they have two options. First, they can replicate the data into BigQuery storage by pushing new rows into a native BigQuery table, which provides storage optimized for analytics. Alternatively, they can use a federated query, sent from BigQuery over to Spanner, to get real-time access to information from their application. Either way, they'll benefit from BigQuery's query engine to perform scalable analyses, including built-in machine learning capabilities that power business intelligence reporting and real decision-making by business users. So now that you understand how BigQuery and Spanner come together to support your data needs, let's talk about how Spanner is optimized for those transactional workloads. Derek, over to you.

Great, thanks Leah. Cloud Spanner is Google's fully managed, distributed relational database. Spanner gives customers minimal operational overhead, with no downtime for schema changes or even maintenance windows, which allows Spanner to provide an industry-leading five-nines SLA. Because Spanner is a distributed database, it provides seamless replication and scaling across regions in Google Cloud. Spanner processes over 1 billion requests per second at peak, and all of that as a relational database that provides strong external consistency, made possible by the technologies mentioned earlier: Borg, Colossus, and Jupiter.

To set up an example, imagine that Cymbal Games has developed a game that will be played by people all around the world. These players must be able to sign up, authenticate, and play the game. Players really don't like having to redo actions, so this activity has to have transactional consistency guarantees. Another thing players don't like is downtime; if they can't play, they get upset. And it had better be fast, so Cymbal Games will need the ability to scale up to meet user demand. With availability, consistency, and scalability in mind, let's see how Cloud Spanner can help them achieve their goals.

When you create a Spanner node, you get access to Spanner resources in three or more zones. Zones are Borg cells: collections of Spanner resources that are physically separate from other cells. Spanner resources consist of Spanner servers and Colossus allocations of data storage, as discussed previously. Spanner servers provide the compute capacity for your cluster, and they contain the spanserver process, Paxos, and other Borg tasks that enable Google SREs to manage the environment.
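To make that concrete, here's a minimal sketch of what provisioning Spanner looks like with the google-cloud-spanner Python client: you create an instance in an instance configuration, which determines the zones that back it, and then a database inside that instance. The project, instance, database, and table names here are hypothetical placeholders, not anything from this session.

    from google.cloud import spanner

    client = spanner.Client(project="cymbal-games-demo")  # hypothetical project

    # The instance configuration determines which zones (Borg cells) back the
    # instance: a regional config here, or a multi-regional one such as "nam3".
    config = f"projects/{client.project}/instanceConfigs/regional-us-central1"

    instance = client.instance(
        "cymbal-games-prod",            # hypothetical instance ID
        configuration_name=config,
        display_name="Cymbal Games",
        node_count=3,
    )
    instance.create().result(timeout=300)   # long-running operation

    # Create a database with an initial Users table.
    database = instance.database(
        "gamedb",
        ddl_statements=[
            """
            CREATE TABLE Users (
                UserId  STRING(36) NOT NULL,
                Name    STRING(MAX),
                Email   STRING(MAX),
                Created TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true)
            ) PRIMARY KEY (UserId)
            """,
        ],
    )
    database.create().result(timeout=300)

Swapping the configuration string for a multi-regional config like nam3 trades some write latency for higher availability, which is exactly the trade-off we'll walk through next.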
The zones can be within a single region, as shown here, or across multiple regions. In a regional configuration your zones are close together, which provides the lowest read and write latencies, but you only get an SLA of four-nines availability. In a multi-regional configuration your zones are located farther apart, which can increase latencies, but it provides global scalability and an SLA of five-nines availability.

Now imagine that Cymbal Games has a table of users. The user data is stored on Colossus and encrypted at rest. A copy of this data is kept in at least three zones, and replication of the writes is handled synchronously by Paxos consensus, so their data will survive zone outages, and this allows for zero-downtime maintenance as well.

To understand consistency, let's go over typical transaction life cycles. When a new user signs up, Cymbal Games issues an insert statement to Spanner. Since nodes exist in multiple zones, where the insert statement lands depends on which zone is the current Paxos leader; in this example, zone 2 is our leader. A single endpoint is exposed to the client, and Spanner internals handle routing of the query. When the leader receives the write, it acquires locks and generates a TrueTime commit timestamp, then it sends the write to the other nodes in the configuration. These nodes write the mutation to something like a durable transaction log and confirm the write to the leader. After a quorum of nodes approves the write, the leader tells the nodes to commit the data, releases the locks, and returns success to the client. All of this usually happens within a few milliseconds.

We can't do a session about Spanner without mentioning TrueTime. TrueTime is a highly available internal service backed by a network of atomic-clock and GPS time masters. This service can be called to generate guaranteed monotonically increasing timestamps with microsecond granularity across all nodes in Spanner, and these timestamps are generated for writes. It is precisely because of TrueTime commit timestamps that a distributed database like Spanner can ensure strong external consistency.

So let's see how they are used for reads. Once a user account has been created, the user will want to log in and start playing, so Cymbal Games needs a consistent read to provide authentication for the user. In this example the read is executed against zone 1, since it's closest to the client making the request. But since zone 2 is the leader, zone 1 checks in to make sure it has the latest committed data. If its data is the latest, it can be returned to the client; otherwise the read waits until the node receives updates. How does the replica know that it has the latest data? With TrueTime commit timestamps: the replica compares its own TrueTime state with the latest commit timestamp to know whether it's up to date enough to serve the read.

But not all reads of the user data need to be strongly consistent. Since user data doesn't change that often, Cymbal Games can choose to issue a time-bound read, otherwise known as a stale read. To do this, they provide the maximum age that is acceptable for their query, making it time-bound. This allows the Spanner node closest to the client to respond to the query without incurring the additional round-trip latency of communicating with the Paxos leader. No matter how fast the Jupiter network is, if you can cut out latency across zones, you should.
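Here's what those write and read patterns look like from the client's side: a hedged sketch using the Python client against the hypothetical gamedb database from the earlier sketch. The write is routed to the Paxos leader and committed with a TrueTime timestamp, the strong read is guaranteed fresh, and the time-bound read trades a little staleness for lower latency.

    import datetime
    from google.cloud import spanner

    client = spanner.Client(project="cymbal-games-demo")        # hypothetical
    database = client.instance("cymbal-games-prod").database("gamedb")

    # Write: routed to the Paxos leader, which acquires locks, generates a
    # TrueTime commit timestamp, and commits once a quorum acknowledges it.
    def create_user(transaction):
        transaction.insert(
            table="Users",
            columns=("UserId", "Name", "Email", "Created"),
            values=[("user-123", "Ada", "ada@example.com", spanner.COMMIT_TIMESTAMP)],
        )

    database.run_in_transaction(create_user)

    # Strong read: guaranteed to observe everything committed before the read.
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT Name FROM Users WHERE UserId = @uid",
            params={"uid": "user-123"},
            param_types={"uid": spanner.param_types.STRING},
        )
        print(list(rows))

    # Stale (time-bound) read: any replica within 15 seconds of freshness can
    # answer, avoiding a round trip to the leader.
    with database.snapshot(max_staleness=datetime.timedelta(seconds=15)) as snapshot:
        rows = snapshot.execute_sql("SELECT COUNT(*) FROM Users")
        print(list(rows))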
With availability and consistency covered, Cymbal Games still needs scalability out of Spanner, so let's look at that. As a first step, as Cymbal Games gets closer to launch, they can increase the number of nodes from pre-prod levels to prod levels simply by updating it in the Cloud Console. Next, as the user data grows, Spanner will start to create splits of the data. It does this by detecting CPU load increases and hotspots, and it splits the data into chunks to address these issues. These splits can be hosted on different Spanner servers, backed by different Colossus allocations, which helps distribute the load. Data splits, together with increasing the number of Spanner nodes, give Cymbal Games the ability to scale as much as they need to.

But to avoid headaches down the road, they need to be sure to place related data in the same split using interleaving. Imagine user activity data that relates to the user data: by interleaving the user activity table with the user table, Spanner changes the splitting logic to make sure that the data stays together on Colossus. In this way, Spanner scales easily while also keeping performance high, by avoiding the additional locking required to read or write across multiple splits. Managing this distributed database for transactional workloads is only possible because of technologies like TrueTime, Borg, Colossus, and Jupiter. So now Leah is going to take it over and talk about analytical workloads.

Great, thanks Derek. Now that you understand how Spanner is optimized for transactional workloads, let's talk about how BigQuery's architecture supports analytical workloads. Let's start by adding the new user data into a native BigQuery users table. As soon as you push data into a native BigQuery table, the data is transformed into a file format called Capacitor. Capacitor uses what we call column-oriented storage, meaning each column in the table is stored in a different area of the file, so the amount of data that's actually scanned is proportional to the columns you query. If you think about it, this is optimal for analytics, because you're often aggregating just a few columns over lots and lots of rows. Plus, each column is also independently compressed; we use different types of encodings to optimize storage, which makes it easier for Dremel, our query engine, to quickly find and use the data it needs for your query.

Even better, you can choose to cluster or partition your data stored in Capacitor so that Dremel can eliminate certain files or blocks from the query. Let's look at a quick example. Say we have our users table and we've decided to partition it based on the date the user was created; we're also clustering, or sorting, our table based on the game ID. Now, when we filter our data based on created date and game ID, Dremel uses the file headers in Capacitor to reject an entire file or file block if it's not needed for the query. This means that less data is scanned, and your query is both cheaper and faster.

So in Colossus your data is stored in Capacitor, and it's automatically compressed, encrypted, replicated, and distributed. 100% of your data is encrypted at rest, just like Derek mentioned with Spanner, and Colossus also ensures durability using something called erasure coding, which stores redundant chunks of data on multiple physical disks. Immediately upon writing data to Colossus, BigQuery starts a geo-replication process, mirroring all of the data into different data centers around the specified region or multi-region, depending on how you set up your BigQuery dataset, which means you have backups of your data in case of disk failure or data center outages.
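Going back to the partitioning and clustering example, here's a minimal sketch of what creating that kind of table looks like with the google-cloud-bigquery Python client. The project, dataset, and column names are hypothetical, and a real game schema would obviously be richer.

    from google.cloud import bigquery

    client = bigquery.Client(project="cymbal-games-demo")  # hypothetical

    schema = [
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("game_id", "STRING"),
        bigquery.SchemaField("created_date", "DATE"),
    ]

    table = bigquery.Table("cymbal-games-demo.analytics.users", schema=schema)

    # Partition by the date the user was created...
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="created_date",
    )
    # ...and cluster (sort) within each partition by game_id, so Dremel can
    # prune files and blocks that can't match a filter on these columns.
    table.clustering_fields = ["game_id"]

    client.create_table(table)

Queries that filter on created_date and game_id can then skip whole partitions and clustered blocks, which is where the "cheaper and faster" claim comes from.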
Okay, so we've talked a little bit about BigQuery storage; now let's talk about compute. Dremel, BigQuery's query engine, is made up of something called slots. These are Google's proprietary units of CPU, RAM, and network, and they're used to execute the SQL queries. BigQuery automatically calculates how many slots are required by each query, depending on the data size and the complexity of the query itself. You can think of a slot as a worker: slots are responsible for executing parts of the active query independently and in parallel, they perform partial aggregations, and they read and write data back to storage. The awesome thing about BigQuery's architecture is that you can actually purchase slots and reserve them for certain workloads in your organization, which means you can scale your throughput and your runtime as needed.

Now let's go a little bit deeper into query execution. First, when you submit a query, BigQuery breaks it down into different stages, which are processed by those slots. Intermediate results are written back to what we call the remote memory shuffle. Since data here is usually stored in memory, it's really fast for slots to read and write data from the shuffle, and each shuffled row can be consumed by slots as soon as it's created. This makes it possible to execute distributed operations in a parallel pipeline, which means we can analyze huge amounts of data really fast. And if a worker has an issue during query processing, another worker can simply pick up from where the previous one left off, because it's able to read from the previous shuffle. This adds an extra layer of resilience to failures within the workers themselves.

Okay, so let's say we're running a query to count how many users joined Cymbal Games yesterday, before feeding them into our ML model. In the first stage of the query, slots read the columnar files from Colossus in parallel; each slot processes a columnar file block, filters the data, and then writes its partial aggregations to the shuffle. In the next stage, a slot reads the data from the shuffle and combines those partial aggregations together, then writes the results back to Colossus so they can be served as a cached table back to our user.

So how do these architectural distinctions of BigQuery actually translate to business value for you? Well, first of all, we talked about how BigQuery's optimized storage and compute allows for super fast queries over huge data volumes, meaning your business will be empowered to make informed decisions without ever missing a beat. Next, we showed how BigQuery helps you save money on data storage and analysis, so you spend fewer resources answering business-critical questions. We also talked about how BigQuery storage ensures reliability and durability, meaning there's a very low risk of losing information and interrupting business processes. And finally, the ability to purchase and dedicate slots to specific workloads means you can scale throughput and runtime so that you continue to support your business analytics as your data and your use cases grow. And don't just take our word for it: Google Cloud has hundreds of BigQuery customers with petabyte-scale data warehouses.
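Before we wrap up, here's a hedged sketch of that "users who joined yesterday" aggregation with the BigQuery Python client, using the same hypothetical project and table names as before. The job statistics printed at the end are where partition pruning and the stage-by-stage slot and shuffle pipeline described above become visible.

    from google.cloud import bigquery

    client = bigquery.Client(project="cymbal-games-demo")  # hypothetical

    query = """
        SELECT COUNT(DISTINCT user_id) AS new_users
        FROM `cymbal-games-demo.analytics.users`
        WHERE created_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    """
    job = client.query(query)

    # Iterating the result blocks until the job finishes; BigQuery writes the
    # final result back to Colossus and can serve it as a cached table.
    for row in job.result():
        print(row.new_users)

    # Partition and cluster pruning show up as fewer bytes processed, and the
    # query plan lists the stages that the slots executed via the shuffle.
    print(job.total_bytes_processed)
    print([stage.name for stage in job.query_plan])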
So thanks, Derek, for walking us through Spanner, and thank you all so much for joining us today to dig into Spanner and BigQuery under the hood. We've provided some resources here to help you get started using both Spanner and BigQuery, and I've also listed some sessions going on at Next related to data analytics and databases. Thanks again for joining us; we hope you enjoyed this session and that you enjoy the rest of your time here at Next.

2021-11-11
