Cloud OnAir: What's new in BigQuery

Welcome to Cloud OnAir, live webinars from Google Cloud. We host webinars every Tuesday. My name is Tino Tereshko, and today we will be talking about what's new in Google BigQuery. You can ask questions anytime on the platform, and we have Googlers on standby to answer them. Let's get started.

As I mentioned, my name is Tino and I'm a product manager on Google BigQuery. I'm joined by Jordan Tigani, one of our engineering leaders on Google BigQuery, to help me with the Q&A. You can always reach me on Twitter at the handle down there, and the same goes for Jordan — I'm going to volunteer him for that. We love talking to our customers, we love talking to people who are using our platform. We want to know when things are going well, when things are not going well, and what you would like us to build next, so please feel free to reach out to us. At the same time, you can of course follow along with our progress in our release notes. You can reach us on Stack Overflow, and we have a public issue tracker where you can submit feature requests. We're even active on Reddit, and we have lots of folks, like our developer advocates, in places like Hacker News that you can have conversations with. This is just another way for us to reach our community and update you on the things we've been doing so far in 2018.

Well, let's talk about that. So far in 2018 — and it's been five months — we've released 27 major features into beta and general availability. These are major features; a lot of other features come out that just don't get as much attention, but these are the things our customers really care about. And for every major feature we release, there's probably a major backend improvement you don't know about — things to do with performance, durability, availability, reliability, even cost. Those are the things you may not notice until you push that big red button that says "Run query" and things just work better the next time. And of course all these features appear as they roll out: we don't have maintenance windows, we don't restart your BigQuery environment or anything like that, you just get features continuously.

If you didn't know, we're having a major conference at the end of July — July 24 to 26 in San Francisco. All of Google Cloud Platform will be joined by our customers and users. You'll be able to hear product updates, hear from our leadership and some user stories, and of course connect with other folks who are using Google Cloud Platform. So please come join us; I'll be there.

Shameless pitch: I'm actually going to have two sessions there. In the first session I'll be joined by Rick and Kevin from The Home Depot, where they'll talk about their journey using BigQuery as their enterprise data warehouse for all of their analytical needs.

In the second session I'll be joined by Nicole from Spotify, GP from Oath (Yahoo), and a couple of folks from Twitter, Pavan and Roman, who will talk about their internet-scale usage of BigQuery as well. Jordan is going to have a session too: he'll be joined by Lloyd Tabb, the CTO of Looker, and they're going to talk about some really interesting concepts around BigQuery, so please come join those as well.

Believe it or not, BigQuery has been around for six years — we went to general availability on May 1st, 2012. It's interesting because The New York Times wrote about it at the time: "Google Offers Big Data Analytics." We've grown since then to incorporate lots of other features and functionality that extend us beyond the big data realm, but we've been at it for six years and we're continuing to release more and more. Again, we'd love to hear which features you care about the most.

We have lots and lots of customers, and we're really proud of the customers we have and the journeys they've taken with us — folks like Spotify, as I mentioned before, The Home Depot, and Oath. These folks come from all kinds of backgrounds with all kinds of use cases: gaming, clickstream analytics, marketing data, retail, and so on. Hopefully next year at this time your logo will be on this slide as well.

One customer that was profiled by Business Insider a couple of weeks ago is Credit Karma. We've been talking to Credit Karma for the last two or three years — they've been using our platform for a while — and they've committed to using Google for all of their analytics needs, and BigQuery of course is a huge part of that. Credit Karma is a very sophisticated engineering organization, and they're just really nice people, so if you see them, please say hi.

Speaking of sophisticated engineering organizations, Twitter has decided to align with Google Cloud Platform for their data and analytics needs as well. Twitter is of course a very active participant in the open source community and has built some very interesting internet-scale platforms, so the fact that they're partnering with Google probably says something. We're looking forward to working with Twitter in the future.

So what does 2018 look like so far for Google BigQuery? These are some of the features we've released, and I'd like to share with you what each of them is all about. I'm going to run through all of them and hopefully give you a good idea of how to use each one.

The first theme I want to talk about is richness of functionality. BigQuery, when it first started, was a SQL engine for analyzing data that you load into BigQuery, but over the years we've added more and more functionality to simplify how you manage your data — richness of functionality is the theme here. DDL, the Data Definition Language, went beta a few months ago. DDL essentially allows you, inline in SQL statements, to create and delete tables and views, which just makes it much easier for you to work with your data.
You could always do those things using other kinds of operations, but one particularly interesting way of working with the Data Definition Language is CREATE TABLE AS SELECT. One use case here, for example: you have a table that isn't partitioned, and maybe doesn't have a specific column that you'd like. You can select from that table, create new columns, filter data — or really select from any query you want, with joins and analytic functions in there — and the results can be written into a new table that may or may not already exist, with partitioning added on top. This is probably the number one use case for the Data Definition Language that we've seen so far.
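To make that concrete, here's a minimal sketch of the pattern — creating a partitioned table from a query over an existing, unpartitioned table. The dataset, table, and column names are made up for illustration, and it assumes the DDL available to you supports the PARTITION BY clause in CREATE TABLE AS SELECT:

    CREATE TABLE mydataset.events_partitioned
    PARTITION BY DATE(event_ts)        -- partition the new table on a real column
    AS
    SELECT
      event_ts,
      user_id,
      country,
      COUNT(*) OVER (PARTITION BY user_id) AS events_per_user   -- analytic functions are fine here
    FROM mydataset.events_raw
    WHERE country != 'unknown';        -- filter rows on the way in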

Please try it out — we'd love to hear your feedback on that.

The next feature is MERGE. BigQuery has had DML, the Data Manipulation Language, for quite some time now. MERGE is a really powerful operation that combines the three statements that are part of DML — INSERT, UPDATE, and DELETE — so in one MERGE statement you can do all of the above. There's an example of a MERGE query on the slide. The typical use case is that you have a table with all of your inventory, and this table needs to be kept up to date. As new inventory comes in, you want to update the inventory table with the new arrivals: if a product is already in your inventory you might want to increase its quantity, and if it isn't, you just want to add it. That's what this one statement accomplishes.

A quick segue about BigQuery and technologies like it that support DML — mutations, updates, deletes, and inserts. BigQuery isn't a great technology for running a million individual updates and deletes; that's just not how it works, and any technology with an architecture similar to BigQuery's has the same limitations. What BigQuery does let you do is work on as many rows in a table as you like with one individual statement. So while we don't give you a huge number of statements to run, each statement can process essentially trillions of rows if you'd like it to. We also let you run correlated mutations, by which I mean you can update, delete, and insert rows in your table based on the result of a SQL statement — and that statement can be just about anything, with joins, aggregations, filters, analytic functions, and all the typical SQL stuff.
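The slide with the example isn't reproduced in this transcript, so here's a minimal sketch of the kind of MERGE the inventory use case describes — the table and column names are hypothetical:

    MERGE mydataset.inventory AS target
    USING mydataset.new_arrivals AS source
    ON target.product_id = source.product_id
    WHEN MATCHED THEN
      -- the product is already in inventory: bump the quantity
      UPDATE SET quantity = target.quantity + source.quantity
    WHEN NOT MATCHED THEN
      -- brand new product: add it
      INSERT (product_id, quantity) VALUES (source.product_id, source.quantity);

If you also want the same statement to remove discontinued products, a WHEN NOT MATCHED BY SOURCE THEN DELETE clause can be added.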

So again, while you could run lots and lots of individual statements and correlated mutations, you probably shouldn't. The best practice with BigQuery is to batch all of your updates and deletes into single statements. This is especially relevant for things like GDPR, where you might need to update data inside BigQuery, so we try to make it easy on you.

The next feature is actually a data type: NUMERIC. NUMERIC is really a decimal data type with exact precision, up to 38 digits, which is especially useful for financial calculations where you're essentially keeping track of money. It turns out that FLOAT64 is not a great data type for those operations — people don't like losing fractions of pennies in their calculations — and neither is INT64, because cents exist. With this data type we've extended all of our SQL functionality, so you can run algebraic and arithmetic operations on NUMERIC in SQL. Hopefully that simplifies your life; let us know how it works for you.
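As a quick, hypothetical illustration of why this matters for money, compare summing a small ledger as NUMERIC versus FLOAT64 (the values are inlined just for the example):

    WITH ledger AS (
      SELECT NUMERIC '0.10' AS amount UNION ALL
      SELECT NUMERIC '0.20' UNION ALL
      SELECT NUMERIC '0.70'
    )
    SELECT
      SUM(amount) AS exact_total,                  -- NUMERIC keeps exact decimal cents
      SUM(CAST(amount AS FLOAT64)) AS float_total  -- binary floating point may drift slightly
    FROM ledger;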

So we've talked about features that enrich the functionality of BigQuery. The next things I want to talk about are features that simplify your life — that give you peace of mind when you're using BigQuery.

The first one is CMEK, which stands for customer-managed encryption keys. To take a step back: BigQuery by default encrypts all data at rest. What BigQuery actually does is chunk all of your data into tiny little pieces spread throughout its data centers, and those little chunks of data are each protected by their own data encryption keys. On top of that, we have key encryption keys that encrypt the keys protecting the data. So in some unfortunate situation where, say, a disk ends up in somebody's hands, the data is entirely useless to those folks, because not only do we chunk the data across our data centers, we also use two levels of encryption. But we are the ones managing those encryption keys. In some situations our customers want to manage their own encryption keys instead. So you bring your own keys, you manage them, you can rotate them, you specify which keys protect which data, and with one click you can deactivate a key so that the data it protects becomes essentially inaccessible to anyone, including us. It's a really interesting and really powerful security feature. And finally, we keep track of everything that happens with encryption keys in audit logs — that's your paper trail. You should probably go enable audit logs today: they record every single job and every single query that has occurred inside BigQuery, when a job started, when somebody accessed a specific dataset. They're immutable, and you can actually load them back into BigQuery for analysis.

For the last few years BigQuery has been a regional service, but only inside the United States and Europe, and we have lots and lots of customers in Asia. The primary reason folks in Asia want BigQuery in the APAC region is to minimize transit from those areas to the United States or Europe, because that transfer can carry additional cost. Latency-wise it's not a huge consideration — BigQuery takes a second to process a query, so what's another hundred milliseconds to move data from the United States to Tokyo — but it's what folks have been asking for, so we've announced the rollout of BigQuery to general availability in Tokyo. It covers the region, so you don't have to go across the ocean, and it allows you to keep your data in the region. Of course there are use cases where data sovereignty is applicable, and we're going to continue to look into other areas where that matters.

We've also rolled out several features around transparency in BigQuery. The first: if you're using the BigQuery UI today, you can see the progress of your query in real time — you may have noticed this already. When you run a query it will tell you, for example, that your query is on stage 5 of 12, and it will show you a chart of what's happening in that query. The second is around slots. A slot is BigQuery's unit of compute — it's how BigQuery executes its workloads — and you can now see how many slots were consumed over the course of a query in your query history. That can be useful for folks on our flat-rate pricing model, because it lets you work out how many slots you really need for your workload. And finally, one feature that is really powerful for some of our more advanced users is the ability to create resource hierarchies. You can think of these as virtual clusters nested on top of each other that all share resources. You can have two separate clusters — one for your analytics team, one for your data science team — with perfect isolation between them, but when your data science team goes on vacation, those resources automatically become available to your analytics team so that nothing is wasted. There's an example of a setup like that on the slide.
We've also released some upgrades to our monitoring in Stackdriver that let you track these hierarchies and how they're being consumed. This functionality is really interesting, so reach out to us if you want to learn more about it.

Next: self-service cost controls. Folks tell us that with the pricing model BigQuery has, where you pay for every query, there's sometimes anxiety that you might run a query that's too expensive, or run too many queries. We introduced cost controls a couple of years ago to help alleviate that anxiety. You can set usage limits, either per query or as a daily budget, which lets you prevent the worst-case scenario: if you really start spending a lot of money because you've made a mistake in your code, this protects you. The first option is a per-query cost limit — for example, you never want any query that processes more than 100 terabytes to run, so you set that limit.

The second option is a per-user daily budget — for example, you don't want any single user to run more than 50 terabytes' worth of queries in any single day. This is a rolling window, so you don't have to wait until the end of the day for it to reset. And finally, you can set per-project budgets as well: you can say that every project only gets, for instance, 100 terabytes of processing per day. What happens when you reach a limit is that BigQuery starts returning an error saying you've exceeded your limit and it can't run any more queries until your quota is refreshed. What we launched in May is the automation of the per-project daily budget: you can go into the BigQuery console today, set the per-project limit, and it takes effect in minutes instead of you having to go through support. This is a very interesting topic of discussion for our customers and users, so we'd love to hear what else you'd like to see from these features.

Finally, BigQuery is being used by folks from all kinds of backgrounds, with all kinds of technologies — people who are used to relational databases, people who are used to Hadoop, maybe they're using Kafka — and they'd like to use those technologies in conjunction with BigQuery, so we're really focusing on interoperability.

The first feature I want to talk about here is partitioning. Historically — maybe six years ago when we first started — we didn't have any kind of partitioning. We had something called table decorators, and also table wildcards. Table wildcards let you query across many tables at once: if you have a thousand tables that are all named in a similar way, you can use this functionality to pick out which tables you really want to query. That's been really powerful for a long time, but over the years folks have asked for real partitioning functionality. So first we released what we call natural partitions: partitioning on a metadata column that doesn't exist in your schema, queried with a partition-time predicate. Since then, folks have asked for the ability to use an actual, user-specified timestamp column as the partition key. In this example you have a table with four columns — TS plus the three other columns listed — and we set TS, a date/time column, as the partition key. When you query this table and say you want TS to be a specific date, we go to just that partition, as the graphic hopefully demonstrates. This is a really useful feature. It's also useful when you're loading data into BigQuery or materializing partitioned tables, because you can materialize lots and lots of partitions with a single command: we look at the timestamp of every individual row and put it in the right partition. Hopefully that's useful for you.

The next feature is the Data Manipulation Language on partitioned tables. Along with column-based partitioning we released DML support for it, kind of merging these functions together, so you can actually edit partitioned tables with DML. In this particular example, I want to take data from one partition, filter it where field1 equals 21, and move it into a different partition.
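Here's a minimal, hypothetical version of that kind of statement. The table, column, and dates are made up, and it assumes the table is partitioned on a DATE column called ts — changing the value of the partitioning column is what moves the rows from one partition to another:

    UPDATE mydataset.events
    SET ts = DATE '2018-06-02'       -- new partition for the matching rows
    WHERE ts = DATE '2018-06-01'     -- old partition
      AND field1 = 21;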
It's a very basic DML statement, but it lets you do a lot with your partitioned tables — hopefully you find that useful.

The next feature is Parquet file format ingest. If you read the Dremel paper that was released in 2010, there are a couple of concepts explained in it. The first is the Dremel execution engine, which is really what powers BigQuery. Dremel and BigQuery are not the same thing — the Dremel team is the BigQuery team, and the two go hand in hand. The other concept explained in the paper is our columnar file format, ColumnIO. Julien and other folks on the Parquet team saw those ideas as useful and externalized them in Parquet, which is now a very ubiquitous file format — it's used by essentially everyone who's leveraging HDFS, Hadoop, MapReduce, even Spark.

So Parquet is very powerful, and what we've enabled this year is the ability to ingest Parquet directly into BigQuery. It's a very useful feature for a couple of reasons. For one, just like the Avro file format, it's binary, so you're not going to get weird collisions between UTF-8 and ASCII — whatever data you're loading into BigQuery from Parquet is strongly typed. Second, Parquet is highly parallelizable, so if you load a terabyte Parquet file into BigQuery, we can split it into as many chunks as necessary to ingest that data very quickly. Third, Parquet carries its schema inside the file, so you don't need to define a schema and we don't need to infer one — you can just take a Parquet file, load it into a BigQuery table, and we'll figure out all the details. It's really powerful, and if you have control over which file formats get loaded into BigQuery, Parquet is a great way to go.

One quick note on ingesting data into BigQuery: ingest is a very compute-intensive operation. We have to read your files, process them, encode them into our format, compress them, load them, and replicate them all over the place. Typically, with technologies like BigQuery, that compute capacity is the same capacity that serves your queries, so if you're loading lots of data, your query capacity may suffer. BigQuery has a unique architecture where query capacity is entirely separate from ingest capacity, so no matter how much data you're loading, your queries don't suffer. And we also don't charge for these loads — it's kind of the best of both worlds: we don't charge for it, and it doesn't affect your query capacity. We feel our customers benefit from this greatly.

There are of course many other formats being used out in the world, and we're going to be announcing more soon, so please let us know if you have a file format we don't support — we'd love to see which ones are popular and which we should support in the future.

The last of the major features I want to talk about is our continuous updates to the ODBC and JDBC drivers that we work with Simba to put together. These updates happen rather regularly these days, and we want folks to be able to use these drivers as a gateway to BigQuery. Some of the things we've been working on are support for partitions, support for DML and DDL, and support for CMEK, customer-managed encryption keys — everything BigQuery has, we want in the ODBC and JDBC drivers, so they're continuously updated as BigQuery features roll out. We're also working on making the drivers faster: over the past six months or so we've increased their performance by roughly 60%.

We're going to continue to focus on that, and finally, we want these drivers to be mature and stable so that our customers get the best experience.

One final thing I didn't really cover here: there's much more happening inside BigQuery. If you look at our release notes, you may notice we've increased a whole lot of quotas. For example, the row and cell size limit, specifically for CSV and JSON files, has increased from 10 megabytes to 100 megabytes, so it's just easier to load data into BigQuery. The maximum number of partitions per table has been increased to 4,000. And so on. The other thing to point out is that we've extended our SQL functionality with the ERROR function and the SAFE prefix, so you can handle errors a little bit better inside your SQL statements.

Now, a quick segue on quotas. BigQuery is a multi-tenant system, so a lot of its subsystems are shared, and just like in any technology there are anti-patterns — things you as a customer are allowed to do that may not work well for you if you keep doing them at scale. That's essentially true of any technology. So in order for us to work with you on using BigQuery in the best way, we have these quotas. Quotas are there to protect you, and they're there to protect us as well. That said, if you have specific use cases where certain quotas need to be raised, please come talk to us — we're always looking for ways to improve this particular user experience. The quotas are not immutable; many of them can be adjusted. So if you do end up hitting quotas, come talk to us and we'll work on it.

So to recap, BigQuery has released a number of features so far this year, and of course we have lots of engineers working on more, so there will be many, many more, and we'll do another recap just like this one walking through the features we release going forward. But so far, these are the features we've released in 2018. Again, please do come talk to us about these, and about others you may need that we haven't released yet — we'd love to see how we can help you.

And to wrap up: as I mentioned before, we have our conference on July 24 to 26 in San Francisco. I will be there, and the vast majority of the BigQuery leadership and engineers will be there too. We'd love to talk to you, so please come find us — we'll have a booth, we'll have sessions, and we'd love to hear from you.

Great. Stay tuned for live Q&A — we'll be back in less than a minute.

Welcome back. Now it's time for the Q&A. We took some questions from the audience, and if you have more questions, please do reach out to us. The first one is: when are some of these beta features going to reach GA, general availability? Let me tell you how our features get released into general availability. It's kind of a maturity scale. By the time features hit general availability, they have a support plan, we have SLAs, and we have our site reliability engineers working on them, so they're fully supported, they're locked down, and we're fully behind them.
By the time these features are in beta, they're essentially locked down as well. From beta to GA we're working on maturity — performance, scalability, just making sure the user experience is really phenomenal — but by the time features are in beta, they're already fairly mature. Before they go to beta, of course, we have alpha and early access programs, and we're always looking for folks to try features out before they go to beta as well. So it really depends on the feature: it's rare that a feature takes more than a year to go from beta to GA, and a lot of them take one or two quarters. But a lot of folks are using these features in beta because they're incredibly useful already — I'm not going to stop you.

The next question is: how can I find out about upcoming features? Well, we have our blog posts, we have Twitter, we have our release notes, so we try to maintain a presence and keep our customers and users updated on what's happening.

But we do also have a roadmap and other features coming up, so to learn more about those, reach out to your account team or reach out to us directly for updates.

Third question: I need to update thousands of rows in a table, but the limit for DML statements is only 200. That's correct — today the per-table limit on how many DML statements you can execute is 200. But as I mentioned on one of the earlier slides, we don't limit how many rows can be affected by any one of those statements, so any of those 200 statements can execute millions, billions, or trillions of mutations — you can work on the whole table all at once if you'd like. So again, the best practice with BigQuery is to batch these updates, deletes, and inserts into singular requests: you can update thousands of rows in a single table using correlated updates with a subquery, in a single statement or a couple of statements. And again, if you have specific use cases where you need these limits raised, please come talk to us — maybe there's an approach that fits your use case and we can help you.

Next: what partitioning scheme do you recommend? We really have three ways to partition your data. The first is to simply shard your data — by that I mean you create a table for every "partition," say a table for every single day, and name them accordingly: games_20180601, games_20180602, games_20180603, and so on. As I mentioned, we have the ability to query specific tables with wildcards, so you can say "I want to query all the tables where the month is July," and we'll seek out those tables and select only the ones that fit your predicate. That's one way of doing it. The next way is our natural partitions, which use a metadata date/time column — a column that isn't actually part of your table's schema — together with a partition-time predicate. That's another way of doing things. But I think the most useful way, at least from what our customers tell us, is user-specified column partitioning: you set up a table where one of your own columns is the partition key — you say "this timestamp column is my partition, please use that." This is really useful because, again, you can load data into a BigQuery table across multiple partitions all at once, or run a single query whose results are written into multiple partitions all at once.
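To make those three options concrete, here's roughly what the filter looks like in each case — the project, dataset, table, and column names are just placeholders:

    -- 1. Sharded tables, queried with a wildcard and _TABLE_SUFFIX
    SELECT COUNT(*) AS n
    FROM `myproject.mydataset.games_*`
    WHERE _TABLE_SUFFIX BETWEEN '20180701' AND '20180731';

    -- 2. Ingestion-time ("natural") partitioning, filtered on the _PARTITIONTIME pseudo-column
    SELECT COUNT(*) AS n
    FROM `myproject.mydataset.games_ingest`
    WHERE _PARTITIONTIME BETWEEN TIMESTAMP '2018-07-01' AND TIMESTAMP '2018-07-31';

    -- 3. Column-based partitioning, filtered directly on the user-specified partition column
    SELECT COUNT(*) AS n
    FROM `myproject.mydataset.games_by_ts`
    WHERE ts >= TIMESTAMP '2018-07-01' AND ts < TIMESTAMP '2018-08-01';

In all three cases the filter is what limits how much table data actually gets read.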

The last question is: how can I submit feature requests and get my questions answered? I touched on this a few times — there are a few avenues, and we're continuously working on improving communication with our users because we want to hear from you. We're very active on Stack Overflow, we're very active on Twitter, and of course we have the public issue tracker where you can submit your own feature requests. We also have multiple support plans, all the way up to dedicated support engineers working just for you; those folks are there to help you with any issues you might have, with any features you might want us to know about, and they can help facilitate roadmap discussions with our team.

Great. Stay tuned for the next session: "Wrangling your data with Cloud Dataprep."
