Welcome to this CUBE Conversation here in Palo Alto, California, at theCUBE Studios. I'm John Furrier, your host. Today we've got a great guest talking about data analytics, the future of AI, and Google Cloud. My guest is Bruno Aziza, head of data analytics at Google Cloud, on his tenth or eleventh CUBE appearance; he's a CUBE alumnus. Bruno, you've been in the space for 25 years, with GCP for three, and previously at Microsoft and Oracle, and you helped launch three startups, Alpine Labs, Sisense, and AtScale, and much more. Great to have you back.

Well, thanks for having me here today, John. It's always great to talk to you.

We've been talking about big data going back to 2010. We've had many conversations here on theCUBE. We were there for gen 1 big data with Hadoop, then you got Spark, and everything else that's happening with data warehouses in the cloud. You saw the birth of the Databricks and the Snowflakes, and cloud is expanding to the next level. Now you see this next-gen action, and that's what we're seeing with app developers building data apps. You've got data infrastructures emerging, AI is a forcing function, and all everyone's talking about is this next level of generative AI, with data being the value. You guys have some new functionality for BigQuery, the serverless data warehouse product you're working on at Google, which has been very popular and successful. What's the scoop? What do you guys have going on? What's the news?

Well, first of all, every journey in gen AI starts with your data, and that's what we've been really focused on here at GCP. We're moving beyond the data warehouse. In fact, the way we designed BigQuery, it's way more than just a data warehouse: it's an analytics system, and that's what people want. So what do I mean by that? It's a system that handles any data at any speed, that has embedded machine learning as a key principle and embedded business intelligence as a key principle, and that supports any type of data, structured, semi-structured,
and unstructured, within the same environment. It is also open to other data platforms. In our case, we have BigQuery Omni, which allows you to query, from BigQuery, data that's in Amazon or in Azure. And we have this amazing data sharing platform: every week, over 6,000 organizations securely share about 275 petabytes of data. That's the type of scale customers need in order to build their next-generation applications.

You mentioned secure; one of the themes is security. Obviously, with data, people are bringing their own data models to the table in a new way, and they've got to secure their old ones. Democratization of workloads, secure data, and choice are three themes I'm hearing a lot. How does that relate to some of the things you're working on right now with BigQuery? Because I'm hearing a lot of good things about how BigQuery has vaulted itself into the center of the action with some of these features. Can you elaborate on some of those things?

Absolutely. Well, thanks for saying that. We've been working on this problem for quite a long time; BigQuery became generally available in 2011.
So we've had the experience of working with customers across the globe at a very large scale, and they've really blown us away with the amount of innovation. I'll talk about a few here; some might be companies you've not heard of, but they're gigantic organizations in their geography. Tokopedia is an e-commerce giant, and using our technology they were able to cut analytics computing costs by 25%. This is a company with an online marketplace connecting 10 million merchants and 100 million customers every month, around product selection, payment, and delivery. Or think about Mercado Libre, a gigantic organization in Latin America: 35,000 employees, 107 million buyers, and they process 35 purchases every second. They migrated from a Teradata environment into BigQuery, with 35,000 queries migrated, and, talking about adoption, their data stack is used by 80% of their employees. These are the types of examples where I feel like I see the future, working with customers across the globe who are really pushing to the next generation. There are countless examples of customers, and I'm sure we'll talk about some today, but I think that's where people need to go: look at these amazing organizations that have been innovating with our platform as a blueprint for how they can get there as well.

Autoscaling is a big feature. You guys have distributed environments; hybrid cloud is a big part of it, and open source. That's the current situation, and applications are going to be built where they're going to want access to data, all kinds of data. There's kind of a data moat developing. It's the new value proposition: people are realizing that data in motion is valuable, but you don't want to just put it in motion and give it away for free into these public data sets. So people are trying to rethink specifically how to deal with their data, and as they bring more data to the table, it's now more proprietary.
Foundational models are out there, and you've got the surge of open source developing. How do you guys look at that from a BigQuery perspective? How do I work with you with that trend going on in the mainstream?

I love that you're saying data moat. That's actually the term one of our customers, Lytics, which has saved a lot of money using our platform and been able to innovate, uses for what they're doing. From our perspective, what we're trying to do is make it as easy as possible for customers to onboard onto the platform and innovate with it, and a lot of the innovations we've introduced are really built around that. The idea of BigQuery editions is to make it really easy for you as a customer to assign the right version of BigQuery, with its associated capabilities, to the workload you're interested in running, and we'll do the rest. Autoscaling is a foundational capability of BigQuery editions. Put very simply, it follows your usage at the second level and charges you only for that. So if you think about your differentiation as a data team building data applications, we want you to think of these as "forget about it" features: we will take care of the infrastructure for you, and we will scale it to give you the best price performance, so you can do your job, which is building highly differentiated data products for your constituents, internal or external.

What's interesting is that there are ways to consume things differently based upon the use case. I want to get to some of the customer examples, but first I want to ask you about editions. You mentioned autoscaling, and you guys have created this thing called editions. Can you explain what BigQuery editions are?

Absolutely. In the past there were really two ways to think about how you'd consume BigQuery. A very popular way, when we started with the technology, was just pay-per-query, so you can imagine any data analyst or any data
engineer can onboard onto the platform very quickly, just start querying the system, and just pay us for that. Over time we evolved that to reservations, so you could reserve capacity and know that it would be available. And as we learned from customers who are innovating across, just like you said, John, all types of data and all types of use cases, with more data and more people, we realized there's a better way we can assist them. So the first step was to look at their specific workloads: where do they start, and how do they mature? They typically start with easy workloads around reporting and so forth, then they mature to machine learning, and then they mature to even more sophisticated needs, like multi-region. So we created these three editions to make it really easy for an organization to say, okay, this workload I'm going to associate with the Standard edition, which gives me all the basics I need to get started; this other workload over here needs machine learning, so I want to activate that. One of the great functionalities of editions is that you can mix and match, so it gives you both flexibility and predictability about how much you're going to spend on the platform. Autoscaling is a way for us to give you the best capacity and the best price performance as we follow your usage. This is our way to bring in a key competitive advantage for us, which is artificial intelligence: we can get a really good sense of how you're using the platform and optimize all this functionality for you. Other vendors in the market think about capacity as VMs: they give you a box, and when you need more capacity, you get double the size of the box. We think about it as: how do we make it easy for you to onboard and choose the right version, and after that we'll do the rest. The infrastructure will just follow your usage, and you won't pay more than
what you're actually using.

Yeah, that's really good. Those are next-level features. You've got the flexibility, which gives you choice in how you want to consume for the use case or the app if you're a developer, and the predictability is more for the CFO, who says, okay, I don't want to get charged more as we scale up. I love the combination there. What does this mean from a customer standpoint? Can you give some examples of customers that have gone this way, and what have they seen in terms of capacity and savings?

Absolutely. What it means for a customer is that today you can safely onboard onto the platform and just start working with it. One of the great companies I can think of is L'Oréal. I know you're going to blame me for using a French company, but we've got great French customers, L'Oréal and Carrefour. What I like about L'Oréal is that just a few years ago they had almost no data in BigQuery, but BigQuery quickly became the heart of their systems. They now have petabytes of data in an organization that has thousands of SKUs; I think they sell about seven billion products. But their environment is very distributed, across markets and across departments, so they have that need of saying, well, if in the US I've got a very mature business, and maybe in another geography I've got a business that's just getting started, I shouldn't really use the same level of functionality. So this mix-and-match ability has really helped them a ton. As for autoscaling, when you watch the video with Antoine, their enterprise architect, he says, look, this is the feature I've waited for the most, because if I have bursty or spiky workloads, I don't have to worry about having someone watch them; Google does it for me. That's the benefit. If you want the magic, the Google magic behind these functionalities is that, because of our vertical integration, we can really return
exceptional savings for you. Another great example is companies building applications on top of BigQuery. Lytics is a CDP, a customer data platform, with lots and lots of customers; they've got seven billion profiles, and they run 400 billion events in real time. The company experienced a 15% performance increase along with a 20% cost reduction, because they're building this data moat on top of BigQuery. So there are lots and lots of examples of organizations like this that are scaling rapidly with us, where the infrastructure and the Google artificial intelligence magic allow us to optimize our systems for them.

Really unique. Talk about the announcements you've had on the storage side. You've got compressed storage; can you give a quick summary of what that's all about?

Absolutely. Coming with editions and autoscaling, customers now also have access to something called compressed storage. Simply put, compressed storage is a way for us to take care of a higher compression level on your data, so you pay less on the storage side. Preview customers have seen compression rates from seven times up to 35 times, particularly with certain data formats: if you think about logs, for instance, or log analytics, that's data that's highly compressible. As you watch some of the videos, for instance GoCardless, you'll see these companies are saving anywhere between 20% and 70% on their storage and compute bill. Because with compressed storage you're paying less and storing less, you're also helping other parts of the stack, where you're able to query differently. In the case of L'Oréal, what I really love about the testimonial we got from Antoine is that it's also helping him with his sustainability goals, because you're essentially storing less, or your footprint, if you will, is more efficient. So these innovations, editions, autoscaling, and compressed storage, to your point
earlier, are really the platform that's going to enable companies to get to the next level with these data apps.

Yeah, the carbon footprint is an upside; you guys have great initiatives there, and I think that's one of the benefits of cloud. I've got to ask you about what you see this week. We were at MongoDB last week in New York City, which I'd put in the data-week mix because it just happened. This week we've got Snowflake having their event head to head with Databricks. Obviously, theCUBE will be there. They're forcing analysts to choose sides; it's kind of a cage match. We'll be at both simultaneously. And you've got an event in Seattle as well, so this is like data week. Give us the scoop on what you have going on in Seattle this week. What's the focus?

Well, I'm actually calling you today from Seattle, so I'm already here, and customers are coming in. We have this event we call the Data Engineering and Analytics Day. All of you will be able to join for free on Thursday morning; the keynote starts at 9 a.m. But what you should know is that the first two days are actually in person here. We've invited our key customers and practitioners. It connects with what you and Dave talked about in last week's podcast: we really stayed true to the genesis of what these events need to be, at least for us. We feel that if we create a platform that's built by the practitioners, for the practitioners, we all get value from it. So what's going to happen throughout this week is essentially three things. One, of course, is that we're sharing our vision with our customers, and they give us very frank, very direct feedback, so together we can advance the platform we're building with them. Second, we're going to hear from these customers: they're going to tell us their best practices and their worst practices. And the third thing, which I'm really excited about, is that this creates a platform where they connect with
each other. I'm not interested in getting in the way of information with these customers. I think that's one of the challenges sometimes with these events: they turn into marketing and commercial events, and that's really not our intent. This is a platform for customers; in fact, we have no marketing and no salespeople at this event. It's really a platform for customers who can help each other, and as we watch that, we also learn and build the next systems they want us to build. Editions, autoscaling, and compressed storage really came directly from our customers' feedback, and that's how we get better.

And you know, these events, when you have these major shifts in the market, are where tribes get reshuffled. People want to find their tribe, they want to find their community, and they don't want marketing messages jammed down their throat. They want it to be open and authentic. It's a very bottoms-up market right now, so it's good to call that out. We did bring that up on the pod, because we're seeing a lot of people run events just to make money, structured in a way that loses touch with their customers. On the podcast, which is great, we talk about all the top trends, and since I've got you here, I might as well ask you what you see as the top trends, because there is a tsunami coming with this new wave. It's bigger than before. Data apps are a big part of it, and data products; we've been talking about that. I'd love to get your thoughts, because you've been scratching this itch for over a decade, Bruno. Now we're on the beach, the waves are the biggest, and we're out there surfing them. What do you see as the major trends developing in the space?

Yeah, so I've learned a lot of that from working with customers. What's been incredible to me over the last three years is how we've been able to scale these platforms, because when I talk to
you about these events, this is not 10 people getting together; there are thousands of people understanding the space and building the next-generation products. What I'm learning from my customers is that three megatrends are changing their world. The first trend is that you're not building a data lake anymore. That came directly from the CIO of Vodafone, who told me: the great thing about a lake is that it's defined; I can see the end of the water. But in a way, that's nowhere near the reality of my data. My data looks more like a data ocean: I never see the end of my data, and I know that some of the data I'm going to need is going to be in somebody else's environment, maybe because I've acquired them, maybe because I need to partner with them, and so forth. So this idea of the data ocean is a key trend that we're seeing customers really gravitate toward. What does that mean? A multi-cloud platform by default. Transactional and analytical workloads coming together: it's no longer two different workloads; they have to come together to make it easy for you to build apps. The ability to catalog your data rapidly as it hits the system is paramount, because customers need to be able to trust the data and the metadata they're bringing into their environment. And data sharing as a key principle is tremendously important, because, as I said earlier, you need the ability to go out and get data in other systems as seamlessly as possible, so you can complete and enrich your information. That's trend one. The second trend we see is around what I call governance with a big G. We used to think about governance as the ability to restrict access, but in fact, the way people approach governance now, they think about how to create pockets of innovation across the organization with decentralized data but centralized policies. That's really, really hard to do. Luckily, about two years ago we shipped a product called Dataplex, which really marries itself
well with BigQuery. It allows you to auto-discover metadata from the data all the way up to business intelligence; it reads metadata in Looker. This idea of understanding your estate is really important, and it gravitates around the concept of data mesh, which we've heard a lot about over the last two years. And the third trend is what you were talking about, John: this idea of building really intelligent data apps, and a big part of powering that for us is Looker. So if you think about our stack, BigQuery, Dataplex, and Looker, what customers are really trying to do is change their relationship with their organization: from spending time, spending money, securing, and restricting access, to opening up access and creating an artifact where you as a data team are actually creating bottom-line value for your organization. The best artifact for that is a data product. Data products need a lot of components, but the main one is a consumer-grade experience on an enterprise-grade platform. You've got to build on a platform that will never go down and is highly vertically integrated, so you get the best price performance. That's what we're focused on.

That's an awesome vision. I love that hot take, because you want the horizontal scale of cloud and the availability of data, protected and governed, but also vertically integrated into the app. This idea of data products is interesting, and I want to get your reaction, because we've riffed on this on theCUBE before, and it's playing out in real time, in plain sight, in the industry. You have data engineering, and now you've got data products. Remember, in cloud you had SREs, site reliability engineers, and then you had developers. The SREs' job was to set up the guardrails for the developers to code inline in the CI/CD pipeline and do infrastructure as code: DevOps. Now you're seeing a similar pattern: the data engineers set up the guardrails so that the developers can
program with data, and they need the products. So you're starting to see this trend where you have data engineering, data products, and now data developers. What's your reaction to that? Because that almost completes the persona stack: you get the engineers, whose work can be automated with AI, by the way, and managed and scaled; data products that could be turnkey and consumer grade, like you mentioned, real time for this, maybe more historical for that, mix and match your Lego blocks, whatever you want to call it; and then the coders who are now coding in the applications at the point of code, the data developer. What's your reaction to that?

Well, what I've learned from watching customers build these data products is, first of all, that there are three dimensions to what a data product is. The first one is that you've got to design for what we talked about here, limitless data. Your data products need to be able to handle structured, unstructured, and semi-structured data. You need to be able, for instance, to imagine a scenario where you're going to do machine learning on unstructured data, so how do you do that? I think that's probably step one: think about it as any type, any volume, multi-cloud, open infrastructure for your data. Step two is the dimension of time. You just talked about it here: you can't afford not to build on real time now. This idea of real time and high concurrency is really important for building these next-generation data products. And the third one is what you talked about: people. You need to start thinking about different roles. We've learned from customers that the data teams they build are really starting to look more like software engineering teams. You have a data product manager, who's going to write the product requirements document, the PRD, and who is supposed to be the CEO of the product, from ingestion all the way to activation. They lead the team toward the output of this data: what are we creating? You
have a program manager who helps drive the development of the product, and you have a data engineering team that helps implement it. Now, what's great about infrastructure like ours is that you don't need a lot of administration, so data engineers can rely on the platform. In the case of Spark, for instance, with serverless Spark they can just work with Spark without having to worry about the infrastructure, and we just charge them for their use. That's a unique kind of transformation of the data engineering role. You have the UX leader, who is really focused on what we talked about earlier, the consumer-grade experience. The best UI is no UI, so you need someone who keeps in mind that the audience for these data products is not data experts, so the products need to have a consumer feel to them. And then, finally, the chief data officer, who needs to be driving the strategy around all that. The good news here is that 10 years ago, when you and I started talking about this, John, 10 to 12% of organizations had chief data officers; now about 70 to 80% of organizations have a chief data officer. So we've really started to see this group, if you will, this typical team around building data products, and it's really encouraging, because people like us have been talking about this space for a long time, and nobody cared because it was back office. Now, finally, everybody cares.

I go back to 2007: I remember saying this is going to be a developer angle. We saw big data come in; we were all early. I think all the data folks have been early on this. It's the timing of how cloud and everything comes together, the confluence of how the world spun, that's really key. And now the data nerds, the data geeks, the hardcore DataOps folks, they're into this. So we're in a prime-time moment, and it's built on top of clouds, not bolted on. So it's going to be an abstraction. I love your vision. I think DevSecOps is here; are you seeing more
of that? I think security has gone through this: they had their own team, and now they're part of engineering, putting guardrails up for developers. People are shifting left, and now data is in there. So is it going to be DevSecDataOps? Is that the full evolution? We'll certainly see.

Yeah, it's really exciting to be in the data world in 2023. The industry is now really taking advantage of all the innovation we've been working on for decades, and it's all coming together to accelerate this value. I think also that chief data officers, CIOs, and CEOs are now paying attention to data and building their differentiation on it. You said it earlier, this idea of a data moat. Eleven years ago, when you and I started talking about this, I don't think you could get CEOs to care about that; now they're realizing that it's a key differentiator for the organization. There's one more company to tell you about, and that's Carrefour. Carrefour is a retailer in France with 80 million loyalty customers, and they realized that their business was actually in the activation of these users and consumers coming into their stores. So they created a separate company called Carrefour Links, which is about how to create more compelling experiences and how to partner with the rest of the ecosystem, so they know more about the people coming into their stores and can provide them with the products they need. Retail is doing this, financial services is doing this, and I think every industry will get to building its data moat. Hopefully they'll choose us to do that, because we've been thinking about this and building for this moment for over a decade.

Data is the lifeblood of the company. It's their competitive advantage, their intellectual property; you've got to take care of it. They're going to blend it with these large proprietary data models. They've got to be more agile, they're going to
be more predictive and flexible. Bruno, thanks for coming on theCUBE. I know we went a little bit over, but it was a great conversation. I'll give you the final plug for the last 30 seconds to a minute: what's the pitch? What's the Google Cloud analytics pitch, the BigQuery pitch? How would you state the value proposition for the world?

Absolutely. Well, first of all, thanks for having me, John; it's always great fun to talk to you. I'd really say this: if you're looking to build a platform that's going to generate value for your organization, we're trying to be the best partner for you across limitless data, limitless workloads, and limitless users. There are really only a few vendors that can do that for you. You can talk to many of our customers, or watch the video series and hear directly from them what they're doing, so that together we can help power the next generation of data apps.

Bruno, thanks for this master class conversation. You nailed it, I love it, and I think there's a lot more coming. This is just the beginning of how the world is changing, and data is at the center of the value proposition; we've been saying it for 11 years. It's good to see you. Thanks for coming on, and congratulations on all the work you're doing as head of data analytics at Google. Great to see you.

Great to see you. Thank you, John.

Okay, this is a CUBE Conversation. I'm John Furrier here in Palo Alto, going up to Seattle, where Bruno Aziza, head of data analytics at Google Cloud, is having his event with the data engineering community, as this next-gen cloud data tsunami of AI-powered applications comes fast and every company is trying to refactor and figure out how to develop, and then run, AI applications, all powered by data, of course. TheCUBE: we're open, we're data driven. I'm John Furrier, your host. Thanks for watching.
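Two mechanisms discussed in this conversation come down to simple arithmetic: autoscaling that follows usage at the second level and charges only for that, as opposed to the fixed "box" model where capacity is sized for the peak, and compressed storage, where preview customers reportedly saw compression rates of 7x to 35x. The Python below is a toy model under stated assumptions, not BigQuery's billing logic. The abstract "capacity units", the workload numbers, and the function names are hypothetical, and it ignores real-world details such as billing minimums, scaling increments, and per-edition or per-byte prices.

```python
from typing import Sequence


def provisioned_unit_seconds(usage: Sequence[int]) -> int:
    """Fixed-box model (hypothetical): capacity is sized for the peak second
    and billed for every second, whether or not it is used."""
    if not usage:
        return 0
    return max(usage) * len(usage)


def autoscaled_unit_seconds(usage: Sequence[int]) -> int:
    """Autoscaled model (hypothetical): each second is billed only for the
    capacity actually in use during that second."""
    return sum(usage)


def stored_gb(logical_gb: float, compression_ratio: float) -> float:
    """Bytes actually stored when data compresses by `compression_ratio`
    (for example 7 for 7x). Pricing of compressed bytes is not modeled."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return logical_gb / compression_ratio


# Hypothetical bursty workload: 50 quiet seconds at 100 units,
# then a 10-second spike at 1,000 units.
usage = [100] * 50 + [1000] * 10
fixed = provisioned_unit_seconds(usage)    # 1000 * 60 = 60000
elastic = autoscaled_unit_seconds(usage)   # 100 * 50 + 1000 * 10 = 15000
print(f"provisioned: {fixed}, autoscaled: {elastic}")

# Hypothetical: 700 GB of log data at the low end (7x) of the quoted range.
print(stored_gb(700, 7))  # 100.0
```

For any workload, the autoscaled total can never exceed the peak-provisioned total, and the gap grows with how spiky the workload is, which is why the bursty case Antoine describes benefits most. Whether fewer stored bytes translate into a proportionally smaller bill depends on how compressed bytes are priced, which this sketch deliberately leaves out.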
2023-07-07