Robert Maybin, Dremio | AWS Startup Showcase: Innovations with CloudData & CloudOps

Welcome to today's session of the AWS Startup Showcase, featuring Dremio. I'm your host, Lisa Martin, and today we're joined by Robert Maybin, principal architect at Dremio. Robert's going to talk to us about democratizing your data by eliminating data copies. Robert, welcome. It's great to have you in today's session.

Great, thank you, Lisa. It's great to be here.

So talk to me a little bit about why data copies, as Dremio says, are the key obstacle to data democratization.

Sure. Well, when you think about data democratization and what people mean when they talk about it, what they're really speaking to is the desire for people in the organization to be able to work with the enterprise's data and discover data in a more self-service way. And when you think about democratization, you might say, well, what's wrong with copies? What could be more democratic than giving everybody their own copy of the data? But when you really think about that, and how it ties into traditional architectures and environments, there are a lot of problems that come with copies, and those are real impediments. Traditionally, in the data warehousing world, what often happens is that there are numerous sources of data coming in, in all different formats and all different structures. For people to query them, these things typically have got to be loaded into some sort of data warehousing tool. Maybe they land in cloud storage, but before they can be queried, somebody has to go in and basically reformat those data sets, transform them in ways that make them more useful and more performant. This is very, very common. I think many, many organizations do this, and it makes a lot of sense to do it, because traditionally the formats that data is sourced in are pretty hard to work with and
very slow to query, so making copies is kind of a natural thing to do. But it comes at a real cost, right? There's tremendous complexity that can come about in having to do all these transformations. There's a real dollar cost, and there's a lot of time involved too. So if you could take out all of these middle steps, where you're copying and transforming, and then transforming again, and then potentially persisting very high-performance structures for fast BI queries, you can reduce a lot of those impediments.

So talk to me about... oh, I'm sorry, go ahead.

Go ahead.

I was just going to say, one of the things that is in even more demand now is the need for real-time data access. I think real time is no longer a nice-to-have, and I think what we've been through in the last year has really shown that. So given the legacy architectures, and some of the challenges with copies being an obstacle to that true democratization, how can data teams actually get in there and solve this challenge?

Yeah. So going back a little bit to the prior question, I can fill out a little more of the detail, and that'll lead us to your point. One of the things that is also really borne as a cost when you have to go through and make multiple copies is that you typically need experts in the organization, who are the ones who are going to write the ETL scripts, or do the data architecture and design the structures that have to be performant for real-time BI queries. Typically these take the form of things like OLAP cubes, or big flattened data structures with all of the attributes joined in. There are a lot of different ways to get query performance that typically isn't available directly against the source data. So, for what the data teams can do, there are really two ways to go about this. One is that you
can really go all in on the data copy approach and home-grow, or build yourself, a lot of the automation and tooling and parts that it would take to transform the data. You could build UIs for people to go in and request data, and you can automate this whole process. We've found that a number of large organizations have actually gone this route, and they've been at these projects for, in some cases, years, and they're still not completely there. So I wouldn't really recommend that approach. I think the real approach, and this is really available today with the rise of cloud technologies, is that we can shift our thinking a bit. We can think about how to take some of the features and capabilities that one would expect in a data warehousing environment and bring them directly to the data. That shift in thinking requires new technology. Imagine a lot of these traditional data warehousing features, like interactive speed and the ability to build structures or views on top of your data, but done directly on the data itself, without having to transform and copy, transform and copy. That's really what we call the next-generation data lake architecture: bringing those capabilities directly to the data that's on the lake.

So, leaving the data where it is. Next generation is a term, like future-ready, that's used a lot. Let's unpack that and dig into why what you're talking about is the next-generation data lake architecture.

Sure. To talk about that, the first thing we really have to discuss is a fundamental shift in technologies that's come about in the last few years. As cloud services like AWS have risen to prominence, there are some capabilities that are available to us now that
just weren't three, four, five years ago. What we have now is the ability to truly separate compute and storage, connected together with really fast networking. We can provision storage and we can provision compute, and from the perspective of the user, those two things can basically be scaled infinitely. Contrast that with what we used to have to do in platforms like Hadoop or in scale-out MPP data warehouses: not only did we lack the flexibility to scale compute and storage independently, we also didn't have the kind of networking that we have today. So it was a requirement to take the compute and push it as close to the data as we could, which is what you get in a large Hadoop cluster. You've got nodes that have compute right next to the storage, and you try to push as much work as you can onto each node before you start to transfer the data to other nodes for further processing. What we've got now, with some of the new cloud technology, is the ability to basically do away with that requirement. We can have very, very large provisioned pools of data that can grow and grow, really without the limitations of nodes of hardware, and we can spin compute up and down to process that. The thing that we need, though, is a way of processing it: a query processing engine that's built for those dynamics, built so that it performs really well when compute and storage are decoupled. So I think that's really the trick. Once we come to terms with the fact that we've got this new paradigm, with separate compute, separate storage, and very fast networking, and we start to look for technologies that can scale out and back and do really performant queries in that environment, then that's really what we're talking about. Now, I think the very last piece in what I
would call next-gen data lake architecture is this: it's very common even today for organizations to have a data lake that contains a tremendous amount of data, but in order to do actual BI queries at the interactive speed that people expect, they still have to take portions of the data from the lake and load them into a warehouse, and then probably from there build OLAP cubes or extracts into a BI tool. So the last piece in the next-gen data lake architecture puzzle is, once you've got that fast query engine foundation, how do you move those interactive workloads into that platform, so they don't have to be in a data warehouse? How do you take some of those data warehousing expectations and put them into a platform that can query data directly? That's really what next generation means to us.

So let's talk about Dremio now. I see that just in January of 2021 there was Series D funding of 135 million, and then I saw that Datanami actually called Dremio a unicorn, as it's reached a one-billion-dollar valuation. Talk to us about what Dremio is and how you're part of this modern data architecture.

Absolutely. In the technology context, you can think about Dremio as solving that problem I just laid out. We're in the business of building technology that allows users to query very large data sets in a scale-out, very performant way, directly on the data where it lives, so there's no real need for data movement. In fact, we can query not just one source of data but multiple sources, and join those together in the context of the same query. You may have most of your data in a data lake, but then you may also have some relational sources. So there's a potent story there, in that you don't have to consolidate all of your data into one place. You don't have to load all of
your data into a data warehouse or a cloud data warehouse. You can query it where it is. That's the first piece. The next piece that Dremio provides, as we mentioned before, is almost a data-warehouse-like user experience, in terms of very fast response times for things like BI dashboards, so really interactive queries, and the ability to do the things you would normally expect to do inside a warehouse. You can create schemas, for instance. You can create layers of views and accelerations, and effectively allow users to build out virtually, in the form of views, what they would have done before with all of their various ETL pipelines to scrub, prepare, and transform the data to get it in shape to query. And at the very end, we can selectively accelerate certain query patterns by creating something we call reflections, which is an internally managed persistence of data that accelerates certain queries. It's entirely managed by Dremio. The user doesn't have to worry about setup or configuration or cleanup or maintenance or any of that stuff.

So do reflections really provide a differentiator for Dremio? You look in the market and you see competitors like Snowflake and SingleStore, for example. Is this really that competitive differentiator?

I think it's one of them. The ability to create reflections is certainly a differentiator, because it allows you to accelerate different kinds of query patterns against the same underlying source data. Rather than having to build a transformation for a user that potentially aggregates data a certain way, persist that somewhere, and build and maintain all the machinery to do that, in Dremio it's literally a button click. You can go in and
look at the data set, identify the dimensions that you need to aggregate by and the measures that you want to compute, and Dremio will just manage that for you. Any query that comes in that may be going after this massive detail table with a trillion rows, and that has a GROUP BY in it, for instance, will just match that reflection and use it. That query can respond in less than a second, where the work that would have to happen on the back-end engine might typically take a minute to process that query. So that's really the edge piece that gives us that BI acceleration without having to use additional tools or add any complexity for the user.

And I assume you're talking about millisecond response times, right?

Under a second, but sure, milliseconds. Hundreds of milliseconds, typically. We're not really in the one-to-two-millisecond range. That's pretty rare. But sub-second response times are certainly very common with very, very large back-end data sets when you use reflections.

Got it. Speed and performance are absolutely table stakes today for organizations to succeed and thrive. So is what Dremio delivers a no-copy data strategy? Is that what you consider it?

It's that, and it's actually much more than that. When you talk to users of the platform, there are a number of layers to Dremio. We often get asked, I get asked, who are our direct competitors? When you think about that question, it's really interesting, because we're not just the back-end high-performance query engine, and we aren't just the acceleration layer. We also have a very rich, fully featured UI environment that allows users to log in, find data, curate data, reflect data, build their own views, and so on. So there's really a whole suite of services built into the Dremio platform
that make it very easy to install Dremio on AWS, get started right away, and be querying data, building these virtual views, and adding accelerations. All of this can happen within minutes. So it's really interesting that there's a wide spectrum of services that allow us to power a data lake in its entirety, without too many other technologies having to be involved.

What are some of the key use cases that you've seen, especially in the last year, as we've seen this rapid acceleration of digital transformation, this adoption of SaaS applications, and more and more data? What are some of the key use cases that Dremio is helping customers solve?

Sure. I think there are a number of verticals, and there are some that I'm very familiar with because I've worked very closely with customers. Financial services is a large one, and that would include banking, insurance, and investment. There are also a lot of the large Fortune 500 companies that may be in manufacturing, transportation, shipping, et cetera. Lately I'm most familiar with some of the transformation that's going on in the financial services space. What's happening there is that companies have typically started with very, very large data warehouses, and for the last four or five years, maybe a little longer, they've been in this transition to building an in-house data lake, typically on a Hadoop platform of some flavor, with a lot of additional services that they've created to try to enable this data democratization. But these are huge efforts. Typically these are on-prem, with lots of engineers working on these things full-time to build out this full spectrum of capabilities. The way that Dremio really impacts that is that we can come in and actually take the place of a lot of parts of that puzzle and give a really rich experience to the user,
allowing customers to retire some of these acceleration layers that they've put in to try to make BI queries fast, and to get rid of a lot of the transformations, like the ETL jobs or ELT processes that have to run. So there's a really wide swath of that puzzle that we can solve. And then when you look at the cloud, all of these organizations either have a toe in the water or are halfway down the path of exploring how to take all of this on-prem data and processing and everything else and get it into AWS, put it in the cloud, and what that architecture looks like. We're ideally positioned for that story. We've got an offering that runs natively on AWS and takes full advantage of the decoupling of compute and storage. So we give organizations a really good path to solve some of their on-prem problems today, and then a clear path as they migrate into the cloud.

Can you walk me through a customer example that you think really underscores what you just described: what Dremio delivers in helping customers with this migration, so they can take advantage of, and find value in, volumes and volumes of data?

Yeah, absolutely. Unfortunately I can't mention their name, but I have worked very closely with a large customer, as I mentioned, in financial services. They've had a pretty large deployment that traditionally has been Hadoop-based, and they've got several large on-prem relational data warehouses as well. Dremio has been able to come in and actually provide that BI performance piece, the very fast one-second, two-second, three-second performance that people would expect from the data warehouse, but we're able to do that directly on the files and tables that are in their Hadoop cluster. So
I think that project's been going on for quite some time, and we've had success there. Where it really starts to get exciting, though, and this is just beginning, is that this customer is also investigating, and actually prototyping and building out, a lot of these functions in the AWS cloud. The nice thing that we're able to offer is a consistent technology stack, consistent interfaces, and a consistent look and feel of the UI, both on-prem and in the cloud. So once they start that move, they've got a familiar place to connect to for their data and to run their queries, and that's a nice, seamless transition as they migrate.

What about other verticals? I can imagine health care and government services. Are you seeing traction in those segments as well?

Yeah, absolutely we are. There are a number of companies in the healthcare space. In the government space, one of the larger ones, which I have some exposure to, is CMS, where we did some work through a partner to implement Dremio. This was a project that I think was undertaken about a year ago. They implemented our technology as part of a larger data lake architecture and had a good bit of success there. What's been interesting, when you talk about the funding and the valuation and the buzz that's going on around Dremio, is that we really have customers in so many different verticals. We've certainly got financials and healthcare and insurance, and big commercials like manufacturing, et cetera. So we're seeing a lot of interest across a number of different verticals, and customers are buying and implementing the product in all those verticals.

Yeah, right. So take us out with where customers can go, and prospects
that are interested, and even investors, to find out more about this next-generation data engine that is Dremio.

Absolutely. I think the first thing people can do is go to our website, which is dremio.com. They can go to dremio.com/labs, and from there they can launch a self-guided product tour. That's probably a very quick way to get an overview of the product and who we are, what we do, and what we offer. There's also a free trial on the AWS Marketplace, so if you want to actually try Dremio out and spin up an instance, you can get us on the Marketplace.

Do most of your customers do that, like doing a trial with a proof of concept, for example, to really see, from an architecture perspective, how these technologies are synergistic?

Absolutely. There are a number of ways that customers find us. Often customers may just try the trial on the Marketplace, but customers may also reach out to our sales team, and so on. It's very common for us to do a proof of concept, and it's not just architecture. It would also cover performance requirements and things like that. I think pretty much all of our very largest enterprise customers would go through some sort of proof of concept, and that would be done with the support of our field teams.

Excellent. Well, Robert, thanks for joining me today and sharing all about Dremio with our audience. We appreciate your time.

Great, thank you, Lisa. It's a pleasure.

Likewise. For Robert Maybin, I'm Lisa Martin. Thanks for watching.
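
Editor's illustration: the reflections idea described in the interview (pick the dimensions to aggregate by and the measures to compute, then let matching GROUP BY queries read the precomputed result instead of the detail table) can be sketched roughly as below. This is a toy under stated assumptions, not Dremio's implementation or API: the table contents and the function names `build_aggregate`, `sum_by_region_from_detail`, and `sum_by_region_from_aggregate` are all hypothetical, and the step where an engine automatically matches a query to a stored aggregate is not modeled.

```python
from collections import defaultdict

# Hypothetical detail rows (region, product, amount): a tiny stand-in for
# the "massive detail table" mentioned in the interview.
DETAIL_ROWS = [
    ("east", "widget", 10.0),
    ("east", "gadget", 5.0),
    ("west", "widget", 7.5),
    ("east", "widget", 2.5),
]


def build_aggregate(rows):
    """Precompute SUM(amount) per (region, product) once, ahead of query time."""
    totals = defaultdict(float)
    for region, product, amount in rows:
        totals[(region, product)] += amount
    return dict(totals)


def sum_by_region_from_detail(rows):
    """Answer SUM(amount) GROUP BY region by scanning every detail row."""
    result = defaultdict(float)
    for region, _product, amount in rows:
        result[region] += amount
    return dict(result)


def sum_by_region_from_aggregate(aggregate):
    """Answer the same query by rolling up the smaller precomputed aggregate."""
    result = defaultdict(float)
    for (region, _product), subtotal in aggregate.items():
        result[region] += subtotal
    return dict(result)


aggregate = build_aggregate(DETAIL_ROWS)
# Both paths give the same totals; the aggregate path reads fewer rows.
print(sum_by_region_from_aggregate(aggregate))
```

The point of the sketch is only that a coarser GROUP BY can be answered from a finer precomputed aggregate without touching the detail rows, which is presumably where the sub-second responses described above come from when the detail table is very large.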

2021-03-31 10:28
