Data lake analytics with Autonomous Database


Hi, my name is George Lumpkin, and I wanted to talk to you today about using data lake analytics with Autonomous Database. Every enterprise has access to huge amounts of data, and the cloud makes it much simpler to get value from that data. I'm going to show you how you can do that with Autonomous Database.

I'm mostly going to be talking about the capabilities that are available today within Oracle's cloud. I will probably mention a few areas that we're going to be developing in the future as well, but for the most part everything that you see on the slides is available today, and you can try it out today on Oracle's cloud.

Our goal here is that you should be able to use all your data, regardless of where it's coming from, to help innovate, drive business processes, improve your profits, reach new customers, and reduce costs. What gets opened up in the cloud is the ability to do this more efficiently: you can reach out to more data, you can process data at higher scale, and you can do more types of analytics.

So we're going to be looking at an Oracle cloud service, Autonomous Data Warehouse. This is built upon the Oracle Database that many of you are familiar with and have probably used for many years. What I want to talk about in this presentation is not so much the database features. I think a lot of you understand that Oracle Database provides a high-performance query engine, sophisticated SQL processing, and scalable data loading. I want to talk about something different: what does the cloud add on top of the Oracle Database capabilities that you already know about, and what do we deliver with Autonomous Data Warehouse that's above and beyond the on-premises Oracle data warehouse that you may be using today?

I'm going to talk about three areas in this presentation. The first one is that Autonomous Data Warehouse delivers simple and broad data ingestion. All
right, you're going to be able to get all of your business data and do analysis on that data when you're using Autonomous Data Warehouse, and I'm going to talk about how Autonomous Data Warehouse casts a really wide net for reaching out to all of your data.

Then I'm going to talk about how Autonomous Data Warehouse is elastic: how it scales and adapts to your workload. The real benefit for you is that this gives you the ability to lower your infrastructure cost. This is part of the promise of the cloud. When you have your on-premises data centers, you have to build out large hardware configurations to handle any workload, but in the cloud you should be paying for what your workload is actually consuming at that point in time, and we'll talk about how Autonomous Data Warehouse delivers that.

The third area that I want to talk about today is simple and very sophisticated analytics: how you can easily do any type of analytics on your data and really solve every business problem that you have using Autonomous Data Warehouse. This, of course, is where you get the most value from your analytics solutions: doing the analysis. We will look at several examples here and also give you lots of information on where you can go and deep dive with hands-on labs. As I go through this presentation, I'm not going to be showing a lot of demos. I'm focused more on the concepts that Autonomous Data Warehouse provides, but we have dozens of hands-on labs available so that you can try out everything that I'm showing here, and I'll have those links at the end of this presentation.

So first, let's talk about data ingestion. There's lots of data available to you. How are you going to bring it under the umbrella of Autonomous Data Warehouse to be able to do analysis? I think the starting point here is to realize that when you're in the cloud, there's data available everywhere.
There's data within your on-premises data centers that you can access, there's data within your own cloud accounts or your own cloud tenancies, there are public data sets throughout the cloud, and there are data sets in other clouds that you may have accounts in. Within Autonomous Data Warehouse you can access all of that data, regardless of what format that data is in and regardless of where that data actually resides.

Some of this will be familiar to those of you who've used an Oracle database. You have an Oracle database today, you want to query or load data from files that sit outside your database, and you use a feature called external tables. This is something that many Oracle customers have used for many years. In the cloud you apply the same concept. For folks who are used to developing with SQL, you can create external tables, and we can see an example here. I'm creating an external table, but it's not pointing at a file on my local file system; it's pointing at the cloud, at object storage. You can see that it's creating an external table called customer sales, and the underlying data comes from a file sitting in an object store bucket. And it's not just any file; it's a Parquet format file. I talked about how we can support any type of data: we support Parquet files, JSON files, and Avro files, in addition to CSV or Excel files, and all of these are easy to access when they're in the cloud.

If you work with a lot of data sitting in the cloud, you don't just have individual files sitting in buckets; you've actually created a structure. If you're managing what you might think of as a data lake, you've started to organize your files. One common organization is that you have a group of files that represents one table.
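
As a rough sketch of the external table described above (a table called customer sales backed by a Parquet file in an object store bucket), the Python snippet below only assembles the PL/SQL text; it does not connect to a database. It assumes the `DBMS_CLOUD.CREATE_EXTERNAL_TABLE` procedure with its `table_name`, `credential_name`, `file_uri_list`, and `format` arguments. The credential name, region, namespace, bucket, and file name are hypothetical placeholders, not values from the talk or its slides.

```python
import json


def external_table_plsql(table_name: str, credential: str, file_uri: str) -> str:
    """Build a PL/SQL block that defines an external table over a Parquet file."""
    # "schema": "first" asks the database to derive the column definitions
    # from the Parquet file's own metadata instead of a hand-written list.
    fmt = json.dumps({"type": "parquet", "schema": "first"})
    # Plain string interpolation is for illustration only; real code should
    # validate these values rather than splice untrusted input into SQL.
    return (
        "BEGIN\n"
        "  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(\n"
        f"    table_name      => '{table_name}',\n"
        f"    credential_name => '{credential}',\n"
        f"    file_uri_list   => '{file_uri}',\n"
        f"    format          => '{fmt}'\n"
        "  );\n"
        "END;"
    )


# Hypothetical placeholders: credential, region, namespace, bucket, file name.
statement = external_table_plsql(
    table_name="CUSTOMER_SALES",
    credential="OBJ_STORE_CRED",
    file_uri=(
        "https://objectstorage.us-phoenix-1.oraclecloud.com"
        "/n/my_namespace/b/customer_sales/o/sales_2023_01.parquet"
    ),
)
print(statement)
```

Once a block like this has been run in the database, the table can be queried with ordinary SQL while the data stays in object storage. For a group of files that together represent one table, the analogous procedure is `DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE`; its arguments are not sketched here.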
In this case, we're looking at a table called customer sales, and you have files corresponding to each month of new customer sales. You're constantly adding new files into your object storage as you accumulate more data, so you've organized this: you've created a customer sales bucket, and it has directories with different files. Within Autonomous Data Warehouse we recognize this. We understand these types of file formats, so you don't need to create an individual external table for each file. We recognize that you have a collection of files organized in a directory structure, and we can automatically create what we call an external partitioned table: a partitioned table within Oracle based upon the file structure that you've created in the cloud.

You can see how things start to blend. You have data sitting in a data lake, and that data is represented accurately within Autonomous Data Warehouse. You can query it and get high performance, because with these partitioned tables you get performance optimizations like partition elimination. For customers who like to write their own scripts and code, that's what these features are about.

You might say, "Well, if I have a data lake, I'm going to have tons of files. I need to manage the whole data lake." We've extended Autonomous Data Warehouse to be able to do that. We can create external tables so that, with data sitting in your object store, in your data lake, Autonomous Data Warehouse can query it. You can load data from the object store into the database itself. You can export data from your database out into the object store. And, more interestingly, we provide the capability to manage these files. Maybe you want to move files into different directories, maybe you want to create new files, maybe you want to delete files, or you want to see all the files that are
available. All those capabilities are built directly into Autonomous Data Warehouse. The idea is that you don't have a bright white line that says "this is data inside my database and this is data outside my database." You start to be able to blend them: Autonomous Data Warehouse can seamlessly access data wherever it is in the cloud, and it can manage files that may even be outside the database, in ways that are similar to how you manage database tables today.

Now, what if you say, "That's all fine and good, but I'm not a SQL developer. I'm not a coder. I don't want to work with scripts. I'm just a data analyst." We've built an entire set of tools for the data analyst, so that you don't have to dive into the details of external tables when you want to access data in the data lake. We have a set of built-in tools that we collectively call the Oracle Autonomous Data Studio, which allows you to easily load data, do data transformations, cleanse that data, browse through what data is available to you both inside and outside your database, and do simple analysis. I'm going to focus on the first two here, load and transform, but we have a full set of easy-to-use built-in tools in Autonomous Data Warehouse.

When we talk about Autonomous Data Warehouse in the cloud, it's not just a database service; it's a platform for working with all aspects of data. We provide a database engine, but we also provide tools for different user personas, like data analysts and data scientists, to work with the data in Autonomous Data Warehouse. The simplest one is loading. Again, I'm not showing demos here, but you're able to try this out very easily with our hands-on labs. For loading, we provide really the simplest possible experience
in that you're just able to drag and drop files. We will show you all the files that are available within your cloud, within your object store, and you can pick which files you're interested in. You can drag and drop and choose to load those files into your Autonomous Data Warehouse, or, if you just want to run queries on those files and leave the data where it is in the object store, you can create external tables. This works for all the different file formats that I just talked about: Parquet, Avro, Excel, and JSON. It also works across all the clouds, and I'll spend a couple more minutes on what we're providing for multi-cloud support for data ingestion.

So we have a simple data loading tool, and then we have a simple data transformation tool. I think everyone realizes that a lot of the time is spent on data wrangling, on cleaning up your data. Within Oracle we have a tool called Transforms built into Autonomous Data Warehouse. This is built on technology that Oracle has been developing for a couple of decades with Oracle Data Integrator and all the previous data integration products, so Oracle has integrated a mature and fully functional data transformation platform into Autonomous Data Warehouse. You can see this has the workflow paradigm of going step by step, doing mappings, creating transformations of your data, and cleansing data with a drag-and-drop UI. It's a UI developed so that you don't have to write code; Transforms creates the underlying SQL code to actually do these transformations.

So we have the ability to reach out into the cloud and access any file in the cloud, and we can do this with coding or with UIs. And then we can access any cloud. If you're using Oracle Autonomous Data Warehouse, you would be using Oracle's own cloud, OCI, Oracle Cloud Infrastructure, but your data may not
all be in OCI. Maybe you are also using Azure or AWS. From the perspective of Autonomous Data Warehouse, it doesn't matter: you can query or load data that's stored in the object store of any major cloud, and it's just as easy as accessing data within Oracle's cloud.

You need to provide Autonomous Data Warehouse with the credentials to access the other cloud's object store, but we've done integration with each of the clouds so that you don't have to log in to the object store in Azure every time you go access a file sitting there. We've integrated with these other clouds' security frameworks, for example Amazon Resource Names or resource principals, where within your Azure cloud, AWS, or Google Cloud Platform you're saying, "I grant this Autonomous Data Warehouse access to these buckets in my cloud." You've essentially authenticated the Autonomous Data Warehouse to access data in these other clouds. You're able to do this securely, and you only need to set it up one time. It's part and parcel of how we expect people to use Autonomous Data Warehouse to reach out in a multi-cloud environment.

You could say, "That sounds really cool, but aren't there challenges with a multi-cloud environment? What about the network bandwidth between the clouds? What about the egress costs as I access this cloud?" Oracle is tackling that problem as well. We've started with a partnership with Microsoft Azure, and together we have what's called the Oracle OCI-Azure Interconnect. This is a high-speed connection between co-located data centers, and we have it in 12 regions to date. You can see on this map several of the regions that have the Microsoft logo, and you can see that in Phoenix and Ashburn and London and
Frankfurt and Amsterdam, as examples, the Oracle Cloud region is interconnected with the Azure cloud region. What you get is a high-bandwidth, low-latency network between Oracle and Azure, and we have a business agreement such that there are no egress or ingress charges. This allows you to access data sitting in Azure knowing that you can access it with very high bandwidth and very low latency, and without any excess charges related to egress.

Our long-term roadmap is to do this with other clouds. We view the reality of where enterprises are moving in the future as multi-cloud: not everyone is going to have all their data sitting in one cloud. So we're working on, and really leading the market with, Oracle's cloud toward this multi-cloud reality.

The way that you should think about the implications of multi-cloud for Autonomous Data Warehouse is that you can use your Autonomous Data Warehouse as your data lakehouse, as your source to access data and do analytics on data regardless of where that data resides. If your data is in Azure or AWS, you can have your Autonomous Data Warehouse running in OCI, managing and analyzing that data. This is perhaps a different mindset. Folks think, "I'm building a data warehouse. It's a monolithic database sitting in one cloud, I'm bringing all my data into that one cloud, and that's where I'm doing my analysis." What I'm conveying here is that our vision is much broader. Our vision is that Autonomous Data Warehouse is a platform with which you can access any data across any cloud, and do this seamlessly: really an enterprise-wide data lakehouse.

The final point that I'm going to talk about regarding this idea of using cloud data is that a lot of customers have had this question: well, I could store data in object storage and collect all my data there, and that would be architecturally what a
lot of customers think of as being a data lake, or I could store data in my database, in Autonomous Data Warehouse. The benefit of storing it in Autonomous Data Warehouse is that the data is highly optimized for analytics; you put data into a database to get that optimized, high-performance storage.

Autonomous Data Warehouse has been in the market for a number of years, and a customer could look at the price comparison and say: object storage within OCI today is $26 per terabyte per month, and it's $118 per terabyte per month for Autonomous Data Warehouse. So customers have been faced with a dilemma: should I save money and put data in object storage, or do I want to put this data in Autonomous Data Warehouse and get higher performance and a lot more functionality with the data?

We've recently made a price change, because we don't think this dilemma is what customers should be tackling. We've lowered the price of Autonomous Data Warehouse storage to $25 per terabyte per month, essentially at parity with object storage. What this means is that the decision on where to store your analytic data should be based upon what makes the most sense. You shouldn't think about what's the cheapest place to store the data; you should think about the business use of the data. If your data is primarily going to be used by the Autonomous Data Warehouse, put all of that data into Autonomous Data Warehouse. You don't need to do any type of cost optimization based upon storage.

Sometimes you want to keep data in object storage. Sometimes you have not only the data warehouse accessing that data, but you're also running lots of Spark jobs and other types of processing over that same data at the same time, so it does make sense to have object-storage-based data lakes. But you should be choosing your architecture for a reason, based upon your
business requirements, not based upon cost.

So that was a lot; we covered a lot on data ingestion. Just to wrap it up: our data ingestion is simple. You can bring any format of data into Autonomous Data Warehouse. You can do this using scripts and APIs, or you can do this using simple built-in tools that are designed for the data analyst, so we fit whatever your preference is for how you want to work with data. We can do this securely across all of your data, across any cloud, and I think this is an important point: you get the most value from analytics when you have the broadest set of data, so you shouldn't be limited to one cloud or one silo of data. And you can do this cost-effectively: you can store your data in Autonomous Data Warehouse, and you don't have to worry about the cost of bringing all this data into a cloud data warehouse.

So the first thing I wanted to talk about was data ingestion. The second is elastic scalability, and I'm really going to talk about one thing here, and it's really simple: with Autonomous Data Warehouse in the cloud, you are paying for the resources that you use. I want to contrast this with an on-premises Oracle database. If you create an on-premises Oracle database, you're putting down a hardware configuration that has, say, 16 CPUs and 20 terabytes of storage, and that's what you pay for, day in and day out. Within the cloud and within Autonomous Data Warehouse, it's completely elastic. You can specify the exact number of CPUs that your workload needs. You could say, "I need eight CPUs and 12 terabytes of data," and you can change it anytime you want. You come up to the end of the quarter and say, "Now I need 16 CPUs for the next five days." Scale it up to 16 CPUs, and you're only paying for 16
CPUs for those five days. Then you scale it down. All the scaling is completely online; you can change the capacity of your data warehouse anytime you want.

Then we take it a step further with auto scaling. Suppose that, instead of you saying, "I want to double the size of my system as I come to my quarter-end processing," you could let Oracle Autonomous Data Warehouse do that for you. We provide auto scaling such that, based upon your workload, we give you more resources so you have the compute capacity that you need, and we do this up to 3x your current database size. So if you have an 8 OCPU database, we'll keep giving you resources up to 24 OCPUs based upon your workload, and we're only going to charge you for the extra resources that you use. These are big cost benefits of using a cloud data warehouse, of using Autonomous Data Warehouse, as compared to your on-premises system or even as compared to how a lot of other cloud databases work today.

This is a lot of the promise of the cloud: you're able to quickly scale up. Think about the data lake scenarios that I talked about. Maybe you don't have a production data warehouse; maybe you have a data science team, and they want to go in and run some experiments for a couple of weeks over 100 terabytes of data. Now they can spin up dozens, or over a hundred, database CPUs to do their processing and experiments, and then shut this off and spin it down when they're done. They're able to have an environment that's highly dynamic and highly elastic to support the analytics that they need. And for those of you who are familiar with Oracle Database, all of this is still the Oracle Database, the Oracle SQL engine, with all of the functionality of the Oracle Database, but you're paying for it as you use it.

The last section I'm going to talk about is simple and sophisticated data
analytics, and I'm going to talk about this in the context of a simple application. Or I shouldn't say simple application; it's actually a very sophisticated application, but it's a simple business scenario to understand. This hypothetical company is called Oracle MovieStream. It's an online movie streaming company, much akin to all of the streaming services that are available when we turn on our TV today. This company has a customer base that's been growing, and it generates huge amounts of data based upon what its customers are viewing and what they have looked at. I think this is a really good scenario for understanding all the different types of analytics that can happen within one single company and within one single analytic environment. I also want to call out that for everything I'm going to be talking about with this company, you can look at detailed demos and detailed tutorials and go through all the examples yourself. We'll have more links at the end of the presentation as well.

You can see the user experience of Oracle MovieStream. This should be familiar to many of you, and I think that as you watch TV yourself, you understand that there's a lot going on under the covers. There's a lot of intelligence built into these types of systems.

So let's look at an example. I know that this is a little small, but I think everything can be read. We start at the top of the screen, where there are some notifications; right here it says "special pizza offer." Where did this come from? MovieStream is looking at all of its customers, identifying which are its best customers and which customers might be about to leave, and it's using that to decide whether to give promotions to some customers. Maybe this customer is at risk of leaving, so let's give them a special offer. The top line in most of these types of
applications is: what are the popular movies? What's everyone else watching? This is straightforward analytics, looking at the most popular movies in the United States today. But then you get more interesting things. What are the most popular movies and TV shows within my city? Now you start taking in spatial analysis: where are you now, and what are the other people near you doing? Maybe you want to look at the movies that are award winners, so MovieStream is pulling data from third-party providers that list the award winners, and that data is coming in whatever format it comes in. Maybe it's being shipped as JSON, so you're able to ingest and analyze JSON data as well. And then you have recommendations: what movies do we recommend for you? That can be based upon machine learning, but a lot of times it's also based upon graph analytics: you like this movie, so what other movies are related to that movie that we think you're going to like?

So just on the home page, you can see that there are multiple types of analytics being done to enhance and improve the user experience, and Oracle Autonomous Data Warehouse supports all of those analytics within a single platform. We think of Oracle as being a converged database. That is, within Autonomous Data Warehouse, within the Oracle Database, we have machine learning, we have graph analytics, we have text and search capabilities, we can process JSON data, and we can do spatial analysis. All of these things are built in, and they're built in seamlessly. You don't have to integrate a whole bunch of different APIs, and you don't have to integrate a whole bunch of different database engines. You have one database engine using SQL to do all these types of analysis: the spatial, the machine learning, the graph analytics. And
this is a lot of the real power of what Oracle provides for analytics. If you look at the MovieStream application, you're not having to assemble five or six different types of database engines, push data between them, and write different code for each of them. You're doing this all within a single database engine and having the sophisticated analytics to do everything you want.

We talked about the offer for a free pizza at the top of the Oracle MovieStream screen. That was based upon machine learning, where we built models to predict churn and said, "If these customers are likely to churn, then we should work to keep them, and we should do promotions to keep them." So Oracle Autonomous Data Warehouse is a machine learning engine. You can use SQL, as I talked about. Sometimes data scientists will use other languages, so we also support Python and R. We've implemented dozens of scalable machine learning algorithms within Autonomous Database, and we've made it simpler with capabilities like AutoML, which help guide the data scientist or user toward the algorithms that would be best. These can solve really any type of machine learning business problem. The example we saw was about customer loyalty and retention, but you could do customer segmentation, forecasting, and so forth using machine learning as well.

Similarly, we provide graph analytics built into Autonomous Data Warehouse. What we saw with MovieStream was graph analytics helping to generate the movie recommendations: what are the recommendations for me based upon what I've watched before? It's doing graph analytics on the types of movies you've watched and the types of movies that other people have watched, based upon your watch pattern, and providing a specific list of the movies MovieStream thinks you will like. Again, it's the same idea as machine learning: Autonomous Data Warehouse provides a full-fledged graph engine built into the system, so you can analyze how different data entities are connected. You can use graph query languages to look at the interconnections between data within your data warehouse, or even anywhere across the cloud, and you can solve problems like identifying product recommendations based upon customers with similar behavioral patterns.

I should mention that I have talked about graph and machine learning at a pretty high level here. What I'd really like everyone to understand is that these capabilities are built into Autonomous Data Warehouse. I could easily spend a full session on graph, a full session on machine learning, and a full session on JSON database, but all of that is there, and in the hands-on labs you can see all the details. I think it's more important to understand our strategy, which is to have a single platform that does all these things.

So, just to summarize: when we talk about data lake analytics with Autonomous Data Warehouse, we're providing three real benefits of the cloud, and these are benefits above and beyond what you might see with your on-premises data warehouse. You can easily ingest data from anywhere: you can ingest any type of data format from any cloud, do transformations, and bring this under the umbrella of your Autonomous Data Warehouse. You can scale to meet any of your analytic requirements while doing this cost-effectively, controlling costs with the elastic and dynamic scaling of Autonomous Data Warehouse in the cloud. And then, as we just saw, you can do any type of data analytics. You can solve all the data analytics for a scenario like our hypothetical MovieStream company with a single platform, do this scalably and cost-effectively, and be a lot more productive doing it.

So we have lots more information on
getting started with Autonomous Data Warehouse, including the examples and LiveLabs that we have. You also have the ability to get started on all of this for free: we have a free version of Autonomous Data Warehouse, so you can get started today and try out all the functionality that I described. Thank you very much. I hope that you will try Autonomous Data Warehouse and our tutorials and LiveLabs, and I hope you enjoyed this presentation.
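
The storage prices and scaling numbers quoted in the talk lend themselves to a quick back-of-the-envelope check. The Python sketch below uses only the figures stated by the speaker ($26, $118, and $25 per terabyte per month; auto scaling up to 3x; 8 CPUs scaled to 16 for five days). The 30-day month and the "OCPU-days" unit are simplifying assumptions made here for illustration; the talk quotes no compute prices, so none are modeled.

```python
# Storage prices quoted in the talk (US dollars per terabyte per month).
OBJECT_STORAGE_PER_TB = 26
ADW_STORAGE_OLD_PER_TB = 118
ADW_STORAGE_NEW_PER_TB = 25

# Auto scaling is described as going "up to 3x" the base database size.
AUTO_SCALE_FACTOR = 3


def monthly_storage_cost(terabytes, price_per_tb):
    """Monthly storage cost in dollars for a given volume and unit price."""
    return terabytes * price_per_tb


def auto_scale_ceiling(base_ocpus):
    """Largest OCPU count auto scaling can reach from a given base size."""
    return base_ocpus * AUTO_SCALE_FACTOR


def ocpu_days(base_ocpus, peak_ocpus, days_in_period, peak_days):
    """OCPU-days consumed when scaling up only for the peak days.

    OCPU-days is a simplified unit assumed for this sketch.
    """
    return base_ocpus * (days_in_period - peak_days) + peak_ocpus * peak_days


# 12 terabytes, the storage size used in the talk's elasticity example.
print(monthly_storage_cost(12, OBJECT_STORAGE_PER_TB))
print(monthly_storage_cost(12, ADW_STORAGE_OLD_PER_TB))
print(monthly_storage_cost(12, ADW_STORAGE_NEW_PER_TB))

# An 8 OCPU database can auto scale to this many OCPUs.
print(auto_scale_ceiling(8))

# 8 OCPUs all month with 16 for five days, versus a fixed 16 all month
# (a 30-day month is an assumption, not a figure from the talk).
print(ocpu_days(8, 16, 30, 5))
print(ocpu_days(16, 16, 30, 0))
```

Under these assumptions, storing 12 terabytes costs about the same in either place after the price change, and scaling up only for the five peak days consumes well under the capacity of a system sized permanently for the peak.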

2023-04-14
