Redash: Open Source SQL Analytics on Data Lakes
Hi everyone, and welcome to our product walkthrough for Redash on Databricks. My name is Jesse Whitehouse. I've been part of the Redash team since 2018. I do success engineering, I make videos, maintain our documentation, and write a lot of SQL. Joining me is Francois from the data science team at Databricks. Francois, would you introduce yourself?
Hi everyone, my name is Francois. I joined Databricks in March and have been working as a data scientist in the data team since then. As a data scientist, I really want to know the data I'm working on deeply, and Redash has been the perfect tool for that, as you will see.
Awesome, thanks Francois. So here's our agenda for this session. First, I'll give a brief summary of what Redash is and who it's for. Then we'll quickly walk through the product and its main features. Next, Francois and I will demonstrate a couple of use cases and show you Redash on Databricks in action. Finally, we'll encourage you to sign up for the private preview of Redash on Databricks, coming this summer, and take a few questions at the end. Hopefully you'll understand why Redash is awesome and a great fit for data analysts on the Databricks platform.
So first, what is Redash? Redash is an open source web application. It's used for querying databases and visualizing the results. Redash on Databricks is now the easiest way to query, visualize, and share insights from your Databricks data lake, and it's built entirely for SQL-native users. At heart, that means we're a SQL box: you write a query, you hit execute, you see results. It's really simple. But we also provide fast visualizations right off the query screen, and dashboards for collecting charts together.
Redash also makes sharing your work easy, with direct links that work across your organization. We support scheduling queries to run in the background, similar to a Databricks job, and you can also set up alerts based off those scheduled queries if you want to track metrics passing a certain threshold. And of course it's all driven by SQL queries.
So who's it for? Let's talk for a moment about Databricks as a whole. It's a unified platform for data teams, which include data engineers, data scientists, and data analysts. For data engineers, Databricks offers the best Apache Spark experience. For data scientists, Databricks implements powerful notebooks that can run SQL, Python, R, Java, and Scala. These are familiar for anyone doing scientific research or ML work, like Francois for example. And finally, for data analysts, Redash is your familiar interface to the data. Even for data scientists, we provide the easiest way to share plots and dashboards without sending a notebook link. And of course, once the SQL is written, you can tuck it away in the background so that visitors can focus on the plots.
So let's go quickly through what Redash can do. First, Redash is tightly integrated with Databricks. You can log in with your Databricks credentials, and tables in your Delta Lake appear inside Redash automatically. With Redash on Databricks there's no need to configure database connections or firewall rules, which can be a pain point when using our open source distribution.
The Redash SQL editor makes life easy. You write queries in Spark SQL. There's a database schema browser, which powers our autocomplete for table names and fields, and a table of query results appears just beneath the query editor. But our tables actually have a few tricks up their sleeve too: you can include links, images, JSON blobs, and formatted HTML.
It's also easy to add visualizations to queries right from the query screen, so you can iterate quickly to get just the right data presentation. The most common visualization types are represented, like bar charts, line charts, area charts, and pie charts. We also have heat maps, box plots, and pivot tables. Just about anything you can imagine, we can do it. And once you have something of interest, you can bundle it into a dashboard.
We offer flexible execution that goes along with this. There are query parameters that are friendly for business users, you can schedule your queries so that dashboards remain fresh, and you can even configure alerts like emails, PagerDuty, or Slack notifications. So you can back up your queries, trigger query refreshes, or even use Redash as a gateway for query results into tools like Google Sheets. And the best part is that links within Redash can be shared around your organization as a single source of truth.
There's no need to email screenshots or PDFs. For these reasons, Redash is a natural fit with Databricks: we are a powerful front end with a powerful engine behind us. But if you're in a pinch, Redash also works with nearly 50 SQL and NoSQL databases and most REST APIs, including its own API, all of which comes together to make Redash a really natural fit for working with the Databricks platform. So from here, let's do a quick walkthrough of the main features of Redash.
So here we are inside the Redash application. This is the main screen that you see when you first log in. Normally you would see my favorite dashboards or favorite queries, but since this is a fresh installation there's not much to look at. When you start using Redash, the first thing you're going to do is usually write a query, so we can do that from the Create menu and choose to create a new query.
This is the query screen, which has three main components. The first part is the SQL editor, where you write Spark SQL queries. So if I want to write a really simple query here and get some results, I can do that like this, and if I hit Shift+Enter, Command+Enter, or Ctrl+Enter, it will execute the query and show me results. This is the second part of the screen. By default you'll see a table, but you can also add visualizations from this tab.
You may have noticed when I was writing the query that as I started typing the name of a table, it was autocompleted for me. That comes from this schema browser, which appears over on the left. Any database or table that exists inside our Delta Lake is going to appear here automatically, which is really convenient if you're loading data from a notebook and you want to quickly investigate it inside of Redash. We created some sample data that we put in a database called SaaS demo. You can see the names of the tables here, and I was querying the accounts table, which is why it was able to autocomplete for me.
Let's make this query a little bit more meaningful. How about we get the card country for these accounts? We'll just call that country, and a count of star that we'll call number of accounts, just like that. Then we'll need to add a GROUP BY, since we're doing some aggregation, and we'll order by 2 descending. Obviously you don't have to write the query all in one line. You can hit this Format Query button and it'll make it a lot easier to look at. Now when we execute the query, we can see all of the accounts that exist on this table, grouped by country and ranked by number of accounts. So it looks like the most accounts come from the United States, then Japan, Germany, Israel, and Korea. I'll go ahead and name this query. We'll call it Number of Accounts by Country, just like that. And let's go ahead and tag this; let's just say it's part of the SaaS demo.
I'm done with the query for now, so I can just click this button that says Show Data Only, and now let's add a visualization. We'll make a bar chart that just shows the country and the number of accounts. By default Redash is going to sort this axis alphabetically, but in this case I want to preserve the sort order from the query, which makes it a little easier to read. We'll call that axis Country of Origin, and the Y axis we can call Number of Accounts. Since number of accounts is related to revenue, let's make that green. As a final part, since our axis labels are pretty good, we can even hide the legend. That way we get more space in the viewport for showing the chart.
As you can see, when we're on the view query screen you don't see the SQL, which prioritizes showing as much of the visualization as possible. And there are a couple of other cool tricks that are tied to this screen. We can, for example, set a refresh schedule for this query. It defaults to Never, but in this case let's add a refresh schedule so it runs every five minutes.
That way, every time we visit this query, we know that the result is no more than five minutes old.
From down here we can also look at the API key for this query. As I mentioned, everything you can do from the front end you can also do via the API. So if we copy this Results in CSV link, we can go into something like Google Sheets and import the data immediately, just like this.
Next, I'd like to add this into an alert system. I'd like to receive an email if our number of accounts goes over a certain threshold. To do this, I'm going to make a couple of edits to this query.
It's useful right now, but I'd like to add one more column. So I'm going to take the original query and wrap it in a common table expression, which we're going to call base. We can select star from base, and then we'll also add a window function that gives us the sum of number of accounts over an ORDER BY SELECT 1, and we'll call this total number of accounts, just like that. There we go.
Now I can save this query and show only the data, and you can see that the total number of accounts that exist on the table is shown all the way on the right. There are 696 of them, and of those, 269 were created in the United States. In this case I'm only going to use this column for an alert, so I can edit the visualization for the table and hide this column.
Next, to create an alert, I use the Create menu, and I can search for the query I just made. The most recent queries I've worked on will appear in this list, but search also works. I'm going to create an alert based on the total number of accounts. In this case I'll set it to 690. This is frozen data, so I know that as soon as this query runs again I'm going to receive an alert notification, but that's okay. In a real, live production environment we'd be using dynamic data. And we'll set a custom template. We'll just say congratulations. Ahead of time I've written a little template for myself, so we'll look at that in a second.
Now that the query has been created, and the alert to go along with it, I can configure a destination. By default my email address is available, but if the organization configures it, I could also click this Add button and send PagerDuty notifications, Slack notifications, or generic webhooks.
So the next thing I want to do is take that chart I made and add it to a dashboard. From the Create menu I can hit New Dashboard, and we'll just call this SaaS dashboard, just like this. So now I'll add a widget to this dashboard.
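
The query edits described in this part of the walkthrough (an aggregation by country, wrapped in a common table expression called base, plus a windowed total that the alert watches) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the query from the demo: it runs on SQLite through Python's `sqlite3` module instead of Spark SQL, the `accounts` table, its `card_country` column name, and the sample rows are invented for illustration, and it uses `SUM(...) OVER ()` where the speaker orders the window by a constant. Window functions need SQLite 3.25 or newer. The `alert_triggered` helper is hypothetical and only mimics the "goes over a threshold" condition.

```python
import sqlite3

# Hypothetical stand-in for the demo's accounts table; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, card_country TEXT)")
sample_rows = [("United States",)] * 5 + [("Japan",)] * 3 + [("Germany",)] * 2
conn.executemany("INSERT INTO accounts (card_country) VALUES (?)", sample_rows)

QUERY = """
WITH base AS (
    -- The original aggregation: one row per country.
    SELECT card_country AS country, COUNT(*) AS number_of_accounts
    FROM accounts
    GROUP BY 1
    ORDER BY 2 DESC
)
SELECT base.*,
       -- Window over all rows, so every row carries the grand total.
       SUM(number_of_accounts) OVER () AS total_number_of_accounts
FROM base
ORDER BY number_of_accounts DESC
"""

results = conn.execute(QUERY).fetchall()


def alert_triggered(result_rows, threshold):
    """Return True when the total in the first result row exceeds the threshold."""
    total = result_rows[0][2]
    return total > threshold


for country, count, total in results:
    print(f"{country}: {count} of {total}")
```

In the demo the equivalent alert compares the total column against 690; here the same comparison would be `alert_triggered(results, threshold)` with whatever threshold suits the sample data.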
Just as with creating an alert, my most recent queries appear here, and we'll be able to pick which visualization we want. Let's start with the chart that we made earlier. Let's make that a little bigger. And you don't have to use just one visualization from a query: you can have the chart and the table on the same dashboard, which we'll do here. We'll make that a little narrower, drop that over here, and make these two heights match. That's how easy it is to take a couple of visualizations and toss them onto a dashboard.
Now, in Arik's demonstration a little earlier during the keynote, he showed off a monthly recurring revenue query. I'd like to include some of that data on this dashboard as well, so I can search for it up here. Here's his MRR breakdown chart, and what I'd like to do is be able to limit this by date. But I don't want to play around with his query, so I'm going to use this vertical ellipsis menu and fork it. This creates a new copy just for me, so I'll call this MRR Breakdown with Date Filter, just like that.
One thing that we haven't looked at so far in this demo is how you can insert query parameters into your queries, so I'll show you that now. A query parameter in Redash is anywhere you have double curly braces surrounding a piece of text. I'm using this little button below the query editor to create these for me, so you see I have a time range start and a time range end. I could also just create something called param, and as soon as I close the double curly braces the parameter appears down below. Since this is a date range, I need to use both markers in order to make it work, so we'll add an AND condition on the month column using BETWEEN. In Spark SQL I have to wrap dates in quotation marks, just single quotes, just like that. And now I can select the time range, so let's say last year. Let's suppose I run this query this way.
What I can do now, any time I need to rerun this query, is just change the parameter. Changing the parameter at the top lets me see last year, but I could rerun it and look at this year too. I'd like to include this table on my dashboard, and I can either go to the dashboard screen and add it from there or I can add it from this screen. I'm going to do that here by clicking this vertical ellipsis and choosing Add to Dashboard.
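
The double-curly-brace parameters described above can be pictured as placeholders that are filled in with the chosen values before the SQL is sent to the database. The following is a minimal sketch of that idea, not Redash's own implementation: the `render` helper, the `mrr_breakdown` table, the `month` column, and the `time_range.start` / `time_range.end` spellings are assumptions made for illustration, based on how the speaker names the two markers.

```python
import re
from datetime import date

# Matches {{ name }} or {{ name.start }}, allowing spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template, values):
    """Replace each {{ name }} placeholder with str(values[name])."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical query text; dates are wrapped in single quotes as in the demo.
TEMPLATE = (
    "SELECT * FROM mrr_breakdown "
    "WHERE month BETWEEN '{{ time_range.start }}' AND '{{ time_range.end }}'"
)

last_year = {
    "time_range.start": date(2019, 1, 1),
    "time_range.end": date(2019, 12, 31),
}

sql = render(TEMPLATE, last_year)
print(sql)
```

Rerunning with a different dictionary (for example, this year's dates) yields a new query from the same template, which is the behavior shown when the parameter is changed at the top of the query. Plain text substitution like this is only suitable for trusted values such as dates picked from a widget.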
I just look for the SaaS dashboard and hit OK, and from the notification that appears below I can jump straight to the dashboard. Here I'm going to edit the dashboard and just make this widget a little wider so we can see everything.
Now for the last part of this demo, as we wrap up. Right now the MRR table has a parameter, but I also want these two queries to use the same parameter. So first I need to go edit this query manually. Just like I did before, I'm going to use this double curly brace button. We'll create a parameter called time range, just like this. I'm using the accounts table, and it looks like it has a created_at field, so I'm going to go ahead and insert that into the query. We'll say WHERE created_at is BETWEEN, and just like we did before, we have to use the single quotation marks, just like that. We'll set this to last year just to test it. Perfect. Now I've got a parameter that applies to this one as well.
So if I go back to my dashboard, you can see that each of my widgets includes a parameter. For the final part of this, what I'd like to do is make sure that all of these parameters work together, and I can do this by using parameter mapping. We'll create a new dashboard parameter that we're just going to call time range, just like that. And just like that, the parameter has jumped from the widget straight to the top of the dashboard. Now I can select each of the others and use the existing time range parameter, and this will link them all together. Now if I want to look at this year versus last year, I just hit this year and apply changes, and all of these will update.
Now, you may remember that I created an alert and I had set my query to run every five minutes. I just wanted to show you: if I jump over to my Gmail, it looks like I've got a congratulations alert. Hey, there we go, we hit 690 accounts. So we made it.
This has been a quick walkthrough of the basic features of Redash. Now Francois will take you through a case study from our hackathon earlier this month.
Thank you very much, Jesse. This next demo is the result of a hackathon project that we did recently at Databricks. We are using it today as an illustration of how Redash can help you build powerful insights and visualizations from your data and then share them across your company to be more data-driven.
So let's start with a simple question: can we compare the popularity of various big data engines? Here I name Hadoop, Hive, and obviously Spark. Let's try to use public data, and a great source here is a tool widely used by data engineers and data scientists, which is Stack Overflow. Stack Exchange, the platform behind Stack Overflow, is very nice because they publish a lot of their data on their website. This data is a few hundred gigabytes of bronze-level tables that are stored in a couple of archives, 7z files. And that's where the Databricks platform becomes very handy. I spun up a big cluster with a lot of memory to download those archives, extract them from 7z to XML, and then use Spark XML functions to convert those files into Delta tables. Finally, we have a step to extract the data that we care about, so here we want data about Spark, Hadoop, and Hive, to clean it, and to save the results as a gold Delta table.
Then comes the power of the Databricks plus Redash integration. I can immediately go on Redash and query the Delta table, which typically takes just a few seconds, and then plot visualizations and aggregate them into powerful dashboards. So now let me walk you through the process.
First, let's have a look at the data. As you can see in that first query, we have in the table two text fields, the title and the tags.
We use these to filter the posts related to Spark, Hadoop, and Hive. The creation date column will be useful to study the trends of post creation, and we can also use it to distinguish between early questions, which are typically more general about the technology, and more recent questions, which are typically much more specific. Then we have the view, answer, and favorite counts, which help us evaluate the popularity of the posts. So that's a lot of very nice data to process.
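
The tag-based filtering mentioned above can be sketched as follows. This is only an illustration, not the hackathon code: it assumes the tags field holds angle-bracket-delimited tags such as `<apache-spark><hadoop>`, and the `parse_tags` and `technologies` helpers, the keyword list, and the `pyspark` alias are hypothetical choices. Matching on hyphen-separated tokens rather than substrings avoids counting a tag like `archive` as Hive.

```python
import re

KEYWORDS = ("spark", "hadoop", "hive")
# Assumed alias for a tag that names a technology without a hyphenated token.
ALIASES = {"pyspark": "spark"}


def parse_tags(raw_tags):
    """Split a string like '<apache-spark><hadoop>' into a list of tag names."""
    return re.findall(r"<([^<>]+)>", raw_tags)


def technologies(raw_tags):
    """Return the set of technologies (spark, hadoop, hive) a post is tagged with."""
    found = set()
    for tag in parse_tags(raw_tags):
        if tag in ALIASES:
            found.add(ALIASES[tag])
        for token in tag.split("-"):
            if token in KEYWORDS:
                found.add(token)
    return found


posts = [
    {"title": "Spark job fails on YARN", "tags": "<apache-spark><hadoop>"},
    {"title": "Unzip an archive in Python", "tags": "<archive><python>"},
    {"title": "Join in Hive vs Spark SQL", "tags": "<apache-spark-sql><hive>"},
]

# Keep only posts that mention at least one of the three technologies.
relevant = [post for post in posts if technologies(post["tags"])]
for post in relevant:
    print(post["title"], sorted(technologies(post["tags"])))
```

A post can belong to more than one technology, which matters when counting posts per technology later on.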
Next, let's start exploring it. This query allows us to compare the counts of questions, answers, favorites, and views per technology, to know which one is the most popular. Please note that the purple column, the view count, has been scaled down by a factor of 1,000 to put it in the same chart as the other three metrics. So when you see 120,000 Spark views, it's actually 120 million views. And now if we compare those three technologies, it's nice to see that Spark is at the top in every metric.
Next, what if we want to understand how those technologies are trending? Then we could look at the number of questions over time. That's what you can see in that query, where we are aggregating by date and category and then computing the number of posts per technology based on the tags. Notice on the bottom how noisy the data is here. That's because it's aggregated per week. Fortunately, we have this date aggregation parameter that allows us to change the granularity to the weekly, monthly, quarterly, or yearly level. If we select the quarterly level, after a few seconds we see that the data is much less noisy, and it's nice to see the trend of Spark growing so much after 2014 to reach a very high level now.
Okay, now let's imagine you want to quickly access all the posts that talk about a specific topic dear to your heart. The problem is that we have hundreds of thousands of posts, so how do you retrieve the ones that you care about? That's what the next query enables. Mainly, the intelligence here is in the WHERE clause, which contains a couple of lines. The first two lines enable us to select for a specific word. So let's imagine we want to know about Delta: let's type Delta in the word parameter. Then the next line, technology, enables us to select a parameter that's related to the Spark, Hadoop, or Hive technology.
So we will only have the posts here that talk about Spark. And finally, the last parameter enables us to select the minimum year. Let's imagine we want to see the most recent posts about Delta. Now we can easily see all the top posts published on Spark and Delta Lake since 2019, and if you look at the answer column, you can see that all of the posts here have at least one answer. So that's really nice: if you have questions, you can go on Stack Overflow to get them answered.
Very good. Finally, the real power of Redash is to allow us to integrate all that data in a dashboard, which you can refresh automatically and then share across your company. For example, if you look at those two graphs, you could easily study the technological trends. Imagine your company is trying to choose its next technology stack for big data. You can use those two graphs to argue that Spark is the obvious choice, which is a good thing for a Spark conference.
Now imagine your company is developing an open-source project and you want your software engineers to know the most common uses or pain points on your platform. This table is really a great place to explore the specifics. Let's randomly pick a word: Redash. And now we can see all of the posts that talk about Redash. So that's really nice. We see that we have around a hundred posts, so Redash is getting popular. Let's now click on a link, "How to set standard SQL BigQuery in Redash", and we directly reach a Stack Overflow page, so you can see all the details. And if we scroll through, you can see that the first answer is coming from our very own Arik, our founder. A small world.
That's it for this demo. Thank you very much for your attention. We wanted to show you what you can build in a two-day hackathon project using the Databricks and Redash platform. You can easily reproduce that for yourself, for your own use cases, as it comes from public data. Redash has been a really great tool for me to explore data quickly and in depth at the same time. It's really super helpful in my work, and for people who have the chance to work on the Databricks platform, I'm really super excited to see Redash getting integrated into it, and I hope that you will make the most out of it. Thank you.
Thank you so much, Francois, and that's it for our walkthrough of Redash. Thank you so much for joining this session. As a final next step, I encourage you to sign up for the private preview waitlist at the link that's shown here. It should be available this summer. And we'll open it up to take a few questions. Thank you so much for your time.
2020-07-21 20:38