The Builders’ Data Cloud: Tapping into the Power of Streamlit
Welcome to Day 2 of BUILD 2022. I'm Ryan Green, and I hope you're as excited as I am. It doesn't matter where you're tuning in from or how long you've been building. Your impact is felt every day. Let's keep the momentum going today, and we're going to take an even deeper dive. BUILD is a global affair, with local events taking place across the world.
So wherever you're tuning in from, welcome, including locations in New York City, Warsaw, Amsterdam, Helsinki, Bangalore, Melbourne, Berlin, Tokyo and London. 16 locations in all. It's simply remarkable. The community of builders turns the impossible into the possible. We have a great lineup for Day 2, so if it's early in your time zone, time to take down that coffee or tea because we are jumping right into it. We need tools and platforms that keep up with the demands to move fast and iterate faster.
There's no time for rigid, complex architectures that slow down productivity. In today's first talk, we'll hear from Adrien Treuille, former professor of computer science at Carnegie Mellon and current Head of Streamlit and Director of Product at Snowflake, as he shares his vision of enabling every developer and data scientist to build, iterate, and share world-class experiences without a full stack engineering team. With that, I'll pass it off to Adrien. We'll see you on the other side. Thanks, Ryan. Bring it in.
Hi. So my name's Adrien. Besides being a professor at Carnegie Mellon, I'm a confessed late-night coding nerd, proud Pythonista, and closeted Rustacean. But most importantly for this talk, I'm the co-creator of Streamlit. Now, Streamlit is an API that lets you rapidly build rich applications in pure Python, just like you'd use in a Jupyter Notebook or when training a model.
This app-building superpower can dramatically increase the impact of our work as data professionals. Now, I've been working on big data since way before it was cool. First as a professor at Carnegie Mellon but then at Google X and then at Zoox where I worked on self-driving cars.
Until I started Streamlit in 2018, I'd worked on dozens of machine learning and data science teams and it was always the same problem. We had access to some of the most important data in the organization and we were uniquely able to train models and create analyses that could dramatically impact others. But it was really hard to close this impact gap. Let me give you an example.
When I was working on self-driving cars, we needed to build a tool that let others search and explore our object detection models. We had a huge number of images and we had trained models to detect pedestrians and cars in those images shown as the colored boxes that you see here. We needed to make our models accessible to product managers to help set model parameters. We also needed to share with the operations team to find anomalies in our dataset. In short, we needed a tool that we could give our colleagues that would give them direct access to our dataset and models.
Partially because this would dramatically speed the operations of what we were doing but also just so that they'd stop coming into our office every time they wanted to check something. Now, this is a specific example, but if you think about it, it's really emblematic of modern data work. We're dealing with complex and cool data types. In this case images, object detection models, but it could just as easily be geographic data or language data. And traditional tools are not well suited to this workflow.
We could create dashboards, but that only scales to simple use cases. We could use notebooks and those are in Python but they're not shareable the way an app is. We could literally create these tools from scratch using Flask, React but that's crazily time consuming.
Finally, and what often happens is you hire an external team to build these apps, but this is expensive, and slow, and creates a ton of dependencies among teams which makes it impossible to iterate quickly. I watched as super impactful tools like those that you see here never get built or take months to build for exactly these reasons. Now imagine if we, Python programmers, if we, data engineers, data scientists, machine learning engineers could create these applications quickly and easily ourselves without needing any external teams. Ideally, we'd want something very general to cover these diverse cases. Easy to learn, so that knowing Python is enough.
And fast. And when I say fast, I mean we want to build these tools not in months, but in weeks or days. This is actually possible in Streamlit and I'm going to show you how.
So what we're going to do is create a little tiny Streamlit app together. It doesn't do much, but it shows you how a very simple set of general tools combined together lets you do a ton with very little. So here I am with my text editor on the left and I'm going to just install Streamlit and it's as simple as pip install streamlit, boom.
Okay, it was already installed, so that was fast. And now we're going to create a little Streamlit app from scratch. So I'm going to open up an app I have never created. my_app.py. We're going to import Streamlit. So this is just pure Python.
Let's also import NumPy, because we know we're going to want it. And let's say "hello" to Build. So st.write Hello, build.
Okay, so what we have here is a simple Python script, nothing fancy. Now we're going to run it through Streamlit. All right.
Streamlit opens up and we see Hello, build. Okay, that's a little bit boring, so let's make it more fun. We'll make it a header and we'll save. And now Streamlit says, okay, yes, we want to always rerun. And let's add a little emoji.
Okay, so this illustrates the first really cool thing about Streamlit, which is developer workflow. I'm just typing Python code in and we're seeing instantly the results of the Python code that I'm writing on this screen on the right. Okay, but this isn't an app. So let's start to do something cool. Let's first get a little bit of data. So this is why we brought in NumPy.
We're just going to create a little bit of random data right now. So we'll say data = np.random.randn, 200 by 2, and let's take a look at it. Okay, so all right, that looks a little bit boring. Let's actually look at a little bit more data.
Let's say 10 rows. Cool. All right. So already we're looking at beautiful data frames interactively in this web app on the right and we've got seven lines of code.
Let's see what we can do next. First of all, let's graph this data. So st.line_chart of the data. Okay, that's a little bit too much. Let's do five rows.
Oops. Okay. That's actually still too much, let's do three. Three columns I mean. All right. Now let's play around with some other charts.
So a really cool thing about Streamlit is that you have an amazing set of widgets which are all available as basically single-line calls. So here we have write, magic, text elements. Let's see what we have under graphs.
Area charts, line charts. Okay, let's try area charts. And go back to our app. Alright, so now we have this beautiful area chart here and let's just try a bar chart for fun. Okay, so you get the idea that we are quickly starting to, in just a few lines of Python, really create something that would be difficult to create in basically any other tool in the world.
But it's not quite an app yet. It feels a little bit more like a dashboard. So let's make this interactive. Now, this is just a simple example but think about how powerful this is. Instead of saying that we want three rows in our data or three columns in our data, let's make that a variable. So let's say columns equals and now we're going to put in a slider.
How many columns? And let's say you can have between 1 and 20 columns. And now we're going to change this to columns. Okay. All of a sudden this has become interactive. Literally no callbacks.
We are at maybe less than 15 lines of code and we have a non-trivial app which is allowing us to manipulate data and visualize it all in real time. Super cool. But it doesn't end there, there's tons more stuff you can do. First of all, it's a little weird that this slider is up there at the top. Let's start playing around with layout. So we can easily say, you know what, let's put this slider on the sidebar.
Boom. This is already starting to look like an awesome app. And that data, it's a little awkward to have all that data there. Let's say, hmm, let's put it, let's put the data at the bottom for one thing.
Okay, so now it's at the bottom of our app and let's also hide it behind a checkbox. So we'll say if st.checkbox show raw data. Oops.
Then we'll show the data. Oh, and actually let's put that checkbox on the sidebar. So as you can see here, I'm literally just typing Python, I'm typing a few lines of code and things are moving around the screen and actually creating something that we'd be proud to share with our coworkers that let them manipulate our data and our models in a familiar app interface. I can show the raw data, I can not show the raw data.
We can change the number of columns. Let's do a little bit more layout. Let's actually say that we want these graphs to be in two columns. So we could say, column one, column two. And just think about how this is just a very, very simple declarative form of Python programming.
And so we'll change this and say column one, we want that line chart. Let's say column two, we want this here. Column one. Cool.
Or actually, let's put this guy just at the bottom. Awesome. So we've literally built a non-trivial app that allows us to do sort of interactive exploration, share information with others, show and hide data, and we've done it in 14 lines of Python. So this is something that if you were to build this in React, it could literally take you a weekend or a week to put all these things together properly. We've done it in 15 lines of pure Python using a familiar sort of data-oriented style of coding with no callbacks.
Okay, so that's awesome for 15 lines. Let's see what we can do if we go to 300 lines. So you'll recall that in the example I gave you, we had this idea that we had a whole bunch of images and we also had a model that we were running in real time on those images, which is those boxes that you see there. And we wanted to share that with our product and operations teams. So we're going to jump into our next demo.
Okay. This is that app that I was just telling you about. It's 300 lines of code.
You can actually download it on GitHub if you like. And we're going to run it. Let's see, streamlit run streamlit_app.py.
Oops, proves that it's a real demo. Awesome. Okay. All right, so here we go. We're going to jump into the app. So what do we have here? We have, let's actually look, let's remove all the filters.
So we'll filter for everything. So we have 13,000 images in this dataset. We can go through them. It's actually a really cool dataset. It's of a car driving through probably Palo Alto or something. It's called the Udacity self-driving Car dataset, demo dataset of course.
And we can do semantic search on this dataset. So let's say we want to look only for examples with lots of cars. So we wanted to have at least 17 cars. So you'll see there's 53 images. We can run through this.
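The "at least 17 cars" search can be thought of as a filter over per-image object counts. The sketch below uses plain Python and made-up data; the frame names, counts, and the helper `frames_with_at_least` are hypothetical and are not taken from the demo app.

```python
# Hypothetical per-image object counts, for illustration only.
image_summaries = [
    {"frame": "frame_001.jpg", "car": 3, "pedestrian": 1},
    {"frame": "frame_002.jpg", "car": 18, "pedestrian": 0},
    {"frame": "frame_003.jpg", "car": 17, "pedestrian": 2},
]


def frames_with_at_least(summaries, label, minimum):
    """Return the frames whose count for `label` is at least `minimum`."""
    return [s["frame"] for s in summaries if s.get(label, 0) >= minimum]


# In an app, `minimum` would typically come from a slider rather than a constant.
selected = frames_with_at_least(image_summaries, "car", 17)
```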
So already we've created a semantic search engine for our dataset in Streamlit. Really cool. But that's not all. Down here, we're actually running a neural net in real time on the images. And to prove that to you, let's actually change some parameters of the model. So for example, I can change the confidence interval.
So if we require super high confidence to detect anything, we see nothing. And conversely, if we turn down the confidence interval to zero, we see tons of traffic lights, cars, et cetera. This is an actual neural net running in real time inside the app that I can now share with others. And recall the really crazy thing about this is that this entire app is less than 300 lines of code and that means everything.
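The behavior described here, nothing detected at a very high setting and everything detected at zero, is what a minimum-confidence cutoff produces. A small sketch in plain Python, using made-up detections rather than real model output; `Detection` and `filter_detections` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float  # model score between 0 and 1


def filter_detections(detections: List[Detection], min_confidence: float) -> List[Detection]:
    """Keep only detections scoring at least `min_confidence`."""
    return [d for d in detections if d.confidence >= min_confidence]


# Made-up scores for illustration; not output from a real model.
detections = [
    Detection("car", 0.92),
    Detection("traffic light", 0.41),
    Detection("pedestrian", 0.08),
]

strict = filter_detections(detections, 0.99)  # very high bar: nothing passes
lenient = filter_detections(detections, 0.0)  # zero bar: everything passes
```

In a Streamlit app, `min_confidence` would usually be the value returned by a slider, so moving the slider reruns the script and redraws the boxes.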
So the entire UI that you saw there is right here in the app. Loading all of the neural net weights is in those 300 lines. Let's see what else. Actually drawing the boxes on the images that's happening in the 300 lines too. And finally running the neural net, YOLOv3, including going through every single layer.
That's all happening in the app too. So this isn't 300 lines of we hid a bunch of secrets in some imported libraries. This is 300 lines of pure machine learning code from top to bottom with less than 23 API calls that's creating an app that would be almost impossible to create with other tools. Amazing, right? Now let's see some examples of how other companies are using Streamlit.
Streamlit is an extremely general tool that can do all sorts of things. This right here is an app created by a single data scientist at Delta Dental which allows them to analyze call center calls including second by second sentiment analysis all interactively across the company. This is an app by a commerce agency called The Stable which allows them to do geographic analysis of retail store locations, again, interactively across the company. So, so cool. In fact, Streamlit was only released three years ago and it has had a huge impact.
It's been downloaded over 10 million times. We have over a hundred thousand people in our community and over 5,000 organizations use Streamlit including over half the Fortune 50. In fact, one of the companies that discovered Streamlit was called Snowflake. We were popping up in their hackathons. And of course, Snowflake is known as the premier data cloud in the world but that's just part of the story.
In addition to Data, Snowflake has begun releasing an amazing set of Python tools and it has been evolving into an app platform with upcoming technologies like Native Apps. Now, of course, at the intersection of data, Python and apps is also Streamlit. So Streamlit started to come up more and more at Snowflake and we started talking and getting to know each other and getting super excited about each other's technologies. And we realized what if we could work together and bring this technology to Snowflake's customers as a first-party integration? It was just such a cool idea.
And on top of that, I have to say, Benoit and Thierry, the founders of Snowflake are just awesome. They are total coding nerds. We got in the same room and we were like, yep, we get you, you get us.
This is going to be awesome. And so earlier this year we decided to get together. Snowflake is the premier tool for managing huge globe spanning datasets with incredibly high performance and security requirements.
And we've been building the best way to build Python apps in Python. So Streamlit inside of Snowflake would give Snowflake developers and data teams the ease of use of Streamlit, but with the speed and scalability and security of Snowflake. Am I going to show you a live demo of Streamlit in Snowflake right now? The answer is no. We actually demoed this last week in front of a live audience but we are so close to releasing Streamlit in Snowflake internally here at Snowflake, that the product managers asked me not to share it because they're doing so many changes to the build. And also we're releasing this in private preview in January.
So instead of showing you live, I'm going to show you a video of what this looks like. It's pretty darn sweet. So similar to what we were talking about before. Here's an example where we have let's say a hypothetical marketing company which has built machine learning models to analyze the effects of different marketing spends.
We can now click Edit and edit this app in the Snowsight UI. As you see on the left, we have a beautiful Python editing experience with syntax completion, with highlighting, with all of the cool tools that Snowflake has been developing through Python Snowpark. As we make edits to the app, for example changing those text inputs to sliders, the app live updates just like in open source Streamlit, but it goes past that. You can also share the app with others in your organization using Snowflake's role-based access control system.
That means that you could have the huge impact but on internal customers in your company with Streamlit and Snowflake together. And it goes even further. We have big plans for both Streamlit and Snowflake together and as an open source project. So as an open source project, we are releasing new data frames, which are editable. We just released some beautiful new charts and there's a ton more cool stuff which we describe in our roadmap, including stateless Streamlit and the ability to share and edit machine learning models live in your Streamlit apps. Also, as we showed you in our not-so-live demo just a moment ago, we are releasing a first-party integration of Streamlit in Snowflake that lets you take all of that open source awesomeness and bring it to bear on your Snowflake data.
Now that's for internal customers, even bigger is we are going to allow you to create these apps and share them in the marketplace so that external customers can get access to Streamlit apps through our Native apps framework. So we have a huge set of awesome features which are coming out and we're really excited to share them with you. In the future, you'll be able to use Streamlit as the UI component for native applications.
With native apps, distribute and monetize your apps via the Snowflake marketplace to companies across the data cloud. This will open up a whole new world of possibilities unlocking new forms of collaboration and new potential revenue streams. So, hope you found that cool. Check out some of our other sessions today. We have "Building Data Applications with the Snowflake Marketplace, Snowpark, and Streamlit;" "Machine Learning with Snowpark in Python;" and my favorite, "How Streamlit uses Streamlit." Thank you.
Thanks Adrien. Whoa, we are off to a strong start and we're not slowing down. Similar to yesterday, we have geared these experiences to fit your needs. While yesterday, we had two choices per session. Now it's time to up our game.
We added a third track to today's event with the opportunity to jump into longer form hands-on sessions. Pick which works best for you. First up is your choice between two labs or a deep dive by one of our partners. One lab is on machine learning with Snowpark for Python and the other one is on building a data application with Snowflake Marketplace, Snowpark and Streamlit or you can choose the partner track with Astronomer on self-service data orchestration with Airflow and Astro. Friendly reminder, the labs are longer and will take up to an hour and a half whereas the sessions will be up to 30 minutes.
Again, there's no wrong choices here. We'll see you in about an hour and a half. Enjoy. - [Narrator] Why do you build? - I like to build because I like surprising myself with what I'm capable of. - I love to problem solve. I love a challenge.
- And it's especially a lot more fun than not building. - I like solving problems with data. - Because I want to support our open source communities. - And achieve more with the data cloud.
- I build because I enjoy creating things that last. - And the world needs better products. - Because I'm passionate about using cutting edge technology to drive positive outcomes for society as a whole. - Being a builder is what makes it happen. - [Narrator] While most of the world consumes, builders create. Only they can see the connection between the random and the possible.
They are app developers, data scientists, founders, product managers, engineers, architects. They're on a mission to build.