FOSDEM 2024 SpiceDB Mature Open Source ReBAC


All right! So this is the talk on SpiceDB. Thanks everyone for showing up so early in the morning. I'm starting to lose my voice because there was a  long day yesterday of talking and meeting awesome people. This is my first FOSDEM. So who am I?  My name is Jimmy Zelinskie. I'm the co-founder   of a company called authzed and authzed builds  SpiceDB. Previously, I've worked at Red Hat and   CoreOS. So I've been around in the container  and Kubernetes ecosystem for a pretty long   time -- basically since the beginning. There I'm  actually a maintainer of OCI which is the standard  

specification for Linux containers and I've also started a bunch of projects in that space: notably the Kubernetes Operator Framework and some others. This talk is entitled SpiceDB, but since FOSDEM is more of a developer community conference, I really wanted to focus less on this talk being a vendor pitch for SpiceDB, and more on a level set about the problems in the authorization space and the history and status quo of that, so that everyone understands what might be the best tool to solve their problems. I'm not going to try to sell you SpiceDB for all problems because the more informed you are, the better you can pick the product that's actually going to complement your software stack and what you need. That means there are going to be way more qualified people using SpiceDB and way more qualified people using other authorization tooling. But obviously I'm the most jazzed about SpiceDB because I created it! So why are we all here? We're all here because there is a not-for-profit organization called OWASP, which is the Open Worldwide Application Security Project, that got started in the early 2000s. They're famous for having this list called the Top 10

and the Top 10 is basically an enumeration of the highest risks -- the highest threats -- to web security. And as of 2017, Broken Access Control was number five. As of 2021, Broken Access Control is number one. That means this is the biggest threat to the web and to all the internet-facing applications running on it. But really the question is: how did we actually get to this point? How did this happen? And how did it happen so quickly? I'm not going to point any fingers, but what I'm actually going to do is dive into two different groups of stakeholders in the history of authorization. There's Academia -- people publishing papers in this space and defining concepts -- and then there's the industry practitioners that are actually building the software and realizing these systems as they're actually connected to the web. I'm going to start with Academia first. On the LEFT-hand side, you're going to see a timeline and then on the RIGHT-hand

side, there's going to be some notes, and not  for this slide, but others you'll see QR codes   in this corner as well. Those QR codes are going  to link to the specific novel paper. So if you're   interested in any of these particular concepts,  you can feel free to scan the QR codes. But our   history of authorization is actually going to  start in the '80s. It gets really kicked off   with this publication called the Trusted Computer  System Evaluation Criteria which is a security   practices book published by the US Department of  Defense. In it, it's outlining a lot of different   security practices that are effectively a part  of the United States military. Importantly,  

they describe these two different Access Control systems: Discretionary and Mandatory. Now Discretionary is conceptually just "if you created the idea or the information, you can share it" and "if you're then given access to that, you can share that". It's at your discretion. I used file systems and Google Docs as an example here, but it's not a perfect one-to-one match. If someone shares a file with you on a Unix-like file system, you can copy that file if you have read access, and then you can set whatever permissions you want on the copy and share it -- similarly with Google Docs. So it's at your discretion how you're going to share that information once you're given read access. Then there's Mandatory Access Control, which is effectively a long list -- an exhaustive list -- of all the access for a particular thing. Most notably, people are most familiar with SELinux

as the example of this. If you're unfamiliar with SELinux, it's a way of locking down the Linux kernel. Honestly, it kind of comes with a negative connotation because mandatory access control systems are very verbose and very difficult to get right, because you have to enumerate absolutely everything. Some people say that the three-letter agency in the US government that created this are the only people who actually know how to configure it correctly. I don't know if

that's actually true or how many people use it. I do know Red Hat is one of the folks that actually does promote SELinux. But the one thing about this slide that I really wanted to drive home is that these ideas -- they're as old as the military and war itself. There's

nothing novel about the '80s where these ideas  got "invented", but what actually happened was   someone only actually ever thought to write this  down in the '80s. So it took that long after using   these ideas for many, many, many years. So now we  jump roughly 10 years, actually 9 years to 1992.   This coincidentally happens to also be the year I  was born--that makes me feel relatively old. But,   in '92, we get this paper published on Role-based  Access Control and Role-based Access Control often   called RBAC is where actually most people believe  the state-of-the-art for authorization systems is.  

The core idea is, basically, there is a group  that is assigned access to a particular thing   and those groups are called Roles and then you  map users into these roles and by means of being   in this role you get access delegated to you.  The kind of number one problem with RBAC is   that everyone defines it differently.  If you build any enterprise software,   you're going to talk to clients and they're going  to ask you for RBAC, but the difference is if I   look at two different enterprise applications,  how they implement RBAC is entirely different.   The only commonality is this mapping of users  into groups that then have access. This is kind   of going to be a recurring theme across all these  papers published in academia -- anything with   *BAC -- because they're documenting concepts, but  not actually specifications that would give you   an ultimately cohesively designed and secure  system. Most famously the biggest issue with  

RBAC is that there really is no scope. If you  say someone is an admin, does that mean they're   an admin of the entire web app? Does that mean  they're an admin of a particular resource in the   app? You just don't know until you actually  build it yourself. So there's not really an   easy way to reason about these systems until you  actually touch them. So now we jump well into the   future into 2015 and this is when the paper on  ABAC, which is Attribute based Access Control,   is written. The idea behind ABAC is to kind of  generalize on RBAC and say the role that you're   assigned is just one attribute that your user can  have and other attributes might be that you logged   in with this IP address or many other dynamic  attributes can be assigned to you. The really  

important thing about ABAC is it's providing this real-time context, so now you can write rules like "are they connecting from this country's subnet at this time?" You can delegate access at particular windows of time and perform more logic on these attributes that folks have. And, now, we're going to take a huge digression back to 1965. If you're unfamiliar, Multics is this operating system that was developed between MIT, GE, and Bell Labs. You might not remember it, but it actually inspired an operating system you're probably familiar with: Unix. Unix is actually an attempt at porting Multics concepts to less expensive hardware. Multics is

often credited as the first operating system that  has access control for the file system. I actually   don't know if that's true, but it's often credited  as that. In Multics, you have a file system tree,   so you get hierarchical structure, and then at  every branch which would be a file or a directory,   you can have five different attributes assigned  to that. You get read, write, exec, and   append -- these are all file operations that you'd  be familiar with. But, there's this fifth one   that's super interesting called "trap" and that  actually gives you the ability to do callbacks   into C functions. It was initially designed so  you could do file locking in user space. But the  

thing with Multics, and the reason why I bring it up, is that there was inheritance, there was ABAC, and there were user-defined functions in an authorization system in 1965, whereas in academia the ideas behind attributes were published in 2015. So there are systems using these concepts, but maybe they haven't been formalized and written down in a concrete form, and this is a huge issue with the whole space. People are doing things, but they're not really studying how to make these systems robust with these ideas. They're more just documenting these ideas ad hoc. So getting back to the normal timeline, we hit 2019. It's actually in 2007 that the term

is coined "Relationship-based Access Control" (ReBAC) and the idea behind this is that you establish a chain of relationships like "Jimmy is a speaker at FOSDEM" and "speakers at FOSDEM have access to the FOSDEM speaker Matrix chat"; if you can follow these chains of relationships, you can actually conclude that "Jimmy has access to the FOSDEM speaker room". This term is coined around then and it's looking forward at what tech in the Web 2.0 era will look like. It's published initially while considering how Facebook's social graph works internally -- when you share photos on Facebook and say "friends of friends" can view this, you're literally defining access in terms of relationship to yourself. So, we hit 2019 and that's when Google publishes a paper called Zanzibar, which is documenting an internal system at Google powered by these concepts. And

the difference, and the reason why I have 2019 for ReBAC, is that Google is documenting a concrete implementation of this, unlike a lot of these other papers talking purely about concepts. It's talking about an application of these concepts and really giving you a framework for how to use this effectively and in a correct way across multiple products at Google. So then in 2021, SpiceDB is open-sourced, which implements concepts similar to Zanzibar's. Obviously, I'm going to get into that later. There are other *BAC models, but these are the primary ones that I see as most relevant in industry. You can dive into Wikipedia if you're interested in other ones, but now we've got to cover the industry side of things. We're leaving academia and looking at how industry runs into this problem: you go to build a web application and your first job is to just build the MVP -- the minimum viable product -- of your web application. So what you're going to do is do what you do with everything in a web application,

which is store data in a database -- probably the relational database you're using for everything else. And then you're going to try to check if a user has particular access based on some data you stored in the database. It's maybe going to be a role if you're inspired by RBAC, but maybe it's just an enumeration of the list of users that can do a particular thing. So you may have written code that looks like this, but the problem is this falls over at some point in time: either you've fundamentally built a system that is just really slow, or you have to build a new system that is way faster than you ever intended the original to be, or you get users of your software that demand new functionality that is not actually possible for you to implement until you refactor your authorization code. A great example of that is if they want recursive teams -- so if

you have groups of users, what if you have groups of groups, or groups of groups of groups of groups? That is something that most people cannot build or don't build in their initial MVP and, when you get requests for functionality like that, you're forced to completely rewrite your authorization system. The other thing that could happen to you is your company buys another company and they're based on a different continent, and that means all the requests for checking permissions now have to travel across an ocean (if they want to be correct). That's a huge problem, and making sure that the performance is actually going to be viable and the answers you're going to get for authorization questions are correct is difficult. So you hit one of these kind of big

issues and then you are forced to enter this cycle  that I'm going to get into -- these numbers are   kind of fudged -- but the whole point is that  it's going to take an engineer probably with   expertise in that web app that has worked on this  specific authorization system. It's going to take   them a while to implement this. It's going to be  super sensitive, because someone else is going to   have to review it and that person is ALSO going  to have to be deeply embedded in that code-base.  

They're going to be extraordinarily careful  because any mistake that happens in this code   base is going to be a vulnerability because it's  giving access to people that shouldn't otherwise   have access. So that's going to take a long time  then you're going to do QA. You might actually   have to perform a security audit before you can  deploy this software because you're deploying to   enterprise environments. Then you're also probably  going to want to take extra time rolling out these   changes into production. You probably don't want  to deploy it to everyone all at once. You probably   want to deploy to a minor subset just in case you  find something wrong with the code. All of this   just takes time and the problem is it's actually  putting security of your software at odds with   development velocity. Fundamentally, it's going  to take you too long to add this functionality   and you're going to want to take shortcuts, but  shortcuts are security flaws in your software.  

Then it's rinse and repeat. You basically don't  know how long until the pain is going to build up   where you're forced to rewrite these authorization  systems and that is like the mystery box entirely.   You could finish or not even be finished rewriting  your authorization system and then all of a sudden   a new user sets some requirement for you and  you're doomed. You have to completely rewrite   the thing you just thought you re-architected  to be future proof. How do we fix this never   ending cycle? Well, OWASP themselves actually have  recommendations for this. They say you should no   longer adopt RBAC, but instead concepts from ABAC  and ReBAC. Obviously, I'm biased towards ReBAC,  

because I think it's the more modern approach. The OWASP folks also give you some high-level benefits for why you would adopt these new ones over RBAC. I'm going to just take this from the ReBAC perspective. When you're doing a graph-like thing -- a Relationship-based system -- you're forced to talk about individual entities: "This user Jimmy has access to this particular document". Because you're doing that, it has this buzzword associated with it: fine-grained. You're

not resolving Jimmy to a role or a group; you're  actually following Jimmy directly through to the   document. You're talking about individual  entities in the system so, as a result, you   get more fine-grained access. I'm not trying to  generalize about any users or paint over anything;   I'm actually talking about the exact objects  I care about. That means you can actually like   develop systems where you delegate access to  a particular row in a database or a cell in a   spreadsheet. All of these systems are designed  for speed because they understand that they're   going to have to store a lot of data to be this  fine-grained. Then because your applications are  

only talking about the direct objects that they  care about, any of the relationships "in between"   don't get written into your code. You just ask  the question "can this user perform this action   on this thing?" How they got access to that, and  if you ever refactor or change how they get access   to that, does not live in your code base anymore.  That means you can make changes to your permission   system and not change a single line of code in any  of your web applications. Believe me when you do   that for the first time, it is a magical feeling  because you don't have to touch ANY code. Then   there's also multi-tenancy and management ease.  This is kind of just about simplicity around  

modeling and then, with ABAC and ReBAC systems,  you're kind of paying it forward. So RBAC might   be really easy conceptually for you to implement  at the beginning, but these systems -- the ABAC   and ReBAC ones -- they're more focused on forward  thinking like if you need to make changes like I   just described how you can change ReBAC designs  without changing code. It may be a little bit   more effort for you to get started in building  and integrating with one of these systems,   but by day two, if you ever need to make a  change, it's going to pay dividends. Now,   I wanted to get deeper into this Zanzibar paper  that I talked about earlier, which kind of like   kicked off the interest in ReBAC that you see  today. Basically, Zanzibar is a purpose-built   graph database that is very specifically optimized  for one thing: finding a path in a graph and by   virtue of finding that path that means that a  user has access to that particular resource.  
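
As a rough illustration of that path-finding idea, here is a toy sketch in Python. It is not how Zanzibar or SpiceDB is actually implemented; the tuple layout, the `RELATIONSHIPS` data, and the `check` function are all hypothetical names made up for this example. Relationships are stored as edges, and a permission check is just a breadth-first search for a chain of edges from the resource back to the user, which also handles "groups of groups" without any special casing.

```python
from collections import deque

# Hypothetical data model: each relationship is (resource, relation, subject).
# A subject is either a user ID string or a (resource, relation) pair meaning
# "everyone who has that relation on that resource".
RELATIONSHIPS = {
    ("conference:fosdem", "speaker", "user:jimmy"),
    ("chat:fosdem-speakers", "member", ("conference:fosdem", "speaker")),
    ("team:backend", "member", "user:alice"),
    ("team:eng", "member", ("team:backend", "member")),  # a group of groups
    ("doc:roadmap", "viewer", ("team:eng", "member")),
}


def check(resource, relation, user):
    """Return True if some chain of relationships links user to resource#relation."""
    queue = deque([(resource, relation)])
    seen = set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # avoid looping forever on cyclic relationships
        seen.add(node)
        for res, rel, subject in RELATIONSHIPS:
            if (res, rel) != node:
                continue
            if subject == user:
                return True  # found a path from the resource to the user
            if isinstance(subject, tuple):
                queue.append(subject)  # follow the indirect relationship
    return False


print(check("chat:fosdem-speakers", "member", "user:jimmy"))
print(check("doc:roadmap", "viewer", "user:alice"))
print(check("doc:roadmap", "viewer", "user:jimmy"))
```

Note that the calling code only ever asks "can this user do this on this thing?"; how the path is formed lives entirely in the relationship data, which is the property described above. A real system would index these edges and cache results rather than scanning every relationship on each check.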

It's actually one of the few good things that came  out of Google+. There's only two things that came   out of Google+: there is Zanzibar internally  at Google and then the consumer-facing Google   Photos. The novelty of this paper is actually  that it is solving an authorization problem with   a focus on distributed systems. You'll notice  the title of the paper is called Zanzibar:  

Google's Consistent, Global Authorization System, so it is fundamentally trying to tackle authorization as a distributed systems problem, which is not really something anyone else has done in the past. They acknowledge that if they're going to deploy one system at Google, it needs to work across all geos in the world, it has to be extremely, extremely reliable, and it can never be wrong. These are really difficult requirements, but the anecdote that I like to use is when you're using a cloud provider like Amazon and you go to provision something like, say, an S3 bucket, you're always choosing what region. But, actually, if you go to set IAM

rules in a cloud provider like Amazon, you don't  pick the region. That is because these systems   fundamentally have to be global and when you're  designing them yourself at a particular scale, you   need to think about how you're going to make your  system global. So this paper actually inspired   two companies, Carta and Airbnb, to go forward and  implement their own internal systems based on the   ideas in this paper. None of them are truly 100%  what I would call authentic to the original paper,   but rather the paper fused with the requirements  of their business at the time. I think the real   superpower with Zanzibar, though, is that, if  you go to send someone a Google Doc in Gmail,   and they don't already have access, Gmail will  pop up a box and tell you "hey! you didn't give   access to this person". That fundamentally means  that Gmail actually has a way to ask questions  

and check permissions that are built into Google Drive. That means you can have one central source of truth for authorization data that your whole application suite -- all your microservices -- can share. This is incredibly powerful because not only does it allow integrations like this, but it also lets you have that central source of truth where, if you need to audit something, you can just ask that one service. It's the only service you have to trust. It's the only service that you have to query if you're trying to really dig into any of this data if, say, you have a problem like an outage or an incident and you need to understand what the access control looked like. So you might now be wondering "how do I Zanzibar?" This is exactly what we set out to do basically the year after the paper was published. My co-founders and I left Red Hat to

found authzed and build SpiceDB in the open  source. There were some folks experimenting   with the ideas around ReBAC at the time, but no  one was really moving the needle towards making   this a production thing that you could use in  a real enterprise environment or at a real tech   company. We originally prototyped the thing in  Python: it was type-annotated, lazily-evaluated,   functional Python. It was way faster than  you'd ever think Python should be, but it   was not fast enough, so we ended up rewriting  it in Go and open sourcing that. The name is  

actually inspired by Dune because internally at  Google the project was actually called project   "SPICE" because of a running joke that "the ACLs  must flow". The timing for that has actually been   really good with the resurgence of Dune in the  movies. Internally at authzed all of our software   is named Dune references as a kind of homage. So  if we fast forward to today, the SpiceDB community   has actually gotten contributions from a lot  of companies -- big names like Netflix, GitHub,   Google, Red Hat, Adobe, and Plaid. There are  production users in small companies like startups   where it's just the co-founders all  the way up to Fortune 50 companies.

But I still haven't actually told you what SpiceDB is. SpiceDB is, as I described with Zanzibar earlier, this extremely parallel graph database. Developers basically apply a schema just like you would for a relational database -- I've given an example schema here modeling a Google Doc -- then they store data inside that database and query that data according to that schema. It's really magic when you can make schema changes in a forward-compatible way that enables you to actually modify your permission systems without changing any application code. We don't actually have a SQL API despite being a database; instead, we give you gRPC and HTTP APIs. The primary interface we recommend is gRPC for latency reasons, because authorization is in the critical path of everything your web applications are going to do and possibly everything at your business. You really have to make sure the stuff is fast, thus everything needs to be kept in memory

and everything needs to be returned in single-digit milliseconds. gRPC is actually pretty critical for that. Then, in addition to the main server, we also expose servers for powering developer tools, so you can get things like autocomplete in your editor, as well as integration testing services. It's Kubernetes-native -- designed that way from the beginning because our background is all in Kubernetes. SpiceDB nodes self-cluster. If you deploy just SpiceDB directly on Kubernetes, it will discover other nodes and start to divide and shard up the in-memory graph that it's using to serve requests across them automatically. We also offer a SpiceDB Kubernetes Operator in the

open source which will then do automated updates for SpiceDB. Having zero-downtime updates for a database is notoriously tricky, so we just took that problem off the table for most people and automated it for anyone using Kubernetes. We remain true to Zanzibar's goals of consistency at scale: we have pluggable data storage systems and, depending on what your requirements are -- say you need to deploy everywhere on the globe -- you can store all your raw relationship data in something like Spanner or CockroachDB, and then you can run regional deployments of SpiceDB that will exist as independent caches for those geos. Fundamentally

they're sharing all the same core data and they're consistent across those environments. If that sounds too complicated for you, or you don't really need that because you're just a single-region shop, that's fine. We also have deep integrations with PostgreSQL and MySQL if you just want to use something like Aurora or Amazon RDS. Obviously, there's also an in-memory datastore for testing. We also have a tool called zed. Zed is the official command line tool. It manages cluster credentials,

backups, and it gives you a command for every single SpiceDB API. Here I give an example of running a permissions check with the debug flag set. You can see it gives you a whole graph traversal. It shows you a tree of how it actually computed whether or not someone has access, with timing data associated with all of that so you can see where things slow down. We have a web IDE, so the two things you just saw -- SpiceDB and zed -- we compile to WebAssembly and then run in the browser. Then we build that all on top of Monaco -- the engine that powers VS Code -- and

give you a full IDE where you don't have to install any of the software I just showed you. You can just go to play.authzed.com and start playing with this stuff. You can run zed against live data. You can load in test data. What we actually do is generate exhaustively all of the paths available in the graph for you, so there's somewhat of a model checking happening here: you can prove exhaustively that all of the ways you can traverse the graph are the ways you think they are. That basically lets you prove that a system is correct without deploying it into production or having someone do an extremely long security audit on your program. Then you can check

this stuff into CI/CD, so if you make a change to the schema you can actually guarantee that certain assertions always pass and that everything is exhaustively checked. So Zanzibar is not a silver bullet. We actually have had to extend Zanzibar in a bunch of different ways. SpiceDB remains true to all of the core concepts that you'll find in Zanzibar, but not everyone is Google, so not everyone relies on users being represented the same way. We are more flexible with how people can model their own users, and then we add on developer experience, because at Google they can say you're forced to use the software, but when you're building open source software, you can't force people to use your software. You have to compel them to use your software by having a better experience than what they're currently doing. We've also added

contextual relationships with ABAC, which means relationships can exist dynamically based on context that you provide at runtime. That was a joint project with Netflix. So if you're wondering "how do I SpiceDB?", you can go to our Discord: discord.gg/spicedb or check out GitHub. Basically anywhere on the internet where you expect to find open source projects, SpiceDB is there. Thanks!

2024-02-16
