How Quake Failed Its Way to Success

Show video

Growing up, Quake was one of those games that just blew me away with what a computer was capable of. It helped inspire me to become a developer in my own right. One of the first books I ever got on coding was the Graphics Programming Black Book by Michael Abrash, although I confess that most of it was over my head at the time. Released in 1997, this book isn’t as relevant as it once was, but it holds stories and insider info about the development of one of the most groundbreaking games ever made. Today, I’d like to dive into one of those stories, but with fresh eyes after more than 20 years of additional experience.

So, visibility determination isn’t easy, even today, as I showed with my recent deep dive into current methods. It remains very much an area of active research and development. Rewind the clock back to the mid-90s and we go back to a time when this field was, comparatively speaking, very much in its infancy. The Quake dev team faced this very problem.

The game was unplayably slow, and since they were breaking new ground, there was no clear solution to turn to. When we talk about visibility determination, for the purposes of this video, we’re talking about what goes into drawing a pixel on the screen. So you’re down here, this is the camera, and you’re looking off in this direction, so what we need to know is what’s visible to you. In modern games, this mostly boils down to culling out what’s off screen, which is called frustum culling, then doing some occlusion culling to figure out which objects are behind other objects, so we can cull those as well. But back in the 90s, this was being done in software, on a desktop computer that would choke on just playing a song today, something we all take for granted as background work. Quake originally required a Pentium 75, which had a processing speed of approximately 126 MIPS, or million instructions per second, according to Google.

Take note that this is instructions per second, not floating point operations per second, which is what you might see quoted for more modern hardware. By comparison, the phone in my pocket is capable of somewhere in the realm of 1142 GFLOPS, or 1.1 million MFLOPS, if I haven’t messed up a decimal somewhere. The gulf in computing power is enormous, and even though this is a silly comparison, it’s good for younger viewers to have a grasp of just how much of a potato that thing was compared to anything you have now. Not to take anything away from what it was in its day. The cost alone of simply rasterizing a triangle was high, so you needed to be fine-grained and only draw the polygons that were visible. There was also no GPU in any modern sense, as in a GPU that you could offload a significant amount of the work to. No depth buffer, no shaders, not even hardware T&L; you had nothing.

So you’re working with a few more constraints than today when doing your visibility determination. You don’t want to simply discard the objects that aren’t visible; that still leaves way too much to draw. You want to get even more granular than that. In an ideal world, you’d somehow know just the polygons that are visible, the ones that contribute actual pixels on the screen. In an ideal world.

And of course, that magical test, whatever it is, has to be fast. Quake stored its levels in a BSP, or binary space partitioning tree, which is a data structure you don’t run into much these days. Conceptually, they’re not especially difficult to grasp. Let’s take a simple 2D example, and we’ll build a little tree to illustrate what they do. So right now, the space, the level, whatever you wanna call it, it’s empty.

We can add a wall, let’s call that wall A, and the wall has an orientation, like this way is forward, and obviously the other way is backwards. The wall creates a splitting plane, so that forward of, or in front of, the wall, we’ll call it empty space, while behind the wall, this is solid. This is an important thing to take note of, because it’ll be very relevant throughout our use of the BSP. So we’ve defined the wall, let’s define a node, in this case the root of the BSP tree, and we’ll call it A. This node has 2 children: the left node represents what’s in front of A, which is empty space, while the right is behind A, which we’re calling solid; this represents the region behind the wall.

One wall isn’t terribly interesting, so let’s add a 2nd wall now, and for simplicity I’m just going to make it down here, like this. So this is wall B, and it faces in this direction. We have to insert this wall into the BSP, so starting at A, we have to check: is this wall B in front of, or behind, A? We know it’s in front of A, so we’re going to go to A’s left child, the empty space node, and replace it with a B node. And this new node of course has a left child of empty space, and a right child of solid. The left child of B represents this region here; notice how it’s the region in front of A that’s now being split by B. The left child of B represents the region in front of B, while the right child represents the region behind B. Let’s repeat the process now, but with a horizontal wall C on top, facing in this direction.

So C is in front of A, and it’s in front of B, so it gets inserted here, replacing B’s left child, and so you have this new node C, with a left empty child and a right solid child. And I’ll leave it to you to do this on paper, but if I were to add this new wall D, which faces in this direction, then it’d end up here as a child of C. So we end up with this kinda boring BSP tree that’s basically just a line of nodes descending to the left. But the really cool part of BSPs is they enable you to do fast operations.
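
Before we move on, here’s a minimal sketch of what a node in this little 2D tree might look like in C, plus the insertion step we just walked through. The names and layout are mine, not Quake’s, and it only handles the easy case from this walkthrough, where each new wall lies entirely on one side of every existing plane.

```c
#include <stdlib.h>

/* A minimal 2D BSP node, roughly matching the walkthrough above. */
typedef struct BspNode {
    float nx, ny, d;         /* splitting line: nx*x + ny*y - d = 0, "in front" is > 0 */
    struct BspNode *front;   /* child in front of the wall                             */
    struct BspNode *back;    /* child behind the wall                                  */
    int is_leaf;             /* leaves carry no splitting line                         */
    int is_solid;            /* for leaves: solid or empty space                       */
} BspNode;

typedef struct { float x0, y0, x1, y1; } Wall;   /* a wall as a 2D segment */

static BspNode *make_leaf(int solid) {
    BspNode *n = calloc(1, sizeof *n);
    n->is_leaf = 1;
    n->is_solid = solid;
    return n;
}

/* A new wall node starts out with an empty leaf in front and a solid leaf
   behind, exactly like nodes A, B, and C above. */
static BspNode *make_wall_node(float nx, float ny, float d) {
    BspNode *n = calloc(1, sizeof *n);
    n->nx = nx; n->ny = ny; n->d = d;
    n->front = make_leaf(0);
    n->back  = make_leaf(1);
    return n;
}

/* Signed distance of a point from a node's splitting line; > 0 means "in front". */
static float side_of(const BspNode *n, float x, float y) {
    return n->nx * x + n->ny * y - n->d;
}

/* Insert a wall, assuming it lies wholly on one side of every plane it meets,
   as in the walkthrough. A real tree builder would also split walls that
   straddle a plane into two pieces. */
static void insert_wall(BspNode *node, BspNode *wall_node, const Wall *w) {
    int in_front = side_of(node, w->x0, w->y0) >= 0.0f &&
                   side_of(node, w->x1, w->y1) >= 0.0f;
    BspNode **child = in_front ? &node->front : &node->back;
    if ((*child)->is_leaf) {
        free(*child);
        *child = wall_node;  /* the new node replaces the empty (or solid) leaf */
    } else {
        insert_wall(*child, wall_node, w);
    }
}
```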

Let’s bring in a slightly more complex BSP, just to make the tests a bit more interesting. This one is basically the same, but I’ve inserted a few more walls. If you’re following along on paper, take special note that I just inserted them in alphabetical order, and notice how E is in front of D, but when I go to insert F, it’s behind E, and thus on the right. Now, let’s test whether a point is in empty space or in solid space. This point here, very obviously by looking at it, is in empty space, but we can confirm that trivially with, like, actual code.

Starting at the root node, is this point in front of or behind the splitting plane? In front, obviously, so we move to the next node, and it’s in front of that, so again, we move on. And we repeat that process, over and over again, until inevitably we end up in a leaf node, either a solid node or an empty node. In this case, an empty node, signifying that this point is in empty, or free, space. Or in other words, if that’s a player, they’re allowed to be there.
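
As a sketch, that test is just a loop over the same hypothetical node type from before: descend toward whichever side of each splitting line the point is on until you hit a leaf.

```c
/* Classify a point against the tree: returns 1 for solid space, 0 for empty
   (free) space. If this is a player's position, 0 means they may stand there. */
static int point_is_solid(const BspNode *node, float x, float y) {
    while (!node->is_leaf)
        node = (side_of(node, x, y) >= 0.0f) ? node->front : node->back;
    return node->is_solid;
}
```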

So BSPs can accelerate these kinds of tests; they allow you to rapidly discard large sets of the world. But another interesting property is that each of these nodes here, each of these represents a portion of the space. They, in fact, split the space of their parent. So starting at the root node, this is all space.

This node’s left child is one half of the space, while the right child is the other half. When we look at this node, B, it partitions this space here, meaning that its left child is this space in front of its splitting plane, while its right child is this space behind. And we can continue in that way, looking at C: this node is splitting this highlighted region here along the plane that we provided, meaning that the left child of C represents this space over here, while the right child of C represents this space over here.

And likewise, we can continue on and on, and these nodes carve up the space into ever finer convex subspaces, and we end up with these 3 convex subspaces that I’ve coloured, which correspond to these empty leaves in the tree. Which is kinda neat. That’s one of the big takeaways: BSPs carve up space, and you can of course calculate bounding volumes for these subspaces if need be, and having bounds on these nodes allows you to rapidly discard entire portions of the world, making BSPs useful not only for rendering but for other gameplay systems like physics. Lastly, BSP trees give you a natural ordering when you draw. They can easily give you surfaces in front-to-back or back-to-front order.

If we take a really simple BSP tree, we’re just going to have a couple of lines here, so you could follow along on paper if you wanted to. And we’re going to place our eye here. Now, the interesting thing with the BSP is that the view direction doesn’t matter, only the position of the eye. So to draw, the algorithm is pretty straightforward: you walk the tree and at each node decide whether to draw the left or right child first. So you start at the root; if you’re in front of the splitting plane, draw the back of the tree first, then this node, then the front. If you’re behind the splitting plane, do the opposite. Then of course you recurse and do the same, so you eventually go through nodes B, C, and D. That immediately gives you the order B, because that’s behind, then A, because that’s the node in question, then you have to draw this subtree, so we recurse: we’re in front of C, so we draw behind C, which is nothing, then we draw C, then we recurse, which just ends up drawing D last. So we get easy back-to-front order by simply walking the tree, easy peasy.
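
In code, that traversal is a short recursive function. Again this reuses the hypothetical node type from earlier, and draw_wall() is a stand-in for whatever actually rasterizes the surface stored at a node.

```c
static void draw_wall(const BspNode *n);   /* hypothetical: rasterize this node's wall */

/* Painter's-algorithm order from the BSP: at each node, draw the subtree on the
   far side of the splitting plane first, then the node's own wall, then the near
   side. Only the eye position matters, not the view direction. */
static void draw_back_to_front(const BspNode *node, float eye_x, float eye_y) {
    if (node->is_leaf)
        return;                                          /* leaves hold no walls */
    if (side_of(node, eye_x, eye_y) >= 0.0f) {
        draw_back_to_front(node->back,  eye_x, eye_y);   /* far side first */
        draw_wall(node);
        draw_back_to_front(node->front, eye_x, eye_y);   /* near side last */
    } else {
        draw_back_to_front(node->front, eye_x, eye_y);
        draw_wall(node);
        draw_back_to_front(node->back,  eye_x, eye_y);
    }
}
```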

So they give you quite a lot of advantages, but this isn’t quite enough to draw effectively, because of one major problem: overdraw. Overdraw is a problem that’s still being tackled on modern hardware 30 years later, and in Quake this was really serious. This was being done in software, and although they did implement a z-buffer, it had a performance impact and was limited to dynamic stuff.

If you just draw everything back to front, painter’s-algorithm style, in some terrible cases you may end up drawing and redrawing the same pixels over and over again. You could easily imagine cases where overdraw could reach 10x or more. So this was the big problem to be tackled: how do you reduce, or even eliminate, overdraw? This part of the story was one of the most interesting to me, the sheer number of approaches Carmack took attempting to solve this.

He attempted to use a beam tree to track visible fragments; a subdividing raycast approach, almost like a binary-search raycast; representing the world with planes rather than explicit vertices; a 1-bit draw buffer tracking whether or not a pixel had been drawn yet; rasterizing the polygons into spans and clipping using those; and an approach that involves tracking the portals between leaves, and clipping further-away geometry and portals against those portals. I won’t dive into all of them, but let’s poke at the first couple. The first one mentioned, the beam tree, is really interesting, because finding information online was difficult; references are pretty sparse or lead to dead links.

There are some scattered references to papers back in the mid-80s that you can read, which gave me a bit of context, and there’s a very brief explanation in the book as well. From what I understand, they’re in essence a special type of BSP tree, but instead of being built directly from the polygons of the world, you do something very different. If you’re looking out at the world, your camera would be down here, and the viewing volume that you see is formed by 6 planes. You’ve got your planes on the left and right, you’ve got your planes on the top and bottom, and you’ve got your planes on the near and far. Now let’s consider a single triangle out here in the world, and we have the camera looking at it. So, take an edge of that triangle to begin with, and form a new triangle using the two vertices of that edge and the camera’s position.

From this you can form a plane that passes through the camera and the edge of the triangle, similar to the planes of the viewing volume. You can then repeat the process for each edge of the triangle, building another plane that passes through those 2 vertices and the origin. So this is called a beam, this kinda volume that’s projected out from the origin to the triangle, and you’d insert this into a BSP; not the whole thing, only the planes that you calculated, the side planes.
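
As a rough sketch, building one of those side planes is just a plane through three points: the eye and the two vertices of an edge. The vector types and names here are mine, purely for illustration.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 n; float d; } Plane;   /* points p on the plane satisfy dot(n, p) = d */

static Vec3  sub(Vec3 a, Vec3 b)   { return (Vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return (Vec3){ a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
}

/* One side plane of a beam: it passes through the eye and the edge v0 -> v1 of
   the occluding triangle. Which way the normal faces depends on your winding
   convention, so a real implementation would pin that down. */
static Plane beam_side_plane(Vec3 eye, Vec3 v0, Vec3 v1) {
    Plane p;
    p.n = cross(sub(v0, eye), sub(v1, eye));  /* normal of the plane through the 3 points */
    p.d = dot(p.n, eye);                      /* and it passes through the eye            */
    return p;
}
```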

So let’s visualize this: this is our screen, and you’d start the beam tree off by inserting 4 planes representing the bounds of the view frustum, the visible area, or the edges of the screen. And you’d end up with a pretty basic tree, kinda similar to what we saw earlier. We’ll colour the “drawable” area green, so anything in the drawable area is good to go. So the basic idea is to use this secondary BSP to check what’s been drawn on the screen. So let’s say I want to draw this triangle here; you can tell at a glance that this whole triangle falls into the “drawable” area in the beam tree.

What would actually happen is that you’d insert this triangle into the beam tree, by inserting each of the planes of the triangle, which in 2D basically look like lines. So one by one, those would get inserted into the tree, and you’d end up with a tree more like the one on the right now. These open leaves of the beam tree correspond to regions that are still drawable, which you can go test on paper if you like, but I’ll leave that to you and we’ll move on. Let’s say now, I want to draw this second triangle. We know just by glancing at this that if we were to walk the BSP tree, we’d end up in this region here in front of F, which corresponds to this leaf in the BSP tree itself. As you prepare to draw this triangle, you walk the beam tree, and if it ends up in a visible area, you’d update the tree and draw the triangle. But if it doesn’t, there are no tree changes and thus the polygon isn’t visible. That’s my understanding of it anyway. In theory, a great approach, but as Abrash notes, the worst case was still terrible, and maintaining the beam tree wasn’t free either, so in the end they had to scrap the approach.

Another interesting approach was to use raycasts, since the BSP is great at accelerating those. By this point, you should be reasonably familiar with the idea of walking the tree to figure out whether a point is visible or not; it just involves descending and doing the plane tests until one reaches either an empty or solid leaf. A raycast isn’t really all that different, except you’re of course dealing with a line.

If you were to, say, overlay the line over the BSP, you can clearly see where it would get cut by the splitting planes and chopped up. So you’d end up with a bunch of fragments of the original ray, each fragment lying in either a solid or empty node. Of course, you don’t need to chop it by every plane; that’s just wasteful. You can be smart and do the closest section first. You can see this section, if you walk it down the tree, ends up in an empty node, and thus it never hits anything. So we go ahead and process the other side, and this part of the ray of course gets chopped by this plane here, so what you’ll do is process that part first, and we can clearly see that it’ll end up in an empty node, and thus hits nothing. This next part, behind the wall, well, we KNOW that it hits something, like us, the viewers, but we need to prove it in code.

Well, the next node down the tree is this node here, with this splitting plane, so the ray gets lopped into 2 parts, and we’ll take this part here and walk it down the tree. And of course, it ends up in a solid node, so we’re done: we know for a fact this intersects something, we know where it intersects, we know everything.
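
Here’s roughly what that looks like in code, again using the hypothetical 2D node type from earlier: split the segment wherever it crosses a splitting line, and test the piece nearest the start of the ray before the piece on the far side.

```c
/* Does the segment from (ax, ay) to (bx, by) ever enter solid space? */
static int ray_hits_solid(const BspNode *node,
                          float ax, float ay, float bx, float by) {
    if (node->is_leaf)
        return node->is_solid;                 /* the whole piece lies in this leaf */

    float sa = side_of(node, ax, ay);
    float sb = side_of(node, bx, by);

    if (sa >= 0.0f && sb >= 0.0f)              /* entirely in front of the line */
        return ray_hits_solid(node->front, ax, ay, bx, by);
    if (sa < 0.0f && sb < 0.0f)                /* entirely behind the line */
        return ray_hits_solid(node->back, ax, ay, bx, by);

    /* The segment straddles the line: split it at the crossing point and test
       the piece on the start's side first, then the far piece. */
    float t  = sa / (sa - sb);
    float mx = ax + t * (bx - ax);
    float my = ay + t * (by - ay);
    const BspNode *near_child = (sa >= 0.0f) ? node->front : node->back;
    const BspNode *far_child  = (sa >= 0.0f) ? node->back  : node->front;

    if (ray_hits_solid(near_child, ax, ay, mx, my))
        return 1;
    return ray_hits_solid(far_child, mx, my, bx, by);
}
```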

So if we take a look at an example screen, the subdividing raycast idea, at least in my head given the description, kinda works like this. You divide the screen into some number of areas, they mention an 8x8 grid, so 64 in total, and for each you cast a ray, using the exact same raycast approach to clip the ray against the BSP. Let’s take a look from the side, so here’s a side view, here’s the camera here, and we’ll have some walls here to fill out the scene a little bit. So what you’d do is fire out a bunch of rays at the world, which of course would get clipped against the BSP as I showed before. Let’s assume that every ray hits something, so each of these rays is being extended until it touches a surface. The question would then be, what did they hit? We’ll colour them with the colour of the surface they touched. Now, let’s focus on these 2 rays. If 2 rays are adjacent and hit the same polygon, then you can simply interpolate the surface in between, easy enough. That’s a best-case scenario, so now let’s focus on another pair, where they don’t hit the same surface. These neighbouring raycasts hit different surfaces, so then you in essence have to do a binary search by firing a ray in between and repeating the process. Then it kinda follows that eventually, after repeating this process a buncha times, either all the adjacent rays will be on the same surface, or they’ll be at pixel level, at which point you bail.
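
A one-dimensional sketch of that idea, across a single row of the screen, might look something like this. cast_ray() and fill_span() are hypothetical helpers: one clips a ray through a given pixel column against the BSP and returns an id for whatever surface it hit, the other fills the pixels in between.

```c
static int  cast_ray(int x);                           /* hypothetical helper */
static void fill_span(int x0, int x1, int surface);    /* hypothetical helper */

/* Adaptive refinement between two already-cast rays: if both ends hit the same
   surface, or we're down to adjacent pixels, fill the span; otherwise cast a
   ray at the midpoint and recurse on both halves, binary-search style. */
static void refine(int x0, int surf0, int x1, int surf1) {
    if (surf0 == surf1 || x1 - x0 <= 1) {
        fill_span(x0, x1, surf0);   /* a real renderer would resolve pixel-level disagreements */
        return;
    }
    int xm = (x0 + x1) / 2;
    int sm = cast_ray(xm);
    refine(x0, surf0, xm, sm);
    refine(xm, sm, x1, surf1);
}
```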

I could easily imagine this working great in one of those old Quake levels, where polys took up a lot of real estate on screen, and chances were pretty good that just a few iterations of this subdivided raycast approach could cover most of the screen. From the sounds of it, this ultimately failed due to dropouts, basically problems with tiny polygons. There were other approaches, as Abrash mentions; some were merely kinda tossed around, while others saw actual implementations. I’m guessing, based on the descriptions, that the draw buffer and span buffer approaches were tried out, since they seem like iterations on existing techniques, as well as the portal clipping; but the vertex-free surfaces idea, maybe that was something they just punted around, who knows.

Here’s the point in the story where we have the miraculous turnaround. If this were an episode of House, you’d have him look at a bus going by, or a leaf falling, or something random, and suddenly have an epiphany. According to Abrash, and we’ll just read his words here: “When I came in on Monday, John had the look of a man who had broken through to the other side--and also the look of a man who hadn’t had much sleep.” And then he goes on to describe how Carmack figured it out. The solution was precalculating the visibility between leaves in the BSP.

This would be called a PVS, or potentially visible set. Now, nearly 30 years later, how it works is widely known, but it was a real game changer in its era. The general idea is this: let’s say that we’ve got our level here, we’ll just do a 2D version instead of a 3D version, and let’s just quickly understand what we’re looking at. Each of these regions here, let’s say these regions are leaves in a BSP. We’re taking some liberties here, but let’s run with it. Then each of these spaces is separated by these green lines; these would have been splitting planes, and you can see that where there are no polygons, they form a sort of border between adjacent spaces. These borders are called portals.

So given this shaded region here, any region that it’s immediately adjacent to is, of course, visible right away. You can figure that out without a lot of effort using the BSP, since they basically share a portal. The trick is really whether or not these further-away regions are visible from that initial region. I mean, you could do something like cast a zillion rays from one portal to the next and see which other leaves they manage to hit, but that would be really unreliable. Quake did this by erring on the side of being conservative.

Secondary neighbours are usually visible, not always, but pretty much most of the time, so we’ll consider a neighbour of a neighbour to be visible right away. The effort was mostly on leaves further out than this. The question now is whether section 4 here is visible from section 1. The approach they employed used separating planes: they’d create a plane from each edge of one portal to the opposite side of the other portal. Here we have our separating plane, and the far portal lies completely in front of it, so it’s not eliminated yet. We’ll do the separating plane for the other side, and lo and behold, the portal lies completely behind the plane, getting culled out. In a small way, this is just like the beam tree we just talked about.

If we were to, say, expand section 2 so that the portals end up a bit fatter, you can see that now the separating plane just catches the end of the portal from section 3 to 4, making section 4 visible from section 1. That’s the general idea, and it extends pretty easily into 3D from there. In 2D, of course, we only needed the 2 separating planes, but in 3D you would do this for each edge. As I said, it’s a conservative approach, but it works incredibly well.
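
Here’s a small 2D sketch of that separating-plane test, with everything simplified down to line segments. This is an illustration of the idea, not the actual vis code: it assumes the two portals’ endpoints are labelled consistently, so pairing one end of the first portal with the opposite end of the second gives genuinely separating lines.

```c
typedef struct { float x, y; } Vec2;
typedef struct { Vec2 a, b; } Portal2D;   /* a portal as a 2D segment */

/* Twice the signed area of triangle (a, b, c): > 0 when c is left of a -> b. */
static float side2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Is the far portal entirely on the source side of the separating line drawn
   from `from` through `through`? `source_ref` is the other endpoint of the
   portal we're looking through, which tells us which side is "back toward us". */
static int separated(Vec2 from, Vec2 through, Vec2 source_ref, Portal2D far) {
    float ref = side2(from, through, source_ref);
    float s0  = side2(from, through, far.a);
    float s1  = side2(from, through, far.b);
    return (ref > 0.0f) ? (s0 > 0.0f && s1 > 0.0f)
                        : (s0 < 0.0f && s1 < 0.0f);
}

/* p_in is the portal out of the source leaf, p_out the next portal along, and
   p_far the portal we want to classify. If p_far falls entirely behind either
   separating line, nothing beyond it can be seen from the source leaf. */
static int far_portal_culled(Portal2D p_in, Portal2D p_out, Portal2D p_far) {
    return separated(p_in.a, p_out.b, p_in.b, p_far)
        || separated(p_in.b, p_out.a, p_in.a, p_far);
}
```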

BSPs have since faded from their glory days; you don’t really hear much about them anymore. Levels now are dynamic, requiring a degree of flexibility that has necessitated new approaches. The original Quake source code was released over 24 years ago at this point, and that, along with the subsequent sequels, has been a source of some incredible gems. Perhaps most infamously, the fast inverse square root function, which is this abomination you see before you.
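
For anyone who hasn’t seen it, it looks roughly like this; lightly reformatted from the released Quake III Arena source, with the original’s much more colourful comments replaced by my own.

```c
float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = *(long *)&y;                       /* reinterpret the float's bits as an integer */
    i  = 0x5f3759df - (i >> 1);             /* magic-number first guess at 1/sqrt(number) */
    y  = *(float *)&i;
    y  = y * (threehalfs - (x2 * y * y));   /* one iteration of Newton's method           */

    return y;
}
```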

This was a story that, for some reason or other, has stuck with me for over 20 years. I read it at a time when I was still struggling with basic concepts, and it shaped, at least in a small way, how I approach things today. I find that it’s really easy to be enamoured with a clever solution, I mean, we all appreciate an elegant answer to a problem, but it’s also in the failures, the missteps and dead ends along the way, that we can find some really valuable insights. The final solution, at least to me, evolves naturally from the failed attempts that preceded it. Between the beam tree and the portal clipping ideas, a lot of the basic ingredients were already there.

I remember I had just barely started learning to program, and I struggled hard with the sparse information Abrash provided. Looking back now, I actually appreciate how it pushed me to work harder, to really strive to understand the underlying concepts. If you think you’d like to learn more about gamedev from an experienced graphics engineer, I have a variety of courses available, very suitable for beginners, so if, for example, you’ve ever wondered whether understanding quaternions was possible, check them out. Otherwise, if you’d still like to support me and help choose what topic I cover next, you can vote on my Patreon page. Cheers.

2024-04-19
