Next Gen Immersive Audio: Spatial Sound and Project Acoustics

Hey everyone, my name is Robert Ridihalgh. I am a technical audio specialist with the Advanced Technology Group at Xbox, and in this session we're going to be talking about next generation audio, using Microsoft's 3D spatial sound platform and Project Acoustics to bring deeply immersive audio to gaming. So here's the agenda for today.

We're going to start off talking about why spatial sound and acoustics are important. Then we'll talk a little bit about the next generation, and what it offers. We'll do a deep dive into the spatial sound platform. We'll talk about high quality acoustics, and Project Acoustics, and then we'll wrap things up with a little bit of a chat about audio accessibility.

So why spatial sound and acoustics? Well, quite frankly, now is the time. We have platform features like spatial sound, we have technologies such as Project Acoustics, and we have the hardware that allows us to drive these and offload their DSP, so that there's not a tax on our games to bring these technologies to the gaming environment. Traditional audio is a flat experience, whether that's 2D speakers off of a TV or even a home theater with a full 7.1 setup.

What spatial audio brings us is the illusion that sounds are truly coming from 3D positions in space. Acoustics, on the other hand, help describe how sounds move through the world, and really give you a feeling of the space you're in and of how sounds are propagating and being muffled in that space. When you hear things as they are in the real world, positioned properly and sounding like they would acoustically, that's what spatial audio and acoustics bring. They help us tell our interactive stories more effectively.

It brings a deeper sense of immersion. It gives us a wider field of awareness: through audio we can give you the world all around you, and not just that frustum of visuals in front of you. It helps anchor objects in the world, even though you might not be able to see them. It also brings the ability to deliver better audio accessibility to gamers. First, let's talk a little bit about what next gen console audio brings us.

With Xbox Series X and S, we have what I call the next gen audio opportunity. We have a platform that brings us features like spatial sound. We have technologies, such as Project Acoustics, which give us high quality acoustic information. And now we have hardware that allows us to drive these technologies and platform features. In the Xbox Series X and S, we have two new processors for audio. One gives us modern, high quality audio decompression and sample rate conversion; the other, maybe more importantly for this session, gives us convolution processing as well as an expansion of our spatial sound platform. Speaking of which, let's get into the details on the spatial sound platform.

First off, I'll talk about the landscape of where spatial audio can be heard. Spatial home theater, through technologies like Dolby Atmos and DTS:X, gives us the ability to bring sound all around the user in their home theater. This is generally done by adding speakers that reflect sound off the ceiling, to make it sound like there are sounds above you. More ubiquitously, we have headphone rendering for spatial. Through any set of headphones we can bring HRTF processing, and HRTF stands for head-related transfer function.

It helps us describe what the real-world positioning of sounds is like. With two ears, we hear things in the world by directionality: left, right, front, back, elevation above and below us, and cues about how distant a sound is. If I have a sound that's behind me and to the right, that sound will reach my right ear first, then filter around my head and into my left ear.
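To make that concrete, here's a toy C++ sketch of just one of those cues, the interaural time difference, using the Woodworth spherical-head approximation for a source off to one side. The head radius and speed of sound are assumed round numbers, and a real HRTF captures far more than this single cue.

```cpp
#include <cmath>
#include <cstdio>

// Toy model of one directional cue: the interaural time difference (ITD),
// i.e. how much earlier a sound reaches the nearer ear. Woodworth
// spherical-head approximation, valid for azimuths up to 90 degrees.
double InterauralTimeDifferenceSeconds(double azimuthRadians)
{
    const double headRadiusMeters = 0.0875;  // assumed average head radius
    const double speedOfSound = 343.0;       // meters per second in air
    return (headRadiusMeters / speedOfSound) *
           (azimuthRadians + std::sin(azimuthRadians));
}

int main()
{
    const double pi = 3.14159265358979323846;
    // A sound 60 degrees off to the right reaches the right ear roughly
    // half a millisecond before the left ear.
    const double azimuth = 60.0 * pi / 180.0;
    std::printf("ITD ~ %.2f ms\n", InterauralTimeDifferenceSeconds(azimuth) * 1000.0);
    return 0;
}
```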

How we hear the world depends on the size of our head, the shape of our ears, and how far our ears are from our shoulders; they all go into how we perceive the world. So through HRTF, we're able to mimic this and give the user a really good sense of how sounds would actually be playing in the world. Microsoft's 3D spatial sound platform comes on both Xbox and Windows. It's a system-level integration that abstracts the audio your game sends from whatever output format the user has chosen. What I mean by this is that your game sends audio objects to the system, and the system just does the right thing for how the user has chosen to listen to the game.

We have support for both speakers and headphones over a single API and, by extension, a single content authoring path. We have support for two different kinds of objects. The first kind is what we call static bed objects.

These are objects that have a fixed position in space; they represent the idealized locations of speakers in a home theater. Over headphones, these are rendered as, basically, a virtual surround sound mix. We also have what we call true dynamic objects. These are objects that have an exact location in space. Over an AVR, the AVR will take that position and channel-pan it to a speaker.

But over headphones, you'll get an exact sense of where that sound is coming from. We support both spatial and non-spatial sound simultaneously from a game. So you could have, say, traditional music playing directly to your ears over headphones while all of your sound effects are spatially rendered.

And most importantly, all the spatial sound platform is doing is positioning sounds. It's not doing any processing other than placing the sounds in the right location. On console we support fully hardware-offloaded spatial sound processing, so there's no CPU tax on the game to do the processing. From an audio engine perspective, we have broad middleware support: all the major middleware support spatial sound through plugins and sinks. If you have a bespoke audio engine, you'll use the ISpatialAudioClient API directly.
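For that bespoke path, here's a minimal, heavily abridged C++ sketch of driving one dynamic object through ISpatialAudioClient. It assumes an already-activated ISpatialAudioClient (spatialClient), a mono WAVEFORMATEX (objectFormat), and a buffer-completion event (bufferCompleted) are already set up; device activation, error handling, and the real render loop are left out, so treat it as an outline rather than production code.

```cpp
// Minimal sketch of the bespoke-engine path: one true dynamic object over
// ISpatialAudioClient. Assumes spatialClient, objectFormat, and
// bufferCompleted already exist; all error handling is omitted.
#include <windows.h>
#include <propidl.h>
#include <spatialaudioclient.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void RenderOneDynamicObject(ISpatialAudioClient* spatialClient,
                            WAVEFORMATEX* objectFormat,
                            HANDLE bufferCompleted)
{
    // Describe a stream carrying up to 16 dynamic objects and no static bed.
    SpatialAudioObjectRenderStreamActivationParams streamParams = {};
    streamParams.ObjectFormat = objectFormat;
    streamParams.StaticObjectTypeMask = AudioObjectType_None;
    streamParams.MinDynamicObjectCount = 0;
    streamParams.MaxDynamicObjectCount = 16;
    streamParams.Category = AudioCategory_GameEffects;
    streamParams.EventHandle = bufferCompleted;

    PROPVARIANT activation;
    PropVariantInit(&activation);
    activation.vt = VT_BLOB;
    activation.blob.cbSize = sizeof(streamParams);
    activation.blob.pBlobData = reinterpret_cast<BYTE*>(&streamParams);

    ComPtr<ISpatialAudioObjectRenderStream> stream;
    spatialClient->ActivateSpatialAudioStream(&activation, IID_PPV_ARGS(&stream));
    stream->Start();

    // A true dynamic object: the platform renders it wherever we say it is,
    // on whatever output the user has chosen.
    ComPtr<ISpatialAudioObject> object;
    stream->ActivateSpatialAudioObject(AudioObjectType_Dynamic, &object);

    // One update pass (normally driven each buffer by bufferCompleted).
    UINT32 availableObjects = 0;
    UINT32 frameCount = 0;
    stream->BeginUpdatingAudioObjects(&availableObjects, &frameCount);

    BYTE* buffer = nullptr;
    UINT32 bufferLength = 0;
    object->GetBuffer(&buffer, &bufferLength);
    object->SetPosition(2.0f, 0.0f, -1.0f);  // listener-relative position, in meters
    object->SetVolume(1.0f);
    // ...write frameCount samples of this sound into 'buffer' here...

    stream->EndUpdatingAudioObjects();
}
```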

And the spatial sound platform has been designed to give the consumer choice for playback, so the consumer gets to choose how they want to listen to your game. Spatial sound has several platform providers. We offer three headphone renderers, the first of which is what we call the Windows Sonic headphone renderer. This is the technology that came out of HoloLens and has been adapted to the spatial sound platform.

We also have partnerships with both Dolby and DTS to bring home theater, as well as headphone rendering, to the spatial sound platform. A ton of games have shipped using the spatial platform, with a lot of amazing content and lots more to come in the future. For consumers, the Microsoft spatial sound platform is really easy: everything is managed through the Settings app on console, or through the audio control panel on Windows, and the user can select whichever output format they prefer.

We also offer the ability to get headphone rendering over HDMI, so that you can listen through headphones on your AVR. And just as a game can do both 2D and spatial playback simultaneously, we offer this at the platform level as well. So you might have a game that's spatially enabled running alongside a Skype call or a chat, which would be a 2D experience. These play nicely together, and all the things you expect with respect to muting and ducking of sound on the platform work the same with spatial. Finally, I'd like to mention that the Xbox Shell UI is also spatially enabled, so you can literally hear exactly where the UI is on the screen, just by how you're hearing it over your speakers or headphones.

We've talked about the spatial sound platform, kind of a high-level look at it. Now let's dive into the details on advanced acoustics, and specifically Microsoft Project Acoustics. Project Acoustics is a product wrapper around a technology called Triton, which came out of Microsoft Research about ten years ago. Triton is what we call a wave propagation analysis system: you feed the geometry, material types, and textures from your game into the analysis system.

It then looks at locations for sound emitters and listeners, and does a sound propagation analysis through the geometry you've fed it. It takes into account material types to understand how sound would bounce around, or be absorbed by, various geometry. As you can see here in the animation, we have an emitter in the middle of the geometry and two listener locations, A and B. As the sound reaches each one of those, we get what we call an interference pattern, and this builds up what we call the impulse response for that location, based on where the sound was emitted. You can think of it like dropping a pebble into a still pond of water: from where the pebble goes in, waves emanate out, bounce off the shoreline and around objects in the water, and interfere with each other as they come together.

It's these interference patterns that create the impulse responses, and it's from these impulse responses that we can extract information about the acoustics of the space. We call these perceptual parameters. This is information such as how reverberant the space is, how occluded or filtered a sound would be, and even the direction from which the sound would be perceived at that location. So we create this dataset of numbers, and that's what you use at runtime in your game to drive the processing for your acoustics.
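As a rough illustration of what that dataset looks like to game code, here's a hypothetical C++ sketch. The struct fields and function names are stand-ins of my own, not the actual Project Acoustics runtime API; the point is just that the acoustics arrive as a handful of perceptual numbers per source and listener pair, which you feed into whatever per-voice DSP you already have.

```cpp
// Hypothetical stand-ins for the kind of perceptual parameters the analysis
// produces for one (source, listener) pair, and how a game might feed them
// into the per-voice DSP it already has. Not the real Project Acoustics API.
struct Float3 { float x, y, z; };

struct PerceptualParams
{
    float  dryLoudnessDb;      // how occluded/filtered the direct path is
    float  wetLoudnessDb;      // loudness of the reverberant energy at the listener
    float  decayTimeSeconds;   // how long the reverb tail rings in this space
    Float3 arrivalDirection;   // direction the sound is perceived to arrive from
};

struct VoiceDsp                // whatever per-voice controls your engine already exposes
{
    float  dryGainDb;          // attenuates/occludes the dry path
    float  reverbSendDb;       // how much of the voice feeds the reverb bus
    float  reverbDecaySeconds; // which reverb (or reverb setting) to use
    Float3 spatialDirection;   // where the spatial renderer should place the sound
};

VoiceDsp DriveVoiceFromAcoustics(const PerceptualParams& p)
{
    VoiceDsp v;
    v.dryGainDb          = p.dryLoudnessDb;
    v.reverbSendDb       = p.wetLoudnessDb;
    v.reverbDecaySeconds = p.decayTimeSeconds;
    v.spatialDirection   = p.arrivalDirection;
    return v;
}
```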

As I mentioned, Project Acoustics is a product wrapper around the Triton technology. It takes advantage of the fact that we have lots of machines in the cloud to do processing: you take your game geometry and material information, feed it to the cloud, the processing happens there, and the dataset gets sent back to you. You don't have to worry about taking up local machine time at your workstation or at your studio. We have plugins for Unreal and Unity that help with this process, gathering the game data, sending it to the cloud, and receiving that information back. We also have runtime plugins for Audiokinetic Wwise to automatically process this information at runtime and play back the acoustics.

Because it's a dataset, it's also very flexible for you to use. Maybe you just want the obstruction and occlusion information; you can do that. You can filter out just the information you want. You can also stream the data, so you don't have to have the acoustic data for an entire level in memory at once.

You can use just the information you need for where the player is in the level. And maybe most importantly, for those of us who are audio designers, the data is designable. Because these are basically just numbers, you can change them at runtime. Maybe you don't want a space to be as reverberant as the analysis says it would be in real life; you can tone that down. Or maybe you have a gameplay element where you want to make sure certain sounds stay audible, even though in the real world they would be occluded by geometry.
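Here's a small, hypothetical sketch of that kind of design pass, with illustrative names rather than a real API: because the values are just numbers, a designer can scale or clamp them before they reach the mix.

```cpp
// Hypothetical sketch of designing on top of the returned data: the values
// are just numbers, so a designer can reshape them before they hit the mix.
// Names are illustrative, not a real API.
#include <algorithm>

struct AcousticValues
{
    float occlusionDb;       // more negative = more muffled/quieter
    float decayTimeSeconds;  // reverb tail length for this location
};

AcousticValues ApplyDesignerIntent(AcousticValues raw,
                                   float reverbScale,     // e.g. 0.7f to tone a space down
                                   float maxOcclusionDb)  // e.g. -12.0f for gameplay-critical sounds
{
    raw.decayTimeSeconds *= reverbScale;                          // less reverberant than real life
    raw.occlusionDb = std::max(raw.occlusionDb, maxOcclusionDb);  // never let it disappear entirely
    return raw;
}
```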

Let's take a look at some examples of Project Acoustics in action. The first title to use this technology was Gears of War 4, with a very simple usage for specific sounds. The team then evolved their usage of Project Acoustics in Gears 5, bringing in more advanced usage with indoor and outdoor information and more sounds. They also streamed the data, so they weren't taking up as much memory. And both of those titles married the acoustic information with our spatial platform to give incredibly immersive experiences.

So we'll take a look at a couple of examples of this in action. First, we're going to play a clip with the acoustics turned off. [Game characters] I'd say it's about time we get the hell out of here. Hey, are we not going to talk about what happened back there, with the nest? I mean, what the hell was that? It looked like they were transforming. Wait, you mean like evolving? Shouldn't that take a lot of time? [Robert] You can hear the sounds, and if you were playing like this you would hear them positioned correctly, but you're not getting a sense of the space that you're in as a player.

So now let's listen to a clip with the acoustics turned on, and we'll see how that affects the game. [Game characters] I'd say it's about time we get the hell outta here. Hey, are we not going to talk about what happened back there, with the nest? I mean, what the hell was that? It looked like they were transforming. Wait, you mean like evolving? Shouldn't that take a lot of time? Some insect juveniles can become drones in days, hours even. So Juvies into Drones. Drones into what, though? I have a feeling we're going to find out. [Robert] Now you can get a much better sense of the space that the player is in.

And not only that, you're getting these smooth transitions between spaces, which is what you get from having a dataset that tells us what the acoustics are like throughout the game world. No longer are we relegated to hand-painting trigger zones to make changes to our acoustics and reverberation. Now let's take a look at a different implementation, in Sea of Thieves. The team at Rare had, basically, a hole in their acoustics system: they just needed a better way to do obstruction and occlusion of sounds, so they used Project Acoustics for that purpose. As you can imagine, in a world like Sea of Thieves, where a lot of it is on the ocean, there's no real need to run calculations over the entire world; sound just propagates across open water.

So what they did was process each individual ship and island separately, creating a dataset based on those pieces. And not only that, they filtered the information from Project Acoustics down to just the obstruction and occlusion values, so the datasets were small and broken up into the pieces they needed. Let's take a listen to how this sounds in a clip from Sea of Thieves. As you can hear, as the player walked away from the hut and around and behind other huts and other parts of the island, the sound naturally filtered and occluded based on where the player was relative to the sound that was playing.
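As a hedged sketch of that slimmed-down approach, with hypothetical names rather than Rare's actual code or a real API: each ship or island gets its own small dataset carrying only obstruction and occlusion values, and those values map onto the gain and low-pass controls most engines already expose per voice. The mapping curve below is an assumption for illustration.

```cpp
// Hypothetical sketch: per-ship / per-island datasets carrying only
// obstruction and occlusion, applied as attenuation plus a low-pass.
// Illustrative only; not Rare's implementation or a real API.
#include <algorithm>
#include <cmath>
#include <string>
#include <unordered_map>

struct ObstructionOcclusion
{
    float obstructionDb;  // direct path blocked, reflections still arrive
    float occlusionDb;    // both direct path and reflections are muffled
};

// One small dataset per piece of the world (a ship, an island), instead of
// one enormous bake covering the whole ocean.
using LocalDataset = std::unordered_map<int /*probe cell id*/, ObstructionOcclusion>;
std::unordered_map<std::string /*piece name*/, LocalDataset> g_datasets;

// Look up the values for the player's current probe cell within one piece.
const ObstructionOcclusion* Query(const std::string& piece, int probeCell)
{
    auto it = g_datasets.find(piece);
    if (it == g_datasets.end()) return nullptr;
    auto cell = it->second.find(probeCell);
    return cell == it->second.end() ? nullptr : &cell->second;
}

struct VoiceFilter
{
    float gainLinear;
    float lowpassCutoffHz;
};

// Map the two dB values onto what most engines already expose per voice:
// a volume and a low-pass cutoff.
VoiceFilter ApplyObstructionOcclusion(const ObstructionOcclusion& oo)
{
    VoiceFilter f;
    f.gainLinear = std::pow(10.0f, oo.obstructionDb / 20.0f);
    // Crude assumed curve: the more occluded, the lower the cutoff.
    const float amount = std::min(1.0f, -oo.occlusionDb / 40.0f);
    f.lowpassCutoffHz  = 20000.0f * (1.0f - 0.9f * amount);
    return f;
}
```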

One other aspect I'd like to talk about with using advanced acoustics is early reflections. One of the pieces of data that does not come out of Project Acoustics is information about those first reflections off of geometry in the world: how close I am to a wall, a building, or other kinds of geometry. We do have plugins and technologies available to do this. We've seen it done in bespoke ways in games, but, for example, Audiokinetic's Wwise offers the Reflect plug-in, which takes in reflection points and/or geometry from the game to automatically calculate and play back these early reflections, which are important for giving that first sense of the acoustics and the geometry around you.
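To show the underlying idea for a single early reflection, independent of any particular plug-in, here's a small illustrative C++ sketch: given a reflection point found against nearby geometry, the delay falls out of the total path length, with a simple distance and absorption based gain. This is generic image-source style math, not the Reflect plug-in's API.

```cpp
// Generic early-reflection math for one reflection point found against
// nearby geometry (image-source style). Not tied to any particular plug-in.
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float Distance(const Vec3& a, const Vec3& b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

struct EarlyReflection
{
    float delaySeconds;  // propagation time of the reflected path; subtract the
                         // direct-path time if you delay relative to the dry signal
    float gainLinear;    // level relative to the dry sound
};

EarlyReflection ComputeEarlyReflection(const Vec3& emitter,
                                       const Vec3& reflectionPoint,
                                       const Vec3& listener,
                                       float absorption /*0..1, from the surface material*/)
{
    const float speedOfSound = 343.0f;  // m/s
    const float pathMeters = Distance(emitter, reflectionPoint) +
                             Distance(reflectionPoint, listener);

    EarlyReflection r;
    r.delaySeconds = pathMeters / speedOfSound;
    // Simple 1/distance rolloff, scaled by how absorptive the surface is.
    r.gainLinear = (1.0f - absorption) / std::max(pathMeters, 1.0f);
    return r;
}
```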

Finally, I'd like to talk a little bit about audio accessibility, and how these platform features and technologies really enhance our ability to bring better audio accessibility to gamers without sight. With the spatial sound platform we're able to reproduce sound positioning in highly accurate ways, so that a gamer who cannot see the world can understand exactly where sounds and sound objects are in that world. Whether that's over a home theater system or, more likely, over headphone rendering, the spatial sound platform gives that gamer the ability to understand exactly where sounds are playing. Acoustics, for their part, are super important for painting the world with audio and for understanding the space the player is in.

They tell us how the audio is moving through the world. For example, you might have a sound that's behind a wall, but there may also be a doorway there. With the acoustics information, more of the sound comes through the doorway than through the wall, which tells the player about that opening and gives them a better sense of the space they're in. It can really paint that picture for them, even though they can't see it.
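As a rough sketch of that doorway example, again with hypothetical names: if the acoustic data says the sound's energy arrives through the opening rather than through the wall, the game can place the rendered spatial object out along that arrival direction, so a player who can't see the room still hears that there's an opening over there.

```cpp
// Hypothetical sketch of portaling the apparent source position using the
// arrival direction from the acoustic data (names are illustrative).
struct Vec3 { float x, y, z; };

static Vec3 Add(const Vec3& a, const Vec3& b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 Scale(const Vec3& v, float s)     { return { v.x * s, v.y * s, v.z * s }; }

// Instead of rendering the object at the true (hidden) source position,
// render it where the sound is actually perceived to come from: out along
// the arrival direction reported by the acoustics, at the source's distance.
Vec3 ApparentSourcePosition(const Vec3& listener,
                            const Vec3& arrivalDirectionUnit,  // from the acoustic data
                            float perceivedDistanceMeters)
{
    return Add(listener, Scale(arrivalDirectionUnit, perceivedDistanceMeters));
}
```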

So when we add these together, spatial processing and high quality acoustics, we can really do a full painting of the world, just with audio, for gamers who cannot see it. To wrap up: if we take spatial sound processing and add high quality acoustic processing, we get truly immersive audio experiences. Finally, here is a set of resources you can check out to learn more about the spatial sound platform on Windows and Xbox. We have samples you can look at and documentation on the platform. Project Acoustics has documentation and downloads for its plugins, so you can try out those systems.

And I also highly recommend going to our spatial partners, Dolby and DTS, and looking at the resources they have, as they provide a lot of information about how spatial can be used in games. So that's it for this session. Thank you for attending. I hope you got a good idea of how we can take advantage of this next generation, and the opportunity it affords us, to bring deeply immersive audio experiences, more accessible audio experiences, and audio in our games at the level it should be. Have a great rest of your Game Stack Live, and thank you very much.
