Hey everyone, my name is Robert Ridihalgh, I'm a Technical Audio Specialist in the Advanced Technology Group at Xbox, and in this session I'm going to be speaking to you about bringing audio accessibility to games: creating an immersive and accessible gaming experience. Here's what we're going to talk about in this session. I always like to start off by talking about the why: why are we talking about this subject? In doing so, we'll talk about a couple of challenges that we all have as game developers around accessibility.
We'll talk about accessibility for gamers without sight, as well as deaf and hard of hearing gamers. Then we'll look at some techniques and a couple of examples of projects and games that have tackled audio accessibility. And finally, we'll take a deep dive into the technologies and tools that exist today with which you can bring great audio accessibility to your games.
So, audio is normally 50% of the game experience, but for gamers without sight, it's really 100% of the experience. So how do we express the game world with just audio? We want to give immersive experiences, yet also make the game playable through audio cues. That being said, designers and directors are very protective of their mixes and their designs. So how do we give audible cues to the gamer and make the game playable via audio, without significantly modifying the design of the game? For deaf and hard of hearing gamers, the audio cues, and the parts of the game design built on them, become almost impossible to experience. So really, the way that we approach this is through visualizing audio, but that can be really difficult: extracting audio cues from already mixed game audio is really hard and generally not very effective.
But if you have access to the audio before that final mix and can grab that information, it gives us a lot more flexibility, but it also introduces other challenges. In a game there's a ton of audio going on, so how do you choose what to express visually? And it also requires deeper integration into the audio engine, because you have to grab that information before the final mix. The other thing to keep in mind is that design choices are key, and we have to be very careful about what we visualize, so that it isn't seen as cheating by some gamers.
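To make that pre-mix idea concrete, here is a minimal sketch, in C++, of what such an engine tap might look like. Everything in it (SoundEvent, AccessibilityTap, OnPreMixEvent) is a hypothetical illustration of the pattern, not the API of any particular engine.

```cpp
#include <string>
#include <vector>

// Hypothetical pre-mix sound event: the metadata an accessibility layer
// needs is captured before the voices are mixed down and lost.
struct SoundEvent {
    std::string category;     // e.g. "footsteps", "gunfire", "dialogue"
    float       position[3];  // world-space emitter position
    float       loudness;     // pre-mix level, useful for prioritization
};

class AccessibilityTap {
public:
    // Designers opt categories in, so what gets surfaced stays a design
    // choice rather than a raw dump of every sound in the mix.
    void AllowCategory(const std::string& category) {
        allowed_.push_back(category);
    }

    // Called by the audio engine for every event, before the final mix.
    void OnPreMixEvent(const SoundEvent& e) {
        for (const auto& c : allowed_) {
            if (c == e.category) {
                captured_.push_back(e);  // handed to a UI/visualizer later
                return;
            }
        }
    }

private:
    std::vector<std::string> allowed_;
    std::vector<SoundEvent>  captured_;
};
```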
So, with that, I have a few challenges for all of us as game developers. For gamers without sight, how do we create an audio experience that's immersive, supports gameplay, and tells the stories we want to tell, without special-casing and interfering with the design intent, and that supports our desire to give gamers without sight the ability to play and even thrive in our games? For deaf and hard of hearing gamers, how do we visualize effectively without interfering with the game, and eliminate perceptions of cheating while doing so? And for developers, how do we make these capabilities easy and efficient, so that we're not adding a development tax to the team, and so that it doesn't become one of those things that gets left on the cutting room floor? I want to talk a little bit about some techniques that we've seen in some projects and games, just to give you an idea of some of the things that have been done. I worked on a project called Minecraft Beyond Sight, which was done here at Microsoft as a hackathon, to try to bring deep accessibility to Minecraft, including audio accessibility. We did this through a number of methods. The first was the use of acoustic cues in the game.
We brought in a technology called VERA, which stands for Voxel Engine for Real-time Acoustics, born out of the Advanced Technology Group. It allowed us to give indications of indoor and outdoor spaces, as well as tell whether a sound is obstructed or occluded by the world. We also added sounds that you might naturally use anyway to act as markers on things like torches. We added spatial processing, which is a huge component of bringing audio accessibility to games: spatial sound allowed us to have sounds with true, pinpoint-accurate positions in space. We learned in that project that for blind and low vision gamers, you almost cannot give too much audio information.
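As a rough illustration of that obstruction and occlusion idea, here is a small voxel-based check in C++. To be clear, this is not VERA's actual algorithm or API, just the general shape of the technique: march a ray through the voxel world and muffle the sound a little more for each solid cell it crosses.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct VoxelGrid {
    int dim[3];                  // grid dimensions in cells
    float cellSize;              // world units per cell
    std::vector<uint8_t> solid;  // 1 = solid voxel, 0 = air

    bool IsSolid(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= dim[0] || y >= dim[1] || z >= dim[2])
            return false;
        return solid[(z * dim[1] + y) * dim[0] + x] != 0;
    }
};

// March from listener to emitter in fixed steps, counting solid cells.
// Returns 0 (clear path) up to 1 (heavily occluded); the result would
// drive a low-pass filter and attenuation on the sound's dry path.
float ComputeOcclusion(const VoxelGrid& g, const float from[3], const float to[3]) {
    float d[3] = { to[0] - from[0], to[1] - from[1], to[2] - from[2] };
    float len = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    int steps = static_cast<int>(len / (g.cellSize * 0.5f)) + 1;
    int hits = 0;
    for (int i = 0; i <= steps; ++i) {
        float t = static_cast<float>(i) / static_cast<float>(steps);
        int v[3];
        for (int a = 0; a < 3; ++a)
            v[a] = static_cast<int>((from[a] + d[a] * t) / g.cellSize);
        if (g.IsSolid(v[0], v[1], v[2])) ++hits;
    }
    return 1.0f - std::exp(-0.5f * static_cast<float>(hits));  // more hits, more muffled
}
```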
And finally, we also added the ability for the user to control voice prompts that would tell you what you were looking at. For the game Killer Instinct, the audio director in Microsoft's Global Publishing Group at the time worked with the developer and a blind gamer named SightlessKombat to come up with ways to give more and better audio accessibility in the game. They focused on very strong sound design: very specific sounds were given priority for playback, so you could distinguish the movement sounds and the players themselves.
They added better localization through panning, to understand the soundscape on screen, and added menu options to give the player much more control over what they wanted to hear out of the game, so they could concentrate on the things that were important for them to be able to play. I spoke of this technology called VERA; I wanted to give you a little example of what it was like. We'll play a video now without the VERA engine running, just to give you a baseline of the sounds in the game without the technology. So, you can see, or rather hear, that you could hear the torch sounds; it all just sounded like Minecraft, right? So now we'll play a similar clip with the VERA technology in place. Listen to how the sounds change as you go in and out of spaces, break down the walls, that kind of thing.
Now, you could tell that when I went into that room and closed the door, I could no longer hear all the sounds outside it, and when I broke a wall to the side, I could start to hear those sounds again. So that technology allowed us to give a much better sense of the world and the space that you're in, and let you understand it just through the audio. The basic technique for deaf and hard of hearing gamers, as we talked about earlier, is visualizing the audio in the game.
A couple of examples of this: Minecraft has what they call their subtitles, which is basically a small list in the lower right-hand corner that gives text information about sounds, you know, zombie rattles, footsteps, that kind of thing, and it also gives you an idea of whether each sound is to your left or your right. So you have some sense of the sounds and what's going on around you from that list.
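For a sense of how a subtitle list like that might be driven, here is an illustrative C++ sketch; the event structure and the listener math are my assumptions, and Minecraft's real implementation certainly differs.

```cpp
#include <cmath>
#include <deque>
#include <string>

struct Listener { float pos[3]; float right[3]; };  // right = unit vector

struct SoundCaption { std::string label; std::string side; };

class SoundSubtitles {
public:
    void OnSound(const Listener& l, const float soundPos[3], const std::string& label) {
        float to[3] = { soundPos[0] - l.pos[0],
                        soundPos[1] - l.pos[1],
                        soundPos[2] - l.pos[2] };
        float len = std::sqrt(to[0] * to[0] + to[1] * to[1] + to[2] * to[2]);
        // Project onto the listener's right vector: positive = right side.
        float dot = len > 0.0f
            ? (to[0] * l.right[0] + to[1] * l.right[1] + to[2] * l.right[2]) / len
            : 0.0f;
        const char* side = (dot > 0.5f) ? ">" : (dot < -0.5f) ? "<" : "";
        captions_.push_front({ label, side });
        if (captions_.size() > kMaxLines) captions_.pop_back();  // rolling list
    }

    const std::deque<SoundCaption>& Lines() const { return captions_; }

private:
    static constexpr size_t kMaxLines = 5;  // small list in the corner
    std::deque<SoundCaption> captions_;
};
```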
Fortnite has a more sophisticated visualizer that shows just the cues that are important for the game: for things like gunfire or footsteps, it gives you a more accurate view of where and what those sounds are while you're playing. So, we've talked about some of the techniques that some games and projects have used. Now let's take a deep dive into the technologies and tools that exist today that will allow you, as a game developer, to bring better audio accessibility to your game. First I'd like to talk about acoustics. Acoustics are important because they help tell how sound moves through the world:
whether a sound is reflecting off walls, the reverberation of the space you're in, how sounds become muffled or occluded when they're behind walls or other objects, and even how sounds travel through, say, doorways in the world. I might be in a room with some sound behind a wall, but with an open doorway I'm probably going to hear more of that sound through the doorway than through the wall. So acoustics give us a way to really paint a picture of the space just through the audio. For reflections, there are existing plugins and tools out there; for example, Audiokinetic's Wwise Spatial Audio system and its Reflect plugin are a great example of a system that takes in the game geometry and automatically plays reflections of sounds based on the geometry the game has fed it. At Microsoft, we have a product called Project Acoustics. Project Acoustics is a set of tools and technologies that does highly accurate acoustic simulation for games.
What this does is, you take the geometry of your game, you feed it to the analysis engine, and it runs what's called a wave simulation on that geometry. It then feeds back a dataset of what we call perceptual parameters: parameters that help define how reverberant the space is, how obstructed a sound emitter is, and the direction the sound comes from, for the entire geometric area of the game level. It's positionally accurate, it's directional, and it's spatial, which we'll talk about in a little while. So it's very highly accurate information about the acoustical space in a game.
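As an illustration of what consuming such a dataset can look like at runtime, here is a hedged sketch; AcousticsDataset and QueryParams are hypothetical stand-ins for the baked data and its lookup, not the actual Project Acoustics plugin API.

```cpp
// Perceptual parameters of the kind a wave simulation bakes out per
// listener/emitter pair.
struct PerceptualParams {
    float wetness;        // how reverberant the space is at this spot
    float occlusion;      // 0 = clear path for the sound, 1 = fully blocked
    float arrivalDir[3];  // direction the sound actually arrives from,
                          // e.g. bent through a doorway instead of a wall
};

struct AcousticsDataset {};  // stand-in for the baked simulation results

// Hypothetical lookup; a real implementation samples the baked voxel data.
PerceptualParams QueryParams(const AcousticsDataset&,
                             const float /*listenerPos*/[3],
                             const float /*emitterPos*/[3]) {
    return { 0.3f, 0.0f, { 0.0f, 0.0f, 1.0f } };  // stub values
}

void ApplyAcoustics(const AcousticsDataset& data,
                    const float listenerPos[3],
                    const float emitterPos[3]) {
    PerceptualParams p = QueryParams(data, listenerPos, emitterPos);
    // Drive the runtime mix from the baked simulation instead of
    // hand-placed trigger zones:
    //   p.occlusion  -> low-pass filter / attenuation on the dry path
    //   p.wetness    -> reverb send level
    //   p.arrivalDir -> the position handed to the spatial renderer
    (void)p;
}
```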
First, let's listen to a clip without the acoustics. “I’d say it's about time we get the hell outta here.” “Hey, are we not going to talk about what happened back there, with the nest? I mean, what the hell was that?” “It looked like they were transforming?” “Wait, you mean like evolving?” “Shouldn't that take a lot of time?” So, you can hear the sounds, and if you were playing the game, this is how you'd hear it: sounds positioned correctly, but without any real sense of space.
Now let's listen to a similar clip, with the acoustics turned on using the Project Acoustics technology. “I'd say it's about time we get the hell outta here.” “Hey, are we not going to talk about what happened back there, with the nest? I mean, what the hell was that?” “It looked like they were transforming.”
“Wait, you mean like evolving? Shouldn't that take a lot of time?” “Some insect juveniles can become drones in days, hours even.” “So, juvies and drones?” “Juvies and drones of what though?” “I have a feeling we're going to find out.” So now, you see, you get a much better sense of the space the player is in with smooth transitions between those spaces.
And all of this is done with that calculated data. No longer are you relying on old techniques like placing hand-located trigger zones to change your reverb or your acoustic spaces. So, we've talked about acoustics, which is a really important part of helping to paint the space, the image, through audio. The other side of this is the spatial sound platform. Unlike acoustics, which tell us how sound moves through the space, spatial sound tells us where sounds are located: it gives us an accurate sense of exactly where a sound is coming from. I'm going to do a high-level overview of the spatial sound platform here; we'll talk about why it's important, and then we'll get into the whats and hows of it. So why spatial sound? Traditional audio, whether it's 2D speakers off of your TV or even a 7.1 home theater system, is a flat experience. Spatial audio gives us the illusion that sounds are coming directly from objects in three-dimensional space.
It helps us tell our stories more effectively by fully immersing the player in the audio, and it gives a wider field of awareness than the visuals: when you're playing the game, the graphics are limited to the view frustum in front of you, but with audio we can represent the world all around you. It helps anchor and give persistence to objects that you may not even see on the screen. It helps us tell our stories, it gives us immersion, it helps with gameplay, but most importantly, it helps with accessibility. So what is the Microsoft 3D spatial sound platform? It's a system integrated into the Xbox and Windows platforms that abstracts the audio your game sends from the user's audio output format. What I mean by this is that your game sends its audio to the system, and no matter what the user has chosen for their audio output, the system just does the right thing. It has support for both speakers and headphones with a single API, and by extension a single content authoring path.
We have support for two different kinds of audio objects. One is what we call static bed objects: objects that have a fixed position in space, representing the idealized locations of speakers in a home theater system. For example, a Dolby Atmos home theater setup is seven speakers around you, a low frequency effects speaker, and four speakers above you. But we also support what are called dynamic objects.
These are objects that have a true position in space. Over an AVR these don't matter so much, because an AVR will simply channel-pan that sound to speakers, but over headphones, rendered spatially, you will hear that sound in the exact location it was presented to the engine. We have full support for Dolby Atmos and DTS:X over both speakers and headphones, and we also support both spatial and non-spatial audio experiences simultaneously. So you might be playing a game that is spatially enabled while doing chat, or maybe a Skype call, together.
Those will continue to work and mix exactly as you would expect. For existing games that are not writing to a spatial audio endpoint, we have the ability to provide virtual surround sound: rather than hearing plain stereo direct to your ears over headphones, you can actually get a 7.1 experience over your headphones through the spatial platform.
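To ground this in code, here is a minimal sketch of the per-frame update for a dynamic object using the Windows ISpatialAudioClient API. The stream setup (device activation, format negotiation, the event-driven pump) is omitted, and WriteMonoAudio is a stand-in for your own sample source.

```cpp
#include <SpatialAudioClient.h>
#include <wrl/client.h>
#include <cstring>

using Microsoft::WRL::ComPtr;

// Stand-in source: writes silence; a real game fills PCM samples here.
void WriteMonoAudio(BYTE* buffer, UINT32 byteCount) {
    std::memset(buffer, 0, byteCount);
}

// `stream` is assumed to be an already-activated render stream, and this
// function is called once per buffer (normally after waiting on the
// stream's event handle).
void UpdateFootstepObject(ISpatialAudioObjectRenderStream* stream,
                          ComPtr<ISpatialAudioObject>& footstep,
                          float x, float y, float z) {
    UINT32 availableObjects = 0, frameCount = 0;
    if (FAILED(stream->BeginUpdatingAudioObjects(&availableObjects, &frameCount)))
        return;

    // Activate a dynamic object once; it keeps its identity across frames.
    if (!footstep && availableObjects > 0)
        stream->ActivateSpatialAudioObject(AudioObjectType_Dynamic, &footstep);

    if (footstep) {
        BYTE* buffer = nullptr;
        UINT32 bytes = 0;
        footstep->GetBuffer(&buffer, &bytes);
        WriteMonoAudio(buffer, bytes);
        footstep->SetPosition(x, y, z);  // a true position in 3D space
        footstep->SetVolume(1.0f);
    }

    stream->EndUpdatingAudioObjects();
}
```

The key point is SetPosition: you hand the renderer a true 3D position every frame, and the platform decides whether that becomes binaural rendering over headphones or channel panning over speakers.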
Now, we have broad platform support: it's on Windows and the entire Xbox family, from Xbox One all the way through Xbox Series X and S. Broad device support: TVs, sound bars, and AVRs, with full support for Dolby Atmos and DTS:X, as I mentioned. But maybe most important is headphones: you can use any pair of headphones that you want, and we offer three different renderers. One is Windows Sonic for headphones, which came from the HoloLens; we also have Dolby Atmos for headphones and DTS:X for headphones.
Broad middleware support: if you're using a middleware engine for your audio, there are plugins that make it easy to implement the spatial sound platform, and it's been designed to be easy for both the consumer and the developer. So, we've talked about two major areas for bringing accessibility to gamers without sight: spatial sound and acoustics. A couple of other technologies to consider for audio accessibility, both available today, are text to speech and speech to text.
For gamers without sight, text to speech is great for screen readers, subtitle reading, and narrators. For deaf and hard of hearing gamers, you can use speech to text to do automatic subtitling, or even bring voice chat into text for multiplayer.
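For a sense of how small the text-to-speech building block can be, here is a minimal sketch using the classic Windows SAPI voice. A shipping title would want the asynchronous flag and a higher-level wrapper, but the core is just this.

```cpp
#include <sapi.h>
#include <windows.h>

// Speak one line of text, e.g. a menu item or a subtitle, synchronously.
bool SpeakLine(const wchar_t* text) {
    if (FAILED(CoInitialize(nullptr)))
        return false;
    ISpVoice* voice = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_SpVoice, nullptr, CLSCTX_ALL,
                                  IID_ISpVoice, reinterpret_cast<void**>(&voice));
    if (SUCCEEDED(hr)) {
        hr = voice->Speak(text, SPF_DEFAULT, nullptr);  // SPF_ASYNC for games
        voice->Release();
    }
    CoUninitialize();
    return SUCCEEDED(hr);
}

// Usage: SpeakLine(L"Audio accessibility options");
```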
We've talked about techniques that can be used, and we've looked at the technologies that exist today to help bring accessibility to audio. Let's talk a little bit about future possibilities. A couple of things we're looking at: one, on visualizing audio, how can we bring audio visualization to games and make it easy for developers? We're looking at plugin-based tools that extract sound object information out of an audio engine automatically, give designers control over the information that is used, and, for those titles that don't want to do bespoke visualization for their engines, provide ways to automatically do the visual display. We're also looking at a brilliant technology for guidance in games for gamers without sight. We're able to paint the world with acoustics and spatial sound, but we also need to be able to guide players to goals within the game. There is a technology that came out of Microsoft Research, a product called Microsoft Soundscape.
It's designed for blind and low vision people, using their phones and GPS to guide them to locations out in the world. We're looking at bringing this technology, as a possibility, into games, but first I'd like to play you a little video that'll give you an idea of what Soundscape is all about. “It was six weeks ago, I think, when we initially thought, wouldn't that be a crazy idea? And here we are.” “In Soundscape,
in general, the idea is that you hear spatialized audio; that is, you'd hear it in your right ear if something's off to your right, your left ear if something's off to your left. On the water, we're taking that idea in order for you to go capture beacons.” “You get like a ticking sound; it's sort of a benign, clock-ticking sound almost, kind of a loping rhythm.” “And as you turn, you start to hear a pinging that's more of a positive sound, from your right ear to your left ear.
And then when you get in the center, you just move directly toward it and try to keep it in the center of your auditory field.” “You got it? Alright, nice.” “Yes, we did awesome.” “Soundscape!”
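The beacon behavior described in the video, a steady tick that anchors the target plus a ping that centers as you turn toward it, maps naturally onto waypoint guidance in a game. Here is an illustrative sketch; the cone angle, ping rates, and the BeaconCue structure are my assumptions, not Soundscape's actual tuning.

```cpp
#include <cmath>

struct BeaconCue {
    float pan;       // -1 = hard left, 0 = centered, +1 = hard right
    float pingRate;  // pings per second; faster once you line up
    bool  onTarget;  // within the "keep it centered" cone
};

// headingRad is the direction the player faces; both angles are measured
// the same way in the world's XZ plane.
BeaconCue ComputeBeaconCue(float playerX, float playerZ, float headingRad,
                           float targetX, float targetZ) {
    float toTarget = std::atan2(targetX - playerX, targetZ - playerZ);
    float delta = toTarget - headingRad;
    // Wrap to [-pi, pi] so left versus right is unambiguous.
    while (delta >  3.14159265f) delta -= 6.28318531f;
    while (delta < -3.14159265f) delta += 6.28318531f;

    BeaconCue cue;
    cue.pan = std::fmax(-1.0f, std::fmin(1.0f, delta / 1.5707963f));  // full pan at 90 degrees
    cue.onTarget = std::fabs(delta) < 0.26f;                          // roughly a 15 degree cone
    cue.pingRate = cue.onTarget ? 4.0f : 1.0f;                        // reward alignment
    return cue;
}
```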
This is a technology that we think is viable, very usable, for gaming, and we're going to look at taking the techniques the product uses and generalizing them into tools and techniques for gaming, making it easy for you, as a game developer, to implement something like this for waypoint guidance in a game. So, we've looked at techniques, we've talked about technologies, and we've looked at the future. Let's go back to our challenges. For gamers without sight, how do we create that audio experience that's immersive, helps with our gameplay, and tells the stories we want to tell without interfering with the intent of the audio design, and that supports our gamers without sight and gives them the ability to play and thrive within our games? Some of the things we've learned:
Give the player control over what they need to hear and want to hear in the game. Use techniques that can expand upon existing sound design and help give cues to the gamer. Use the technologies we have available today: high quality acoustics and spatial sound help paint the picture of the world just through audio, and give gamers without sight a better sense of the world they're in when they're playing your game. For deaf and hard of hearing gamers, building generalized tools for getting access to the sounds coming out of a sound engine is really the key to bringing visualization and making it efficient and usable.
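Tying the player-control lesson back to code, here is a small sketch of menu-driven, per-category mix control; the category names and the PreMixGain hook are illustrative, not from any specific engine.

```cpp
#include <string>
#include <unordered_map>

class PlayerMixControl {
public:
    // Exposed as sliders in an accessibility menu.
    void SetCategoryGain(const std::string& category, float gain) {
        gains_[category] = gain;
    }

    // The audio engine asks for this per-voice gain before the final mix,
    // so the designed mix stays intact for players who don't opt in.
    float PreMixGain(const std::string& category) const {
        auto it = gains_.find(category);
        return it == gains_.end() ? 1.0f : it->second;
    }

private:
    std::unordered_map<std::string, float> gains_;
};

// Usage: mix.SetCategoryGain("music", 0.2f);      // duck music
//        mix.SetCategoryGain("footsteps", 1.5f);  // boost gameplay cues
```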
In addition to that, we do have technologies like speech to text, and, as I mentioned, we are looking at creating new tools to bring these capabilities to games. Finally, here's a set of links for references and resources that I highly recommend looking at. There's information here on spatial sound, on Project Acoustics, and on Microsoft Soundscape, and I would also recommend checking out some of the sessions in the audio track that go into much deeper detail on all of the technologies and tools we've spoken about today. So, thank you for joining me in this session today.
It's super important to me, and I think to everybody, that we think about bringing better audio accessibility to games, making games playable for everyone. Thank you very much.