[EAA 2021] Pandemic Listening Challenges and the Future of Listening Technology
This presentation contains binaural audio demonstrations that are best experienced with headphones. Hello. I'm Ryan from the University of Illinois and it's a pleasure to be here at the Educational Audiology Association Virtual Summer Conference. I'm a postdoctoral researcher in electrical engineering and my research focuses on signal processing for hearing aids and other listening technologies. I'm also a person with hearing loss. I've worn hearing aids since I was a teenager and I've always been frustrated by how poorly they work in noisy environments like restaurants and parties, so, as a graduate student, I studied ways of using microphone arrays to improve their performance in noise.
I worked with a team of students to design proof-of-concept listening devices with dozens of microphones that could separately process individual sound sources from a noisy mixture. I finished my program in December of 2019, just a few weeks before the first case of COVID-19 was reported. I wasn't sure what I wanted to do next, so I signed up for a technology translation program run by the National Science Foundation. It's designed to get scientists out of the laboratory and talking to real people, both consumers and people in industry. That seemed like a great opportunity to connect with the hearing loss community and learn what's going on in the industry. Over the past year I've talked with more than 100 people, from
engineers and audiologists to community activists to kids who are newly diagnosed with hearing loss. When I started the program I expected I would hear a lot about the new trend toward over-the-counter and direct-to-consumer business models, new connectivity features, and artificial intelligence. I thought the biggest complaints from users would be about noisy places like convention centers and restaurants. Then, three months later, everything changed.
There were no more conferences, no more crowded restaurants or busy train stations. Users weren't worried about noisy crowds, they were worried about face masks and captioning, and the tech industry started pouring resources into conferencing equipment. It was fascinating watching all those changes happening. Now the world is starting to get back to normal, but I think the pandemic will have a lasting impact on listening technology. In this talk I'll weave together personal experiences as a person with hearing loss, technical concepts from engineering, and insights from the people I've met over the last year.
The pandemic changed the way we all communicate. If you think about it, the communication challenges brought on by masks and lockdowns are a lot like the ones people with hearing loss deal with every day. Masks muffle high frequency sounds just like the most common type of hearing loss, and now everyone has to rely on technology to communicate, just like we always have. And
because the pandemic affected everyone, society put a lot of resources into new audio technology. For example, classrooms and conference rooms have a lot more microphones than they did before. Going forward, we can leverage that new technology to improve accessibility. 2020 was always going to be a disruptive year for listening technology thanks to regulatory changes, new wireless standards, and competition from consumer electronics companies. The pandemic will change things even more, but I'm optimistic that they'll be changes for the better. I want to start with the most visible - and audible - change caused by the pandemic: face masks. By late spring last year, health officials were recommending that people wear
face coverings over their noses and mouths whenever they were around other people. Masks turned out to be very effective against the virus, but they also make it harder to hear. Most masks are opaque, so they block visual cues like lip movements that help us understand speech.
They also muffle sound, especially the high frequencies that are especially challenging for people with hearing loss. There had been a few studies on the acoustic effects of medical and industrial face masks, but at that stage of the pandemic the vast majority of people were wearing cloth masks, many of them homemade. A friend of mine, who works in a school that serves children with hearing loss, asked me what kind of masks their staff should use when they start teaching again. I didn't know what to tell her, so I gathered as many masks as I could find and took them down to the lab. I included some of the transparent face masks that were then just starting to catch on. I did two types of experiment. First, to get objective, repeatable measurements, I put the
masks on a head-shaped loudspeaker that was built by a student from our college of art and design. The speaker can produce the exact same sound every time, so it provides a fair comparison between different masks, but it isn't a perfect analogue for a human head or a human vocal tract. For less consistent but more realistic measurements, I also wore the masks myself. The relative performance between masks was fairly consistent between the two experiments.
To rank the masks, I set up a microphone six feet away. I also tested wearable microphones to see if those would work well with face masks. Here's a plot of frequency responses for a few of the masks. Each curve is the difference between the speech spectrum with a mask and the spectrum without a mask. With the exception of the face shield, which is just a nightmare acoustically, all the masks had similar effects. They didn't affect sound much at all
below 1000 Hertz and they muffled high frequencies above around 2000 Hertz. The amount of attenuation depends on the material the mask is made of. Here is a ranking of most of the masks we've tested to date.
The blue surgical masks are my go-to mask since they have very little effect on sound. Tight-fitting N95 masks muffle a lot more sound, which I've definitely noticed in doctors' offices. If you want a reusable mask, pay attention to the weave of the fabric. Loosely woven fabric,
like cotton jersey used in t-shirts, is pretty breathable and lets a lot of sound through. The tighter weaves used in denim and bed sheets attenuate sound more strongly. On the right side of the figure you'll see four types of flannel with different weaves and number of layers. The number of layers does affect attenuation, but not as much as the weave. Four layers of light flannel was better than two layers of heavy flannel, for example. The three transparent masks we tested performed worst. Plastic might let light through, but it's very bad for sound waves.
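To make the measurement concrete: each attenuation curve is just the difference, in decibels, between the speech spectrum recorded with the mask and without it. Here's a minimal sketch in Python of how such a curve can be computed; the function name, the Welch segment length, and the 2 kHz low-pass stand-in for a mask in the example are my own choices, not the exact analysis pipeline from the study.

```python
import numpy as np
from scipy.signal import welch

def attenuation_spectrum(no_mask, with_mask, fs):
    """Mask attenuation in dB across frequency: the spectrum recorded
    with the mask minus the spectrum recorded without it."""
    f, p_ref = welch(no_mask, fs, nperseg=2048)
    _, p_mask = welch(with_mask, fs, nperseg=2048)
    return f, 10.0 * np.log10(p_mask / p_ref)
```

Values near 0 dB mean the mask is acoustically transparent at that frequency; increasingly negative values mean muffling, which for real masks shows up mostly above about 2000 Hz.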
There have been a lot more transparent masks developed since I did these tests, so some newer ones might work a bit better. I haven't tested many new commercial designs, but there is one set I want to mention. These mask prototypes were designed by a team of high school students here in Champaign, and they combine visually transparent plastic with acoustically transparent cloth. This striped design works pretty well acoustically,
and you can still see lip motion through it. It's a really impressive design. One of the unique features of our study was an experiment on directivity. Masks are too thin to absorb much acoustic energy; they deflect it. We wanted to know how masks affected sound to the side of the talker and above and below the mouth.
To measure directivity, we put the head-shaped loudspeaker on a turntable in our lab. We measured the sound at different angles in 15 degree increments. This polar plot shows the results. The black curve is the speaker with no mask. It's not quite the same as the directivity of a human talker, but it's reasonably close. You can see from the purple and green curves that the masks have the strongest attenuation in front of the talker and a weaker effect behind and to the sides. The shield, meanwhile, actually reflects sound backward. These results suggest that we might be able to make up for the attenuation in front of a mask by capturing sound somewhere else, like the cheek or forehead or chest.
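The numbers behind a polar plot like that are simple to compute: at each turntable angle, take the level of the recorded signal relative to the level at the front. A small sketch, with function and parameter names of my own invention:

```python
import numpy as np

def directivity_db(recordings, ref_angle=0):
    """Level at each turntable angle relative to the front of the talker.

    recordings: dict mapping angle in degrees to the signal recorded
    at that angle. Returns a dict of levels in dB re the front (0 deg)."""
    def rms(x):
        return np.sqrt(np.mean(np.square(x)))
    ref = rms(recordings[ref_angle])
    return {a: 20.0 * np.log10(rms(x) / ref) for a, x in recordings.items()}
```

In practice you'd compute this per frequency band rather than broadband, since the masks mainly affect the highs, but the relative-to-front idea is the same.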
Now, the loudspeaker doesn't have a chest, so for this next experiment we used a human talker. Microphones were placed on the lapel, which is commonly used for presentations and broadcasting, and on the cheek and forehead, which are sometimes used in theater. We also tried a mic just in front of the mouth, like a fitness instructor headset. Here is a plot for the plastic window mask. Each curve compares the spectrum at the same microphone with and without a mask. The headset mic has just as much attenuation as the distant mic, though of
course that mic gets more sound to begin with. The mic on the cheek was under the mask, so the reflected sound was amplified, but it was still distorted. The lapel and forehead mics weren't affected much at all. Since lapel mics are already so commonly used, they seem like a great choice. These plots compare the effects of a few masks on distant mics and on lapel mics. Each curve is the difference between the spectrum with a mask and without a mask, so the overall spectrum of a lapel mic is still different from that of a distant mic, but it isn't affected much by a mask. That means if you already have a chest-worn
microphone that works well without a mask, it should still work well with a mask. To summarize, surgical masks and loosely woven cloth masks work best for sound transmission, but they block visual cues. Masks with plastic windows are more visually transparent but they block a lot of sound. The choice of mask depends on whether the listener relies more on audible or visual cues. Personally, I understand people better with the surgical mask than the window masks, but I've heard from other people who prefer the window version. With all types of mask, amplification can help compensate for the effects of the mask, so if you have a microphone available you should use it.
Since this research came out last summer, I've had a lot of questions from people asking about options for microphones to wear with masks. In higher education we were fairly well prepared: A lot of smaller classes were moved to large auditoriums to promote social distancing, and most of those spaces have public address or voice lift systems built in. Corporate offices tend to have similar setups. In K-12 the options depend on what resources a school has available. I heard from some teachers who were ordering home karaoke systems for their classrooms because they didn't have anything better. I think the people who have the hardest time with masks are essential workers like grocery store employees who have to interact with a steady stream of customers all day. You have no control over what kind of mask customers wear or how loudly they talk, and you don't have any good options for amplification. If it were me, I might tape a microphone to the end of a
six-foot pole and stick it in people's faces, but I don't know if the manager would appreciate that. Anyway, that got me interested in assistive listening technologies like remote microphones. There's a huge variety of these devices out there, all with different strengths and weaknesses.
In the next part of the talk, I'll talk about the wireless assistive listening systems available today and how they might evolve over the next few years. When most people think of listening technology, they think of hearing aids and cochlear implants, which are designed to be worn all day and to help in a broad range of listening situations. Wireless assistive listening systems are more situational, and help users who want to hear a specific sound like a television, a phone call, a teacher, or a public announcement. Those systems have not gotten much attention from signal processing researchers, which is a shame because frankly there's a lot of room for innovation. Among the adults with hearing loss I talked to, the most popular listening tech accessory was the phone streamer. Here's mine. It connects my hearing aids to my smartphone or laptop over Bluetooth so I can listen to music or take calls with my hearing aids. I had an older version
of this in high school and I used to wear it under my school uniform so I could listen to music from my iPod in class. I felt so cool that I could do that and the normal-hearing kids couldn't! But to be honest I don't use it very often now because the sound quality is not very good and I have to charge and carry around an extra device. I usually reach for some nice consumer wireless earbuds instead.
The difference comes down to power consumption. It takes a lot of bandwidth and therefore a lot of power to transmit high-quality audio wirelessly over a long distance. That's why earbuds like these sound great but only last a few hours. Hearing devices are supposed to last a few days on a charge, so they can't use full-power Bluetooth. Instead, there's a dongle like this that talks to the phone over Bluetooth but uses a lower-quality, shorter-range proprietary protocol to talk to the hearing aids. These are low-volume accessories, so they tend to
be expensive and don't keep up with consumer products in terms of quality or features. Now, there are some newer hearing aids that can connect directly to certain models of smartphone, so users with that phone don't have to carry an extra dongle. The new Bluetooth Low Energy Audio standard, which is part of Bluetooth 5.2 and was developed in collaboration with the hearing aid
industry, should bring that direct connection feature to most new hearing aids, smartphones, and laptops over the next year or two. The standard includes a more efficient codec, so it can transmit sound with higher quality and lower bandwidth, which means it consumes less power and can be built directly into hearing devices. It will be the same standard used in the next generation of wireless earbuds, so I'm optimistic there will be good compatibility.
Another popular streaming accessory, especially since everyone's been stuck at home, is the TV streamer. It bypasses TV speakers to pipe sound directly to the listener's ears. These are useful in households where some people have hearing loss and some don't, since they mean no more fighting over the volume control. Every major hearing aid company makes a proprietary accessory that will stream to their hearing aids. Like the Bluetooth dongles, these accessories vary in quality. Personally, I use a dedicated set of analog wireless headphones. I love being able to walk around the house and do chores while I'm wearing them.
According to the industry, we can expect the new Bluetooth standard to be built into TV and stereo systems as well, so any number of hearing aids and consumer earbuds can connect to them directly. If it's implemented well, that should be really convenient. You could tune your earbuds or hearing aids to one of the TVs at the gym, for example. One downside of the TV streamer, especially the big headphones I use, is that I can't listen to a show and also have a conversation with someone else in the room. I'll say more on that later. This next gadget seems to be much less popular among adults but is widely used in K-12 classroom settings: A remote microphone. These are for one-on-one conversations with another
person in the same room. The microphone is worn by a teacher or conversation partner and it transmits sound to the hearing device. Remote microphones have a few advantages over hearing devices alone. Because the microphone is far away, there's less risk of feedback when the gain is turned up. In noisy environments like restaurants, they provide a much better signal-to-noise ratio because they're closer to the talker. They
also pick up much less reverberation, which can improve intelligibility in large, echoey rooms. Essentially, they bring the talker closer to the listener. They also work well with face masks, so they're a great tool to have on hand during the pandemic. Now, as I mentioned, remote microphones don't seem to be very popular among adults. One reason is that they're expensive proprietary accessories.
Because the existing Bluetooth audio standard has such high power consumption and high latency, every company uses their own custom wireless protocol, and some work better than others. I measured one commercial remote microphone accessory with more than 40 milliseconds of latency which, [echo effect] as you can tell from listening to me now, can be very annoying. [echo effect] That much delay also makes it hard to speak. The new Bluetooth standard is
also supposed to have lower latency, so I hope that it makes its way to remote microphones. If it does, we might see accessories that work with hearing devices from multiple companies. Third-party devices could have lower prices, higher quality, more up-to-date specs, and more innovative new features than first-party products. Remote microphones are also inconvenient to carry around and they require self-identifying as a person with a disability. It's one thing for a parent or audiologist to ask a K-12 teacher to use assistive technology in class, but for adults, not everyone is willing to ask a business partner or a date to put a gadget around their neck. That's even harder during the pandemic because you'd have to wipe it down before handing it to them.
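By the way, the 40-millisecond latency figure I mentioned is easy to measure yourself: play a known test signal through the system, record what comes out, and find the cross-correlation peak between the two. A sketch, assuming you have the transmitted and received signals as arrays at the same sample rate:

```python
import numpy as np

def estimate_latency_ms(reference, received, fs):
    """Estimate end-to-end latency by finding the cross-correlation
    peak between the transmitted and received signals."""
    corr = np.correlate(received, reference, mode="full")
    lag = np.argmax(corr) - (len(reference) - 1)
    return 1000.0 * lag / fs
```

A broadband signal like noise or speech gives a sharp correlation peak; a pure tone would not.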
The last category of assistive listening technology I want to talk about is broadcast systems, which are found in public venues like churches and theaters. A transmitter is connected to the venue's sound system, so the signal could be a single talker at a podium, a mix of microphones from a panel discussion or theater performance, or pre-recorded sound - whatever is being played over the loudspeaker system. The signal is received by either a dedicated headset or the listener's hearing device. In the United States, public venues are required to provide headsets for users to check out, and most use FM radio or infrared (IR) systems. There's a lot of variability between venues in how well these systems are implemented and how popular they are.
I talked to one local venue that has a lot of older customers and whose headsets get used all the time, and another one that has a set but hasn't checked out a headset in over a decade. Personally, I love using the headsets when I see musicals in Chicago because the assistive listening system gets a special sound mix that emphasizes the vocals over the instruments, so I at least have a chance of understanding the lyrics. But for a stand-up comedy show, I usually won't bother. Headset systems have the same problem: users have to self-identify and take extra steps. You have to find out where to get a headset, fill out a form, leave an ID, figure out how to use it, and remember to bring it back later when everyone's rushing out.
In Europe, it's more common to find induction loops, which are special wires built into a room that transmit sound directly to the telecoils in compatible hearing devices. The advantage of hearing loops is that users don't need to take any extra steps beyond pushing a button on the device they're already wearing. The downside is that induction loops are difficult and expensive to install, especially in older buildings. I know some accessibility advocates have very strong feelings about telecoil systems, but from a technological perspective, they're kind of a pain. The newest trend in assistive listening systems is Wi-Fi. Users download an app or
visit a webpage and stream sound over the network using their own headphones or hearing devices. If you're a venue owner and you buy a new FM or IR headset today to meet regulatory requirements, chances are it will come with Wi-Fi built in too. Wi-Fi systems combine some of the advantages of headsets and induction loops: They're cheap and easy to install, and listeners can use devices they already have. But of course the user has to be comfortable using a smartphone, and if the network doesn't work well, there could be a noticeable delay. A closely related technology is online broadcasting, which became popular among churches and other live event venues during the pandemic. People would gather together in a parking lot while staying in their own cars and stream the audio over their smartphones.
According to the Bluetooth Special Interest Group, the new Bluetooth Low Energy Audio standard is expected to replace induction loops and other broadcast systems over the next decade or two. I think that will depend on how well the standard is implemented, but if it works well it could make assistive listening systems much more accessible, and we might start to see them in venues that aren't legally required to have them. Since consumer devices will use the same protocol, Bluetooth transmitters might come standard in every new sound system, so you can tune into the playlist at the coffee shop or the announcements at the airport just as easily as the broadcast at the theater.
The systems I've mentioned can be categorized as one-to-one communication, like remote microphones, and one-to-many broadcasts, like induction loops and FM headsets. But what if I'm at a dinner or a business meeting talking with multiple people at once, or if I want to listen to the TV while also talking with my partner at home? What about class discussions where students want to hear each other as well as the teacher? I see many-to-one and many-to-many systems as the next big growth area in assistive listening technology. We're starting to see a few products that can do this, either using multiple individual microphones or tabletop microphone arrays. From a technology standpoint, it's not too difficult. For example, I've tried using this pair of wireless microphones with two conversation partners. The receiver is designed to connect to a video camera for shooting two-person interviews, but you can just as easily hook it up to a pair of headphones. These mics sound great,
in part because they're big and heavy and chew through batteries, but multiple-talker wireless systems, both official ones from hearing device companies and these do-it-yourself solutions, mean carrying around a lot of extra equipment, and it's harder to provide a good listening experience when there's more than one talker. For example, these mics can route one talker to the left ear and one talker to the right ear, or they can mix them both together in both ears, neither of which really sounds like I'm talking to two people in the same room. Let's talk about what the next generation of assistive listening technology might look like and what we can learn from the new audio tech developed during the pandemic. New device technologies, wireless standards, and signal processing algorithms should lead to a lot of innovation in assistive listening systems over the next few years. It's useful to think about
what kind of systems we'd build if we no longer were constrained by size, power, or bandwidth. Listening systems have two main design criteria, which I'll call immersion and enhancement. An immersive system sounds real and makes me forget I'm using any technology at all. It should preserve the room acoustics and spatial cues, so if you're standing on my left, it sounds like you're on my left, and if you're facing away from me and mumbling into a wall, it sounds like you're facing away from me and mumbling into a wall. The gaming industry is pouring resources into algorithms that can simulate room acoustics and head-related transfer functions to provide better immersion in virtual reality audio.
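The most important spatial cues are the interaural time and level differences between the two ears. Here's a deliberately crude toy spatializer to illustrate the idea; the head radius, the Woodworth-style delay formula, and the 6 dB level-difference cap are textbook approximations of mine, not what any real renderer uses (real systems apply full head-related transfer functions).

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, rough average adult head

def render_binaural(mono, fs, azimuth_deg):
    """Toy spatializer: delay and attenuate the far ear based on azimuth.
    Positive azimuth places the source to the listener's right."""
    az = np.radians(abs(azimuth_deg))
    # Woodworth-style interaural time difference approximation
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (az + np.sin(az))
    delay = int(round(itd * fs))
    # Crude broadband interaural level difference, up to about 6 dB
    far_gain = 10 ** (-(abs(azimuth_deg) / 90.0) * 6.0 / 20.0)
    near = np.asarray(mono, dtype=float)
    far = far_gain * np.concatenate([np.zeros(delay), near])[: len(near)]
    if azimuth_deg >= 0:
        return np.stack([far, near])   # rows: left, right
    return np.stack([near, far])
```

Even this crude delay-and-attenuate model produces a convincing sense of left-right position over headphones, which is why preserving those cues matters so much for immersion.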
Hearing devices get immersion for free because they already have microphones at the ears: the sound those microphones pick up carries the acoustic effects of the room and at least some of the acoustic effects of the head. Enhancement is about making sound easier to understand, for example by reducing noise and reverberation. Hearing aids have a hard time doing that because their microphones pick up noisy, reverberant sound. Remote microphones can do better because they're closer to the talker. Remote meetings, like the one we're having now, are another great example of enhancement.
You're hearing my voice through a high-quality microphone in a quiet room, carefully processed in software and then delivered to your headphones. When everyone has good hardware and a good connection, I find it much easier to hear in remote meetings than in-person meetings, but there's no sense of immersion: You're hearing me from inside your head. Sometimes there's a trade-off between immersion and enhancement. For example, if you're facing away from me and mumbling into the wall, I probably don't want realism; I want to enhance your voice so I can understand it. But often we can have both. For example, most remote microphones transmit the same sound to both ears, so listeners have no sense of where it's coming from. But if we have a high-quality, low-latency
wireless connection and decent processing power, then we can match the acoustic cues of the low-noise remote signal to the cues at the ears. Here's a demonstration using a hard-wired lapel microphone in our laboratory. I read a passage while walking around some speakers playing speech recordings. The sound in the video is matched to the ears of a dummy head, so if you're listening through headphones, you should be able to track my direction as I move back and forth. First you'll hear the noisy mixture through the ears, then the unprocessed remote microphone signal, and finally the enhanced signal. Listen for the spatial cues and the changes in spectral coloration. [Overlapping speech] ... with its path high above and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at one end. People look but no one
ever finds it. When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow. Throughout the centuries men have explained the rainbow in various ways. Some have accepted it as a miracle without physical explanation. The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.
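The cue-matching in that demo can be sketched as a filter-fitting problem: estimate a short filter that maps the clean remote-mic signal onto what the ear microphone hears, then apply that filter to the clean signal so it carries the ear's spatial cues without the noise. This is a simplification of the actual time-varying processing in the demo, and the function names and filter order are mine:

```python
import numpy as np
from scipy.linalg import toeplitz

def fit_relative_filter(remote, ear, order=64):
    """Least-squares FIR filter that maps the clean remote-mic signal
    onto the ear-mic signal, capturing head and room cues."""
    row = np.zeros(order)
    row[0] = remote[0]
    X = toeplitz(remote, row)               # convolution matrix
    h, *_ = np.linalg.lstsq(X, ear, rcond=None)
    return h

def apply_cues(remote, h):
    """Render the clean remote signal with the estimated ear cues."""
    return np.convolve(remote, h)[: len(remote)]
```

Because the fit is least-squares, uncorrelated noise at the ear mic averages out of the filter estimate, so the rendered output keeps the spatial cues but not the noise. For a moving talker, the filter has to be re-estimated continuously, which is the hard part.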
That demo only used one remote microphone, but it would work just as well with several. If we had a microphone on each talker, we could do more sophisticated types of processing than we can with hearing devices today. We could apply different gain, equalization, and compression to each talker, just like mixing artists do in recording studios. Of course, when we start interacting in person again, there's a third criterion to think about: convenience. I personally wouldn't mind carrying around a case full of microphones and clipping
them on everyone I meet, but I think I'm an exception. People are already reluctant to do that with a single remote microphone. Ideally, we wouldn't need to carry any extra devices at all. We could just walk into a room, our devices would enhance the things we want or need to hear, and the whole experience would be so immersive we would hardly notice.
It sounds like a fantasy, but a lot of the technology is already there, thanks in no small part to the pandemic. You see, now that workplaces are starting to reopen, the tech industry has decided that the future of work is hybrid. There will be some employees in the office and some working remotely, and they'll want to have meetings with each other. Now I don't know whether they're right about that, but it means the industry is investing heavily in audio hardware for hybrid meetings. Remote participants need to be able to hear everyone clearly and captioning software needs to know who said what, but in-person participants won't all be wearing their own microphones. The solution is to install microphone arrays, which can capture high quality sound at a distance, even if multiple people are talking at once, and they can track people as they move around the room. Microphone arrays were already common in smart speakers and game systems,
and now they're showing up in cell phones, laptops, and conferencing equipment to improve the quality of video calls. They've even started installing them in our local school district to help with distance learning. These systems are designed to capture sound for remote participants, but there's no technological reason why they couldn't also connect to the hearing devices worn by in-person participants. That way, people with hearing loss could benefit from this expensive new infrastructure to hear better in classrooms and meeting rooms. Especially with the new Bluetooth standard, it
would be an easy feature to add. Tech companies, if you're listening, please make it happen! These new hybrid conferencing devices are powered by microphone arrays. As it happens, microphone arrays were the topic of my dissertation, so get ready to learn all about them.
As the name suggests, a microphone array is a set of microphones that are spread apart from each other in space. Unlike single microphones, arrays can process sound spatially. A propagating sound wave reaches each microphone at a slightly different time, depending on the direction it comes from. We can process and then combine the signals captured by the different microphones so that sounds from one direction interfere constructively and get louder, while sounds from other directions interfere destructively and get quieter. That geometric version of array processing is called beamforming, because we can think of capturing a beam of sound from a certain direction. To do beamforming,
we need to know how far apart the microphones are and where they are relative to the sound sources. There's another way to think of array processing which is often called source separation. In this interpretation, there's a system of equations that describes how sound propagates from each source to each microphone. If we have more microphones than sources, then we have more equations than unknowns, and we can solve for the original sources. If we're in a reverberant room where the sound bounces around, those equations get very complicated, so we need to calibrate the array somehow, and that's been a major problem for signal processing engineers for decades. In general, the performance of a microphone array depends on its size, both the number of microphones and the area they cover. Just like lenses,
larger arrays can create narrower beams, and when we have plenty of microphones spread far apart we can make the array more robust against noise and reverberation. When designing microphone array processing for listening devices, we have to be especially careful if we want them to be immersive. A basic beamformer like the kind used in a smart speaker would distort the spatial cues of everything that isn't in the direction of the beam, so it sounds like being in a tunnel. We can design beamformers that don't have the tunnel effect, but they also don't reduce noise as much. Just like with assistive listening systems, sometimes there's
a trade-off between enhancement and immersion. If we want both, we need to add more microphones. Most high-end hearing aids have two or three microphones per ear, and they're right next to each other, so they can only do a little bit of directional processing without causing perceptible distortion. It can help somewhat if most of the noise is behind the listener, but it's really no use at all in a very crowded environment. In our lab, we've been designing prototypes of larger wearable microphone arrays that have dozens of mics spread across the body. Our
most iconic prototype is the "Sombrearo" which has microphones spread around the brim of a large hat. Our engineering students have built a few functional prototypes over the years. Microphones around the torso are especially helpful because the torso is acoustically dense, and the microphones can be hidden under most types of clothing. We also brought in some design students to imagine more aesthetically pleasing wearable arrays.
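Before moving on, the delay-and-sum idea I described a moment ago can be made concrete with a short sketch. For a linear array and a far-field source, the arrival delay at each microphone is the projection of its position onto the arrival direction divided by the speed of sound; undo those delays and sum, and the steered direction adds up coherently. The geometry, sample-rate rounding, and function names here are illustrative assumptions, not our lab's actual implementation:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, steer_deg, fs):
    """Steer a linear array toward steer_deg: delay each channel so
    sound arriving from that direction adds constructively.

    signals: (num_mics, num_samples) array of simultaneous recordings
    mic_positions: mic coordinates in meters along the array axis
    """
    # Far-field plane wave: arrival delay at each mic is the projection
    # of its position onto the arrival direction, divided by c.
    delays = mic_positions * np.sin(np.radians(steer_deg)) / SPEED_OF_SOUND
    delays -= delays.min()                    # keep shifts non-negative
    shifts = np.round(delays * fs).astype(int)
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, s in zip(signals, shifts):
        out[: n - s] += sig[s:]               # advance each channel, sum
    return out / len(signals)
```

Real systems refine this with fractional delays and per-frequency weights, and, as I said, the hard part in a reverberant room is calibration: knowing what delays and filters to apply in the first place.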
With these larger arrays, we can design listening systems that separate, process, and recombine sounds from multiple sources, doing real-time remixing while preserving spatial cues and room acoustics. But they still have the issue of calibration. We need to learn where the sources are in the room and the acoustic paths that sound takes from each source to each microphone. Even with large wearable arrays, that's a daunting problem. Remote microphones can help. Even if they have poor bandwidth and large delay,
remote microphones still have a good signal-to-noise ratio and low reverberation, so we can use them as pilot signals to calibrate a beamformer. Here is the remote microphone demo from before, but this time, instead of listening to the processed remote microphone signal, you'll hear a binaural beamformer from a 14-microphone wearable array that tracks the moving talker. [Overlapping speech] ... with its path high above and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at one end. People look, but no one ever finds it. When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow. Throughout the centuries, men have explained the rainbow in various ways. Some have accepted it as a miracle without physical explanation.
The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain. Now, what if we went even bigger and filled an entire room with microphones? Well, we did that. This is the largest conference room in our building, with glass walls and lots of reverberation. We set up a simulated party with 10 loudspeaker "talkers" all talking at once,
four mannequin "listeners" wearing microphone arrays, and a dozen tabletop array devices designed to look like smart speakers. There were a total of 160 microphones in the room. Now, no portable listening device is going to be able to process 160 channels in real time, so we designed a hybrid system. The distributed array was used to locate the sources, learn the room acoustics, and calibrate the wearable arrays, and then the wearable devices were used for real-time processing. In this demo, you'll be listening through the ears of this mannequin in the corner as she tries to listen to the talker next to her. First
you'll hear the talker alone, then the noisy mixture, and finally the processed signal. This is another binaural beamformer, so listen for the spatial cues. ...seven years as a journalist. We must provide a long-term solution to tackle this attitude. Then, suddenly, they weren't. [Overlapping speech] ...seven years as a journalist. We must provide a long-term solution to tackle this attitude. Then, suddenly, they weren't. So at this point you might be thinking, "This guy is crazy! No one's going to cover their body in microphones, much less an entire room."
Well, that's true. This is mostly an academic exercise to show what could be done with a really extreme system. But if you think about it, we're already surrounding ourselves with microphones. On my body, there are microphones in my hearing aids, my watch, and my phone, and it's only a matter of time before augmented reality glasses finally catch on.
Looking around my living room, I count at least 30 microphones between smart speakers, game systems, computers, and other gadgets. And thanks to remote and hybrid meetings, large network-connected microphone arrays are being installed in classrooms and offices all around the world. What if listening technology could tap into all these microphones that are already all around us? What if when I walked into a meeting my hearing aids picked up sound from everyone else's hearing aids, phones, computers, and from arrays installed in the walls and ceiling? With the right signal processing, my hearing aids could pick and choose what I should hear and process it to have the right spatial cues and room acoustics. I wouldn't have to think about it. And it wouldn't just work in offices. Imagine sitting down to dinner at a restaurant and hearing
only the people at your own table while turning down everyone else. Imagine sitting in a classroom and hearing the shy quiet kid in the back just as clearly as the loud kid right next to you. Microphone array processing can make it possible. Now, this utopian future for listening technology is still a ways off.
The new Bluetooth standard can do a lot, but it can't do this. Gathering sound from every device in a room will require new standards and protocols, and buy-in from the whole tech industry. There are also obvious privacy concerns that would need to be addressed before it could be used in public spaces, and there are many signal processing challenges in making sense of data from so many different kinds of device. Our group is working to address some of those challenges, which I'll tell you about in the final part of this talk.
But first, let's take a short break from microphone arrays to hear about an unexpected connection between hearing aids and COVID-19. I've talked a lot about how COVID-19 has influenced hearing technology, but did you know that a hearing aid signal processing technique can help with COVID-19? Early in the pandemic, there were widespread fears of a ventilator shortage, so a team of engineers here at the University of Illinois came together to design a low-cost emergency ventilator that could be rapidly produced. Our research group helped design the alarm system that alerts clinicians if a patient stops breathing normally. We wanted to find an algorithm that could run on any processor, even the smallest microcontrollers. The level tracking algorithm used for dynamic range compression in hearing aids was perfect. It can follow peaks in a waveform with adjustable time constants, but it requires almost no memory and just a few computations per sample. We adapted the algorithm to track the depth and
duration of breath cycles. It sounds an alarm if breaths are too fast, too slow, or too shallow. The ventilator design has been licensed by more than 60 organizations and the alarm system is available as an open-source hardware and software design. Now, back to our regularly scheduled microphone array programming. In most of the microphone arrays in use today, all the mics are arranged within a single device. They're connected to each other by wires so they can share all their data instantaneously, they all have a common sample clock, and they have known positions relative to each other. Remember, microphone array processing relies on precise timing differences between microphones, so it typically requires perfect synchronization and known geometry. But if we want to do really powerful spatial processing, if we want to handle noisy crowded spaces with dozens of sound sources, we need our array to span the whole room. Typically, that means we need to combine
microphones from multiple devices. These are called ad hoc arrays or distributed arrays, and they're more challenging than conventional single-device arrays. We don't always know where the microphones are relative to each other, so we can't easily translate timing differences to spatial directions. Worse, the devices might not be synchronized, so we can't tell exactly what those timing differences are, and the sample clocks of different devices might drift over time. If the devices are wireless, like most wearables,
they probably have limited bandwidth, relatively high latency, and intermittent dropouts. With ad hoc arrays, therefore, we might not be able to use all the microphones together for beamforming. Fortunately, most devices today have more than one microphone. That means our ad hoc array isn't
composed of individual microphones, but of smaller conventional microphone arrays. The sensors within each device are synchronized with each other and can be used for real-time spatial processing, and the devices can share other information with each other to improve performance, even if they can't perform large-scale beamforming in real time. For example, the devices can work together to decide which sound sources are where and what their frequency spectra are like, or to track who's talking when. Then each device can adjust its local processing using those parameters. We call this cooperative processing. The large conference room demo was a good example: The smart speakers are in fixed locations, so they can locate the talkers and relay that information to the wearable devices, which then do real-time listening enhancement.
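Here is a toy Python sketch of that cooperative idea, under strong simplifying assumptions: free-field propagation, integer-sample delays, and a clean reference signal standing in for the fixed device near the talker. The function names `estimate_delay` and `delay_and_sum` are mine, not from any toolbox. One device's clean signal is used to estimate each wearable microphone's delay, and the wearable then does the real-time alignment and averaging itself.

```python
import numpy as np

def estimate_delay(ref, mic):
    """Estimate the integer-sample delay of `mic` relative to `ref`
    by finding the peak of their cross-correlation."""
    corr = np.correlate(mic, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)

def delay_and_sum(channels, delays):
    """Undo each channel's delay, then average. Sound arriving with
    exactly these delays adds coherently; uncorrelated noise does not."""
    out = np.zeros(channels.shape[1])
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -d)
    return out / len(channels)

rng = np.random.default_rng(1)

# Simulate a target signal reaching 3 wearable mics with different
# delays (circular shifts for simplicity), plus independent noise.
target = rng.standard_normal(4000)
true_delays = [0, 3, 7]  # samples
mics = np.stack([np.roll(target, d) for d in true_delays])
mics += 0.5 * rng.standard_normal(mics.shape)

# "Cooperative" step: a cleaner reference (here, the target itself,
# standing in for a fixed smart speaker near the talker) calibrates
# the delays; the wearable never had to know its own geometry.
delays = [estimate_delay(target, m) for m in mics]

enhanced = delay_and_sum(mics, delays)
```

The averaged output is noticeably cleaner than any single microphone, and the expensive correlation step can run on the fixed device while the cheap align-and-average step runs on the wearable in real time.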
The beamformer that tracked the moving talker is another example: The remote microphone signal was delayed by about 100 milliseconds, so it couldn't be used for real-time listening, but it could be used for tracking. Some of the most useful microphones in an ad hoc array are in wearable devices. Wearables provide low-noise reference signals for their wearers' speech, and they move along with their wearers, but it's hard to combine wearables into an ad hoc array because they move constantly. When we tested our wearable arrays on mannequins, they worked great, but when we tried them on live humans, the high-frequency performance plummeted.
That's because humans are always moving even when they're trying to stand perfectly still. At higher audible frequencies, a microphone on my chest will move by multiple wavelengths relative to a microphone on my ears every time I take a breath. However, if we explicitly account for that motion when we design a beamformer, we can at least partially compensate for it.
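A quick back-of-the-envelope check shows why. Acoustic wavelength is just the speed of sound divided by frequency, and at the top of the speech range it shrinks to a few centimeters — comparable to how far a chest-worn microphone moves during a single breath.

```python
# Acoustic wavelength: lambda = c / f, with c ~ 343 m/s in room-temperature air.
def wavelength_cm(f_hz, c=343.0):
    return 100.0 * c / f_hz

for f in (500, 2000, 8000, 16000):
    print(f"{f:>6} Hz : {wavelength_cm(f):5.1f} cm")
```

At 500 Hz the wavelength is about 69 cm, so a centimeter of motion barely matters; at 8 kHz it is about 4.3 cm, so breathing alone is enough to scramble the precise phase differences a beamformer depends on.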
This next demo is a beamformer using a wearable microphone array on a moving subject. Now, for the algorithm to work the motion has to be fairly predictable and repetitive, so enjoy the "Mic-Array-Na". [Overlapping speech] We are not aware of any British casualties at this stage. There isn't the seriousness of other businesses. It is about control of our economy. There was a rush of water. The rainbow is a division of white light into many beautiful colors. These take the shape of a long, round arch with its path high above and its two ends apparently beyond the horizon.
Please call Stella. Ask her to bring these things with her from the store: six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a sack for her brother. One of the advantages of distributed listening systems and microphone arrays over conventional hearing devices is that they can apply different processing to different sound sources. Perhaps I want to amplify one person more than another, or apply different spectral shaping to speech versus music. That independent processing could be especially important for nonlinear
processing like dynamic range compression. Compression is used to keep sound at a comfortable level by amplifying quiet sounds and attenuating loud sounds, but it's known to cause unwanted distortion when there are multiple overlapping sounds. For example, if I'm talking quietly and there is a sudden loud noise, then the compressor will turn down the gain applied to all sounds, and my voice will also get quieter. If we compress different sounds independently before mixing them together, that distortion doesn't happen.
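Here is a simplified Python sketch of both ideas together: a one-pole peak tracker with separate attack and release time constants (the same kind of level tracker described in the ventilator story), driving a basic compressor applied to a mixture versus to each source independently. The function names and parameter values are illustrative, and real hearing aid compressors operate in multiple frequency bands.

```python
import numpy as np

def track_level(x, fs, attack_ms=5.0, release_ms=50.0):
    """One-pole peak tracker: fast attack, slow release.
    Needs only one state variable and a few operations per sample."""
    a = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    r = np.exp(-1.0 / (fs * release_ms / 1000.0))
    level = np.zeros_like(x)
    state = 0.0
    for n, v in enumerate(np.abs(x)):
        coeff = a if v > state else r  # attack when rising, release when falling
        state = coeff * state + (1.0 - coeff) * v
        level[n] = state
    return level

def compress(x, fs, threshold=0.1, ratio=4.0):
    """Simple compressor: above `threshold`, the gain shrinks so the
    output level grows only 1/ratio as fast as the input level."""
    level = track_level(x, fs)
    gain = np.ones_like(x)
    loud = level > threshold
    gain[loud] = (threshold / level[loud]) ** (1.0 - 1.0 / ratio)
    return gain * x

fs = 16000
t = np.arange(fs) / fs
quiet_talker = 0.05 * np.sin(2 * np.pi * 220 * t)          # always below threshold
loud_noise = np.where(t > 0.5, 0.8, 0.0) * np.sin(2 * np.pi * 90 * t)

# Compressing the mixture: when the loud noise starts, the shared gain
# drops, so the quiet talker gets turned down too ("pumping").
mixed_then_compressed = compress(quiet_talker + loud_noise, fs)

# Compressing each source independently before mixing avoids that:
# the talker's gain never changes, no matter what the noise does.
compressed_then_mixed = compress(quiet_talker, fs) + compress(loud_noise, fs)
```

On its own, the quiet talker passes through untouched; in the mixture, the onset of the loud noise drags its level down along with everything else.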
It remains an open question, however, whether that kind of processing would make it easier to hear. In fact, many of the open research questions in this area are better answered by hearing scientists rather than engineers. If we could perfectly separate and recombine every sound in the listener's environment, how should we do it? How many sounds can a person pay attention to at once? Does that depend on age, on hearing ability, on noise level? How do we know what sound the listener wants to hear at any given time? A lot of engineers tend to assume that we want to hear what's in front of us, but sounds from behind are also important since they alert us to things we can't see. When we have to decide between immersion and enhancement, how do we make that trade-off? As signal processing advances unlock new types of listening technology, we'll need to work with hearing scientists and with users themselves to understand how to use it.
I got into this field because I want to make hearing aids work better, but these new listening technologies could help everyone hear better, whether or not they have hearing loss. Large microphone arrays and distributed sensor networks could be used for augmented reality, media production, surveillance, and much more. They could let us hear things we otherwise couldn't, giving us superhuman perception. I often like to explain listening technology by analogy to vision.
A conventional hearing aid is like a contact lens. It's intended to restore normal sensory ability, it's designed to be invisible and worn all day, and it's no larger than the sense organ that it sits on. That means it has access to the same information our senses do. Just as a contact lens will never let us see a microbe or a distant planet, a hearing aid will never let us have a quiet conversation in a crowded convention center or hear a mouse from across a busy street. Instead of a contact lens, I want to build the hearing equivalent of a telescope or a microscope: a large, situational device that we can't wear all day, but that lets us sense things we normally couldn't.
The last year has been challenging for everyone, but especially for the hearing loss community. We suddenly had to learn how to manage without seeing people's lips and how to hold video meetings with spotty support for captions. We had to deal with shortages of not just toilet paper but also karaoke systems. There were some bright spots, of course. The world got a lot quieter, at least for a little while, and we got to watch normal-hearing people struggle with the sudden loss of high-frequency speech sounds and learn how to speak up.
Perhaps most importantly, the sudden shift to remote work led to a new focus on audio capture and processing technology. Suddenly everyone was talking to each other through microphones and speakers just like we always have. Thanks to the pandemic, there are a lot more microphones in the world than there were a year ago, and most of them are networked. That progress coincides with technological shifts like the new Bluetooth Low Energy standard and market trends like direct-to-consumer hearing devices that were already poised to shake up the landscape for listening technology. If the tech industry makes hearing accessibility a priority, then as we start to gather in person again, we can leverage all those new microphones and new wireless technologies to power the next generation of listening devices, both for people with hearing loss and for everyone else who wants to hear things they couldn't hear before. That way, the technology we've developed to keep us apart can help bring us back together.
I'd like to thank my team at the University of Illinois as well as the agencies and companies that have supported our research. To learn more about our work, please visit the Illinois Augmented Listening Laboratory website. If you'd like to get in touch, you can find me on LinkedIn, YouTube, and Twitter @ryanmcorey. I'm always eager to talk with audiologists and listening technology users, so I would love to hear from you.