EAA 2021 Pandemic Listening Challenges and the Future of Listening Technology


This presentation contains binaural audio demonstrations that are best experienced with headphones.

Hello. I'm Ryan from the University of Illinois and it's a pleasure to be here at the Educational Audiology Association Virtual Summer Conference. I'm a postdoctoral researcher in electrical engineering and my research focuses on signal processing for hearing aids and other listening technologies. I'm also a person with hearing loss. I've worn hearing aids since I was a teenager and I've always been frustrated by how poorly they work in noisy environments like restaurants and parties, so, as a graduate student, I studied ways of using microphone arrays to improve their performance in noise.

I worked with a team of students to design  proof-of-concept listening devices with dozens   of microphones that could separately process  individual sound sources from a noisy mixture.  I finished my program in December of 2019, just  a few weeks before the first case of COVID-19   was reported. I wasn't sure what I wanted  to do next, so I signed up for a technology   translation program run by the National Science  Foundation. It's designed to get scientists out   of the laboratory and talking to real people,  both consumers and people in industry. That   seemed like a great opportunity to connect  with the hearing loss community and learn   what's going on in the industry. Over the past  year I've talked with more than 100 people, from  

engineers and audiologists to community activists  to kids who are newly diagnosed with hearing loss.  When I started the program I expected I would hear  a lot about the new trend toward over-the-counter   and direct-to-consumer business models, new  connectivity features, and artificial intelligence.   I thought the biggest complaints from  users would be about noisy places like   convention centers and restaurants. Then,  three months later, everything changed.  

There were no more conferences, no more  crowded restaurants or busy train stations.   Users weren't worried about noisy crowds, they  were worried about face masks and captioning,   and the tech industry started pouring  resources into conferencing equipment.   It was fascinating watching all those changes  happening. Now the world is starting to get   back to normal, but I think the pandemic will  have a lasting impact on listening technology.  In this talk I'll weave together personal  experiences as a person with hearing loss,   technical concepts from engineering, and insights  from the people I've met over the last year.

The pandemic changed the way we all communicate.  If you think about it, the communication   challenges brought on by masks and lockdowns are  a lot like the ones people with hearing loss deal   with every day. Masks muffle high frequency sounds  just like the most common type of hearing loss,   and now everyone has to rely on technology  to communicate, just like we always have. And  

because the pandemic affected everyone, society  put a lot of resources into new audio technology.  For example, classrooms and conference rooms  have a lot more microphones than they did before.   Going forward, we can leverage that new  technology to improve accessibility.   2020 was always going to be a disruptive year for  listening technology thanks to regulatory changes,   new wireless standards, and competition  from consumer electronics companies.   The pandemic will change things even more, but I'm  optimistic that they'll be changes for the better. I want to start with the most visible - and  audible - change caused by the pandemic:   face masks. By late spring last year, health  officials were recommending that people wear  

face coverings over their noses and mouths  whenever they were around other people.   Masks turned out to be very effective against  the virus, but they also make it harder to hear.   Most masks are opaque, so they block visual cues  like lip movements that help us understand speech.  

They also muffle sound, especially the  high frequencies that are especially   challenging for people with hearing loss. There had been a few studies on the acoustic   effects of medical and industrial face  masks, but at that stage of the pandemic   the vast majority of people were wearing  cloth masks, many of them homemade.   A friend of mine, who works in a school  that serves children with hearing loss,   asked me what kind of masks their staff  should use when they start teaching again.   I didn't know what to tell her, so I gathered as  many masks as I could find and took them down to   the lab. I included some of the transparent face  masks that were then just starting to catch on. I did two types of experiment. First, to get  objective, repeatable measurements, I put the  

masks on a head-shaped loudspeaker that was built  by a student from our college of art and design.   The speaker can produce the exact same sound  every time, so it provides a fair comparison   between different masks, but it isn't a perfect  analogue for a human head or a human vocal tract.   For less consistent but more realistic  measurements, I also wore the masks myself.   The relative performance between masks was  fairly consistent between the two experiments. 

To rank the masks, I set up  a microphone six feet away.   I also tested wearable microphones to see  if those would work well with face masks.   Here's a plot of frequency responses  for a few of the masks. Each curve is   the difference between the speech spectrum  with a mask and the spectrum without a mask.  With the exception of the face shield,  which is just a nightmare acoustically,   all the masks had similar effects.  They didn't affect sound much at all  

below 1000 Hertz and they muffled high  frequencies above around 2000 Hertz.   The amount of attenuation depends  on the material the mask is made of. Here is a ranking of most of  the masks we've tested to date.  

The blue surgical masks are my go-to mask  since they have very little effect on sound.   Tight-fitting N95 masks muffle a lot more sound,  which I've definitely noticed in doctors' offices.  If you want a reusable mask, pay attention to  the weave of the fabric. Loosely woven fabric,  

like cotton jersey used in t-shirts, is pretty breathable and lets a lot of sound through. The tighter weaves used in denim and bed sheets attenuate sound more strongly. On the right side of the figure you'll see four types of flannel with different weaves and numbers of layers. The number of layers does affect attenuation, but not as much as the weave. Four layers of light flannel performed better than two layers of heavy flannel, for example. The three transparent masks we tested performed worst. Plastic might let light through, but it's very bad for sound waves.

There have been a lot more transparent masks developed since I did these tests, so some newer ones might work a bit better.  I haven't tested many new commercial designs, but there is one set I want to mention. These mask prototypes were designed by a team of high school students here in Champaign,  and they combine visually transparent plastic   with acoustically transparent cloth. This  striped design works pretty well acoustically,  

and you can still see lip motion through  it. It's a really impressive design. One of the unique features of our  study was an experiment on directivity.   Masks are too thin to absorb much  acoustic energy; they deflect it.   We wanted to know how masks affected sound to the  side of the talker and above and below the mouth. 

To measure directivity, we put the head-shaped  loudspeaker on a turntable in our lab.   We measured the sound at different  angles in 15 degree increments. This polar plot shows the results. The black  curve is the speaker with no mask. It's not quite   the same as the directivity of a human talker, but  it's reasonably close. You can see from the purple   and green curves that the masks have the strongest  attenuation in front of the talker and a weaker   effect behind and to the sides. The shield,  meanwhile, actually reflects sound backward.  These results suggest that we might be able to  make up for the attenuation in front of a mask   by capturing sound somewhere else,  like the cheek or forehead or chest. 

Now, the loudspeaker doesn't have a chest, so for this next experiment we used a human talker. Microphones were placed on the lapel, which is commonly used for presentations and broadcasting, and on the cheek and forehead, which are sometimes used in theater. We also tried a mic just in front of the mouth, like a fitness instructor headset. Here is a plot for the plastic window mask. Each curve compares the spectrum at the same microphone with and without a mask. The headset mic has just as much attenuation as the distant mic, though of

course that mic gets more sound to begin with.  The mic on the cheek was under the mask, so the   reflected sound was amplified, but it was still  distorted. The lapel and forehead mics weren't   affected much at all. Since lapel mics are already  so commonly used, they seem like a great choice. These plots compare the effects of a few  masks on distant mics and on lapel mics.   Each curve is the difference between the  spectrum with a mask and without a mask,   so the overall spectrum of a lapel mic is  still different from that of a distant mic,   but it isn't affected much by a mask. That  means if you already have a chest-worn  

microphone that works well without a mask,  it should still work well with a mask. To summarize, surgical masks and loosely woven  cloth masks work best for sound transmission, but   they block visual cues. Masks with plastic windows  are more visually transparent but they block a lot   of sound. The choice of mask depends on whether  the listener relies more on audible or visual   cues. Personally, I understand people better with  the surgical mask than the window masks, but I've   heard from other people who prefer the window  version. With all types of mask, amplification can   help compensate for the effects of the mask, so if  you have a microphone available you should use it.
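
For anyone who wants to run this kind of analysis themselves, here is a minimal sketch of how an attenuation curve like the ones in these plots could be computed. It assumes you have two recordings of the same speech material from the same microphone, one with the mask and one without; the file names, band edges, and the use of SciPy's Welch estimator are illustrative choices on my part, not necessarily how our lab's analysis was done.

```python
# Minimal sketch: estimate a mask's attenuation spectrum as the dB difference
# between the long-term speech spectrum recorded with and without the mask.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def speech_spectrum(path, nperseg=4096):
    """Long-term power spectral density of a (hypothetical) speech recording."""
    fs, x = wavfile.read(path)
    x = x.astype(np.float64)
    if x.ndim > 1:                 # keep one channel if the file is stereo
        x = x[:, 0]
    freqs, psd = welch(x, fs=fs, nperseg=nperseg)
    return freqs, psd

freqs, psd_open = speech_spectrum("no_mask.wav")          # hypothetical file names
_, psd_mask = speech_spectrum("surgical_mask.wav")

# Negative values mean the mask reduces the level relative to the open recording.
attenuation_db = 10 * np.log10((psd_mask + 1e-20) / (psd_open + 1e-20))

for lo, hi in [(125, 1000), (1000, 2000), (2000, 8000)]:
    band = (freqs >= lo) & (freqs < hi)
    print(f"{lo}-{hi} Hz: {attenuation_db[band].mean():+.1f} dB")
```

Averaging over several repetitions of the same sentences helps, which is why the repeatable head-shaped loudspeaker makes the comparison between masks fairer than a human talker can.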

Since this research came out last summer, I've  had a lot of questions from people asking about   options for microphones to wear with masks. In  higher education we were fairly well prepared:   A lot of smaller classes were moved to large  auditoriums to promote social distancing, and   most of those spaces have public address or voice  lift systems built in. Corporate offices tend to   have similar setups. In K-12 the options depend  on what resources a school has available. I heard   from some teachers that were ordering home karaoke  systems for their classrooms because they didn't   have anything better. I think the people who have  the hardest time with masks are essential workers   like grocery store employees who have to interact  with a steady stream of customers all day. You   have no control over what kind of mask customers  wear or how loudly they talk, and you don't have   any good options for amplification. If it were  me, I might tape a microphone to the end of a  

six-foot pole and stick it in people's faces, but  I don't know if the manager would appreciate that. Anyway, that got me interested in assistive  listening technologies like remote microphones.   There's a huge variety of these devices out there,  all with different strengths and weaknesses.  

In the next part of the talk, I'll talk  about the wireless assistive listening   systems available today and how they  might evolve over the next few years. When most people think of listening technology,  they think of hearing aids and cochlear implants,   which are designed to be worn all day and to  help in a broad range of listening situations.   Wireless assistive listening  systems are more situational,   and help users who want to hear a specific sound  like a television, a phone call, a teacher, or a   public announcement. Those systems have not gotten  much attention from signal processing researchers,   which is a shame because frankly  there's a lot of room for innovation. Among the adults with hearing loss I talked to,  the most popular listening tech accessory was   the phone streamer. Here's mine. It connects  my hearing aids to my smartphone or laptop   over Bluetooth so I can listen to music or take  calls with my hearing aids. I had an older version  

of this in high school and I used to wear it under  my school uniform so I could listen to music from   my iPod in class. I felt so cool that I could  do that and the normal-hearing kids couldn't!  But to be honest I don't use it very  often now because the sound quality is   not very good and I have to charge  and carry around an extra device.   I usually reach for some nice  consumer wireless earbuds instead. 

The difference comes down to power consumption.  It takes a lot of bandwidth and therefore a   lot of power to transmit high-quality  audio wirelessly over a long distance.   That's why earbuds like these sound great but  only last a few hours. Hearing devices are   supposed to last a few days on a charge, so they  can't use full-power Bluetooth. Instead, there's   a dongle like this that talks to the phone over  Bluetooth but uses a lower-quality, shorter-range   proprietary protocol to talk to the hearing aids. These are low-volume accessories, so they tend to  

be expensive and don't keep up with consumer  products in terms of quality or features.   Now, there are some newer hearing aids that can  connect directly to certain models of smartphone,   so users with that phone don't have to carry an  extra dongle. The new Bluetooth Low Energy Audio   standard, which is part of Bluetooth 5.2 and was  developed in collaboration with the hearing aid  

industry, should bring that direct connection  feature to most new hearing aids, smartphones,   and laptops over the next year or two. The  standard includes a more efficient codec, so it   can transmit sound with higher quality and lower  bandwidth, which means it consumes less power   and can be built directly into hearing  devices. It will be the same standard used   in the next generation of wireless earbuds, so  I'm optimistic there will be good compatibility.

Another popular streaming accessory,  especially since everyone's been stuck at home,   is the TV streamer. It bypasses TV speakers  to pipe sound directly to the listener's ears.   These are useful in households where some  people have hearing loss and some don't,   since they mean no more fighting over the volume  control. Every major hearing aid company makes   a proprietary accessory that will stream to  their hearing aids. Like the Bluetooth dongles,   these accessories vary in quality. Personally, I  use a dedicated set of analog wireless headphones.   I love being able to walk around the house  and do chores while I'm wearing them.  

According to the industry, we can expect the new  Bluetooth standard to be built into TV and stereo   systems as well, so any number of hearing aids  and consumer earbuds can connect to them directly.   If it's implemented well, that  should be really convenient.   You could tune your earbuds or hearing aids  to one of the TVs at the gym, for example.   One downside of the TV streamer, especially the  big headphones I use, is that I can't listen to   a show and also have a conversation with someone  else in the room. I'll say more on that later. This next gadget seems to be much less popular  among adults but is widely used in K-12 classroom   settings: A remote microphone. These are  for one-on-one conversations with another  

person in the same room. The microphone is  worn by a teacher or conversation partner   and it transmits sound to the hearing device. Remote microphones have a few advantages   over hearing devices alone.  Because the microphone is far away,   there's less risk of feedback when the gain is  turned up. In noisy environments like restaurants,   they provide a much better signal-to-noise  ratio because they're closer to the talker. They  

also pick up much less reverberation, which can  improve intelligibility in large, echoey rooms.   Essentially, they bring the  talker closer to the listener.   They also work well with face masks, so they're  a great tool to have on hand during the pandemic. Now, as I mentioned, remote microphones  don't seem to be very popular among adults.   One reason is that they're  expensive proprietary accessories.  

Because the existing Bluetooth audio  standard has such high power consumption   and high latency, every company uses  their own custom wireless protocol,   and some work better than others. I measured  one commercial remote microphone accessory   with more than 40 milliseconds of latency which, [echo effect] as you can tell from listening to   me now, can be very annoying. [echo effect] That much delay   also makes it hard to speak. The new Bluetooth standard is  

also supposed to have lower latency, so I hope  that it makes its way to remote microphones.   If it does, we might see accessories that work  with hearing devices from multiple companies.   Third-party devices could have lower prices,  higher quality, more up-to-date specs, and more   innovative new features than first-party products. Remote microphones are also inconvenient to carry   around and they require self-identifying as a  person with a disability. It's one thing for a   parent or audiologist to ask a K-12 teacher to use  assistive technology in class, but for adults, not   everyone is willing to ask a business partner or  a date to put a gadget around their neck. That's   even harder during the pandemic because you'd  have to wipe it down before handing it to them.
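
If you want a feel for why that much latency is so disruptive, you can simulate the echo effect from a moment ago by mixing your own voice with a copy delayed by roughly 40 milliseconds. This is just an illustrative sketch of the phenomenon, not a model of any particular accessory; the file name and gain values are arbitrary.

```python
# Rough simulation of remote-microphone latency: the listener hears the direct
# (acoustic) path plus a delayed copy arriving over the wireless link.
import numpy as np
from scipy.io import wavfile

fs, speech = wavfile.read("my_voice.wav")        # hypothetical mono recording
speech = speech.astype(np.float64)

delay_ms = 40                                    # on the order of what I measured
delay_samples = int(fs * delay_ms / 1000)

direct = 0.3 * speech                            # quieter direct acoustic path
delayed = np.zeros_like(speech)
delayed[delay_samples:] = speech[:-delay_samples]

mix = direct + delayed
mix /= np.max(np.abs(mix))                       # normalize to avoid clipping
wavfile.write("latency_demo.wav", fs, (mix * 32767).astype(np.int16))
```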

The last category of assistive listening  technology I want to talk about   is broadcast systems, which are found in  public venues like churches and theaters.   A transmitter is connected to the venue's sound  system, so the signal could be a single talker   at a podium, a mix of microphones from a  panel discussion or theater performance,   or pre-recorded sound - whatever is  being played over the loudspeaker system.   The signal is received by either a dedicated  headset or the listener's hearing device. In the United States, public venues are required  to provide headsets for users to check out,   and most use FM or IR systems.  There's a lot of variability   between venues in how well these systems  are implemented and how popular they are.  

I talked to one local venue that has a lot of  older customers and uses them all the time,   and another one that has a set but hasn't checked  out a headset in over a decade. Personally,   I love using the headsets when I see musicals in  Chicago because the assistive listening system   gets a special sound mix that emphasizes the  vocals over the instruments, so I at least have   a chance of understanding the lyrics. But for  a stand-up comedy show, I usually won't bother.  Headset systems have the same problem that  users have to self-identify and take extra   steps. You have to find out where to  get it, fill out a form, leave an ID,   figure out how to use it, and remember to bring  it back later when everyone's rushing out.

In Europe, it's more common to find induction  loops, which are special wires built into a room   that transmit sound directly to the telecoils  in compatible hearing devices. The advantage of   hearing loops is that users don't need to take  any extra steps beyond pushing a button on the   device they're already wearing. The downside is  that induction loops are difficult and expensive   to install, especially in older buildings. I  know some accessibility advocates have very   strong feelings about telecoil systems, but from a  technological perspective, they're kind of a pain. The newest trend in assistive listening  systems is Wi-Fi. Users download an app or  

visit a webpage and stream sound over the network  using their own headphones or hearing devices.   If you're a venue owner and you buy a new FM or  IR headset today to meet regulatory requirements,   chances are it will come with Wi-Fi built in too.  Wi-Fi systems combine some of the advantages of   headsets and induction loops: They're cheap  and easy to install, and listeners can use   devices they already have. But of course the  user has to be comfortable using a smartphone,   and if the network doesn't work well,  there could be a noticeable delay.  A closely related technology is online  broadcasting, which became popular among   churches and other live event venues during  the pandemic. People would gather together in   a parking lot while staying in their own cars  and stream the audio over their smartphones.

According to the Bluetooth Special Interest  Group, the new Bluetooth Low Energy Audio standard   is expected to replace induction loops and other  broadcast systems over the next decade or two.   I think that will depend on how  well the standard is implemented,   but if it works well it could make assistive  listening systems much more accessible,   and we might start to see them in venues  that aren't legally required to have them.   Since consumer devices will use the same  protocol, Bluetooth transmitters might   come standard in every new sound system,  so you can tune into the playlist at the   coffee shop or the announcements at the airport  just as easily as the broadcast at the theater.

The systems I've mentioned can be  categorized as one-to-one communication,   like remote microphones, and one-to-many  broadcasts, like induction loops and FM headsets.   But what if I'm at a dinner or a business  meeting talking with multiple people at once,   or if I want to listen to the TV while  also talking with my partner at home?   What about class discussions where students want  to hear each other as well as the teacher? I see   many-to-one and many-to-many systems as the next  big growth area in assistive listening technology.   We're starting to see a few products that can do  this, either using multiple individual microphones   or tabletop microphone arrays. From a technology  standpoint, it's not too difficult. For example,   I've tried using this pair of wireless  microphones with two conversation partners.   The receiver is designed to connect to a video  camera for shooting two-person interviews,   but you can just as easily hook it up to a  pair of headphones. These mics sound great,  

in part because they're big and heavy and chew  through batteries, but multiple-talker wireless   systems, both official ones from hearing device  companies and these do-it-yourself solutions,   mean carrying around a lot of extra equipment,  and it's harder to provide a good listening   experience when there's more than one talker.  For example, these mics can route one talker   to the left ear and one talker to the right ear,  or they can mix them both together in both ears,   neither of which really sounds like I'm  talking to two people in the same room.  Let's talk about what the next generation of  assistive listening technology might look like   and what we can learn from the new audio  tech developed during the pandemic. New device technologies, wireless standards, and  signal processing algorithms should lead to a lot   of innovation in assistive listening systems over  the next few years. It's useful to think about  

what kind of systems we'd build if we no longer  were constrained by size, power, or bandwidth.   Listening systems have two main design criteria,  which I'll call immersion and enhancement. An immersive system sounds real and makes  me forget I'm using any technology at all.   It should preserve the room  acoustics and spatial cues,   so if you're standing on my left, it sounds like  you're on my left, and if you're facing away from   me and mumbling into a wall, it sounds like you're  facing away from me and mumbling into a wall.   The gaming industry is pouring resources into  algorithms that can simulate room acoustics and   head-related transfer functions to provide  better immersion in virtual reality audio.  

Hearing devices get immersion for free because  they have microphones at the ears already.   The sound picked up by microphones at the ears  already has the acoustic effects of the room   and at least some of the  acoustic effects of the head. Enhancement is about making sound easier to  understand, for example by reducing noise and   reverberation. Hearing aids have a hard time doing  that because their microphones pick up noisy,   reverberant sound. Remote microphones can do  better because they're closer to the talker.   Remote meetings, like the one we're having  now, are another great example of enhancement.  

You're hearing my voice through a high-quality  microphone in a quiet room, carefully processed   in software and then delivered to your  headphones. When everyone has good hardware   and a good connection, I find it much easier to  hear in remote meetings than in-person meetings,   but there's no sense of immersion:  You're hearing me from inside your head. Sometimes there's a trade-off  between immersion and enhancement.   For example, if you're facing away from me  and mumbling into the wall, I probably don't   want realism; I want to enhance your voice so I  can understand it. But often we can have both.  For example, most remote microphones  transmit the same sound to both ears,   so listeners have no sense of where it's coming  from. But if we have a high-quality, low-latency  

wireless connection and decent processing power,  then we can match the acoustic cues of the   low-noise remote signal to the cues at the ears. Here's a demonstration using a hard-wired lapel   microphone in our laboratory. I read a passage  while walking around some speakers playing speech   recordings. The sound in the video is matched to  the ears of a dummy head, so if you're listening   through headphones, you should be able to track my  direction as I move back and forth. First you'll   hear the noisy mixture through the ears, then the  unprocessed remote microphone signal, and finally   the enhanced signal. Listen for the spatial  cues and the changes in spectral coloration. [Overlapping speech] ... with its path high above and its  two ends apparently beyond the horizon.   There is, according to legend, a boiling pot  of gold at one end. People look but no one  

ever finds it. When a man looks for something  beyond his reach, his friends say he is looking   for the pot of gold at the end of the rainbow.  Throughout the centuries men have explained the   rainbow in various ways. Some have accepted  it as a miracle without physical explanation.   The Greeks used to imagine that it was a sign  from the gods to foretell war or heavy rain.
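
For the curious, here is a simplified way to think about that cue-matching step: fit a short filter from the remote microphone to each ear microphone, then apply those filters to the clean remote signal so it carries the listener's own interaural cues and room coloration. This least-squares sketch is my own simplification for illustration, not the exact algorithm behind the demo, and the signals and filter length are hypothetical.

```python
# Sketch: impose ear-level spatial cues on a clean remote-microphone signal
# by fitting a short FIR filter from the remote mic to each ear mic.
import numpy as np
from scipy.linalg import lstsq
from scipy.signal import lfilter

def fit_relative_filter(remote, ear, length=256):
    """Least-squares FIR filter h such that (h * remote) approximates the ear signal."""
    n = len(remote) - length + 1
    # Matrix of delayed copies of the remote signal (one column per filter tap).
    cols = [remote[length - 1 - k : length - 1 - k + n] for k in range(length)]
    X = np.stack(cols, axis=1)                  # shape (n, length)
    y = ear[length - 1 : length - 1 + n]
    h, *_ = lstsq(X, y)
    return h

def spatialize(remote, left_ear, right_ear):
    """remote: clean lapel-mic signal; left_ear/right_ear: noisy ear signals (hypothetical arrays)."""
    h_left = fit_relative_filter(remote, left_ear)
    h_right = fit_relative_filter(remote, right_ear)
    # Filtering the clean signal carries over interaural time and level
    # differences without carrying over the noise at the ears.
    return lfilter(h_left, [1.0], remote), lfilter(h_right, [1.0], remote)
```

A fixed filter like this only works for a talker who stays put; following a moving talker, as in the demo, calls for a block-by-block or adaptive version of the same fit.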

That demo only used one remote microphone, but  it would work just as well with several. If we   had a microphone on each talker, we could  do more sophisticated types of processing   than we can with hearing devices today. We  could apply different gain, equalization,   and compression to each talker, just like  mixing artists do in recording studios.  Of course, when we start interacting in person  again, there's a third criterion to think about:   convenience. I personally wouldn't mind carrying  around a case full of microphones and clipping  

them on everyone I meet, but I think I'm an  exception. People are already reluctant to do   that with a single remote microphone. Ideally, we  wouldn't need to carry any extra devices at all.   We could just walk into a room, our devices  would enhance the things we want or need to hear,   and the whole experience would be  so immersive we would hardly notice. 

It sounds like a fantasy, but a lot  of the technology is already there,   thanks in no small part to the pandemic. You  see, now that workplaces are starting to reopen,   the tech industry has decided  that the future of work is hybrid.   There will be some employees in the  office and some working remotely,   and they'll want to have meetings with each other.  Now I don't know whether they're right about that,   but it means the industry is investing  heavily in audio hardware for hybrid meetings.   Remote participants need to be able to hear  everyone clearly and captioning software needs   to know who said what, but in-person participants  won't all be wearing their own microphones.  The solution is to install microphone arrays,  which can capture high quality sound at a   distance, even if multiple people are talking  at once, and they can track people as they move   around the room. Microphone arrays were already  common in smart speakers and game systems,  

and now they're showing up in cell phones,  laptops, and conferencing equipment   to improve the quality of video calls.  They've even started installing them in   our local school district to  help with distance learning.  These systems are designed to capture sound for  remote participants, but there's no technological   reason why they couldn't also connect to the  hearing devices worn by in-person participants.   That way, people with hearing loss could  benefit from this expensive new infrastructure   to hear better in classrooms and meeting rooms.  Especially with the new Bluetooth standard, it  

would be an easy feature to add. Tech companies,  if you're listening, please make it happen! These new hybrid conferencing devices are powered  by microphone arrays. As it happens, microphone   arrays were the topic of my dissertation,  so get ready to learn all about them.

As the name suggests, a microphone array is a set of microphones that are spread apart from each other in space. Unlike single microphones, arrays can process sound spatially. A propagating sound wave reaches each microphone at a slightly different time, depending on the direction it comes from. We can process and then combine the signals captured by the different microphones so that sounds from one direction interfere constructively and get louder, while sounds from other directions interfere destructively and get quieter. That geometric version of array processing is called beamforming, because we can think of capturing a beam of sound from a certain direction. To do beamforming,

we need to know how far apart the microphones are  and where they are relative to the sound sources. There's another way to think of array processing  which is often called source separation.   In this interpretation, there's a system of  equations that describes how sound propagates   from each source to each microphone. If we have  more microphones than sources, then we have more   equations than unknowns, and we can solve for the  original sources. If we're in a reverberant room   where the sound bounces around, those equations  get very complicated, so we need to calibrate the   array somehow, and that's been a major problem  for signal processing engineers for decades.   In general, the performance of a  microphone array depends on its size,   both the number of microphones and  the area they cover. Just like lenses,  

larger arrays can create narrower beams, and when  we have plenty of microphones spread far apart   we can make the array more robust  against noise and reverberation. When designing microphone array processing for  listening devices, we have to be especially   careful if we want them to be immersive. A basic  beamformer like the kind used in a smart speaker   would distort the spatial cues of everything that  isn't in the direction of the beam, so it sounds   like being in a tunnel. We can design beamformers  that don't have the tunnel effect, but they also   don't reduce noise as much. Just like with  assistive listening systems, sometimes there's  

a trade-off between enhancement and immersion.  If we want both, we need to add more microphones. Most high-end hearing aids have two or three  microphones per ear, and they're right next to   each other, so they can only do a little bit of  directional processing without causing perceptible   distortion. It can help somewhat if most of the  noise is behind the listener, but it's really   no use at all in a very crowded environment.  In our lab, we've been designing prototypes of   larger wearable microphone arrays that have  dozens of mics spread across the body. Our  

most iconic prototype is the "Sombrearo" which has  microphones spread around the brim of a large hat.  Our engineering students have built a  few functional prototypes over the years.   Microphones around the torso are especially  helpful because the torso is acoustically dense,   and the microphones can be hidden  under most types of clothing.  We also brought in some design students to imagine  more aesthetically pleasing wearable arrays.  

With these larger arrays, we can design listening systems that separate, process, and recombine sounds from multiple sources, doing real-time remixing while preserving spatial cues and room acoustics. But they still have the issue of calibration. We need to learn where the sources are in the room and the acoustic paths that sound takes from each source to each microphone. Even with large wearable arrays, that's a daunting problem. Remote microphones can help. Even if they have poor bandwidth and large delay,

remote microphones still have a good  signal-to-noise ratio and low reverberation,   so we can use them as pilot signals to calibrate  a beamformer. Here is the remote microphone demo   from before, but this time, instead of listening  to the processed remote microphone signal you'll   hear a binaural beamformer from a 14 microphone  wearable array that tracks the moving talker. [Overlapping speech] ... with its path high above and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at   one end. People look, but no one ever finds it.  When a man looks for something beyond his reach,   his friends say he is looking for the  pot of gold at the end of the rainbow.   Throughout the centuries, men have  explained the rainbow in various ways.   Some have accepted it as a miracle  without physical explanation.  

The Greeks used to imagine that it was a sign  from the gods to foretell war or heavy rain. Now, what if we went even bigger and  filled an entire room with microphones?   Well, we did that. This is the largest conference  room in our building, with glass walls and lots of   reverberation. We set up a simulated party with  10 loudspeaker "talkers" all talking at once,  

four mannequin "listeners" wearing microphone  arrays, and a dozen tabletop array devices   designed to look like smart speakers. There  were a total of 160 microphones in the room.  Now, no portable listening device is going to  be able to process 160 channels in real time,   so we designed a hybrid system. The distributed  array was used to locate the sources,   learn the room acoustics, and calibrate  the wearable arrays, and then the wearable   devices were used for real-time processing. In this demo, you'll be listening through   the ears of this mannequin in the corner as she  tries to listen to the talker next to her. First  

you'll hear the talker alone, then the noisy  mixture, and finally the processed signal.   This is another binaural beamformer,  so listen for the spatial cues. ...seven years as a journalist. We must provide  a long-term solution to tackle this attitude.   Then, suddenly, they weren't. [Overlapping speech] ...seven years as a journalist. We must provide  a long-term solution to tackle this attitude.   Then, suddenly, they weren't. So at this point you might be  thinking, "This guy is crazy!   No one's going to cover their body in  microphones, much less an entire room." 

Well, that's true. This is mostly an academic  exercise to show what could be done with a really   extreme system. But if you think about it, we're  already surrounding ourselves with microphones.  On my body, there are microphones in my  hearing aids, my watch, and my phone,   and it's only a matter of time before  augmented reality glasses finally catch on.  

Looking around my living room, I count at least 30  microphones between smart speakers, game systems,   computers, and other gadgets. And thanks to remote  and hybrid meetings, large network-connected   microphone arrays are being installed in  classrooms and offices all around the world.   What if listening technology could tap into all  these microphones that are already all around us?   What if when I walked into a meeting my  hearing aids picked up sound from everyone   else's hearing aids, phones, computers, and  from arrays installed in the walls and ceiling?   With the right signal processing, my hearing  aids could pick and choose what I should hear   and process it to have the right spatial cues and  room acoustics. I wouldn't have to think about it.  And it wouldn't just work in offices. Imagine  sitting down to dinner at a restaurant and hearing  

only the people at your own table while turning  down everyone else. Imagine sitting in a classroom   and hearing the shy quiet kid in the back just  as clearly as the loud kid right next to you.   Microphone array processing can make it possible. Now, this utopian future for listening  technology is still a ways off.  

The new Bluetooth standard can  do a lot, but it can't do this.   Gathering sound from every device in a room  will require new standards and protocols,   and buy-in from the whole tech industry. There are  also obvious privacy concerns that would need to   be addressed before it could be used in public  spaces, and there are many signal processing   challenges in making sense of data from so many  different kinds of device. Our group is working   to address some of those challenges, which I'll  tell you about in the final part of this talk.

But first, let's take a short  break from microphone arrays to   hear about an unexpected connection  between hearing aids and COVID-19. I've talked a lot about how COVID-19  has influenced hearing technology,   but did you know that a hearing aid signal  processing technique can help with COVID-19?   Early in the pandemic, there were  widespread fears of a ventilator shortage,   so a team of engineers here at the University  of Illinois came together to design a low-cost   emergency ventilator that could be rapidly  produced. Our research group helped design   the alarm system that alerts clinicians  if a patient stops breathing normally.   We wanted to find an algorithm that could run on  any processor, even the smallest microcontrollers.  The level tracking algorithm used for dynamic  range compression in hearing aids was perfect.   It can follow peaks in a waveform with adjustable  time constants, but it requires almost no memory   and just a few computations per sample. We  adapted the algorithm to track the depth and  

duration of breath cycles. It sounds an alarm if breaths are too fast, too slow, or too shallow. The ventilator design has been licensed by more than 60 organizations, and the alarm system is available as an open-source hardware and software design. Now, back to our regularly scheduled microphone array programming. In most of the microphone arrays in use today, all the mics are arranged within a single device. They're connected to each other by wires so they can share all their data instantaneously, they all have a common sample clock, and they have known positions relative to each other. Remember, microphone array processing relies on precise timing differences between microphones, so it typically requires perfect synchronization and known geometry. But if we want to do really powerful spatial processing, if we want to handle noisy, crowded spaces with dozens of sound sources, we need our array to span the whole room. Typically, that means we need to combine

microphones from multiple devices. These are  called ad hoc arrays or distributed arrays,   and they're more challenging than  conventional single-device arrays.   We don't always know where the  microphones are relative to each other,   so we can't easily translate timing differences to  spatial directions. Worse, the devices might not   be synchronized, so we can't tell exactly what  those timing differences are, and the sample   clocks of different devices might drift over time.  If the devices are wireless, like most wearables,  

they probably have limited bandwidth, relatively  high latency, and intermittent dropouts. With   ad hoc arrays, therefore, we might not be able to  use all the microphones together for beamforming. Fortunately, most devices today have more than  one microphone. That means our ad hoc array isn't  

composed of individual microphones, but of smaller  conventional microphone arrays. The sensors within   each device are synchronized with each other and  can be used for real-time spatial processing,   and the devices can share other information with  each other to improve performance, even if they   can't perform large-scale beamforming in real  time. For example, the devices can work together   to decide which sound sources are where and what  their frequency spectra are like, or to track   who's talking when. Then each device can adjust  its local processing using those parameters.   We call this cooperative processing. The large conference room demo was a good example:   The smart speakers are in fixed locations,  so they can locate the talkers and relay that   information to the wearable devices, which  then do real-time listening enhancement. 
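
One small building block of cooperative processing is estimating the timing offset between devices that don't share a sample clock. A common generic trick, sketched below with synthetic signals, is to cross-correlate audio that both devices captured and look for the peak; the numbers here are made up for illustration, and this isn't necessarily the method our prototypes use.

```python
# Sketch: estimate the relative time offset between two unsynchronized recordings
# of the same acoustic scene from the peak of their cross-correlation.
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_offset(x, y, fs):
    """Return the lag in seconds by which signal y trails signal x."""
    x = x - np.mean(x)
    y = y - np.mean(y)
    corr = correlate(y, x, mode="full")
    lags = correlation_lags(len(y), len(x), mode="full")
    return lags[np.argmax(np.abs(corr))] / fs

# Synthetic example: the same source, delayed by 12.5 ms on the second device.
fs = 16000
source = np.random.default_rng(0).standard_normal(fs)   # one second of noise as a stand-in
offset_samples = 200                                     # 12.5 ms at 16 kHz
device_a = source
device_b = np.concatenate([np.zeros(offset_samples), source])[: len(source)]

print(estimate_offset(device_a, device_b, fs))           # prints approximately 0.0125
```

Clock drift means the offset changes slowly over time, so in practice an estimate like this has to be refreshed or tracked rather than computed once.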

The beamformer that tracked the  moving talker is another example:   The remote microphone signal was  delayed by about 100 milliseconds,   so it couldn't be used for real-time  listening, but it could be used for tracking. Some of the most useful microphones in  an ad hoc array are in wearable devices.   Wearables provide low-noise reference  signals for speech from their wearers   and they follow them as they move, but it's  hard to combine wearables into an ad hoc array   because they move constantly. When we tested our  wearable arrays on mannequins, they worked great,   but when we tried them on live humans,  the high-frequency performance plummeted.  

That's because humans are always moving even  when they're trying to stand perfectly still.   At higher audible frequencies, a microphone  on my chest will move by multiple wavelengths   relative to a microphone on my ears every time I  take a breath. However, if we explicitly account   for that motion when we design a beamformer,  we can at least partially compensate for it.  

This next demo is a beamformer using a  wearable microphone array on a moving subject.  Now, for the algorithm to work the  motion has to be fairly predictable   and repetitive, so enjoy the "Mic-Array-Na". [Overlapping speech] We are not aware of any British casualties  at this stage. There isn't the seriousness   of other businesses. It is about control  of our economy. There was a rush of water. The rainbow is a division of white  light into many beautiful colors.   These take the shape of a long, round arch with its path high above and its two ends apparently beyond the horizon.

Please call Stella. Ask her to bring  these things with her from the store:   six spoons of fresh snow peas, five thick slabs  of blue cheese, and maybe a sack for her brother. One of the advantages of distributed listening  systems and microphone arrays over conventional   hearing devices is that they can apply  different processing to different sound sources.   Perhaps I want to amplify one person more than  another, or apply different spectral shaping to   speech versus music. That independent processing  could be especially important for nonlinear  

processing like dynamic range compression. Compression is used to keep sound at a   comfortable level by amplifying quiet  sounds and attenuating loud sounds,   but it's known to cause unwanted distortion  when there are multiple overlapping sounds. For example, if I'm talking quietly and  there is a sudden loud noise, then the   compressor will turn down the gain applied to  all sounds, and my voice will also get quieter. If we compress different sounds  independently before mixing them   together, that distortion doesn't happen.  
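
To make that concrete, here is a small sketch of an attack/release level tracker, essentially the same trick we reused for the ventilator alarm, driving a simple compressor, along with a comparison of compressing two sources independently versus compressing their mixture. The thresholds, ratios, and test signals are arbitrary illustrations, not values from any real hearing aid.

```python
# Sketch: attack/release level tracker and a simple compressor, showing why
# compressing a mixture behaves differently than compressing sources independently.
import numpy as np

def track_level(x, fs, attack_ms=5.0, release_ms=50.0):
    """One-pole peak tracker: fast attack, slow release, almost no memory."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    level = np.zeros_like(x)
    prev = 0.0
    for n, sample in enumerate(np.abs(x)):
        coeff = a_att if sample > prev else a_rel
        prev = coeff * prev + (1.0 - coeff) * sample
        level[n] = prev
    return level

def compress(x, fs, threshold=0.1, ratio=4.0):
    """Reduce gain when the tracked level exceeds the threshold."""
    level = track_level(x, fs)
    gain = np.ones_like(x)
    over = level > threshold
    gain[over] = (threshold / level[over]) ** (1.0 - 1.0 / ratio)
    return gain * x

fs = 16000
t = np.arange(fs) / fs
quiet_talker = 0.05 * np.sin(2 * np.pi * 220 * t)            # stand-in for quiet speech
loud_noise = 0.8 * np.sin(2 * np.pi * 600 * t) * (t > 0.5)   # loud sound starts halfway in

# Compressing the mixture turns the quiet talker down when the loud sound starts;
# compressing each source on its own and then mixing leaves the quiet talker alone.
mix_then_compress = compress(quiet_talker + loud_noise, fs)
compress_then_mix = compress(quiet_talker, fs) + compress(loud_noise, fs)
```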

It remains an open question, however, whether that  kind of processing would make it easier to hear. In fact, many of the open research  questions in this area are better answered   by hearing scientists rather than engineers. If we could perfectly separate and recombine   every sound in the listener's environment, how  should we do it? How many sounds can a person   pay attention to at once? Does that depend  on age, on hearing ability, on noise level?   How do we know what sound the listener wants to  hear at any given time? A lot of engineers tend to   assume that we want to hear what's in front of us,  but sounds from behind are also important since   they alert us to things we can't see. When we have  to decide between immersion and enhancement, how   do we make that trade-off? As signal processing  advances unlock new types of listening technology,   we'll need to work with hearing scientists and  with users themselves to understand how to use it.

I got into this field because I want to  make hearing aids work better, but these new   listening technologies could help everyone hear  better whether or not they have hearing loss.   Large microphone arrays and  distributed sensor networks   could be used for augmented reality, media  production, surveillance, and much more.   They could let people hear things we couldn't  otherwise, giving us superhuman perception. I often like to explain listening  technology by analogy to vision. 

A conventional hearing aid is like a contact lens.  It's intended to restore normal sensory ability,   it's designed to be invisible and worn all  day, and it's no larger than the sense organ   that it sits on. That means it has access  to the same information our senses do.  Just as a contact lens will never let  us see a microbe or a distant planet,   a hearing aid will never let us have a quiet  conversation in a crowded convention center   or hear a mouse from across a busy  street. Instead of a contact lens,   I want to build the hearing equivalent  of a telescope or a microscope: a large,   situational device that we can't wear all day, but  that lets us sense things we normally couldn't.

The last year has been challenging for  everyone, but especially for the hearing   loss community. We suddenly had to learn  how to manage without seeing people's lips   and how to hold video meetings  with spotty support for captions.   We had to deal with shortages of not just  toilet paper but also karaoke systems. There were some bright spots, of course. The world  got a lot quieter, at least for a little while,   and we got to watch normal-hearing people struggle   with the sudden loss of high frequency  speech sounds and learn how to speak up.

Perhaps most importantly, the sudden shift  to remote work led to a new focus on audio   capture and processing technology. Suddenly  everyone was talking to each other through   microphones and speakers just like we  always have. Thanks to the pandemic,   there are a lot more microphones in  the world than there were a year ago,   and most of them are networked. That progress  coincides with technological shifts like the   new Bluetooth Low Energy standard and market  trends like direct-to-consumer hearing devices   that were already poised to shake up  the landscape for listening technology. If the tech industry makes  hearing accessibility a priority,   then as we start to gather in person again,  we can leverage all those new microphones   and new wireless technologies to power the next  generation of listening devices, both for people   with hearing loss and for everyone else who  wants to hear things they couldn't hear before.  That way, the technology we've developed to  keep us apart can help bring us back together.

I'd like to thank my team at the University of Illinois as well as the agencies and companies that have supported our research. To learn more about our work, please visit the Illinois Augmented Listening Laboratory website. If you'd like to get in touch, you can find me on LinkedIn, YouTube, and Twitter @ryanmcorey. I'm always eager to talk with audiologists and listening technology users, so I would love to hear from you.

