Digital sound processing in hearables. Know the present & foresee the future.
Dear organizers! Dear guests! It's my honor and my pleasure to be here today and give you my vision of Sound Signal Processing in Hearables, where software is the key. I will be helped by Simon Cheung, who is our general manager of BeHear and hearing-enhancement technologies in China. Hearing is probably the most important human sense, and we only realize its importance when we lose it.
Helen Keller, the famous woman who was the first deaf-blind person to receive an academic degree, in 1905, once said that blindness disconnects us from things, but deafness disconnects us from people. And we believe that Hearables can help us communicate with other people and have a better life. But what are Hearables? How do we define these devices? First of all, they are wearable electronic devices that provide full Bluetooth audio connectivity. But that's not enough: they need to do something else, something more, to be called Hearables.
So Hearables can definitely enhance our hearing, make it better, and even make it superhuman. Potentially, Hearables can allow us to hear frequencies beyond the human auditory range, hearing like dogs or dolphins. They can provide hearing protection that saves our hearing from excessively loud ambient noise.
They can do real-time language translation, so we have no problem communicating with others while traveling. They can provide real-time personal-assistant services using ChatGPT or similar services. They can provide us with location-based information, for example based on the new Bluetooth Auracast technology, so we can be calm at the airport, knowing we will be told when our plane starts boarding.
They can provide us with health monitoring and timely alerts that something is going wrong. They can even help us correct speech disorders, and we will talk about that. What is common to all these functionalities? They are all about sound, about hearing. They communicate with us through sound, through our auditory system, and that's why they are called Hearables.
So what are the signal-processing tasks that happen in Hearables? We can divide them into three categories. We have outgoing sounds, which is basically the acquisition of the user's voice, or also binaural recording. We have incoming signal processing, for voice enhancement and streaming-audio enhancement. And we have a loop type of processing, where ambient sounds are acquired, processed, and played back reinforced.
For example, for hearing aids, for processing the user's own voice, or for active noise cancellation. Now let's talk about these categories. For outgoing voice acquisition, beamforming is of course the must-have technology today. It's used in almost all wearable headsets and true wireless earbuds. Beamforming utilizes the time-of-arrival difference between two or more microphones in the Hearables or headset, and by using that we can produce different polar patterns.
It can be fixed beamforming with one specific polar pattern, or it can be adaptive, so it adapts the beamforming to the situation. Beamforming is a wonderful technology, but it has two main drawbacks. First, it cannot distinguish between sounds from the user's mouth and sounds coming from the front, because the time of arrival at the microphones will be the same.
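The time-of-arrival idea can be sketched in a few lines. Below is a minimal fixed delay-and-sum beamformer in NumPy; it is an illustration under assumptions, not any product's implementation, and the sample rate, tone, and 4-sample inter-microphone delay are made-up demo values:

```python
import numpy as np

def delay_and_sum(mic_a, mic_b, delay_samples):
    """Fixed delay-and-sum beamformer for two microphones.

    A sound from the look direction reaches mic_b delay_samples later
    than mic_a. Advancing mic_b by that delay aligns the two copies so
    they sum coherently, while off-axis sounds are partially cancelled.
    """
    shifted = np.roll(mic_b, -delay_samples)
    if delay_samples > 0:
        shifted[-delay_samples:] = 0.0  # discard samples wrapped from the start
    return 0.5 * (mic_a + shifted)

# Demo: a 440 Hz on-axis tone arriving 4 samples later at the second mic.
fs = 16000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)
d = 4
mic_a = s
mic_b = np.roll(s, d)  # simulated time-of-arrival difference
y = delay_and_sum(mic_a, mic_b, d)
```

After alignment the on-axis tone is reconstructed almost exactly, which is why a talker from the look direction passes through while lateral noise is attenuated.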
Second, it is not very efficient at low frequencies because of the small distance between the microphones. So what can we do? We can use microphones on two sides of the head, which greatly improves the performance at low frequencies. You are encouraged to come to our booth and see a demonstration of this technology in real time. It still has the same drawback that it is difficult to distinguish between sounds from the user and noises or a distracting voice coming from the front.

So what can we do to improve this? We can add a vibration sensor, or VPU, to our headset, Hearable, or true wireless device, and then combine the advantages of two signals: the vibration signal, which picks up vibrations caused by the user's voice through bone conduction, cartilage vibration, and tissue vibration, and the beamforming. What we do is use the spectrum of the bone-conducted signal at low frequencies and the beamforming at high frequencies.
That's because the bone-conduction signal is not effective at high frequencies: there is simply no bone conduction at those frequencies. Beamforming is the opposite: not efficient at low frequencies and very efficient at high frequencies. So, of course, we need a smart spectral mixer technology, because the boundary between these spectral regions is not constant. It may change depending on the user.
It may change depending on the mechanical contact between the sensor and the tissue. So this smart spectral mixer needs to analyze the signals, analyze the signal-to-noise ratio and the boundary between the two spectra, and then mix the signals together.

We still need noise reduction after all this processing, and we can use classical noise reduction, sometimes called spectral subtraction. The basic idea is that speech is non-stationary while noises are quasi-stationary. So we estimate the noise before the speech starts, or in the pauses of speech, and then subtract the background noise from the signal, getting a much cleaner signal. The advantages of this technology: it requires very low resources, gives predictable results, does not spoil the speech without a reason, and has high frequency resolution, so you can really clean the signal between the pitch harmonics.
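A minimal sketch of the spectral-subtraction idea just described, assuming the first quarter second of the signal is speech-free noise. The frame sizes and the spectral floor are illustrative values, not any product's tuning:

```python
import numpy as np

def spectral_subtract(x, fs, noise_dur=0.25, frame=512, hop=256, floor=0.05):
    """Classical spectral subtraction.

    The noise magnitude spectrum is estimated from the first noise_dur
    seconds (assumed speech-free), then subtracted from every frame's
    magnitude; the noisy phase is reused, and frames are overlap-added.
    """
    win = np.hanning(frame)
    # Noise estimate: average magnitude spectrum of the leading frames.
    n_frames = max(1, int(noise_dur * fs - frame) // hop)
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(x[i * hop:i * hop + frame] * win))
         for i in range(n_frames)], axis=0)
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    i = 0
    while i * hop + frame <= len(x):
        seg = x[i * hop:i * hop + frame] * win
        spec = np.fft.rfft(seg)
        mag = np.abs(spec)
        # Subtract the noise estimate; keep a spectral floor to limit
        # musical-noise artifacts.
        clean = np.maximum(mag - noise_mag, floor * mag)
        rec = np.fft.irfft(clean * np.exp(1j * np.angle(spec)), frame)
        out[i * hop:i * hop + frame] += rec * win
        norm[i * hop:i * hop + frame] += win ** 2
        i += 1
    return out / np.maximum(norm, 1e-8)
```

Fed a purely stationary noise signal, this removes most of the energy; fed speech plus such noise, it leaves the non-stationary speech largely intact, which is exactly the assumption behind the method.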
The drawbacks of this technology are a relatively long adaptation time, between half a second and five seconds depending on the implementation; it works only on quasi-stationary, continuous noises; and it may produce musical-noise artifacts.

Noise reduction with neural networks is a hot topic today. There are many companies and institutions working in this direction, and it definitely provides a lot of hope of improving noise-reduction performance beyond what we can do with classical approaches. One advantage is almost instantaneous adaptation to noises.
It cancels non-stationary noises, such as clicks, pops, and similar sounds, and it provides a relatively smooth output without artifacts. The disadvantage of this technology, especially for hearable devices, is that the results are sometimes unpredictable. You don't know what the neural network will decide about the speech, or a part of the speech, in some situations. So you can get artifacts, or parts of the speech can go missing, which is a big drawback if you want to use it, for example, to communicate with a computer through speech recognition.
It still requires much larger resources than classical noise reduction, which is a problem for battery power, and when you reduce the size of the network, it provides relatively low frequency resolution, which at high noise levels may create unpleasant, harsh sounds. The good news about neural networks is that they provide dereverberation capabilities. Reverberation is a big problem for speech intelligibility, for people with normal hearing but even more so for people with hearing impairment.
And a neural network may be very efficient at that. I would like to ask for two sounds to be played here: the original reverberant speech, and the speech processed by a dereverberation network. [sound example #1] [sound example #2] Now let's talk about sound personalization in Hearables. So, we can personalize sound.
Because we are all different and we are in different environments, we don't need to hear the same sounds. We may personalize sound for our hearing, for our preferences, and for our environment. How do we personalize sound for our hearing? We need to measure our hearing, and that means conducting some kind of hearing test. We don't need to go to a hearing professional: we can do it using our smartphone and our Hearables. We run a hearing-test application, we start hearing tones, and we find our hearing threshold, meaning the minimum loudness of a tone that we can still hear.
After repeating this for several tones on both ears, we get our profile, or audiogram, we upload this profile to our device, and we start hearing differently: we start hearing sound optimized for us in all device uses, including hearing and conversation enhancement, audio and voice streaming, and voice communication. All sounds of the device may be personalized. How do we do personalization? With the technology used in the majority of hearing aids, called multi-channel wide dynamic range compression, which means dividing the sound into different frequency channels and doing compression in each channel separately.
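The per-channel gain rule can be sketched roughly as follows. The "half-gain" target and the knee and ratio values are illustrative assumptions for this sketch, not an actual hearing-aid fitting formula:

```python
import numpy as np

def wdrc_gains(band_levels_db, thresholds_db, ratio=2.0, knee_db=45.0):
    """Per-band wide-dynamic-range-compression gains (in dB).

    band_levels_db : measured input level in each frequency channel
    thresholds_db  : the user's hearing thresholds per channel (audiogram)

    Below the compression knee, gain restores audibility in proportion
    to the hearing loss; above it, gain shrinks with slope (1 - 1/ratio),
    so loud sounds are amplified less than soft ones.
    """
    levels = np.asarray(band_levels_db, float)
    loss = np.asarray(thresholds_db, float)
    gain = 0.5 * loss - np.maximum(levels - knee_db, 0.0) * (1.0 - 1.0 / ratio)
    return np.maximum(gain, 0.0)  # never attenuate below unity gain
```

With an audiogram sloping toward high-frequency loss, soft high-frequency bands get the most gain and loud bands get much less, which is the behavior the compression scheme on the next slide illustrates.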
The example on this slide shows a typical compression scheme for high-frequency hearing loss, where we can see that low-level high-frequency sounds are amplified more than low frequencies, corresponding to the high-frequency loss. Now, we may optimize sound not only for our personal hearing; we may also optimize it for our environment. Our environment is not stationary: we may be in a quiet environment or in a noisy one, and that can change dynamically.
So how can we do this? We can measure the noise using the same microphones used in our device for communication and other purposes, and we can modify the sound that we hear according to the ambient noise: not only the loudness of the noise, but also its spectrum. Here is an example. Case A corresponds to listening to music in a quiet environment.
The noise spectrum is quite low, and we hear all the music in full. Now let's say we go outside: in case B the music remains the same, but the noise becomes much stronger and at some frequencies starts masking the audio content. In case C, we monitor the change in the noise and apply noise-dependent frequency equalization to the audio content, so it remains comfortable for us. We can also personalize for our preferences.
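The noise-dependent equalization of case C might be sketched like this, assuming per-band levels in dB; the 6 dB audibility margin and the 12 dB boost cap are made-up comfort values for this sketch:

```python
import numpy as np

def noise_dependent_eq(music_band_db, noise_band_db,
                       margin_db=6.0, max_boost_db=12.0):
    """Per-band boost (dB) keeping the audio margin_db above the noise.

    Bands where the measured ambient-noise spectrum approaches or masks
    the audio content are boosted just enough to restore the margin,
    capped at max_boost_db for listening comfort.
    """
    music = np.asarray(music_band_db, float)
    noise = np.asarray(noise_band_db, float)
    deficit = (noise + margin_db) - music  # shortfall below the target margin
    return np.clip(deficit, 0.0, max_boost_db)
```

In a quiet room (case A) every band returns zero boost; when the noise rises and masks some frequencies (case B), only those bands are lifted (case C).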
For example, when we speak over the phone and someone speaks too fast for us, we can ask the other person to speak slower. But we can also activate a technology that slows down the incoming speech automatically, so we hear the other person speaking slower in real time. Again, I would like to ask for two signals to be played: [Sound example #1] [Sound example #2] This is a real-time technology that you can also experience at our booth.
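Slowing speech down without changing its pitch is a time-scale modification. Here is a naive overlap-add sketch of the idea; it is artifact-prone, and real-time products typically use WSOLA- or phase-vocoder-style methods instead, so the parameters below are purely illustrative:

```python
import numpy as np

def ola_time_stretch(x, rate, frame=1024, hop_out=256):
    """Naive overlap-add time stretch: rate < 1.0 slows the speech down.

    Analysis frames are read from the input at hop_in = hop_out * rate
    and overlap-added at hop_out, so the output lasts ~1/rate times
    longer while the short-time spectrum (and thus pitch) is preserved.
    """
    hop_in = max(1, int(hop_out * rate))
    win = np.hanning(frame)
    n_frames = (len(x) - frame) // hop_in + 1
    out = np.zeros((n_frames - 1) * hop_out + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop_out:i * hop_out + frame] += x[i * hop_in:i * hop_in + frame] * win
        norm[i * hop_out:i * hop_out + frame] += win
    return out / np.maximum(norm, 1e-8)  # undo the window overlap weighting
```

At rate 0.5 the output is roughly twice as long as the input, i.e. the talker sounds half as fast.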
It is a very nice demo that you will enjoy. This slide explains personalization for listening when we watch a movie or a TV show with background effects that are uncomfortable for us: they are too loud, we don't like them, or we are in a mobile environment and want to concentrate on the dialogue. We can handle this automatically by reducing the background effects.
3D audio and head tracking: this is also a hot topic today. The basic idea is that when we have a sound image, with instruments or voices coming from specific directions, and we turn our head, we would like to preserve the original direction of the sound, as in a normal, natural environment. Today this is done with an accelerometer and gyroscope that detect the movement of the head, so we can compute the corresponding head-related transfer function and modify the audio accordingly. That is today; in the future, I believe, we will also use speed, direction, and location, because basing it on head tracking alone is not enough.
Sometimes it's confusing; we need more information to make this technology really widely usable and enjoyable.

Hearing enhancement is one of the most important health-related functions in Hearables. Our devices, our Hearables, integrate all the technologies used today in modern hearing aids: acoustic beamforming, acoustic feedback cancellation, noise reduction, and the hearing personalization that we discussed. All of it, of course, with very low latency: below 10 milliseconds is good, below 6 milliseconds is best, and a big challenge.
One of the major complaints of hearing-aid users is hearing their own voice amplified. It's a big problem and a big complaint. We can use the same accelerometer, the same bone-conduction sensor that we use in communication for picking up the user's voice.
We can use it to reduce the user's own voice and thus remove this complaint in hearing-aid applications. One more complaint, for headphones or earbuds with a sealed design, is the occlusion effect, sometimes called the barrel effect: it's like having a barrel over your head. It is caused by the user's own voice, conducted through bone and tissue into the closed ear canal and trapped there, because it has no way to escape the ear.
The traditional solutions for this include active noise cancellation, using an open design, or using a large vent in the device. And today we can also use a dynamic vent, which we will talk about now. A dynamic vent is a vent that can change its diameter automatically. This solution was first offered by Phonak, a big hearing-aid company and part of the Sonova group, for their very high-end hearing aids.
Today there is a company called xMEMS that offers it for consumer electronics; it's called the Dynamic Vent. The Dynamic Vent is basically a chip with an opening that you can control through an I2C interface, dynamically, from a DSP or MCU depending on the usage. For example, when I want to listen to music with strong low-frequency content, I can close the vent.
When I am in transparency mode, I can open the vent. So it can be done dynamically and very fast. I hope that next time I am here, I will be able to bring you a demo.
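The decision logic such an algorithm might run could look roughly like this. The register address, the 0..255 value range, the thresholds, and the bus_write helper are all hypothetical placeholders for this sketch; the actual xMEMS register map is not described here:

```python
# Sketch of dynamic-vent control logic driven from a DSP or MCU over I2C.

VENT_REG_OPENING = 0x01  # hypothetical register: 0 = closed, 255 = fully open

def vent_opening(mode, bass_energy, anc_on):
    """Choose a vent opening (0..255) for the current use case."""
    if mode == "transparency":
        return 255               # fully open: let ambient sound through
    if mode == "music" and bass_energy > 0.5:
        return 0                 # closed: preserve strong low-frequency sound
    if anc_on:
        return 32                # mostly closed helps passive isolation
    return 128                   # half open reduces the occlusion effect

def update_vent(bus_write, mode, bass_energy, anc_on):
    """bus_write(register, value) is an injected I2C write function."""
    bus_write(VENT_REG_OPENING, vent_opening(mode, bass_energy, anc_on))
```

Injecting the bus-write function keeps the decision logic testable on a PC, with the real I2C transaction supplied only on the device.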
We are cooperating with them on that, because to be efficient, this technology requires a DSP sound-processing algorithm to control the vent. The last technology I would like to talk about today is called Altered Auditory Feedback, and it is meant to help people correct their speech. This is for people with speech disorders such as stuttering.
Worldwide, there are more than 70 million people who stutter, which hampers their communication. Altered Auditory Feedback can reduce stuttering in about 80 percent of people. What is Altered Auditory Feedback? It is when a person hears their own voice with a slight delay and also with a pitch shift. It's like another person talking simultaneously with you, and it creates a kind of chorus effect. People who stutter can sing perfectly in choirs, but they cannot speak alone. So we are creating the impression that another person, or several people, are speaking simultaneously with us.
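The effect can be sketched as a delay line plus a crude pitch shift. The 60 ms delay, the 6 percent pitch ratio, and the resampling-based shifter below are illustrative choices for the sketch, not the actual device algorithm (a real-time product would use a proper pitch shifter that preserves duration):

```python
import numpy as np

def altered_auditory_feedback(voice, fs, delay_ms=60.0,
                              pitch_ratio=1.06, mix=0.5):
    """Altered auditory feedback: mix the user's voice with a delayed,
    slightly pitch-shifted copy, creating a chorus-like 'second speaker'.
    """
    # Delay line: prepend silence, then trim back to the original length.
    d = int(fs * delay_ms / 1000.0)
    delayed = np.concatenate([np.zeros(d), voice])[:len(voice)]
    # Naive pitch shift by linear-interpolation resampling (note: this
    # also compresses duration slightly, unlike a real-time shifter).
    idx = np.arange(len(delayed)) * pitch_ratio
    idx = idx[idx < len(delayed) - 1]
    shifted = np.interp(idx, np.arange(len(delayed)), delayed)
    shifted = np.pad(shifted, (0, len(voice) - len(shifted)))
    return (1.0 - mix) * voice + mix * shifted
```

During the first few tens of milliseconds only the direct voice is heard; after the delay, the pitch-shifted "second speaker" joins in, producing the chorus-like sensation described above.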
We can do it in Hearables. You are also invited to come to our booth and experience this technology yourself, whether you stutter or not. You will enjoy it. That's all the technologies I wanted to talk about today; it is a small part of what can be done in Hearables to improve people's lives.
So thank you very much. It's a pleasure to be here. [Applause]