Future-Ready Tech: AI and accessibility

[ Music ] Marco Casalaina: Hello, and welcome to this session on Future-Ready Tech: AI and Accessibility. I'm Marco Casalaina, VP of Products for Core AI at Microsoft and AI Futurist, and today we're going to look at how these new breakthroughs in AI, now available on the Azure AI platform, can make the world a more accessible place for everyone. Whether you have a speech disability, a hearing disability, or a visual disability, or you're neurodivergent, this technology can work for you. Let's start with speech.

One of the most amazing uses of AI today is to give a voice to those who are unable to use their voice, and our partner, Tobii Dynavox, an assistive communications company based in Stockholm, Sweden, is doing just that. Here to give us a look at their technology are two employees of Tobii Dynavox, Kenta Bahn and Victor Kaiser. Kenta Bahn: Thanks, Marco.

I'm Kenta Bahn, and I'm the Customer Service Manager at Tobii Dynavox. I'm here today with my colleague, Victor Kaiser, and we're going to talk about how our company uses AI, among other technology, in solutions for people with communication disabilities. First, a bit about Tobii Dynavox. We're an assistive communication company, and our vision is a world where everyone can communicate. We make solutions for people with conditions like cerebral palsy, ALS, and autism. In other words, for people who find it difficult to speak, use their hands, or both, and since self-expression and a person's voice are a huge part of their identity, we understand how important it is to use tech in an ethical way.

But don't take it from me. I'm going to hand it over to my colleague Victor now. He works with me in our support team and he does his job every day using one of our communication devices. Victor Kaiser: Thanks, Kenta. Hi, everyone.

I'm Victor. Like Kenta just said, I work at Tobii Dynavox as a tech support rep, helping our customers with their products. I don't mean to brag, but I'm pretty good at my job. One of the reasons is that I have cerebral palsy, which makes it very difficult for me to speak or use my hands. I've been using my device 24/7 to communicate, work, and live my life for many years, so I can really help our customers with detailed questions.

Right now I'm using a TD I-Series communication device, which I control completely with my eyes. I even generate the synthetic voice you are hearing right now using just my eyes. My device lets me choose from many different types of synthetic voices, too, including the one I'm speaking with now, which is a Microsoft Neural Voice, and these synthetic voices use Azure AI to help them sound as natural as possible. Kenta Bahn: By the way, I just wanted to point out that Victor's device, the TD I-Series, operates on Windows IoT. This is a Windows operating system that Tobii Dynavox collaborated on with Microsoft to serve the needs of people with accessibility challenges. For example, Windows IoT makes it possible for TD I-Series to offer eye-controlled access to the Office 365 suite, including, of course, Teams, which Victor is controlling with his eyes right now to make this presentation, and Windows IoT also makes it possible for his device to connect to the neural voice he's using to talk to you today.
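A note for developers: the Microsoft neural voices Victor and Kenta describe are exposed through the Azure AI Speech service, so the same kind of voice can be driven from code. Below is a minimal sketch using the Speech SDK for Python; the key, region, and voice name are placeholders, and this illustrates the service rather than how the TD I-Series itself is wired up.

```python
# Minimal sketch: synthesize speech with a prebuilt Azure AI neural voice.
# The key, region, and voice name are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_SPEECH_KEY", region="YOUR_REGION"
)
speech_config.speech_synthesis_voice_name = "en-US-GuyNeural"

# With no audio config specified, output goes to the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("I am going to a hockey match tonight.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized.")
else:
    print("Synthesis did not complete:", result.reason)
```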

Victor Kaiser: That's right, Kenta. It also lets me use Word and Outlook, among many other Microsoft Office apps, and here's an interesting related side story. As a child with cerebral palsy, I couldn't learn to read and write like other kids, so I started by using symbols to read before I became fully literate. I'm telling this story because I want to point out that Word and Outlook have a great feature called "Immersive Reader," which allows a person who is struggling with reading to click on a word and see the related picture communication symbol. Kenta Bahn: Thanks for sharing that, Victor. Tobii Dynavox has created tens of thousands of picture communication symbols, or PCS, as people call them, and it's a great reminder to see how they are integrated within Immersive Reader.

Going back to our previous topic, the thing that impresses me the most about the use of AI in our communication solutions is that it enables people to express themselves more naturally. Can you show us a couple of examples of how that works for you? Victor Kaiser: Sure. For starters, there are a couple of apps on my device called "TD Talk" and "TD Phone." TD Talk allows me to write phrases which I can then speak out loud at the click of a button.

Because I'm doing this with my eyes, it can take a bit of time, but AI speeds up the process by offering word and phrase prediction. In this example, I want to tell someone about my evening plans. I start by selecting the letter "I." Then it offers word prediction for the words "am" and "going," which I also select, and since TD Talk is getting to know me really well through AI, it knows that I am a hockey freak and accurately predicts what I want to say next.

All I have to do is accept the phrase prediction and hit the "Play" button to speak it out loud, "I am going to a hockey match tonight," and TD Phone lets me make mobile calls and send text messages with my eyes. Like TD Talk, it also has word and phrase prediction to help me speak on a call and send texts more quickly, and over time, these predictions get more accurate because AI learns and adapts to my personal speaking style. Kenta Bahn: Your last point about AI adapting to your communication style is kind of next level but also super important, because it's what lets your colleagues get to know the real Victor, spicy language and all. As for voices, TD Talk and TD Phone together offer many different Microsoft neural voices to choose from. What's truly amazing is that these voices are available in over 50 languages, too.
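As an illustrative aside (not Tobii Dynavox's actual model), the adaptive prediction Victor describes can be pictured as a predictor that learns from the user's own phrases: count which words the user typically types after a given word and offer the most frequent continuations first. A toy sketch:

```python
# Toy illustration of personalized word prediction (not TD Talk's model):
# count word-to-word transitions in the user's own phrases and suggest the
# most frequent continuations for the word just typed.
from collections import Counter, defaultdict

class WordPredictor:
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def learn(self, phrase: str) -> None:
        words = phrase.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            self.transitions[current_word][next_word] += 1

    def suggest(self, current_word: str, k: int = 3) -> list:
        counts = self.transitions[current_word.lower()]
        return [word for word, _ in counts.most_common(k)]

predictor = WordPredictor()
for phrase in [
    "I am going to a hockey match tonight",
    "I am going to work early",
    "I am watching the hockey game",
]:
    predictor.learn(phrase)

print(predictor.suggest("am"))      # ['going', 'watching']
print(predictor.suggest("hockey"))  # ['match', 'game']
```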

Victor Kaiser: That really hits home for me, Kenta. Since my neural voice is available in a choice of languages, I can use it to switch between English and Swedish. Our head office is based here in Sweden, but we all use English as our official business language. Also, our customers contact me sometimes in English and sometimes in Swedish, so I need to switch quickly between the two. In this example, I start writing in English, then decide that I want to switch to Swedish, then I click the globe icon where I can make a selection to quickly change languages.
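For developers, one way to get this kind of language switching from Azure AI neural voices is to pass SSML containing more than one voice to the Speech SDK. The sketch below is a minimal illustration, not the mechanism behind TD Talk's globe icon; the key, region, and voice names are placeholders.

```python
# Minimal sketch: mix an English and a Swedish neural voice in one SSML document.
# Key, region, and voice names are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_SPEECH_KEY", region="YOUR_REGION"
)
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# The Swedish line means "Thanks for contacting support."
ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">Thanks for contacting support.</voice>
  <voice name="sv-SE-MattiasNeural">Tack för att du kontaktade supporten.</voice>
</speak>
"""
result = synthesizer.speak_ssml_async(ssml).get()
print(result.reason)
```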

I also love learning languages outside of work, just for fun, so it's exciting to have so many to choose from. [ Non-English ] Kenta Bahn: That's a great point about needing to switch back and forth between languages, just like anyone else in a company. So Victor, you mentioned a number of AI-driven functionalities that make communication more natural for you.

You said that one is word and phrase prediction, another is neural voices adapting to your speaking style, and another is being able to switch between different languages. Is there anything else that would help you communicate in a more natural way? Victor Kaiser: Yes, there are many other functionalities, but since we're short on time, I'll just show you one of my personal favorites. It's called "See-Through Mode."

Before see-through mode, my device sometimes blocked the view of my speaking partner's face. Now I can speak face-to-face with someone by making my screen transparent while I use TD Talk. So, I speak faster with predictive text, I sound smoother with a neural voice, and the whole experience feels more natural because there are no barriers between me and the person I'm speaking with. Kenta Bahn: Thank you so much for sharing this, Victor.

What I love about this technology, including all these AI functionalities, is that it really demonstrates technology used for good. Victor Kaiser: Yes, that's true, Kenta. We are using AI to offer people with disabilities more choices and help them express their identity more easily. That is also good for society since it allows people like me to be active participants in their own lives. Marco Casalaina: Thank you, Kenta and Victor.

You folks are doing amazing work. Here at Microsoft, we are also using this speech synthesis technology for a different reason. Our Teams team has been hard at work on the new AI Live Interpreter, which, as of this recording, is in private preview. The AI Live Interpreter can translate your words in your voice into another language. Let's have a look at how that works. Hello, Bogdan (phonetic).

Okay, you can't see this, but I am going to turn on my AI interpreter here in Teams, and I'm going to set my interpretation settings so it translates to my language, English. Now, Bogdan, can you tell me a little bit about Fabric and what Fabric is doing with AI? [ Non-English ] Bogdan: Certainly. [ Non-English ] Fabric is a complete data analysis platform where you will find all Microsoft products for data analysis, starting from data integration to Power BI.

Additionally, Fabric is a platform where data is kept in an open format and can be used [Non-English] by your team to use AI technology. We can converse with our data in English or Romanian. Marco Casalaina: So, that's how we and our partners are using speech synthesis technology to bring a voice to those who can't use their voice and to help bring the world together. But another key use of speech AI is captioning. I use captions all the time, in every Teams meeting, in every video that I watch. I don't have a hearing disability, but I find that these captions help reduce the cognitive load on me.

I have a colleague named Swetha Machanavajhala, and I often meet with Swetha. Swetha is hard of hearing. We've recently been making a bunch of improvements to Azure AI Speech so that it can better understand her accent. Let's have a look at how that works. So, now I'm in my browser, and I'm in the Azure AI Speech Studio, which you can reach at speech.microsoft.com. On the screen, there are a number of tiles showing all the various things that Azure AI Speech can do, and the very first tile is called "Captions with Speech-to-Text."

I click into that tile and that gives me the captioning interface. So, I'm going to drag in a video of a meeting that I recently had with Swetha. Along the right-hand side are a number of settings that I can choose. First, there is a caption mode, and I can do these captions in real-time, like you might see in Microsoft Teams, but because this is a recorded video, I'm going to choose offline mode, which produces better results for recorded videos.

There are also a number of other options along the right-hand side. For example, I can choose a language, but here it has defaulted itself to English, which is fine for me because this video is indeed in English. Then I just press "Play" for it to create the captions and here's the result.

Hello, Swetha. How do these captions affect your life and your work? Swetha Machanavajhala: Being hard of hearing, I know it can be hard for others to understand me, but of late, our accuracy has improved a lot for people like me. So, now when I talk, it can turn my words into accurate text, and that helps everyone understand me better. Marco Casalaina: Now, the output that you saw here today is clearly not perfect, but it is definitely much improved over what we had before, and we've made these improvements through our work with the Speech Accessibility Project, which has allowed us to improve the accuracy of captions on non-standard accents to as high as 60%.
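For developers who want to reproduce the captioning step outside Speech Studio, the same speech-to-text service can be called directly with the Speech SDK against a recorded audio file. Here is a minimal sketch, with the key, region, and file name as placeholders; a real captioning pipeline would also handle timestamps, line breaking, and SRT or WebVTT output.

```python
# Minimal sketch: transcribe a recorded audio file with Azure AI Speech.
# Key, region, and file name are placeholders; a full captioning pipeline
# would also handle timestamps, line breaking, and SRT/WebVTT output.
import time
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_SPEECH_KEY", region="YOUR_REGION"
)
speech_config.speech_recognition_language = "en-US"
audio_config = speechsdk.audio.AudioConfig(filename="meeting_audio.wav")
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, audio_config=audio_config
)

done = False

def print_caption(evt):
    # Each recognized event carries one finalized caption segment.
    print(evt.result.text)

def stop(evt):
    global done
    done = True

recognizer.recognized.connect(print_caption)
recognizer.session_stopped.connect(stop)
recognizer.canceled.connect(stop)

recognizer.start_continuous_recognition()
while not done:
    time.sleep(0.5)
recognizer.stop_continuous_recognition()
```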

The Speech Accessibility Project is a project hosted at the University of Illinois Urbana-Champaign, and now from the University of Illinois, I'd like to introduce Mark Hasegawa-Johnson. Mark Hasegawa-Johnson: Thank you, Marco. Automatic speech recognition, or ASR, transforms speech into text.

In order to do that accurately, it needs to be trained using lots of speech data from diverse sources. ASR has great potential to improve accessibility. Forty years ago, Melanie Fried-Oken proved it was possible for a person with a severe motor disability to use ASR instead of a keyboard to access a computer, and then for a couple of decades, people with motor disabilities were the most influential users of ASR, spending hundreds of hours training personalized ASR systems and actively contributing to some of the early research projects. In the past 20 years, ASR error rates for people with cerebral palsy went down by a factor of 3.5. That's a great achievement, but I would like to argue that it's not good enough, because during the same period of time, ASR error rates for people without speech disabilities went down by a factor of 7.5. The purpose of the Speech Accessibility Project is to make it possible for researchers all over the world to create ASR that works better for people with speech disabilities.

In order to do that, we need to collect speech data that represents as thoroughly as possible the wide range of speech patterns characteristic of speech disabilities so that engineers can train ASR to accurately respond. We're recording speech data using protocols that have been designed from the ground up by people with disabilities and their family members and advocates, so that when somebody volunteers to record speech for us, they enjoy the process. The speech recorded in this way has been provided to 50 universities and companies, all of whom have signed a data use agreement guaranteeing the privacy of the contributed speech recordings.

At Microsoft in particular, I've been told that data from this project has reduced ASR error rates for users with speech disabilities by 20% to 65%. Two of the people whose lived experience has strongly influenced the Speech Accessibility Project are David and Dr. Kathleen Egan, the author and co-author of the memoir "More Alike Than Different," which focuses on abilities in our shared humanity.

David and Kathleen, do you want to tell a little of your story? David Egan: My name is David Egan. I'm a man with Down syndrome. I'm very grateful to Microsoft and all of the companies and researchers for including individuals with speech disabilities in their projects. I met Dr. Marie Moore Channell from the University of Illinois at the National Down Syndrome Society Adult Summit. After that meeting, I signed up to share my speech on the Speech Accessibility Project website.

I'm recording my voice and promoting the project in Virginia with other Down syndrome associations across the country and even hoping we go worldwide. It's important for us to be included as we are more alike than different, as I wrote in my book. Speech and communications are very important to all human beings. I'm one of the lucky ones to have had early intervention when I was born in Madison, Wisconsin, in 1977. I went to the Waisman Early Childhood Center, which got started in 1979 with 12 kids, two with disabilities, a child with autism, and I with Down syndrome. It made a big difference in my growth and my ability to speak and interact with others.

My ability to communicate helped me throughout my 47 years. It made it easier for me to be competitively employed for two decades as a clerk in the distribution center at Booz Allen Hamilton in Vienna, Virginia. I also serve as a Joseph P. Kennedy, Jr., Public Policy Fellow, working on Capitol Hill and advocating for individuals with disabilities. Please check out my website at davideganadvocacy.com, and thanks again for including the Down syndrome community in your innovative projects.

Now my mother will share her views. Kathleen Egan: Thank you, David, and thank you to all the companies and researchers for inviting David and me to share our experiences and thoughts, and for all your work in speech accessibility. I'm very happy to see that speech recognition and natural language processing error rates have improved dramatically over the past 40 years, as Mark shared earlier, but there is more work to be done. In addition to seeing the Speech Accessibility Project succeed, David and I would like to propose expanding and extending it to children with speech disabilities rather than limiting it to adults. Based on my personal experience, speech development starts at an early age, especially with Down syndrome.

We have a great opportunity to develop the tools that parents and speech therapists could use with young children with speech challenges. It could help them with exposure to fluent speech and motivate them with fun games to articulate and produce spoken language. Practice could help strengthen the muscle tone of the tongue for kids with Down syndrome. That's where they have the most needs.

In the '90s, I worked with an international group called "INSTILL" (Inserting Natural Speech Technology In Language Learning) to improve learners' speech skills in a foreign language, and what we did at the time made a difference. I think the Speech Accessibility Project could be extended to a speech communication project for the lives of all children with speech disabilities. It could start with kids with Down syndrome and expand to others.

The Down syndrome community is a friendly one and open to research and development of tools to help their kids be fully included in their community. The Beckman Institute team at the University of Illinois, with Dr. Marie Moore Channell, Associate Professor in the Department of Speech and Hearing Science and the Intellectual Disabilities Communication Lab, is well positioned to embark on this with support from all of the ASR experts and companies to develop the needed tools.

Thank you, and David and I are looking forward to supporting your efforts. Thank you. Mark Hasegawa-Johnson: Yeah.

Thank you, David and Kathleen. Thank you for that perspective. That really is important to us and is the reason why we do this project. Thank you.

One of the organizations that shaped the Speech Accessibility Project early on was the Cerebral Palsy Alliance Research Foundation, an organization that funds U.S.-based research to change what's possible for people with cerebral palsy. Jocelyn Cohen is the Vice President of Program and Operations at CPARF, so she is one of the people who helps to guide its research priorities in order to make everyday life easier for people with cerebral palsy and other disabilities. Jocelyn, do you want to talk a little about how communication in general and the Speech Accessibility Project in particular fit into that scientific mission? Jocelyn Cohen: Absolutely, Mark. So, CPARF funds research in six focus areas, including technology.

We also fund and run Remarkable U.S., the first U.S.-based nonprofit disability tech startup accelerator, which helps bring affordable, life-changing assistive technology to market more quickly for the disabled community and other people who may need it. Dovetailing with this, one of the earliest projects CPARF supported focused on brain-computer interface technology that aims to turn intentional thought into immediate speech for non-speaking people with cerebral palsy. CPARF has always known about the powerful intersection between communication and technology, and communication is the cornerstone of participation in society.

People need it for daily interactions, relationships, education, leisure activities, and employment. Communication helps us get the mundane things we need every day. It helps us in the most serious circumstances, and it helps us get any assistance and accommodations we need. Beyond this, communication helps us connect with others on a personal level, making us feel like we're a part of something, like we're heard and like we're understood.

That's why CPARF jumped at the chance to help with and help recruit for the Speech Accessibility Project on behalf of the millions of people whose speech is affected by their cerebral palsy. This crucial work on a free, open-source resource brings together the wider disability community to remove a societal barrier for people with disabilities. The Speech Accessibility Project ensures people can be heard and get what they need when they use their own voice. Ultimately, it speaks to the broader principle that everyone deserves to be understood and to have their voices recognized so that they can get what they need.

The world isn't built for disabled people, but when we collaborate with research teams like yours and companies like Microsoft to place autonomy and communication at the forefront, it becomes far more accessible. Mark Hasegawa-Johnson: Thank you, Jocelyn. Finally, I just want to issue two calls to action. First, if you're at a university or a company or another organization that could use several hundred hours of transcribed recordings of atypical speech in order to improve accessibility, please contact us. The Speech Accessibility Project is currently distributing data to organizations that are able to sign a data use agreement guaranteeing the privacy of participants.

Second, if you have ALS, cerebral palsy, or Down syndrome, or have had a stroke, and would like to record speech for the project, please contact us. Information for both groups is available at our web page, speechaccessibilityproject.com. Thank you.

Marco Casalaina: Thank you all for your contributions to the Speech Accessibility Project. AI can be used for many different types of assistive technologies, certainly speech, but also vision. One capability that many of you are probably already aware of is the screen reader, and we have a screen reader built into Windows today called Narrator. This spring, we're going to be introducing a number of improvements to Narrator to enable it to better process visual content on the screen, especially on our new Copilot+ PCs. In the past, if there was a chart or a graph on the screen, a screen reader might not have been able to interpret it correctly, but with these improvements that we'll be introducing shortly, Narrator will be able to do this.

Speaker 1: Describe image window. Generating description. The image shows a graph comparing the five-year cumulative total return of Microsoft, the S&P 500, and NASDAQ Computer Index from June 2018 to June 2023.

The vertical axis goes up to $400, and the horizontal axis marks yearly intervals. Microsoft (squares) grows steeply, reaching $375 by June 2023; the S&P 500 (triangles) rises more slowly, ending around $250; NASDAQ (circles) follows the S&P initially but surpasses it after June 2021, reaching $300 by June 2023. Marco Casalaina: Another way that we at Microsoft can help people with visual disabilities navigate the world is through our mobile app called "Seeing AI."
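Before moving on to Seeing AI, a short developer note: descriptions like the one Narrator just produced can also be generated from an image with the Azure AI Vision image analysis service. The sketch below illustrates that service rather than Narrator's actual pipeline; the endpoint, key, and file name are placeholders.

```python
# Minimal sketch: caption an image (for example, a chart) with Azure AI Vision.
# Endpoint, key, and file name are placeholders; this illustrates the service,
# not Narrator's internal implementation.
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://YOUR_RESOURCE.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("YOUR_VISION_KEY"),
)

with open("stock_chart.png", "rb") as image_file:
    result = client.analyze(
        image_data=image_file.read(),
        visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
    )

if result.caption is not None:
    print(f"Caption: {result.caption.text} (confidence {result.caption.confidence:.2f})")
if result.read is not None:
    # Printed text found in the image, such as axis labels on a chart.
    for block in result.read.blocks:
        for line in block.lines:
            print("Text:", line.text)
```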

Seeing AI has been available for a couple of years now and there are many thousands of users of it today. We've recently introduced an enhancement to Seeing AI that allows it to do audio descriptions of videos. Let's take a look at how that works on a video that I recently recorded with my colleague Udi at our Ignite Conference in Chicago. [ Laughter ] Speaker 2: Video, yesterday, 5 p.m., processing. Two men stand in front of an open doorway leading to a conference room with a "Microsoft Ignite" sign.

One man is holding two orange balls. Marco Casalaina: All right, let us know when you're ready. Speaker 2: The man in the suit begins to juggle the orange balls while the other man watches. Speaker 3: I'm now filming. Marco Casalaina: Here we go.

Speaker 2: The man in the suit continues juggling, and the other man prepares to catch one of the balls. Both men are now juggling, tossing the balls back and forth between each other. The man in the suit leans back to catch a ball, while the other man throws another ball. Both men raise their arms in celebration, having successfully juggled the balls. Marco Casalaina: And now we're going to have a look at what's coming next in AI, and that is agents.

Up to this point, you may have used AI to generate summaries or to generate content or to answer questions for you, but agents are a type of AI that don't just answer questions and generate content. They can do something. We're going to be using the OpenAI Responses API. This is a new type of agent that we will soon be releasing on Azure AI, and it is going to use my computer to do an everyday task. Let's have a look at what that does. Now I'm in a UI that I've built for myself that's connected to the Responses API agent coming soon to Azure OpenAI.

At the top, I've given it some instructions. These instructions are telling it to do some grunt work for me. You see, every day I have to go to this site and download this PDF for a shipment, and then I have to go to another website to enter what's called a "bill of lading" there, and that is some grunt work that I don't want to do, so I've asked my agent to do this for me.
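As a rough sketch of how an agent like this gets its instructions, here is roughly what the initial call looks like with the OpenAI Python SDK and the Responses API's preview computer-use tool. The model name, display size, and prompt are illustrative, the Azure OpenAI version of this preview may differ, and a real agent also needs a loop that executes each returned action, screenshots the result, and sends it back.

```python
# Rough sketch of starting a computer-use agent with the Responses API.
# Model name, display size, and prompt are illustrative; the Azure OpenAI
# version of this preview API may differ. A real agent also needs a loop
# that executes each suggested action, takes a screenshot, and sends it back.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1280,
        "display_height": 800,
        "environment": "browser",
    }],
    input=[{
        "role": "user",
        "content": "Log in to the shipping dashboard, download yesterday's "
                   "shipment PDF, then enter it as a bill of lading on the "
                   "other site and pause for my review before submitting.",
    }],
    truncation="auto",
)

# The first suggested action (a click, some typed text, a scroll) comes back
# in response.output for the host application to carry out.
print(response.output)
```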

So, it's logged in already, and we see that it's on the shipment dashboard, but there's a pop-up on this page that might confuse it. Let's see if it can figure it out. Ah, it got it.

It goes past that and now it continues to understand the page. It's going to use the date picker on this screen to filter it down because I told it to do this job only for the last day. Well, it's a little bit confused by the date picker, but now it's got it, and it's filtered it to the last 24 hours.

It has found a shipment to process. Now it needs to download that document. Let's see if it can figure this one out, because these documents are represented by little arrow icons. All right, it's got the PDF. Now it has to read the PDF, and this PDF, which is full of shipping information, goes off the bottom of the page, so it's going to have to scroll, and it does scroll, and it reads that pretty quickly. It's taken some screenshots, and now it's got to go to that other site I told it to go to and remember all that stuff to enter it in as a bill of lading into the other system.

All right, so it's putting in that new URL. Looking good so far, and now it's here in the bill of lading system. Let's see if it can figure out how to enter this bill of lading, and there it goes. It has remembered everything that it read on that PDF.

It's successful, and now it's asking me for a message. So, at this point, this is the human-in-the-loop step, where I want to review the information that it's submitting in the bill of lading. Everything looks good, so I say, "Submit it."

Lo and behold, it is submitted and the grunt work is done for me. Now, I mentioned that I built this user interface myself, but how did I do that? Well, I used a coding agent: GitHub Copilot Agent. Now, we are in VS Code, the coding environment. In fact, the version that I'm using here is VS Code Insiders. This is the preview build of VS Code, but it is available to everyone, and here in VS Code Insiders, if you have GitHub Copilot, you will find that at the very top of the screen, there is a little button with the GitHub Copilot icon.

When you click that button, you have the option to choose an item called "GitHub Copilot Edits." When you click "GitHub Copilot Edits," it opens up a chat interface on the right side of the screen. So, now on my screen, in the central area of the screen is the code, and on the right side of the screen is GitHub Copilot Agent, and the way that I wrote this user interface, well, I didn't really code it myself.

Instead, I gave it these instructions right here. I told it to build a user interface that renders this screen in the middle. The Responses API uses a protocol called "VNC" to connect to a remote computer or a virtual machine, so I told it to render that screen into the central panel, and it built that interface for me. Now, when I first built it and ran it, I discovered that the screen was not, in fact, rendering into that central area like I wanted it to, so I went back to the agent and told it that the screen was not rendering in the VNC area and to fix it, which it did, and that produced the interface that I showed here.

Coding agents like GitHub Copilot Agent can make building software much faster and much easier for everyone. Those are just some of the ways that we can use AI for accessibility. You know, I use AI all the time for all kinds of tasks, and you can, too. It's available now in Microsoft Windows and in Azure AI, in Copilot and in Microsoft Teams. It's everywhere.

Now is the time to give it a shot. Thank you all for joining us and have a good day.
