[MUSIC] AILSA LEEN: Hello everybody, and thank you for joining our breakout session today at the 2023 Ability Summit. My name's Ailsa Leen. I'm a Senior Design Program Manager working on accessibility and inclusive design at Azure AI. I'm so excited to be here today moderating this breakout session on how artificial intelligence in your organization is the future of accessible tech. As you all know, AI has been all over the news these days.
We hope this session will be both timely and relevant for you. The goal of today's session is to show you that any organization can get involved in this moment, and leverage AI to drive improvements in accessibility and inclusion. I hope by the end of the session, you'll leave inspired and excited and realize that you can easily leverage AI in your own organizations without needing a specialist team of data scientists. We've got some great presenters today who will all bring a different perspective on this topic.
Without further ado, I'll let them introduce themselves. MARCO CASALAINA: Hello, I'm Marco Casalaina. I'm Vice President of Products for Azure AI.
SHAKUL SONKER: Hello, I'm Shakul Sonker. I'm one of the co-founders at I-Stem, where we develop technology to solve digital accessibility challenges. ANNA ZETTERSTEN: Hello, my name is Anna Zettersten. I'm the Head of the Languages and Accessibility Department at Swedish Television, SVT, in Sweden. AILSA LEEN: Thanks everyone.
Today we'll start by hearing from Marco about how Microsoft Azure AI can be used to benefit accessibility. Then we'll hear from Shakul and Anna about the ways that they each use AI in their organizations. They come from different organizations, and everyone has really interesting stories to share about the use of AI. Marco, why don't you take us away? MARCO CASALAINA: Thank you Ailsa. I work on the Azure AI team here at Microsoft. Azure AI is the collection of artificial intelligence services offered by Microsoft as part of the Azure cloud platform. These services make it easy for developers and organizations to add AI to their applications and services without needing deep AI expertise.
My work focuses on Azure Cognitive Services, speech, language, vision, decision, and more recently, OpenAI. These are pre-built, pre-trained models built by our incredible data scientists. Now any developer or organization can add these capabilities into their app just by calling an API. You can use them out of the box or you can customize them for your specific use case. If you want to go deeper, you can build and train your own models using Azure machine learning. But you could do an awful lot of innovation without having to get that technical.
Now you'll notice that a lot of these cognitive services capabilities map directly to human senses or human capabilities: vision, speech, decision-making. At Microsoft, we often talk about how disability isn't caused by a personal health condition, but by a mismatch between a person and the system they're using. That's why I'm so excited about the use of cognitive services and AI for accessibility. With capabilities like speech to text, or image descriptions, or language translation, we can directly address these mismatches. Here are some common examples of how Azure AI is used for accessibility. These use cases can be found in our own Microsoft products but are also used by many of our customers.
We have live captions, which benefit accessibility for deaf and hard-of-hearing users; content reading for people with a vision disability or neurodivergence; translation for non-native speakers; voice input or dictation for mobility-impaired users; and descriptions of images and videos for people with a vision disability. Speaking of inclusive design, these are all great examples of features that enhance accessibility and make our products easier to use for everybody.
I know I love using captions in Teams when I'm speaking with a worldwide audience; it really helps me reduce my cognitive load. Captioning is a vital tool to create accessibility for people who are deaf or hard of hearing. Peloton is a company that sells exercise bikes and also exercise classes, which are part of their signature experience. They use Azure speech to text to caption their live classes, which weren't previously accessible. They're able to customize our speech model to ensure Peloton-specific exercise phrases are recognized correctly. Anna will go into more detail for us later on how to implement captioning.
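To make that concrete, here is a minimal sketch (not Peloton's actual setup) of how a developer might call Azure speech to text from Python, with an optional Custom Speech deployment for domain-specific phrases; the key, region, and endpoint ID below are placeholders.

```python
# pip install azure-cognitiveservices-speech
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials; substitute your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="<your-speech-key>", region="<your-region>")
speech_config.speech_recognition_language = "en-US"

# Optional: point at a Custom Speech deployment trained on domain phrases
# (hypothetical ID shown; omit this line to use the base model).
speech_config.endpoint_id = "<your-custom-speech-endpoint-id>"

audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Print each recognized utterance as it is finalized, as a simple live caption feed.
recognizer.recognized.connect(lambda evt: print(evt.result.text))
recognizer.start_continuous_recognition()
input("Captioning from the microphone... press Enter to stop.\n")
recognizer.stop_continuous_recognition()
```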
Another example of where AI can have a huge impact on accessibility is in computer vision. Just yesterday, we announced the public preview of a new Computer Vision API that has made a huge leap forward in the quality and specificity of image descriptions. We're using this new model to improve automatic image descriptions across Microsoft, including in Seeing AI, and now have customers like Reddit using it in their products too. Here are some examples that I think really underline how exciting this technology is and how much improved these descriptions will be for blind and low-vision users. Where the model would previously have described the image shown as a person wearing red shoes, which is true, it's now able to provide the specific details that a human would be likely to describe: for example, that the person is a child, and that they're playing hopscotch.
The model's able to identify the colored squares painted on the sidewalk as hopscotch, which displays a much improved level of understanding of the world. This aerial shot of a parking lot was previously described as a large collection of batteries. Now, we're able to correctly describe this image as a parking lot full of cars. Lastly, we're able to correctly describe a pipette adding liquid into a tray, which is a piece of scientific equipment that I know many people in my team wouldn't be able to name correctly. These improved descriptions will soon be making their way into all of our Microsoft products and can be leveraged by any developer with a simple API call.
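As an illustration of what an image-description call looks like, here is a minimal sketch using the generally available Computer Vision describe operation from Python; the newer model Marco mentions is surfaced through the Image Analysis preview, and the endpoint, key, and image URL below are placeholders.

```python
# pip install azure-cognitiveservices-vision-computervision
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Placeholder endpoint and key for your Computer Vision resource.
client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("<your-vision-key>"),
)

# Ask the service to describe a publicly reachable image (hypothetical URL).
analysis = client.describe_image("https://example.com/hopscotch.jpg", max_candidates=1)

for caption in analysis.captions:
    # Each caption comes with a confidence score between 0 and 1.
    print(f"{caption.text} (confidence {caption.confidence:.2f})")
```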
Another vision-based solution that can have a huge impact is Form Recognizer. This is an example of applied AI. We've packaged our computer vision technologies to allow organizations to easily extract structured information from unstructured content, like images of documents. The information in this business card can be captured into a structured form, with name, phone number, email address, and company name all stored separately. Shakul will be able to talk to us about how his company is using Form Recognizer technology to remediate inaccessible documents.
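For example, here is a minimal sketch, under placeholder credentials and a hypothetical image URL, of how the prebuilt business card model in Form Recognizer can be called from Python to get those structured fields back.

```python
# pip install azure-ai-formrecognizer
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for your Form Recognizer resource.
client = DocumentAnalysisClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    AzureKeyCredential("<your-form-recognizer-key>"),
)

# Analyze a business card image at a hypothetical URL with the prebuilt model.
poller = client.begin_analyze_document_from_url(
    "prebuilt-businessCard", "https://example.com/business-card.jpg"
)
result = poller.result()

for card in result.documents:
    # Fields such as contact names, company names, and emails come back as structured values.
    for name, field in card.fields.items():
        print(name, "->", field.content)
```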
Lastly, and you may have heard a lot about this one recently: Azure OpenAI. Microsoft has partnered with OpenAI to accelerate the development and use of their models. While this is a new and constantly evolving area, it's also a very exciting one for accessibility. Technologies like GPT, Codex, and DALL-E are generative tools.
They can help you write content, write code, or generate images. They're exciting because they're essentially assistive technology for everyone. Natural language interactions like chat interfaces can make things easier and more efficient for everyone, but can have a particular impact on people with disabilities. For example, earlier in the Ability Summit, we talked about Hey GitHub! GitHub used a combination of our Azure speech to text and OpenAI's Codex to create a voice-powered, natural-language-to-code editor. This eliminates the need for typing, and helps people with mobility disabilities or dyslexia to code more efficiently.
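Hey GitHub! itself pairs Azure speech to text with Codex; as a rough, hedged sketch of the generative half of that idea, here is how an Azure OpenAI chat deployment could turn a plain-English instruction into code. The endpoint, key, deployment name, and API version below are placeholders, and this uses the 0.x openai Python library that was current when this session aired.

```python
# pip install openai  (0.x-style library, as used at the time of this session)
import openai

# Placeholder Azure OpenAI resource details; not real endpoints or deployments.
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-03-15-preview"
openai.api_key = "<your-azure-openai-key>"

# Turn a spoken or typed instruction into code, similar in spirit to Hey GitHub!
response = openai.ChatCompletion.create(
    engine="<your-deployment-name>",  # the deployment you created in Azure OpenAI Studio
    messages=[
        {"role": "system", "content": "You translate plain-English instructions into Python code."},
        {"role": "user", "content": "Write a function that reverses the words in a sentence."},
    ],
)
print(response.choices[0].message["content"])
```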
DALL-E can help people with vision disabilities generate visual content using text. GPT-3.5 is being used for intelligent recap in Teams meetings to automatically generate notes and let you focus on the meeting discussion. I'm really excited about this one myself. We're also working hard to make sure that these AI models generate content that's representative of people with disabilities and in service of this community. That was a lot of information and a lot of different technologies. You may be asking, how do I get started? We're releasing this toolkit as part of our ongoing journey with the accessibility evolution model.
We hope this can be a resource for your organization to accelerate your own progress for accessibility and inclusion in a sustainable way. It also builds on all the learnings we've taken from our work with our customers across both non-profits and startups. We hope that this will help you build on your understanding of accessibility and create a framework for accessibility innovation. It includes practical tips, case studies, as well as specific tools like datasets, APIs, and relevant research.
Our toolkit helps you work through your innovation ideas to ensure they truly serve the disability community, with questions like: identifying the opportunity for your potential product or service (what needs of people with disabilities are currently not addressed?); establishing the potential customers (can you identify one person for whom you're solving a problem?); and validating your solution or prototype by co-creating with people with disabilities and testing with them as well, so you can identify potential barriers for your users. That was my overview of Azure AI and all the different ways that organizations are already using us for accessibility. It's inspiring and exciting stuff. But I always find that the most interesting part is in the details.
What actually happens when you start testing the technology and speaking to users for feedback? I'm looking forward to hearing from the rest of our presenters today to get into that. AILSA LEEN: Yes, thanks, Marco, and what a good transition. Thank you so much for sharing the Accessibility Innovation Toolkit, I think that seems awesome. Speaking of getting into the details, I'd love to hear from Shakul. Shakul, would you like to tell us more about I-Stem? SHAKUL SONKER: Hi. Thank you so much.
At I-Stem, our vision really is to provide accessible documents and accessible content to anyone who needs them, so that no one faces what we in our community call the book famine. Talking about our own background, I-Stem basically started four and a half years ago as a self-advocacy group, where we really wanted to create awareness, especially in STEM areas, for blind students in India specifically. We started by organizing several awareness-creating events such as inclusive hackathons, tech training initiatives, internship and recruitment initiatives, and so on and so forth. But we soon realized that if we want to create a bigger impact, or if we want to really scale up what we are doing, we need to use our own technical skills and develop something that can solve a bigger problem.
One of the biggest problems that we observed within the blind community is inaccessible content. When I say inaccessible content, I mean content that cannot be accessed with the assistive technology that people who are blind or have vision disabilities use in their day-to-day life, such as screen readers. In 2020, we started developing a solution using the latest in AI: an automated document converter that can convert inaccessible content into accessible formats, so users can simply upload their inaccessible file onto the portal and download an accessible output.
But we soon realized that accessible and usable documents are not sufficient. We also need to create a portal or a platform where users can find 100 percent accurate documents, because AI cannot be 100 percent accurate all the time. This is when we came up with the remediation editor. The remediation editor couples human intelligence with the latest in AI and provides an editor that can help you fix a document.
This is also an AI-powered tool. It provides a simple plug-and-play model where you can remediate, edit, or fix anything in your document that the AI might have left for you. For users, the process to convert a document is simple: come to our portal, upload a document, choose an output format, and get an accessible output document. If they are not satisfied with the automated result, they can simply escalate the document, and it is sent to a remediator. The remediator's job starts from there.
Remediation is a two-step process. At step 1, a remediator fixes all the layout and reading order issues. At step 2, they fix all the textual issues, and that's how they complete the document. This completed document is 100 percent accessible, usable, consumable, and accurate.
Basically, we build on top of Azure OCR, especially its layout detection models. Remediators then work on that output during remediation, vetting it and creating a new document, which also makes the whole process of solving this problem faster.
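This is not I-Stem's actual pipeline, but a minimal sketch of the kind of layout output Azure's Form Recognizer layout model returns, which a remediation editor could then map to headings, paragraphs, and tables in an accessible document; the resource details and file name are placeholders.

```python
# pip install azure-ai-formrecognizer
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholder resource details for a Form Recognizer resource.
client = DocumentAnalysisClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    AzureKeyCredential("<your-key>"),
)

# Run layout analysis on a scanned document (hypothetical local file).
with open("scanned-textbook.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", f)
result = poller.result()

# Paragraphs come back in reading order, with roles such as "title" or "sectionHeading"
# that can be mapped to headings in an accessible output document.
for paragraph in result.paragraphs:
    print(paragraph.role or "body", ":", paragraph.content)

for table in result.tables:
    print(f"Table with {table.row_count} rows and {table.column_count} columns")
```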
Talking about our customers, I-Stem works really closely with DPOs, corporates, higher educational institutes, and other assistive tech companies. Founded by people with disabilities ourselves, I-Stem has it in its core that we really wanted to co-create our product.
From day 1, we wanted to incorporate the feedback, suggestions, and recommendations of our users, because it not only helps us develop a better product, but also helps us understand our users and create user-centric offerings in the future. For example, last year, in 2022, my primary goal was really to travel across the country, visit different places, meet as many people with disabilities as possible, and take their feedback on our solution. That was a very eye-opening activity for me, because I could hear from many people how they think about our solution and what other feedback they may have. Talking about the driving factors for innovation at I-Stem, we basically build on two things. First, whether our solution is grounded in the lived experience of our own users. Second, and the most important factor for us, the iterative process.
This is what our development cycle looks like. We start with a prototype, go to our users, ask them questions, get their opinions, come back and decide on the next features we need to include in our offerings, and go back to our community. After multiple cycles, we then launch our product to the public, collect some more feedback, and come back and see if we can improve the offerings that we have. This approach has not only helped us drive innovation at I-Stem, but also helped us understand our users better and create a user-centric product that is inclusive, accessible, and usable for all. That is what our journey looks like. Thank you so much. AILSA LEEN: Thanks so much, Shakul.
In particular, I loved hearing about how grounded you are in your users' needs at I-Stem and about the co-creation process; I think we should all develop our products with that level of iteration. Anna, you provide a different perspective. Please tell us about implementing accessibility at the national broadcasting level. ANNA ZETTERSTEN: Yes, thank you. SVT is the Swedish public service broadcaster. We provide about 22,000 hours of content for the audience a year, of which 7,000 hours is content bought in from other countries such as the United States, Australia, India, France, Germany, all over the place, and 15,000 hours is Swedish own-produced material.
Of those, 5,000 hours are live and 10,000 hours are pre-produced content. Among these 5,000 hours of live content there are mostly news and sports, but we were missing one part of the news, and that is the local news, the regional channels: twenty-one channels all across Sweden that we didn't have any closed captioning for. Here in Sweden, the government requires the public service companies to provide 100 percent closed captioning of the pre-produced Swedish material and about 80 percent of the live material. The local news has been an exception.
It is still an exception in those demands, but we very much wanted to give the audience this much-asked-for service, to give the viewers closed captioning for the local news. But how to do it when there are 21 parallel transmissions seven times a day? In comes AI. We have been using it for two years now; we started testing in 2019, and we have had regular publishing since 1 January 2021. Two years now, and we have learned a lot. How did we do it? How did we choose which AI to use, which speech to text provider? We did a lot of testing.
We used Word Error Rate, WER, which is a way to measure how many mistakes the AI makes compared to a ground-truth text. But we also used our own human intelligence to look at the closed captions, to be sure that they meet our standard of readability, the demands we at SVT have, before providing them to the audience. We had quite a few speech to text providers to choose from. But for this particular local news, we are using Azure, and we are very pleased with the improvement that the Azure AI has made over those two years, even though there are still, of course, things to learn.
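For readers who want to reproduce this kind of measurement, here is a small, self-contained sketch of WER as word-level edit distance divided by the number of reference words; it is illustrative only and not SVT's evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: two edits against a five-word reference gives a WER of 0.4.
print(word_error_rate("vi ses i morgon igen", "vi ses imorgon igen"))
```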
It matches what the audience needs today so much better than when we first started testing, before we began regularly publishing. We also ran a survey after about six months. We asked the audience, and in particular the audience with hearing disabilities.
We formed a collaboration with the organization for people with hearing impairment in Sweden, and we got lots and lots of answers because this is a topic that really engages people. A bit of a surprise for us working in this field of closed captioning and localization: we found out that what the audience disagreed with most of all was when text was missing. They want the closed captioning to follow really closely what people are saying, even when people are speaking backwards, or stuttering, or making other mistakes. This particular target group wants it to follow the speech, which is not so strange when you think about it, because many of those people have some hearing, and it's much easier to follow when the closed captioning follows the speech. But we also found many more interesting findings with this survey, and we are constantly having a dialogue among ourselves and with Azure about how to improve the AI.
Readability, as I mentioned, is our number one topic for this. The workflow is quite simple. As soon as a local news broadcast is finished, for example at 9:05 in the morning (they are three minutes long), the whole of the audio goes to Azure. The AI turns it into a transcript. Then it is formatted into closed captions, the way we have chosen the subtitles should look, and sent back.
About one minute after the transmission is finished, the three-minute transcript is in place. It's not live live; it's live in the making, but not in the transmission. But the audience doesn't mind, because lots of people watch their local news when it suits them. They don't sit there thinking, "It starts at 9:00 and now it finishes at 9:05." They watch it at 10:00 or 12:15 or whenever.
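SVT's production system is its own, but as a hedged sketch of the general pattern (transcribe a finished audio file, then format the recognized segments as timed captions), the following assumes placeholder credentials and file names and writes WebVTT cues from the Speech SDK's offsets and durations.

```python
# pip install azure-cognitiveservices-speech
import threading
import azure.cognitiveservices.speech as speechsdk

def transcribe_to_vtt(audio_path: str, vtt_path: str) -> None:
    # Placeholder key and region; Swedish is set to match the local-news use case.
    speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
    speech_config.speech_recognition_language = "sv-SE"
    audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    cues = []
    done = threading.Event()

    def to_timestamp(ticks: int) -> str:
        # The SDK reports offsets and durations in 100-nanosecond ticks.
        seconds = ticks / 10_000_000
        h, rem = divmod(seconds, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

    def on_recognized(evt):
        result = evt.result
        if result.reason == speechsdk.ResultReason.RecognizedSpeech and result.text:
            cues.append((to_timestamp(result.offset),
                         to_timestamp(result.offset + result.duration),
                         result.text))

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()  # block until the whole file has been processed
    recognizer.stop_continuous_recognition()

    with open(vtt_path, "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n")
        for start, end, text in cues:
            f.write(f"{start} --> {end}\n{text}\n\n")

transcribe_to_vtt("local-news-0905.wav", "local-news-0905.vtt")
```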
Mistakes and learnings. In the beginning, we agreed to put on a "filthy filter": not to let those words out in the open, but instead to cover them up with stars or fences (hash signs). It has worked well. Of course, the AI makes mistakes, because Swedish is a foreign language for the AI; every language is foreign for it. But its Swedish has improved very well.
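The Speech SDK exposes a profanity option that can mask flagged words with asterisks, which is one way such a filter could be switched on; this is a sketch with placeholder credentials, not SVT's actual configuration.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials; sv-SE matches the Swedish local-news use case.
speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
speech_config.speech_recognition_language = "sv-SE"

# Mask recognized profanity instead of spelling it out in the captions.
# Other options are ProfanityOption.Removed and ProfanityOption.Raw.
speech_config.set_profanity(speechsdk.ProfanityOption.Masked)
```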
We also put in a lot of data. As you can imagine, in local news there are lots of names of lakes and cities and forests and people, and a lot of data is making it more and more specific to the local regions. Local also means dialects. In Sweden, of course, we have lots of people with different ways of speaking, different dialects. Or you have broken Swedish, or something that sticks out, like saying f instead of s, and so forth.
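Training a custom model with Azure, as SVT did, is the heavier-weight route. Purely as an illustration, a lighter option in the Speech SDK is a phrase list that biases recognition toward known local names; the names and credentials below are just examples.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials for a Speech resource.
speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
speech_config.speech_recognition_language = "sv-SE"
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

# Bias recognition toward local place names the base model might otherwise miss.
phrase_list = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
for place in ["Vättern", "Örnsköldsvik", "Norrbotten"]:
    phrase_list.addPhrase(place)
```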
The AI is improving even on this. You can train it, which we have done together with Azure in Stockholm. We have trained the AI on the southern accents, which are a bit difficult for the AI to understand. All in all, we're pleased with the improvements and looking forward to taking on larger challenges, such as correct punctuation, which of course increases the readability. Also the golden challenge, which is editing.
Being able to edit like a person, a human, can do: taking away small words and changing the way of speaking when someone is not grammatically correct and it endangers the readability. Also speaker shift, being able to do diarization, so you understand who is saying what. That is where we are now after two years, and we are already moving forward to the next step, which is live live. We made some tests here in January, over three weeks, on a show that is very slow-speaking, with mostly one person at a time speaking, and conferences as long as three hours.
It could be any topic. It could be the war in Ukraine. It could be about horses.
It could be about daily politics in Sweden. It could be about hunting or whatever. Lots of data needed and lots of training to catch up with.
But we are impressed by the ability that Azure has shown in the live captioning. Now we're discussing what to do as the next step. That's where we are now.
AILSA LEEN: Great. Thank you so much, Anna. I'm so glad to hear that Azure speech to text has been standing up to your testing; with all the dialects and place names and accents, it's a challenge for sure.
I'm very excited to hear how live broadcasts go too. I'd say one theme I'm definitely hearing across all our presenters today is the importance of working with customers or end-users to gather that feedback. Like that's definitely something we heard from both I-Stem and SVT.
It's so important to not just use the technology, but actually understand how it's being used and what people with disabilities want and their accessibility needs. Today we heard a lot about AI and how it's not just the technology of the future, but it's a set of tools and technologies that we have available to us today with such great opportunities for impact on accessibility. Thinking about the future though, I'd love to ask each of you, starting with Marco, what are you most excited about when it comes to the future of AI and accessibility? MARCO CASALAINA: Well, I've got to say that I am super excited about the new conversational AI capability in Bing.
If you haven't seen that, you've got to try it. I think it changes how search works and it can really make it more accessible for all. AILSA LEEN: Alright, thanks. Shakul?
SHAKUL SONKER: For me, really, it's the latest in Form Recognizer, because of the key-value pairs, tables, and other layout elements that we are able to retrieve using Form Recognizer. It's really essential for us because it makes our job easier. What happens then? When we start the remediation editor, we automatically detect all these layout elements for the remediators, so that they don't have to put in a lot of work. That has really helped us. AILSA LEEN: Thanks.
I know you've been working hard with the Form Recognizer team to provide feedback and help us improve. We heard the same from Anna earlier. SHAKUL SONKER: We actually have been sharing a lot of data with the Form Recognizer team to further fine-tune the models. AILSA LEEN: Anna, how about you? ANNA ZETTERSTEN: I am looking forward to seeing what AI can do in translation. I didn't mention translation earlier, and that's because we don't use AI for publishing. But we are, of course, using AI to provide the translators with material.
It's the augmented translator we talk about; it's like a boost for the translators. Seeing how it develops in translation between two languages, but also when several languages are involved, would be really exciting. But first and foremost, we are really looking forward to being able to use AI in live transmissions, as I told you about with the testing for the live show.
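As a hedged sketch of what an "augmented translator" call could look like, the Azure Translator v3 REST API returns a machine draft that a human translator can then post-edit; the key and region below are placeholders.

```python
# pip install requests
import requests

# Placeholder key and region for an Azure Translator resource.
TRANSLATOR_KEY = "<your-translator-key>"
TRANSLATOR_REGION = "<your-region>"

def draft_translation(text: str, source: str = "sv", target: str = "en") -> str:
    """Return a machine draft that a human translator can post-edit."""
    response = requests.post(
        "https://api.cognitive.microsofttranslator.com/translate",
        params={"api-version": "3.0", "from": source, "to": target},
        headers={
            "Ocp-Apim-Subscription-Key": TRANSLATOR_KEY,
            "Ocp-Apim-Subscription-Region": TRANSLATOR_REGION,
            "Content-Type": "application/json",
        },
        json=[{"text": text}],
    )
    response.raise_for_status()
    return response.json()[0]["translations"][0]["text"]

print(draft_translation("Lokala nyheter klockan nio."))
```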
SHAKUL SONKER: I just want to quickly highlight: when we think about accessibility and assistive tech, we tend to see only organizations working in accessibility and associated domains, nonprofits or charity organizations, which is not true all the time. With the help of technology like what Azure is doing, we are seeing a lot of startups coming up in this field and doing some great work. Yes, you can create and generate impact, but also make some money out there. AILSA LEEN: I think one thing we've definitely talked about today, and are all very aware of, is how AI is transforming industries as we know them at the moment.
It's all over the news. Lots of things are going to change, and for sure the accessibility industry will be transformed too. I'd love to hear from each of you on what you think the future of the accessibility industry will be. What different organizations might emerge? What are your predictions for the future there? Marco, how about you? MARCO CASALAINA: I would suggest that the future of the accessibility industry is actually the future of the software industry.
These AI capabilities give you the ability to reach people no matter where they are, no matter what their abilities are, no matter what language they speak. It's changing the world. SHAKUL SONKER: I couldn't agree more with Marco. The latest in AI not only solves accessibility problems; when you try to think about accessibility, you actually tend to think wider and solve bigger problems. With AI you can not only solve this problem, but also make your solution accessible without even putting in a lot of engineering effort, which is great. Earlier, people used to think that they needed to make a big investment to make their offering or product accessible, usable, and inclusive for all.
But that's not a question anymore. You can invest the least and get the most output using the latest in AI. I'm really excited to see that. ANNA ZETTERSTEN: I have really high hopes that the tech will help make people more equal. Whether you have a disability or abilities, or any ability at all, you should be able to feel more equal when you're in a dialogue with each other. I think if the tech industry, the accessibility organizations, and others cooperate, we will be able to find ways to communicate with each other without the hindrances of disabilities and abilities standing in our way.
That's a hope. I'm not sure we're going in that direction, but something to hope for. AILSA LEEN: Thank you so much for sharing all of you. Thank you, Marco. Thank you, Shakul. Thank you, Anna. A lot of wise words and some really inspiring thoughts for the future.
If you feel as inspired as I have by what we shared today and you'd like to learn more, I have two things to share with you. The first is you can learn more about Azure AI and how it is used for accessibility at aka.MS/AzureAIa11y. That's where you can find out all about the different cognitive services we've talked about today and get hands-on.
You can also check out the innovation toolkit and our framework for accessibility innovation at aka.MS/innovationtoolkit. We'll also share these links in the chat, and our presentation will be available online at the Ability Summit website, so you can access the links there. Thank you so much, again, to everyone. Thanks for attending our breakout session and listening to our thoughts about AI and the future.
Please enjoy the rest of the summit.