Stanford Seminar - Personal Assistive Technology


Thank you all for coming, even though it's raining a bit this morning; I really appreciate you having me today. My name is Anhong; I'm an assistant professor in computer science and engineering at the University of Michigan, also affiliated with the School of Information. Today I'm going to talk about some of our lab's work on building and deploying assistive technologies to make the real world more accessible for people with disabilities, working towards this idea of personal accessibility through personal assistive technology.

Let's start with an example. The world is full of physical interfaces: microwaves, information kiosks, thermostats, checkout terminals. Many objects around us are also acquiring digital interfaces; even electronic toothbrushes and cooking pots now have a bit of a screen and some controls, and they're acquiring touchscreens too. So I want to ask you to think about what kind of assumptions these devices are making. For example, if you run into this kiosk, what ability assumptions is it making about you in order for you to interact with it?

[Audience] The user needs to be able to control it in some sense.

Anyone else?

[Audience] With the screen, you definitely have to be able to see.

Yes, you have to be able to see. If it makes sound, you should be able to hear and understand the sound and speech. It also requires you to be able to lift your arm, touch the interface, and tap precisely on the same point within a certain time span. So there are all of these assumptions made by this very simple user interface. Although these kinds of interfaces are wonderful for some people, and have benefits like being more flexible, they are also increasingly incomprehensible and unusable for others due to their ability assumptions: they require good vision, hearing, speech, and fine motor abilities.

We have figured out ways to make these devices accessible for specific populations. For example, for blind people we have self-voicing devices like talking microwaves, clocks, and scales, where the device reads out what is being pressed. For deafblind people, tactile markings or braille augmentations can make these devices more accessible, so people can use their tactile sensation to understand and operate these machines.

So how do we make things accessible today? Physical objects can be designed to be accessible for certain people, and on the other hand, digital information can be accessed through assistive technologies: we have screen readers, voice controls, physical controllers, and many other types of assistive technologies. To give you an example, here is a screen reader, which is a commonly used assistive technology for blind people. Let me play a clip: "Drag your finger around the screen to explore. Settings. App Store. Reminders. Mail. FaceTime. This works with almost any word, feature, or app name." With this accessibility layer, which works on mobile devices and also on desktop, users are able to interact with a lot of digital information. Another example of assistive technology, for deaf and hard of hearing people, is closed captions.
Here I show how closed captions worked back in the day on television, and also on YouTube, using a video of Michael giving a talk at Michigan a couple of years ago.

We have created accessibility and usability guidelines to help companies design products that are usable by more people. However, making things accessible this way requires each product to follow these guidelines. We have universal design principles such as perceptible information, tolerance for error, and flexibility in use, which you might have learned about; we also have WCAG, the Web Content Accessibility Guidelines, and ATAG, the Authoring Tool Accessibility Guidelines; and we have alt text for images, closed captions for videos and audio, and audio descriptions for videos. These are all amazing tools and guidelines we can use to make physical and digital information more accessible. However, in reality, only a very small percentage of products are actually accessible. Why?

As pointed out in a recent paper by Gregg Vanderheiden and co-authors, it is almost impossible to make each thing accessible under this paradigm. The current guidelines have hundreds of individual requirements, requiring every single developer and company to follow each one to improve their products, and yet the guidelines still do not cover all people, with all types, degrees, and combinations of disabilities. So essentially, requiring each product team and company to make their own product accessible for all people is not going to work.

I will add that anything physically situated in the real world, especially the devices I showed you, like the kiosk that's already in place or the microwave that's already installed in a kitchen, has to make assumptions about the user, and those assumptions are fixed. Think about the chair you're sitting on: that chair was not uniquely designed for you. Even though you can sit in it, it is not the most comfortable chair for you; it's designed to work for maybe 95% of the population, but it doesn't fit your needs the best. So anything physically situated is often fixed and has to make assumptions about the user.

And even if we could add screen readers, AssistiveTouch, and all these assistive technologies to, say, a cooking pot or anything with a display, that is also not going to work: first because of cost considerations, the hardware and software requirements, and the limited screen real estate for performing interactions; but also, if you are familiar with assistive technology, you know that people make deep customizations to their assistive tools—they customize the gestures and the settings for themselves. So there is no way to interact with something that's out in the real world and have it adapt to your needs directly, in the most deeply personalized way.

To address this, one framework is ability-based design; some of you might be familiar with it. The goal of ability-based design is to design things to accommodate the user's abilities, so that the system, which we call an ability-based system, can adapt based on the user. In some sense, the current digital accessibility ecosystem is already designed this way.
But breaking this down, I think an ability-based system really has two parts: one part is the physical and digital objects to be accessed, and the other part is the assistive technology layer that works with that information to present it to the user. In the context of digital objects such as mobile and web, this makes sense: you can surface the information in one way, and the user can use their own assistive technology to work with it. But when accessing the real world, this often fails, because the physical infrastructure is rigid and cannot adapt.

So our lab has been working towards this vision of personal assistive technology. I draw an analogy from mainframe computing, where many users accessed the same one device, and the transformation to personal computing, which really enabled users to consume, create, and innovate by themselves. Mapping this to personal assistive technology, the idea is that personal assistive technology aims to empower people with disabilities to access the physical and digital world on their own terms, using their preferred devices and modalities, and to leverage their domain expertise to create and customize technology for themselves.

I'm going to show you a couple of projects that map to how we think about this. Using the same illustration, we are now moving the assistive technology part closer to the user, so it becomes more personal, allowing it to access the rigid physical and digital objects that are already out there in the real world. I'll show a few examples of how personal assistive technology can let people use their own devices and modalities and augment their environments, as well as enable people with disabilities to make their own assistive technology and leverage broader context.

I'll start with the first one. Going back to this example: making all of these interfaces accessible has been a long-standing challenge, especially because there is a huge legacy problem of devices that are already out in the real world; they are not connected to the internet, and the user cannot control them from a different device. In our prior work, we developed systems to interpret these interfaces, whether static or dynamic, enabling blind people to independently access them. Similar to how the iOS screen reader works, where the user moves their finger on the digital interface to hear what is underneath it ("Settings. App Store. Reminders. Mail."), we mapped that experience onto physical interfaces: in the demo, the system reads out the labels under the finger—"kitchen timer," "two," "five," and so on—as the user explores a microwave panel. As you can see, when the user moves their finger over the physical interface, the system, VizLens, provides real-time feedback about what is underneath the user's finger, analogous to how a screen reader works for digital information. Here I also show a range of interfaces that this approach can work on.

The idea here is that we're able to bring the amazing affordances of digital accessibility into the real world, and we allow users to use their own device and their own preferred modality to access that information. Think about it: this user might prefer faster speech, this person might prefer tactile feedback, and this person may use a different assistive technology to get the output. They are able to transform the information they need to access in the physical environment into their preferred modality, and the smartphone they carry really knows them best; it has all the context and understanding of the user's abilities and preferences.
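To make that interaction loop concrete, here is a minimal sketch in Python of the kind of loop such a system might run. It is not the actual VizLens implementation: the control model, the fingertip localization, and the text-to-speech calls are placeholders I've assumed for illustration.

```python
# A minimal sketch (not the actual VizLens implementation) of the core
# interaction loop described above: given a labeled model of a physical
# interface and the user's fingertip position in the camera frame,
# announce the control under the finger, and only when it changes.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Control:
    label: str            # e.g. "kitchen timer", "2", "5"
    x: float              # center position in interface coordinates (0..1)
    y: float
    radius: float         # rough extent of the button

def control_under_finger(finger_xy: Tuple[float, float],
                         controls: List[Control]) -> Optional[Control]:
    """Return the labeled control closest to the fingertip, if close enough."""
    fx, fy = finger_xy
    best, best_dist = None, float("inf")
    for c in controls:
        d = ((c.x - fx) ** 2 + (c.y - fy) ** 2) ** 0.5
        if d < best_dist:
            best, best_dist = c, d
    return best if best is not None and best_dist <= best.radius else None

def interaction_loop(frames, controls, locate_finger, speak):
    """frames: iterable of camera frames; locate_finger and speak are
    placeholders for the perception and text-to-speech components."""
    last_label = None
    for frame in frames:
        finger = locate_finger(frame)     # fingertip in interface coords, or None
        if finger is None:
            continue
        hit = control_under_finger(finger, controls)
        label = hit.label if hit else None
        if label and label != last_label: # debounce: announce only on change
            speak(label)
            last_label = label
```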
About two years ago, in summer 2023, we deployed this app on the iOS App Store, and about 500 people have used it on a thousand different interfaces. Subsequently, perhaps coincidentally, Apple added a similar feature in the iOS accessibility settings. You get the idea: the user can use a similar interaction, although it still doesn't read what is under your finger; it first tells you what is above your finger, and then you have to know how much to move up to activate it.

Building on VizLens, we then developed a series of extensions to expand what kinds of interfaces, and in what modalities, the user can access. First, we developed an extension called Facade, which enables blind people to create 3D printed tactile overlays for their inaccessible appliances. They take a picture of the interface together with a reference object of known dimensions, like a dollar bill or a credit card; from that we can retrieve the button information and generate a 3D printed overlay that the user can then retrofit onto their appliance. This lets people augment their own environments using this approach.

We also extended this to work with more dynamic touchscreens. What we did is use videos of people interacting with a machine—maybe you used a coffee maker to order a latte, and somebody else came in and changed the settings—and gradually we collect those videos and reverse engineer the state machine of how the machine works. We can then present this through an audio interface to a blind user, so they can specify which operation they want, and the system guides them to complete that task step by step.
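As a rough illustration of how step-by-step guidance could be produced from such a reverse-engineered state machine, here is a small sketch: find the shortest sequence of button presses from the current screen to the screen that exposes the desired function, then announce one step at a time. The coffee-maker graph is invented for illustration and is not taken from the deployed system.

```python
# A minimal sketch (hypothetical, not the deployed system) of guidance from a
# reverse-engineered touchscreen state machine: breadth-first search for the
# shortest list of button presses from the current state to the target state.
from collections import deque
from typing import Dict, List, Tuple

# state machine: state -> list of (button_label, next_state)
StateGraph = Dict[str, List[Tuple[str, str]]]

def plan_steps(graph: StateGraph, current: str, target: str) -> List[str]:
    """Return the shortest sequence of button presses, or [] if none is known."""
    queue = deque([(current, [])])
    visited = {current}
    while queue:
        state, steps = queue.popleft()
        if state == target:
            return steps
        for button, nxt in graph.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + [button]))
    return []

# Example with a toy coffee-maker model (invented for illustration):
coffee_maker: StateGraph = {
    "home":       [("Menu", "menu")],
    "menu":       [("Latte", "latte_size"), ("Back", "home")],
    "latte_size": [("Large", "confirm"), ("Back", "menu")],
    "confirm":    [("Start", "brewing")],
}

for i, button in enumerate(plan_steps(coffee_maker, "home", "brewing"), 1):
    print(f"Step {i}: press '{button}'")   # these prompts would be spoken aloud
```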
More recently, we developed a phone case extension that we call BrushLens, which allows both blind users and motor-impaired people—people who have difficulty with fine motor control—to access touchscreen kiosks. The problem we're trying to solve is that in the previous extension, where a blind user operates a touchscreen by following audio guidance, the user still has to be very precise in moving their finger, and we found that this sometimes causes trouble: if the UI is particularly cluttered, users make mistakes, and it can take a lot of back and forth. So the question is: what if we use a hardware interaction proxy to automatically perform the actuation for the user, so the user can just focus on broad strokes, brushing over the interface without worrying about very fine motor control, and delegate the precise actuation to the system?

Let me play this clip: "We present BrushLens, a hardware interaction proxy to make touchscreen devices more accessible for people with diverse abilities. BrushLens uses multiple actuators to touch the screen. It automatically determines when to activate the actuators and precisely touches the screen on behalf of the user. We show BrushLens with solenoid and autoclicker actuators. Visually or motor impaired users can use the BrushLens accessible interface on their smartphones to explore and interact with inaccessible touchscreens."

In this case we built two different phone cases: one with mechanical actuators that can work with physical buttons as well as capacitive or conductive screens, and the other with autoclickers, which you may have seen used in phone farms—where people have smartphones running, say, TikTok videos, connected to autoclickers that interact with them automatically. We bought a bunch of those and built a phone case that can change its capacitance dynamically, so that as the user brushes over the screen, it delivers the touch at just the right moment.

The idea is that there are two different modes. One mode supports blind users: it maps the functions the user is trying to perform onto an interface they can control with VoiceOver. The other mode supports people who lack fine motor control: we create a custom interface that magnifies the buttons close to where the phone is, making it easier for the user to activate them. If you think about how this brings the assistive technology closer to the user: this interface can really know what the user needs, whether that's motor support, visual support, or something else, and it can translate the user's abilities into the interactions assumed by that physical object—in this case, the touchscreen out there.
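Here is a rough sketch of the "when to actuate" decision described in the clip. This is my own illustration, not the actual BrushLens firmware: the geometry, the phone tracking, and the fire() call are assumptions.

```python
# A rough sketch (hypothetical, not the actual BrushLens firmware) of the
# actuation decision: as the user brushes the phone case across the kiosk
# screen, fire an actuator only when that actuator's tip overlaps the target
# on-screen element.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Actuator:
    offset_x: float  # actuator tip position relative to the phone, in mm
    offset_y: float

@dataclass
class TargetElement:
    x: float          # target button center in kiosk-screen coordinates, in mm
    y: float
    tolerance: float  # how close the tip must be to press reliably

def actuator_to_fire(phone_xy: Tuple[float, float],
                     actuators: List[Actuator],
                     target: TargetElement) -> Optional[int]:
    """Return the index of an actuator whose tip is over the target, if any."""
    px, py = phone_xy  # phone position tracked relative to the kiosk screen
    for i, a in enumerate(actuators):
        tip_x, tip_y = px + a.offset_x, py + a.offset_y
        dist = ((tip_x - target.x) ** 2 + (tip_y - target.y) ** 2) ** 0.5
        if dist <= target.tolerance:
            return i
    return None

# Usage inside the tracking loop: fire at the right moment, then stop.
# idx = actuator_to_fire(current_phone_position, case_actuators, chosen_target)
# if idx is not None:
#     fire(idx)  # placeholder for the hardware call
```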
This line of work on providing assistance for dynamic physical interfaces can also be extended more broadly, to augment how people in general interact with touchscreens in the real world. For example, when configuring complicated medical devices, or when interacting with machines in a different language, this guidance can also be provided through an augmented reality overlay; it doesn't need to be audio or text. These overlays can take the form of visual indicators, animated instructions, or interface simplifications. We have also explored authoring tools for creating interactive AR tutorials through narration and demonstration; using the same backbone for modeling the interface structure, we can provide a step-by-step tutorial that guides the user through operations on, in this case, a printer.

Any questions so far?

[Audience] What is the class of people with motor challenges who can't use a touchscreen but could still hold the phone consistently enough to swipe it over the screen?

I would say it doesn't map to one particular condition, but we recruited participants who were able to, for example, rest their arm on the screen and move their hand around, even if not in a smooth motion—more of a zigzag. The augmentation reduces the error rate a lot, and users were able to complete the tasks with it.

[Audience] You motivated this with the personal computer, and one of the weird ironies is that the personal computer wasn't very personal; it was very generic, just cheap enough that everyone could get one. But I feel like your pitch here is essentially a bit more about mass customization. Is that fair?

Yes. With the personal computer, the analogy I'm drawing is that you can access the same content on the web using many very different extensions: you can change the contrast, have it read aloud at the same time, use dark mode or light mode, and so on, through the extensions on your browser or your computer. You can make deep customizations that really work for you, on a device that is personal to you, and then use it to access the vast amount of information out there.

[Audience] So I can install stuff on it, basically.

Yes.

[Audience] You also mentioned the microwave overlay. Was the idea to take a picture, ship it out to some third party, and receive the overlay, or was the person supposed to fabricate it on their own?

That's a great question. When we did that project, digital fabrication was advancing in parallel, and I think the vision is that a blind user may not have a 3D printer at home today, but may in the future. There are a lot of accessibility challenges there—how to configure the printer, how to remove the support material, and so on—but when we get there, this is something the user could do completely by themselves, or with a caregiver at home. For some of these devices we printed the overlays in the lab, or we used a service like 3D Hubs. Actually, in the video you saw, when they are about to print your object they send you a link and you can watch a live stream of it being printed. It's pretty cool.

[Audience] I really like the idea of thinking about changing the product and making it more accessible versus relying on these personal devices. Are there opportunities for hybrid approaches that meet in the middle—perhaps the product provides some set of metadata that makes it easier for the personal device to retain customizability while gaining the underlying digital information about the product, enabling it to do more?

That's a great point. Eventually, for this whole ecosystem to work—to make, let's say, 99% of devices and information accessible—it requires collaboration between all of these parties. But each individual company serving this information doesn't have to become an accessibility expert. This is what the Vanderheiden paper points out: they don't need a dedicated accessibility team to make their product accessible for every single population; they can just serve the information in the right way so that it can be consumed by a general information bot. The structure they propose is an infobot plus a UI generator, and with those two in combination you are able to pair them to work well together.
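As a toy illustration of that split—loosely based on the infobot / UI-generator idea mentioned above, with all names and data invented—a product could expose its functions as plain data, and the personal device could render them in whatever form suits the user:

```python
# A toy illustration (my own, not from the paper) of a product serving its
# functions as data, and a personal device generating a presentation that
# matches the user's profile.
from typing import Dict, List

# What a kiosk might "serve" instead of a fixed touchscreen layout.
kiosk_functions: List[Dict] = [
    {"id": "checkin", "label": "Check in for appointment"},
    {"id": "pay",     "label": "Pay a bill"},
    {"id": "wayfind", "label": "Get directions"},
]

def render_for_user(functions: List[Dict], profile: Dict) -> List[str]:
    """The 'UI generator' side: same information, different presentations."""
    if profile.get("screen_reader"):
        # Speakable menu, one item per utterance.
        return [f"Option {i + 1}: {f['label']}" for i, f in enumerate(functions)]
    if profile.get("large_targets"):
        # Labels to be drawn as oversized buttons for easier targeting.
        return [f["label"].upper() for f in functions]
    return [f["label"] for f in functions]

print(render_for_user(kiosk_functions, {"screen_reader": True}))
```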
Okay, so let's move on. I just showed you a few examples of how we empower people with disabilities to access physical information using their preferred device and modality. Next I want to ask: these assistive technologies—who developed them, who designed them?

[Audience] Apple. Companies, and maybe individual researchers.

Right—they are developed by designers, developers, researchers, and product teams. In practice, these technologies are typically designed for common use cases in order to maximize their broad applicability, just like your chair: it's trying to fit a majority of the population. But with this goal of maximizing broad applicability, the consequence is that they become one-size-fits-all and often inflexible, and they frequently fall short of supporting the unique needs and preferences of end users. End users cannot make changes to these technologies to really make them work for themselves.

So we conducted a series of qualitative studies with blind participants to understand the situations where these breakdowns happen, and how blind people want to customize or create new solutions. We found that people are already putting in significant effort to create workarounds to address this long tail of needs, but existing workarounds are often tedious, overwhelming, and ineffective.

To illustrate, I'll walk through one example given by a participant who wanted to use Seeing AI, the application developed by Microsoft, to sort mail. They pull up the app in short text mode and aim the camera to scan an envelope: they want to figure out whether the mail is for them, for their family, or for someone else. But when they do this scanning, instead of just hearing the information that's relevant to them—in this case just their name—they get frustrated by all the information the app provides, and often, because of camera aiming issues, it repeats over and over again, and they were not able to complete the task. In response, this participant imagined being able to create an assistive technology on top of Seeing AI that just tells them the one piece of information they want: their name. This is just one example of a blind participant engaging in this design and ideation process.

In this study we observed a range of breakdowns and strategies people have been using. For example, people want to find just their name or address, just the expiration date of a product, or just the number of the incoming bus. There is a huge variety of things people want to filter for that they cannot do with today's general-purpose OCR or text-reading applications.

Blind people also often have to switch between multiple services to achieve one goal. For example, one of our participants was sorting mail in different languages—some in English, some in Arabic—and they had this trick of using Seeing AI to read the mail if it is in English, and Envision AI if it is in Arabic, because they know each works well in that situation. So they have to do this manually: use one app, figure out what language it is, and then switch to the other app.
Or, in another case, when a blind user is getting a scene description from a generative AI service such as Be My AI, powered by GPT-4o, they have very detailed strategies for dealing with potential AI errors: in high-stakes situations they cross-check with other AI services, like Claude, or sometimes fall back to human assistance. People also often layer multiple services. For example, when blind people are navigating unfamiliar spaces, they will use a combination of applications: Google Maps to get directions, BlindSquare to learn more about the landmarks, and OKO specifically to cross streets. So users are already doing this manual process of switching between and combining different applications and services.

From cases like these, as well as from prior work, we know that blind people are really the domain experts in envisioning, designing, and hacking assistive technology, but because of the current limitations of this design and hacking process, individual needs often go unaddressed. So our goal is to imagine how we can empower blind people to leverage their expertise and creativity to create custom mobile assistive technology that addresses these needs. Instead of having to channel these needs to a developer or a company, which then creates something that works for a range of needs, what if we give the tools to the people so they can create their own solutions? That's what we're imagining: going beyond consumption to creation. Personal assistive technology here also aims to empower people to leverage their expertise to create technologies for themselves—essentially, this assistive technology layer is something they can change and create themselves. Going back to Michael's question: not only can they install things, maybe they can make those extensions and program those extensions.

End-user programming is a method that supports non-professionals in creating software artifacts to achieve goals within their own domains of expertise, and it is potentially suitable for this. But it is critical to make these end-user creation processes and tools approachable, accessible, and expressive enough for blind people. The traditional paradigm of end-user programming often sets too high a barrier for blind people to create assistive technology, so we need to figure out the right abstraction. That's what we investigated in our recent work, so that blind people could create filters such as "find my name on envelope," to hear just their name, or, when sorting groceries, "find date on grocery item."

Let me show you how such a program works once it is created: "Runs the program. Pause running, button. No date found on grocery product. Found date: January 10th, 2024. Brown. Found date: January 10th, 2024. Yellow, orange." The user runs this program, "find date on grocery item," aims at the packages or groceries they have, turns them around, and hears a live update of just the right information they want.
To develop this application, we had three design goals. First, we want to make it expressive, meaning it should support a wide range of needs and be easy to extend to more models and tasks in the future. Second, we want to make it approachable, meaning that for blind people who have very limited or no programming experience, it should be understandable and provide easy entry points to create and make changes. And third is accessibility, which means that on the one hand it should be accessible to screen readers so blind people can use it, and on the other hand it should provide sufficient contextual information to help people understand what's going on—even when the object they are trying to find is not in the field of view, it should provide surrounding context to help them aim the camera and eventually get the task done.

To make it expressive, we used all the applicable scenarios from the previous qualitative study and derived a program representation to support this kind of visual information filtering task. The program follows the pattern "find something on something," and each "something" can have modifiers denoting color, location, size, or other attributes. With this very simple structure, people can create assistive programs to find the number on the bus, find the largest text on a poster, or find the address on an envelope. It can also be easily extended in the future to support new models and object classes, as well as perhaps alternative verbs besides "find."

[Audience] Are those dropdowns pre-populated—is it a closed vocabulary? For example, was "license plate" built in, or is that something the end user provides?

I will show you that, but it depends on the models you plug in. If you use YOLO, it gives you a set of object classes the user can choose from; if you use a combination of different models, you get more variety; and if you use a VLM, it's more flexible.

When running a filtering program, the ProgramAlly application iteratively finds the items and provides information to the user, and when the target—say, the number—is not found but the bus is found, it also provides contextual information to help the user know what is happening right now; maybe they see the grocery item and then keep turning it to find the expiration date. One side effect we found in our study is that this may actually reduce hallucination from VLM models: if you just ask the model a question, it sometimes tells you something that is not part of the image, but plugging this more structured filtering approach into a vision language model lets it reason better—this is what I see, and that's the number on this bus, not a number somewhere else, maybe on a different vehicle. So we first find the bus, then find the number within it, read these things to the user, and the user can make changes.
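To make the representation and the structured filtering loop concrete, here is a minimal sketch in Python. It is my illustration, not the actual ProgramAlly code; the detector and reader are placeholders for whatever models are plugged in (an object detector plus OCR, or a VLM).

```python
# A minimal sketch of a "find <target> on <container>" program and the
# structured filtering loop described above. The detect/read callables are
# placeholders for the plugged-in models.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Entity:
    name: str                      # e.g. "number", "bus", "date", "grocery item"
    attributes: List[str] = field(default_factory=list)  # e.g. ["largest"], ["red"]

@dataclass
class FilterProgram:
    target: Entity                 # what to find ...
    container: Entity              # ... on what

def run_filter(frame,
               program: FilterProgram,
               detect: Callable,        # detect(frame, entity) -> region or None
               read: Callable) -> str:  # read(region, entity) -> text or None
    """One pass over a camera frame, returning what should be spoken."""
    container_region = detect(frame, program.container)
    if container_region is None:
        # Nothing relevant in view: give aiming context rather than silence.
        return f"No {program.container.name} in view"
    # Restrict the search to the container region: this is the structured
    # filtering step that keeps the answer grounded in the right object.
    result = read(container_region, program.target)
    if result is None:
        return f"Found {program.container.name}, no {program.target.name} yet"
    return f"Found {program.target.name}: {result}"

# Example program: "find date on grocery item"
# prog = FilterProgram(target=Entity("date"), container=Entity("grocery item"))
# speak(run_filter(camera_frame, prog, detect, read))
```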
In order to support approachability and accessibility—that is, to support users in creating these applications—we developed a set of multimodal end-user programming interfaces: block-based programming, natural language programming, and programming by example.

In the block mode, users create the program using dropdowns, similar to how you might configure an iOS Shortcut or an IFTTT (If This Then That) applet. This gives you the most control and precision, but it takes a bit longer because you have to navigate through all of these interfaces. In the second, question mode, the user can just speak their question: "I want to find the number on the bus," or "I want to know whether this mail is for me." We then use fine-tuning to generate candidate programs for the user, which they can continue to edit and use if they want. The third, explore mode, takes inspiration from programming by example: the user first aims the camera at a scene or object and gets a bunch of output, and once they hear something they want—say, the item "73," which they know is the bus number—they can go back, select that item, and the system generates a program from the tree representation of the scene; it becomes "find number on bus." So these are the three different creation methods we built.

We then conducted a user study with 12 blind participants to see whether this approach really works and can support accessibility use cases that were not possible before, and what the preferences and trade-offs are among the three creation methods. What we found is that each of these creation interfaces has its own strengths and challenges. The block mode, although generally a bit more time consuming, was appreciated by participants for being very precise and giving them control over the final program; it is especially useful when they already have something specific in mind. The question mode was found to be very intuitive—you just speak what you want, and it can generate programs that potentially match your goal—although sometimes the output was not what they intended and required more back and forth. I don't think this is the ultimate form of the natural language mode; there is more work to be done on how you transition between these different ways of creating a program to best match the user's intent. And the explore mode was found to be particularly helpful in situations of unknown unknowns: participants aim the camera, discover something they didn't know about before, or an attribute they hadn't thought of, and use that in their programs to serve their needs. What this means is that providing a combination of multiple options is important for making this end-user programming possible and approachable for users with a range of use cases and abilities.
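Here is a rough sketch of the explore-mode idea described above: the scene is kept as a tree of recognized items, and selecting one item yields a "find X on Y" program generalized from that example. The tree structure and the generalization heuristic are assumptions I've made for illustration, not the actual ProgramAlly logic.

```python
# A rough sketch (hypothetical) of programming by example in explore mode:
# selecting a recognized item produces a reusable "find <item> on <parent>"
# program.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SceneNode:
    label: str                         # e.g. "bus", "73"
    kind: str                          # e.g. "object", "text"
    parent: Optional["SceneNode"] = None
    children: List["SceneNode"] = field(default_factory=list)

    def add(self, child: "SceneNode") -> "SceneNode":
        child.parent = self
        self.children.append(child)
        return child

def program_from_example(selected: SceneNode) -> str:
    """Generalize a selected scene item into a reusable filter program."""
    # Generalize concrete text (e.g. "73") to its category (e.g. "number").
    target = selected.kind if selected.kind == "text" else selected.label
    target = "number" if selected.label.isdigit() else target
    container = selected.parent.label if selected.parent else "scene"
    return f"find {target} on {container}"

# Example: the user hears "73" while exploring and selects it.
scene = SceneNode("scene", "object")
bus = scene.add(SceneNode("bus", "object"))
number = bus.add(SceneNode("73", "text"))
print(program_from_example(number))   # -> "find number on bus"
```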
As participant 4 put it: "It's all contextual. It depends on what you want to do. I would use different methods based on what I know about the environment. If you already have an image you're working with, for example, and you want to explore it, then that's what I would use. It really depends on the situation." Again, this highlights that you need to provide multiple modalities and ways for users to create and author, so that it suits their technical background and whatever they are trying to do in situ.

Finally, participants saw the benefit of this approach, and I really like this quote: "It all comes down to providing choice. Ultimately you're putting the information in the person's hands to choose. It is creating modularity to access the information. I wish more creators or assistive technology companies thought about how we can take these pieces of information and put them in the hands of the people who need them, so that they can then modify it, change it, and make it their own."

Going back to the different strategies and breakdowns identified in the qualitative study: ProgramAlly, the tool I just showed you, addresses the first gap—general-purpose recognizers not supporting specific use cases—but looking into the future there are a lot of opportunities to support the other breakdowns and strategies as well. For example, people manually switch between multiple services, or layer multiple applications together: how can we let people author these workarounds at the operating system level, combining that information? Some of our ongoing explorations look into building snippets of assistive technology extensions that work on top of the existing commercial ecosystem. For instance, there might be one thing the user needs on top of Be My Eyes, like help aiming their camera; we can use Shortcuts automation to quickly help the user achieve that and bring them back to the application they're already using.

Okay, any questions at this point? Okay. So I've shared how personal assistive technology can support users in using their own devices and modalities and augmenting their own environments, and this last example helps illustrate how we can enable people with disabilities to make their own AI assistive software. There are a lot of potential avenues here. For example, to broaden this to more people, different communities could create and share solutions, reuse them, and develop templates, and we could build a community for people to tap into this knowledge source and enable knowledge sharing.

One thing I touched on briefly is intent: many of these programming methods aim at capturing what the user wants to do and helping them externalize that in the most natural way, whether through natural language, programming by example, or direct creation.
In our recent work, we also explored this idea of leveraging broader context in a project called WorldScribe. WorldScribe is a system that generates live visual descriptions that adapt to the user's intent, movement, and visual and auditory context. What you see here is that as the user moves around—if they are turning quickly, walking quickly, or the visual scene is changing rapidly—we provide short bits and snippets of information, and when the user starts to focus on a certain object or scene in front of them, more details are provided. If you think about the current way of using, say, GPT-4 or Be My AI to get an image description, it is essentially unaware of what the user is doing and provides a very long description, so when people use these assistive tools they often have to stop, ask questions, and work turn by turn to complete their task. In WorldScribe, we instead map the description to the user's context: the user can specify their intent, and based on how dynamic the environment is and how they are moving, we map the description's granularity, or length, to what they are doing in the moment. We also adapt the presentation of the description: for example, if music starts playing or the environment becomes noisier, we raise the volume of the description to make it easier to hear, and if someone starts talking to them and they have a conversation, we pause the description so as not to disrupt what is happening, which the user probably considers more important.
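A simplified sketch of that adaptation policy is below. The thresholds and the specific context signals are invented for illustration; this is not the actual WorldScribe policy.

```python
# A simplified sketch (invented thresholds, not the actual WorldScribe policy)
# of the adaptation described above: pick how much description to give based
# on how fast the user and scene are changing, and adjust presentation based
# on the auditory context.
from dataclasses import dataclass

@dataclass
class Context:
    user_speed: float          # rough movement speed, m/s
    scene_change: float        # 0..1, how much the view changed recently
    dwell_seconds: float       # how long the user has held the camera steady
    ambient_noise_db: float
    in_conversation: bool

def description_granularity(ctx: Context) -> str:
    """More motion/change -> shorter descriptions; steady focus -> more detail."""
    if ctx.user_speed > 1.0 or ctx.scene_change > 0.5:
        return "keywords"          # e.g. "crosswalk, bus stop"
    if ctx.dwell_seconds < 3.0:
        return "one_sentence"      # a brief summary of the scene
    return "detailed"              # richer description of the focused object

def presentation(ctx: Context) -> dict:
    """Pause during conversation; speak louder in noisy environments."""
    if ctx.in_conversation:
        return {"paused": True, "volume": 0.0}
    volume = 0.6 if ctx.ambient_noise_db < 60 else 0.9
    return {"paused": False, "volume": volume}

# Example: walking quickly down a noisy street.
ctx = Context(user_speed=1.4, scene_change=0.7, dwell_seconds=0.5,
              ambient_noise_db=72, in_conversation=False)
print(description_granularity(ctx), presentation(ctx))  # keywords, louder voice
```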
Overall, I've given you a bunch of examples of personal assistive technology. As we move assistive technology closer to the user, it becomes more personal, enabling them to create, modify, and use their preferred devices, and I believe it will move us towards a more accessible future that can be deeply personalized. But I also want to take a step back, because I believe this approach can make things both deeply personalized and more broadly, universally applicable—and accessibility ultimately benefits everyone. So I want to give a few examples of how these approaches might generalize beyond people who are blind, and beyond people with disabilities.

For example, multimodal task guidance systems like VizLens and BrushLens, which I showed earlier, might make interfaces more usable and easier to use for everyone. My lab has been starting to work on projects related to task guidance in the medical domain, and there is a lot of commonality there: the users—surgeons and other medical professionals—are often hands-busy and eyes-busy and under very high cognitive load. In some of our prior work we have also looked at industrial settings, supporting warehouse workers doing order picking. So many of these multimodal task guidance systems may have implications beyond accessibility, in other domains. And end-user assistive tools like ProgramAlly may generally lower the barrier to creating end-user workflows by leveraging multiple modalities. For example, there is some ongoing work in the deaf and hard of hearing space: what if deaf people could use different sound models to construct their own sound-understanding software more easily?

The commonality here, I believe, is that we can better leverage people's abilities and expertise in situ, empower them to create and consume with greater agency, and then morph the interaction experience accordingly by providing more dynamic adaptations.

All of this work would not be possible without the amazing students I've worked with, and beyond what I shared today there are many other directions I didn't cover—I would love to talk with you offline about them. So now I'm open for questions. [Applause]

All right, we've got about 10 minutes for discussion.

[Audience] Thank you so much for the presentation. My question is about how you imagine the adaptation would work for people who have been used to their normal style of living for a very long time but are now introduced to this kind of new technology, especially things they may not be used to, like programming new tools. Would it be too demanding for some of them, and how do you support them in that adaptation process?

That's a great question. For the ProgramAlly work, for example, the goal is that the user authors a program once and can then use it in many repetitive or time-sensitive situations; or perhaps we can leverage the broader community and crowdsource these solutions, so they don't even need to create the very specific ones themselves. What we found in the qualitative study is that people are already doing these workarounds every single time, so if we give them this tool, they will be able to create something that becomes a better solution than the workaround they're already using. But we would love to deploy this in the real world, collect data, and see whether people use it in the ways we identified here.

[Audience] Thank you for a great talk. I noticed that in the first three projects people were using the phone to point at something, so they needed the ability to aim and point it, but in the last one it was a head-mounted or body-mounted camera. I'm curious about your thoughts on the hardware: the weight of the phone, whether people prefer to point it, and whether there are small systems available that you can detach.

About the form factor and the device: when we did the original VizLens project, we actually experimented with running it on Google Glass, but back then the hardware was not good enough—it overheated very quickly, and there was no TalkBack, no screen reader, on Google Glass even though it was running Android. So we fell back to a device many people already have, a smartphone, and we chose iOS because it is the smartphone platform that blind people prefer.
But I don't think that is the only choice. For the last one, the live visual descriptions, you could also use a phone, or a wearable camera. As wearable glasses with cameras, speakers, and processing power become increasingly available and more powerful, I think that would be a great form factor to support many of these.

[Audience] Is there one that people like?

The Meta Ray-Ban glasses are a very promising option, and I think they are partnering with Be My Eyes, if I remember correctly, and also Aira. They don't yet have an open API for the video streaming mode, but I've seen people create hacks and workarounds: they start a live-streaming session through Instagram or WhatsApp, intercept that feed, process it, and give back the audio. People are figuring out creative ways to do it, but hopefully one day it will be open, and then we will be able to build many assistive technologies on top of it.

[Audience] People use a lot of these technologies in real life—what about monetization? What would the monetization be for a product like this, to make sure it's affordable but has enough funding to be maintained?

So you're asking about the business model, how to make these sustainable. I think the example of VizLens is perhaps a good outcome: we developed the initial prototypes and deployed them, there were a lot of discussions on online forums such as AppleVis about how to use it, and Apple eventually built the capability into their operating system. It still only works with devices with LiDAR, but that's a big step forward, and people can use it for free. So that's perhaps a good model. And even when we put these apps out there, we make them free, to help more people rather than trying to sell them.

[Audience] For your work on understanding personalization in context, are there any methodological approaches you've found yourself gravitating towards for understanding these kinds of considerations?

I'd love to talk with you more about that. End-user programming has a rich body of literature on how to make systems "low floor, high ceiling, and wide walls," so there's a lot of commonality, although making that work for people with disabilities raises its own challenges—for example, think about blind people trying to create visual assistive technologies without being able to see the output. There's a verification and feedback loop that's missing, so how do we support that? I think there are interesting challenges there.

[Audience] In reference to that last comment, I'm curious, especially with ProgramAlly, whether you saw debugging, and if so how it worked. I'm not sure people were creating programs complex enough to need really complicated debugging, but I imagine that if you're using a screen reader, debugging is probably harder, because that's linear information processing, whereas visuals aren't.
In the paper we describe this a bit more. We do see debugging, and even cross-checking, where people use the context they already have to validate their understanding. And we think there are opportunities—for example, combining it with WorldScribe: you have this live camera feed, you author the program, and then you can run it and use the feed to check whether your program is doing what you expect. There might be other ways too, like generating a simulated environment for the user, or using AR to tap into spatial understanding. I think there are some interesting approaches to explore there.

[Audience] I'm curious to what extent you think the gaps that remain are underlying perceptual gaps in the models—the model simply cannot recognize what is needed for many of these tasks, and we have to wait for it to get there—versus essentially a matter of filtering and shaping. What I took away from the WorldScribe work was, at some level, that the model was tuned to produce the wrong kind of output in some cases, and if we just did a simple step of fine-tuning it for what people actually need, we could do much better. Where do you see the big blocker: is it still at the perceptual level, or is it in how the output is synthesized and presented?

I think in general—maybe in 90-plus percent of cases—the perceptual capability of the models is already there, but there's a gap between what the model is capable of and how the user can apply that to something they care about. The current interaction modality has been this turn-by-turn approach: I ask a question, I get a response. Even the advanced voice-with-video mode, which gets pretty good, is still a Q&A, back-and-forth approach.

[Audience] That's an optimistic take—and that's a good thing—because it suggests that through design, application programming, and so on, we could actually cross that gap, whereas if the perceptual layer is just not capable of it, then I think we're in trouble.

Yes, I think the perceptual side is there now, and we can catch up by providing these tools and better leveraging AI capabilities for HCI use cases. Once we get there, we might find further gaps in either place, and then we can iteratively improve both sides.

[Audience] One other question I was chewing on throughout the talk, on the DIY, end-user programming side: one of the things that you know as well as I do is that people tend not to customize stuff. Do you feel there are opportunities here to break through that barrier? It's been very challenging before—no one changes any of the settings in Microsoft Word or any of these tools—but for this vision to come true we need to help people feel empowered to shape them. How do we cross that gap? Maybe it's more of an affective gap than anything about the tech per se.

I think if there is going to be a breakthrough, I would be optimistic that it will happen in the accessibility space first, because people with disabilities are often at the forefront of adopting AI technologies early on.
This comes from an article written by Jeff Bigham and Patrick Carrington back at HCIC. The idea is that before a technology is perfect, people with disabilities are able to use it to enable something they could not do before—going from not possible at all to maybe 60 or 70 percent—but only when that technology, such as speech recognition, becomes 95 or 99 percent accurate does the mass public adopt it. So by building assistive technologies and working with people with disabilities, you are first able to provide technology enablers that improve their lives, and at the same time you really get into the deep human aspects of how to design these technologies for everyone. That's how I see accessibility benefiting everyone.

All right, let's thank our speaker.
