Envisioning Navigation: Harnessing AI for Assistive Solutions
Roya Kandalan is a senior machine learning researcher and an amazing Women Techmakers Ambassador who wants to make AI accessible to all, and she just led two Road to ML (Professional Machine Learning Engineer) courses for New York and Boston, so I love working with her. Roya, you have the floor.

Thank you very much for the introduction, and thank you for having me; I really appreciate it. We just listened to two very amazing talks by Peter and Alan, and I personally learned a lot from them for my professional work. I'm going to change gears a little bit here and look at how we can use AI for a different purpose: assistive navigation technologies to help people with visual impairment.

Before we dive into the technicality of this work, let me introduce myself. My name is Roya, and it means a very sweet dream in Farsi. I was born and raised in Tehran, and then I moved to Dallas to continue my grad studies under the supervision of Professor Kamesh Namuduri, a well-known scientist in the field of unmanned airborne vehicles. After graduation I moved to Boston, and currently I'm working as a senior research scientist at Aware, where we are working toward building biometric solutions that are free of bias and are impartial. I'm also a proud member of Women Techmakers Ambassadors, a program provided by Google to bring more women into STEM and to break the barriers that have historically been built against women working in STEM areas; we are trying to hold each other's hands, lift each other up, and grow together, because we are stronger together. I'm also a core organizer with GDG Cloud Boston, as Anna mentioned; we are trying to provide more education to the community and bring AI and different technologies to everybody around us, to make them really accessible for everybody.

So without further ado, let's start with how it all began. As I mentioned, I'm from Iran and my advisor is from India. I want you to imagine yourself in an area that is unknown to you, with your eyes closed, and I want you to really try to feel the fear you might feel. That fear is entirely reasonable, because vision is the most precise channel for gathering information and reasoning about location. People who have visual impairment use other senses to collect the information they need to understand where they are, and assistive technologies can help them gain this information. But most of these assistive technologies, and the city infrastructure that helps people with visual impairment navigate independently, belong to more developed countries. When a person is in a developing country, some of these assistive technologies can be really expensive to get, and sometimes they are not even available. Given my background, I think I was more exposed to this sort of challenge, so with the help of my advisor we set out to build a technology that is not just a theoretical exercise for a top-of-the-line smartphone, but something really affordable, so that people who may be having financial difficulty can also afford it.

To build this technology, we needed to answer a few fundamental questions. Remember that we are trying to locate a person in an area; we want to know the best path for them to reach their destination; and we need to make a decision about a map.
In addition to that, given the difficulties and challenges of visual impairment, we needed information about occlusions and dangers in the area. And on top of all of that, this information must somehow be conveyed to the user.

The initial design sketch we came up with is what you see on this slide. The main core block of the system is a wayfinding module: it is responsible for finding the user's initial location, interacting with the map, and navigating the user to their destination. Given that the user has visual difficulties, we also want to gather information about the dangers in their area and be able to pass all of this information compactly to the user.

Breaking it down, this problem has three stages. The first stage is pathfinding: as I mentioned, for pathfinding we need to know where the user is initially, because they have no clue about their starting location, and we also want to find the best path for the user to reach their destination and the best set of actions to follow to get there. In addition, we need a second module, obstacle avoidance, to first detect the obstacles in the area and second provide information about the relative location of those obstacles with respect to the user. Lastly, we want the user interface: we first give the user a choice of how much information they want to hear from us, we communicate the instructions about the actions they must take, and we receive input from the user. In this presentation I'm going to walk you through all of these stages and show how we tackled each problem; a sketch of how the three modules fit together follows below.
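To make that three-stage breakdown concrete, here is a minimal Python sketch of how the modules could be wired together. Every class name, method, and stub return value is hypothetical, for illustration only; this is not the code from the actual system.

```python
# A minimal sketch of the three-module design described above. All names
# and stub return values are hypothetical, for illustration only.

class Wayfinding:
    """Core block: localize the user, interact with the map, plan a path."""
    def locate(self, wifi_scan):
        return (0.0, 0.0)            # stub: x, y from the fingerprint model
    def plan(self, start, goal):
        return ["forward", "left"]   # stub: actions from cost-aware pathfinding

class ObstacleAvoidance:
    """Detect obstacles and estimate their location relative to the user."""
    def detect(self, camera_frame):
        return []                    # stub: (label, bearing) pairs from a detector

class UserInterface:
    """Speak instructions; accept spoken or tapped input."""
    def say(self, message):
        print(message)               # stands in for text-to-speech
    def ask_destination(self):
        return "room 112"            # stands in for speech recognition

wf, oa, ui = Wayfinding(), ObstacleAvoidance(), UserInterface()
ui.say("Destination set to: " + ui.ask_destination())
```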
Before we get into each of those problems: given our budget constraint, we needed to know the best platform for this work. Because the initial thought was to be helpful to people who might not have the financial means to get the best smartphone, we wanted this platform to be an accessible and affordable gadget. We wanted it to be able to do some minimal computation; for localization and other matters we required an internet connection; and given that we wanted to do localization and navigation, we wanted some inertial measurement units, or IMUs, to be able to reason about location over time. We decided to use the Samsung S7. This project started in 2016-2017, and at that time this phone cost only about 120 dollars, which was a relatively small amount of money for a smartphone. It could handle Android Marshmallow with an octa-core CPU and a tiny GPU, it supported WLAN and Bluetooth, and it had an accelerometer and a gyroscope, so all of the requirements we had for this platform were satisfied by this one.

Now that we know the problem, where we want to go, and what platform to build on, we are ready to dive into finding the best solution. The first piece of this proposal was to find the initial location of the user. As I mentioned, we are thinking about a user who cannot reason about location, so they have no clue where they are. Traditionally, people think about GPS or trilateration to reason about location. GPS signals are essentially satellite signals helping us figure out where we are and how to get to a destination; however, GPS signals are not available in an indoor scenario. And trilateration is heavily prone to noise, because of all kinds of signal-to-noise-ratio issues, shadowing, bouncing, and all the problems a signal can encounter when it tries to propagate in an indoor area; with all of this noise, we essentially cannot calculate geometrically where we are just by using trilateration.

There is another method, called fingerprinting. With fingerprinting we are trying to build a kind of barcode for every location in an indoor area. When you are in an indoor area and your phone is connected to Wi-Fi, you are constantly getting signals from all of the available Wi-Fi beacons; your phone measures which one has the highest strength and tries to use that one to connect, but you always receive all of this information. So if you were to walk through the hallways and collect the strength of all of the beacons available to you, you could build something called a radio map, which is a correlation, an association, between the real map of the area and the strength of each of these Wi-Fi beacons at each location. You are building a signal strength map.

It requires a lot of data collection, and data collection is always the most challenging part. The first step was for me to walk through the area of my choice, which was the Department of Electrical Engineering at the University of North Texas, and annotate the map of the building into one-foot squares. Then I needed to build an application so that somebody could walk through each annotated square and record the beacon signal strengths there. That was the first time I had to build an Android app; thanks to Dan Galpin, a Google Android developer advocate, and his amazing course on developing Android apps, I built my first app for this data collection. The app would allow me, or any user helping me with data collection, to enter the x, y coordinates of their location based on the annotation I had already made in the building, and it would collect the signal strengths of all the beacons that were visible. By walking through these hallways a thousand times, getting enough information, and doing some more processing on this data, I was able to pass it through a neural network and get the x, y coordinates of the user's location.
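As a rough illustration of that last step, here is a minimal sketch of training a small network to regress x, y coordinates from a vector of beacon signal strengths. The beacon count, network shape, and the synthetic stand-in data are all assumptions for the example, not the setup from the dissertation.

```python
# A minimal fingerprint-localization sketch: RSSI vector in, x-y out.
# The data here is synthetic; a real radio map pairs surveyed squares
# with the measured strength of every visible beacon.
import numpy as np
import tensorflow as tf

num_beacons = 30                                                         # assumed AP count
X = np.random.uniform(-100, -30, (5000, num_beacons)).astype("float32")  # dBm scans
y = np.random.uniform(0, 50, (5000, 2)).astype("float32")                # x, y labels

norm = tf.keras.layers.Normalization()
norm.adapt(X)                                   # scale raw RSSI values per beacon

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_beacons,)),
    norm,
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),                   # regress the x, y coordinate
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=64, verbose=0)

print(model.predict(X[:1], verbose=0))          # estimated position for one scan
```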
At this point I had checked off one piece of this puzzle: I could say I knew where my user was when they were seeking my help. But the bulk of the problem was still there; I needed to know how to take them from their initial location to their destination, or, quote unquote, navigation. Traditionally, navigation happens through pedestrian dead reckoning. That's a fancy way of saying that I know my stride length and which way I'm heading, and by putting one step after the other I can tell where I'm going. However, this method is prone to error, and there are two main sources of it. One is the compounding error that happens because we are building the next step on top of the prior step: if I have one degree of miscalculation in my first step and I take a hundred steps, instead of going to the north I might end up going to the south, or something along those lines. The second problem is measurement error. As I mentioned, on the phone we have an accelerometer and a gyroscope. The accelerometer measures acceleration; to get from acceleration to displacement, I have to integrate acceleration to get velocity, and integrate velocity to get displacement. That means a tiny error in acceleration can become a significant error in displacement.

So this raised the need for some error mitigation method, and the method I used was the particle filter. In 2016-2017, when I was working on this project, the Google self-driving car was the talk of the town, and Dr. Sebastian Thrun, who was a faculty member at Stanford while he was working on that project, had a course on navigation in which he discussed different methods of removing navigation error. One of the interesting methods he covered, after the traditional methods like the Kalman filter and the extended Kalman filter, was the particle filter. The idea is that you have all of these particles, and based on the observations and measurements each of them has a probability of continuing onward; by resampling and keeping only those with higher probability, you remove the particles that would do something impossible. The simplest way to think about it is that it lets you make sure your user is not on the second floor if you do not have enough measurements to support that your user passed a staircase. The particle filter is amazing because it can handle non-Gaussian noise and non-linear systems, which are the requirements for navigation in an indoor scenario; a toy example of the propagate, weight, and resample cycle follows below.
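Here is a toy sketch of that propagate, weight, resample cycle for 2D indoor tracking. The motion model, the walkable-area check, and the fusion with a Wi-Fi fingerprint estimate are simplified placeholders I've assumed for illustration, not the filter from the actual system.

```python
# A toy particle-filter update for indoor tracking: propagate every
# position hypothesis with the noisy dead-reckoning step, weight the
# hypotheses by how plausible they are, then resample.
import numpy as np

rng = np.random.default_rng(0)
N = 500  # particle count: the compute-versus-safety knob on a weak phone

def inside_walkable_area(p):
    # Placeholder floor plan: a 50 x 50 ft rectangle. A real map check
    # would also zero out particles that walk through walls.
    return (0 <= p[:, 0]) & (p[:, 0] <= 50) & (0 <= p[:, 1]) & (p[:, 1] <= 50)

def update(particles, stride, heading, wifi_estimate):
    # 1. Propagate with the (noisy) dead-reckoning motion estimate.
    motion = stride * np.array([np.cos(heading), np.sin(heading)])
    particles = particles + motion + rng.normal(0.0, 0.3, particles.shape)
    # 2. Weight: impossible positions get zero weight; agreement with
    #    the Wi-Fi fingerprint position earns a higher weight.
    dist = np.linalg.norm(particles - wifi_estimate, axis=1)
    weights = np.exp(-0.5 * (dist / 3.0) ** 2) * inside_walkable_area(particles)
    weights /= weights.sum()
    # 3. Resample: keep probable hypotheses, drop impossible ones.
    return particles[rng.choice(N, size=N, p=weights)]

particles = rng.uniform(0, 50, size=(N, 2))     # start fully uncertain
particles = update(particles, stride=2.0, heading=0.0,
                   wifi_estimate=np.array([25.0, 25.0]))
print(particles.mean(axis=0))                   # fused position estimate
```

The particle count N is exactly the compute-versus-safety trade-off that comes up in the Q&A at the end of this talk.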
At this point I was able to tell where my user was, and I was able to follow them along the path they were taking using the sensors on the phone and the error mitigation method. I still needed to find the best path for them to reach the destination. Right now, when you are driving and you use Google Maps, the map has a cost associated with your decisions: turning left is more costly than turning right, so if it can find a path that lets you turn right instead of left, it tries to do that. To do that, there is an algorithm called A*. A* allows you to find the best path not by requiring it to be the shortest path, but by associating a cost with every decision you make. Here, because I was dealing with a visually impaired individual, it was important to me that they not, for example, pass a staircase; that is essentially a costly action. So the A* method allowed me to guide my user along the best, least costly path to the destination; a small sketch with a penalized staircase cell follows below.
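Below is a small, self-contained A* sketch over a grid that shows the cost idea: a staircase cell is made expensive rather than forbidden outright. The grid, costs, and heuristic are illustrative assumptions, not values from the actual system.

```python
# A small A* sketch on a grid where a staircase cell is expensive rather
# than forbidden, so the planner prefers a safer detour.
import heapq

def a_star(start, goal, walkable, hazard_cost):
    def h(p):  # Manhattan-distance heuristic (admissible for unit steps)
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]
    best_g = {}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in best_g and best_g[node] <= g:
            continue
        best_g[node] = g
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in walkable:
                step = 1 + hazard_cost.get(nxt, 0)  # staircases cost extra
                heapq.heappush(frontier, (g + step + h(nxt), g + step,
                                          nxt, path + [nxt]))
    return None  # no route at all

walkable = {(x, y) for x in range(10) for y in range(3)}  # a 10 x 3 hallway
stairs = {(5, 1): 50}                                     # heavy staircase penalty
print(a_star((0, 1), (9, 1), walkable, stairs))           # detours around (5, 1)
```

Penalizing the staircase instead of removing it keeps the planner complete: if the staircase really is the only way through, the user can still be routed there, just at a high cost and with warnings.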
With that being said, I had a technology that allowed me to figure out where my user was and what set of actions they needed to take to reach the destination. It could be applicable to any person, for example someone in a foreign country, in an airport, who wants to find their destination gate: they don't know where they are or how to get where they want to go, but they have vision to reason about the dangers in their area. However, part of the assumption of this work was that the user is visually impaired, so they do not have information about the obstacles in their area. For obstacle avoidance, I needed to find these objects, these quote unquote obstacles, estimate the location of each obstacle relative to my user, and convey this information effectively to the user.

Obviously, object detection comes into the picture. For object detection I used YOLO, a TensorFlow version of YOLO that was available through Google APIs. And I cannot talk about TensorFlow without mentioning Laurence Moroney, whose courses on TensorFlow were really amazing for me in my professional career; I met him in person at I/O, and it was an amazing event for me. His TensorFlow courses allowed me to take YOLO and do transfer learning to introduce new objects that are not part of the YOLO package. For that I used Google Cloud Platform, which allowed me to take this model and retrain just the top few layers to introduce the new objects, and thanks to Google, the first three hundred dollars was free credit. Also in 2016-2017 we were introduced to TensorFlow Lite. It's an optimization and a wrapper for the neural network models you create, essentially representing them with a smaller number of bits, allowing you to take a gigantic model and make it tiny enough to fit on an Android device. That allowed me to have an obstacle avoidance module as well.

But everything so far conveys no information to the user, so I needed a way to communicate it. I wanted the user to be able to tell me how much information they wanted, whether they were on the run or wanted to explore the area; I wanted them to be able to tell me where they wanted to go; and I wanted to be able to pass all of the instructions to them. To pass information to the user, I naturally used Google Text-to-Speech. It allowed me to explain the options to the user and help them confirm some of the questions I had for them, and it also allowed me to give instructions and tell them about the obstacles in their way. The more complex part of this puzzle was how to get feedback from the user. The first thing a developer could do is tapping: you can always use taps on the phone's screen as a way of receiving feedback. I used it for emergency situations, for starting the app, or for the user to tell me they wanted some information right away. But it would have been impossible to finish this project if it weren't for Google's speech recognition API. People who are familiar with the field know that building a speech recognition API from scratch is immensely difficult, because the amount of annotated data required to cover the diversity of tones and accents is so large that it's not something one person can build. Thanks to Google's speech recognition API, I was able to take input from my user in a meaningful way and help them declare their destination.

With that being said, I was able to build a proof of concept for this technology, and I was able to show that by putting the right algorithms side by side, you can actually help a person with visual impairment navigate an indoor scenario independently. At this point I graduated, but to take this project and technology to the real world, two more problems must be tackled. First, when I was explaining the work, I mentioned that I had friends who were helping me with data collection to build the radio map, the association between the signal strengths of different Wi-Fi beacons and locations on the map. Practically, it's not possible to constantly update the radio map by hand, and because of the dynamic nature of Wi-Fi beacons, the radio map goes out of sync very quickly. So to really build this project in real life, we need some way of incentivizing people to help us collect this data and constantly update the radio map; essentially, we can use crowdsourcing to have other people help us with the updating. The second problem: among the fundamental questions, I mentioned that the use of a map was an open question, and in my explanation I used a priori information about the map, because I had the map. But if you want to use this for a new area, the map is not available. There is a method called structure from motion that helps you take a few pictures and turn them into a 3D map, and then a 2D map, that can be used for this project. With that being said, I'm pleased to say that I'm now a PhD committee member for the people who are continuing this work at the University of North Texas. Once again, thank you for being here and thank you for this opportunity, and I would be happy to answer any questions.

We have questions. Roya, that was awesome, thank you. One question for you: do you happen to have your recent paper, and maybe the repo, available for anybody who wants to play with it or collaborate with you?

I do not have a repository for this, but my dissertation with all of the code is available, I think, on the university website.

Awesome. And which chapter are you with, and do you want to tell a little about it? Because you've been training a lot of our community on some of these technologies on a regular basis.

Of course, I would love to. I'm a part of GDG Cloud Boston, and we are trying to help; I'm an ML scientist, obviously, so naturally I tend to teach ML. We recently finished the Road to ML certificate in Boston and New York, and I'm happy to announce that in late September we are starting a Road to TensorFlow certificate based on the book by Laurence Moroney.

Sorry, Alan, I cannot hear you.

Sorry about that. That was a great presentation, thank you very much. One of the questions I had while watching you go through this: what were some of the biggest challenges you faced as you went through it, or what were some of the setbacks you encountered, and how did you work past those?
To be honest with you, this project took me four years, my whole PhD. My whole PhD was working on this problem, and each of these algorithms required a learning curve for me: to get to know them, to find the right algorithm to do the work, and also to be able to build it on an Android device. As I mentioned, the Android device I was using was not computationally very powerful, so one thing I vividly remember was building the particle filter and finding the optimum number of particles I could use so that my user would still be safe but my device could handle the computation; there was basically a sweet spot, where the code wouldn't break and we also wouldn't lose precision.

Well, okay, thank you very much. Thank you very much.