As Lisa mentioned, I'm the Vice President and General Manager of Global Education and Literacy at Benetech. Our goal in life is to scale solutions for underserved populations, and in global education and literacy we believe that access to information is fundamental to what we need as human beings. We use technology to drive lasting social change so everyone can learn, work, and pursue their dreams, regardless of ability or disability, and reach their potential.

Many of you have heard of Bookshare; some of you may not have. We're the world's largest library of ebooks for people with reading barriers: blindness, low vision, dyslexia or another learning disability, or a mobility impairment that keeps you from using a printed book. If you can't hold the book, can't see the book, or can't decode the book, you qualify for Bookshare. If you don't have a disability, you cannot qualify; it's a very special library that serves people with disabilities only. In doing that, we operate under U.S. copyright law, and now an international treaty, that allows us to convert books into multiple formats with publisher permission or even without it. So we have books in audio; books in what I like to call ebook karaoke, synchronized text and audio; electronic braille; an industry-standard publishing format; and even a Microsoft Word version of every book, which is a great format for a screen reader, software that reads what's on your screen.

We can convert any book published in the United States without any permission. The flip side is that we much prefer working with publishers, and you'll see some of those statistics later: most of our books now come directly from publishers as donations. They say, "This is a great thing; help us reach your audience with our books." They give us the books, we convert them into multiple formats, and then we let people access them in those formats.

Another thing we do is make sure people can read on any commonly used device: a phone, and we don't care whether it's Android or iOS, a computer, a tablet, or assistive technology devices. I mentioned ebraille, and people ask, "What's electronic braille?" It runs on a little device anywhere from the size of three or four cell phones stacked up to the size of a standard keyboard; it's basically a reverse typewriter for braille. The pins pop up underneath your fingers, you hit next line, and you can just read the book. It stores hundreds of books on the device and lets people read in braille. So we support any kind of device, online and offline. We want people to read their way, and that's super important.

Personalized learning is critical. Part of our funding comes from the United States Department of Education, the Office of Special Education Programs, and students read in different ways: some read with their fingers, some with their eyes, some with their ears. We want to let each student personalize their learning experience. Our goal is any book, anytime, anywhere, in the format they want, on the device of their choice. That's how Bookshare has grown, and in fact over the last 15 to 20 years we've delivered over 17 million accessible ebook downloads to people.
We have over 800,000 Bookshare users in 95 different countries. I mentioned our publisher partners: we work with over 900 publishers who donate content to us. We also still purchase books and scan them, but the vast majority of our titles now come from publisher partners, and we add something like 10,000 titles a month to the collection. We have books in 47 different languages. If a publisher gives us The Little Prince in French, we'll put it in; if they give it to us in English, we'll put it in the collection; if they give it to us in Tamil or Marathi or Bengali or German, whatever it is, we have books in a lot of different languages. And not only do we have the largest collection of accessible books in the world, we power 15 national libraries around the world, including some of the larger libraries in Canada, Australia, and the UK, plus libraries in the Middle East, Africa, and Southeast Asia, all running on the same back-end technology that powers Bookshare.

There is what you might call a new normal, or a now normal because it might be different tomorrow, called distance learning due to a pandemic, and we've had record growth through that pandemic as people have tried to figure out how best to get books to their students. Bookshare has been a fantastic solution for them, whether students are in the classroom or at home trying to get that same education; Bookshare lets you download books either way. Through both Bookshare and our partners we manage over 1.5 million books, most of them in those five different formats, so it's really 5 million different reading options for people with disabilities.

So that's a little bit about Bookshare, but let's get into the meat of the presentation: why AI is important for what we do. Remember, I said up front that we scale solutions using technology for underserved populations, and folks with reading barriers are very much an underserved population. The World Blind Union states that 95 percent or more of all content is locked in printed form. For people who browse the shelves at a bookstore and pull a book off to buy, or walk into the public library, find a book, and pull it off the shelf, those books are locked away from people with visual disabilities. So it's very much an underserved population.

As we convert books for that population, there are multiple stages to it. Text conversion is largely solved. Optical character recognition, scanning a picture of a page and turning the words back into readable text, dates to the 1920s, when early machines converted text into telegraph code; then in the seventies and again in the eighties improvements were made, by Kurzweil in the seventies and by a company called Calera in the 1980s. Interestingly, Jim Fruchtermann was at Calera; he founded Benetech and Bookshare in 2000.
Text in multiple languages is also largely solved. There are hundreds of languages you can scan and convert back into readable format: script-based languages, Roman character sets, languages that use a lot of diacritics, like Arabic. The character recognition, the OCR, uses a rudimentary form of AI in that it can learn and improve. So text is largely solved, and scale is largely solved. Google's OCR engine is pretty good, and you can scan every one of your documents as it comes into Google Drive; as of May 2017 Google was storing over 2 trillion files. I would submit that scale is solved. We import about 10,000 books every single month and convert them into multiple formats, so scale isn't much of an issue when it comes to text.

But what if it's not text? What if it's STEM, or STEAM: science, technology, engineering, art, and math? That's a lot more challenging. On this slide I have some math equations, a picture of the Mona Lisa, the chemical formula for caffeine, which is near and dear to my heart, the chloroplast stroma energy release, and the cell reproduction cycle. That's not scannable; those can't be turned into words very easily. So all of a sudden you need different solutions. You put alt text on an image to give a quick description, but "a picture of a woman" isn't very descriptive when you're talking about one of the most famous paintings ever; it probably deserves a long description. How do you describe that math equation sitting up there, or a chemistry formula? Charts and graphs, drawings and art, engineering schematics: that STEAM content is hard. What that means is the work becomes very manual, very expensive, and very slow, and because of that, especially in the global south, in a lot of lower-income countries, STEM topics are not taught to persons with a disability after their primary school education, because they don't have the materials. You cannot study math without special dispensation, because you cannot get access to those materials.

But what if we could automate it the way we did text? Remember, when Bookshare started in 2000, about 20 years ago, it started by using the ebook as the core format, and that's an important element. Instead of taking a book and having a human narrate a recording, or manually transcribing the braille, we took the ebook format and automatically converted it: text to speech for an audio version, electronic braille, and over time more and more formats to support more and more types of disabilities.
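To make that text-conversion pipeline concrete, here is a minimal sketch of automated format conversion, assuming the book text has already been extracted from the publisher's ebook file. The library choices (pyttsx3 for text-to-speech, the liblouis Python bindings for braille translation) are illustrative assumptions for this sketch, not Bookshare's actual production stack.

```python
# A minimal sketch: one extracted chapter of text goes out as both a spoken
# audio file and contracted braille. Library choices are assumptions.
import pyttsx3   # offline text-to-speech
import louis     # Python bindings for the liblouis braille translator

def to_audio(text: str, out_path: str) -> None:
    """Render plain text to a spoken audio file with a generic TTS voice."""
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)
    engine.runAndWait()

def to_braille(text: str, table: str = "en-ueb-g2.ctb") -> str:
    """Translate plain text into contracted Unified English Braille."""
    return louis.translateString([table], text)

chapter = "Call me Ishmael. Some years ago, never mind how long precisely..."
to_audio(chapter, "chapter01.wav")
print(to_braille(chapter))
```

The point of the sketch is simply that once the source is a clean ebook, each additional format is an automated transformation rather than a hand-crafted one.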
What if we could automate STEM like that? That brings us to using AI to do it. So, just a quick primer, and I apologize, this will be super basic for some people and I hope informative for others. AI is kind of an all-encompassing phrase that covers a bunch of different technologies. You can think of AI as the self-driving car, because it uses machine learning, computer vision, natural language processing, neural networks, and a bunch of other technologies. Let me focus on these.

Machine learning is computer algorithms that improve automatically through experience. An example of that is a classification engine: what category does something belong in? If I can train my model to say this is a dog and this is not a dog, and I show it a thousand pictures of dogs and then show it an elephant, it should be able to say, "That doesn't look like those other things; which of these doesn't belong?" All of a sudden you can use it as a classification engine, and you don't have to show it a picture it's already seen: it knows that a dog is furry and an elephant is not, so that's not a dog (there's a small code sketch of this after the primer). You can also use machine learning in regression analysis, say to predict the probability that an internet purchase is going to be fraudulent, or in decision trees, things like that.

Computer vision is the automatic extraction, analysis, and understanding of useful information from an image or sequence of images. A question actually came into chat before the presentation, and I love it: can we use computer vision to describe graphs? We'll talk a little bit about it, but the sneak peek, spoiler alert: yes, you can. The way it works is that it examines the data in very small blocks to determine pixel density. In a graph, which areas of the image have ink and which do not? When you understand, block by block, that this area has black pixels and this one doesn't, you can start to recreate the graph and describe it that way. We'll talk a little more about that kind of description.

Natural language processing is reading and understanding human language. This is probably the most used of all of these right now, because of Amazon Alexa and the Google Assistant. It assigns probabilities to a given sequence of words, and then you layer in pattern recognition and machine learning. If I say "elephant" or "telephone" really fast, a very rudimentary engine might not be able to tell the difference between them, but it learns over time. We use it in data mining, machine translation, context-sensitive descriptions, and of course speech recognition.

The last piece is neural networks, a type of machine learning that attempts to recreate the way a human brain processes. It links together a bunch of different nodes, or even a bunch of other neural networks. One way to think about it, and I don't know why I'm hung up on elephants today: if you ask someone to describe an elephant but they can only feel the leg, close your eyes and tell me about this elephant, they won't be able to describe the whole animal. But if you put multiple people on it, each describing the part they can feel, they can piece together what it is, and that begins to get toward a neural network. Even more important is what happens when the elephant starts to move: if each of those people can communicate about how the elephant is walking, you can start to put that whole process together by linking different nodes and networks.
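Here is the toy sketch promised above, in the spirit of the "dog / not a dog" example. The two numeric features (furriness, shoulder height) and the training data are invented for illustration; a real classification engine would learn from image pixels, not hand-picked numbers.

```python
# A toy classification engine: learn "dog" vs "not a dog" from two features.
from sklearn.linear_model import LogisticRegression
import numpy as np

# feature vectors: [furriness 0-1, shoulder height in meters]
X = np.array([
    [0.90, 0.5], [0.80, 0.6], [0.95, 0.4], [0.85, 0.7],   # dogs
    [0.10, 3.0], [0.20, 2.8], [0.05, 3.2], [0.15, 2.9],   # elephants
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])                    # 1 = dog, 0 = not a dog

model = LogisticRegression().fit(X, y)

# An animal the model has never seen: small and furry, so probably a dog.
unknown = np.array([[0.88, 0.55]])
print(model.predict(unknown))         # -> [1]
print(model.predict_proba(unknown))   # per-class confidence, like the ratings discussed later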
Each of those technologies within artificial intelligence is used pretty extensively in some of the things we're doing within Bookshare. So: blah blah blah machine learning, blah blah blah train the model, blah blah blah competing standards. What if we could use some of these techniques to address these challenges?

Imagine a book that comes to us from a publisher. Because it comes from a publisher, we get the text; they send it as an e-text, an ebook, so we don't have to convert the text into a readable format. It's already there. But the equations, the math equations, come in as images. Publishers do that because they don't know whether the book will be read on a small screen, a big screen, or a large monitor; the text scales pretty easily, but the math does not scale as easily, so they turn the math into an image, and when it's a scalable vector graphic it by definition scales. So you get text and SVGs. But then when the book gets read aloud it says, "Solve the following equations: image, image, image, end of list." That's what I call the math-less math book. By the way, the same thing happens when we scan a book: if we scan a math book, we can use OCR to turn the text into words, but we still have to deal with the pictures of the math equations.

So we use a classification engine built on neural networks and computer vision. The first thing we do is go through the book and find all of the math. We run every image through a classification engine to determine whether it's math or not math. What is "not math" in a math book, you might ask? A picture of a guy standing on a diving board, with t equals zero at his feet and a dotted line down to the water labeled t equals question mark, captioned "solve the equation" or "find the equation." That's not a scannable equation for us; we have to write an image description for that. But the quadratic equation, we know that's a math equation. So we send each image to the classification engine and get back a confidence rating on whether it's math or not. If it's image-based math, we send it to a very specific math OCR scanning tool. We pre-process the image to make sure it's scannable, send it to the OCR engine, and it comes back with the math equation and a confidence rating. Then we either send it into a manual approval cycle or, if the engine is confident it's matching the equation, we re-inject it straight into the book (the flow is sketched in code below). All of a sudden, a math book that used to take literally months to convert, a book with on average 5,000 and sometimes 8,000, 10,000, or 12,000 equations, every one of which used to be hand-transcribed, can be turned around with this classification engine and the special math OCR engine in two or three days.

So what are the challenges? Certainly having the resources to train the engines; it can take weeks and weeks to train a complex model. And interestingly, as we get deeper into this and go beyond math, we have to determine whether something is a math formula, a chemistry formula, or a physics formula. They're different, so all of a sudden you're building multiple models and training them in multiple different ways, and it takes a lot of computing power. We're a big Amazon shop, so we're able to scale up using AWS, but it's certainly not cheap.
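Here is the promised sketch of that triage pipeline. The helper functions (classify_image, math_ocr, inject_equation, queue_for_review) and both thresholds are hypothetical placeholders standing in for Bookshare's actual services and settings.

```python
# A minimal sketch of the classify -> math OCR -> re-inject-or-review flow.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class OcrResult:
    latex: str         # the recognized equation
    confidence: float  # 0.0 - 1.0 confidence reported by the math OCR engine

MATH_THRESHOLD = 0.90  # "is this image math at all?"  (illustrative value)
OCR_THRESHOLD = 0.95   # "did we read the equation correctly?"  (illustrative value)

def classify_image(image: bytes) -> Tuple[str, float]:
    return "math", 0.97                      # stub: pretend the classifier ran

def math_ocr(image: bytes) -> OcrResult:
    return OcrResult(r"x^2 + 1", 0.98)       # stub: pretend the math OCR engine ran

def inject_equation(book: dict, index: int, latex: str) -> None:
    book["equations"][index] = latex         # re-inject into the ebook markup

def queue_for_review(index: int, reason: str) -> None:
    print(f"image {index}: {reason}")        # hand off to a human transcriber

def process_book(book: dict, images: List[bytes]) -> None:
    for i, image in enumerate(images):
        label, conf = classify_image(image)
        if label != "math" or conf < MATH_THRESHOLD:
            queue_for_review(i, "needs an image description instead")
            continue
        result = math_ocr(image)             # a real pipeline would pre-process first
        if result.confidence >= OCR_THRESHOLD:
            inject_equation(book, i, result.latex)
        else:
            queue_for_review(i, "low-confidence equation")
```

The design point is the two confidence gates: anything the models are unsure about falls back to a human instead of silently going into the book.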
And more data means more accuracy; these models are data junkies, and the more data you give them, the happier they are. Then of course there are tons and tons of different content types. I wish it were "Brad gets to set the rules," but there are standards in play, because this is a global problem: once the STEM items are described, how you display them becomes a challenge as well.

In fact, I'll use this example. In the top left of the screen you see 12-02-2020. That would be today's date, but if you're in Europe that would be the 12th of February, because Europe goes day-month-year and in the United States we go month-day-year. And if you're a screen reader reading that, you're not sure whether it's a date or a math equation, 12 divided by 2 divided by 2020. That's challenging; there's some context that needs to come in. In the top right of the screen I have t, open parenthesis, e, close parenthesis, equals s, open parenthesis, t, close parenthesis. That might be a math equation, a chemistry equation, or a physics equation depending on what your variables are, but a screen reader might just read it as "test," because if it doesn't know it's math it doesn't voice the parentheses or the equals sign. So again, multiple challenges there.

I also have, on the slide, "Heading level two. Determine the degree of each of the following polynomials. List, two items. Image. Image." That was read straight out of a math book: the heading says determine the degree of each of the following polynomials, there were two math equations, and the screen reader said "image, image." When a book has the math-less math book problem, the entire page reads "image, image," and even the word problems read incorrectly because they don't read as math. But using the techniques we just talked about, the classification engine pulls those images (they're grayed out on the slide, and I apologize) and turns them into math, and it becomes: "Heading level two. Determine the degree of each of the following polynomials. List, two items. f, left parenthesis, x, right parenthesis, equals, fraction start, x to the 4 over x squared, end fraction, minus 3.5 x to the 1.5, plus 0.85. Math 1 of 2. f, left parenthesis, x, right parenthesis, equals, x to the 6, plus 2.5 x to the 4, minus, fraction start, 1 over 2, end fraction, x. Math 2 of 2. End of list."
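For readers following the transcript rather than listening, the two spoken equations correspond to the following (a reconstruction from the spoken form above):

```latex
f(x) = \frac{x^{4}}{x^{2}} - 3.5\,x^{1.5} + 0.85
\qquad\text{and}\qquad
f(x) = x^{6} + 2.5\,x^{4} - \frac{1}{2}\,x
```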
All of a sudden you have usable math. You need to be able to go back and forth over it, because that's pretty quick, but when you get practiced at listening to what you read, when you get practiced at reading with your ears, you can transcribe and visualize that equation and start to solve it. That's where technology leverages AI, without a person having to go identify whether each image is math or not; that's where technology gives us the ability to scale a solution. We just went through and processed about 20 million images from math books in Bookshare, and 8 million of them came back as math, so we've just added 8 million math equations to Bookshare for people to read. It's a complete game changer across the industry. (Sorry, the demo started playing again.) So that's one place we use AI.

Book image analysis for alt text and descriptions is another. Rather than math, what if our classification engine sees that an image is not math? We can send it to another classification engine to ask whether it's a photograph or not, and if it's a photograph, send it to a photograph identification tool. Google has a photo-description AI tool, Amazon has one, Microsoft has one, so they are commercially available; they cost a fair amount of money, so we're still working through that. But the ability to do image analysis to add alt text matters. "Here's a coffee mug": is it a coffee mug to show you that the mugs come in green, or that there's steam coming out of the top, or that it's a different kind of vessel? So alt text and descriptions are super important. Alt text is easier; long descriptions often need a ton of context, often from the author.
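As a sketch of that routing idea: once the classifier says an image is a photograph, an off-the-shelf captioning model can draft alt text for a person to review. The model choice here (a BLIP captioning checkpoint via the Hugging Face transformers pipeline) and the file name are illustrative assumptions, not the commercial service Bookshare uses.

```python
# A sketch of drafting alt text for photographs with an off-the-shelf
# image-captioning model; a human still reviews the caption for context.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def draft_alt_text(image_path: str) -> str:
    """Return a machine-drafted caption to be reviewed and edited by a person."""
    result = captioner(image_path)            # e.g. [{"generated_text": "a green coffee mug ..."}]
    return result[0]["generated_text"]

print(draft_alt_text("figure_3_coffee_mug.jpg"))
```

Note that this only produces the easy part, the short alt text; the long, context-dependent description (why the mug is in the book at all) still needs the author or a human describer.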
A recommendation engine is another place. We're all familiar with the Netflix or Amazon style of recommendation engine: if you liked that movie you'll like this movie, based on how you rated it and what other people who watched that same movie also liked. There's a lot of machine learning and decision analysis there. We're also launching a smart speaker client, which of course uses the voice recognition technology, and I mentioned the OCR, optical character recognition. So there are lots of different places, and we will continue to use more and more AI within Benetech.

Two quick closing slides and then we'll open it up to questions. For my audience, for people with learning disabilities, visual impairments, and other reading barriers, accessibility becomes a great equalizer: screen readers, something that will read what's on your screen, and the ebook karaoke I mentioned, synchronized text that's highlighted along with the audio so you can see the words as they are read to you. If you have severe dyslexia, that's a brand new world, because you stop focusing on decoding each individual word and get to follow along and understand the meaning of the page or the paragraph. Accessibility from our recommendation engines, accessibility from the math transformation we just talked about: inclusive learning is really more important than ever, and we like to think our technology makes information accessible for all. Personalized learning is the opportunity.

Let me stop there; we have just a few minutes for questions, and let me close my PowerPoint so I can get back to where these questions are coming in. I mentioned that somebody asked, just before the program, what the prospects are for using machine learning to describe images or graphs. Marcus, thank you for this question. He wrote that, in theory, human volunteers could label large training sets of graphs according to preset guidelines specifying the key elements to be described and how to describe them, and that a system like this might provide the near-instant type of feedback sighted people get from eyeballing graphs, which is to say it would add a visual interpretation layer instead of just reading data without capturing the essence of what is gained by the graphical representation. Marcus, again, great question, thank you very much.

About four years ago we actually embarked on a project to describe the images in the top 100 children's books, and it's really challenging to describe images. Take The Cat in the Hat: if you're a blind child reading The Cat in the Hat, you don't know what the Cat in the Hat looks like. Is it a cat with a baseball cap on? Who would guess it's a tall cat with a top hat on? So describing those images becomes a challenge. Graphs and some of the STEM topics are interesting, and a little bit easier than that, and we started working on a project that describes some basic graphs. You could tell whether it was an upside-down or right-side-up parabola, whether it was narrow or wide, and whether it started in the first, second, third, or fourth quadrant, and all of a sudden you get some information about what that graph might look like. You don't want to give every single point on the graph, because you could be there for an infinite amount of time since there are an infinite number of points, but if you can give some basic information, say, "here is a parabola, it is right side up, starting in the first quadrant," you would know it's a degree-two equation that's positive, and that starts to give you some information about it. So, great question: yes, we're looking at doing some of that, and there are some other companies out there doing some interesting work as well.

Another question that came in: "I imagine parsing diagrams is very different than equations. Could you speak a little more about how you parse and represent diagrams specifically?" We talked just a little bit about that. We can use computer vision to determine where there are pixels and where there aren't. Axes become a bit of a challenge, because you see pixel density in a spot and have to decide whether that's part of the graph or part of the axis. Scale becomes a bit of a challenge too, because if the question is "where is the apex of this graph," you have to know what your scale is. So it's not easy, but you can certainly describe what the graph looks like, and arguably you could do it in pretty good detail.
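To make that concrete, here is a sketch of turning a detected curve into a short spoken description of the kind just discussed (opens up or down, narrow or wide, which quadrant the vertex sits in). It assumes the pixel analysis has already been reduced to (x, y) points on the curve; fitting a quadratic and reading off its coefficients is an illustrative approach, not Bookshare's production method.

```python
# A sketch: from sampled curve points to a one-sentence graph description.
import numpy as np

def describe_parabola(xs: np.ndarray, ys: np.ndarray) -> str:
    a, b, c = np.polyfit(xs, ys, deg=2)      # fit y = a x^2 + b x + c
    vx = -b / (2 * a)                        # vertex x
    vy = a * vx**2 + b * vx + c              # vertex y
    opens = "upward" if a > 0 else "downward"
    width = "narrow" if abs(a) > 1 else "wide"
    quadrant = (
        "first" if vx >= 0 and vy >= 0 else
        "second" if vx < 0 and vy >= 0 else
        "third" if vx < 0 else "fourth"
    )
    return (f"A {width} parabola opening {opens}, "
            f"with its vertex near ({vx:.1f}, {vy:.1f}) in the {quadrant} quadrant.")

# Example: points sampled from y = (x - 2)^2 + 1
xs = np.linspace(0, 4, 20)
ys = (xs - 2) ** 2 + 1
print(describe_parabola(xs, ys))
```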
Interestingly, of course, the math books don't always have a perfectly accurate image. The text might say the apex is at (1, 1), but the drawing might be shifted, so if we go too detailed we might find that in the drawing we're describing, the computer puts it at (1.25, 1.25) because the image doesn't quite match the label. So there's a little bit of accuracy you have to play with as well.

One more question came in, and Lisa says I can take one more, so thank you very much: what special math OCR engine do you use to render sophisticated mathematical formulas and equations? There are a couple of them out there. We use InftyReader, and there's also a group we work with called Mathpix, M-A-T-H-P-I-X. I think right now in our models we're using Mathpix, because it also gives us a confidence rating on how accurate its OCR is for the image it receives. So check out Mathpix.
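For readers curious what "a confidence rating from the OCR engine" looks like in practice, here is a sketch of calling a confidence-reporting math OCR service over HTTP. The endpoint URL, header names, and response fields below are assumptions for illustration, loosely modeled on services of this kind; consult the vendor's current documentation for the real API.

```python
# A sketch of calling a hypothetical math OCR service and acting on its confidence.
import base64
import requests

OCR_URL = "https://api.example-math-ocr.com/v1/recognize"   # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def recognize_equation(image_path: str) -> tuple[str, float]:
    with open(image_path, "rb") as f:
        payload = {"image_base64": base64.b64encode(f.read()).decode("ascii")}
    resp = requests.post(OCR_URL, json=payload, headers={"Authorization": API_KEY})
    resp.raise_for_status()
    data = resp.json()
    return data["latex"], data["confidence"]                 # hypothetical response fields

latex, confidence = recognize_equation("equation_042.png")
if confidence >= 0.95:
    print("re-inject into the book:", latex)
else:
    print("send to manual review:", latex)
```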