The Future Of AI And Accessibility With Be My Eyes
(screen whooshing) - Hey guys, welcome to another "Double Tap" on YouTube. I am Steven Scott. - And I'm Shaun Preece. Hello.
- Hey Shaun. How are you doing today? - I am beautiful. It's good to be back. How's it going with you, Steven? - Good. I am so pleased we're here today.
We're talking about Be My Eyes, which is always, I think, at the forefront of new technology. It is so exciting. Over the month of May, when we were away, a lot was going on, and I thought it'd be a great opportunity for us to catch up, almost take a breath, and look at all the amazing stuff that's been happening over at Be My Eyes.
Mike Buckley is their Chief Executive and he is back with us here on "Double Tap." Good to have you back with us, Mike. - Great to be here.
I think the lesson from your vacation is: don't, next time. - Yes, I think you're absolutely right. That is it. (Mike and Shaun chuckling) And you know, the thing is, after all the stories I've told about this vacation, which was wonderful but did have its flaws, we've booked in to go to what I think is the world's biggest collection of things that can kill you, a country called Australia. That's our next big holiday.
So yeah. What could possibly go wrong, Mike? - Yeah, there's only like, what, like 17 snakes per person in Australia or something like that? (Steven and Shaun laughing) - That's right. So it's gonna be fine. It'll be absolutely fine. I'm not gonna step on one of them or a jellyfish or something.
I'll be absolutely fine. Look, lots going on with Be My Eyes. It is such an exciting time because you've got so much going on. I want to sort of wind back a little bit into May and talk about OpenAI and the work that you've been doing with them. Now, you've talked before with us about the relationship you have and it clearly is flourishing. - Yes, it is flourishing and they're an incredibly valuable partner to us and they treat us, but more importantly they treat, I think, our community with an immense amount of respect.
And they brought us in early, pre-alpha, to kind of test this new model, and it's remarkable. - Yeah, so let's talk a little bit about this, because we're talking GPT-4o, the o standing for omni. And what we're talking about here, essentially, is a multimodal concept. So we can speak to it, and it can see as well, almost, I'm guessing, some kind of live video. And you guys put together a video at Be My Eyes kind of explaining some of the features and how they work in reality, especially for those of us who are blind. Let's take a listen to that first.
- [Man In Blue] Is the King in residence at the moment? - [AI] Yes, it looks like the King is in residence right now. The Royal Standard flag is flying above Buckingham Palace, which is the signal that the monarch is present. How exciting to be there while the King is home! - [Man In Blue] Try and tell me exactly what they're doing right now, please.
- [AI] Right now the ducks are gently gliding across the water. They're moving in a fairly relaxed manner, not in a hurry. Occasionally one of them will dip its head under the water, probably looking for food and then pop back up. - [Man In Blue] I even know when a taxi is coming with its orange light on. I think I'll hail it to get home.
- [AI] Yes! I spotted one just now. It's heading your way on the left side of the road. Get ready to wave it down. (taxi engine rumbling) Great job hailing that taxi. It looks like you're all set to go.
That's a good dog right there leading the way into the taxi. Safe travels. - So, I mean, that video just is incredible because, you know, Shaun and I both were blown away. I mean, I know separately you were, Shaun, 'cause I wasn't contacting you at that point, but I know you were, you sent me a voice message and you were just, "Wow, this is amazing. What is going on?" - Who wasn't? I think everyone who saw that, (Shaun coughing lightly) excuse me, I'm so excited.
Everyone who saw that video would just go, okay, this is the next level. It's like when we first saw Be My AI demoed, right? It was just, wow. How useful is that? It was absolutely amazing. You just dropped the mic when you dropped that video. It was amazing. - It was almost emotionally overwhelming.
You know, our Vice Chair, Bryan Bashin, and I were on a call with OpenAI weeks ago where they said, "Hey, can we demo this for you?" And of course, right? And there were five people on the video conference screen, and they just started talking to this assistant. And the assistant was batting ideas back and forth like you and I are, Shaun and Steven, like, conversationally. They interrupted it sometimes and sent it in a different direction, or asked it a different query or question, and it came back, and it could be funny, and it could be sarcastic, and it could, you know, give you information. It told us who was on the screen: oh, there are five people, there appeared to be three women and two men, which was correct.
And so it was powerful. And the thing about it for any of your listeners and viewers who haven't, you know, seen it or played with it, you know, imagine that you're able to have a real-time conversation with someone very smart or something very smart, right? That literally just converses back and forth with you and helps you interpret the world in any way you want. In the videos we did, obviously, you know, it had a lot of information about sight needs, right? But you can ask it, you know, anything. You can ask it something out of an encyclopedia or, you know, a chemistry formula or you name it. And so it really felt like a glimpse of the future. And as I said, we were emotional kind of during and after having, you know, seen the remarkable power of this thing.
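For readers who want a concrete sense of what a single multimodal query like that looks like, here is a minimal sketch using the publicly documented OpenAI Python SDK. The prompt and image file are illustrative assumptions, and this is not Be My Eyes' or OpenAI's internal code.

```python
# A minimal text-plus-image query to a multimodal model via the public
# OpenAI Python SDK (pip install openai). Prompt and file path are
# made up for illustration.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("video_call_frame.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "How many people are in this picture, and what "
                     "can you tell me about them?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

The real-time voice demo Mike describes layers speech on top of this, but the image-understanding core is the same request-and-response pattern.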
- There's a few questions here because you're absolutely right. This, just from an AI point of view, you know, the ability, the response time, absolutely amazing, almost instantaneous. And then the way you can interrupt it, change direction in a conversation. Absolutely amazing. Yes, everyone I think was blown away by that. But for me, I'm more interested in the, there's a taxi coming, it's on the left-hand side of the road, or the ducks are swimming gently across the lake.
Is this real-time video description? Is this where we are at with this? - Just about. Look, we talked about this a year ago, Shaun and Steven, right? One of the first times we got together after the initial GPT-4 model was launched, we talked about what we thought the future was. And I think the three of us agreed that, well, it's certainly probably gonna be on a device, some type of wearable, right? So where you have a hands-free experience, and we're going to get to real-time video interpretation. I did not expect it nine months later (Mike chuckling) or 12 months later, wherever we are right now. But Shaun, that's what it feels like, right? And look, I haven't played with it in a GPS-challenged environment, so I don't know, right? I've had good phone connectivity, so maybe there'll be limitations there. I haven't really seen how massive the token usage is gonna be, right? And for your listeners and viewers, like, every query in AI, right, needs compute power, and that compute power costs money.
Well, imagine if you have that compute power on all the time, interpreting your live surroundings 24/7, or even just for an hour, right? So we're gonna have to figure out some things on costs in terms of how it works. But yes, Shaun, that's a long-winded answer to yes, it is basically real-time video interpretation.
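To make the cost question concrete, here is a back-of-the-envelope sketch. Every number in it, the frame rate, the tokens per image, and the price, is an assumption chosen for illustration, not OpenAI pricing or a Be My Eyes figure.

```python
# Back-of-the-envelope cost of "always-on" video interpretation.
# All numbers below are illustrative assumptions, not real pricing.
frames_per_minute = 30          # assume one sampled frame every 2 seconds
tokens_per_frame = 765          # assumed token cost of one analyzed image
usd_per_million_tokens = 5.00   # assumed input-token price

tokens_per_hour = frames_per_minute * 60 * tokens_per_frame
cost_per_hour = tokens_per_hour / 1_000_000 * usd_per_million_tokens
print(f"{tokens_per_hour:,} tokens/hour -> ${cost_per_hour:.2f}/hour")
# About 1,377,000 tokens and roughly $7 per hour under these assumptions,
# before counting any output tokens for the spoken descriptions.
```

Whatever the real numbers turn out to be, the shape of the problem is the same: continuous interpretation multiplies per-query costs by thousands of queries per day.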
- [Shaun] I just wanna pick you up on a couple of things there, Mike, 'cause you're saying- - [Steven] We're both jumping in here. - There's so many questions. Yeah. - Basically this is real-time video and, you know, it kind of is. Is it? Is that how it works? Is it actually looking at, I don't know, 15-second bursts of video and then analyzing that? Or is it just taking bursts of still images and giving a sort of spread description of all those images? - Yeah, I'm gonna leave that one for OpenAI. I don't think they've disclosed exactly how that works.
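Since OpenAI hasn't disclosed the mechanism, one common way to approximate live video with a still-image model, and this is a hedged guess at the general technique, not a claim about GPT-4o's internals, is frame sampling: grab a camera frame every couple of seconds and describe each one.

```python
# A plausible frame-sampling pattern for approximating live video
# description with a still-image vision model. This is a guess at the
# general technique, not how GPT-4o actually works internally.
import base64
import time

import cv2  # pip install opencv-python
from openai import OpenAI

client = OpenAI()

def describe(jpeg_bytes: bytes) -> str:
    b64 = base64.b64encode(jpeg_bytes).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text",
             "text": "In one sentence, what is happening right now?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

cap = cv2.VideoCapture(0)  # default camera
try:
    for _ in range(5):  # five samples, roughly one every two seconds
        ok, frame = cap.read()
        if not ok:
            break
        _, jpeg = cv2.imencode(".jpg", frame)
        print(describe(jpeg.tobytes()))
        time.sleep(2)
finally:
    cap.release()
```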
- So sorry. Well, I guess the question to you then, Mike, is, because I think this is the thing for us, what we saw in that video, is that potential or is that what is coming? - That's what we used. We put an early alpha version of the model onto Andy's phone, our colleague Andy Lane, who did that video your viewers will have seen or heard, and he just went around London and played with it. And so it's not edited, guys. Like, we didn't, like, reduce latency, right? We didn't say, oh, you know, cut it again. You know? - Yeah. - It was his literal interactions about, you know, the King being in residence or the ducks and the taxi.
I mean, look, they're gonna release it in a limited API, there's going to be a whole lot of testing, right, on this, but like, that was in his hands and that video is true. - That's all I needed to hear. That it's true. - Yeah. It is incredible. It's so fascinating to know that this is what's coming down the line for us, and the potential for that. I mean, you're right, mileage may vary depending on what the signal is like, what the computing power may be, and what the capability is to offer that all the time.
It's gonna be interesting to see, 'cause there are limitations, essentially, with this that will come into play. But we are starting to see, I mean, we saw with the Microsoft announcement, I'm sure you followed that as well, you know, the news about the new Copilot Plus PCs, which are gonna have these incredible processors in them. Apple silicon working as well, and we'll likely see some improvements in Apple silicon this year, which will accommodate new AI functionality. So the hardware is getting faster, and also working on device as well, meaning that all of this capability is gonna become even more possible longer term without it having the impact on a server somewhere else. You're not connecting to it, it's actually on your device, right? - Yeah, for sure. Look, I think the way we think about the future at Be My Eyes, and as you said, it's not the very distant future, is, you're going to have powerful AI capability everywhere, right? On your devices, in various software programs, right? The idea that anybody is going to have, you know, any real moat on AI, I think, is a fallacy, right? OpenAI is ahead on this model, other models are gonna catch up, right? And it's gonna be everywhere and it's gonna be all around us. And it's really gonna be a question about you harnessing, right, the AI in the way that you want and the way that it works for you, right? And ultimately that will encompass a layer of hyper-personalization, right? I'm not just talking about voice.
It's voice, it's language, it's tone and style, right? Verbosity, short bursts versus long explanations and things like that. Like, all of this is coming and we're right there. I think we still have to have a degree of caution, right? I haven't seen the new model have a, like, a big hallucination in playing with it myself. But, you know, on the margins sometimes there are seemingly minor errors, right? If you were looking at a menu in real time, it might get one price wrong or something like that, you know? So we still have to pay attention to that, and I still think we need to make sure we have options of having humans in the loop. But gosh, it's exciting right now. - Do you think you are going to be looking at a subscription for certain different levels of features? - God, I hope not, Shaun. - When it comes to Be My Eyes?
- I hope not. You know, we've talked about this so many times that our ethos at Be My Eyes is we do everything in our power to make it free. And right now, like, we're doing that, and our business, the financial side of our business and our company relationships, are growing very, very nicely. We're actually hiring a couple of additional salespeople, and so our intent is to try to make as much of this free as possible. You know, I will never say never. I think the first time we talked about this over a year ago, I said, I'll never say never, but it's a very important part of our ethos to do everything we can to make it free.
And so in the near term, certainly near to midterm, I don't think that's at risk. I think when we actually get full release of real-time, you know, video interpretation whenever you're awake, right, I just have to see what the costs look like. We don't know yet, right? So I need to hedge slightly, 'cause I never want to, you know, be dishonest about this, but we're gonna do everything in our power to try to do this for free. - Okay. While I've got you, I wanna talk about Windows, because the other big announcement in May was that you were launching the Windows app. So tell us a little bit about the app and what the inception idea was for that.
- Yeah. I think that the fundamental issue, that you know better than I, is when using either a Mac or a PC, even if you have a good screen reader, there are functionality issues. There are, you know, things that could be done better. And so we started working on this three months ago, I guess, to prepare an app that would work in the Windows environment, right? And functionally give an AI experience in a Windows application. So now this application is available in the Microsoft Store, and some of the things it can do: it can describe your screen, right? So Be My AI can instantly describe what's on your screen, within a window or other onscreen content.
And this could be websites, applications, documents, providing accurate and detailed descriptions. Secondly, it can also describe local images, local meaning a picture stored on your computer, right? So you can use Be My AI's image functionality and get, you know, pretty clear and relatively concise descriptions of images or graphics, right, that are stored on your computer. So it's a big deal. Further, you wanna also be able to describe online images, right? So maybe it's a graph, maybe it's a bar chart, maybe it's online pictures and content.
It enables you to copy that image to your clipboard and let Be My AI generate a very detailed text description. The other fun one is that it can also help you prepare for video calls. So it can assist you and tell you if you're centered in the frame or what your background looks like, and you just take a quick picture and it centers you for your laptop camera or external camera for calls.
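The screen-description feature Mike lists maps onto a very simple pattern: capture a screenshot, hand it to a vision model, and read the answer back. Here is a minimal sketch of that idea, assuming Pillow for the capture and the public OpenAI SDK; it is an illustration of the pattern, not Be My Eyes' actual Windows code.

```python
# Minimal sketch of the "describe my screen" idea: take a screenshot,
# then ask a vision model about it. Illustrative only; this is not
# Be My Eyes' implementation.
import base64
import io

from PIL import ImageGrab  # pip install pillow (Windows/macOS capture)
from openai import OpenAI

client = OpenAI()

shot = ImageGrab.grab()  # full-screen capture
buf = io.BytesIO()
shot.save(buf, format="PNG")
b64 = base64.b64encode(buf.getvalue()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [
        {"type": "text",
         "text": "Describe this screen for a blind user: layout, "
                 "windows, icons, and any visible text."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ]}],
)
print(resp.choices[0].message.content)
```

The local-image and clipboard-image features Mike mentions follow the same shape; only the source of the image bytes changes.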
And so it's trying to merge, Steven, a bunch of the functionality that we've seen our community really enjoy from the Be My AI experience into more of a desktop environment. And where this goes, right, other than, you know, we're gonna improve this. I mean, there are already some kinks, there are some shortcuts that I think need to be changed because they overlap with shortcuts of other products, and there's a little bit of functionality, UI, UX that I think we're gonna continue to tweak. But you know, what I encourage everybody to think about is not just, you know, your home use on a desktop, but the workplace. Like, this is one of the things that I really want to do.
Having this functionality in a workplace environment, I think should, you know, change employment, change productivity in a positive way. And by the way, build revenue for us, right? We'll kind of sell this suite of services, right, for an accessible workplace, which hopefully helps us pay for all the goddamn video GPUs that we're gonna need (everyone laughing) in a couple of months. - Pay for Steven looking for his bus 24/7. - Yes. - Steven all day, right? (Shaun laughing) So look, I'm really excited about it and of course, you know, we're going to try to do something similar for Mac and whatnot, but really excited about this and the reception.
We had 10,000 AI sessions on this product in the first 24 hours. - Wow. - It was kind of cool. It surprised me actually. - Well, I'm not surprised at all. I tested it out and it's all to do with your reputation, of course, at Be My Eyes, you know? And I downloaded it, I tried it out, and I think the beauty of it is it's just so simple. And I'm not behind the scenes, obviously, but I mean, basically you just tab around, there's four different buttons there so far, and as you mentioned, you know, tell me what's on my screen.
I just hit that and bang, it's telling me. And the detail, as we're used to with Be My AI, absolutely amazing. It told me the time that was on my, you know, in the bottom right-hand corner of my screen.
It told me all the icons that were on my desktop, far too many, by the way. And the other option I tried out as well was, you know, the "tell me what's on my camera" one, or whatever it is, where it actually turns on the webcam, like you mentioned, for a video call. So before this, I actually tried it, and it told me exactly what was in the background and whether I was centered, you know, when I asked it. And the other thing, by the way, is the ability to ask follow-on questions, which we're used to anyway with Be My Eyes. But the ability to do that about the screen, your computer screen, is another level.
It's very nice to say, okay, tell me more about whatever icon on my screen, or maybe you can get some more information that may help you get out of a problem when it comes to accessibility. I was really impressed by it.
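The follow-on questions Shaun describes are, in API terms, just multi-turn chat: the earlier screenshot and answer stay in the message history and each new question is appended. A sketch of that pattern, with illustrative prompts, and again not Be My Eyes' implementation:

```python
# Follow-up questions about the same screenshot are multi-turn chat:
# keep the prior image and answer in the history, then append the new
# question. Illustrative pattern only.
import base64
import io

from PIL import ImageGrab  # pip install pillow
from openai import OpenAI

client = OpenAI()

buf = io.BytesIO()
ImageGrab.grab().save(buf, format="PNG")
b64 = base64.b64encode(buf.getvalue()).decode()

messages = [{"role": "user", "content": [
    {"type": "text", "text": "What's on my screen?"},
    {"type": "image_url",
     "image_url": {"url": f"data:image/png;base64,{b64}"}},
]}]
first = client.chat.completions.create(model="gpt-4o", messages=messages)
print(first.choices[0].message.content)

# The follow-up can reference the same image because the history carries it.
messages.append({"role": "assistant",
                 "content": first.choices[0].message.content})
messages.append({"role": "user",
                 "content": "Tell me more about the first icon you mentioned."})
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)
```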
- Well, I'm glad to hear you say that. I think, you know, there are a bunch of things, as I said, that we need to improve, but like, you know, the big point that I think you're making, and that I agree with, is technology's worthless if it's not simple, right? And I'm not just talking about for someone who's blind or has low vision, I'm talking about for anybody, you know? - [Steven] Yeah. - But maybe even more so for our community, right? Like, we did a giant study of our users and there are a bunch of reluctant adopters to technology, because people get set in the things that work for them, in their ways, and like, they don't wanna mess with their own personal stack, right? So if I have four things that I know work for me as a human being, irrespective of sightedness, right, I kind of hang out in those four things until something comes along that does two things. First, it provides extra and excellent utility, something that my current stack might not provide or doesn't do as well, right? And second, it's gotta be simple, right? If it's 26 steps to download, if I can't figure it out, if it's not reasonably intuitive, it's dead in the water, right? And so I'm really pleased that you thought it was kind of simple and easy to navigate, Shaun. So thank you for saying that. - Yeah, that sounds great. You're doing amazing stuff, Mike. And you know, it is again, and Shaun said it, you know, your reputation is so high and has such value in our community because of what you're doing, but also because of the fact you're working so hard at it. It's something I talk about on this show.
A lot of companies need to work with disabled people. We need to be in the room, we need to be part of the picture, not just sitting on the sidelines. We need to be part of the process, part of the build, and you're encouraging that, which is fantastic. So I wish you all the best with it, and let's just see where this goes. It's gonna be such an exciting year again for AI. - Yeah, I agree.
And Steven, thank you for the kind words, but you know, I still feel like there's so much more to do. We had a really interesting conversation with a couple of people the other day who said, does anyone know if any of these models, these large language models, have disability data sets in them? And the answer is probably no, right? Like, if you're scraping the entirety of the internet, right, what's gonna be in there? There are gonna be ableist tropes all over the place, right? - Yeah. - And so we're doing a lot of thinking about not just the short-term future of this brilliant new model, but the mid- and long-term future: how we make sure that different abilities, right, different ways of navigating the world, navigating your environment, consuming information, are part of the training in all of these models. And so, look, there are the micro issues of does the product you just launched work, does your Windows app, you know, work well, does Be My AI work well? And then there are the bigger societal-level things of, okay, if in a couple of years we're all gonna have our own personal AI assistant, and that AI assistant hasn't been trained on disability data sets, is it gonna really serve your needs or our community's needs? So it's something we're doing a lot of thinking about now, but I still remain incredibly optimistic about the potential and where this is all going. - Yeah, absolutely. - Yeah. - Mike Buckley,
always good to have you on. Thank you for coming back onto "Double Tap." - Thanks for having me guys.
Always good to spend some time with you. - And there you go. Another amazing conversation, Shaun, with Mike and just a great chance to learn about what's going on over there at Be My Eyes.
I am so excited. And do you know, I may even have to boot up my PC (Shaun laughing) just to get this Be My Eyes app ahead of time. So you know, I may go and do that this afternoon. - You try and get out and they drag you back in.
Of course you will. - Every time. Be My Eyes, always surprising us with amazing stuff. They're fantastic. - Keep in touch, drop us a comment below if you'd like to send your thoughts to Mike, and of course to us here at "Double Tap." You can do that. And remember to hit that subscribe button as well to make sure that you are getting all the brand new episodes of "Double Tap" on YouTube.
And of course, check out the show daily on here, on AMI-audio, and on podcast. Shaun, catch you for another "Double Tap" on YouTube next time. - Thank you. Bye-bye. (screen whooshing) (rock music)