(bright music) - [Narrator 1] Welcome to "What That Means" with Camille, where we take the confusion out of tech jargon and encourage more meaningful conversation about cyber security. Here is your host, Camille Morhardt. - Hi, and welcome to today's podcast, "What That Means." We're gonna talk about scalability in artificial intelligence at the Edge, using the example of Audi. I'm very happy to have with us today, Rita Wouhaybi, she is Senior AI Principal Engineer within the Industrial Solutions Division at Intel, in the Internet of Things Group. Welcome, Rita.
- Thanks for having me. - You're an expert at the Edge and AI algorithms, and I'm hoping you can start by just giving us a general sense of the difference between AI at the Edge versus AI elsewhere in the Cloud. - Many of us are used to AI in the Cloud. We actually carry AI in the Cloud on our bodies, in our pockets, on our computers, every time we're asking for directions from point A to point B, we're accessing AI in the Cloud. AI in the Edge is when the AI does not run in the Cloud. It runs where the user is or where the usage of that AI is.
So I'll give some examples. What does that mean? That means if you are in a hospital and there is an expert system, or there is a recommendation for a treatment plan based on your diagnosis that's happening, it's not happening in the Cloud, it's actually happening on that tablet that the medical professional is carrying. Another example from my own field is industrial. If AI is gonna control the robotic arm, is gonna fix what the robotic arm is about to do, or is gonna use cameras to figure out whether there are defects in the item that's being manufactured on the factory floor, the AI is running on a device right there next to the cell where the robotic arm is operating.
It is not running in the Cloud. - The very first thing I was gonna ask, does that mean the model itself is sitting at or near the Edge as well, or? - Absolutely. Sometimes the model is even being created and trained, using unsupervised learning at the Edge as well, or using reinforcement learning from robotics. Now, why would we wanna do this? Right? I mean, we're happy, why change? Why not run everything in the Cloud? And often if you look at the literature or talk to customers and people who are creating and deploying those kinds of applications, they will cite three different reasons of why this could happen. And in a certain situation, you can have all three that are true or only one of them.
The first one is, that's a lot of data sometimes to ship to the Cloud. Imagine you have 20 or 30 cameras that are looking for defects in an item that is being manufactured on a factory floor. Every stream shipping it to the Cloud to look for defects is just a lot of traffic, a lot of networking traffic for no good reason. And we are engineers, we're all excited about efficiency. So why not run it at the Edge? That's first. Second, there might be privacy issues.
Medical is a great example of this. In many countries, there are regulations on where the medical data can go. It can't go to the Cloud where it will end up in a different geographical location. And the third one is latency. Latency meaning, sometimes if you're gonna, even if everything is beautiful, if there are no privacy concerns or you have your private Cloud, if it's not a lot of data, it still takes time to send data from somewhere on a factory floor or in a hospital or in a smart city or in a home, all the way to the Cloud and back.
What does that mean? What does it translate to? If I'm a robotic arm and I'm gonna do welding on a car, I can't just have the robot hanging out in space, waiting for the result to come back from the Cloud. Or even worse, if I am a robot that's assisting in a medical operation, gosh, you cannot be waiting, right? I mean, there are requirements on your latency that are sometimes in the order of nanoseconds. So sending that data to the Cloud does not make any sense. - So can you tell us a little bit about what exactly you did in partnership with Audi? And then we can talk about, I know that you've said it's difficult to scale AI at the Edge, and there've been a lot of hiccups in the industry in the world. And maybe some of those you'd overcome working with Audi, but just let us know what actually happens there.
Are we talking car manufacturing? - Yes. So Audi is a top-tier car manufacturer, and we as Intel have had a history of great relationship and partnership with Audi. And this was just another example of that. So let's talk a little bit about what the request was.
So this happened a couple of years ago. The request came from Audi that they were having some quality issues on their factory floor, mostly related to welding. So what Audi does is something called spot welding. And spot welding is basically when you have several metal sheets. Think about each one of those is a metal sheet, right? Each one of those papers is a metal sheet and you're gonna squeeze those metal sheets in specific spots to create a bond.
And that's basically how they create a chassis. It's fascinating that the car starts with these metal sheets that look like nothing. I was visiting the Audi factory floor by the way, I was like a kid in Candyland, you know those videos or pictures that they show, toddlers standing in front of a washing machine and just watching it spin for hours. I could do that in a factory, in the Audi factory. I can watch those robots start with metal sheets that look like nothing, make parts out of them, and then use welding and other techniques to assemble a full car chassis with very little human intervention. So with this welding, what was happening is that every car, depending on the car model will have several thousands of those welding spots applied to it.
And this is basically what attaches those different parts together to create a chassis. It's fully programmable, you can have the different Audi models program. So the same machines, the same line, can assemble different Audi models and different Audi cars. Now with every operation, so a robot is holding a welding gun and the welding gun, the car would come in inside the cell and the welding gun would lower and start squeezing, sorry for all welding experts, I know squeezing is not the right term. But this is a computer scientist take on it. It would start squeezing at those welding spots.
And the welding controller has a configuration. It has like a few hundred configurations, and it decides what to use based on how many metal sheets it's gluing together and the thicknesses of those metal sheets. And it uses that configuration to squeeze them together. And what Audi found out is when these operations are happening, after every operation, you get what us data geeks call a data object. So when the welding gun does every welding spot, the controller spits out a file, a text file that contains 140-something streams, okay? 140 key-value pairs. What robot name was it, what configuration was used, what was the outcome of temperature, what was blah, blah, blah.
All 140 streams. Including some streams that contain, what is the status of that welding spot? Did it work? Was there an error generated? What kind of error and so on and so forth. They found out that sometimes some of these welding spots will not generate any error, but the bond won't be there. When they would take it for quality inspection, those welding spots would fail quality inspection.
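A minimal sketch of how one such per-spot record could be read, assuming (hypothetically) a plain `key = value` text layout. The real controller file format is not described in the conversation; the key names `robot_name`, `config_id`, `peak_temperature_c`, and `status`, the sample values, and the helpers `parse_weld_record` and `reported_ok` are all illustrative assumptions.

```python
def parse_weld_record(text: str) -> dict[str, str]:
    """Parse one weld-spot record of 'key = value' lines into a dict.

    Hypothetical format: one pair per line; blank lines and lines
    without '=' are skipped.
    """
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        record[key.strip()] = value.strip()
    return record


def reported_ok(record: dict[str, str]) -> bool:
    """True when the controller itself reported no error for this spot."""
    return record.get("status", "").upper() == "OK"


# Made-up sample values, for illustration only.
sample = """
robot_name = R-017
config_id = 212
peak_temperature_c = 1480
status = OK
"""

record = parse_weld_record(sample)
print(record["config_id"], reported_ok(record))
```

Note that `reported_ok` only reflects what the controller says about itself. The problem described here is exactly that a spot can report no error and still fail inspection, so this field alone cannot serve as a quality label.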
Well, that's- - And in that case, quality inspection is a human. - It's human operated, which means it cannot scale. They cannot take every car for the human quality inspection. So they would sample out of their assembly line, and they would take that sample car. Now they would find some welding spots that did not take, right? But there was no error generated. And they wanted to be able to figure out why these are not taking and to flag them.
At least to be able to figure out there is a car that has 20 welding spots that are not good. Now what Audi did to combat this problem is they over-welded. They were over-welding by up to 20%. Which means they can tolerate a few welding spots here and there that didn't take. The problem, though, was that some cars, a very tiny percentage, but it still existed, would have bad welding spots concentrated in one area.
And even the over-welding will basically not compensate for these bad welding spots. So they wanted to be able to figure out which cars, because by sampling, they weren't catching those cars, which cars had this issue. And they wanted something in line, not a human going and poking around. Quality inspection was very manual. It was done by ultrasound: there were engineers carrying little ultrasound sticks and poking at every welding spot manually, one by one, and taking notes.
So this was the problem that we decided to work with them in order to solve. Well, it turns out that there is a big problem for scaling, especially in industrial, which is the lack of data. Not all the data points are gonna be labeled because creating labels saying, this welding spot is good, this welding spot is bad, means it went through that manual quality inspection, means you had to invest human time to create those labels. So when Audi handed us that first data set, it literally contained millions of data points, but only a few hundred of them were labeled. And to make the matter even worse, and this is very specific for industrial, but also applies in health, most people are healthy, right? Luckily, thankfully.
And most things that are created in factories are not defective. So even out of the few hundred that we got that are labeled, more than 90% were not defective welding spots. Well, how on earth am I gonna find the defective ones if all you show me is not defective, right? It's a little bit of a challenge, and that's what we call in AI unbalanced classes. If we think of this as a classification, right? If I have an input, I wanna classify it as defective or not defective, and if you're training me, think of me as a toddler and you're training me, right? And you show me a lot of not defective and only very few defective, I'm not gonna be very good at identifying and classifying the defective. I'm gonna be way better at identifying the not defective.
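A minimal sketch of why unbalanced classes are a trap, using made-up toy labels (90 good spots, 10 defective). It is not the model or the data from the Audi project; the helper `per_class_recall` is hypothetical. A "model" that always answers "good" looks accurate overall while never finding a single defect.

```python
def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals, hits = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            hits[truth] = hits.get(truth, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


# Made-up toy labels: 90 good spots, 10 defective ones.
y_true = ["good"] * 90 + ["defective"] * 10

# A lazy model that always answers "good".
y_pred = ["good"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(f"accuracy: {accuracy:.2f}")
print(f"recall per class: {recall}")
```

Looking at each class separately, rather than at one overall accuracy number, is what exposes the gap between the majority and the minority class.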
- Can't you just set like a threshold of it needs to be, it needs to look somehow like all of these other good ones, otherwise, I'm not sure why, but it is defective somehow? - It really depends on what the use case is and how complicated the data is. Often thresholds are not enough. You have to look for patterns. And actually that's one way right now that we are very much focused on, which is semi-supervised or unsupervised learning of saying, can I learn what a good product looks like, and hopefully that means when I see an outlier, that it's bad. But as a human, you can tell from my statement right now, that it's not gonna always be true. You might see an outlier that is still good, but you happen not to have seen it before. So yes and no.
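A minimal sketch of the "learn what good looks like" idea, assuming a single numeric feature and a simple z-score rule. Real weld data would have many features and likely need richer pattern models; the feature values, the `z_limit` default, and the helpers `fit_good_profile` and `is_outlier` are all hypothetical.

```python
from statistics import mean, stdev


def fit_good_profile(good_values):
    """Learn the mean and standard deviation of a feature from good samples only."""
    return mean(good_values), stdev(good_values)


def is_outlier(value, profile, z_limit=3.0):
    """Flag a value whose z-score against the good profile exceeds z_limit."""
    mu, sigma = profile
    return abs(value - mu) / sigma > z_limit


# Made-up feature values (some summary of a weld) taken from good spots only.
good_values = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
profile = fit_good_profile(good_values)

print(is_outlier(10.1, profile))  # close to the good samples
print(is_outlier(14.0, profile))  # far from anything seen so far
```

An outlier here only means "unlike the good samples seen so far". As noted above, that is not the same as "defective": a good spot the model has never seen would be flagged too.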
It's a little bit of a trade-off. Anyway, the interesting piece about the Audi work is, we crunched the numbers, we went and deployed, and lo and behold, we were detecting the non-defective at over 90% accuracy. But on the defective we were at around 60%. And 60% in AI is actually a little embarrassing, 'cause it's only slightly better than a coin toss. (laughs) 50 is a coin toss, right? So I was very embarrassed and I went back to Germany and hung out with a domain expert and spent like three days with him chatting and harassing him about explaining welding to me and explaining the process. And that is the second point about scaling, right? Which is what we call a data-centric approach.
Which means you AI expert, you data junkie, don't just take the data and look at it blindly as numbers. Understand what the data is representing, right? Focus on what your data is trying to tell you before you just start crunching it with some AI algo. So spending time with this domain expert, one day after lunch, he starts telling me about the phases of welding.
I was like, "What? What are you talking about? Welding you just, you put the thing and squeezes the thing, current goes in and now there's a bond." He's like, "No, no, no, no, no. It actually goes through four phases."
And I'm like, "Okay, so how long is every phase?" "No, no, no, no, no. It's not a specific duration of time, it's actually the process, the behavior." So this is where you start getting that feel, right? That gut feel that us humans, that a domain expert has, which is this like yeah. In the first phase, what you're trying to do is melt the glue because there is glue between the sheets that we've applied several cells before.
And again, this is another eye opener, right? For AI, it's never just that well documented problem. It's what happens in the world next to it, right? What happened in that cell 20 or 30 cells before welding starts, which is they applied glue. And sometimes the glue, because it's too humid, didn't spread as well or did spread too much, right? So you end up with different blobs. So, okay, that first phase we're melting the glue. The second phase, we are ramping up the welding. The third phase, this is really... or actually, it was preheat, and then melting the glue, and then doing the welding.
And then in the fourth phase, we are cooling down because if we just finished welding and boom, we open it, the sudden change in temperature will crack the steel. And it's like, "Whoa, that is so cool. Okay. Tell me more." Obviously I can't share all the details for intellectual property reasons, like we said a little earlier, but we ended up creating what we called a "heuristic." So it's not an AI algo, it's an algo that tries to break down the data into those phases and understand the characteristics of those phases.
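The actual heuristic was not disclosed, so the following is only a generic sketch of the idea of splitting a signal by its behavior instead of by fixed durations, then summarizing each piece. The signal values, the `thresholds`, and the helpers `band`, `split_into_segments`, and `segment_features` are all hypothetical and are not the method used in the project.

```python
from itertools import groupby


def band(value, thresholds):
    """Return the index of the band a value falls into (0 = below all thresholds)."""
    return sum(value >= t for t in thresholds)


def split_into_segments(samples, thresholds):
    """Split a signal into consecutive runs that stay within the same band.

    Segment boundaries follow the signal's behavior, not fixed durations.
    """
    segments = []
    for band_index, run in groupby(samples, key=lambda v: band(v, thresholds)):
        segments.append((band_index, list(run)))
    return segments


def segment_features(segments):
    """Summarize each segment as (band, length in samples, mean level)."""
    return [(b, len(values), sum(values) / len(values)) for b, values in segments]


# Made-up signal: low, medium, high, then low again.
signal = [1, 1, 2, 5, 5, 6, 5, 9, 9, 10, 9, 9, 2, 1]
features = segment_features(split_into_segments(signal, thresholds=[4, 8]))

for band_index, length, level in features:
    print(band_index, length, round(level, 2))
```

Per-segment summaries like these could then be passed to a model as inputs in place of the raw samples, which is the "give it a tip" step in spirit. A simple level-based rule like this one would be sensitive to noise near the thresholds, so a real heuristic would need something more robust.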
So I came back and I worked with the data scientist who works for me, and we were developing the algo. So she and I got into a room, we made up how we're gonna create a heuristic. She created the code, and then we attached that to filter the data, kind of like a funnel to say, Hey, data, I'm gonna get to meet you right now, I'm gonna extract some patterns from you before I feed you into the AI. And lo and behold, with that updated model, our accuracy jumped to over 94%. We talk about AI and the big promises, but really one big thing about scaling is also understanding what is that problem you're trying to solve and how would you solve it? So that was this data-centric movement, and right now, I'm obviously not the only one who's talking about it. There are a lot of AI experts who are now focused on it because it does help you break down and simplify your AI.
And when you simplify your AI, you can create more models, they can run faster, they can run on less compute, and it's just about scaling it on the device itself as well. - But when you say breaking it down, you're talking about, okay, there's four distinct phases, according to this domain expert. And you're gonna have AI look for variations or potential causes of defect at each one of those phases, as opposed to just giving it everything all at once and saying, go crunch and look for patterns. Is that what you mean when you say break it down? - You could, that is definitely one approach.
The approach that we took was to actually identify how these four phases are happening and what are their characteristics. What's interesting about that is, now instead of just dumping the data as is, and having the AI trying to figure out those patterns, you kind of gave it a tip. - I feel like one of the things that the world talks about and sort of the magic and amazingness of AI is that you can just dump data and that it may come to the same conclusion that, let's say, an expert human after decades of work would, but it's also not necessarily able to explain, well, it can probably explain how it's coming to that conclusion, but that might not be the same way that a human is coming to the conclusion. - So yes and no. Actually, you said two things in there, and I wanna address them separately. The first one you said, hey, I thought, Rita, the whole promise of AI is that you dump data and don't have to worry about it.
That's true if you have the data, but when it comes to expertise, right? When it comes to expertise in industrial and medical and a lot of domains, you don't have the luxury. I don't have the luxury of collecting data for 40 years, but I had an expert who had been looking at those patterns for 40 years. So I bootstrap my AI with his knowledge, and then I was able with AI to find patterns that he could never find. So that's one.
The second one you said, AI's gonna find perhaps different reasons. Yes, absolutely. Even when we started and I, and this use case actually shows it very well, proves that point, is that even when I started with the assumptions of the human, AI was able to find additional patterns. Which by the way, where you can take this, is very powerful because it started with Audi where Audi said, okay, I wanna be able to flag these welding spots that my controller can't see that they are problematic.
Now we're able to flag them. But the power of this is not just able to flag them. The power of it is being able to push to what we call autonomous behavior. Meaning now I cannot only flag after the fact, using this specific configuration, if a controller is about to use configuration number 212, I can say, hey, I'm gonna predict that this is gonna be problematic.
Why don't you use 172 instead? That configuration has a better reputation in this particular situation. So you can even eliminate some of the errors, which AI can do and a human is not able to do, at least not in 10 milliseconds to make that correction before the operation happens. - That makes sense. So it can take into account something, I'll make it up, but like the humidity in the gluing cell, 30 cells prior and then make an adjustment to the- - Correct. - The welding arm on the process.
- Correct. Exactly. It can learn from what happened, because you don't have one welding arm, you have few hundreds of them, right? And you can learn that, oh, today, all the problems I'm getting are from configuration 212.
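A minimal sketch of a shared "reputation" per configuration, assuming a simple count of flagged welds and a fixed cutoff. The class `ConfigReputation`, the `max_flags` threshold, and the fallback rule are hypothetical illustrations, not the deployed system; only the configuration numbers 212 and 172 come from the conversation.

```python
from collections import defaultdict


class ConfigReputation:
    """Track flagged problems per welding configuration across all robots."""

    def __init__(self, max_flags=3):
        self.max_flags = max_flags      # assumed cutoff before a config is avoided
        self.flags = defaultdict(int)   # config_id -> number of flagged welds

    def report_flag(self, config_id):
        """Record that a weld made with config_id was flagged as problematic."""
        self.flags[config_id] += 1

    def is_trusted(self, config_id):
        """A config stays trusted until it reaches max_flags flagged welds."""
        return self.flags[config_id] < self.max_flags

    def choose(self, preferred, alternatives):
        """Return preferred if still trusted, else the first trusted alternative.

        Falls back to preferred when no alternative is trusted.
        """
        if self.is_trusted(preferred):
            return preferred
        for candidate in alternatives:
            if self.is_trusted(candidate):
                return candidate
        return preferred


shared = ConfigReputation(max_flags=3)
for _ in range(3):
    shared.report_flag(212)  # flags could come from different robots

print(shared.choose(preferred=212, alternatives=[172]))
```

Because every arm reports into the same object, a problem seen on one robot changes what all the others choose. In a real line, this lookup would have to fit inside the time budget before the next weld.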
So after it errors two or three times and AI flags it, all right, everybody, all the robots can start gossiping with each other, right? "Stop using 212, it's problematic, let's switch to 179." Whatever it is. - Are there ways of looking at trade-offs? You brought in safety, human safety within a factory.
And I think obviously, you would always prioritize that over anything else within a factory. But is AI ever making that kind of a trade-off, or how does it know which one to prioritize, things that may be really obvious to a human? - So there are a lot of policies around safety in factories because obviously, human life is very precious. But even if you're not looking at human life, so forget about human safety and worker safety, even when you're comparing, hey, welding quality versus predictive maintenance on the electrodes, how do you look at the trade-offs? So there is usually some ROI analysis that happens, and it includes control engineers, finance, and so on and so forth.
But I think the goal for us, especially as people who are in love with the technology and have dedicated our lives to pushing this technology, is to make it so that most of those ROI conversations don't matter anymore, right? If we create an environment, if we set up somehow a system that can scale on a factory floor and allow for experimentation, then you kind of remove that complexity of ROI analysis. So let me say what I mean, let me explain a little more, what I mean. As a matter of fact, I cheated a little because when Audi came to us, they were visionary enough and futuristic enough in their thinking, that they didn't wanna solve one use case. They said, what we want to have, is a framework for running AI applications at the Edge.
And I remember the head of their P-Lab sector, Henning Loser, he said, "I wanna whet the appetite. I wanna create something so that every control engineer says, 'Oh, what if I can use AI to do this task better?' Right? And I want them, I wanna scale by having every person on the factory floor think of AI as a tool in their toolbox. So how would I get there?" So we were like, okay, we need to get Audi excited, and the VW group excited, how do we get them excited? And he's like, "We get them excited by finding a use case that is very hard and that is painful and we go solve it." And this welding use case was very important for Audi for a number of reasons. So when we solved it, not only did we solve this use case, but we showed what can be done with AI. And I remember in the deployment when we went to deploy, he's like, "No, no, no, no, no.
Don't think about how you're gonna help me, Intel, deploy for this use case. I wanna deploy for any AI use case." And I think that's kind of changing the game. Another instance in history when this happened: when I was a Computer Science or Computer Engineering undergrad, to call yourself an application developer, you had to write code in C or C++, which was a high bar, right? Not many people could call themselves a developer. And then what happened was with mobile apps, the bar got changed.
And I hate the fact of calling the bar lowered because I don't think it was lowered. I think it got changed to allow for different creativity and different personas of people who would have never thought of themselves as developers, right? Artists, people who have a lot of talents in different domains, to come in and say, oh yeah, I can write an app. Writing an app became much, much more doable. So we want AI to become much, much more doable. We want people to be able to think of AI as one tool in their toolbox rather than, Oh my gosh, do I use ResNet50 or GoogLeNet or YOLO, and what's the license on it? We want them to think of, I'm gonna take AI and put it in here, and it's gonna predict something for me, right? With its functionality.
And we're not there yet, but I think the whole Edge community and AI Edge community is really focused on making AI more accessible so that it becomes pervasive. It becomes something that you use, and that's how you scale. You don't scale by going hiring a very expensive data scientist. Not everybody can afford that for every use case.
You scale by making AI more accessible to the masses. - That's very interesting. And do you see the Edge and the Cloud and the AI coming together or working in complementary ways, or do you see them as remaining very distinct? - No, they have to become as one big blob. It has to be where the end user doesn't think where AI is running. Right? I mean, you don't today.
Well, maybe you do, but you're a little bit of a geek, so. (Rita laughs) You might be thinking where my recommendation is happening when I ask for a restaurant recommendation or what have you, but most people don't. So it has to continue to be like this. And it has to continue where they say, oh, I wanna predict something, or I wanna classify something with the functionality, without worrying whether it happens at the Edge or in the Cloud. The devices themselves are smart enough to figure out where to run that particular compute.
- Okay. Well, Rita Wouhaybi, Senior AI Principal Engineer in the Industrial Solutions Group in the Internet of Things division at Intel, working on AI at the Edge. Thanks, Rita. - Thank you. (bright music) - [Narrator 1] Never miss an episode of "What That Means" with Camille, by following us here on YouTube.
You can also find episodes wherever you get your podcasts. (bright music) - [Narrator 2] The views and opinions expressed are those of the guests and author, and do not necessarily reflect the official policy or position of Intel Corporation. (bright music)
2022-08-10