Free no-coding AI development tools for embedded devices
OK. Welcome, everyone, to the New Product Update webinar series. My name is Gavin Minnis.
I'm the brand marketing manager for our processing technologies at Texas Instruments. And I'm going to be the host today as we talk through simplifying and accelerating Edge AI application development. AI in embedded devices is rapidly evolving and increasingly being adopted around the world.
We're seeing things from defect detection on assembly lines to smart surveillance and security cameras to crop monitoring and disease detection in agriculture. The opportunities are very, very large. But the challenges that we hear from our customers have to do with, how do you get started? If you have a background in classical computer vision, what is it going to take to transfer that knowledge and that experience into edge artificial intelligence and more recent deep-learning types of technologies? Do you have sufficient expertise and knowledge on your teams? Do you have the right resources, the right software and tools? How much is this going to cost? And so all of these questions are things that our customers are constantly asking. And we have one solution to that, to try to help, which is called Edge AI Studio, which is what we're going to be talking about today. Our speaker for today is Reese Grimsley.
Reese is a Systems and Applications Engineer. And Reese is actually someone who has played a pretty significant role in helping us get our demos built out, using the Edge AI Studio tools that we're going to cover today. A lot of these demos, we were using in our Embedded World participation that we had here in March of this year, where we announced Edge AI Studio.
And we also introduced our new AI Vision processors, the AM6xA family. And so excited to allow Reese to go through this with you, to provide you with some background information on Edge Artificial Intelligence, and then, of course, to get into a demo, showing you how these tools can be used, and most importantly, to help you feel more confident about making this transition and to be able to do this smoothly and quickly. So we'll go ahead and get started here quickly. We've got about a 20-minute presentation that Reese is going to go through. And then we'll get into some Q&A at the end, around 10 minutes or so.
I do encourage everyone to use the question-and-answer chat box that you have before you. I'm going to be monitoring that myself. I'll do my best to answer a few questions throughout the presentation.
But we'll certainly make sure to address everything, give Reese the opportunity to give you some of his technical expertise, here at the end. As I mentioned, there's a live demo here. This is all about Edge AI application development tooling.
And we really believe that, coming out of this presentation, you're going to have the confidence and the capabilities to go ahead and start experimenting with these tools, actually putting them to use. So please make sure that if you do have any questions, you let us know. We want to make this the most useful presentation for you. So I'm going to go ahead and turn it over to Reese, who can get us started with the presentation.
All right, thanks for the introduction, Gavin.
Appreciate it. Let's get going. So some of you may already be familiar, but I'll go ahead and describe, basically, what Edge AI is and what deep learning is. So oops-- excuse me. Sorry.
I seem to have lost an animation on that slide. So generally, we can look at these, AI, machine learning, and deep learning, as subsets of each other. So artificial intelligence is what we'd consider using human-like intelligence to solve tasks with the computer. How do we generate algorithms that will help computers view the world the way that we do? And audio and vision are two of the core application spaces for those. Going a step deeper, machine learning is using algorithms that are specifically trying to find patterns and ways to solve problems by using data, taking lots of data and using those as the basis for how they'll solve problems.
And then deep learning, which is a step further into that, is using very, very large algorithms to do this. And it usually allows us to use more raw data input, like just the stock RGB image or even just a stream of audio samples. And to do this, we generally have very large models that require a lot of data so that we can tune a lot of parameters. And using machine learning and deep learning does have some pretty significant advantages over classical computing. And I'd call classical computing specifically-written algorithms, where you're using if statements and for loops and generic structures to make the algorithm run. In contrast, machine learning is using general-purpose algorithms, where all those parameters need to be set by using the data.
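As a rough illustration of that contrast, here is a minimal Python sketch. The brightness values, the labels, and the 0.5 cutoff are all made up for the example; they are assumptions, not data from the presentation. The first function is a hand-written rule, and the second sets the same parameter from labeled data.

```python
# Hypothetical toy data: (brightness, is_defect) pairs. The values are made up.
samples = [(0.10, False), (0.20, False), (0.35, False),
           (0.60, True), (0.75, True), (0.90, True)]


def classical_rule(brightness):
    """Classical approach: the engineer hard-codes the threshold."""
    return brightness > 0.5


def fit_threshold(data):
    """Data-driven approach: try each midpoint between neighboring values
    and keep the threshold that misclassifies the fewest samples."""
    values = sorted(value for value, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((value > threshold) != label for value, label in data)

    return min(candidates, key=errors)


learned_threshold = fit_threshold(samples)


def learned_rule(brightness):
    """Same decision form as the classical rule, but the parameter came from data."""
    return brightness > learned_threshold
```

A deep-learning model does the same kind of thing at a much larger scale: millions of parameters instead of one threshold, all set from data rather than by hand.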
So these machine learning algorithms are more scalable since they can solve a variety of tasks without requiring you to learn how to specifically code those. They require less R&D effort. This is a really hot topic in research. So we see that burgeoning a lot so that you can use a lot of these models and tools without developing them yourselves. And generally, machine learning and deep learning are more accurate.
That's really the basis for why these topics are getting so popular. And applications for this are pretty widespread and increasingly so. I think one of the main spaces that we've seen this start out in was driver assistance or ADAS. We also see this, as Gavin mentioned, in machine vision and defect detection, where you have products coming down the assembly line very quickly.
And you need to determine whether there's something wrong with them or whether they need to go into separate bins or so on. We also see this in security and home automation cameras. In surveillance, you can have hours and hours and hours of footage. But turning that into something you can act upon, like just a few, quick seconds of, oh, that was the interesting thing that we need to look at, an intruder trying to jump in the window, machine learning can help with that.
And the same is true for speech recognition and text analysis. We see these large language models, like ChatGPT, becoming very popular. Those are also an example of AI.
So as an example problem, what I'm going to be talking about is specifically in the space of vision. And that's for automating a checkout scanner. So we're all probably pretty familiar with these self-service checkout scanners or kiosks within grocery stores. You walk up and you scan each individual barcode as they come through, so that you can build your order on your own. So we can take that a step further in automating it by having a camera look down, recognize everything in a singular shot, and fill in that receipt. So using deep learning for this is going to be a lot easier to implement than a custom algorithm.
And it's probably going to be more accurate or robust, assuming that you come at that with the right kind of data. And so to solve this problem, we need to collect some data, take a bunch of pictures of these scenes, train a model offline, and then move that trained model onto an embedded platform so that we can run inference locally. So I mentioned something here, and Gavin did too, about running locally at the edge. That's basically where we're running all these applications, instead of being on these almost infinite compute server platforms, but running them instead on embedded processors or microcontrollers. And this has several benefits.
One of the ones that we see often in time-critical or safety-critical applications is that it has reduced latency. We don't have to worry about the network being slow or even going down. For applications like surveillance, privacy can be pretty important, keeping that data where it is on the device that it's collected on. There's also no cloud compute or network costs.
And we're also seeing a lot of support from embedded vendors, making it possible to run these-- what I said very large models-- actually really, really quickly, at the edge, where the data is being collected, so we can get AI models running at 60 frames per second. So coming into this, Edge AI can be intimidating. It's a new space.
Not everybody knows exactly how to work on this. And it can be pretty intimidating for a few different classes of users. This isn't every single type of user that would try to get into this.
But I've broken it down into three buckets, where we have the very experienced embedded engineers, data scientists who are used to working on cloud compute, and then your hobbyists who are going to be working on this in their free time. So for embedded engineers that have been working on this for, say, 10 or 20 years, Edge AI is a divergence from traditional signal processing. We're using some generic, almost black-box model to solve problems instead of something that they can go in and tear apart to see where decisions are being made in the algorithm.
And that can be scary for some people. It's going into the unknown. They also see that the topics and tools are rapidly changing. This has been a rapidly evolving space, with lots of research papers coming out in the past decade that show different methods. And it's just changing so quickly that it can also be intimidating, whereas generally, embedded topics tend to evolve fairly slowly.
And for the errors and deep-learning accuracy, that's pretty similar to what I was mentioning with the divergence from traditional signal processing. And we can call this topic explainability for an AI model. For the data scientists and AI experts, people who are experienced with that sort of near-infinite compute on cloud platforms, embedded is very different because it does have very constrained resources. You have to make a lot of trade-offs. You can't get infinite compute. You have to work with what you have.
And that may mean sacrificing some accuracy. It may mean sacrificing some performance as far as how quickly it runs. And this can also be frustrating. And the tools for finding the optimal performance can be fairly difficult to work with for some vendors. And they're usually vendor specific as well.
Every embedded vendor will have their own tool for turning their cloud-compatible model into something that runs on the embedded platform. And then for the last bucket, people who don't have as much time to commit to this also need a straightforward path so that they can get started. They need to know how to use the hardware and the software, see some examples, or see community support, like online forums, where they can get a sense of how people have used this before, and reproduce it on their side without too much difficulty. So to answer, or rather, get around some of these barriers, what we've been working on is something called Edge AI Studio. This is a cloud-hosted tool where we're helping to get models ready for embedded processing. We can analyze models.
The model analyzer is a tool that we've had for a while, where you can go in and see some benchmarks on a lot of popular, state of the art models. And the new one that we're talking about is what I'm focusing on here. And that's going to be-- whoops, excuse that-- is going to be the Model Composer.
So this is how we're going to be building the model, where we collect data with or without the device. We train a model without having to set up our own environment. This is the no-code tool that we like to talk about.
And then get that model ready for our embedded platform. So before I get into that, I want to go back and answer one of the questions that I mentioned before. And that's that deep learning requires a lot of data. So deep learning models will be tens or hundreds of these computation layers. And that's the structure of the deep learning model, such that it has millions or billions of parameters that we need to set.
And that requires a lot of data usually. 10,000 data samples, like 10,000 images, is probably a fairly small data set for training from scratch. But that's not explicitly necessary. So there's something called transfer learning. And that's how we can make this problem less impactful. So the original model can be trained from scratch on something that's very, very large, like a stock data set, like ImageNet or COCO.
And it can basically learn how to set a lot of the parameters at a coarse level before we go in and we fine-tune it or retrain it with a smaller data set to solve a specific task. So considering that automated checkout example, we could train the basic network on a million images. It could learn what circles and squares and edges and some basic structures look like within the imagery. And then we can go in and use 50 to 100 samples of data to retrain it such that it recognizes apples or oranges or soda cans or bags of chips.
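The transfer learning idea described above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: the "backbone" weights stand in for a network pretrained on a large data set, the four training samples are made up, and the head is a simple logistic regression. It is not how Model Composer implements training. What it shows is the split between frozen, pretrained parameters and a small part that is retrained on very little data.

```python
import math

# Stand-in for a pretrained backbone (an assumption for this sketch).
# These weights are fixed and never updated during fine-tuning.
BACKBONE_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def backbone(x):
    """Frozen feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in BACKBONE_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=500, lr=0.5):
    """Fit only the small classification head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            feats = backbone(x)
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            err = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias


def predict(x, weights, bias):
    feats = backbone(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias) > 0.5


# Tiny made-up task-specific data set: label 1 when the first input dominates.
small_dataset = [([2.0, 0.0], 1), ([1.5, 0.5], 1),
                 ([0.0, 2.0], 0), ([0.5, 1.5], 0)]
head_weights, head_bias = train_head(small_dataset)
```

Only `head_weights` and `head_bias` change during training. That is why a handful of samples can be enough: most of the parameters were already set by the earlier, much larger training run.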
So I mention that because transfer learning is exactly what we're doing within the Model Composer. This is our no-code AI tool that's aimed to get you up and running with an end-to-end AI developer environment using only the cloud. Again, no code, no setup for your own environment. And for what we're using right now, we're doing transfer learning on some pretty popular state-of-the-art models called the YOLOX family. And we can target a lot of applications here, like machine vision, factory or warehouse automation, agriculture, like Gavin was mentioning, pretty quickly, especially for proof of concept. So with that, I'm actually going to switch over to do a little live demo of the tool so you can get a sense of what it actually looks and feels like.
So bear with me while I swap over to this. Let's just make sure that I'm seeing it correctly. Hey, Gavin, can you confirm that this is showing? Let me take a look here. I'm still in the live view, seeing your PowerPoint. OK, hold on. Let me try that again.
Did that update? Hmm. For those of you on the webinar with us too, if you could use the chat as well, to confirm what you're seeing on the screen, I believe it's still just the PowerPoint. If you would please put that into the chat box for me so I can get some confirmation from those of you on the presentation too. OK, apparently I had it for a second. And now I don't.
I'm sorry about that. Let me just show the entire screen. Maybe that's a little bit better.
All right, so this is what the tool looks like once I've loaded it into a program. So this is one that I've already created, called Food Recognition. And what I'm going to be doing is showing images-- and we can actually collect one live-- where we'd have a bunch of different food items. And we want to annotate them, train a model, and get it all ready for an embedded target.
So I can actually connect into a device camera. I know that this one is available on my network. And I should be able to show that running live.
So there is a device in the loop, where we can see what the device is seeing. This is running back at my desk. We can come in and we can take pictures with this.
You could also load them in from elsewhere. But this is a good way to make sure that the camera that you'll be using in practice is the exact same one, or is going to have the same sort of settings that your model would be trained on. So we can save those images and add them in.
This is more than you would really need for a full use case, 269 images. But this is a slightly larger data set since I was taking this, actually, to Embedded World 2023. The next stage is doing annotation.
So this is how we're telling the deep-learning model how to basically isolate the correct things that it wants to find. What are the patterns that it's looking for? These are called supervised models, meaning that they need some help so that they can optimize parameters to recognize what they need. So I noticed that I was missing a label here, for this apple. So I can come in, draw a box around it, and then label that as "apple."
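To make the annotation step concrete, here is a small Python sketch of what one labeled box might look like as data. The class name, field names, and pixel coordinates are assumptions for illustration; the storage format Model Composer uses internally is not described in this talk. The `[x, y, width, height]` layout is the box convention used by the COCO data set.

```python
from dataclasses import dataclass


@dataclass
class BoxAnnotation:
    """One labeled bounding box, in pixel coordinates (hypothetical structure)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def to_coco_bbox(self):
        """Return [x, y, width, height], the box layout used by COCO."""
        return [self.x_min, self.y_min,
                self.x_max - self.x_min, self.y_max - self.y_min]

    def fits_in(self, image_width, image_height):
        """Check that the box is non-empty and lies inside the image."""
        return (0 <= self.x_min < self.x_max <= image_width
                and 0 <= self.y_min < self.y_max <= image_height)


# Made-up coordinates for the apple that was missing a label.
apple = BoxAnnotation("apple", 120, 80, 220, 190)
```

For example, `apple.to_coco_bbox()` gives `[120, 80, 100, 110]`, and `apple.fits_in(640, 480)` is a cheap sanity check before training.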
And then we'll also see, down at the bottom, the most recent one that I've added this into. And I can add a few for that as well. So I'm not going to complete this, in the interest of time because I have already gone through all of these steps, just to make sure that we're going to work with the time that we have allotted.
So the next thing is going to be selection. We can select which device we want to compile this for. We can use the slider here, to select on this axis of lower power and lower performance versus faster performance and higher power. So we have a scalable platform here, across the AM6xA family. And I'm going to be showing for the AM62A. That's what's running on my desk.
So that's what we need to use. And then we can also, similarly, do a selection for the model based on higher accuracy or higher performance. There's a trade-off here as well. And I believe that I chose something middle of the road, this YOLOX tiny model. The next stage from there is model training. This will take all of the images that we captured and annotated, and train that model that we selected to recognize this.
So you can select a few of the training parameters here. And if you want an explanation for those, there's a little tooltip that'll show once you hover over it. Now I've already trained this model. So I should be able to just pull that up right here. And it'll show me what the training performance looked like over time. So we can see that for each epoch, that's a checkpoint as the model is being trained.
We saw that it started to converge to a high accuracy around 10 epochs in. So we probably could have stopped this a little bit early. With the model trained, we can go over to compile it.
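The remark about stopping early can be sketched as a simple check on the per-epoch accuracy. The accuracy values below are made up to resemble a run that flattens out around epoch 10, and the `patience` and `min_delta` parameters are assumptions of this sketch, not settings taken from Model Composer.

```python
def convergence_epoch(accuracies, patience=3, min_delta=0.005):
    """Return the 1-based epoch of the last meaningful improvement once
    `patience` epochs pass without a gain of more than `min_delta`.
    Return None if accuracy is still improving at the end of the history."""
    best = float("-inf")
    best_epoch = 0
    for epoch, acc in enumerate(accuracies, start=1):
        if acc > best + min_delta:
            best, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


# Hypothetical accuracy per epoch; these are not numbers from the demo.
history = [0.20, 0.35, 0.50, 0.62, 0.71, 0.78, 0.83, 0.87, 0.90, 0.92,
           0.921, 0.922, 0.921, 0.922, 0.923]
stop_at = convergence_epoch(history)
```

With this history, `stop_at` is 10: the later epochs only move accuracy by a fraction of a percent, so the extra training time buys very little.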
And this is how we get the device to actually run the model. This is taking the model and it's turning it into a form that can run embedded, using our accelerator for the AM62A. That's a 2 TOPS device. So we can run this fairly large model at about 60 frames per second, if I recall correctly.
And there are a few different settings here as well. And we have some preset parameters that you can use. This can take a while. So it was important that I did do this beforehand. Then we can see what the prediction looked like.
So it recognized that there is a Pringles can, a soda can, an apple. It missed this bag of chips. But it got this one. So it's not 100% perfect. But we can at least get a sense of how well it trains and is running as an early evaluation.
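One common way to turn "it missed this bag of chips" into a number is to compare predicted boxes against the annotated ones using intersection over union (IoU). The sketch below is an assumption about how such a check could be done, with made-up boxes and a 0.5 threshold; it is not the evaluation code used by the tool.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def missed_objects(ground_truth, predictions, threshold=0.5):
    """Return annotated objects with no same-label prediction at IoU >= threshold."""
    missed = []
    for label, box in ground_truth:
        matched = any(p_label == label and iou(box, p_box) >= threshold
                      for p_label, p_box in predictions)
        if not matched:
            missed.append((label, box))
    return missed


# Made-up boxes loosely mirroring the scene: two bags of chips, one missed.
ground_truth = [("soda can", (10, 10, 60, 110)), ("apple", (100, 40, 160, 100)),
                ("chips", (200, 20, 300, 140)), ("chips", (320, 20, 420, 140))]
predictions = [("soda can", (12, 8, 62, 108)), ("apple", (105, 45, 165, 105)),
               ("chips", (322, 25, 418, 138))]
missed = missed_objects(ground_truth, predictions)
```

This simple version lets one prediction match several annotations. A stricter evaluation would match boxes one-to-one, but for an early look at where the model fails, this is usually enough.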
From there, we can also do a live preview. So this is where we can connect back to the device that I was collecting data on originally and show this model running. So the model has been trained. It's been compiled. And then once my camera feed shows up, this is what is currently at my desk.
And I can start a live preview. So this will download that model onto the device and then start up a stream from the camera, through inference, on the device-- this is running locally-- and then just stream everything that's been output back to my computer here. So we'll see that pop up in the window in a second or two.
And as that's running, we'll see some logs being shown as well. And this can give us some information about, for instance, what the frame rate is for this. So we're seeing that it's correctly recognizing the salad, the banana, the orange, and the soda can. And then there's some statistics being printed over here.
The frame rate is about 17 frames per second. But that's limited based on the camera. The actual inference rate is about 15 milliseconds per sample, which equates to around 65 to 70 frames per second, which is pretty close to what it was suggesting from the Model Selection tool. And then from here, assuming that we're happy, we can move on and actually deploy this onto the device.
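The frame-rate arithmetic above is easy to check. This sketch uses the two figures quoted in the demo, 15 milliseconds per inference and a camera delivering about 17 frames per second. Treating the pipeline as limited by its slowest stage is a simplification that ignores pre- and post-processing time.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def pipeline_fps(camera_fps, inference_latency_ms):
    """The end-to-end rate is bounded by the slowest stage (simplified model)."""
    return min(camera_fps, fps_from_latency_ms(inference_latency_ms))


inference_fps = fps_from_latency_ms(15)   # about 66.7 frames per second
observed_fps = pipeline_fps(17, 15)       # camera-limited: 17
```

So the accelerator has headroom of roughly four times the camera rate here, which is why swapping in a faster camera, rather than a faster model, would be what raises the displayed frame rate.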
We can download that model to the PC, the original floating point model, before it was compiled. Or we can pull those compiled artifacts, the portion that's ready to run on the device. We can also just directly download it onto there. Once you connect that to the IP, we can download that directly to the development board without having the PC sitting in the loop. And I will mention, for all of this, I've had the device on the same network as mine. I can see the IP address.
And that is important. But it's not important that the IP address is visible from the server. So this does work if you're within a firewall. All right, so with that, I'm going to come back to the presentation and just do a little bit of wrap up.
So from here, I'm going to set the onus on everyone on the call. So go on to Edge AI Studio and build a model. Pick a simple problem to solve and build a proof of concept. It can be something simple, like recognizing whether your dog is on the couch or not if you're not at home and you have a naughty dog. It can be birds attacking your garden. I know that Gavin has had similar problems before.
Or if you're productive yet lazy, you can recognize if your boss is approaching your desk or not, and give you a little warning signal. So to do that, you'll need to take 20 or 30 pictures, load that into the Model Composer. You don't actually have to have a device to do this portion. And then select a model that you want to train it for. The simplest is going to be a classification model. And that's just meaning that you tag each image with what the dominant thing in the image is.
Is a dog on the couch or is it not? I showed an object detection model, where we can recognize multiple things at a time. But it's not crucial. From there, you'll train and compile the model and then run live inference just to see how it performs. And once you have that proof of concept, then I suggest getting an EVM, like one of these AM6xA SoCs.
The AM62A is on the lower end of that spectrum in terms of performance and cost. And the AM69A is at the top of that spectrum with a couple of devices in between, the AM68A and the TDA4VM. At that point, I'd recommend you go on, beyond the proof of concept, and really start building the application. So I think that I've said my main piece. How about we get started on the Q&A? Yeah. What kind of questions do you have? Thank you, Reese.
Appreciate that. Great presentation. I want to start off-- there's been a couple of people who have been asking about the presentation deck itself.
That's something that I think there's a bit of a technical issue with that. And our technical expert has said that everybody can check back later today to try to access that deck. So that will hopefully be provided to you directly if at all possible.
But let's actually get into some of the technical questions here, that we've had from some of the participants here. So Reese, first question comes in is curiosity around whether or not we're considering any updates to Model Composer to support pose estimation, considering its utility for specific applications and those complexities related to it. Yeah, yeah, that's a really interesting one. I think pose estimation is a very interesting use case. And for those who don't know about it, basically, pose estimation is a human-centric task, where you're looking at how a human body is oriented.
So we might have a couple of points that recognize where my eyes are, where my mouth is, where my shoulders are, hip points, feet, and so on. As far as having support for that within the Model Composer, I'm not sure if that's on the direct roadmap or not. That's an interesting one to us, that we've been building more on, in terms of examples, on ti.com and GitHub. But we probably have to get back to you on what our roadmap is, as far as supporting that directly within Model Composer. Yeah, I would just add to that, we definitely are highly committed to this platform and to these tools.
And so the feedback that we can get from all of you on the experience that you have with this and recommendations on future improvements is highly valuable. So as you do dive deeper into this, being able to provide that feedback through our forums and through other communication means-- definitely encouraged. We appreciate that.
So Reese, another thing that's coming through is just on the excitement around the no-code piece. I think seeing the [INAUDIBLE] tool and being able to actually see the usefulness of not having to write code as you're developing out these models is appealing. How far does that extend? Is the entire application development no code? Or is it limited to the models themselves? Good question. Yeah, so that is isolated to the model development itself. So ideally, once you're done with Edge AI Studio, you have a model that can run on the device.
You don't have to write any code to get that model trained or compiled for the device. As far as building a full application around that, yes, that will require some coding. I think the simplest way that we can get started on that is using a Python application. That generally performs about as well as something that's generally considered more difficult to program, like C++. There are multiple options there, yeah. But it does require some actual coding to get a full application running.
[INTERPOSING VOICES] But what I will mention for that, as well, is that we have a quick tool, called Edge AI GST Apps, that actually comes within the Linux development kit, that you can start on for development on the EVM boards. And what that does is it makes it easy to set whatever input you want to use, whether it's a camera or a video file or just a set of images. And that will construct what's called a GStreamer pipeline so that you can have this data go through multiple processing nodes that are necessary for a deep-learning inference application, as well as outputting to a display, without requiring you to do more than change a couple of lines in a configuration file. And those are actually really flexible. I use those a lot myself, for both simple demonstrations, doing proof of concept, making sure that models are running correctly.
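The idea of a configuration file driving a GStreamer pipeline can be sketched as follows. This is not the Edge AI GST Apps configuration format or its code: the dictionary keys are hypothetical, and the inference stage is represented by GStreamer's pass-through `identity` element as a placeholder. The other elements (`v4l2src`, `filesrc`, `decodebin`, `multifilesrc`, `jpegdec`, `videoconvert`, `autovideosink`) are standard GStreamer elements.

```python
def build_pipeline(config):
    """Turn a small config dict into a gst-launch-1.0 style pipeline description.

    The config keys ("input", "type", "path", "sink") are hypothetical names
    chosen for this sketch, not a real tool's schema.
    """
    source = config["input"]
    if source["type"] == "camera":
        head = ["v4l2src device={}".format(source["path"])]
    elif source["type"] == "video":
        head = ["filesrc location={}".format(source["path"]), "decodebin"]
    elif source["type"] == "images":
        head = ["multifilesrc location={}".format(source["path"]), "jpegdec"]
    else:
        raise ValueError("unknown input type: {}".format(source["type"]))

    stages = head + [
        "videoconvert",
        # Placeholder where pre-processing, inference, and post-processing
        # nodes would sit; `identity` simply passes buffers through.
        "identity name=inference_placeholder",
        "videoconvert",
        config.get("sink", "autovideosink"),
    ]
    return " ! ".join(stages)


camera_pipeline = build_pipeline({"input": {"type": "camera", "path": "/dev/video0"}})
```

Switching from a camera to a video file is then a one-line change in the config, for example `{"type": "video", "path": "clip.mp4"}`, which is the kind of flexibility described above.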
They're a pretty powerful tool. Thanks. Got another good one that I think we need to address, as it relates to our product offerings as well. Question coming in asking about whether or not this is supporting image processing exclusively right now. Or is there any other support outside of image processing? That's a very good question.
Yes, we are focused on image processing right now. These AM6xA processors are very vision optimized. They have basically all the hardware accelerators to do vision tasks.
They have ISPs. They have lens distortion correction engines. Some of them have stereo-camera hardware support. But what we don't have is a lot of the time series models right now, that are making use of the accelerator, although those can run on the CPU themselves.
We have a lot of open-source runtimes like TensorFlow Lite or [INAUDIBLE] run time. So if you have a model in one of those formats, then you can run those easily on the CPU. And you don't even actually need one of these AM6xA processors. You could use something more like an AM62x, that doesn't have the AI acceleration.
But most of our optimizations have been around vision just because the models are so much bigger and so much more complex. I will say that, I guess in terms of complexity, going a stage above that is going to be more of these text-processing models, like large language models, LLMs, like ChatGPT or GPT-3. Those we don't have supported on the SoCs either. And then I would just reinforce again, this is something that we want as much feedback as we can get.
We do have a commitment to this type of technology and these types of development tools. And so some of you have been asking about other families that we have, like C2000 or MSP. And so even if we aren't supporting something today with Edge AI Studio, we are certainly open to seeing what kind of demand exists on that and potentially putting that on the roadmap, as Reese mentioned earlier. So the most important thing is to get in there and use it, to get the feedback over to us so that we can continue to build this out and make it better as we move forward. Let's see what else we have here.
Reese, how about-- this a question about the actual models. "Does it only work with supported models that you've pre-configured or can we use custom models as well?" Good question. So a few caveats to that. Within Edge AI Studio, yes, it's the models that we have direct support for.
Right now, we're supporting in that no-code tool, Model Composer, we're supporting a couple of YOLOX models. But if you go outside of that, we have a lot of other tools for more custom models. There's a GitHub repository called Edge AI TIDL Tools.
And that can help you compile a much more custom model to run with the accelerator. And really, the only limiting factor there is making sure that the composition of the model, the layers that compose it, are using supported operators for our accelerator. If it's not fully supported, then some of that can fall back to Arm. And there's a bit of a performance penalty.
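The fallback behavior described above can be pictured as splitting a model's layers into segments by where they can run. The sketch below is a conceptual illustration only: the supported-operator set and the `CustomNMS` layer name are hypothetical, and the real compiler's partitioning logic is not described in this talk.

```python
# Hypothetical set of accelerator-supported operators (an assumption for
# this sketch; the real list is defined by the vendor's tools).
SUPPORTED_OPS = frozenset({"Conv", "Relu", "MaxPool", "Add", "Concat"})


def partition_layers(layers, supported=SUPPORTED_OPS):
    """Group consecutive layers into (target, [ops]) segments, where target
    is "accelerator" for supported operators and "cpu" for everything else."""
    segments = []
    for op in layers:
        target = "accelerator" if op in supported else "cpu"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)
        else:
            segments.append((target, [op]))
    return segments


# Made-up layer sequence ending in operators outside the supported set.
model_layers = ["Conv", "Relu", "MaxPool", "Conv", "Relu", "CustomNMS", "Softmax"]
segments = partition_layers(model_layers)
```

Every boundary between segments implies handing data between the accelerator and the CPU, which is one plausible source of the performance penalty mentioned above. A model with few, large accelerator segments is generally preferable to one that switches back and forth.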
But the answer is yes. We do support other custom models outside of Model Composer at the moment. All right, and I think another question that should be addressed has to do with where the boundaries are drawn for when hardware is needed.
And so what Reese presented here had to do with Edge AI Studio as a complete suite of tools. And there's both the Model Composer and the Model Analyzer. And so Reese, if you'll just go back through briefly, a summary of what each of those is delivering. And then at what point would external hardware, physical hardware be required? Because I think one thing we want to make sure customers understand is, what can I literally go do right now, today? Edge AI Studio is a free tool. Everybody in here today can go to dev.ti.com--
dev.ti.com/edgeaistudio. And you can get started with these tools right now. But I think clarifying what those tools are doing and then when do we want to bring in actual physical hardware, just to make sure that's clear. Yep, yep. I'll answer this last question before we wrap up. So using these tools does not actually require any hardware.
So the model analyzer tool, up here at the top, that one is unique because it connects to our own server farm. And what we call our server farm is actually a set of EVMs. We have a server rack that's basically full of AM62As and TDA4VMs and so on. And you can connect to that through basically this Jupyter Notebook. You can write Python code and use some of our pre-written Python code to run on those devices and see how they perform.
You can see what the performance is going to be without actually having the hardware in your hands. And that's actually a really helpful thing for a lot of people, especially since it avoids having to do any setup on your side. The Model Composer can also be used without having hardware. I think that it's nice to use with hardware, since you can see how well the model is performing.
But it's not explicitly necessary. You could take photos on your phone or with your webcam, load those into the tool, annotate them, train, compile the model, do everything, basically up to that last portion, that last page I showed on Model Composer, without having a device. It's only after that point, once you really want to see how it performs in your hands with, say, a camera, showing live input-- at that point, you would probably need to have an actual EVM with you.
Perfect. Well, I think that we're going to wrap it up there. Reese, thank you again for your time and for this presentation. Thank you to everybody who is on the webinar with us.
We hope this was very exciting and useful for you. We definitely will go through all the additional questions and just make sure that everything was answered, and if not, send out a follow-up there. And of course, we encourage everyone to use our E2E forums as well.
Or if you have a contact already at TI, sales contact or marketing contact, make sure to reach out to them as well. We'd be happy to communicate with you more about this and answer any additional questions. So thank you all for your time.
And have a nice day. Thank you so much, everybody. Goodbye.