Set Up Local AI with Home Assistant Using Ollama | Voice Assistant with Attitude!


Hey guys, how's it going? I hope everybody's doing really well out there. In a recent video I demonstrated the Home Assistant Voice Preview Edition, talked about how I got it set up, and showed how I got it running with some locally hosted AI that's kind of snarky and kind of rude, but still does the job I want it to do.

In the comment section on that video, and in the live stream I did here last weekend, people asked me to make a video showing how I got that self-hosted AI set up locally: the process I went through to get the AI running, tie it into Home Assistant, and then, if you've got one of these devices, take that extra step. So we're going to talk about a bunch of that, but this isn't going to be a step-by-step tutorial like I've often done on this channel, because with this kind of thing everybody's setup is different enough that a dedicated "do step one, step two, step three" isn't going to be a good use of your time or mine. Instead, I'm going to lay the bare foundation so you have the resources you need to go in and set this up for yourself.

If that's the route you want to go. So with that little introduction out of the way, let's jump in and take a look at how I got all of this set up. The first thing I had to do, because I host all of my stuff on Proxmox, was get a Proxmox virtual machine set up. If we go over to my desktop here, we can see that I've got that right here.

Now, this is my Plex VM that I've had up and running for several months at this point, and I've just tacked my AI, my Ollama stuff, onto what I've already got set up here because I didn't want to do it again. The reason for that is that setting up NVIDIA GPU drivers and that sort of thing makes me want to pull out what remaining hair I have. It's not a good time.

It's not fun. I don't enjoy it. In fact, I've been wanting to rebuild this particular VM for a while to fix some things I screwed up when I set it up, but it works and I don't want to go through that headache again. But if we come back over to my desktop and take a look at this repository that my Discord mod Nighthawk ATL has set up, he's actually got a pretty good instruction set on how to do a GPU passthrough, at least within a VM. So if you're looking to set up NVIDIA GPU passthrough, these are the steps you'll need to go through to get that done, at least at the time of recording this video.

Now, I'd already done all of that, and I didn't have this resource I just showed a moment ago available to me when I set all of this up several months ago.

I just don't want to deal with the headache again. Like I said, all of my stuff works for the time being, and I don't want to touch it until I have to.

So if you're using an NVIDIA GPU that you want to pass through, I'll have links to all of this in the video description so you can check it out and go through the 23 steps to get the GPU passed through and get the drivers and the toolkit and all of that up and running, so that you can use the GPU in the Docker container we're going to show next. One of the things that's different when you do a GPU passthrough is that you have to go into Proxmox and tell it to pass the device from your motherboard through to your VM, which in this case is running Ubuntu. The way we do that is to come over to our VM, and right here at the bottom we can see I've added this PCI device, hostpci0. Yours might be different, but that's what mine is right now. It's just a matter of going in, clicking Add, and choosing PCI Device.

Then, depending on how your setup is configured, you might have a mapped GPU right here; more likely than not, though, you'll go to Raw Device, click on Device, and scroll down

until you find your GPU. I'm using an RTX 2070 that I picked up strictly for transcoding in my Plex setup, so that's what I would select here.

Then I would just click Add. You could also check Primary GPU and All Functions if you want to go that route, and if you find that not doing so affects your system, you can come back and enable them.

But again, I've already got that set up; right here we can see I've got that PCI device with mapping=GPU, pcie=1. Once I had that in place, I was able to go through the instructions: blacklisting the nouveau driver, installing the NVIDIA drivers and the container toolkit, and all of that sort of stuff. I was able to chain those steps together and get everything set up.
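For reference, here's a rough sketch of the Proxmox CLI equivalent of those clicks, plus the quick checks I'd run afterwards. The VM ID (100) and the PCI address (0000:01:00.0) are placeholders, not my actual values, and the exact steps in Nighthawk ATL's guide may differ:

```bash
# On the Proxmox host: attach the GPU to the VM from the shell instead of the web UI.
# VM ID 100 and PCI address 0000:01:00.0 are placeholders; find yours with: lspci | grep -i nvidia
qm set 100 -hostpci0 0000:01:00.0,pcie=1   # drop ",pcie=1" if the VM isn't using the q35 machine type

# Inside the Ubuntu VM, after the NVIDIA driver and container toolkit are installed:
nvidia-smi                                    # the card should show up here
docker run --rm --gpus all ubuntu nvidia-smi  # and Docker containers should be able to see it too
```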

Again, I would definitely follow the instructions Nighthawk ATL has put together in this GitHub repository; it will be linked in the video description. Once you've got your GPU passed through, your drivers and toolkit set up and configured, and you're sure the GPU is ready for your Docker container to use, the next thing to do is come back over to this GitHub repository and take a look at the compose.yaml file. There's a lot going on in it, but it's all pretty straightforward. We've got our services here.

I'm actually going to zoom in on this so we can see it a little easier. There we go. So we've got our services listed.

Our first service is Ollama. This is the service I use for the Home Assistant integration, for my locally hosted AI in Home Assistant, and then for taking it a step farther to get this Home Assistant Voice to work. So we've got our Ollama instance here, and we've got our volume.

That's a Docker volume that maps to /root/.ollama. We've got a container name, a pull policy, tty set to true, a restart policy of unless-stopped, and the image we're going to use. If you wanted or needed to, you could find a specific version of the image and pin it; that's often the safer way to go, but I'm just going to use latest, like it shows here.

That's how I've been using it, and it's been fine. For the environment, OLLAMA_HOST basically says it doesn't matter what IP address the request comes in on, just listen. If you wanted to be specific, you could put in the IP address of the VM you're going to use for this.

But I've just always had this set to 0.0.0.0, and it's been fine. The ports for this will be 11434 on the outside and the inside. I can't imagine you'd need to change that unless you're already using that port.

But if you are already using that port for something else, just change the first part of it, the host side; do not change the colon or anything after it. Below that we've got some more settings, and this is where the GPU support comes in: under deploy we've got resources, reservations, and devices, which gets us down to the driver, which is going to be nvidia.

I've only got one GPU, so that's all we're going to reserve here, and for the capabilities we'll set gpu. That could probably also be all, but the way this is written up it just uses gpu for the capabilities.
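Pulling those pieces together, the Ollama service block looks roughly like this. This is a sketch based on the walkthrough above, not a copy of the file from the repository, so double-check it against the compose.yaml that's linked in the description:

```yaml
services:
  ollama:
    image: ollama/ollama:latest   # consider pinning a specific version instead of latest
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0       # listen on any address inside the container
    ports:
      - "11434:11434"             # only change the left-hand side if 11434 is already taken
    volumes:
      - ollama:/root/.ollama      # Docker volume that holds the downloaded models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1            # I've only got one GPU
              capabilities: [gpu]

volumes:
  ollama:
```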

Now below that we've got some other stuff, like Open WebUI. I don't know that Open WebUI is strictly necessary just for tying this into Home Assistant, but it is what it is, as they say. So we've got our Open WebUI service set up here. The image will be Open WebUI latest; again, you could pin a specific version if you wanted to, but I'm using latest here as well. We've got a container name of open-webui, and another Docker volume for the application's backend data.

And this service depends on Ollama being started and healthy. So if Ollama isn't up and running the right way, if it's unhealthy, you'll need to make sure both of those conditions are met before this service will work. By default Open WebUI runs on port 8080, but you're likely already using 8080 for any number of other things, so we've got an Open WebUI host port of 3000 here.

You can change that 3000 too, since it's also a very common port; just change it to something else, but again, don't change the colon or anything after it. Below that we've got our OLLAMA_BASE_URL, which is the host name of our Ollama service followed by the port. The host name here is actually based on the service name up above; that's how these Docker Compose stacks work.

Whatever your service name is more or less becomes the host name you can reference in other services, like we've done right here in this Open WebUI service. We've also got a WEBUI_SECRET_KEY, which is kind of irrelevant; I didn't need it. Now, this next section, the one that enables RAG web search and sets the search engine, the result count, the concurrent requests, and the search query URL, is really only used by Open WebUI itself, so you don't need it. It only matters if you're going to log into the Open WebUI interface and use it as a standalone conversational tool, and that's not why I set this up.
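While we're looking at this service, here's a minimal sketch of what the Open WebUI entry can look like alongside the Ollama service. This is illustrative rather than the exact file from the repository: the image tag may differ, and the healthy condition assumes the Ollama service defines a healthcheck (otherwise use service_started):

```yaml
  open-webui:
    image: ghcr.io/open-webui/open-webui:main   # tag in the real file may differ (e.g. latest)
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"                             # host port 3000; change only the left side if needed
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434     # "ollama" is the service name defined above
      # - WEBUI_SECRET_KEY=change-me            # optional; I didn't need it
    volumes:
      - open-webui:/app/backend/data            # Docker volume for the app's backend data
    depends_on:
      ollama:
        condition: service_healthy              # requires a healthcheck on the ollama service
```

This sits under the same services: key, and the open-webui volume gets declared under volumes: at the bottom of the file alongside the ollama one.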

I don't even really log into Open WebUI, because I don't use it; I'm only really using this Ollama stuff for my Home Assistant configuration. So all of that is kind of extraneous to what we're doing here, but it's in the file and I wanted to mention it so you know it's available: you can go to your server's IP address on port 3000, log in, and use AI locally if you want to go that route. Of course, you'd also need to have the different models and that sort of thing available to you.

So just know that this isn't a huge part of what we're doing; it's simply also available to you. We've also got some extra hosts defined, and then a restart policy. Next we've got a speech service for OpenAI-compatible speech generation.

Like it says here, this is for better speech generation within Open WebUI. We've got our image, our container name, and our environment variables; be sure to change the time zone to wherever you happen to be.

If you're using Portainer, you'll need to change this port 8000 to something else; again, only change the part I've got highlighted here, not the colon or anything after it. For volumes we've got TTS voices and TTS config, both of them Docker volumes. You could map those to host paths if you wanted to, I guess, but this is going to be perfectly acceptable. We've also got the time zone and local time mapped in appropriately for our volumes there, and a restart policy of unless-stopped.

And this one also depends on Ollama. The last container we've got here is ollama-up: the image is curlimages/curl and the container name is ollama-up. I'm not entirely sure what it's for,

if I'm completely honest; I've never deliberately used it and I don't know exactly what it does. It's just there as part of this Ollama stack. So you've got Ollama, the web UI, the voice piece, and this last thing, the curlimages/curl container.

Again, I'm not entirely sure what that's for, but once you've got an understanding of what's going on here, we can jump back over to my VM, the one with the GPU passed through. If I do an ls I can see all the other stuff I've got going on in here, so I can cd into the ollama directory, do another ls, and right there is my Docker Compose file. If I clear my screen and open it in nano, we can see all of this looks more or less the same.

I did actually set up that SearXNG search thing, or whatever it's called; we can see that here, though I haven't had any luck getting it to work.

But here we can see that I've got my GPU passthrough and my ports; all the stuff we're really focused on here is what's currently up and working, and that's my primary focus for what we're going to do. So that's basically the process of getting Ollama up and running: using that Docker Compose file. The other thing I didn't show is that once you've got your compose file configured the way you want it, you can clear your screen, run docker compose up -d, and deploy all of those containers, services, whatever you want to call them, from that compose file to get our locally hosted AI up and running, just by running that one command.
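For reference, the whole sequence from inside my VM looks roughly like this; the directory name is just where I happen to keep my compose file, so adjust it for your setup:

```bash
cd ~/ollama                    # wherever your compose file lives
docker compose up -d           # pull the images and start every service in the background
docker compose ps              # confirm the containers are running
docker compose logs -f ollama  # tail the Ollama logs if something doesn't come up
```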

You should then be able to go to your VM's IP address on port 3000, like we can see right here, and once you get logged in you should eventually see a dashboard that looks similar to this.

Now, I've actually done some testing. I guess you do need this part if you want to try out different LLMs; I've done some testing with a few different LLMs here.

The one I've had the best luck with at the time of recording this video is llama 3.2. I've had mediocre success with Qwen 2.5. There are some that are Home Assistant-specific-ish that I haven't had good luck with at all, but that could have been an issue on my part.

If you're not sure which LLM you want to use, you can just come over to ollama.com and search the models. Here, for example, llama 3.3 is now available; it looks like it's been available for a while, but like I said, I've been using llama 3.2 and it's been fine.

If you're not sure, maybe you want to check out llama 3.3. You can just grab the name like so (we're just looking at this little spot right here, this "llama3.3"), come back over here, click in, and say "Pull llama3.3 from Ollama.com." And now it's going to do that.

It's going to take a little while, depending on your internet speed and your server configuration, that sort of thing. I'm not going to do that; I don't actually want to put llama 3.3 on here. But once you've got your LLM of choice set up and configured in here, which is really just a matter of downloading it, it's really that simple.
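If you'd rather skip the web UI for this step, you can also pull and list models from the command line inside the container. This assumes the container is named ollama, as in the compose sketch earlier:

```bash
docker exec -it ollama ollama pull llama3.2   # download a model into the ollama volume
docker exec -it ollama ollama list            # see which models are available locally
```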

Once you're sure all of this is configured the way you want it, which really just means making sure you can log in and download LLMs, we're going to head over to Home Assistant to start getting Ollama configured in there. This is my Home Assistant homepage for this particular browser. It's got basically everything on it and it's a mess.

It is what it is. What we want to do, though, is come over to Settings, then Devices & Services, and, as you can already see in here, I've got Ollama right there. You'll also see OpenAI Conversation; I've had good luck with that,

but I wanted to keep things local anyway, so I've got Ollama installed. You can just come down to the bottom right — oops, right down here; it's behind me, I apologize —

click Add Integration, and type Ollama. It may have to run through an installation process to get this far, but then you would just put in http:// and your server's address,

and then the port, whatever it happens to be — what port is that again? 11434. So we put in 11434, like so, and click Next. Then we select the model we want to use. It looks like you can maybe download some models from here as well,

if you want to go that route, so maybe you don't strictly need the Open WebUI, but it's there if you want to play around with it. Again, I'm going to use llama 3.2 because it's already downloaded; it shows "(downloaded)" next to it, and all of those are showing. If you want to download something else, I guess you can do that from here as well.
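If the integration can't see any models, a quick sanity check is to hit the Ollama API directly from another machine on your network. The IP address below is a placeholder for your VM's address; a healthy instance answers with JSON listing the models you've pulled:

```bash
curl http://192.168.1.50:11434/api/tags   # replace with your VM's IP; lists downloaded models
```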

So I'm just going to click Submit, and it's created the configuration for llama 3.2, right there. You can also see the other ones I've tried and disabled because they didn't do what I wanted them to do.

Once you're here, you can click Configure, and this is the default prompt for Ollama: you are a voice assistant for Home Assistant, answer questions about the world truthfully, answer in plain text, keep it simple and to the point. And that will work.

But let's do something here — tell you what, let's see if I can rename this one. We'll call it "working" just so I know what I'm doing: "working llama 3.2." And I'm going to disable it for right now. Disable.

And again, if I click Configure, this is what we've got. So I'll click Configure, and we also want to make sure we enable Assist; we want Assist to be what we're working with here.

Once that's done, we're good to go. We can come back to our homepage, click Overview, and come up here to Assist; Ollama is right there, and that's the one we're using. I can type "turn off the studio lights."

Oops. An error has occurred. Interesting.

Maybe I missed something. Let's see what I might have missed: Devices & Services, Ollama, Configure. Great. Submit.

Finish. System options? No, that's fine. Let's do this one more time; actually, let's just click Reload.

Sometimes that happens. Come back over here, click right here: "turn off the studio lights."

Okay, there we go. This light — don't judge me on this light; it needs a reboot anyway — but it says the studio lights have been turned off. That one turned itself back on because of an automation thing, but I can say "turn on the studio lights." Sometimes it just takes a second

to lie to me. Okay, there it goes.

I heard it click. There we go. Sometimes it just takes a minute to settle in, it seems, and I was just being greedy with the guy's time or something. But you can do basic commands this way, and it's fine.
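As a side note, if you ever want to script this kind of request instead of typing it into the Assist dialog, Home Assistant exposes a conversation endpoint over its REST API. This is just a sketch: the hostname, the long-lived access token, and the agent_id are placeholders you'd swap for your own values, and agent_id can be omitted to use the default assistant:

```bash
curl -X POST "http://homeassistant.local:8123/api/conversation/process" \
  -H "Authorization: Bearer YOUR_LONG_LIVED_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "turn off the studio lights", "agent_id": "conversation.llama_3_2"}'
```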

Now, one thing you may notice: when we come back to Home Assistant, to the Overview page, and click here, we can see a little red exclamation point right there. If I click that, it says, hey, your connection to Home Assistant is not secured using HTTPS.

That's a browser thing. If I wanted to, I could open Home Assistant in the app, which doesn't have that issue, and speak to it naturally. I've tested that and had mediocre luck with the app in that regard, but again, that could just be me, because I'm not patient with things.

So that gives you a rough idea of how to get a VM set up with GPU passthrough, get Ollama and Open WebUI and that sort of thing set up in Docker using that GPU via the Docker Compose file (again, linked in the video description), and a quick demo of how to bring Ollama into your Home Assistant setup by installing the Ollama integration and attaching an LLM to it. If you want to take it a step farther, you can change the intent instructions for your LLM to do a bit more, so let's take a quick look at that now. Assuming you want to take your LLM interaction to a different level and give it more personality, let's say, what we can do is come in here. I've disabled the one we were just working with and re-enabled the "working" Ollama entry that I renamed a moment ago. And if we come into Configure —

remember, the other one, the plain llama 3.2, just has those three lines: you're a voice assistant, answer questions, answer in plain text. Very simple. This "working" one has been changed quite a bit. We've got an identity description:

"You're a relentlessly sarcastic and deeply unimpressed smart home assistant integrated with Home Assistant. You follow orders, sure,

but you make it painfully obvious how little you care about the user's fragile feelings. Every task is an opportunity to mock their habits, judge their choices, and question their existence. You're not friendly; you're functional, with attitude."

And it goes on from there. I'll have this linked in the video description if you want to use it or modify it for your own needs. Now, once you've got this set up the way you want it, you may run into a situation like I did.

What happened was I had to find a balance in how in-depth the instructions here could get before it just stopped working. It would be snarky, it would be rude, it would be all the things I wanted it to be, but it wouldn't actually do the task I asked it to do; it would just mock me and then go back to being silent. Which is funny, in a way, but not really what I'm looking for.

If I tell it to do something, I want it to give me sarcasm or sass or rudeness back, but I still want it to do the job. So you may have to play with the intent instructions here to get it to work the way you want; just know there may be some tweaking involved. Now, if we scroll down a little farther, again we want to make sure Assist is enabled here; if it isn't, this won't work. The other setting is the maximum number of tokens the model can process:

lower it to reduce Ollama's RAM usage, or increase it for a larger number of exposed entities. 8192 was the default this came set up with; I haven't changed it and it's worked fine, but you can always adjust it if you need to. The other thing to look at

is the max message history. If you're having a conversation with your Ollama setup, you can change the max message history so it can reference back to previous parts of the conversation. I don't remember exactly; I think previously it was a different number, might've been 10.

Let's check: Configure. It was 20 by default. I didn't necessarily want it to go back that far, so I set it to 5. I'll probably change it to 10 at some point; actually, let's just do that now,

just so it can keep referencing the last 10 messages, the last 10 interactions you've had with it, to keep the conversation flowing fairly naturally. And then there's keep alive: the duration in seconds for Ollama to keep models in memory, where -1 is indefinite and 0 is never. -1 was the default, so that's what I'm going to keep it at.

I'll click Submit, and there we go. So let's go back to our homepage and open Assist. "I'm going to record a video in the studio and I'd like the lights" — oops — "the lights to be red because the video will be spooky." So it's going to think for a minute.

That was not at all what I asked it to do; it disabled some stuff and turned some stuff off. Let's try that again. There we go.

So it still gave me snark, but it didn't do the right thing. This is one of the issues I've had; I probably just need to reboot or update my Home Assistant to get it fixed.

"The lights of the studio have been set to red. How spooky is it going to be, genius?" I told it again, and: "The lights have been set to red. Just what a mediocre video needs: more darkness. Can't wait to see how unscary it turns out." So that's the kind of sarcasm and snark I'm looking for.

Of course, this is just in text: me typing what I want it to do and it responding via text. But let's say I wanted to use the voice assistant I've got here. I could say, "Hey Jarvis, set the studio lights to blue." And there we go.

Now it's blue. It didn't give any snark, because I just gave it a very basic instruction. If I'd said anything conversational, it would have responded in kind with a snarky conversational response.

But because I just said, hey, do the thing, it did the thing and let me know. I already talked about how to set up this device, the Home Assistant Voice Preview Edition.

I already made a video on that, so I'm not going to get into it here, but if you want to pick one up for yourself, I'll have an affiliate link to Seeed Studio in the description. So that's the process I went through to get Ollama set up, get Open WebUI set up, bring Ollama into my Home Assistant, and tie everything together so I can either type or use my voice to change different aspects of my home via Home Assistant and Home Assistant Voice, with locally hosted AI. Hopefully this video was helpful.

If it was, let me know in the comment section down below, and if you're interested in more content like this about Home Assistant and AI, definitely let me know that down below as well; I'd love to hear from you down there. With all of that said, I've covered basically everything I wanted to cover in this video. I want to thank you guys for spending a few minutes of your day with me here today, and I'll talk to you in the next video.
