How to Build a Local AI Agent With Python (Ollama, LangChain & RAG)


In this video I'll show you how to build a local AI agent in just a few minutes using Python. We'll use Ollama, LangChain, and something called ChromaDB to act as our vector search database, because I'm going to show you how to add retrieval-augmented generation (RAG) to this app. That essentially means we can retrieve relevant information from something like a CSV file or a PDF and bring it into our model. All of this is completely free: you don't need an OpenAI account or a Claude account, and you can do everything from your local computer. Let me show you how to set it all up.

First, a quick demo of the finished product, and then we'll get into the tutorial. On the right-hand side of my screen I've opened a CSV file containing fake reviews for a made-up pizza restaurant, with columns for title, date, rating, and review. For example: "Best pizza in town", the date, the rating out of five, and then the text of the review itself. I'm going to show you how to build an AI agent that can look up relevant reviews from this document to answer questions about the restaurant. I don't know about you, but whenever I go to a new place I always read the reviews, and typically I'm looking for the answer to one particular question; this agent can do that for you. For example, if I ask "How is the quality of the pizza?", it goes to the document, finds the relevant reviews (which you can see it pulls in), analyzes them, and gives me a conclusion: without more data or context it's challenging to give a definitive score on the pizza based solely on the reviews; however, they do suggest a restaurant with potentially room for improvement in presentation and overall consistency. I could also ask "Are there vegan options?", and it concludes that, based on the reviews, there appears to be at least one vegan pizza and possibly more vegan options available. That's what we're going to build. It isn't super complicated and it'll be pretty fast, so stick around and let me show you how to make it.

We have a few quick setup steps, and then we can dive right into the code. The first thing we need is some kind of CSV file. You can use anything you want, and I'll show you how to adjust the code for your own data, but if you want the CSV file I'm using, I'll leave a link to it in the description; in fact, all of the code is available on GitHub, so you can download the CSV file from there and bring it into a new folder. To begin, open a code editor (I'm using VS Code), create a new folder (mine is called local-ai-agent), bring in the CSV file, and create a requirements.txt file listing the three packages we need to install in Python. Let's start by installing those dependencies. Open a terminal inside the directory you want to write code in, and create a virtual environment.
To create the virtual environment, type `python -m venv venv`; if you're on Mac or Linux, change `python` to `python3`. This creates a new isolated environment that we can install our dependencies into. If you don't know anything about virtual environments and want to learn more, I'll leave a video on screen. Once the virtual environment has been created, we need to activate it. On Windows, the command is `.\venv\Scripts\activate` (note the capital S in Scripts); when you run it, you should see the name of the virtual environment as a prefix on your command line. On Mac or Linux, the command is `source venv/bin/activate`. Again, I'll leave a video on screen that goes through this in more depth.

With the virtual environment activated, we install the dependencies into it. If you have the requirements.txt file, run `pip install -r requirements.txt` and it will install everything for you. If you don't have the file, you can install the packages by name: langchain, langchain-ollama, and langchain-chroma. That will take a second to install.

Once that's done, the next thing we need is something called Ollama. Ollama allows us to run models locally on our own computer using our own hardware; that's why we're able to do everything locally rather than having to use something like an OpenAI API key. Go to ollama.com if you don't already have the software, and simply download it.
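
To recap the setup so far, here are the commands in one place. This is a minimal sketch of the steps above; the environment name `venv` is just the convention used here, and the package names match the video's requirements.txt.

```shell
# Create the virtual environment (use python3 on Mac/Linux)
python -m venv venv

# Activate it
.\venv\Scripts\activate      # Windows
source venv/bin/activate     # Mac/Linux

# Install the dependencies (or: pip install -r requirements.txt)
pip install langchain langchain-ollama langchain-chroma
```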

Once you've downloaded it, you should be able to open a terminal or command prompt and type the command `ollama`. If you have any issues with this, I'll put another video on screen that walks through Ollama in depth. With Ollama installed, the next step is to install an Ollama model. Again, Ollama is open-source software that lets us pull various models onto our own computer and then run them using our own hardware. The hardware you have dictates the models you'll be able to run; for example, you probably can't run a 200-gigabyte model without a graphics card in your computer. I'm going to show you a few models that should work on most machines with a GPU; if you only have a CPU, there are some very small models you can download and use, but the performance won't be as good. You can browse the Ollama library (I'll leave the link in the description) to see all of the different models they offer.

We're going to pull two models to our computer. The first is llama3.2, a smaller model that performs pretty well. The second is an embedding model, which we'll use to embed the documents we add to our vector store; if that means nothing to you, don't worry, just follow along with the next steps. In the terminal, make sure the `ollama` command works, then run `ollama pull llama3.2`. You can pull any model you want from the library, but I'm going with llama3.2; mine finishes immediately because I already had it downloaded. Next, pull the embedding model, mxbai-embed-large, with `ollama pull mxbai-embed-large`. There are various other embedding models you could use, but this is the one we'll use for this video. Neither model is super big, so you should be able to run them on your computer if you have any kind of GPU. Now that we have these models, we're good to start writing some code.
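
The two pulls, plus a quick check that they're available:

```shell
ollama pull llama3.2            # small general-purpose chat model
ollama pull mxbai-embed-large   # embedding model for the vector store

ollama list                     # both models should show up here
```
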
Back in VS Code, I'm going to make a file called main.py and start writing code in it. You'll notice I get autocomplete suggestions as I type; that's coming from GitHub Copilot, that really cool assistant that replaces a lot of your manual typing work, and they've sponsored this video. Speaking of GitHub Copilot, I was fortunate enough to have them sponsor a video a few weeks ago on AI agents, and in today's video I promised to highlight some of the standout ways developers are using Copilot that you submitted with the #CodingWithCopilot hashtag. So let's get into it. Emy created an entire Flutter mobile app; Tugdual wrote a Python script to resize and save images; Adrien used Copilot as a beginner while working in Jupyter to learn better ways to write functions; and Yousef uses it to avoid manually writing tedious documentation. Personally, I use GitHub Copilot every single time I open VS Code, and it's insane how well it can predict what I want to do next and save me tons of hours of manual typing; it's literally like it can read my mind. I'm sure you have more stories about how you're using GitHub Copilot, so please share them with me using the #CodingWithCopilot hashtag, because I'm excited to check them out. With that said, let's get back to the video.

Back in the code editor, let's get started. We begin by importing a few things: from langchain_ollama.llms we import OllamaLLM, and from langchain_core.prompts we import ChatPromptTemplate.

If you're unfamiliar with LangChain, it's a framework that makes it much easier to work with LLMs. It's very popular in Python and has extensions, like the Ollama integration, that allow us to use our Ollama models directly. By the way, Ollama runs in the background on your computer and exposes a server (an HTTP REST API) that our program communicates with. So when you pull these models, they are actually running on your own computer, and we can trigger Ollama to use them from Python code or directly from the command line. Everything I'm showing you here runs 100% locally on your own machine, even though it might not necessarily feel like that; it also means it'll be pretty fast.

Next we specify our model. I'll show you how to use an Ollama model quickly on its own first, and then we'll start connecting more complexity to it with the vector database. We write model = OllamaLLM(model="llama3.2"). If you're confused about what name to put here, open your command prompt and type `ollama list`; it shows the models you have available. I have the embedding model, llama3.2, mistral, and llama2, so I can use any of those. I'm going to copy llama3.2; you don't need the ":latest" suffix, the base name is fine.

Next we create a template, which is just a string specifying what we want the model to actually do. Something like: "You are an expert in answering questions about a pizza restaurant. Here are some relevant reviews: {reviews}. Here is the question to answer: {question}". The reviews and question placeholders in curly braces are variables. Then we write prompt = ChatPromptTemplate.from_template(template); we don't pass the model here, despite what the autocomplete keeps suggesting. Now we have a chat prompt template that accepts a reviews variable and a question variable, and the model can respond to that.

Then we create a chain: chain = prompt | model. The chain lets us combine multiple steps and invoke them together to run our LLM: first, the reviews and question variables are filled into the chat prompt template we just created; that automatically gets passed to our model, because we put the model in the chain; and then the model's answer is returned to us.
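
Here's main.py so far, as a sketch. The import paths and class names follow the current langchain-ollama and langchain-core packages we installed above, so adjust if your versions differ.

```python
# main.py
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate

# Must match a model name shown by `ollama list`
model = OllamaLLM(model="llama3.2")

template = """
You are an expert in answering questions about a pizza restaurant.

Here are some relevant reviews: {reviews}

Here is the question to answer: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# Chain the pieces together: variables -> prompt -> model -> answer
chain = prompt | model
```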

To test this out quickly (because this is literally all we need), we call chain.invoke() with a Python dictionary specifying the two variables from our prompt: reviews and question. For now, reviews can just be an empty list, and for question we'll use something like "What is the best pizza place in town?". That doesn't entirely make sense, since this is all about one pizza place, but it's just a quick demo. We store the output in result, print it, and run the file with `python main.py` (or `python3 main.py`). The first run gives an error, some kind of formatting issue; a silly mistake on my part, I forgot to use the .from_template() method when creating the ChatPromptTemplate, which of course caused the problem. After fixing that and running `python main.py` again, it works: "Based on our customer feedback and ratings, I would highly recommend this top-rated pizza place. One reviewer mentioned..." and so on; it even claims "our own team has sampled their pizza". It came up with something random here because I didn't actually give it any reviews, so it's hallucinating the response, but you get the idea. We were able to use Ollama and get a response from the model, which was really the whole point of this test.

Now we'll put this inside a while loop so we can keep asking questions, and then set up the vector search so we actually get relevant responses. The loop is simple: while True, ask for input with a prompt like "Ask your question (q to quit)". If the user types q, we break; otherwise we invoke the chain, replacing the hard-coded question with the question the user asked, and print the result. I'll also print a big divider line and a few "\n" characters around each answer so we can read what's happening. We don't need to test this; it just lets us keep asking questions until we type q.

Now I want to show you how to set up the vector search, so we'll create a new file called vector.py (you can call it anything you want).
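
Here's the loop stage of main.py as a sketch; the exact divider formatting is just my approximation of what's shown on screen.

```python
# main.py (interactive loop, before retrieval is added)
while True:
    print("\n\n-------------------------------")
    question = input("Ask your question (q to quit): ")
    print("\n\n")
    if question == "q":
        break

    # Reviews are still empty; retrieval is wired in later
    result = chain.invoke({"reviews": [], "question": question})
    print(result)
```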

This file is where we'll write the logic for embedding, or vectorizing, our documents and then looking them up. In case you're unfamiliar with vector search: it's essentially a database, hosted locally on our own computer using ChromaDB (which we installed earlier), that allows us to very quickly look up relevant information that we can then pass to our model, so the model can use that data to give us more contextually relevant replies. LLMs are really good at synthesizing text and giving us responses, but usually they don't have the correct data. So in this case, we're going to take the entire CSV file and put it inside this vector-enabled database; then, whenever we ask a question, we look up the relevant documents in that database and pass them to the LLM as a list of reviews, and it will be able to search through those reviews and answer our question. That's the very basics of vector search; let me show you how we do it.

We start with imports. From langchain_ollama we import OllamaEmbeddings: the vectorization process needs an embedding model that can take text and convert it into a vector, which is essentially a list of numbers we can use to look up data really efficiently. From langchain_chroma we import Chroma, which will be our vector store. From langchain_core.documents we import Document; we'll create documents and pass them to our Chroma database. We also import os, plus something I forgot to install before: pandas (imported as pd). Pandas is a library that makes it really easy to read in our CSV file. Before I forget, run `pip install pandas` in the virtual environment, same as before. I'll also add it to the requirements.txt file, so if you downloaded that earlier you'd already have it.

While pandas installs, we can start writing more code. First we load the CSV file, since its data is what goes into our vector store: df = pd.read_csv("realistic_restaurant_reviews.csv"), where df stands for data frame. Obviously, read in whatever your own CSV file is named; I copied and pasted the filename to avoid misspelling "restaurant".
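
The top of vector.py so far, as a sketch; the CSV filename is the one from the video, so swap in your own.

```python
# vector.py
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document
import os
import pandas as pd

# Load the review data; replace with your own CSV filename
df = pd.read_csv("realistic_restaurant_reviews.csv")
```
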
Next we bring in the embedding model: embeddings = OllamaEmbeddings(model="mxbai-embed-large"), the model we pulled earlier. After that, we specify the location where we want to store our vector database: db_location = "./chroma_langchain_db". You can call this anything you want; it's just the folder where we store the database. Then we write add_documents = not os.path.exists(db_location).
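
Continuing vector.py; the variable names db_location and add_documents are the ones used in the video.

```python
# Embedding model pulled earlier with `ollama pull mxbai-embed-large`
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

# Folder where Chroma will persist the database on disk
db_location = "./chroma_langchain_db"

# Only embed and add documents on the first run,
# i.e. when the database folder doesn't exist yet
add_documents = not os.path.exists(db_location)
```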

The idea is to check whether this database already exists. If it does, that means I've already performed the process of converting the CSV file into vectors and adding them to the database; if it doesn't, I need to do that now. We don't need to repeat this every single time: we can vectorize our data once, and once it's in the database, we can just start using it.

So below, we write: if add_documents, then documents = [] and ids = [], two empty lists. Then we iterate through our rows with for i, row in df.iterrows(), which simply goes row by row through the CSV file and lets us access the various entries. For each row, we create an individual Document, add it to the documents list, and later add the whole list to our vector store. The Document takes three things. First, page_content, which is what will actually be vectorized and looked up; if you adjust this for your own data, any content you want to use to look up information in the database needs to go in page_content. We combine the title of the review with the review itself, so we have plenty of information to query our data against: row["Title"] + " " + row["Review"]. There are all kinds of things you can do here, but the important information you'll be querying on has to go in the page_content. Second, metadata (Copilot is already writing it for me): a dictionary with "rating" set to row["Rating"] and "date" set to row["Date"]. Metadata is just additional information that we grab along with the document; we won't be querying based on it. Third, an id, which is str(i), the index of this row in the CSV file; just make sure you convert it to a string. After that, we append str(i) to ids, and we append the document to documents.
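
The document-building step as a sketch. I'm assuming the CSV headers are Title, Date, Rating, and Review; match these to your own file's header row.

```python
if add_documents:
    documents = []
    ids = []

    # Walk the CSV row by row and turn each review into a Document
    for i, row in df.iterrows():
        document = Document(
            # page_content is what gets embedded and searched against
            page_content=row["Title"] + " " + row["Review"],
            # metadata rides along with the document but isn't queried
            metadata={"rating": row["Rating"], "date": row["Date"]},
            id=str(i),
        )
        ids.append(str(i))
        documents.append(document)
```
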
The reason we store the IDs separately is that when we create this data in the vector store, it wants two lists: the documents and their associated IDs, in case they differ. I know it seems a bit weird to have the ID twice, but just follow along, because we need it for this process.

Now that we've prepared the data, we create the vector store itself: vector_store = Chroma(), passing a collection_name of "restaurant_reviews", a persist_directory equal to db_location (Copilot is leading me wrong here), and embedding_function=embeddings, our Ollama embeddings. The persist directory means the store is saved to disk rather than kept only in memory; you don't strictly need this, but I recommend it so you don't keep regenerating the Chroma database. So we're doing all of this locally: local ChromaDB, local embeddings model, and now the vector store. Next, a quick if statement: if add_documents, call vector_store.add_documents(documents=documents, ids=ids). That's all there is to adding them: specify the documents you want to add, which we've already prepared, and their corresponding IDs. We only do this if the database didn't already exist, because if it did, the documents are already in there (and we wouldn't have prepared the data anyway). Hopefully that makes sense; this creates the vector store for us and automatically adds the data.
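
Creating and filling the store, continuing the sketch:

```python
# Local Chroma vector store, persisted to disk so we only embed once
vector_store = Chroma(
    collection_name="restaurant_reviews",
    persist_directory=db_location,
    embedding_function=embeddings,
)

# First run only: embed the prepared documents and store them
if add_documents:
    vector_store.add_documents(documents=documents, ids=ids)
```
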
The last thing we do is make this vector store usable by our LLM, through a retriever: retriever = vector_store.as_retriever(search_kwargs={"k": 5}). There are a few parameters we can pass here; the search keyword argument "k" is the number of documents to look up, so with k=5 it will look up five relevant reviews and pass those five to the LLM. If we wanted ten reviews we'd make it 10; if we wanted one, we'd make it 1. You can specify as many or as few as you want (minimum of one, obviously), but I'm going with five. This retriever lets us look up documents, which we can then pass into the prompt for our LLM.

Quickly recapping vector.py: we import everything we need; we bring in the CSV file; we define the embeddings model from Ollama; and we check whether the database location already exists. If it doesn't, we prepare all of our data by converting it into documents. We initialize the vector store; if the directory already existed there's no need to add the data, but if it didn't, we add all of our documents to the vector store, which automatically embeds them for us. Then we create the retriever from the vector store, which lets us grab documents. The last step is simply to use this retriever from our main.py file.
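
And the final line of vector.py:

```python
# Expose a retriever that returns the k=5 most similar reviews
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
```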

Back in main.py, we add from vector import retriever, since we can just import it from the other file. Now, before we invoke the chain, we use the retriever to grab the relevant reviews and pass them as a parameter to our prompt: reviews = retriever.invoke(question). All we do is call retriever.invoke() with the question (or whatever search string we want to use to look up relevant reviews). The retriever automatically embeds that question, goes into the vector store, looks up the relevant reviews using a similarity search algorithm, and grabs the top five. We then pass those reviews to our chain and print out the result, and hopefully we get something meaningful based on those reviews.

Let's give this a run with `python main.py` and pray that it works. You can see that it creates the chroma_langchain_db directory; it takes a second because it needs to embed all of our documents. Now we can ask a question: "How are the vegan options?" (if I can spell anything correctly, which apparently I cannot). It pulls up a few different reviews and says that, based on the reviews provided, the vegan options at the pizza restaurant are a mixed bag: on the positive side, some reviewers have raved about the vegan pizzas, calling them hidden gems (and it even tells us which document that came from); however, not all reviews are glowing, and one reviewer had a vastly different experience with the vegan cheese option, calling it tasteless. Overall, it says, the vegan options are hit or miss, but there's definitely potential, and it gives an overall rating of three out of five based on the two positive reviews out of the four total. Cool. We can also ask something like "How is the ambiance?" (not sure I spelled that correctly either). It said the ambiance of the pizza restaurant has an "all style, no substance" feel; apparently the reviewers don't like this restaurant. But you get the idea: it's insanely fast, it uses the vector store database, and everything runs completely locally. When we're ready to quit, we hit q and exit out.

This was a simple example meant to demonstrate how you can run LLMs locally on your own computer using your own hardware. You can obviously adjust the CSV file and use any type of data you want; it doesn't even need to be CSV data, since you can convert anything you want into documents, as I demonstrated here. If you want the code from this video, it's available from the link in the description. If you enjoyed it, make sure to leave a like, subscribe to the channel, and I'll see you in the next one.
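
For reference, here's the finished main.py as one sketch, combining the earlier chain with the retriever:

```python
# main.py (final version, with retrieval wired in)
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from vector import retriever

model = OllamaLLM(model="llama3.2")

template = """
You are an expert in answering questions about a pizza restaurant.

Here are some relevant reviews: {reviews}

Here is the question to answer: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
chain = prompt | model

while True:
    print("\n\n-------------------------------")
    question = input("Ask your question (q to quit): ")
    print("\n\n")
    if question == "q":
        break

    # Similarity search: fetch the 5 most relevant reviews, then answer
    reviews = retriever.invoke(question)
    result = chain.invoke({"reviews": reviews, "question": question})
    print(result)
```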

