Have you ever had a tremendous amount of data in your VectorDB that you're using for retrieval-augmented generation, but the context you're handing to your LLM is kind of lacking because the retrieval is pulling in data that shouldn't really come back with your query? Well, today I'm gonna show you how to seamlessly integrate multiple AI agents into your application to combat this kind of problem. We will walk through a practical example that covers query categorization, context retrieval from a VectorDB, and natural language response generation, all using a multi-agent approach. This session is designed to give you a clear, step-by-step guide on working with agents in your projects.
Let's dive in and explore how you can leverage these tools to build smarter applications. So in the description of the video, you should have a link to this repo. The first thing we're gonna do is just clone down the repo to our local machine. Okay. And let's go into the repo and then into the UI directory. So if you look at the structure of the application, we're gonna have an API, which is what we're gonna be working on today.
And then we have a UI, which we're not gonna work on, but it's what renders what we're doing in your browser. So we're gonna go into the UI, and we're gonna install the dependencies. The first thing is to install the root dependencies, and once that's completed, we're going to run a setup script. Even though we're not working on it, the UI is made with React and TypeScript, it has an Express TypeScript server, and we're using something called Carbon components. And I just wanna highlight Carbon components for a second: if you search Carbon Design React, it's gonna bring you to the docs page.
And this is where I get all the components to put into the UI. It's super easy, especially for someone who's not a particularly adept front-end developer, and they look good. And you know, they have the code, everything you need.
So my suggestion is, even if we're not working on the UI today, go look at Carbon Design, see what you can do, and maybe change the application when you're done with it. Alright, so we're gonna wait for the dependencies to install and we'll be right back. Okay, the dependencies have been installed. One last thing to do within the UI: we're gonna copy the client's .env.example back into the client as a .env. And we're doing the same thing for the server: copy that .env.example and place it right back as .env in the server.
Something we also could do: if we go into the client .env, we've added a way for you to brand the application if you want to make it your own. So we have a branding and an application name. For this one, we're going to say Agents in Action! Okay. So now we're done with the dependencies, and we're done totally with the UI today.
So let's go back to the root and then head over to the API. The API is written in Python, so let's create a virtual environment. We'll name it aiagentic. And once that's done, let's activate the virtual environment.
Now we're gonna install all the dependencies. This is gonna take a little while — we're installing CrewAI, we're installing watsonx.ai, a ton of dependencies. So once this is completed, we'll continue along with the tutorial.
Okay, so now that the dependencies have been installed, we just have to copy the .env.example and paste it into .env in the API. Let's take a look at what's in that env. Because we're using watsonx.ai, we need the connection strings to connect to watsonx.ai. So we're gonna head over to IBM Cloud. And if you go into your resource list, just open up your Watson Studio and go into IBM watsonx.
And it's just gonna log us in. And we're gonna head over to Prompt Lab. Now, I'm sure there's a better way of getting this information than the way I do it, but the way I do it is I just go to Prompt Lab. At the top right, we have View code, and it shows you a cURL command.
And it has most of the stuff that we need in order to make that configuration with our API. So grab the base URL from the cURL command and paste it here in the WATSON_URL. And we go back, and we grab the project ID.
Just copy and paste that. And then finally, let's go back to cloud.ibm.com. At the very top, you're gonna have Manage, and you're gonna wanna go to Access (IAM). And when we get there, on the left-hand side, you're going to see something called API keys. Let's create a new one.
Call it AGENTIC and create. So let's just copy that and put it right here. And that's it — now our API is set up.
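For reference, the filled-in .env ends up looking roughly like this — WATSON_URL is the name from the walkthrough, but the other variable names here are illustrative, so match whatever the repo's .env.example actually uses:

```
WATSON_URL=https://us-south.ml.cloud.ibm.com
WATSON_PROJECT_ID=<your-project-id>
WATSON_APIKEY=<your-api-key>
```

So the next thing we wanna do is check out a new branch. Alright, so let's check out our first branch.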
It's gonna be called one-step. And what we wanna do is start up all our services. We have three services — remember, we have the FastAPI, the React UI and the Express server.
So let's start up the first one — wait, we have to go into the API directory first, and then we can run our uvicorn command. We're just gonna run the uvicorn server with reload, and that starts our FastAPI. Also, let's have two more windows, because we're gonna go back to the UI and start up the client, and then we're gonna start up the server. All these commands are in the repo, so you can just copy and paste them. So we're just waiting a second for uvicorn, for the FastAPI to get all ready, and then we'll head over to the browser, where we can already see what the UI's gonna look like.
Beautiful. It's a chatbot. You have a chat window and a couple buttons, but what it's gonna do on the backend is gonna be pretty cool, I think. So if you go to your API directory, there's gonna be a couple of folders that you're gonna be interested in, and a questions.txt that I've been using. So let me copy and paste one of those questions into our chat window.
And what we're gonna wanna do is, when we hit send, we want the backend to categorize that query, grab the correct data from the correct collection in our VectorDB, ChromaDB, and then pass that to a customized prompt and return a nice response. Currently, it's just gonna say "this will be generated by our multi-agent process" and "the category is something cool" — but it will be something cool once we set it up. So let's go and run the first script that we have: look in the API scripts folder at the process documents script.
This is how we're going to create our ChromaDB VectorDB, right? We have a directory called docs, and in that, we have three text files: account, billing and technical. And what I'm trying to show here is — imagine you have a tremendous amount of documentation. When we query it all in one VectorDB, the cosine similarity distance is gonna be pretty close for stuff that might not be relevant. So we isolate topics, right? We have an account, a technical, and a billing collection, and I'm just trying to recreate that locally. Something very simple.
So if you look at the script, all it's doing is looping through all the files in docs, taking each file name, creating a new collection with that name, and inserting the embeddings for that file into that collection.
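Here's a minimal sketch of that ingestion logic — assuming a persistent local ChromaDB client and a watsonx.ai embedding model; the model ID, the blank-line chunking, and the env variable names are my stand-ins, not necessarily what the repo's script uses:

```python
import os
import chromadb
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

# Embedding model hosted on watsonx.ai (model ID is illustrative)
embedder = Embeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    credentials=Credentials(
        api_key=os.getenv("WATSON_APIKEY"),
        url=os.getenv("WATSON_URL"),
    ),
    project_id=os.getenv("WATSON_PROJECT_ID"),
)

client = chromadb.PersistentClient(path="chroma_db")

for filename in os.listdir("docs"):  # one file per topic: account, billing, technical
    name = os.path.splitext(filename)[0]
    with open(os.path.join("docs", filename)) as f:
        # Naive chunking on blank lines; the real script may chunk differently
        chunks = [c.strip() for c in f.read().split("\n\n") if c.strip()]
    collection = client.get_or_create_collection(name=name)  # one collection per topic
    collection.add(
        ids=[f"{name}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.embed_documents(texts=chunks),
    )
```

So let's run that script, and you'll see.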
First, it's gonna check: are there any existing collections? No existing collection found. So it's going to create three new ones, and we're going to have account, billing and technical. But before the app can use any of that, we have to first categorize the query.
So this is where we're gonna create our very first agent: the categorization agent. The route we're looking at, the one we're gonna change, is called the agentic route. And here in the docstring, you can see what each of the agents is gonna do.
The first one is gonna be the query categorization, then we're gonna have the context retrieval, and then we're gonna have the response generation. And these are all gonna be agents. So let's bring in our agent framework, which is CrewAI.
So we're gonna say from crewai import — and the first class we're gonna bring in is Agent. Then we're gonna bring in Task, then Crew, then Process, and finally LLM.
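In code, that's one line:

```python
from crewai import Agent, Task, Crew, Process, LLM
```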
So let's go back, and you can see in the first docstring: it's an LLM-powered agent. So this is where we start connecting watsonx.ai to the CrewAI agentic framework.
So let's create our first LLM. If you look at the docs, the things we're really concerned with are the model — we're gonna use a watsonx.ai model — the temperature, the max tokens, and then all the connection strings. If we look in server.py, we have a list of available watsonx.ai models that you can use. So I'm gonna use the Granite 3 8B model.
So let's bring that in. And we just have to prepend watsonx/ to the front of it. Then we're gonna set the temperature — now, this is from trial and error, but 0.7 works well — and the max tokens is 50.
Because again, all this is doing is categorizing a query, right? We're not doing any massive return or anything like that. So the next thing we have to do is bring in the connection strings for watsonx. We're gonna need the API key, the project ID and the URL. So let's say URL is going to equal os.getenv of that key. Let's just paste that in.
And then we're going to have the API key, which we also get from our .env. And then finally, we're gonna get the project ID — again, we just bring it in.
So let's bring those into the CrewAI LLM class. The base URL is gonna be the URL, the API key is gonna be the API key, and the project ID is gonna be the project ID.
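Pulled together, the LLM setup looks roughly like this — a sketch, assuming the env variable names from earlier and an illustrative Granite model ID from server.py:

```python
import os

categorization_llm = LLM(
    model="watsonx/ibm/granite-3-8b-instruct",  # "watsonx/" prefix + a model ID from server.py
    temperature=0.7,  # found by trial and error
    max_tokens=50,    # we only need one word back
    base_url=os.getenv("WATSON_URL"),
    api_key=os.getenv("WATSON_APIKEY"),
    project_id=os.getenv("WATSON_PROJECT_ID"),
)
```

So now we have to create our first agent. Let's do it. We're gonna name him the categorization agent — a very clever name.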
And if we look at the docs again, we can see exactly what we're gonna be doing here. The top three attributes are particularly cool to me: you have a role, a goal, and a backstory. If you go to the CrewAI docs, they're really good about explaining exactly what all those attributes do, but I just wanna read them to you. The role defines the agent's function and its expertise within the crew.
Remember, it's a crew of agents. The goal gives the agent a clearly defined objective of what it's going to do. And then the backstory is great: it provides context and personality to the agent, which I find very, very cool.
So let's start with the role. For the categorization agent, the role I've come up with is Collection Selector. Now, if you've worked with LLMs, this probably looks like prompt engineering to you — because it kind of is.
It's just trial and error; this is what worked out for me. So I gave it the role of a collection selector because it's selecting the collection.
The goal is to analyze the user queries and determine the most relevant ChromaDB collection. And then we're gonna give it a backstory: he's an expert in query classification, and he routes questions to the correct domain.
Alright. Finally, we're gonna add a couple of other things that we need. Verbose I'm going to set to true, because we want to see what it's up to in the logs. Allow delegation — so this is interesting. Remember, we're gonna have multiple agents, and they could all have different goals and different expertises.
And an agent can make a decision on what it wants to do based on that, right? Like, okay, this is not for me, let me send it to another agent — that's what delegation allows. But in our case, we don't want that; we're just going sequentially, and we have three agents.
We need them to do exactly what we want them to, so we turn delegation off. And in line with that, there's also something called max iterations, which defaults to 20. But in our case, because these are pretty simple tasks, if something's not working, it's just gonna try it over and over again.
If something's broken, we just have to fix the code — at least, that's been my experience. And finally, we have to give it its brain, right? We're gonna give it the categorization LLM. This is its reasoning capability.
This is how it's gonna actually do what it needs to do — and that's the categorization LLM that's using Granite.
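Assembled, the agent looks something like this (the max_iter value here is illustrative):

```python
categorization_agent = Agent(
    role="Collection Selector",
    goal="Analyze user queries and determine the most relevant ChromaDB collection",
    backstory="An expert in query classification who routes questions to the correct domain",
    verbose=True,            # show what it's up to in the logs
    allow_delegation=False,  # strictly sequential pipeline: no handing work off
    max_iter=5,              # keep retries short instead of the default 20
    llm=categorization_llm,  # its "brain": Granite via watsonx.ai
)
```

So now we have an agent who has a brain, a role, a backstory — a whole life story. Now we have to give it a task; we're going to ask the agent to do something. So let's create our task.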
And again, let's look at the docs. The things we're gonna be concerned with are, obviously, the agent — we have to assign this task to our categorization agent — the description, which is really just gonna be a prompt, and then output JSON, which is really important for what we're doing, because we're gonna send this first agent's response directly back to the UI.
So it has to be formatted in a particular way. I'm just gonna copy and paste the description, because it's just a prompt, right? If anyone's used prompts before — this took me a while to get correct. But you can see exactly what I mean: this is a prompt. We're saying, look at the query and determine the best category.
You must only return one word, because again, we're gonna be using this classification later down the line. And then we give it category definitions, and we're really kind of broad with them, because we want to give agency to the agent. So we're just saying, okay, this is what a technical query could look like.
This is what billing looks like, and so on. We're giving the agent agency here. And then finally, I just really want to hammer home: please, just only one word from this list. And then we also have an expected output.
And this is important because we need something explicit. So we want a JSON object with a category field, and it has to be either technical, billing or account. The agent we're assigning it to, obviously, is the categorization agent. And finally, the last thing that I mentioned was the output JSON.
And I find this really nice. The output JSON takes in a Pydantic model. So let me copy and paste the Pydantic model I have, and I'll show you what I did. Paste it in here. So we have a CategoryResponse.
I'm expecting a JSON object with a category field, and the value is going to be either technical, billing or account. Now, I added this description because I have a feeling that the agent is actually looking at the description of these models before it responds. Don't quote me on it, but that's what I think it's doing. So I added it, and it worked well.
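Sketched out, the model and the task look something like this — the description here is a condensed stand-in for the repo's full prompt, and {query} gets filled in from the route:

```python
from pydantic import BaseModel, Field

class CategoryResponse(BaseModel):
    """A JSON object with a single category field."""
    category: str = Field(
        description="The query category: must be 'technical', 'billing', or 'account'"
    )

categorization_task = Task(
    description=(
        "Look at the user query and determine the best category.\n"
        "Query: {query}\n"
        "Categories: technical (errors, bugs, how-tos), billing (payments, invoices), "
        "account (login, profile, access).\n"
        "You must only return one word from this list: technical, billing, account."
    ),
    expected_output=(
        "A JSON object with a 'category' field whose value is "
        "technical, billing, or account"
    ),
    agent=categorization_agent,
    output_json=CategoryResponse,  # enforce the shape the UI expects
)
```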
From all I can tell, it is working the way it's supposed to. Alright, so now we have an agent who is powered by our LLM and has a task to follow. So let's create the first crew.
If you look at the docs here, we have tasks, we have agents, we're gonna use the process, and we're gonna have verbose on, because we want to see some responses. So first things first, let's add the agent.
And right now, we only have one. Then we have to add the tasks. And that's gonna be the categorization task.
Remember, this is gonna be a crew — there's gonna be a couple of agents here eventually. And then we're gonna have verbose, which we're just gonna set to true again, because we wanna see what it's doing. And then finally, we have our process, and the process is going to be sequential.
Because if you look at the docstring, we're going step by step. So now that we have the crew, let's have the crew kick off — we're gonna call the kickoff method on the crew. And if you remember, from the category response we're expecting a JSON object with category as a field. So instead of sending back "something cool" to the UI, we're gonna grab the category result and try to grab that category from the response.
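A sketch of the crew and the kickoff, assuming a recent CrewAI version where kickoff returns a CrewOutput whose json_dict we can read (the `query` variable is the one coming in from the route):

```python
crew = Crew(
    agents=[categorization_agent],
    tasks=[categorization_task],
    process=Process.sequential,  # step by step, in order
    verbose=True,
)

# `query` comes from the request body of the agentic route
category_result = crew.kickoff(inputs={"query": query})
# Because of output_json, the final output parses cleanly into a dict
category = category_result.json_dict.get("category")
```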
Make sure nothing broke. Looks good. We have the category result. Let's test it out. So let's copy and paste what we already sent.
And hopefully it returns back category: something... category: technical. Perfect. So if we look at the logs, we can see exactly what it's doing, right? We are using the collection selector, that first agent. You can see the task that we're giving it. We pass in that user query, the one that we sent from the UI.
And then we get the final answer in exactly the structure that we were looking for, so the UI can ingest it and render it nicely. Alright — now that we have the basic categorization agent in place, let's move on and enhance our pipeline.
So let's just commit our changes and check out the second-step branch. Alright, perfect — nothing broke. Great. Okay, so the next step, if we go to the docstring, is to retrieve that data from the VectorDB, right? Let's make it. We're gonna do the same process: copy and paste the categorization LLM to create a new LLM, and this is gonna be the retriever LLM. It's going to need more tokens.
Like I said, it's 1,000, but everything else is going to stay the same. And then we're just going to grab two more things: an agent and another task. So this is the retriever agent and the retriever task. I'm going to copy and paste these from our notes, and I'll explain exactly what they're doing. There is one significant difference here, and you'll see it right away: it's using a tool, and I'll explain what we're doing there. So the retriever agent has a job, right? It's going to take the category it receives from the categorization agent, and it's going to pass it to a function that queries our VectorDB.
And so that function is going to be that tool. So let's create our first tool. We're going to name it the query collection tool. Let's define it.
What does it take? It takes the category, and it takes the query to embed — so category and query, both strings — and it's gonna return a dictionary. Perfect. And in the docstring: this is the tool to query ChromaDB based on category and return relevant documents.
Now, if you've ever worked with RAG or a VectorDB, the functionality of this is gonna be very familiar, right? So let me just copy and paste what that actual tool is gonna do. We're using the watsonx.ai embeddings — don't worry, you could use your own embeddings model if you have it locally; my computer is not capable of it at the moment. The interesting thing here, though, is this part: we're grabbing the category that was returned by the categorization task, and we're using that to query the VectorDB. Which is fascinating, because you're just saying "this is what you do" — and the LLM is doing it, the agents are doing it.
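Here's a sketch of that tool — assuming the @tool decorator from CrewAI and the same watsonx.ai embedder as in the ingestion sketch above (again, the helper wiring is illustrative, not the repo's exact code):

```python
from crewai.tools import tool

@tool("Query Collection Tool")
def query_collection_tool(category: str, query: str) -> dict:
    """Tool to query ChromaDB based on category and return relevant documents."""
    client = chromadb.PersistentClient(path="chroma_db")
    # The category from the categorization task picks the collection
    collection = client.get_or_create_collection(name=category)
    results = collection.query(
        # `embedder` is the watsonx.ai Embeddings helper from the ingestion sketch
        query_embeddings=[embedder.embed_query(text=query)],
        n_results=3,
    )
    return {"category": category, "documents": results["documents"][0]}
```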
So that is very, very cool to me. Now that we have that tool, you can see what the retriever agent is doing, and we give it the task — you know, we're passing in that query from the route. We have an expected output, but we're not worried too much about it, because we're never gonna send the context back to the UI, so we're not really enforcing an output JSON here.
But the only other thing I want to mention here is that we had to add a context. And context is how we give it access to the categorization task's output. That's how it gets the category, and that's how we're telling the agent: look at this category, you have the query, pass them to this function, call it, and return back the context. So for us now, all we're gonna do is add the new agent to our crew — welcome — and add the new task.
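Sketched out, the retriever pieces look roughly like this — the role, goal, and backstory wording is paraphrased, not the repo's exact text:

```python
# Same connection setup as the categorization LLM, just more output room
retriever_llm = LLM(
    model="watsonx/ibm/granite-3-8b-instruct",
    temperature=0.7,
    max_tokens=1000,  # it's handing back retrieved context, not one word
    base_url=os.getenv("WATSON_URL"),
    api_key=os.getenv("WATSON_APIKEY"),
    project_id=os.getenv("WATSON_PROJECT_ID"),
)

retriever_agent = Agent(
    role="Context Retriever",
    goal="Fetch the most relevant documents for a query from the right ChromaDB collection",
    backstory="A retrieval specialist who always pulls context from the correct domain",
    tools=[query_collection_tool],  # the function it's allowed to call
    verbose=True,
    allow_delegation=False,
    llm=retriever_llm,
)

retrieval_task = Task(
    description="Use the category you were given to query ChromaDB for context on: {query}",
    expected_output="The relevant documents retrieved for the query",
    agent=retriever_agent,
    context=[categorization_task],  # sees the categorization task's output
)

crew = Crew(
    agents=[categorization_agent, retriever_agent],
    tasks=[categorization_task, retrieval_task],
    process=Process.sequential,
    verbose=True,
)
```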
Process is still gonna be sequential. This time, let's just remove the part where we actually grab the category, because we're not gonna be returning that anymore — so we'll just say bye for now. But what we are gonna do is print out that category result. Okay, make sure everything comes up.
Good. We're gonna print out that category result, and we're gonna see exactly what happens when we send over that query from here. Hopefully we'll watch the new agent do exactly what we want it to do. Okay, let's get there. Okay — so far so good.
It already has the correct category. So now it's gonna take that collection... look at that. We got the result from the RAG. It used that category and passed it to our ChromaDB collection.
So we got that collection, and then we queried it. And now it returns back all the context for that query, basically by itself — we just told it what to do.
And so you can see almost exactly what we're gonna send to the final agent, right? We're gonna send it the category, because we want to return that to the UI. We're sending it the query. And now we have the context from our VectorDB to do the retrieval augmentation and generate our response. I think that is particularly fascinating. So that's just a way that you can use tools with agents. And that was really just our retriever agent, right? A tool in this case just means a function.
We're giving an agent tools — functions to use — and that is very, very cool to me. So we're done with the retriever agent, and we're gonna move on to the generation agent. And this is the final step of the application.
So let's just commit our changes. And let's check out the final step. Let's go back to our API.
And if we look back at our docstring, we know what the final step is: we're going to create an agent that creates a nice response for the user. Basically, everything that we just did, we're gonna do one more time.
And I really like this pattern, right? Creating a separate LLM for each of the agents. I find that to be very, very nice, because we could set different parameters for each one. For us, the only thing we're changing is the max tokens — obviously, we want the response to have more leeway — but we could make them drastically different. Each LLM could use a different model from watsonx.ai. We could use Mistral.
We can use Llama. We can use whatever we want. So let's add the final task and the final agent, which is gonna be our generation agent.
And once again, we're gonna be using a tool, and I'll explain why in a second. So again, we give it a role, we give it a backstory, we have an LLM. Let me just make sure I named it correctly — oh yeah, it's the generation one, not response. Let me just update that. Okay.
But we're missing one last tool. And for this tool, I'm gonna show you how I found the prompt. This tool is going to interpolate the query and the context into a nice prompt. And where I got the prompt: if you go to your projects, you can create an accelerator — just look up watsonx.ai RAG, and it will give you this accelerator that you can just create.
And within there, they have prompt templates written by the people who train the models, or work with them a tremendous amount. I'm going to take this prompt template, because they wrote it better than I would — I'm really not a particularly good prompt engineer, to be totally honest. So I just copy and paste it, and then I wanna interpolate the context we received from ChromaDB and the question from the query into it, right? So we're gonna create another tool — I'm just gonna copy and paste the tool and the prompt — and we give the generation agent access to it. So this generate response tool, you can see exactly what it's doing.
It's grabbing the context, it's grabbing the query, and it's interpolating them into this prompt.
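A sketch of that tool — the prompt here is a cut-down stand-in for the accelerator's full template, not the actual text:

```python
@tool("Generate Response Tool")
def generate_response_tool(context: str, query: str) -> str:
    """Interpolate the retrieved context and the user query into the RAG prompt."""
    # Abbreviated stand-in for the accelerator's prompt template
    return (
        "Answer the question using only the context provided. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )
```

And finally, there's one last thing we have to do, which is create a Pydantic model for the output. Because now we're sending the entire thing back to the UI, I really want to enforce that it's just gonna be JSON. I really want that category, and I really want that — I believe I called it — response.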
I'll figure it out — let me look at what I actually called it. But I need both of those fields to be there in order for it not to, you know, blow up on the response.
So let me copy this model and add it to the top over here, next to the other one. It's gonna be the FinalResponse: the JSON object that we're looking for, with a category field and a response field.
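Something like this — the class name FinalResponse is my shorthand for what the walkthrough calls "the final response":

```python
# Lives at the top of the file, next to CategoryResponse
class FinalResponse(BaseModel):
    """The JSON object sent straight back to the UI."""
    category: str = Field(description="The query category: technical, billing, or account")
    response: str = Field(description="The natural language answer to show the user")
```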
And we're gonna send this straight back to our UI. So that's why we have this in the generation task: we're trying to say, okay, this is what we want, this is what we want it to look like, and this is what you need to return.
So let's add our final agent to the crew and give him his final task. Okay — we have the crew kickoff. Let's just call this crew result now, because we no longer need to send back that hard-coded response.
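So the final wiring ends up looking roughly like this — generation_agent is the agent we just set up above, and the task description is again a condensed stand-in:

```python
generation_task = Task(
    description="Write a helpful, natural language answer to: {query}",  # condensed stand-in
    expected_output="A JSON object with 'category' and 'response' fields",
    agent=generation_agent,
    context=[categorization_task, retrieval_task],  # it sees the category and the context
    output_json=FinalResponse,  # enforce the exact shape the UI needs
)

crew = Crew(
    agents=[categorization_agent, retriever_agent, generation_agent],
    tasks=[categorization_task, retrieval_task, generation_task],
    process=Process.sequential,
    verbose=True,
)

crew_result = crew.kickoff(inputs={"query": query})
payload = crew_result.json_dict  # {"category": ..., "response": ...} — goes straight to the UI
```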
We can get rid of the old hard-coded return, since we're not sending that anymore. Okay — make sure nothing broke. Perfect. And let's see if we get a nice response. Okay. Alright — it grabbed the correct category. It sent that category to the retriever, who returned back all of the context from the RAG, from the VectorDB.
Now it's going to send that to the final agent, who's gonna interpolate it into that prompt we cribbed from Watson Studio. Let me see it. Perfect — okay, yeah, so you see it: the response has everything set up. It has the context, and look at that. Huh? That worked! I'm not surprised — it worked before.
I built this. But still, it's always kind of surprising — it's an amazing technology. So you see, we have this RAG response. Let's actually double-check to make sure everything looks right.
So it's referencing error 01. Let's look at our docs and make sure that we have the correct stuff. Error 01:
Session expired. Clear your browser. And let's see what that says. Perfect.
Yeah, so it worked exactly the way we wanted it to. Something I really like is that it gives me back the responses at the different steps in the pipeline, and I'm able to pass them along between the agents. So I'm able to categorize the query.
I'm able to show a really, really nice message and a good, accurate response, and it's all done with these agents. I think it's very cool. And obviously we can enhance this: we could refactor it, change the parameters we're using for the LLMs to make it do different things, or use totally different models — whatever we want.
There are two new agents I really want to make — I'm probably gonna do it later. One would route queries to the web if they're not part of the ChromaDB collections, if the categorizer is able to say, okay, this is out of the blue. And the other would format the response when we get it back to the UI — maybe format it as HTML, right? Look at this, put it into a nice HTML package, and send it on as the response. Awesome. We've built a pretty sophisticated multi-agent pipeline here.
So let's just recap. We built the backend for an agentic RAG chatbot that's able to identify the query's category, target the correct ChromaDB collection, interpolate the query and the context into a custom prompt, and generate a natural language response. With this application and this process, we would love for you to explore additional use cases, customize the UI, experiment with the CrewAI framework, and build something really cool. Maybe add a route that does a web search if the query is totally out of bounds. Maybe create an agent whose only job is to format the response in a particular way. We would love to see anything you do with it.
Dive into the code, have fun, build something cool, refactor it, make it better, just be creative.