You can build some pretty insane applications using just LLMs, even if you don't really know what you're doing. But what separates a good AI app from a great AI app is one thing: data. Yes, LLMs are good at reasoning and can do a lot out of the box, but they get even better and more powerful when you provide them with the correct data, context, and useful tools. That's why in this video I'm going to show you how to do exactly that.

A few weeks ago, Bright Data, a long-term partner of this channel, issued me a challenge: they wanted to see the best AI application I could build in just a few days using a ton of their data and APIs. So that's exactly what I did, and that's what I'm going to demo for you in this video. I'm not only going to show you how it works; I'm going to break down the architecture and walk through, step by step, exactly how I built it, so you understand how you can do the same.

In front of me I have an AI travel agent, built purely in Python, that goes beyond the simple use case you've probably seen before. I'm not just prompting an LLM and telling it to give me the best itinerary for a trip. This actually uses a ton of data, real-time as well as historical, to give you relevant, up-to-date replies. It can look up flight information using Google Flights, find hotels and their availability, and use a large historical data set of restaurants, reviews, and attractions to give you a contextually relevant response. All of the data for this application does come from Bright Data, who sponsored this video, but even if you don't use Bright Data's services you can still create something like this, and I'm going to break down all of the architecture, so make sure you stay tuned.

Okay, so let's actually just demo
the application. You can see I have a prompt here: I'm flying from New York to Bangkok on this date and returning on this date, I need a cheap hostel under $20 per night with Wi-Fi and free breakfast, plus some other requirements (I like economy flights with layovers, I want to save money, etc.). As I press "plan my trip," a bunch of things happen. First, it parses the requirements from my prompt into structured information we can then pass to the model. You'll notice it's opening up a browser: it's automating Google Flights using an AI browser library combined with Bright Data's automated browser, which you'll see in a second, and it's scraping the page using LLMs to grab all of the relevant context. You can see it punched in New York to Bangkok with the details we specified, and now it's on the Google Flights site scraping all of the relevant flights and their return-flight details, which it will then pass back to an LLM that parses them for us. I'll share how I did this in a few minutes, but I wanted to run it locally so you can see exactly what's happening. Sorry for the quick cut here, but I did want to mention that this browser could actually go and book the flights for you as well, if you were comfortable giving it access to your credit card; I didn't do that for this video, just to keep things simple, but you could totally build that. You can see we get a loading state here (this takes a second because I'm running it locally on my own computer), and once it has the details it will get the flights and aggregate the replies. Hang on, I'll show you. You can see here it
found the flight details; it now says completed. Now it's searching for all of the hotels, which it does using something called the SERP API from Bright Data (I'll show you that later), and then it puts everything together: it grabs all of this information, passes it to the LLM, and generates our trip. The way I've set it up, the trip just appears in this tab. It gives us all of the details: the outbound flight, the return flight, and pricing information. The flight price is shown in dirhams because I'm in the UAE right now. It also shows the hotel price, the total price, and why it recommended a particular hotel, along with some more information.

I've also set up a window here where I can chat about this information, so I can ask for more details or ask it for a different hotel. More importantly, I set up this research tab. From here I can ask a question like "what expensive restaurants should I visit in Bangkok?" This is actually a different AI assistant that has access to a bunch of tools: it can search the web using DuckDuckGo Search, or it can query that huge restaurant data set using retrieval-augmented generation, where it goes to the database, looks up a bunch of relevant results, and passes them to the LLM, which filters them and gives me the response. You can see it says, based on the available information, you should go here, here, and here. And if we go back to my VS Code, you can see the reasoning that's going on and the tools it's using: here's the RAG tool, where it's going to the database and pulling information out of our data set, and it's also using internet search, combining those results together
to give us that reply. There's a bunch of cool stuff this can do; obviously that was just a quick demo. What I want to do now is hop over to the architecture and explain how all of the components work together, so you get a sense of how you could build something like this yourself.

I put together a diagram to explain everything that's going on, because it is a little bit complicated. Let's start with the travel summary and recommendation. The first thing we need for this tool to work is some kind of prompt, so we have this user input: the travel requirements. We then parse these requirements using an LLM (something local, or Claude or GPT) to get what's known as structured output. You can get structured content out of LLMs, and we first need fields like the check-in and check-out dates so we can use them later in our API requests. So we get the trip data parsed into a schema that we want (I'll show you this in the code later on).

Then we have two things that run in parallel: scraping or searching Google Flights, and searching for hotel data. The idea is that we want real-time data, stuff that's actually relevant, like the flights that are actually available today, not ones that were available years ago, and then we pass all of that to another LLM which recommends the trip.

Let me break down how we actually do this. When we're looking at Google Flights (and there are multiple ways to go about it), I've split it into two components. The first thing I do is automate the browser manually using a framework called Playwright, which lets you do browser automation: press buttons and fill in fields.
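To make the parsing step concrete, here's a minimal sketch of the kind of schema the requirements get parsed into. The field names are my guesses, not the exact ones from the project, and in the real app an LLM fills this in via structured output; here I just validate a hand-written example.

```python
from dataclasses import dataclass, field

@dataclass
class TripRequirements:
    origin: str
    destination: str
    depart_date: str               # ISO date string, e.g. "2025-04-01"
    return_date: str
    max_hotel_price: float         # per night
    amenities: list = field(default_factory=list)

# In the real app an LLM produces this dict from the free-text prompt;
# it's hand-written here just to show the target shape.
parsed = {
    "origin": "New York",
    "destination": "Bangkok",
    "depart_date": "2025-04-01",
    "return_date": "2025-04-14",
    "max_hotel_price": 20.0,
    "amenities": ["wifi", "free breakfast"],
}
trip = TripRequirements(**parsed)
print(trip.destination, trip.max_hotel_price)
```

Once the requirements are in a typed object like this, the downstream API calls can pull out exactly the fields they need.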
That's what I use to generate the Google Flights page with the destination airport, the origin airport, and the travel dates filled in. I combine that with Bright Data's scraping browser, which lets me bypass captchas, blocks, and IP bans, and mask my location so I can scan for flights in different regions. The scraping browser is essentially a remote browser instance that gets past all of those issues by connecting through a proxy network, so I don't need to do this locally from my own computer. I send a request out to the Bright Data scraping browser, it automates the browser, and it gives me the URL of the now-filled-in page. If we go to Bright Data, this is the scraping browser I'm using: I can connect to it with a simple WSS string (which I'll have to blur out so you don't use my instance), and I can view the logs, the pricing, and all of that here. So that's the first part: it automates the form submission for me, and now I have this Google Flights page.

After that, I don't want to manually scrape the data from the page, because that's really tedious: I'd have to write a fairly advanced web scraper, search through the DOM manually, and do a bunch of stuff that's just annoying. So instead, I'm combining the Bright Data scraping browser with something called browser-use. I pulled up the page here so you can see it: it's free, you can run it locally, it's open source, and it actually outperforms OpenAI's Operator. It's an AI-automated browser that can connect to any LLM you want, as well as to a remote browser instance, and it's the framework you saw earlier with all of those boxes and the browser being automated. You can just run it on your own computer.
Alternatively, and this is what I've done for a more production-like setup for this application, you can push it out so it uses the Bright Data browser remotely. That way you don't get the captcha blocks and IP bans that happen when you do a lot of web automation, and we can take advantage of a model to automatically scrape the page for us. So that's what happens: we use the browser-use framework together with Bright Data, connect it to an LLM, and based on the prompt I give it, it collects all of the information we're looking for about the outbound flight and the return flight and gives it back to us. That covers this step, and now we have all of the flight data.

The next step is to get the hotel data. For this I used the SERP API (search engine results page API), which is a really easy way to collect data from search engines. In this case we're using Google Hotels: Bright Data goes to Google Hotels, takes our search query, grabs the real-time, up-to-date information, automatically parses it, and returns it. I literally just send a simple API request and get back all of the real-time hotel data, which I then pass into the LLM. This is the page for the SERP API if you want to check it out; it works with multiple search engines, not just Google, and in the playground you can see options like Google Maps, Google Trends, Google reviews, and Google Search. I just use it for hotels, but you can use it for all kinds of data, and the advantage is that you send one API request and get structured data back that you can feed into your AI model.

Then we come to the recommender LLM.
This is just a simple LLM set up with a custom prompt. It takes in the Google Flights data, the hotel data, our original parsed trip data, and the input travel requirements, and it generates a recommended trip for us that includes the flights and all of the information I asked for in the prompt.

Once we have the recommended trip, I store it in a chat or memory context. With LLMs you can set up what's typically called a conversation buffer memory and use it to add additional context. So I take the recommended trip, put it in a kind of shared memory, and then I have two more agents (LLMs) that have access to it, so they have context on what was just generated and can give better responses.

The travel assistant is quite simple: it's what pops up after you generate the trip. It takes in the context and the user's query and produces a response. It doesn't have access to any tools or anything crazy; it's just another LLM that gives you specific recommendations or lets you ask questions about the information generated by the previous LLM. Different LLMs for different tasks, each with a different objective: that's the idea.

Then there's my research assistant, which is a little more complicated: it's what we call a tool-calling agent. Rather than a simple LLM with a prompt where we inject some data, in this case I give the agent access to tools. When we generate the recommended trip, we know from the prompt that we need flight and hotel data, so we fetch it automatically. Here, we don't necessarily know in advance whether we need to pull up restaurants or search the internet.
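The shared-memory idea fits in a few lines. This is a toy stand-in for the conversation-buffer memory the real app uses (class and method names are mine), but it shows how both assistants can read the same trip context.

```python
# Toy stand-in for the shared trip context / conversation buffer.
# Names are illustrative, not the project's.
class SharedTripMemory:
    def __init__(self):
        self.recommended_trip = None
        self.history = []                      # list of (role, text) turns

    def remember(self, role: str, text: str):
        self.history.append((role, text))

    def build_prompt(self, user_query: str) -> str:
        """Assemble the context an assistant sees on each turn."""
        lines = [f"Recommended trip:\n{self.recommended_trip or '(none yet)'}"]
        lines += [f"{role}: {text}" for role, text in self.history]
        lines.append(f"user: {user_query}")
        return "\n".join(lines)

memory = SharedTripMemory()
memory.recommended_trip = "JFK -> BKK, Apr 1-14, sample hostel, $18/night"
memory.remember("assistant", "I picked a hostel with free breakfast.")
prompt = memory.build_prompt("Can you suggest a different hostel?")
```

Both the travel assistant and the research assistant would call `build_prompt` with the user's latest question, so each one answers with full knowledge of the generated trip.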
doing it is we set up these resources as tools we then give our agent which is really running by an llm access to these tools and when the user sends a query the agent determines what it needs to do and what tools it has access to and can go and use these various tools to generate the response so for the tools we have two things first our restaurant tool this again uses a database specifically a vectorized database or a vector search database so what I do is I bring in this huge data set from Bright data which has like hundreds of thousands of restaurant entries so actually let me dive in here and show you the exact data set that I use so you get a sense of what it looks like now this is the Google Maps business data set it has 65 million records so obviously we don't need all of these because we just want restaurants and this gives us like all of the businesses and what you're able to do is you're able to filter this data set with the filters right here and then you can purchase whatever portion of the data set you want and download it so in my case I just got a bunch of the top restaurants but if you look at it you can you know kind of see an example of an entry we have name category address phone number description open hours Etc and this data set gets updated every now and then you can kind of pull the updates and do all of that stuff that you want so we have a bunch of information here if we filter by top rated restaurants you can see it will take a second cuz it is actually filtering it and then it will give us all of the restaurants and that's what I did I downloaded a bunch of the restaurants and then used that so my AI had access to it in the uh llm or with the kind of vector search or rag use case which I was using there now obviously there are a ton of data sets here from Bright data I just search Google and you can see a few are popping up if I get out of the search should we just go back to web data sets data set Marketplace you can scroll through and 
there are literally hundreds of different data sets with hundreds of millions, if not billions, of records, so pretty much anything you need you can probably find here. In my case I was specifically using the Google Maps business one, but there are also some cool travel ones: hotel listings, Airbnb listings, property information, and so on. If you want more historical data, they also have a web archive, so you can get data that existed previously on the web; I didn't end up using that for this project because it wasn't necessary. Anyway, that's the data set; let's go back to the architecture.

Next, I vectorized all of this data, which just means converting it into something that can be searched quickly from your LLM, and I used something called ChromaDB, a local vector store database you can run on your own computer. With those three things combined (the data set, vectorizing the data, and loading it into ChromaDB), I now have a vector database I can query really quickly to pull out relevant results. Essentially, when I ask the agent something and it decides to use the restaurant tool, it generates a query string and passes it to the tool; the tool quickly looks up all of the relevant restaurants for that query and passes them back to the LLM, which uses that information to generate its response. In addition, I gave the agent access to a search tool from LangChain: DuckDuckGo Search. LangChain is the main engine I'm using in the background to run my various agents, and I'll show you that in the code. I know that was a lot, but I wanted to break down the high-level architecture, which I think is the interesting part of this project. Now what I want to do is jump into the code.
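The retrieval flow boils down to nearest-neighbor search over embeddings. Here's a deliberately tiny stand-in that swaps ChromaDB and a real embedding model for bag-of-words cosine similarity, just to show the query-to-top-k-to-LLM shape; the restaurant entries are invented examples.

```python
# Toy stand-in for the restaurant retrieval tool. The real app uses ChromaDB
# with a proper embedding model; this uses word-count vectors and cosine
# similarity purely to illustrate the flow.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

RESTAURANTS = [  # invented entries standing in for the downloaded data set
    "Sorn expensive southern Thai fine dining Bangkok",
    "Street noodle stall near Khao San Road very cheap",
    "Gaggan high end progressive Indian tasting menu Bangkok",
]

def restaurant_tool(query: str, k: int = 2) -> list:
    """What the agent calls: rank entries by similarity, return the top k."""
    q = embed(query)
    ranked = sorted(RESTAURANTS, key=lambda d: cosine(q, embed(d)),
                    reverse=True)
    return ranked[:k]

results = restaurant_tool("expensive fine dining in Bangkok")
```

The real version is the same shape: the agent's query string goes in, the top-k most similar entries come out and get stuffed into the LLM's context.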
I'll just show you some of the main components so you get a sense of how this is structured, and by the way, all of this code will be available from the link in the description. I'm in the code here and I'll quickly walk you through it in case you want to explore it on your own. You can see I have two main directories, frontend and backend, plus some requirements and setup files for the project.

In the backend I have a small API that proxies requests to the Bright Data APIs so they're a little more secure. Realistically this doesn't need to be secure, because it's just a demo project running locally, but I've used a simple Flask backend. I set up a few endpoints for things like searching flights and searching hotels, and for pulling the results. You can also have results delivered to you by webhook, but I went with a simple polling method because I didn't want to deploy this on the internet. That's my main app with my endpoints.

Then I have two main pieces. First, my Google Flights scraper (it's not really a scraper, but that's what I'm calling it). It does those two steps: it automates the browser using Playwright, connecting to the browser remotely with that WSS URL from Bright Data. After connecting, I do some browser automation, filling in and selecting the airports and filling in the flight search details. At the end this gives me the URL of the filled-in page, which I then pass to the AI browser, and the AI browser does the scraping for me. I just automate the manual part so it's a little faster and we don't need to use the LLM for it.
The more complex component I push to the LLM. That's where I set up the agentic browser with browser-use: I create the agent, the LLM, and all of that. Right now I run it locally just so you can see what's happening, but you could push this out to a remote browser instance as well, so you could do it at scale, spin up tons of browsers, and have this working for a bunch of users, not just one person. That's the Google Flights side.

As for the hotels, like I said, it's very straightforward: all we need to do is send a POST request to the SERP API with the information we want. I create some query parameters with the data I want to include for my hotel search, pass them to the SERP API from Bright Data, and set up a simple polling endpoint to grab the data when it's ready (again, you could do this via webhook, but I didn't want to set that up here). You can specify things like occupancy, free cancellation, and accommodation type, and you can see a sample run here: check-in, check-out, occupancy, and so on. So that's pretty much the flights and hotels. There are a few utilities too; for example, this is the prompt I give the AI browser describing what it needs to do and how it should return the data in a structured format. That's the backend, and it handles the first part of the task, where we grab the data in real time.

After that, I set up some AI agents on my frontend. I used a simple Streamlit frontend: if you go to the frontend code, you can see it's all written in pure Python using a library called Streamlit, which is super convenient. Then we have a few of our agent tools. If I go into the AI folder, you can see things like a context prompt that we can give to our models.
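The hotel request can be sketched like this. Note that the endpoint URL, auth header, and parameter names below are placeholders, not Bright Data's actual API shape; check their SERP API docs for the real request format.

```python
# Sketch of the hotel search call. Endpoint, auth, and parameter names are
# placeholders standing in for the real SERP API request shape.
import requests

SERP_API_URL = "https://api.example.com/serp/google_hotels"   # placeholder
API_KEY = "YOUR_KEY"                                          # placeholder

def build_hotel_query(city, check_in, check_out, occupancy=1,
                      free_cancellation=False):
    """Assemble the query parameters for one hotel search."""
    params = {
        "q": f"hotels in {city}",
        "check_in": check_in,        # ISO dates, e.g. "2025-04-01"
        "check_out": check_out,
        "occupancy": occupancy,
    }
    if free_cancellation:
        params["free_cancellation"] = "true"
    return params

def search_hotels(**kwargs):
    resp = requests.post(
        SERP_API_URL,
        json=build_hotel_query(**kwargs),
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()   # parsed, real-time hotel results
```

Keeping the parameter-building separate from the HTTP call makes it easy to reuse the same query shape for the polling endpoint that fetches the finished results.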
We have a simple model setup; I'm just using Claude 3.5, but you can use anything you want. I set up a research assistant that uses a local embedding model from Ollama (running on my own computer), and then we initialize the tools: you can see we have access to the search tool as well as the restaurant data set tool, the vector index. We create the vector store here (a bunch of code I won't walk through all of), and then we have the ability to query: we do a similarity search, pull the top 10 results, filter them, and pass them to the model.

We also have a schema; this is what I parse the original travel requirements into. In LangChain I can pass this schema to my agent and say: once you get this query from the user, parse it and return it in this format. That's how I've specified it here, in a JSON format. We have our travel assistant, which, like I said, has memory and access to that context (the parsed information), plus a prompt that I put in another file. It has a conversation chain so it can remember what you've said and give you a response.

Okay, two more things. We have a travel summary, which takes in all of that information from Google Flights and Google Hotels and uses a model very simply: self.model.
invoke. I just have a prompt, I pass in all of the context, and it makes the recommendation based on those details. Then I have user preferences, where I set up that simple LangChain model with structured output, which parses using the schema and gives us the data. There's a bunch of other stuff here too: if I go into the data folder, you can see the JSON file that contains all of the downloaded data set (it's kind of too big to format), the restaurant database (our Chroma vector database), some utility files like downloading the data set via API from Bright Data, some constants for Streamlit, the frontend code, and a basic API client I set up to communicate with my backend.

Look, there's a lot of code here and I can't walk you through all of it; feel free to download it and mess around with it. It's a super cool project, and if you made it to the end, I hope you found it cool as well. If you enjoyed the video, make sure you leave a like, subscribe to the channel, and I will see you in the next one.