In this video I'm going to show you how to build an AI agent that can control your browser. That means it can go and buy something on Amazon, it can book tickets or flights for you, and it can even parse information into a structured format. This requires very little coding and it's very beginner-friendly, but we will be using a little bit of Python to get it working. We're also going to be using a framework called Browser Use. All of this is free, you don't need to pay for anything, and you can use the framework with any LLM: you can run an LLM locally using something like Ollama, or you can pay for an LLM like Claude, GPT, or DeepSeek and have that be the controller of the browser framework. This is insanely cool, you're going to learn a lot, so please stick around. I'm going to show you exactly how this works and how to build it step by step.

First, let's understand what we're going to be using. Again, it's called Browser Use, and we can install it with pip. It's a framework that allows an LLM to control your browser. You can control a remote browser, or you can control a browser on your actual computer, and that's where it gets quite powerful, because it can use the existing context you have in your browser. If you're already signed into a website, for example, it can just open it up and act as if it were you, which is an issue you run into with a lot of the web scraping frameworks out there. There are all kinds of other benefits too: you can see that it actually performs better than ChatGPT Operator, and the best part is that it's free to run locally. They do have a cloud service that you can buy, but personally I've never used or needed it, so I'm not going to talk about that. In terms of features, you'll see them when we start working with it, but essentially it takes an image of all of the components on the website, as well as scraping the DOM, and then the LLM is able to look at all of that and make a decision on what to do next. It can automatically click a button, it can search for a page, it can go back and forth, it can open multiple tabs. It's really interesting, and I'm going to show you in this video how to build it, so stay tuned.

But first, a quick word from our sponsor. If you're in IT, then you know the grind: endless tickets, constant firefighting, and never enough time. But what if AI could solve that for you? Well, meet the sponsor of today's video: SysAid's AI agent builder, powered by agentic AI. This isn't just another IT tool, it's an absolute game-changer, and let me tell you why. With zero coding you can build AI agents that automate repetitive IT tasks like onboarding new employees, handling software access requests, and even troubleshooting performance issues. Imagine cutting hours of work down to seconds with a simple chatbot command. Your IT team is not meant to drown in tickets, and that's why the AI agent builder helps you scale efficiency without adding extra headcount, so your IT team can focus on what matters. No more late nights, no more burnout, just smarter, faster IT service management. If you want to see it in action, click the link below and schedule a demo today. Let the AI handle the grunt work so that your team doesn't have to.

All right, let's go ahead and get started by setting up our environment. I just opened the quick start guide from Browser Use, because the docs are quite good, and the first thing we need to do is install browser-use and then install Playwright. The docs show how to do this with uv; if you're not familiar with uv, which I imagine most of you aren't, that's going to be difficult to follow along with, so I'm going to show you a slightly easier way to get set up. I'm going to open the terminal (you can open Terminal or Command Prompt depending on your operating system), and we're going to type the following commands.
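The two setup commands described here, as one sketch (if `pip` isn't on your PATH, `python -m pip install browser-use` works too):

```shell
# Install the framework, then the Playwright browsers it drives.
# On macOS/Linux you may need pip3 instead of pip.
pip install browser-use
playwright install
```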
The first command is pip install browser-use. This assumes you already have Python installed on your system; if you don't, you of course need to install that first. Also, if you're on Mac or Linux you'll want to make this pip3, whereas I'm on Windows, so I'm just using pip. Some of you always ask me about virtual environments: if you know about virtual environments, you can go ahead and make one here, activate it, and install inside of it, but I don't want to bog you down by showing how to do that in case anyone gets stuck at that step. If you have no idea what I'm talking about, don't worry; all you need to do is successfully execute pip install browser-use. Now that browser-use is installed, we're going to run playwright install, which installs Playwright for us. I already have it installed so I didn't get any output, but you should get some output saying it was installed. That should be everything you need.

From there we can clear the terminal, and then we're going to make a new file. I'm going to make a new file here in PyCharm and call it main.py, and I've put it inside a folder that I called "AI web scraper" (and spelled incorrectly). You can put this anywhere you want and use any IDE you want; I'm just using PyCharm because it's good for these types of projects. In terms of creating an agent, the docs give us some starter code, so I'm going to copy that code. However, it isn't going to work right away, because I haven't yet initialized my LLM, so let me make this a little smaller and talk about what I mean by that. For browser-use, you're going to have some kind of LLM running the framework: browser-use allows the LLM to control the browser, but you need to provide the LLM, or tell it which LLM to use. You have a lot of different options here: you can use Claude from Anthropic, you can use GPT, you can use DeepSeek, or you can use a local LLM if you really don't want any of your information being pushed out to the cloud. If you are interested in the local LLM setup, I'm going to link a video in the description that teaches you how to install Ollama and set it up on your own computer; again, it's from my channel. Once you have Ollama installed, you can use the Ollama setup steps from the documentation. But if you don't care and just want this to work as quickly as possible, follow along with what I do, where I'm going to get a Claude API key.

What I'm going to do is make a new file, in the same directory as my Python file, called .env; it's important that you name it that. Inside of it, I'm going to copy the ANTHROPIC_API_KEY variable name from the docs, and then paste my Anthropic API key as its value. That requires me to go to the Anthropic API console, get the key, and paste it here. If you want to use a different provider, again you can go to the documentation; I'm on the Supported Models page, and you can see that if you want to use OpenAI, this is the environment variable you need; if you want to use Azure, these are the variables you need; if you want to use Gemini, this is the variable you need; and if you want to use Ollama, if you go all the way down it shows you how: you use ChatOllama and just run it like that. So I'm going to go and get my API key, and then I'll be right back.

By the way, the way you get this, if you want to follow along with me, is to go to console.anthropic.com, then go to API Keys, and you can make a new key by pressing Create Key. It will make a new one and then you can copy it. Keep in mind this can be a little bit expensive depending on how much you use this framework.
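The .env file would end up looking something like this; the variable name is the one listed for Anthropic on the Supported Models page, and the value below is a placeholder, not a real key:

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
```

No quotes around the value, and the file must sit in the same directory you run the script from.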
So watch your usage, and again, if you really just want to run it free and locally, you can use Ollama if you have a powerful enough computer. I'm going to press Create Key, make a new one, and call it "browser use 2", because that's what I'm doing, and then copy the key. Okay, so my key is inside the environment variable file here; notice I don't have any quotes or anything surrounding it, and that's exactly what you need, whatever provider you're using. Once you have that in there, you can take the sample code we copied from the documentation and just run it to see if it works, but we need to adjust the model, since we're no longer using OpenAI but Anthropic. So I'm just going to copy these lines, paste them in here, and move a few things around: the first two imports get replaced with these ones, and now we're using ChatAnthropic rather than ChatOpenAI. Again, I'm showing you the copying and pasting because this is exactly how you would do it if you were learning from scratch. Now you can see that your agent is going to be using this LLM, which is now the Anthropic LLM, and if we run this code it should actually open up a new browser for us and then start completing the task.

What's going to happen by default is that it will use Google Chrome, driven by something in the background called Playwright, which controls the Chrome instance. You can change the type of browser you're using, and there are a lot of settings you can go through, which I'll show you in a second, but it's very important that you close any Chrome browser instances you already have open. In my case I have this instance, so I'm going to make sure I close it; if you want to keep something open on the web in the meantime, open it in Edge or a different browser, because if you have multiple Chrome browsers open this isn't going to work, or you're potentially going to get some errors.

All right, we're just going to test this to make sure it works right off the bat, and then I'll go through some more advanced settings. We have everything installed, so I'm just going to press the run button to run my code (I'm assuming you know how to run Python code at this point); it's going to take a second and then open up the browser. You'll see in the console that it's telling us its thought process on what it's actually doing, and visualizing everything that's going on. Let's step out of this for one second. You can see it wants to compare the price: it says there are no previous actions, we're going to start the task, and then it walks through what it's going to attempt in order to figure out what the price actually is. So it looked up the GPT-4o price, found some options with OpenAI, Azure, etc., and now it needs to find the price for the other one, DeepSeek. It's going to go and look at DeepSeek and see if it can figure out what the pricing model is. Notice that it's automating the browser, clicking through different pages; I don't need to tell it how to do any of this, it just figures it out using the reasoning model.

Let's pause here. You can see it actually failed three times and said it could not parse the response. It still gave us some output that we could go and read through, or print out, but sometimes it does fail; sometimes it's not able to actually achieve the result, and that's just part of using this framework. Obviously I gave it a very vague instruction, since it just says "compare the price of GPT-4o to DeepSeek V3", but we could give it more detailed instructions on exactly what we want, and then obviously the chance of success would be higher.
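Putting the adapted starter code together, here's a minimal sketch. The import path and model name are my assumptions based on the docs of this era; it needs the ANTHROPIC_API_KEY from the .env file, plus an installed Chrome, to actually run:

```python
# Sketch of the quick-start code, switched from ChatOpenAI to ChatAnthropic.
# Assumes browser-use and langchain-anthropic are installed and that
# ANTHROPIC_API_KEY is set in a .env file next to this script.
import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic


async def main():
    llm = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0.0)
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=llm,
    )
    result = await agent.run()
    print(result)


asyncio.run(main())
```

The `task` string is the same vague instruction used in the video; a more detailed task raises the chance of success.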
Okay, so those are the absolute basics: that's how we get this set up. We create our LLM, we create an agent, we call agent.run inside an asynchronous function, and then we can print out the result. But I want to show you how to get this to use our own browser context, because notice that when it runs, it's actually using a fresh Chrome instance, something like an incognito window. In my case, because I'm in the United Arab Emirates, you can see that everything is in Arabic and sits on the right-hand side of my screen, and obviously that's not my default Chrome instance; my default Chrome instance looks different. So if we want to really leverage the power of using our own browser, which is the main advantage of doing this, we need to change a few settings. Let me show you how.

From the documentation, if you open up the side menu, you'll see a few different options: Browser Settings and Connect to your Browser. If you're interested in connecting to a remote browser, check out Browser Settings, which shows you how to do that with a WSS URL, but in our case we want Connect to your Browser. Again, make sure you close all of your Chrome instances. Then we're going to copy a few lines that specify the Chrome instance path, so it knows which executable to run. I'm just going to copy these lines, delete what we don't need, and paste them in here; we can get rid of these parts, take this import and put it up at the top, and since it already imports Agent, we can remove the previous Agent import. Let me make this a little larger so you can see it. What we want to do now is specify the Chrome instance path. If you're on Mac, you can probably just leave it as is; however, I'm on Windows, so I need to change it to the Windows path shown in the comment. Same thing for Linux: it would be the Linux path. Again, just copy it from the documentation; that's exactly what I'm doing. Now we have our browser, and we can pass it to our agent by saying browser=browser. Then, down at the bottom, we can say await browser.close(), and it will close that browser instance for us, so it stops controlling the browser.
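The own-browser setup can be sketched like this, assuming the Browser/BrowserConfig API from the "Connect to your Browser" docs page of this era (the `chrome_instance_path` shown is the typical Windows install location; swap in the Mac or Linux path from the docs as needed):

```python
# Sketch: drive your locally installed Chrome instead of a fresh instance.
# Close all running Chrome windows first, or this will error out.
import asyncio

from browser_use import Agent, Browser, BrowserConfig
from langchain_anthropic import ChatAnthropic

browser = Browser(
    config=BrowserConfig(
        # Typical Windows path; see the docs comment for macOS/Linux paths.
        chrome_instance_path="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    )
)


async def main():
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatAnthropic(model="claude-3-5-sonnet-20240620"),
        browser=browser,
    )
    await agent.run()
    await browser.close()  # stop controlling the browser when done


asyncio.run(main())
```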
Now if we run this, it's going to run in our own browser instance, so we should see it use my dark mode theme and all of those kinds of things. Let's boot it up and see. Sometimes it takes a second to load the browser, so do be patient, and again, if it's not working, make sure you close all of the previous instances. Here you can see it's using my browser: I have all of my bookmarks, you can see this is unique to me, it's not just a random incognito instance. Then it starts searching: it finds a search window, starts typing, takes a second, and comes here. I can probably read this a little more easily now, because it's not using the Arabic version of Chrome. You get the idea, so I'm going to stop it for now, but you can see that we're now using our own browser.

Next, I want to show you how to parse information into a particular format. If I just change this task and tell it to go do something, it's quite good at doing that. I could tell it to go buy me a book on Amazon, for example, and it probably could actually do that, because it has access to my signed-in browser: it can load up Amazon, find the book, and buy it. You probably want to be careful getting it to do something like that, but it can; it's very good at automating and clicking buttons. But a lot of the time you want to parse information: you want to grab, say, the five most recent Instagram posts, something that's difficult to scrape if you aren't using an AI framework like this. So I want to show you how that works, and first let's see how we'd do it naively. I'll go into the task and say: go to Tech With Tim's Instagram and grab the five most recent post captions. That's something it can do, so let's run it and see what the output is when we print the result.

Okay, this finished running: it loaded my Instagram, and if we look at the output, it's giving us this extracted content. It says "extracted content", then has some text saying "here are the five most recent post captions", and then gives us the information. If we want to access the extracted content, we can print result.extracted_content() (I believe that's the name), and we'd get the kind of JSON we're seeing here. However, even if we get all of this text, notice that it's in plain language; it's not something that's easy to pass to a function or use as a parameter, because it's not in a structured format. We want something consistent, so we always know what it will look like and can easily parse through it and grab the information we're looking for, especially if there's a ton of data. So first let me show you what it looks like when we print the extracted content, and then I'll show you how to do structured output.

Okay, I just ran it, and you can see that I printed the extracted content, and it's a little bit messy: it's a list that says input from Tech With Tim Instagram, sent enter key, extracted page, and then it has "captions", which is what it extracted from that page. Inside of that there are all of these newline characters, so it's difficult to parse this information, and I don't know if this key is always going to be called "captions". So I want to adjust this to use a structured output model, so that I always get the data in a particular format. While we're here, though, I quickly want to show you that there are a few methods you can use to grab information after the agent runs.
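The result accessors mentioned here can be sketched like this; the method names are my reading of the AgentHistoryList docs of this era, so treat it as a sketch rather than a definitive API listing:

```python
# Sketch: pulling information out of the agent's run history.
# Requires browser-use, langchain-anthropic, and an API key to run.
import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic


async def main():
    agent = Agent(
        task="go to Tech With Tim's Instagram and grab the five most recent post captions",
        llm=ChatAnthropic(model="claude-3-5-sonnet-20240620"),
    )
    history = await agent.run()

    print(history.extracted_content())  # raw text scraped from the pages
    print(history.final_result())       # the final output of the run
    print(history.errors())             # any errors hit along the way
    print(history.model_thoughts())     # the LLM's step-by-step reasoning
    print(history.action_names())       # which actions it took


asyncio.run(main())
```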
You can get screenshots, for example if there was an error; the action names; the extracted content, which is what I was showing you; and then things like the model actions, the errors, and so on. There are a few others, like model thoughts, action results, and final result, which is what we're going to use in one second once we set up structured output. For structured output, if we go to Output Format in the documentation, you can see a simple example very similar to what we're going to set up, so let's follow along and write the code to parse our output into a structured format.

First we need to import Pydantic, so we say from pydantic import BaseModel, and then we can write some classes. We'll say class Post, inheriting from Pydantic's BaseModel; it won't be quite the same as the docs example, it'll be a bit different. We'll say the caption is a string, and the url is a string as well. For now we'll just grab two pieces of information, but of course we could extract as much complex data as we want. Then I'll write class Posts, again inheriting from BaseModel, with posts being a list of Post. You'll see that List is highlighted red, and that's because I need to say from typing import List; once I import List, that works. So now I have my Pydantic models, and I need to tell the model to use them as the output format. To do that, I need a controller, so from browser_use import Controller, and then controller = Controller(output_model=Posts), just like in the documentation. We have two classes, one for an individual post and one for holding multiple posts; in this case, posts can contain as many entries as we want. We're telling it: output the parsed information from our web scraping session into this particular output model. Then I need to pass the controller to my agent, so I say controller=controller.

Now that that's done, rather than printing the extracted content, I'm going to print the final result, and when I do, we should get a Pydantic object with posts, and inside of it all of the Post objects. Let's run this and see if it works. Okay, it just finished, and you can see we now get the expected result: we have posts, we have a list, and every entry in the list has a caption and a URL. Now, the URL is incorrect for these posts; obviously it did something wrong (previously when I was doing this it was working properly), but either way, at least we get the output format we were looking for. If I wanted to start using this in a later step of my code, I would take my final result, so data = result.final_result(), and then I can copy what's in the documentation: parsed_posts = Posts.model_validate_json(data). What this method does is convert the JSON, which is what we're getting here (it looks like a JSON object), into the correct Python object, so internally we can now use it as a Python object. From there we could say parsed_posts.posts to get all of the posts, grab the first post and get its caption, and start using it like a normal Python object, passing it to other parts of our code. So that's it for parsing into a structured format; it's very useful, and something you'll want to do a lot when you're web scraping.
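The Pydantic half of this setup can be run on its own. Here's a self-contained sketch of the models plus the model_validate_json step; the JSON string is a stand-in for what result.final_result() would return, and the Controller hookup is shown as a comment since it needs a live agent:

```python
from typing import List

from pydantic import BaseModel


class Post(BaseModel):
    caption: str
    url: str


class Posts(BaseModel):
    posts: List[Post]


# With browser-use you would wire this up as:
#   controller = Controller(output_model=Posts)
#   agent = Agent(task=..., llm=llm, controller=controller)
#   parsed = Posts.model_validate_json(result.final_result())

# Stand-in for the JSON string the agent's final result would contain:
raw = '{"posts": [{"caption": "New tutorial is live!", "url": "https://www.instagram.com/p/abc123/"}]}'
parsed = Posts.model_validate_json(raw)
print(parsed.posts[0].caption)  # → New tutorial is live!
```

Once validated, `parsed` is a normal Python object, so the captions and URLs can be passed around the rest of your code with attribute access instead of string parsing.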
Now I want to show you a few little tips and tricks that can make this faster and that you definitely want to know about. The first useful thing is initial actions. Sometimes you want the agent to immediately skip over some steps and go directly to a page, or do something predefined. The docs give the example of going to Wikipedia and scrolling down so the page automatically loads; you're skipping a few steps the LLM would otherwise have to figure out, making the run cheaper and faster. In our case, we could set up initial actions that say: just go to Tech With Tim's Instagram, because we know that's what you want to do every single time anyway, and from there run the automated script. What's the URL for my Instagram? It's instagram.com/tech_with_tim, or something like that; I believe that's the URL. So those are our initial actions, and we just pass initial_actions=initial_actions to the agent. Now it doesn't need to think about that step; it should just automatically go to that page. Let's run it and test that quickly, then move on to the next one. Okay, there you go: I know it wasn't shown on screen, but it immediately went to Tech With Tim, and then it can continue with the rest of the steps.

The next thing I want to show you is sensitive data. If you need to give passwords, other credentials, or an API key to Browser Use, you want it to be able to use them in your own browser locally, but you probably don't want that data passed to the LLM; you don't want an LLM that isn't running locally on your computer to be able to use this information.
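Both of these conveniences can be sketched together. This is a minimal sketch assuming the Agent keyword names (initial_actions, sensitive_data) and action format from the docs of this era; the model name, task, and credentials are placeholders:

```python
# Sketch: predefined first steps plus credentials the LLM never sees.
import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic

# Skip the "figure out how to get there" steps: open the page directly.
initial_actions = [
    {"open_tab": {"url": "https://www.instagram.com/tech_with_tim"}},
]

# The LLM only ever sees the placeholder names x_name / x_password;
# browser-use substitutes the real values locally. Placeholders below.
sensitive_data = {"x_name": "me@example.com", "x_password": "not-a-real-password"}


async def main():
    agent = Agent(
        task="Log into x.com with x_name and x_password",
        llm=ChatAnthropic(model="claude-3-5-sonnet-20240620"),
        initial_actions=initial_actions,
        sensitive_data=sensitive_data,
    )
    await agent.run()


asyncio.run(main())
```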
As the docs example shows, we can have some sensitive data and just pass sensitive_data=sensitive_data. The agent will then be able to use the x_name and x_password, for example if we were signing into Twitter/X, but it will not actually pass those values to the LLM; it only passes their names. So the LLM would tell Browser Use, hey, I want to use the x_password in this particular field, but it wouldn't actually know what the x_password is. It's like an environment variable, a placeholder, that gets passed to the LLM running in the background, and you can see here that the model only ever sees x_name and x_password; it never knows what the password actually is, so you're not leaking your password or your credentials to the LLM. If you trust that, great; if you don't, it is what it is, and you probably just want to run the entire thing locally. But I thought that was interesting and worth noting, because you could, for example, tell it: here's my username and password, go sign into this page, and then go do this XYZ thing.

Okay, so that's pretty much it. Obviously there's a lot more you can do here; I highly recommend looking at the documentation, which is why I pulled it up for this video, because it's very useful, and it's exactly how I learned how to do all of these things. If you enjoyed the video, make sure you leave a like, subscribe to the channel, and I will see you in the next one.
2025-03-26 17:23