Okay, so earlier this week OpenAI held an impromptu video stream to announce a bunch of new things coming to their APIs. Over the past month or two we've seen them release their first forays into agents: ChatGPT Tasks, the Deep Research agent, and Operator, which is their take on computer use. But up until now all of that has been aimed at consumers, and perhaps enterprise, not developers. So in many ways this latest announcement fills in that gap and creates APIs for a lot of the things OpenAI has released on ChatGPT over the past few months.

There were four things announced at this event, three of which I'm going to go through here. I'm keeping the Agents SDK for a whole video of its own, where I can break down how it works, build some agents, and use it for different tasks. The three other announcements were the Responses API; a set of built-in tools, three of them in fact, which actually look really cool and which I'll go through; and their own observability tool for tracing and monitoring what's actually going on with the calls you're making to models.

Okay, so let's take a look at the Responses API. To start, we need to revisit a little of the history of the OpenAI APIs. The OpenAI API format has kind of become the standard, with most language models supporting it: Gemini supports it, Ollama supports it, lots of providers support it even though their models aren't OpenAI models. It has become the de facto standard. Looking back, this is roughly the third iteration of that API. Behind the scenes, all of these APIs are really just converting whatever your input is into a sequence of numerical tokens that go into a model. Sometimes the sequence you send will have special tokens inserted, whether for conversations or tools or things like that, but the different API endpoints are really just ways to make this easier for developers to use.

The initial approach, which eventually became known as completions, is where you just pass in a string of text and the model conditions on it to generate more text. That goes back all the way to the GPT-3 API, and it's even how people used GPT-2. After ChatGPT came out and OpenAI realized there was now a big demand for chat apps, they created the chat completions API. The difference there is that you now have identities and different kinds of content going in: rather than one single string, you pass a list of dictionaries, perhaps with a system message, a user message, and an assistant message, and you just extend that list as the conversation goes along. That worked very well for chat apps, but it was often quite frustrating if you weren't building a chat app and were trying to work out the best way to fit your problem into it. And gradually, while all the providers supported the chat API, not all of them kept supporting the original completions API. That brings us to the Responses API.
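To make that progression concrete, here's a minimal sketch of the three calling styles side by side, assuming the current v1.x openai Python SDK; the model names are just placeholders for whatever you would actually use:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Legacy completions style: a single string in, more text out
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Once upon a time",
    max_tokens=50,
)
print(completion.choices[0].text)

# 2. Chat completions style: a list of role-tagged messages
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the Responses API in one line."},
    ],
)
print(chat.choices[0].message.content)

# 3. New responses style: plain input, with tools and settings added as needed
response = client.responses.create(
    model="gpt-4o",
    input="Summarise the Responses API in one line.",
)
print(response.output_text)
```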
In some ways this is a unifying new endpoint: it can make use not only of completions-style text but also of tools, of settings, of a whole bunch of different things, and not just for pure text coming back but also for things like image generation with DALL·E and results from their new web search tool. The main idea is that it gives you the simplicity of chat completions but makes it easier to use things like tools and the functionality that, up until now, has lived in the Assistants API.

If we look at their docs, we can see the new Responses API supports a wide variety of things. The simplest case is plain text input and getting a response back, but you can also do image input; web search, which we'll look at in some code in a bit; file search; function calling, where you just pass in the tools and a tool choice; and even, for the reasoning models, an argument where you tell it how much effort to spend on reasoning. And if you want to stream the response back, you can do that as well. So in some ways this new API is just standardizing these calls and making it simple to drop these capabilities in alongside the text you want back.

So what does this mean for the existing APIs? The original completions API has largely gone away, and the main API has been the chat completions API, and they're saying clearly that they're not taking that away; it's going to depend on the model. It does look like we'll have certain models geared towards chat and certain models geared towards specific activities, and perhaps things like web search, and maybe even reasoning in the future, will be separate calls to specialist models rather than general chat calls. But the chat completions API is not going away at all. The Assistants API is a totally different ball game: they're planning to deprecate it in mid-2026, so still a long way off. That said, I get the feeling that API doesn't get used much at all compared to chat completions, and that they're rethinking exactly how to do it. You'd hope the new Responses API picks up things like the code interpreter tool over time, so we can still use those tools via the new API.
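Before moving on to the tools, here's a rough sketch of two of the variations mentioned above, streaming and the reasoning-effort setting, again assuming the v1.x Python SDK; the exact event names and parameters are worth double-checking against the current docs:

```python
from openai import OpenAI

client = OpenAI()

# Streaming: iterate over events as the answer is generated
stream = client.responses.create(
    model="gpt-4o",
    input="Explain what the Responses API adds over chat completions.",
    stream=True,
)
for event in stream:
    # Events describe deltas and lifecycle changes; print text deltas as they arrive
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

# Reasoning models: pass an effort hint instead of prompt gymnastics
reasoned = client.responses.create(
    model="o3-mini",
    input="How many weekdays are there between 1 March and 11 March 2025?",
    reasoning={"effort": "high"},
)
print(reasoned.output_text)
```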
Now, speaking of tools, let's jump in and look at the three new tools they announced. The first is web search, and I'd actually expected this to come out in the 12 days of shipping back in December. This is basically what powers ChatGPT Search behind the scenes. They've always been quite vague about which search engine they're using and how; I suspect it's probably Microsoft's Bing, but they may be supplementing it with other engines like Brave, which a lot of the companies providing LLM search are using at the moment.

To see it in action we can come over to the playground. We need to use the Responses API there and then select a tool. For tools we've got function calling, as we've always had; file search, which I'll talk about in a minute; and web search. When we click web search we get some choices: we can select the country we want to focus the search on, a state, a city, a time zone, and then a search context size. They describe this as controlling how much context is retrieved by the model; unfortunately it doesn't seem to control how many links you get back or anything like that, which is not something we can control here. Anyway, we'll select high and add that tool. I can set a system prompt if I want, and I can select either GPT-4o or 4o mini, which I think are currently the only models working with the web search tool. If I ask "what did OpenAI announce on March 11", it goes off, does the search, and gives us the result. We get a natural-language answer about what they announced, plus a link, and if we click the link it takes us to the relevant article the information was sourced from.

We get a couple of paragraphs of information, and if we look at the code, what actually came back is all this output text. Currently the link is in there, but we have to parse it out of the text ourselves; it doesn't seem like we get a nice field with the actual link in it. Hopefully that changes over time. We can also see what search context size was used. But basically what we get back is just text, which is then rendered out here. This will definitely be useful for agents: being able to go and source information, to look for real-time information. OpenAI is not the first to do this. Google already offers grounding with Google Search, and I'd say each has its strengths and weaknesses. With Google you also get the keywords actually used to run the search, and you can click out to the searches on Google yourself; but you don't get the direct links, you get links to Google which redirect to the other site. In that way the OpenAI one is a little nicer: you get the links, even though you have to parse them out of the text yourself, and then you can send an extractor to go and pull the information out, so you can RAG over it for future queries the user might have.

They've also got a whole bunch of fancy charts about how this helps. I think it's pretty obvious nowadays that searching the internet makes an LLM smarter, but if you needed a bar chart to know that, here you go. It is interesting to look at the pricing. It starts at $30 per 1,000 calls, so basically 3 cents a call, on the GPT-4o model, and $25 per 1,000 on 4o mini. That said, it goes up to 5 cents a call for the one we just did, which used the high context size. You probably want to test it yourself, but GPT-4o mini with search preview and high context size is probably the one to go for. I expect that for most agents you really want as much context as can fit in there, so you'll probably want the high context size; it's not like it's returning 10,000 tokens, it's returning in the hundreds of tokens.
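Outside the playground, calling the web search tool from code looks roughly like this. It's a sketch assuming the tool type and search_context_size field shown in their docs; the user_location block is optional:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "web_search_preview",     # built-in tool, no function schema needed
        "search_context_size": "high",    # low / medium / high, as in the playground
        "user_location": {                # optional: bias results towards a region
            "type": "approximate",
            "country": "US",
        },
    }],
    input="What did OpenAI announce on March 11?",
)

# The answer comes back as text; source links are embedded in the output,
# so you still have to pull them out yourself.
print(response.output_text)
```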
All right, the next tool up is file search, and this one is kind of interesting because in some ways it was already in the Assistants API. Here they've refreshed the API to make it easy to upload files and then run queries over them. There's an example in their JavaScript API where they make a vector store, upload different files, and then just run a query against it.

One of the interesting things they mention is that they've revamped this to be more of a general RAG solution, so you can actually use what you upload as a vector store. You've got to wonder whether that's a threat to competitors like Pinecone or Chroma, who've been building these cloud vector stores; now you can do it all within a single API without having to worry about the embedding model or anything else. It's really going to come down to testing just how good their vector store system is. Interestingly, they've also added metadata, so it's looking like a nice out-of-the-box, very quick and easy to get going RAG system where you just upload what you want and use it. I'd guess that once you put it into real use you'll probably find some limitations; it's still early days. But it is nice to see in the docs that it gives you citations back: we can see a file citation with the index and the actual file name. You can also customize the retrieval, setting things like the maximum number of results, or use metadata filtering. And the list of supported file types is quite decent: everything from code files through to PowerPoint files, HTML files, PDFs, docs and so on.

They do mention some limitations: you're limited to 100 GB, and you're paying for them to store that data as well. It's really important to understand that you're not just paying for the calls, you're paying for everything they're storing too. That said, I feel the pricing is actually pretty reasonable, and for the ability to have an out-of-the-box, very quick to use RAG system, this is pretty cool. It's something we haven't seen from the providers you'd have expected to offer something simple and easy like this, like a lot of the cloud providers.
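For reference, the same flow from Python looks roughly like this: create a vector store, upload a file, then query it through the Responses API with the built-in file_search tool. This is a sketch assuming the SDK exposes vector stores at client.vector_stores (older versions kept them under client.beta), and the file name and store name are just placeholders:

```python
from openai import OpenAI

client = OpenAI()

# 1. Create a vector store and upload a document into it
store = client.vector_stores.create(name="product-docs")
with open("handbook.pdf", "rb") as f:
    client.vector_stores.files.upload_and_poll(vector_store_id=store.id, file=f)

# 2. Query it through the Responses API with the built-in file_search tool
response = client.responses.create(
    model="gpt-4o-mini",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [store.id],
        "max_num_results": 5,   # optional cap on retrieved chunks
    }],
    input="What does the handbook say about expense limits?",
)
print(response.output_text)
```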
So the third tool they released is computer use. This is basically the technology behind their own agent, Operator, which lets people type in a task and then goes off and uses a browser and the internet to try to complete it. Operator is currently only available on the ChatGPT Pro plan, and only in the United States. It's very important to point out that what OpenAI is releasing as an API is not the computer environment itself, just the CUA model. You pass a screenshot to the CUA model along with a command, and the model decides that you need to click at this place, or type something in a certain place, and gives you the coordinates; your application then has to actually perform that action in a browser in a particular environment. Unfortunately OpenAI is not making their cloud computers, browsers, and Docker instances available to us; they're just making the model itself available.

When we look in the docs, they've got a whole section about how this actually works, and it's quite nicely explained. It's basically the same as what we've looked at before with things like Browser Use and Anthropic's computer use. You've got to commend OpenAI for not trying to reinvent names here, and just taking the name of whoever came up with it first: we've got "computer use" from Anthropic being used here, just like "deep research" from Google. The nice thing is that the docs have example code showing how to do this, and it seems built to run either in a Docker instance or locally via Playwright. That's something Browser Use already had going for it, with a lot of that code already working. My guess is we'll see startups pretty quickly providing the browser in a Linux box in the cloud, so you can run this remotely to get things done. I do think it's really interesting to watch, especially as over the past week or so products like Manus have really taken off, which in many ways is just using things like the open-source Browser Use; I think they actually confirmed that on Twitter this week. For me that really shows these tools are powerful: when you get the prompts right and the orchestration right, you can get really powerful results out of them.

They have a bunch of benchmarks showing this model achieving state-of-the-art performance, but don't be fooled; even they say human oversight is recommended. It's still very early days for these computer use models, and you probably don't want to let one loose with your credit card, your Twitter password, or your bank login, because it may run amok, and that's not what you want to happen. That said, the pricing actually seems very reasonable. We'll be passing in images, which will probably use more input tokens than plain text, but based on the price this is probably not one of their newest and biggest models: it's only $12 per million output tokens. On the whole that's pretty reasonable, and from memory even cheaper than the Sonnet 3.7 model, which is what Anthropic is using for computer use there.
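To make the division of labour concrete, here's a very rough sketch of that screenshot-to-action loop, with Playwright standing in for the environment you have to supply yourself. The model name and tool fields follow their docs as best I can tell, but treat it as a simplified assumption rather than production code; a real loop keeps sending fresh screenshots and call results back until the model says it's done:

```python
import base64
from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page(viewport={"width": 1024, "height": 768})
    page.goto("https://example.com")

    # Ask the CUA model for the next action, given a task and a screenshot
    response = client.responses.create(
        model="computer-use-preview",
        tools=[{
            "type": "computer_use_preview",
            "display_width": 1024,
            "display_height": 768,
            "environment": "browser",
        }],
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Find the pricing page and open it."},
                {"type": "input_image",
                 "image_url": "data:image/png;base64,"
                 + base64.b64encode(page.screenshot()).decode()},
            ],
        }],
        truncation="auto",
    )

    # The model replies with computer_call items (click, type, scroll, ...);
    # it's on us to execute them. Only clicks are handled in this sketch.
    for item in response.output:
        if item.type == "computer_call" and item.action.type == "click":
            page.mouse.click(item.action.x, item.action.y)

    browser.close()
```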
Okay, so as I mentioned earlier, I'm going to do a whole video on the Agents SDK. I think that's definitely a really interesting piece, and probably one of the biggest parts of this whole launch. The last thing I wanted to cover in this video is the integrated observability tooling for tracing and inspecting agent workflow execution. Tools like LangSmith, and quite a number of other startups, let you trace and store all your calls to the LLM, along with things like tool calls, so you can actually see what's going on. LangSmith is certainly one of the nice ones for this: it lets you track everything going in and out. It was originally built mostly for LangChain, but I think you can use it out of the box, no problem. Another one, launched late last year by the Pydantic team, is Logfire, and there are many others out there. One I personally like a lot is Phoenix by Arize AI, a fully open-source system you can host yourself.

I'm a bit reluctant to hand over my prompts, because nowadays prompts really are your intellectual property, as is the structure of how you build your agents, and I'm reluctant to give those two things to some of these companies. Even though I think LangChain and Pydantic are reputable companies, there are a lot of other people out there running these tracing and evaluation suites while at the same time building agents for commercial customers. Which brings us to OpenAI, who are certainly out there building agents for big companies and charging a lot of money to do it. So I must say I'm a little reluctant to just hand over my whole agent structure in this nicely packaged way, but maybe that's just me being a bit too suspicious. Looking at the product, it does look quite nice; it would have been nicer if they'd open-sourced it so you could run it either on their system or externally. What it actually does is trace all the calls to the LLMs, and which LLMs, and also trace what your tools are doing. So if you ping an API or run some code locally, you can check back on that, and see which parts of your agent are taking a long time to run and which parts are falling down.

One of the cool things about these kinds of tools is that you can export some of these calls to use for fine-tuning and training models. That's really powerful: if you can work out all the calls your system and your agents are going to make, then over time you build up a very nice dataset, and you can use it to fine-tune a model specifically for your use case, both to save costs on tokens and to get more consistent outputs. You can also see which runs are working well and which are not.
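As a toy example of what that export step can look like, here's a sketch that turns some entirely hypothetical traced calls into the chat-format JSONL that OpenAI's fine-tuning endpoint expects; the traced_calls list is made up for illustration:

```python
import json

# Hypothetical traced calls pulled from whatever tracing tool you use:
# each records the prompt that went in and the output the agent accepted.
traced_calls = [
    {"system": "You are a travel-booking agent.",
     "user": "Find me a flight LHR to SFO next Tuesday.",
     "assistant": "Calling search_flights(origin='LHR', dest='SFO', date='2025-03-18')"},
]

# Convert good runs into chat-format JSONL for fine-tuning.
with open("finetune.jsonl", "w") as f:
    for call in traced_calls:
        example = {"messages": [
            {"role": "system", "content": call["system"]},
            {"role": "user", "content": call["user"]},
            {"role": "assistant", "content": call["assistant"]},
        ]}
        f.write(json.dumps(example) + "\n")
```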
So OpenAI now has this logging and traceability built in, and it will be interesting to see how it takes off. For a lot of customers this is going to be a really key thing: if they're pinging OpenAI all the time and not anyone else, and they really want to know which calls are working and which agents are working, this is the kind of tool that will help with that.

All right, so to finish up: this is definitely a big announcement from OpenAI, and in many ways I think it's actually more important than the GPT-4.5 model. With this announcement we really see OpenAI stepping into the whole agent-building game and allowing developers to build agents. Like I said, I'll cover the Agents SDK in its own video, but with the tools and everything else they've announced here, we're definitely seeing OpenAI start to mark out its territory in this new agents world. The Agents SDK video should be out in a day or so. I'd love to hear what you think of the various tools: which ones do you think you'll end up using a lot yourself, and what are you currently using for things like orchestration? I'm curious about that as well. And do you trust OpenAI enough to give them all your traces and logging, not only of the LLM calls but of your tools and the whole way your agents actually work? Anyway, as always, if you found the video useful, please click like and subscribe, and I will talk to you in the next video. Bye for now.