today we'll look at generating images using generative AI but we'll specifically do it through apis I'm going to go a little bit deeper into Del 3 which is one that I used finally I'll do a code walk through but in addition to that I'll show you some of the cost comparisons between Del and a few other options that I was cont I looked at imagine 3 I looked at stable diffusion 3.5 and I also looked at gy images Genera offering so these are all good options each one of them has their own pros and cons specifically for my use case I decided to go D 3 and I kind of walk you through how I went about that process so now to set the context we are actually building the Deep text as app which is a place where you can get pated AI articles lot of interesting stuff that happen in the AI world all of that brought to your fingertips now while we able to Source a lot of good articles and we able to you know find the right articles to show and curate it the challenge is that sometimes many of the Articles don't have images that come along with them especially things like research papers but to make it a little bit more attractive in the app we thought why don't we just just quickly generate nice image that is representative of the article so for this we have to give you know the article title maybe a little bit of a description of the article and that should be enough to allow us to generate an image I think it's a very good starting point especially if you have not done any kind of coding with generative AI before or generative a apis this actually is a very nice simple use case for us to do so let's get started so let's take a look at some of the different image generators I'll go through a few of these and I'll share with you my reasoning for why I chose one over the other I I start with di 3 I'm basically giving the same prompt to you know most of these image generators we show you kind of an Apples to Apples comparison of how it works now di 3 from openi you know some of the reviews out there were very positive for this it's generally a very intuitive I think image generator prompt that you give it's able to make good sense of it and it's of uh interesting image of of that secondly it also has inbuilt Gils these prevent know any kind of controversial or problematic images from being generated which is actually a good thing for me because I then don't have to do additional checks to make sure that you know the Imes are safe to put onto our I'm sure many of the others also like Gemini and imagine have this but for now I thought okay D 3 is kind of taking care of that I don't have to worry I tried using it in chat gbt you can generate up to two images for free every day and overall it kind of gets it right on the first try so I thought okay let me let me look at this the only first downside is that you do have to have a paid version if you want to start dating just using secondly this is imagine three uh gin now this is not yet available to everybody I mean you have to get onto an allow list which means you have to request access First Imagine 3 overall I felt it was also quite good it does tend to be a little bit cheaper than the others so that's positive uh third I took a look at met AI uh this is of course open source and you know meta AI GMA 3.2 actually is known for its fast image in capability right and you can even edit it in real time just using but that is not something that I needed right now so it was you know that's nice feature to have but not of great use to me I was able to test it out using M and WhatsApp as well as this and I thought again images are quite good here but to be frank I think using it on hugging face or maybe from Vortex from Google where it is also available that is a little bit more comers I also looked at gy images now they provide you indemnification so which means that you will not get suited for any kind of copyright violations that come from it because they have already handled all of this so that is a real plus point however I think for us that is overkill I don't actually need that kind of support their images and all that are quite good given that their entire model was trained on um stock images you do tend to get stock image kind of output from this so again it was not very interesting and to be frank this was a little bit expensive of all the different options that I reviewed this was the most expensive uh so I thought maybe I'm not going to go with G Imes for this and finally stability AI now actually these guys do have a really really good uh image generator they're one of the first people to come out with generators the stable diffusion 3.5 is
what I was looking at it has a lot of positive reviews it also provides a lot of control through the API itself they do pay a lot of attention to typography typography is nothing but T that is added into the images if you see this image this we're leaving for the future is something that they've asked for it to add and it has done a good job so uh stability 3.5 is really interesting I think even in terms of cost it is very competitively priced the only challenge is that again I didn't want to I mean I couldn't actually try it out because if I had to use stable assistant I had to take a subscription just to just to get access to the free trial also and I did look at some implementations on hugging face but those were really slow so I didn't get a good sense of what exactly I can do with this but I think my biggest challenge with this is the company stability AI right now has a lot of know kind of management problems you know if they shut down tomorrow tomor then again I have to come back change a lot of the things I built into my app maybe that's headache that I didn't want to take like so overall I decided that let me just look at di itself but the cost of course there is an implication and we'll take a look at that here's a cost comparison table for each of these of course these are going to be kind of approximate itself uh but at least it should give you a sense of where each of these CH there was no pricing given specifically for Tama 3.2 I think it's basically if you're going to host it know what is your cost of hosting and generating of all of these gy images tends to be the most expenses 10 to 15 cents per image is the is the expectation di 3 is also kind of expensive it approximately comes up 8 cents in that way Google is quite good stability AI is really good it's hardly two cents an image for me I actually felt that maybe open cost is fine because at 8 cents an image I might use it like you know a few times a day to generate you know a couple of images so that cost is okay for now if our volume increases and it starts to become more expensive then I look at maybe mov to imagine three maybe even stable diffusion I have selected D 3 so I'm going to go to open aai and create my account and start using you come to platform. open.com get started this is the starting uh page you can come down to the capabilities part and see all of the different that they have image generation will give you an overview of um how you go about it what is the code like fairly simple uh and yes some tips on you know how you can do all of this to start using this there's no way to do it as a free version you'll have to go and do a sign up first right create your account add some credit and then you can start so you can click sign up to go ahead and create an account so now that we have set up our account we can go ahead and start um using this so before you do that we need to go to settings and go to billing and here you can go ahead and add your pment method I did a little bit of exit here I think minimum is $5 um to be able to get started if you want to add information about the company GST code all of that you can go to preferences now after youve done this you are actually ready to use it you can go to API keys and this will give you uh your particular key if you need to generate you can go here and create a key yeah and you just copy this key be careful with your keys obviously don't uh make it available in the public if anybody else gets your keys you will actually end up having to pay the money for that and also once you've created a key you can't see it again right so at the time that you create key that is the time that the good part is you can actually go ahead and generate Keys as many times as you want there's uh no real challenge to generating a key multiple times as no cost once you have done that you can go here to get access to the API sence and we'll take a quick look at it right now under images you do have different options if you want to edit an image programmatically want to create variations you can do all of those things but now I'm just focusing on the create image a these are the different parameters to make the function call model is the first one you can specify di 3 or Di 2 default will be D 2 N is actually the number of images to generate so you can at least with d 2 you can has to generate multiple images between one and D 3 only allows one for my particular use case one is actually all I needed so that's totally fine di 3 also allows you to specify the quality uh your options here are standard and right so the default is standard response format you can get it as a uh Json or you can get it as a URL default again is URL this is how it sends it back I'll show you what that looks like when we actually run the code and sizes now size when it comes to uh D2 it pretty much generates only Square images right so 256 by 256 or 512 by 512 or 1024 by 1024 which is the default now when it comes to di 3 you can actually generate Square image or you can generate a horizontal image of 1792 by4 or you can generate a vertical image of 1024 by 1790 so this is actually an improvement that we have in Del uh Style again this is I think something very specific to Del 3 you can ask for vid natural viid actually creates hyper realistic uh images and user now this is for cases where you allow users your users to create uh images and you want to check you know the usage might want to set rate liit and finally what this API returns is a list of image object now let's get started with the code walk through so this is a very simple program to generate an image taking the you know title and a snpp it or from the article as the um this is you know the usual set up from open AI import open a here's how you set up your key a secret key you file you would put that secret key and you access it through this and you create an instance client equals open and you can actually start to use using the client so we're going to Define this generate article image function what it does is it takes the image from it takes the width and the height defaults like I said are you know we are trying to create horizontal is 1792 by 1024 if you don't have a prompt then just return that a prompt is required otherwise you can it's as simple as you know this it's on line function right so what we have done here is client. images. generate you can basically find
this in the open API documentation itself so the model I'm using del3 I'm giving the B uh that we had created and nals 1 that's actually default anyway quality I'm just going with standard because I don't really need an HD HD quality image and finally the size width and the height putting that in send it out I get the image URL it it's basically you know it's a list of objects so from the first object I'm accessing the uh URL and return the image this is basically how it gets used uh so I've taken the article from crunch this is about an AI song generator which can provide completely legal and licensed music and the first line from that article is basically this it talks about you know two companies that are coming together to create the song generator um and you know planing to share Revenue with right for uh I menu setting these I create this image prompt basically I'm just joining these two article title and the spp and then I'll prate to just to show you guys what the prompt looks like and then uh I'm going to call my generate article image function with all of these parameters and finally we will print out the image URL so it's quite simple so let's go ahead and run this so you'll notice that this actually takes a little bit of time for it to generate but if you can see PRT is basically this combines the title and the um snippet from the art so yeah that's how much time it took uh what we can do now is we can from Di and this is the kind of image that generates pretty good right it's actually pretty good so let's see what if you can actually improve upon this yeah the other way to do this what I actually wanted to do this prompt is actually very simplistic right it's just a title maybe a little bit of a description in my particular instance the article snippet is an optional parameter so I don't really need to send it even if I just send the article title it'll now di 3 actually is much smarter imag generator so the way it has been designed is once you get the prompt internally it rewrites the prompt to make it a little bit more intuitive and then that is given to di if you want to use your prompt as is in the prompt you actually have to specify and you'll find this in the API documentation you can actually specify saying that use this as is I want to test this out right so something like that with this kind of a prompt it has generated a pretty decent image let's see what I can do if I add a little bit more intelligent SPS so I've defined a different so it's more or less the same code but I've created another function here which basically generates a prompt for image generation right so let's go back up it's as you can see it is more or less the same thing we set our open key we have the generate artic image function that we just saw right and here's my new function so what I'm going to do here is I will pass the article title and article spp and I'm going to use gp4 here right so I'm telling GPT 4 I'm giving it this prompt to the API saying given the following article title and snipet if available create a detailed and creative prompt suitable for Di to generate an image that represents article point so uh I've also given a little bit more instructions to gp4 I said that make the prompt detailed about 30 words of the prompt should be specific uh you know to the scene the characters the objects all of those things and another 30 words is a little bit meta it's about the overall style of the image No it should be hyper realistic photographic and I've also mentioned that the target audience for this image is AI developers and then I call this the chat completion uh I it's G dpt4 I have specified Max tokens and then I send the prompt as the message and I get that back I've just added one additional thing here saying do not use any words or typography in this and the reason why I do this is because you know sometimes typography is not very great there are some spelling mistakes the font looks a little weird which I didn't want so I asked it not to use any words or typography in the if you recall stability actually does a really good job of this with stable diffusion 2.5 and then I've also put in one last thing saying if any part of the prompt goes against any of your policies or just ignore that prompt and continue to generate an image as best as right because they do have D 3 has pretty strong guard rails so I just want any image to get generated that is representative of the article I don't want it to fail just because there might have been something a little controversial in the article and Val decides that cannot uh generate an image for us okay so let's see how this thing is used again it's the same article that I'm using uh article title is the same and the article snippet is also the same image dimensions are the same and now basically I'm going to use those two to call this generate a prompt and this image prompt that I get I'm going to then pass it to my generate article image function let's try this out and see what we get as a so now you'll see it is running The Prompt generator with gp4 this itself takes a little bit of time but now look at the quality of the prompt that has come out here right it's a lot more detailed it talks about you know it very uh vividly describes the scene it's scene depicting presentators from these two different companies shaking hands over holographic display it talks about the background it talks about you know the creation of licensed music now D may not be able to do justice to all of this but this actually is a much nicer form right oh and it's already generated uh the image for us so let's take a look at it let's see what this is actually pretty good it actually talks about how you know these two different companies and you know it is about music it's done a good job of listening to the actual prompt right so the prompt gave a lot of insight what all of these things by just using a much nicer promp you can get a much more Vivid image not that this image is bad this is also a very nice image but you see I don't particularly like this kind of font this is similar style but uh it's a more vivid imagination a little bit more creative of course note that there would be an additional cost using gp4 through the API right so is it worth the cost that is something that you have to for those of you who are really new to this if you're not sure about you know what is all this how do I do any of these things the easiest thing to do again is to just ask Genera to do this for I'm actually using cursor so I asked cursor here itself um can you generate code for me to use di 3 I told it specifically what I want I'm going to give it title of an article I'm going to give it description of an article then you write the code to generate an image using uh D 3 and it did a pretty good job right you can if you're not using cursor you can of course use any other ID that you want you can try this s go to either chat GPT and ask it to generate the code or you can ask Gemini to generate the code or I actually found that anthropics CLA does the best job just a quick note here I I'm sure this is just an issue at this point in time uh Daddy's function called syntax has actually changed I think this be client. image. create and now it is cent. images. generate the funny part was when I used open itself to generate that code it generated it with the old API and for some time I was struggling and trying to figure out why is this not working it should be fa straightforward when I looked at the API reference it actually showed that hey this is the new code that should be used and simultaneous I also you know gave that same error message to two or three different gen models and Pla was the one that actually diagnosed it and gave me the correct solution so just something for you to also be aware of sometimes when these errors occur it's a good idea to just go and check with code generators or with chat GPT and all of these things they can actually diagnose it well for you so if you're a beginner don't get over by all of these things just get started off with writing some code or by asking the code generator to write the code for you it's a good way to get started and it feels nice when you actually start to generate these images and text and get get the solutions if you found this video useful please consider subscribing to our channel in addition to kind of introductory videos like this we will continue to post a lot more code walkthroughs we'll have interviews with experts uh I will also be sharing career advice on how you can navigate this entire e space and we do meet every now and then with are deep div on technical subjects and we will share recordings from those also thanks for watching we'll see you next time if you're serious about a career in AI follow deep Tech Stars on your favorite social media platforms
2024-12-26 18:33