Google Gemini recently released its Flash 2.0 family of models, and they are really good, and I want to talk about them today. Now, I have the opportunity to play around with a lot of different AI services and a lot of different large language models for different use cases, and this one has me really excited. I can confidently say that if you're building an application that sits on top of a large language model, you have high throughput, and this is something you are paying for, you'd be hard-pressed to find a better solution than Google Gemini's Flash 2.0 or 2.0 Flash-Lite. This is because of the combination of speed, quality, and pricing. The pricing is just really, really good.
So what I want to do today is spend a little time talking about Google Gemini 2.0 and the family of models. We'll take a look at some pricing. But a missing component of this for me has always been the way Google has packaged all of their different services. So I want to go through an article that I wrote that talks about the different services out there and which one you should choose in which scenario. I think this will help clear things up, and then we'll go through and build an application that uses Flash 2.0, a simple one at that, but we'll see how to
get started with Flash 2.0 in your applications. All right, so here we are over on X. Here's the official Google account announcing the Gemini 2.0 family with new options and broader availability. First, you can see the models are 2.0 Flash, 2.0 Pro, and 2.0 Flash-Lite. If you go through there, they're generally available across their AI products, including the Gemini API in Google AI Studio and Vertex AI.
We'll talk about that in a minute. Both the 2.0 Flash Thinking Experimental model and the 2.0 Pro Experimental are rolling out in the Gemini app. Again, there are different ways to access some of these services, and we'll get to that. If you go down, you can read through some of this; I'll leave it in the description below.
There is a blog post that you can take a look at to get some more information on this. Here's where it gets really exciting for me. If you go into the pricing models and you go into Gemini 2.0 Flash, you can see the rate limits here, with input pricing and output pricing right now free of charge, and up to 1 million tokens of storage per hour free of charge. And then they have grounding with Google Search, if you want to be able to build search into your applications without using external tooling.
That's really exciting. The pay-as-you-go pricing is where it gets exciting. If I want to use this in my applications through the API, the input pricing is 10 cents per one million tokens, and 70 cents per one million tokens for audio.
So that's text, image, and video, and then audio. And then the output pricing is 40 cents per one million tokens. Again, I don't have a list here comparing all of the different pricing, but this is really inexpensive compared to a lot of the other large language models out there. And here's one of the things that would stop me from using them in my apps: something like Anthropic's Claude 3.5 Sonnet, which I love and use every day, would get very expensive to use in an app.
Same with something like OpenAI's o1, pretty expensive to use. So there's that, and again, that's pay-as-you-go. But then you can go into something like Flash-Lite.
And if we go down to the pricing here, it's $0.075 per 1 million tokens in and $0.30 per 1 million tokens out. Again, extremely inexpensive compared to all the other large language models out there. So that's the exciting part. We know that if you go ahead and give it a test run and try out some different prompting techniques against it, you get some really good results.
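To put those prices in perspective, here's a quick back-of-the-envelope cost sketch in plain Java. The per-million-token prices are the ones quoted above and may change, and the token volumes are made-up examples:

```java
public class GeminiCostEstimate {

    // Cost in USD for a given token volume, given per-1M-token prices.
    static double cost(long inputTokens, long outputTokens,
                       double inputPricePerMillion, double outputPricePerMillion) {
        return (inputTokens / 1_000_000.0) * inputPricePerMillion
             + (outputTokens / 1_000_000.0) * outputPricePerMillion;
    }

    public static void main(String[] args) {
        // Flash 2.0 text pricing: $0.10 in, $0.40 out per 1M tokens
        System.out.println(cost(50_000_000, 10_000_000, 0.10, 0.40));  // 9.0

        // Flash-Lite: $0.075 in, $0.30 out per 1M tokens
        System.out.println(cost(50_000_000, 10_000_000, 0.075, 0.30)); // 6.75
    }
}
```

Even at 50 million input tokens and 10 million output tokens a month, Flash 2.0 comes in around nine dollars, which is what makes it so attractive for high-throughput apps.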
It's really fast. The quality is great. The pricing is really good. Now here's what I want to talk about. I wrote an article on this over on ByteSizedAI.dev, a free resource for you if you're interested. I titled it "Google's AI Tools Explained: A Beginner's Guide to Making Sense of the Alphabet Soup," because when I go to use some of these different models, I get really confused by all the different AI offerings that Google has. And I think once you get this overview, then
you'll know where to go depending on what you're trying to do. So I go through and talk about Google Gemini, why it's important, and why Gemini matters. But then we really start to look down here. So first off, there's Google AI Studio. This is your playground for AI.
Think of Google AI Studio as your personal AI laboratory. It's a free, web-based tool where you can experiment with Google's Gemini models without getting lost in all the technical complexities. Key features: a user-friendly interface for testing AI models, a gallery of example prompts to get you started, and code generation for your applications,
if you're using it in your app. Now, again, there are only certain SDKs out there; we'll talk about that. There's also a free tier for experimentation. If you want to use AI Studio, start with the prompt gallery and see what's possible before creating your own prompts. So I have a link to that.
If you go over here, you are dropped into AI Studio. On the right here, you can see this: you can get some code if you're using something like Python or JavaScript. You'll see there isn't a Java SDK here. If you go over, you can run this.
For newer models, use the Gen AI SDK. So this will take you over to the documentation and then the Gemini Developer API. You'll see right now that they only have SDKs for Python, Node, and REST. So something like Spring AI could use the REST API under the hood, or you could use something like Vertex, which we'll talk about in a second.
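Until a Java SDK ships, one option is to hit the Gemini Developer API's REST endpoint directly with the JDK's built-in HttpClient. This is just a sketch: the endpoint path and JSON shape below are my reading of the public REST docs, so verify them before relying on this, and note that a real implementation should JSON-escape the prompt:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class GeminiRestSketch {

    // Builds (but does not send) a generateContent request for the given prompt.
    static HttpRequest buildRequest(String apiKey, String prompt) {
        // Minimal request body; the prompt should be properly JSON-escaped in real code
        String body = "{\"contents\": [{\"parts\": [{\"text\": \"" + prompt + "\"}]}]}";
        return HttpRequest.newBuilder()
                .uri(URI.create("https://generativelanguage.googleapis.com/v1beta/models/"
                        + "gemini-2.0-flash:generateContent?key=" + apiKey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("YOUR_API_KEY", "Tell me a fact about Gemini");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```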
But I'm hoping they have a Java SDK for this at some point soon. So we have this configuration here. You can go in and select your model. I love that when you hover over a model here, you see the pricing: input 10 cents, output 40 cents, what it's best for, use cases, rate limits, et cetera.
You really know what you're getting out of the box when you hover over this. And if I go to something like Flash-Lite, you can see that, too. Oh, even better: the preview of that is actually free right now. So we can see the pricing for this.
We can adjust our temperature if we want. We can use different tools like structured output. We can execute code.
We can use function calling. We can use that grounding with Google Search. And then there are some advanced settings.
But I just want to do something here and show off how good this is. So I want to create a Spring Boot REST API using the JSONPlaceholder service's /posts API, using the new RestClient in Spring Boot 3.2. And I'm going to go ahead and click Run. Extremely fast. Like, this is running really, really fast.
Okay, but maybe it's just really fast and doesn't produce really good quality? Wrong. If we look at this, I asked it to use a RestClient, and it knows how to use the RestClient.
It set up a base URL for JSONPlaceholder and built a RestClient bean. And then here in our PostController, we have a request mapping to /posts. I like that it's writing pretty good code here; it's conforming to the REST style.
We're getting ResponseEntity objects in the response. We can get all posts. We can get a post by ID. We have a PostService that uses the RestClient to go ahead and get that information. I like that it delegated that code to the service and didn't do it all right in the controller.
So for me, it's writing pretty good code. Maybe the only change I would make here is that I might use a record. But again, I can reprompt it to do that and make some changes if I wanted to. I love the explanation here, too.
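As a quick aside, the record I'd swap in for the generated Post class might look like this; the field names assume JSONPlaceholder's /posts schema:

```java
// An immutable carrier for a JSONPlaceholder post; Jackson can bind JSON to this directly.
public record Post(Integer userId, Integer id, String title, String body) {

    public static void main(String[] args) {
        Post post = new Post(1, 1, "Hello", "World");
        System.out.println(post.title()); // Hello
    }
}
```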
It doesn't just give it to you; it's really explaining what code it wrote and why it actually did what it did, right? So really good stuff, important considerations. Again, I think the combination of speed, quality, and pricing makes this a really attractive offer from Google, and I'm really excited about that. So that's Google AI Studio.
You can get an API key for this, but again, that's giving you an API key for the REST endpoint, or, I think, the Python SDK and the JavaScript SDK, but nothing in Java yet. Still, you can play around with this just using an API key. So there's also Vertex. Vertex AI is Google's comprehensive platform for building and deploying AI solutions at scale. It's like having an entire AI factory at your disposal.
Key features: a complete toolkit for building AI solutions, advanced model training and customization, enterprise security, and integration with all the Google Cloud services. So when should you use it? Use Vertex AI when you need to process large amounts of data or require strict security compliance. So again, I have a link for that. That will take you over to Google Cloud.
Inside of there, there's Vertex AI Studio, and there's a whole bunch of information in there about how to get started with that. Now, the only thing is you'll need to set up some credentials, and you'll need to set up a project. So I have a project called Hello Spring AI, which we'll use in our application in a little bit. So that is Vertex. And then there's Google Gemini, AI for everyday use. This is the consumer-facing version, which brings AI capabilities directly into the Google products you already use. It's like having an AI assistant integrated into your daily workflow.
A really nice thing about this is that it works within Google Workspace. So if you're using Docs, Sheets, anything in that realm, even Gmail, you get a nice integration with that. It handles text, code, images, and more, available through the Google One AI Premium plan with a simple conversational interface. So if we go over to that, this is Gemini. You can pick whatever model you want to use. So again, I'm testing out Flash here. There's also the Flash Thinking Experimental, which is their reasoning model. So if you want to give it something a little bit harder to solve, you can see its thought process there.
There's also this 1.5 Pro with Deep Research, which I've been using a lot lately. This will help you with any research. So if I'm researching an article that I want to write, or a chapter in my book, or a video to make, and I don't want to do all the manual research of going out to the web, finding all these different websites to cite, and coming up with a plan, this does a really good job of it. So if you haven't had a chance to check out Deep Research, I would suggest doing so. But again, this is the consumer-facing side.
This is where everybody else goes to use the different Gemini models, and it's a great place to do that. Also in this article: making your choice. How can you make the choice? What are you using it for? There's some information on that here, and then a decision tree. I just wanted to share that article, because one of the things that really stopped me from using Google Gemini in the past was the packaging. I didn't know all these different services they had and which one I should use.
So for me, just playing around as a developer, I'm using Google AI Studio. But when I want to build applications in Java, in Spring AI, using the Google Gemini 2.0 family of models, I'm going to reach for Vertex AI, and we'll take a look at that in the Spring documentation. And then finally, there's Google Gemini for your everyday use, which also integrates with a lot of the different Google products out there. One other thing I will say for Google Gemini that I really like, and I think AI Studio does this as well: you can come in here and pick a code folder. So if you have a folder of code, you can upload it and then ask a question about it right here in Gemini. And this is something that I have
written different services around to be able to do for me, and it's really cool to see it baked right into Google Gemini. So finally, I want to build out a quick application that uses Google Gemini. We now know this is going to use Vertex, right? That's the solution we need to build into our applications. So if I go over to Spring
AI's documentation and I want to get started with it, if we go under AI Models and then under Chat Models, we see there is a Google VertexAI section, and then VertexAI Gemini Chat. So you can read through this. If
you want to go ahead and pause this and read through it, that's great. But you can see that to get started, the prerequisite is that we can't just use an API key like we can with other LLMs. Now, again, I'm hoping at some point we get an API key option here and some type of Java SDK. But for now, we're using Vertex for this. And to do that, you have to install the gcloud CLI for
whatever OS you're running. And this wasn't that hard for me. I was able to get it installed and then just set up a project ID. Again, back in Vertex, this is my project ID, my Hello Spring AI project. And then I go ahead and authenticate with my credentials. So those are the things that
you need to do before you can actually build an application here. But once you have that up and running, you're ready to go, and we can get started. So what we can do is head over to
start.spring.io and create a new project. I'm going to say dev.danvega, we'll call this flash2, and I'm going to use JDK 23. And then we just need a couple of dependencies. We need the web dependency, and then we need to pick whatever large language model we're using. In this case, we're going to
use Vertex AI Gemini. That will give us everything we need to create an application in Java using the Spring AI project. And this is going to be really easy and really fun to get started with. So what you'll
need to do is click Generate. This will download a zip file. When it's downloaded, you can open it up in whatever text editor or IDE you're most comfortable using. For me, that is going to be IntelliJ IDEA Ultimate Edition. With that, I think we
have everything we need. What are we waiting for? Let's write some code. All right, to get started, I'm going to come in here in Java and just rename this, refactoring it to Application.
And we'll go ahead and refactor that test as well. Good to go. All right, the first thing we need to do is set up some properties.
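Quick aside before the properties: the model dependency we picked on the Initializr lands in the pom.xml as the Vertex AI Gemini starter. The coordinates looked like this at the time of writing; check the Spring AI docs for the current artifact name, since these have shifted between releases:

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-vertex-ai-gemini-spring-boot-starter</artifactId>
</dependency>
```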
Whenever we're talking to an LLM, we need to configure the LLM that we're using. I'm going to go ahead and rename this to application.yaml, and we will open that up and set it up. So I need to set up my Spring application name, and
this is just going to be called flash. Now we need to set up the AI properties under spring.ai.vertex.ai.gemini, starting with our project ID. Again, that's the Hello Spring AI project that I set up over in Vertex. Now, I've already done the Google Cloud CLI setup, setting up that project and my authentication, so that is all ready to go; that's how I'm authenticating. The location for this is going to be us-central1. Again, you can kind of get a
list of those in the Vertex documentation. Now, from the chat perspective, we're going to set up some options, and I'm going to set the model. The model is going to be gemini-2.0-flash-001. Again, that's something you can find in the Vertex documentation, along with the other model names you can use.
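Putting that together, the application.yaml ends up looking something like this. The property names follow the Spring AI Vertex AI Gemini documentation; swap in your own project ID, and note this assumes you've already authenticated with gcloud auth application-default login:

```yaml
spring:
  application:
    name: flash
  ai:
    vertex:
      ai:
        gemini:
          project-id: hello-spring-ai   # your Google Cloud project ID
          location: us-central1
          chat:
            options:
              model: gemini-2.0-flash-001
```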
With that, we have our configuration done, and now we can start to write some code. So I'm going to come in here and create a new bean of type CommandLineRunner, and this will return a lambda taking args. In here, we're going to make our call out to the LLM. Now, when we have a single LLM configured like we do here with Vertex AI, Spring AI will automatically wire up a ChatClient.Builder for us. So we can inject that ChatClient.Builder, call build() on it, and that will create us a client. With that client, I can make a call out to Google Gemini's Flash 2.0. I can say prompt() and pass it a user message; this is basically the query we're asking it to run. I'm going to say, "Tell me an interesting fact about Google Gemini." I'm going to use the blocking call, so this is the synchronous call that will wait until the entire response is ready to return to us. You can also use the streaming model, if you want to stream the response back and start outputting it as soon as chunks come back; that's a really good option if you're building, say, a chat interface and you don't want the user to wait. But in this case, I'm just going to make a blocking call, and what I'm going to return from that is the content. That will give me the string response we're looking for. So I can take that response and output it to the console. If everything works, we should be able to run this application and see a response down here. And here it is: one interesting fact about Google Gemini is that it was designed to be natively multimodal from the ground up.
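Assembled, the whole class looks something like this. It's a sketch against the Spring AI ChatClient fluent API as documented at the time of writing, so treat the exact method names as things to verify against the current docs:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    // Spring AI auto-configures a ChatClient.Builder when a single model is on the classpath
    @Bean
    CommandLineRunner commandLineRunner(ChatClient.Builder builder) {
        return args -> {
            var chatClient = builder.build();
            String response = chatClient.prompt()
                    .user("Tell me an interesting fact about Google Gemini")
                    .call()        // blocking; use .stream() for a streaming response
                    .content();
            System.out.println(response);
        };
    }
}
```

Running this prints the model's answer to the console once the blocking call returns.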
This means that, unlike some other AI models that were initially trained on text and then adapted to handle images or audio, Gemini was built to understand and reason across different types of information simultaneously: text, code, audio, images, and video. This allows it to process complex inputs and provide more nuanced and comprehensive responses. Cool. So that's how easy it is to get up and running with Google Gemini in your Java and Spring applications, just a little bit of code to get you started. And again, I can confidently say that because the pricing is so good, the quality is so good, and the speed is so good, this is a really great option if you're building applications on top of large language models that you're going to publish to production and have to pay for. Because, again, the pricing is really, really good. Now, I know that was an extremely
easy, simple application to build, but this gives you the foundation to start building applications going forward with Google's Gemini Flash 2.0. So experiment with Flash 2.0 and Flash-Lite; again, they're really great for building those applications that we're going to take to production. I had started experimenting with the experimental versions of Flash and never really did a whole bunch of deep dives into them, but I found myself over the last couple of days really diving into this and giving it some tests, and I'm really pleased with the performance and the speed, and obviously the pricing is a big one for me here. So let me know in the comments
below. Have you had a chance to check out Google's Gemini Flash 2.0? Have you started incorporating it into your applications yet, even with the experimental versions that were out there? And if not, is this enticing to you? I know it is to me. Is this something you're
going to take a look at for building your own generative AI applications in Java? I had a lot of fun putting this together, friends. If you learned something new today, do me a big favor: leave me a thumbs up, subscribe to the channel, and as always, happy coding.
2025-02-10 06:49