Google Gemini 2.0 Just Shattered Industry Expectations! (OpenAI Defeated) - An In-Depth Exploration!

Today Google unveiled Gemini 2.0, a groundbreaking AI model designed for the agentic era. Imagine an AI that doesn't just respond to your commands but truly understands your world, an AI that sees, hears, plans, and acts. In this video we're diving deep into the features of Gemini 2.0, showing how it's set to transform industries, reshape our lives, and redefine what AI can achieve. Whether you're a tech enthusiast, a developer, or just curious about the future, this is a video you don't want to miss.

Gemini 2.0 enables new kinds of multimodal AI agents: agents that can see and hear the world around you, think, plan, remember, and take action. Gemini 2.0 is what enables Project Astra, our research prototype of a universal AI assistant. Using capabilities like multimodal memory and real-time information, it lets you understand your world live.

"What can you tell me about the sculpture?"

"The sculpture you're seeing is called My World and Your World by Eva Rothschild, located in Lewis Cubitt Park in London."

"What kind of themes does her work explore?"

"She creates abstract sculptures that invite viewers to interact with and interpret their surroundings in new ways."

Project Astra is multilingual, using native audio to seamlessly switch languages as you speak. [multilingual speech, garbled in the transcript]

We're taking this to the next step with Project Mariner, with agents that can complete tasks on your behalf. You can ask it to do things like this that require multiple steps: doing research to look up the artist, then finding the painting, and then shopping for the right supplies. The agent uses the web to take action and find what you're looking for. It plans and reasons at each step, always staying in your control and completing your task.

Gemini 2.0 enables agents that can be applied across domains. We're exploring agents that can explore virtual worlds, assisting you in video games.

"Where do you recommend I attack from in the space?"

"Given the base layout, I recommend attacking from the bottom or south side. This direction allows you to target the town hall directly with your Giants, while the Wizards can handle the surrounding defenses."

Gemini 2.0 can reason about the physical world, understanding 3D spatial environments and objects. These capabilities are being applied to robotics, with agents that can assist you in your everyday physical environment. Learn more about how Gemini 2.0 is enabling the next generation of multimodal AI agents at deepmind.google/gemini.

To understand the true potential of Gemini 2.0, let's explore Project Astra, Google's prototype for a universal AI assistant. It seamlessly integrates tools, memory, and multimodal reasoning, delivering unparalleled capabilities.

All right, we've got a Pixel phone loaded with the latest test build of Project Astra, and we're going to leave the office to test it around London.

"I've got this email with apartment info. Can you tell me the door code and remember it?"

"I'll remember the door code for you. It is 1170. To unlock it, enter the code and then press the unlock button."

"Based on these symbols, how should I wash this?"

"The tag shows to machine wash at 30 degrees Celsius, avoid bleach and tumble drying, iron on low, and dry cleanable."

"Then what setting should I use on this machine?"

"Based on the laundry instructions of 30 degrees Celsius, select the 30 setting under Easy Care on the dial."

"I've got this list of recommendations. Can you tell me a bit about some of them?"

"I can search for these places for you. Coal Drops Yard is a shopping destination with unique industrial design, Luminary Bakery is a social enterprise bakery, and Tate Modern is a modern and contemporary art museum."

Now we're going to try Project Astra out on the street. Let's see what it can do.

"I've always said scone, but now I'm hearing people say scone. Can you help me with the pronunciation?"

"It's a fascinating topic, the pronunciation of scone. Both scone rhyming with gone and scone rhyming with bone are accepted pronunciations, though regional variations exist."

"What can you tell me about the sculpture?"

"The sculpture you're seeing is called My World and Your World by Eva Rothschild, located in Lewis Cubitt Park in London."

"What kind of themes does her work explore?"

"She creates abstract sculptures that invite viewers to interact with and interpret their surroundings in new ways."

"Uh, would these do well if I planted them back home?"

"Mountain fleece can grow well in New York. They prefer full sun to partial shade and moist, well-drained soil, so your garden in New York should be a good fit for these flowers."

"I'm trying to get a gift for my friend, and this is everything she's been reading. How would you describe her taste in books?"

"Based on these titles, your friend seems to enjoy a mix of history, biography, and perhaps a touch of political or social commentary."

"I picked some of these out. Any idea which one she'd like the most?"

"Consider Half of a Yellow Sun for its exploration of complex social and political issues within a historical setting, aligning with her interest in narratives that offer insight into significant periods."

"Will that bus take me anywhere near Chinatown?"

"Yes, it does. The 24 bus route goes through Leicester Square, which is very close to Chinatown."

"Are there any landmarks on the way?"

"Riding the No. 24 bus, notable landmarks you will likely encounter are Westminster Abbey, Big Ben, and Trafalgar Square."

I also met up with some friends to test out the multilingual capabilities. Now I'm using these prototype glasses to test Project Astra hands-free.

"Can you check if it's going to rain anytime soon?"

"It looks like it is currently mostly sunny in London. Rain is expected tonight from around 10 p.m. until tomorrow morning at 6 a.m."

"Hey, what's that park next to me?"

"The park looks to be Primrose Hill, known for its panoramic city views and the notable Shakespeare's Tree."

"Can you check if I can bike in there?"

"It appears that cycling is actually banned within Primrose Hill itself, although it is permitted in nearby areas and the rest of Regent's Park."

"I'm biking back to Camden. Can you check if there are any supermarkets along the way?"

"There's a Sainsbury's on Camden Road, a Morrisons on Chalk Farm Road, and an M&S Simply Food on Camden High Street."

"What was the door code again?"

"The door code you saved previously is 1170."

Today I want to tell you about Project Mariner. While Astra enhances your day-to-day life, Project Mariner takes productivity to the next level by automating complex tasks seamlessly through the Chrome browser. It's a research prototype exploring the future of human-agent interaction, and it is built on Gemini 2.0.

Like with all new technology, it's important for us to build this responsibly, which is why we're starting small. We'll be getting feedback from a group of trusted testers and using their experiences to really shape how Project Mariner evolves. Let me show you how it works.

Project Mariner works in the browser as an experimental Chrome extension. I'm going to start by entering a prompt here. I have a list of outdoor companies listed in Google Sheets, and I want to find their contact information. So I'll ask the agent to take this list of companies, then find their websites and look up a contact email I can use to reach them. This is a simplified example of a tedious multi-step task that someone could encounter at work.

Now the agent has read the Google Sheet and knows the company names. It then starts by searching Google for Benchmark Climbing, and now it's going to click into the website. You can see how this research prototype only works in your active tab; it doesn't work in the background. Once it finds the email address, it remembers it and moves on to the next company. At any point in this process you can stop the agent or hit pause. What's cool is that you can actually see the agent's reasoning in the user interface, so that you can better understand what it is doing. And it will do the same thing for the next two companies: navigating your browser, clicking links, and recording information as it goes. You're seeing an early-stage research prototype, so we sped this up for demo purposes. We're working with trusted testers to make it faster and smoother, and it's so important to keep a human in the loop. After the fourth website, the agent has completed its task, listing out the email addresses for me to use. And there you have it. We're really just scratching the surface of what's possible when you bring agentic AI to computers, and we're really excited to see where this goes next.

This capability transforms tedious tasks into seamless workflows, freeing up your time for what truly matters, like shopping more efficiently and focusing on what you enjoy most.

Here is a demo of Project Mariner, a research prototype we built with our new Gemini 2.0 models. Let me show you how it works. I'm going to start in the Gemini app, but Project Mariner is an experimental Chrome extension that works across all pages. In this demo I'll be prompting the agent to find the most famous post-impressionist, find a colorful painting of theirs on Google Arts & Culture, then add some colorful paints to my Etsy cart.

In Gemini, the agent starts by asking who the most famous post-impressionist is. Gemini quickly identifies Van Gogh as the most famous post-impressionist painter. From here it navigates to Google Arts & Culture by typing the URL into the address bar. You can see how this research prototype only works in your active tab; it doesn't work in the background. It opens the search bar to look for Van Gogh. Van Gogh is the first option, so it clicks on it. Now it decides to click on the artist page to look for a colorful painting. We're able to see the agent's reasoning in the user interface throughout the process, so we can easily follow along to understand what it's planning. The agent realizes that the page only contains information about Van Gogh, so it decides to scroll down to look for artwork. It discovers a collection of art and starts looking through it to find a colorful option. The agent uses its multimodal understanding capabilities to find a colorful painting of irises and clicks on it.

After finding this painting, it navigates to Etsy by typing in the URL. It clicks on the search bar and searches for colorful paints. So now it's looking at different paint sets, and the agent gives us a sense of what it's optimizing for. As you can see, it's optimizing for price and visual appeal. The agent clicks on a set of watercolors that appears to be a good fit. It's then added to the cart. After a moment, the agent asks me if it should proceed to checkout. This is an early research prototype, but it's a great example of how we are keeping a human in the loop and ensuring you're always in control. I just respond by saying there's no need to check out, and now it completes its task and hands back control.

Project Mariner is now available to a select group of trusted testers as an experimental Chrome extension. They're giving us feedback to help us build it safely and responsibly. We're excited to see where this technology goes and what people create.

The first feature I want to highlight is Gemini's live streaming capability. Let's dive into this demo and see it in action.

This is a quick demo of multimodal live streaming in Gemini 2.0, and we are casting a tab into AI Studio.

"Um, can you see my screen?"

"Yes, I see your screen. It's showing a document with the title Demo. It also contains bullet points and a Google Meet window with a person in the video. Is there anything in particular you would like to talk about on the screen?"

"Yeah, can you start the demo by just reading the text I highlighted?"

"Sure, I can do that. The text you highlighted reads: The Multimodal Live API lets you build real-time multimodal applications powered by Gemini 2.0 Flash. It features tool use like Google Search and code execution, function calling, and more. Would you like me to read the following lines too?"

"No. Can you actually just tell me what this word means, though?"

"I can do that. Multimodal refers to the ability to process and understand different types of data, like text, images, and audio, all together. Does that help?"

"Yeah. Um, I need to demo the interruptions feature next. Can you tell me a really boring, kind of once-upon-a-time sort of story?"

"Okay, I can do that. Once upon a time, in a small village nestled beside a quiet forest, there was a very unremarkable rock. This rock was gray, and it was round, and it sat there..."

"[Sorry] to interrupt you."

"No problem. I understand that interruptions are part of the demo. What should I do now?"

"Um, so we're going to demo memory. Can you just summarize everything we've seen or heard so far?"

"I can do that. So far we've discussed a demo of the Multimodal Live API, which lets you build real-time multimodal apps, and I defined what multimodal means. I was also asked to tell a very boring once-upon-a-time story, but I was interrupted. Is there anything else?"

"Uh, no, I think we're done. Um, do you want to read this [unclear] ending card, or should I?"

"I can read it if you would like. The ending card says: Start building with Gemini 2.0 at aistudio.google.com. Is that correct?"

"Yeah, that's great. All right, thanks. Talk to you later."

But Gemini 2.0 isn't just about efficiency. It's also a powerhouse for creativity, enabling you to bring your ideas to life.

Gemini 2.0 can now natively generate images as part of your conversation. Let's say you want to imagine how it might look to turn this car into a convertible. With today's AI tools, this takes a lot of work. You might have to manually mask out this roof area or write a really complex prompt. We brought this image into AI Studio and sent it to the new experimental Gemini 2.0 Flash model. This is a real-time screen capture, and all the prompts you'll see in this video are things that early testers can now try. And this is the entire prompt we gave it: "Turn this car into a convertible." And it generated this image. Here they are side by side. Notice how it successfully modified the car but also managed to keep the rest of the image consistent. This is difficult in many AI tools that use separate models, but here it's all being done by one model.

And we continued the conversation, saying: "Imagine the car full of beach stuff and change the color to something that feels like summer. Explain as you go." The model began outputting text. It explained its idea for a new color, then showed it to us. But the really neat thing is that the model kept going. It went right on generating another image with the car full of beach gear. Remember that this is all a single response of text and image tokens coming from the model. This ability to output across modalities, interleaving text and image together, is one of the most exciting aspects of Gemini 2.0.

Here are some more example prompts and outputs, all coming from Gemini 2.0 Flash. Let's say you want to make this photo look a little bit nicer by getting rid of all this stuff on the couch. Just give it this prompt, and the model does that for you. You can even combine images. You can ask what your cat might look like on this pillow or on this skateboard. All of this enables you to co-create with AI in new ways.

We gave this image, with this text on the side of the box, to the model, and we said: "Open the box. Generate an image looking down into the contents of the box." We sent that prompt to Gemini 2.0 Flash.

This approach, where you send parts of your prompt in the image itself, opens up new possibilities for how we could have more seamless multimodal back-and-forth with the model. The model reasoned about the text on the side and generated this image of old electronics. You can push this even further, co-creating imaginary worlds together.

We went back to the first conversation in the video. We wanted to try communicating visually with the model, so we drew a circle on the door handle with the prompt "Open this." The model successfully figured out what we meant, and it generated this image with the car door open. And for the last prompt we said: "Make the car fly. Imagine you are the car soaring through the clouds to the beach. Show what that might look like." This is a challenge to see if the model can keep the car consistent while also visualizing this new scene that I'm imagining. And the model was able to do it, following my instructions and generating this image.

While Gemini excels in creative applications, its ability to understand and work with spatial relationships and 3D reasoning is equally impressive, showcasing its versatility in tackling complex tasks that require a deep understanding of physical spaces and dimensions.

Here's what you can build with spatial understanding in Gemini 2.0. We introduced this capability in our 1.5 models, and we've advanced it even further with Gemini 2.0. This is a new tool in AI Studio that makes it easier to explore spatial understanding with multimodal reasoning. For example, you can input this image and prompt it to give you the positions of the origami animals. This is a real-time recording, and notice how fast the results came back. That's because this is running on our new experimental Gemini 2.0 Flash model, which enables advanced spatial understanding with low latency. You can see if the model can reason about which shadow belongs to which animal by asking for the fox's shadow, and the model finds it over here. Or ask it about the armadillo's shadow; it finds that too.

Spatial understanding enables new ways to interact with images. It lets models do more than generate text about an image. Gemini 2.0 can search within an image.
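For developers wondering what "positions" might look like in practice, here is a minimal sketch of turning a model-reported bounding box into pixel coordinates for drawing. The video does not state the output format; this sketch assumes boxes arrive as `[ymin, xmin, ymax, xmax]` on a 0 to 1000 grid, and the `detections` list, labels, and the `to_pixel_box` helper are all hypothetical illustrations rather than real model output or an official API.

```python
from typing import NamedTuple, Sequence


class PixelBox(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(box_2d: Sequence[float], image_width: int,
                 image_height: int, scale: int = 1000) -> PixelBox:
    """Convert a [ymin, xmin, ymax, xmax] box on a 0..scale grid to pixels.

    The coordinate order and the 0..1000 grid are assumptions of this sketch.
    """
    ymin, xmin, ymax, xmax = box_2d
    return PixelBox(
        left=round(xmin / scale * image_width),
        top=round(ymin / scale * image_height),
        right=round(xmax / scale * image_width),
        bottom=round(ymax / scale * image_height),
    )


# Hypothetical detections, shaped like a parsed JSON reply from a model.
detections = [
    {"label": "rainbow sock", "box_2d": [100, 250, 400, 500]},
    {"label": "rainbow sock", "box_2d": [550, 600, 900, 850]},
]

# Scale every box to a 1280x720 image so it can be drawn as an overlay.
boxes = [to_pixel_box(d["box_2d"], image_width=1280, image_height=720)
         for d in detections]
for detection, box in zip(detections, boxes):
    print(detection["label"], box)
```

Keeping the conversion in one small function means that if the real coordinate convention differs, only `to_pixel_box` needs to change.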

You can give it this image and see if it can find both rainbow socks. The model finds the matching pair. You can even ask it to find the socks with the face, and it finds those too. This was a particularly neat result because you'll see the faces are really small and obscured. Like all models, it won't always get everything right, so you can try your own prompts to see what works for you.

You can combine spatial reasoning with multilingual capabilities. You can give it an image like this and prompt it to label each item with Japanese characters and an English translation. The model reads the text from the image itself and translates it.

With spatial understanding, Gemini 2.0 enables AI agents that can reason about the physical world. For example, you can give the model this photo and ask for the position of the spill, but then ask how it would clean it up, with an explanation, and the model points out the towel over here.

And with Gemini 2.0 we're introducing 3D spatial understanding. This is a preliminary capability still in its early stages, so it won't be as accurate as 2D positions, but we're sharing it so that developers can try it and give us feedback. Here's a Colab notebook that lets you prompt the model to give you 3D positions within photos. Then we visualize those positions in a top-down view, essentially turning the photo into an interactive floor plan.

Another standout feature is Gemini's ability to produce native audio, which is a groundbreaking advancement for enabling smooth, natural, and multilingual interactions. It's pretty amazing. Take a look at the demo about native audio.

So, Gemini 2.0 introduces multilingual native audio output. But maybe you're thinking, hmm, what exactly is native audio? You're actually hearing it right now. Everything you hear in this video was generated with prompts like, you know, this actual prompt on your screen, like, right now. It was all generated by prompting the new experimental Gemini 2.0 Flash model, like you see in this AI Studio screencap. Neat, right?

Totally. Native audio is really, really neat. It's different from traditional TTS, or text-to-speech, systems. Like, what's super cool with native audio is you can do more than just prompt an AI agent on what to say. You can tell it how to say it. You can prompt it to just be like, dude, you know, just totally chill. Or prompt it to speak with, oh, so very many dramatic pauses. And all of this is multilingual. You know how when a computer switches languages, it sounds like a different voice? That's a limitation of traditional TTS. But with native audio in Gemini 2.0, you can build agents that switch languages more seamlessly. Check this out. Okay, so I'm starting out speaking English, but then [speech in other languages, garbled in the transcript].

With native audio, maybe information retrieval could be more expressive. Like, what if AI agents could tell you the weather differently? On sunny days, maybe they sound like this: "Oh, the weather today is 74 degrees and sunny all day. Awesome!" But if it's rainy, it might sound more like this: "So, the weather today is pretty drizzly and cold all day. Oh, well." Or what if AI agents responded dynamically to your context? Maybe if it seems like you're in a rush, your agent can speak really, really, really, really, really quickly. Or maybe if you're whispering, you might want your agent to whisper back at you. Anyways, you get it. So many possibilities with native audio.

New output modalities are available to early testers now, with a wider rollout expected next year. Start building with Gemini 2.0 at aistudio.google.com.

Okay, okay, wait. I know the video's supposed to be over, but let me just show you a few more prompts. We could do that last line more like: "Start building with Gemini 2.0." Okay: "Start building with Gemini 2.0." Um, I guess: "Start building with Gemini 2.0."

Yeah.

Gemini 2.0 isn't just for end users. It's a powerful tool for developers looking to create custom AI applications. Gemini empowers developers to build personalized, domain-specific agents with ease.

You can build with native tool use in Gemini 2.0. Gemini 2.0 is natively built to use tools like code execution and Google Search. Here's a demo that combines tool use with real-time interaction, built using the new Multimodal Live API.

"Make a bar graph that compares the run time of The Godfather and Oppenheimer."

"Add the other two Godfathers to the graph."

"Pick three random superhero movies and add them to the graph."

Notice how quickly the model responded. That's because this is powered by our new experimental Gemini 2.0 Flash model.
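For readers who want a feel for how an application might expose a graphing tool like the one in this demo, here is a minimal, self-contained sketch. It is not the demo's actual code and it makes no real API calls: the `render_graph` tool, its JSON-schema-style declaration, the `dispatch` helper, and the shape of the `call` dictionary are all assumptions chosen for illustration, and the film labels and values are placeholders rather than real run times.

```python
# Hypothetical tool declaration: a name, a plain-language description,
# and a JSON-schema-style parameter spec that a model could read.
RENDER_GRAPH_TOOL = {
    "name": "render_graph",
    "description": "Render a bar graph from labels and numeric values.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "labels": {"type": "array", "items": {"type": "string"}},
            "values": {"type": "array", "items": {"type": "number"}},
        },
        "required": ["labels", "values"],
    },
}


def render_graph(labels, values, title="", width=40):
    """Return a plain-text bar chart, standing in for a real renderer."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    peak = max(values) if values else 0
    lines = [title] if title else []
    for label, value in zip(labels, values):
        # Scale each bar relative to the largest value.
        bar = "#" * (round(value / peak * width) if peak else 0)
        lines.append(f"{label:<16} {bar} {value}")
    return "\n".join(lines)


# Registry mapping declared tool names to local Python handlers.
TOOLS = {"render_graph": render_graph}


def dispatch(function_call):
    """Route a model-issued function call (name plus args) to its handler."""
    handler = TOOLS[function_call["name"]]
    return handler(**function_call["args"])


# Illustrative function call, shaped like what a model might emit.
call = {
    "name": "render_graph",
    "args": {
        "title": "Run time (minutes)",
        "labels": ["Film A", "Film B"],
        "values": [150, 180],
    },
}
output = dispatch(call)
print(output)
```

The design point is that the model only ever sees the declaration; the application keeps the handler and decides whether and how to run each requested call.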

It's able to search and code while you interact in real time. One really neat thing about this demo is how easy it is to set up the graph renderer. All you have to do is describe what that tool does, like this, and the model, using the simple description you give it, figures out how to render graphs all by itself. We've open-sourced this demo, along with Colab notebooks to get you started.

You can also explore tool use in AI Studio. Let's say you have to research New York restaurants, and you need to look up the phone number and address and write a brief description for all of these places. We gave this prompt to the 2.0 Flash model, and look how quickly it does that for you. The model is writing search queries, retrieving info, and putting it into this table. It even provides citations and links so you can explore further.

You can customize and control how tools are used in your applications. You might want to use Google Search for some things and not others. You can write instructions like these, telling it to only use Google Search for questions about sports. Then if you ask something like this, the model will answer without using Search, but if you ask this, it'll use Google Search to answer, just like you instructed it to. This ability, where the model decides when to use the right tool based on your instructions, is one of the strengths of native tool use in Gemini 2.0. These are just a few examples of how you can build your own multimodal AI agents with Gemini 2.0.

Coming up, we have a detailed demo of a user using an AI agent powered by Gemini 2.0 to play Squad Busters. This real-time, unedited interaction demonstrates the AI's ability to interpret the user's video and audio inputs, along with internet-sourced data, all without requiring extra training or direct game integration.

"Hmm, hey, listen. I'm about to start a game, and I want to make sure I complete the weekly quests. Can you see them?"

"Yes, I can see them. Looks like you need to pick up 300 gems and bust 10 boss monsters. You've already got a head start on the 300 gems."

"Cool. Um, I'll ask you to remind me about it during the game, because I keep forgetting about it."

"Absolutely. I'll keep an eye out and remind you about the quests."

"Thanks."

"You got it. Just let me know if you need a reminder anytime."

"Um, hey, I'm about to pick a character. Um, can you do a quick search on Reddit? What is the meta?"

Gemini 2.0 isn't just an AI model; it's a leap forward in how we interact with technology. Whether you're exploring a city, automating tasks, or diving into creative projects, Gemini is there to assist. Let us know in the comments: what's your favorite feature of Gemini 2.0? How do you see it fitting into your life? Don't forget to like, subscribe, and stay tuned for more updates on cutting-edge AI technology.

2025-03-08 06:01
