What percentage of enterprise data is unstructured data? Kate Soule is Director of Technical Product Management for Granite. Uh, Kate, welcome back to the show. What's your estimate? This feels like a trap, uh, without, you know, just a, a wild guess. I'm gonna say 40%. Shobhit Varshney is Head of Data and AI for the Americas.
Uh, Shobhit tuning in live from Vegas. Uh, what do you think? 200%. Have you seen the quality of structured data in companies? All right, great.
And last but not least, and joining us for the very first time, is Hillery Hunter, IBM Fellow and CTO of IBM Infrastructure. Uh, you've got an advantage on this question, but I don't know if you wanna offer your guess. Yeah, I'll, I'll take the midpoint there. Uh, not exactly the midpoint, but uh, I'll go with 80%.
Okay, great. So the answer is 90%. Uh, we're gonna talk about that today, and all that and more, on the very 50th episode of Mixture of Experts. 50th episode, crazy. And welcome. Woo-hoo.
Yeah. Woo-hoo. I'm Tim Hwang and welcome to Mixture of Experts. Each week, MoE brings together a talented and just lovely group of researchers, product leaders, and more to discuss and debate the week's top headlines in artificial intelligence.
As always, there's a ton to cover. We're gonna talk about the Llama 4 release, Shobhit's in Vegas. He's gonna tell us all about Google Cloud Next.
Some really super interesting research coming outta Pew Research. Uh, but today, uh, we want to take the opportunity, because Hillery is on the line with us, uh, to talk about IBM z, which is a new launch that just came out on, I believe, Tuesday. Um, and it concerns mainframes. Uh, and so I guess, Hillery, do you wanna just start, for listeners who are less familiar with the sector: what is a mainframe anyway, and why is it important? Yeah, I, I think the first fun fact is "z" stands for zero downtime, and mathematically, that's kind of an interesting conversation.
We talk about the system now having eight nines of reliability, and the way that you, you count those nines, as you say it, it's 99 point and then six more nines. So that's how you get to eight nines, it's a lot of nines, of resiliency. Yeah. But it means just a couple hundred milliseconds a year of downtime on average. And so, you know, when I talk to family members or I, I meet someone socially,
I kind of say we work on building the computers that you don't see and that you just sort of assume are there and never think about. And what that means is really, this is where most of the world's financial transaction volume, everything from things in the market to your personal credit card transactions go through it in the back end. And you hopefully never think about whether or not that computer's gonna work or your credit card transaction goes through. These are these systems that we just all assume are up all the time.
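As a quick back-of-the-envelope sketch of that nines arithmetic (illustrative only, not an official availability spec), the expected downtime per year is simply a year's worth of seconds times 10^-N for N nines:

```python
# Back-of-the-envelope: N nines of availability -> expected downtime per year.
# Illustrative only; ignores leap years and is not an official availability spec.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def downtime_seconds_per_year(nines: int) -> float:
    """Expected downtime per year when availability is 0.99...9 with `nines` nines."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability

for n in (5, 6, 8):
    print(f"{n} nines -> about {downtime_seconds_per_year(n) * 1000:,.0f} ms of downtime per year")
# 8 nines works out to roughly 315 ms a year, i.e. "a couple hundred milliseconds."
```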
And so it's really kind of at the core of the global economy, to be honest, that's really not an exaggeration. Yeah. What I love about this is like you work on, like, arguably some of the highest of high-stakes computing. Um, and I think one of the most interesting things about the launch is that I know AI is a, a, a big part of this launch in some ways.
Um, I know there's sort of z17, which is the mainframe, and then there's "z" sort of on the software side, which sounds like kind of IBM pushing into the idea that these, you know, eight-nines-of-reliability computers really are gonna get, you know, sort of integrated into the overall sort of AI revolution, which, you know, we talked about on the show before. AI, you know, is, is not always kind of, like, production-grade. It, it like sometimes, you know, messes up, it's stochastic, it has all sorts of randomness. So curious to hear a little bit more about like what's getting launched on the software side, and I guess how you kind of like get AI to work at like such a high level of reliability that I think most software developers never even need to think about as they're kind of vibe coding or whatever. Yeah, it's, it's a pretty different space, but it's equally fascinating, I think, to that whole vibe coding kind of space that a lot of folks are interacting with now on a daily basis.
Um, from a technical perspective, getting things done in transactions means having millisecond-level AI, and that means super, super fast, tightly integrated, being able to handle billions of transactions a day, um, and being able to score things at line speed, right? So, again, an anecdotal sort of example: if you're talking about fraud and analytics in the credit card transaction processing space, if I as a consumer am buying something online, it's okay, there's minutes to hours before the thing gets shipped out, you know, so fraud detection can happen offline. But if it's in a store and somebody's trying to rip you off and buy an expensive phone or something like that at Best Buy, you wanna make sure that instantaneously, the moment the transaction goes through, that it's detected as being fraudulent. And so there's actual real economic value and consumer value to being able to score every transaction in real time. The interesting thing that we're now talking about being possible on this next generation of mainframe is multi-model AI.
So a really small, fast, compact model that's running there, right on the processor, dealing with this massive transaction throughput. Maybe occasionally it has low confidence in the scoring it provided, and it needs to be backed up by a bit more robust, complicated model, and so we're putting extra AI cards, called the Spyre card, into the system so that you can not just do that super fast processing on the processor itself, but also do fast processing one step slightly removed and adjacent, on a PCIe-attached set of cards. And so we've just multiplied the AI capacity, um, and throughput for the system. And also then, from the perspective of the total system experience on the software side, like you said:
We now have something called Operations Unite, which is an AIOps-driven, AI-chat-driven interface to everything going on in the system. So observing, remediating issues, all happening in a totally modern interface. So it's pervasive once you put the AI capability in. It's not just about the workloads running in the system, but also how people use and operate and keep the whole thing stable and healthy. Yeah, that's awesome.
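To make that small-model-plus-fallback pattern concrete, here is a minimal, hypothetical sketch of confidence-based escalation between two fraud models; the function names, threshold, and scores are invented for illustration and are not actual z17 or Spyre APIs.

```python
# Hypothetical sketch of confidence-based escalation between two fraud models.
# fast_model / robust_model stand in for an on-processor model and a larger
# accelerator-hosted model; they are placeholders, not real IBM APIs.
from dataclasses import dataclass

@dataclass
class Score:
    fraud_probability: float
    confidence: float

def fast_model(txn: dict) -> Score:
    # Placeholder: a tiny on-chip model would run here at line speed.
    return Score(fraud_probability=0.02, confidence=0.97)

def robust_model(txn: dict) -> Score:
    # Placeholder: a larger model on a PCIe-attached accelerator card.
    return Score(fraud_probability=0.40, confidence=0.99)

CONFIDENCE_THRESHOLD = 0.90  # escalate only when the fast model is unsure

def score_transaction(txn: dict) -> Score:
    first_pass = fast_model(txn)
    if first_pass.confidence >= CONFIDENCE_THRESHOLD:
        return first_pass    # common case: answered in-line, sub-millisecond
    return robust_model(txn)  # rare case: fall back to the bigger model

print(score_transaction({"amount": 1299.00, "merchant": "electronics"}))
```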
So Shobhit, I'd love to bring you in. I, I know I launched this episode with a question about just how much unstructured data, uh, enterprises are sitting on, and I'm sure this is a problem that you have to deal with and that you talk about with customers day in, day out. Uh, I know that's a component of this launch, but curious if you want to just opine a little bit on kind of how the world is evolving there, and I guess how the Z launch sort of fits into some of those questions. Uh, a big fan of, uh, of, of the Z series, and I grew up in a cloud-first, AI-first world, and I have so much respect for understanding the right balance between where mainframes should be playing versus where the clouds are, right? So as an example, working with a very, very large bank, where we're leveraging cloud environments with a lot of different GPUs and compute behind it to train the models. But once you have fine-tuned the models to enterprise data, you wanna go bring it where the transactions are happening.
And these are sub-millisecond, right? You're having to do this very, very quickly, and you're doing billions and billions of these every hour. So you want to bring the AI inference as close as possible to where the transaction is happening. In the first wave of doing unstructured content analysis, you would have some large language model that summarizes a call recording or starts to do some knowledge search and things of that nature.
Now, in the next wave, once we've proven out that this technology is working, you wanna do this in more mission-critical workflows. For example, when fraud detection happens, like Hillery was mentioning, there's a lot of, uh, patterns that we need to look for. It's not just that one transaction that happened, you also need to look at how that transaction fits into the broader pattern.
At the point of the transaction happening in sub-milliseconds, the larger models have a lot of latency. You can obviously not afford to have that, that data go out to the cloud and come back, A, for security issues, and B, the latency and, and, and other things. Right? So we are in a world where we see a lot of our larger Fortune 100 companies move from experimenting with large, uh, frontier models that are API calls, to then fine-tuning smaller open models and bringing them close to the compute. So I think the Z series works incredibly well in this space. And we also have the brand permission with Z. They're like, what, Hillery, what,
90% of all credit card transactions happen on Z, and 90% of the Fortune 50 banks rely on us, and whatnot. Airlines, retailers. So you're in the mission-critical workflows.
This is no longer, hey, let me ask the prompt a different way, right? So you're not experimenting, you are doing this in, in more critical workflows. You know, I love that you went to latency. I think one of the things related as well to that whole question of leaving the system is the data security model, data sovereignty, all those other really hot topics.
And so I think also bringing AI to where that data is and where that mission critical data is, where that valuable and sensitive consumer and personal information is, is a big part of this conversation. I, I think one other thing, in addition, again to latency and then that data protection is also the energy. So we've greatly increased the AI capability and the overall capability of the system, but drop this whole system generation to generation by 17% in the power consumption, and the team has measured that it's about five x more efficient to do that AI in place where the data is. Then, to your point, calling out to some external system.
So these days everybody's running outta power, looking to take out more data center space, all that other kind of stuff, and being able to do AI so efficiently, I think, is a, is a really exciting step forward. And Hillery, just, just about a month back, I was with one of the largest, top three, credit card companies, and we were having this, uh, concern around fraud detection, and said, uh, we can obviously do a lot of LLM work to understand patterns, right? It's not just a spot in time. And even a month back, we struggled to bring models that are LLM models into real-time transactions, 'cause it's just sub-, sub-millisecond and stuff.
And I was just so proud that in the last week, we, this week, we've been able to go after those use cases that we couldn't even, even a few weeks back. Right. So we are coming to a point where clients understand, they've proven it out inside of their enterprises, that we can use LLMs and we've trained them in a particular way, but latency was getting in the way of us doing this work for a lot of our clients. Just a huge kudos to your team for doing this right. I think you bring enough AI and, and to your point, the creativity just explodes.
Every developer in kind of this core of the enterprise space is now going, oh, that's now for me, that's not something for people elsewhere in different environments. It's now insurance claims processing, even medical image assessment. There's all kinds of amazing things going on on that core data.
'cause AI is also for those people and for that data and for that context. That's super exciting. So Hillery, before we move on to the next topic, what, uh, what comes next for you all? Yeah, so the capabilities with Spyre come out in the fourth quarter.
There's a rolling set of announcements on the different software enhancements, and I think the way to think about it is we're making these systems AI through and through. Like I kind of mentioned, you know, starting back even in z/OS 3.1, the last release, there was AI inside, things starting to look in that direction of self-healing or, or sort of automation of management of the efficiency of the system. Uh, what we've stated about z/OS 3.2, which is gonna be coming out, is, is even more integration of that smartness into the core and the heart of how the system operates, and then how operations teams experience it, and going all the way out even into our support staffing.
So if you call IBM for help with something, now we are also using watsonx technology to help those agents who are helping you with your mainframe. So that's a project that we started within our technology lifecycle services organization with our storage products. And we're, you know, we've announced now this week that we're also bringing that to mainframe support. So that whole experience end to end, how the system runs, what you can do on it, what you understand about it, and then how somebody helps support you, is all gonna be AI-enabled.
And I think that end-to-end and full-stack story is, is just really exciting. This is us living what we've been talking about with the power of AI. This is awesome. Yeah. So we'd love to have you back on
the show as things unfold here. I think it's a, like a segment of AI that we haven't talked as much about, but I, I love it personally just 'cause it is like this kind of very high-stakes thing. You really gotta get it right in these domains. And so, um, you know, it's a kind of AI, almost engineering, that you don't really see in a whole lot of other places,
which is really exciting. So I'm gonna move us on to our next topic. Uh, Meta has released Llama 4, a long-awaited release in the open source space.
Um, there's three models that they've talked about, two of them actually released: uh, the Scout model, the Maverick model, and the Behemoth model. Um, and it follows in a pattern that we've seen elsewhere in the open source space, where people are launching both smaller models and bigger models to meet a variety of different applications. Um, Kate, maybe I'll start with you.
I don't know if you had a chance to kind of play with some of the models yet, but curious about your early impressions, your vibe check, uh, on, on how this release went. Yeah. Uh, you know, it's been a busy week, so I haven't had a chance to play with them directly, but I've been reading up on them, uh, certainly, and it's really exciting to see what Meta put out there. Uh, I mean, with the release of their largest model, which is, uh, you know, over 400 billion parameters, I believe, mixture of experts, and a hundred billion parameters, I think, is the Scout. Uh, they're really starting to take on larger and larger tasks and create, you know, some powerful models out in the open source ecosystem. I think with the, uh, announcement of their Behemoth model, which is, you know,
2 trillion, uh, parameters. Uh, I think that's what they said. That's big, right? That's big. That's, that's pretty big, Tim. Um, so, you know, they're, they're talking about, even on earlier trained versions, checkpoints, uh, it's cracking GPT-4.5 on tasks like science.
So they're clearly, you know, putting themselves out there as a frontier model provider. And doing that in the open, I think, is only gonna continue to put more pressure on these closed labs to release some of their work out in the open as well, and more broadly help the community. So that, that's really interesting.
Um, I think there is a lot to be said about the, uh, mixture of experts architecture that's going on. Uh, where we see, you know, obviously DeepSeek made this famous, uh, when they first released, uh, back in December or so, uh, with, not first released, but released a big update to their family. Um, it's an architecture that's been used more broadly even before that. But I'm really hopeful that this release will help get broader community support behind the mixture of experts architecture.
'cause there's just tons of, uh, really interesting things about it. Very training-efficient, um, inference-efficient, particularly if run at a, a low batch size. So you only have to use the experts that, that you need to call at inference time, which, you know, if you're just running, you know, one or two tasks, uh, can be run really efficiently. You start to lose a little bit of that if you have to run these at much larger batch sizes, 'cause you have to load all your experts into memory. So most people don't quite realize that about mixture of experts. But either way, really excited to see just another powerhouse model get released, uh, in this case, two powerhouse models get released out into the open.
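As a rough illustration of that batch-size point, here is a minimal sketch, assuming uniform top-k routing (real MoE routers are learned and not uniform), of how many distinct experts a batch is expected to touch; the expert counts below are arbitrary example numbers, not any particular model's configuration.

```python
# Expected number of distinct experts activated by a batch of tokens,
# assuming each token independently picks top_k experts uniformly at random.
# Real routers are learned and skewed; this is only a ballpark sketch.

def expected_active_experts(num_experts: int, top_k: int, batch_tokens: int) -> float:
    """E[distinct experts hit] under uniform routing: E * (1 - (1 - k/E)^B)."""
    miss_prob = (1 - top_k / num_experts) ** batch_tokens
    return num_experts * (1 - miss_prob)

for batch in (1, 4, 32, 256):
    active = expected_active_experts(num_experts=64, top_k=2, batch_tokens=batch)
    print(f"batch={batch:>4}: ~{active:.1f} of 64 experts expected to be active")
# At batch 1 you only touch 2 experts; by a few hundred tokens in flight,
# you effectively need all 64 experts resident in memory.
```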
Yeah, for sure. And if you can go into that a little bit more for some of our listeners. I mean, it's the namesake of the show, so I have to kind of fight for it, but it's like, has mixture of experts been a little bit uncool as of late? Like, I guess, it sounds like what you're implying is sort of like these models might like make it like a focus of the community again in a way that it hasn't been in the past. And I'm, I'm kind of curious about how that, how that's developed. Well, I mean, even just with the Z system, right? We're talking about the focus on inference efficiency, running things quickly at inference time, and a lot of what requires that, or what enables that, is the community building open source software and platforms to be able to host and run these models as quickly and fast as possible. And just because the most popular open source models to date, including prior generations of Llama, have been dense
architecture models, a lot of the existing support for hosting and running these models, running them locally, hosting them and running them yourselves on platforms like vLLM, is, you know, predominantly based on some of those more popular dense architectures. So there is going to need to be kind of a, a groundswell movement of the community continuing to build out support. I think we've seen a lot of that already with the release of Llama 4, and I'm just excited to get more open source developers interested in mixture of experts as an architecture as a whole and continue to build out tooling and, you know, ways that we can work with these models more broadly.
Shobhit, maybe I'll bring you in here a little bit. You know, I think that there's kind of a way this discussion often goes, which I think is like less interesting, where it's basically like, okay, Meta did this release, now, like, who's ahead, you know, in this race? But like, I think that's often like the wrong way to think about it, particularly as the space gets more and more complex. Yeah. Like, how should we read into this sort of launch about what Meta's strategy is and how it's trying to kind of like fill a, a niche in the market, right? Because I think rather than thinking about like, oh, DeepSeek is ahead, or Meta is ahead, I think we should just kinda ask the question of just like, how are the strategies sort of evolving? Absolutely. Yeah.
I'm curious if you have some thoughts, like, what you read into this launch, basically. So let's just start by, by acknowledging what a consequential, uh, impact Llama has had on the industry. The Llama models, as of like the 18th of March, have been downloaded a billion times. Sure. Let, let's just let that sink in. A billion. That's a lot of times we've downloaded a model and made different adapted versions of it and things of that nature.
Right. So a lot of enterprises that we work with, they are, we are very focused on how do I adapt a model to our enterprise-specific domain, our data, and the way we want the models to behave. Right. That adaptation comes only when you're really, really open. There are certain, uh, frontier models that can be adapted with fine-tuning, but then you're leaving, you're sending your proprietary data to the cloud.
That's a no-go. So usually open models, open-weight models, are fine, uh, in that space, where you can go and tune them. So our own Granite models, and some models from Mistral and DeepSeek and others, are also open-weight, open models. But it takes quite a bit to create a good mechanism to assess the quality of an output. So for a lot of our clients, we have to go and build end-to-end LLM benchmarking mechanisms.
How do you evaluate the output on your specific documents? So the benchmark results that are public, those are a good starting point to get you a directional check to say, yeah, it's worth looking at, 'cause Llama 4 did X better. But none of my clients jump up and down saying that, oh my God, this is like 0.2 points higher than the other one. Right? People have other criteria that we use to judge which LLM, uh, we should be leveraging.
It starts with IP: who can own the IP on that model. It starts with data gravity: the AI model follows the data gravity.
It's actually commitments that you've made to specific cloud vendors, right? There are things around, can I adapt this to my own, uh, to my own environment? And then return on investment, the overall ROI of running these models. So you'll see a trend where, every six months, the next size smaller model gets smart enough to outcompete the previous one from six months back. So we're seeing this constant trend where we're getting really good performance-to-cost ratios, right? I think that's the sweet spot, and Llama has done a really good job. I would anticipate that we'll continue this trajectory of a billion downloads, and we'll have different adapted versions of Llama available for our enterprises.
That's the right frame to look at it, versus, oh my God, this just crushed the numbers on this particular task. Then there is, uh, then there are other models that will constantly innovate with new methodologies. I think DeepSeek did a phenomenal job with, with some of their papers. Our Granite models, we have some really nice tricks up our sleeves in our own models, and we give back to the community too.
So I'm just super pumped about the community coming together, open source getting to a point you can adapt it to the enterprise, and very, very focused on intelligence, uh, divided by the price, that kind of a metric. Hillery, maybe I'll bring you in, um, you know, just to talk a little bit about this Behemoth model. I know it wasn't released, but it is like shockingly large. Um, and, and it's cool on one level, you're like, wow, okay. It's like really, it's really big.
I'm kind of curious though, like, from your point of view, you know, the degree to which these are actually kind of like practical models that a lot of people will use in the wild, 'cause it sort of feels like the kind of infra you need to pull off, like, really actually serving and using a model of that scale, like, there's part of me that's like, is this just kind of more of a marketing thing than it is actually like a practical reality? But curious about your take on, on this, is like, is there room for open source on the like mega, mega, mega scale model? Just because it kind of almost like limits like the set of people who would actually practically end up using it.
Yeah. I guess, a, I have a lot of similar thoughts to what Shobhit just shared. Um, a couple of things, right? I mean, within IBM Infrastructure, we're also handling creating the cloud infrastructure for watsonx and deployment of all these infra services and stuff like that. So the other part of my brain is, is looking at how do we bring, you know, more and more powerful accelerators of all kinds into that cloud environment to do whatever it is that watsonx needs to do, right? So if our customers are gonna need those really big models, I'm not gonna be the one that says no, we won't provide the infrastructure for it. Right? So we're advancing with NVIDIA and Intel and AMD and putting, you know, new and more GPUs out there to enable people to play around with models as large as they feel like are gonna be useful.
I think on the practical side, though, we see a lot of experimentation or attempts to use these things, maybe from a teaching perspective. Um, but then when it comes to scaling out deployments, almost all of our customers then start to engage with us on, how can I customize smaller things, right? So I feel like you sort of have to know where things are at on the large side and what it might do for you. You may use that to inform yourself on, you know, what the solution might look like, or, uh, maybe create, um, you know, additional tuning data or something like that, you know, to get that characteristic that you need out of something that's then gonna be affordable to scale. So I continue, like Shobhit, to see most of our customers saying, hey. Um, you know, we work largely in kind of the B2B space.
As, as, as IBM, we're working with other large enterprises who have millions to hundreds of millions of clients. And when you're wanting to engage with all of them and run at business scale of billions and hundreds of millions of things and people, um, the affordability very quickly kind of kicks in, and people, you know, start looking at customization of smaller things for real scale-out of, of deployments. Well, and if I can make a prediction based off of what you just said, Hillery.
Um, and, and kind of speaking to, Shobhit, what you mentioned about, you know, small LLMs increasingly being able to do more things. You know, I, my prediction is that most of the models for, uh, Llama 4 that were released, they're very, even the smallest one is quite big, you know, a hundred billion parameters. I think they're going to be used most by the community to fine-tune some of the older, smaller Llama 3 models. So if we look at what can run on a laptop, what you can easily train and customize, you're really talking, you know, like, uh, one to 10 billion parameters in size, and, you know, maybe a dense architecture,
'cause there's a lot of tuning support for that kind of capability, uh, model already created. So I think that some of the most immediate uses of these biggest models are going to be to continue on that trend of how do we get those smaller models even more performant, uh, by using those bigger models to be able to teach, to be able to generate data, to be able to help augment existing enterprise data and create more of it, and then bring that and pack that down into smaller models like the older generations of Llama, our generation of Granite, um, all playing in that, you know, single-digit-billion parameter size frame. I, I, I totally agree, Kate. And I think one other, you know, little factoid, I'm sure you guys have talked about this before, but it's estimated that only about 1% of enterprise data, or 1% of the things an enterprise needs a model to use, are contained in publicly available models, right? So as you think about that, it has to be that, um, an enterprise is gonna be customizing something.
And then the question is, what is that something? And is that something affordable enough then to scale? Yeah. And, uh, the size, uh, both the size of the model, but also the context window side, right? A 10-million-token context window. What a world we live in, right? I can just dump a bunch of data into it and, and talk against it. But it takes a lot to host these models. So a lot of us use, uh, different vendors who are offering inference infrastructure for the same exact model, and it is complex to host this and get it right.
Each vendor is offering different kinds of context windows, 'cause not everybody can pull off a 10-million-token context window, the infrastructure, the way you fine-tune it, so on and so forth, right? Even companies that do third-party analysis, uh, like Artificial Analysis and stuff like that, it took them a few turns to get the models, to be provided the inference infrastructure just right, to be able to match what Llama had claimed to, to be the, the results in their papers and stuff like that.
So it takes a few rounds to get this done, and I believe that this speaks to the complexity of some of these larger models, how much difference you see from the same prompt being sent to three different or seven different vendors who are hosting this model, you get slightly different responses, and you see quite a bit of a difference between them. So I think we'll get to a point where derivatives of Llama 4, uh, the synthetic data that's created out of Llama 4, and some of the new techniques that they released, will make their way into smaller models. And those are the ones that'll scale, uh, across, uh, different companies. But I'm generally very, very excited about these, these big releases that model companies are doing.
They're still sticking to their open-weight models, there's still the restrictions that come with a Meta license that's not quite Apache or MIT, but overall our clients have, have, have loved the fact that we can now outcompete each other in the AI space, and all clients win when you have great AI labs working on this together. I'm gonna move us on to our next topic, which is Google Cloud Next. Uh, Shobhit, you're actually dialing in, uh, straight from Vegas, so I'll kick it over to you. Um, you've been there all week. Uh, what are the big things that we should know about coming out of this, uh, this show? It's, it's lovely to be with developers and just people who are hacking through, and clients who are actually using it.
Uh, 500 customer logos on screen. That's where Google Cloud is today. Like, that's such a great testament to where they were two, three years back, and they've done quite a bit to make sure that they're serving the enterprises, and they have more and more data. Cloud is growing, profitable, things of that nature. When you start to look at, uh, how they're bringing AI across the entire platform, how they are
exposing some of their internal strengths. So as a, as a great example, they have amazing TPUs to train their own models for their own use cases like YouTube, so Gemini across mobile apps and whatnot, right? So they're, they're bringing that TPU out to enterprises, and they're constantly innovating on that. So the latest release, Ironwood, amazing progress they've made on their own chips.
Then there's a lot of stuff that Google does internally to support their billions of users. So things like their own wide area network of, of fiber. It's millions of miles of fiber that they've now exposed to, uh, enterprise, uh, users and stuff. So they seem to be making a very concerted, uh, effort in making sure that their secret sauce is now available for enterprises to use as well. Uh, overall, they, uh, they spent a lot of time on media creation, uh, versus, uh, use cases like coding or data and things of that nature. On the media creation,
clearly they're the only cloud that can do this end to end across all these different modalities, creating content. Uh, I was privileged to be part of the Sphere experience on Day Zero, where they showed us The Wizard of Oz and what they're doing to do this on such a mega scale. Right? It is just, it's, it's a great experience to see AI leveraging the, the best techniques to go create such an immersive experience at this big Sphere, uh, scale. So a lot in the media space, but not a lot of our enterprise clients jump up and down on the media topic. There's marketing, great, there's some media creation, but the bigger focus for enterprises is, what do I do with the call center? What do I do in my code development processes? My data is, is messy, and things of that nature.
So they made quite a few, uh, announcements in this space. They have been, for the last few weeks, announcing newer and newer models. It's just amazing to see how, 10 days before your annual event, you're releasing your Gemini 2.5, right? It used to be that people hold onto these big announcements, but in this AI race, you can't wait for 10 days. You need to get Gemini 2.5
out before Llama 4 comes in. So it's, it's good to see that progress is, is, uh, going really fast. On the performance, the intelligence per dollar, Gemini Flash has been doing really, really well.
Then you talk about their Gemini 2.5 Pro model: across the board, on the benchmarks and on all the different things that matter, including Humanity's Last Exam, it is absolutely number one. So a huge focus on that.
Uh, just shifting a little bit more towards the agents space. Uh, we had MCP from Anthropic, which allows an LLM, in a structured way, with a, with a standard protocol, to access backend systems and stuff like that. To complement that, Google has created its own agent-to-agent protocol, which allows one agent to talk to the other agent, not as a tool, but as a, as an equal citizen. It's a peer. So both of them can peer and they can talk to each other and say, hey, I found this error.
What do you want me to do? Do you want me to do this, or maybe go talk to a human if needed? And this is asynchronous. It takes a while, it can handle long-running tasks, and they can talk to each other back and forth. I'm generally very pumped when we get to a point where people start
coalescing around specific standards. Uh, Google had a lot of different partners, 50-plus, already working on, on, uh, agent-to-agent. Within IBM Consulting, we obviously have a really good agentic workflow, we have our own IBM Consulting Advantage, we already have MCP integrated into it, and now we are working on agent-to-agent within that space as well.
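As a toy illustration of that peer-to-peer, asynchronous pattern (and only that; this is not the actual Agent2Agent or MCP wire format, and all message fields here are invented), two cooperating agents might exchange task, error, and guidance messages like this:

```python
# Toy sketch of two peer agents exchanging asynchronous task messages.
# NOT the A2A or MCP protocol; message shapes are invented for illustration.
import asyncio

async def worker_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    task = await inbox.get()
    await asyncio.sleep(0.1)  # stand-in for a long-running job
    if task["payload"] == "corrupt-record":
        # The worker can't resolve this alone, so it asks its peer what to do.
        await outbox.put({"type": "error", "detail": "found a corrupt record, how should I proceed?"})
        reply = await inbox.get()
        print("worker received guidance:", reply["detail"])
    await outbox.put({"type": "done", "detail": f"finished {task['payload']}"})

async def coordinator_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    await outbox.put({"type": "task", "payload": "corrupt-record"})
    while True:
        msg = await inbox.get()
        if msg["type"] == "error":
            # A peer, not a tool: the coordinator can answer, or hand off to a human.
            await outbox.put({"type": "guidance", "detail": "skip it and flag it for a human reviewer"})
        elif msg["type"] == "done":
            print("coordinator received:", msg["detail"])
            return

async def main() -> None:
    to_worker, to_coordinator = asyncio.Queue(), asyncio.Queue()
    await asyncio.gather(
        worker_agent(to_worker, to_coordinator),
        coordinator_agent(to_coordinator, to_worker),
    )

asyncio.run(main())
```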
So we are getting really, really excited about, uh, making sure that this is a very open ecosystem and you're working sideways. Uh, those were my highlights from the Google event. Just very pumped about the clients talking about the specifics of how they did it. It's not just a 30-second video, but a whole half-an-hour session:
let's deep dive, here are the challenges, here's our journey of which models we used, and so on and so forth. So it's very good to work with the product teams and the customers in these events. That's great. Yeah. So, uh, I guess, Hillery, an avalanche of announcements here from a number of different directions. Um, I'm curious, I, I think as you kind of like look at Google Cloud and what they're announcing, trends, thoughts, hot takes, uh, from, from this year's, uh, Google Cloud Next.
Yeah. One of the things that caught my eye that Shobhit didn't have on his list, so I can grab onto it and mention it, um, you missed one. Yeah. So, so they also talked about, uh, AI on premises and, and offering those capabilities.
And I think that's also exciting to see, in the sense of, again, it affirms kind of what we've been thinking here, that clients do need to be able to run AI in an air-gapped environment. We keep saying that AI is a platform conversation and that AI and hybrid cloud are two sides of the same coin. And really, that's a statement going back to everything we were talking about at the beginning, that there is data in really important places, and that data needs to be secured. Sometimes it needs to adhere to sovereignty concerns and other things like that.
And so, bringing AI to the data. Um, and the fact that, you know, one of their announcements this week affirms that, is, is something that they also see as important. I think it is just a really good affirmation of what we're also seeing in the enterprise space, that you gotta bring the AI to the data.
AI is a, is a decision about how flexibly you can deploy AI in all those locations that you have data and customers. Um, it's not just a decision about only which model and only which location it runs in. Any final takes? Kate, I dunno if you have any thoughts from, uh, this year's Google Cloud Next, uh, on any and all of this. I mean, it's just like remarkable every time, like, Shobhit comes to a show and is like, here's what's happening.
And it just feels like, it's like this, like, voluminous list that I have trouble parsing, but I know I need Shobhit to decompose it for me. Uh, yeah. All of these main tech conferences going on, it's great.
No, I mean, obviously from, you know, my perspective, I'm most interested in things like the Gemini 2.5 Pro release, which has been really impressive, honestly, getting great, great vibe checks from that model. Um, really exciting to see them really kind of take center stage, uh, and have a, a strong release there.
So, you know, more, more great models out there only, uh, improves what the field can accomplish. So, uh, from that perspective, really excited to see them push the, push the boundaries. Yeah, I think, uh, just one last parting thought.
I think Google is really flexing its, uh, B2C learnings, right? The fact that they can train their models from so much content, and again, I'm not getting into where the content is coming from and, like, indemnification and stuff like that. I'm just purely commenting on the fact that they can train on so much more real-world information from the B2C, uh, space, right? There's nobody else who has access to so much B2C data, right? So the video generation, for example, the videos that they are creating are very, very cinematic, and it seems like they have really gone out and looked at all of the YouTube videos from really good creators and stuff like that.
So the quality is, is, is really good, and it's translating into the voice experience. And this is becoming more and more critical for clients, and for vendors, to get voice right. And I think they have an unfair advantage in this space, where they can go and provide some very nice audio experiences as you're, as you're thinking through.
So one small example was, if I have some Google Docs and stuff like that, I can, I can ask an agent to, say, create, uh, a particular workflow, do some research, and then create a very long research paper. So now it's created a three-page paper on a particular topic, on why your margins are dropping though your revenues are going up, and it'll do competitive analysis and all this stuff.
Create a three-page paper, I can click a button and create an audio, uh, a podcast out of it. Right? And this, like, corporate enterprise stuff that's so difficult to consume, and now you're plugging in a really nice audio, uh, layer on top of it, and I can listen to it on my drive to work, right? I think the fact that they have an unfair advantage on the audio and the experience side,
that starts to give them some advantages on the enterprise side as well, that some of the other, uh, peers of theirs don't have. With these podcasts going up on YouTube, maybe, Kate, you'll get the digital twin of Shobhit that you've been wishing for. Exactly. As long as he gets some royalties from it. Yeah, that's right.
Exactly. There's ad dollars there. So, um, yeah, I mean, I think the future of, like, educational entertainment here is really funny and interesting to think about: like, convert all my emails of the day into a Netflix series I can just watch when I get home. You know? I think we will start to enter this, like, very strange world.
But here's the kicker, man, and I'll, I'll absolutely close on this. I wanna live in a world where I can insert myself, Shobhit, inside of a movie scene that I'm seeing, right? If Iron Man comes to a bar and orders a drink, I wanna be the bartender, right? If, if you have, like, if you have all the celebrities on screen, I wanna be part of that. I could be the driver. Like, I want to immerse myself as part of the video, and this was not possible till today. So if you look at how far we have come with the video creation, I think we're at a point we'll have super-personalized movies where they'll be cracking jokes that I make on a daily basis too.
I'm gonna move us on to our final topic of the day. Uh, I'd be remiss not to mention this, even though we just have a few minutes on the episode today. Um, I really encourage you, if you're listening to the show, to check out this super interesting report that came outta Pew Research. Essentially it's a, uh, survey of American perceptions around AI and how people use AI in their everyday lives. Um, and I think we only have enough time to kind of do a few kind of hot takes here, but I think one sort of really interesting takeaway from this report was the degree to which sort of experts in AI have views about AI that are really, really divergent from, you know, people who are just kind of like using or experiencing AI in their, like, everyday lives, or even just having heard about it and never used the technology at all.
Um, and I think maybe, Kate, I'll kick it to you. I think, like, one of the really interesting results was, you know, all these data, all, all these kind of data points about experts saying, uh, that, you know, uh, jobs won't be impacted by AI, but people really feeling like jobs will be impacted by AI.
Um, experts generally being a lot more positive on the technology than the general public is. Do you feel like this kind of impacts the kind of prospects for AI going forwards? Um, just kind of curious about your quick take in the minutes that we have. Yeah. You know, I think there's a lot of interesting things from the, the Pew report. Definitely not enough to get fully into right now, but I think it speaks to the optimism of the researchers involved, which is great, because we need people optimistic about the impact of technology and science on the world to be the ones inventing and trying to push it forward.
But I also think it speaks a bit, from what I saw, to some of the representation in technology, in that we still have work to do to get better representation, so the people building this technology more reflect the world. So if you also look at, they broke down, you know, men versus women's perception of the technology and how it will impact things, and similarly, men matched the AI experts. And it will be no surprise to anyone that most of the AI experts and the research field, it's still predominantly, you know, uh, the work is being done by men. So I think there's also, you know, it reflects just some of the needed diversity and different opinions and broader perspectives that we still have room to grow and bring into AI research as a discipline as a whole.
I think it's a great note to end on, and I think, um, hopefully it was a good sell for you all to go check out the report. I think there's a lot of data there, and I think it's worth really parsing through, and I, I agree with you. I think it really points out the need for, for greater efforts on diversity in the space.
As per usual, and I say this every episode now, I feel it's like almost a tradition, like saying "agent," uh, in every single episode, but we have had more things to cover than we have had time to cover today. Uh, but uh, Shobhit, Kate, Hillery, thanks for ably guiding us through, uh, for our 50th episode. And, uh, thanks for joining us.
Uh, if you enjoyed what you heard, uh, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we will see you next week on Mixture of Experts.