2024 Rewind: Breakthroughs in AI models, agents, hardware and products

All right, looking back at 2024, what was the best model of the year? For me, it's going to be Gemini Flash. And I'm going to nominate a sequence, I think, which is the sequence of the Llama models. So is the bubble finally going to burst on agents in 2025? Agents are the world. Agents are everything.

And in 2025, we're going to have super agents. In 2025, is NVIDIA still going to be king? Not only is NVIDIA here, but we also see new entrants, the other players in the market. Are we going to end up having openness and safety? You can do this out in the open. It does not need to be behind a black curtain, so to speak.

All that and more on today's Mixture of Experts. I am Tim Hwang, and welcome to Mixture of Experts. Each week, MoE is dedicated to bringing the gold standard banter you need to make sense of the ever-evolving landscape of artificial intelligence.

Today, we're looking back at the huge evolutions across 2024. Just to take you back, in January of 2024, we were all chattering about the release of the GPT Store, Claude 2.1's long context window, and I think at that point, we were still waiting for the release of Llama 3. 2024 was an incredible, obviously dynamic year in AI, and so what we've done is we've gathered a bunch of our best panelists to talk about what stood out to them, what didn't go as well, and maybe what they think will happen in 2025. We're going to talk about agents, hardware, and product releases from the whole year. But first, we're going to start with what happened in the world of AI models in 2024.

And to help us unpack the journey we've been on, we have with us Marina Danilevsky, who's a senior research scientist, and Shobhit Varshney, senior partner consulting on AI for US, Canada, and Latin America. And so I want to actually start with maybe a more recent story, even before we zoom back to the, you know, dark ages of January 2024, which is the release of o1. Obviously this was a big announcement, one of the biggest announcements of the year. And I know, Shobhit, before the show, you and I were talking, and you wanted to get in and point out that the release of o1 actually marks a pretty big change in how these companies are thinking about building and scaling these models. And maybe we'll just start there if you want to jump in.

Excellent. It's such a great time to be alive. With what we see all around us, there's no other year in your entire career that you would rather be alive than today. In the last year or so, we saw the era of scaling laws. We got to a point where we realized that adding more compute and building larger models got us incredible, incredible performance from these models, right? So we got to a point where we have insanely large models, now Llama at 405 billion parameters, 1.75 from GPT-4.

You can see this huge set of big models that are doing amazing work. Now we are transitioning to a couple of different shifts in the market. One, we are seeing more of the shift moving towards the inference phase of it.

Slow down, think about what you want me to do, think through a plan, and come up with an answer. We also started to give these models more tools that they could use, just like we learned to use tools as we grow up. So we have these agentic flows that are helping us increase the intelligence as well. We also saw a big shift in the overall cost. The cost of these proprietary models plummeted in the last year or so. But then smaller models got more and more efficient and started to perform much, much better.

So we've seen this shift towards insanely large models that can think a lot more. We saw ourselves run out of all the public internet data, and now we're focusing a lot more on high-quality enterprise data or data that's built for specific models. So we're now getting to a point where you have a teacher model that's insanely large and really thinks through the whole problem well, that can create synthetic data, that can help train a smaller model, and that can distill a model that delivers high performance at a lower price point. So we've shifted quite a bit in how we think about AI models and how we have been investing in building them.

2025 and beyond is going to be a completely different ballgame in what we see with what AI models would do. Marina, what are your thoughts? Yeah, I think you're right. It's been a really interesting year in terms of where we started, where we've ended up. We've seen that, yes, we can go bigger and bigger and bigger.

And now we're finally there. We can say, great, now that we can go so far, how well can we keep going smaller? So after that initial research push of how big can we go, we've finally given ourselves the luxury of, all right, now it's time for efficiency. Now it's time for cutting costs.

Now it's maybe eventually time to talk about environmental aspects and things of that nature. Maybe next year. Is that a prediction for 2025 or? 2025.

Um, So that, that part is very interesting. It also means that the quality has gotten to where we can start to, uh, build enterprise grade solutions reliably. And I'm, I'm excited for that. I know we're not talking about next year yet, but that's the thing that I'm really excited for. The quality is there, I think finally.

And we can start getting real serious about enterprise solutions. Yeah, I mean, I think that seemed like a really big trend this year. Certainly as someone who does software engineering in their free time, kind of as a hobby, this is the year where I was like, wow, I am finally able to do stuff with these coding assistants that I would not otherwise be able to do. It's finally fit for purpose for me to use on a day-to-day basis.

And I think that was a very big jump that we noticed in the last 12 months. I guess, Marina, are there particular stories that stand out to you from, I don't know, earlier in the spring or otherwise, where you're like, oh, when I look back on 2024, I'll really remember it for X? I mean, first of all, I'll remember it for just the very, very high levels of competition. It felt like every two weeks somebody was coming out with something, and companies that you maybe wouldn't even expect, like, very recently, Amazon, where it's like, oh, they're working on that?

Oh, that's actually pretty good. So I think I'll remember it for a lot of people trying to really one-up each other in a good way, in a way that actually really pushes the thing forward. But I think that the number of players that we have this year is what's really going to make it stand out for me. You know, as we talked about in previous episodes, some of the debuts were more successful, some were less successful. Sometimes people didn't quite, you know, double check everything.

Maybe sometimes people thought that the demos were a little bit overcooked. And so I think that's the thing that'll make me really remember the year: the different ways of how you join in the competition and introduce your flavor. Shobhit, how about you? I think from an enterprise perspective, this is an amazing year. We recently ran a survey for our AI report, and about 15 percent of our clients globally got real tangible value by applying generative AI. There's a lot of knowledge that was locked in documents and processes, things of that nature.

And we saw meaningful movement in how clients are focusing on a few small, complex workflows and delivering exceptional value out of them. I think we did not get enough value out of the generic copilots or assistants. That has shifted more towards, hey, this really has to be grounded in my data and my knowledge and things of that nature. But overall, the last two weeks that we just went through, I think that was the most action we've ever seen in the last two, three years of AI.

The competition between OpenAI and Google, and then Meta jumping in, that has been a phenomenal, phenomenal movement in the community. And now we're starting to see us move towards, hey, we have exceptional models, how do we start to control them a little bit more, adapt them to our enterprise workflows and our data sets, and have them think and reason with tools and things of that nature? The big movements around o1, I think it's going to go down in history as a big, big point in time, when we started to realize that 200 a month is actually great value. You start to get to a point where, if you're spending 200 bucks a month, you're really being very focused on which workflows truly can see an uplift, and applying AI to them. Now you're at a point where you're really paying somebody to do that, to augment every aspect of your daily life.

I think we have great, great momentum to start 2025. Yeah, for sure. And I guess, I don't know if folks have nominees on superlatives for this, like, is o1 the release of the year from a model standpoint, or were there other ones that stand out? I mean, I guess we also had Llama 3 this year, right? It was also a huge, huge announcement. For me, it's going to be Gemini Flash.

I think what they've just done with a small model that does multimodal, that's going to drive the next two, three years of computing. And the reason I say that is everything that you can now unlock. If you guys followed the Android XR announcements recently, you're now at a point where multimodal models were inherently insanely large. They needed a lot of compute, always happened on the servers. Now with models like Google Flash, you're getting to a point where a small model can do multimodal really, really well.

And the thing that will blow you away is how it starts to remember things that you've just seen, right? I think it's going to start augmenting all parts of our, uh, of our day to day workflows, including memory. That's something that we have not seen so far. Uh, we used to generally ask questions in a very cold start.

Now we'll get to a point where these models will have infinite memory and can access tools like we do. I'm very excited about high performance at a really small size. So we can then eventually get to this compute infrastructure where you can have XR and AR experiences and you can bring compute closer and closer to the devices. That will drive a lot more privacy as well, because then the data is locked into those devices that I'm carrying with me versus somebody else's cloud. Yeah, I want to agree with that, actually, the small models thing, because I think that we're going to start, at least in the next year or two, seeing a lot more formal regulation going on and a lot more people waking up to what it really means. As you're saying, Shobhit, if the models are starting to remember, starting to be personalized, starting to be customized, that's going to become extremely, extremely relevant.

So having something small, local that you can actually have that guarantee technologically. That's going to become very, very important. I agree with you.

Yeah, for sure. And how about you, Marina? I know Shobhit was saying o1 was huge. Do you have a, you know, best model of the year kind of nomination?

That's a hard one. I like seeing them in a holistic way. And I feel like it's hard to tell at the moment when something is actually going to, you know, be a turning point. I'm going to nominate a sequence, I think, which is the sequence of the Llama models, not the Llama models themselves, but the sequence: we're going to have Llama 3, so we've seen what we can do with pre-training, and then we're going to see what we can do with post-training.

So we're going to get bigger, bigger, bigger, bigger, and then we're going to see how far down we can go. I'd like to see that as a consistent sequence that people try: push the pre-training, push the post-training, push the size, and do that iteratively, iteratively, iteratively. I'd like to see that continue to be a thing. Yeah, I feel like that's how we know you're a connoisseur, Marina: you like the curation of Llama.

It's not just that any given model is the best model. Marina, I think we'll get to a point where the big research labs are going to build even bigger, bigger models. But they may not release them to the public as a model. And we'll use them more for creating synthetic data, for distilling, for teaching as a teacher model, and so forth. But I'm really excited that we're finally coming to a point where we've poked at this for a while, and we said, oh, if I just ask this model to think before it answers... Well, this is what elementary school teachers teach kids, right? And now we're trying to relearn how we teach young kids: try different things out, create a plan, answer the question, go pick up a calculator if you really need to.

And don't try to do this in your head, things of that nature. I have little kids, and I've spoken about that quite a bit, and I feel that there are so many similarities between how we are training these models and how we do reinforcement learning with our kids, giving them rewards and putting mechanisms in place. We break problems into smaller chunks, they go solve each one of them separately, and there's a whole positive reinforcement around them when they get things right.

I think we're getting to a point where we're getting to learn how these models learn, and that becomes a good symbiotic relationship. I think we will stop asking these models to do things that humans do really well, and we'll have a better mutual appreciation of which things should be delegated down to these models. And that also means that benchmarks and how we evaluate these models are going to change quite a bit.

But I think today we're starting to get to know these models really well. And in 2025 and '26 we'll have a very different relationship with these models, with them becoming more of a companion, versus trying to figure out, hey, can you do this as well as I do? Yeah, absolutely. Yeah, I think one of the funniest outcomes of this year has been all the examples of, like, could you just try harder? And then the model actually just does better, which is very funny. I mean, computers did not used to do that.

So I think maybe a final question, and then we can wrap up this segment: we haven't talked so much about multimodality, but it really seems poised to become a really big deal in 2025. I'm curious, I guess maybe Marina, I'll start with you, if you've got predictions for what's coming up in the next year for multimodal. Yeah, multimodal, that's something where we had those thoughts when foundation models first came on, because we were all very excited about the fact of, oh, well, it's just tokens in order.

It doesn't have to be text. It can be anything. But then I think the reason we all went into text very early, code being part of it, is the amount of training data that we had, the amount of examples that we had. So especially now that we've gotten better with synthetic data and with, like you said, Shobhit, when you were referring to teacher models, we're going to be able to explore that space a lot more. And so I think that they might finally be at the point where, once again, they are useful.

There's huge interest in having the multimodal models because, you know how with the text models, we had the idea that when you have one model doing lots of tasks, the tasks learn from each other? Now it's going to be even more interesting: if you have a multimodal model, does that make it actually also better at each of the individual modalities? Again, I think the data is now finally there, not just the compute, but the data and the ability to create more data. And so I think that, yeah, next year we should see more. I think I was expecting to see maybe a few more models aimed at the sciences this year.

Maybe now again, next year, maybe models that are going to be more successful with video, not just Sora, but something that is maybe a little bit more useful lower down, think, like, with robotics. There's a lot of things to be mined there. So that's, I guess, where I see those. The flashy parts are fun, but the real usefulness is somewhere a little bit lower down, with the hardware. No, I think the multimodal space is going to be amazing the next couple of years.

And I think it is important for it to understand all aspects of what humans are seeing, feeling, looking at, reading, and listening before it comes and helps us. Um, I think it's going to have a huge impact on its understanding of the world around us. So far, we have done things where, hey, I will take a picture of something or I'll translate that into text and ask a question of a chatbot.

That paradigm has not scaled. As the multimodal models get better and smaller, like Gemini 2.0 Flash Experimental, those are the ones that are going to drive richer and richer experiences in our day-to-day lives. And the competition is going to be very, very high.

You will see these models come out from everywhere. The any-to-any models, going from speech to speech directly, those kinds of models are delivering exceptional customer experiences. If you look at traditional ways of doing AI, you would go speech to text. You take that text, you pass it to an AI model. The AI model figures out what to respond with, and you go back from text to speech.

A lot is lost in translation and transcription. Now, when you start going from media to media, you go from voice to voice. It starts to understand the nuances of how humans talk. I'm very excited about the next year of multimodal. Small and then starting the full context. That's awesome.
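The contrast between the cascaded pipeline and a direct speech-to-speech model can be sketched in a few lines. This is a toy illustration only: the `Clip` type, the stub functions, and the "tone" field are all made up for the example and do not correspond to any real speech or model API.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Toy stand-in for an audio clip: the words plus how they were said."""
    words: str
    tone: str  # e.g. "calm" or "frustrated"


def speech_to_text(clip: Clip) -> str:
    # Transcription keeps the words and drops the tone.
    return clip.words


def text_model(prompt: str) -> str:
    # Hypothetical text-only model: it can react only to the words.
    return f"Here is how to {prompt}."


def text_to_speech(reply: str) -> Clip:
    # Synthesis returns speech in a default tone.
    return Clip(words=reply, tone="neutral")


def cascaded(clip: Clip) -> Clip:
    """Speech -> text -> model -> text -> speech."""
    return text_to_speech(text_model(speech_to_text(clip)))


def direct(clip: Clip) -> Clip:
    """Toy speech-to-speech model: it receives the whole clip, tone included."""
    if clip.tone == "frustrated":
        return Clip(words=f"Sorry for the trouble. Here is how to {clip.words}.", tone="soothing")
    return Clip(words=f"Here is how to {clip.words}.", tone="neutral")


calm = Clip("reset my password", "calm")
upset = Clip("reset my password", "frustrated")

print(cascaded(calm) == cascaded(upset))  # the cascade cannot tell the two apart
print(direct(calm) == direct(upset))      # the direct model can
```

Because the transcription step discards everything except the words, the cascaded path returns the same reply to a calm caller and a frustrated one, which is the "lost in transcription" point in miniature.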

And that's all the time we have for today to talk about AI models. Shobhit, Marina, thanks for coming on. Happy holidays, and we'll talk next year about all this and more.

For our next segment, I want to talk about agents in 2024, and to help me do that, I'm going to bring in Chris Hay, distinguished engineer, CTO customer transformation, and Maya Murad, who is the product manager for AI incubation. Maya, Chris, welcome back to the show. Well, so 2024 was the year of the agents, agents, agents, agents. I think it almost became a little bit of an in-joke at MoE that if we had an episode that did not include agents, that was a really big thing and an unusual thing.

And so I guess let's put it this way, and maybe, Chris, we'll throw it to you first: were agents overhyped in 2024 or underhyped in 2024? Underhyped, not hyped enough. Agents are the world, agents are everything, and in 2025, wow, we're going to have super agents. That's what's coming in '25. Okay. And I guess, Maya, looking back, I don't know if you'd agree with Chris, or if there are particular stories in 2024 that really stood out to you in the development of agents, if they're going to be as big as Chris says for 2025. So I definitely agree. 2024, I would say, was a lot of talking about AI agents.

I'm excited to see more execution, and what I expect to see is more quality hurdles once we see more agents being pushed into production. I think we're just scratching the surface of what is needed. A trend that I'm starting to see this year is having more protocols and standardization efforts.

So we saw that Meta is attempting to do that with the Llama Stack, and Anthropic with their Model Context Protocol, MCP. So I think it's going to be this little battle for how we standardize how LLMs interact with the external world, and, I think in the future, how agents interact with each other. And I think this is where the next frontier is and where a lot of our efforts are going to be heading. Yeah, this felt like a big, almost like a preparation year. I was looking at all the news stories and I was like, is the biggest agent story of the year that Salesforce is hiring a lot of sales agents to sell agents? Between that and the technical standards, it's almost few and far between to be like, oh yeah, this was the killer agent release of the year.

And actually, in fact, it was a lot more prep. I don't know, Maya, if you'd agree with that. It felt like it was the year of bracing for what's to come and all the different things we needed to consider, and then who wanted to own that category. So it was really interesting that, for example, Meta went out early. The first iteration of Llama Stack was a little bit rough, but what they were trying to do with it was say, we're in this for the long term.

And we want to help define those agent intercommunication protocols. And I have faith if, if that's a direction that Meta wants to take, I'm sure they're going to do a good job at it. But this is also signaling something interesting.

The last two years, it's been mainly the field reacting to what OpenAI put out. So OpenAI put out their Chat Completions API and the whole ecosystem followed suit. And if you didn't have that exact API, your thing was much more difficult to consume. And now we're seeing a lot more players contend to be the one setting those standards and protocols. Yeah, for sure.

And maybe, I guess, Chris, to turn it back to you, I think you just used the phrase, agents are the world, which is a very bold claim. But in 2025, let's say agents are a lot more popular and become a lot more prominent as a part of the landscape. Is it Meta that's well positioned to win here, or do you have any predictions about who's going to be leading in the space versus maybe a little bit further behind? So I really like what Maya had to say on Anthropic and the Model Context Protocol. I actually think that is going to be one of the biggest enablers for agents next year.

And I think the problem that they've solved really well is allowing remote calling of tools. That's probably the biggest thing that they've solved there, right? So yeah. If we think about the enterprise for a second, you're not going to have agents that are sitting scouring the web, or they're going to be, uh, sitting downloading documents, whatever.

It's going to be access to your enterprise tools. It's going to be things like accessing Slack, accessing your Dropbox or your Box folders or whatever, or your GitHub. And a lot of that is being standardized. But more importantly, you want to take your own data, expose your own APIs, and expose that in a way that agents can consume in a standardized way.

And I think MCP has done a really good job of allowing you to remote call tools and then be able to chain them together with multiple servers. And I think that's going to be a big enabler. Now what's interesting in what they've done there is that it is easy to hook up different LLMs, for example. So it's not tied to the Claude stack there. You can hook up any other model that you want.

And it's all tied in to function calling, which again was a standard that was created by OpenAI in that sense. So, I like what you said there, Maya, about different providers coming in, and coming in as an ecosystem. And I think that's what I'd like to see happen: no one company winning. This ecosystem of providers is going to push everything forward, and we're going to enter this world of the big agent marketplace.
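The idea of calling tools on several servers through one model-agnostic call shape can be sketched roughly as follows. To be clear about assumptions: this is not the MCP SDK or its wire format, and it is not any vendor's function-calling API. `ToolServer`, `ToolRouter`, the tool names, and the sample data are all hypothetical, and the "servers" here are plain in-process objects.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[..., Any]


class ToolServer:
    """Hypothetical stand-in for a remote tool server (not the MCP SDK)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._tools: Dict[str, Tool] = {}

    def register(self, tool_name: str, fn: Tool) -> None:
        self._tools[tool_name] = fn

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call(self, tool_name: str, arguments: Dict[str, Any]) -> Any:
        return self._tools[tool_name](**arguments)


class ToolRouter:
    """Routes a tool call to whichever server owns the named tool."""

    def __init__(self, servers: List[ToolServer]) -> None:
        self._owner: Dict[str, ToolServer] = {}
        for server in servers:
            for tool_name in server.list_tools():
                self._owner[tool_name] = server

    def dispatch(self, call: Dict[str, Any]) -> Any:
        # `call` mimics a function-calling request: a tool name plus JSON-like arguments.
        server = self._owner[call["name"]]
        return server.call(call["name"], call["arguments"])


# Two in-process "servers" with made-up tools and data.
files = ToolServer("files")
files.register("find_docs", lambda query: [d for d in ["q3_report.txt", "roadmap.txt"] if query in d])

chat = ToolServer("chat")
chat.register("post_message", lambda channel, text: f"[{channel}] {text}")

router = ToolRouter([files, chat])

# Chain two calls across servers: the first result feeds the second call.
docs = router.dispatch({"name": "find_docs", "arguments": {"query": "report"}})
message = router.dispatch(
    {"name": "post_message", "arguments": {"channel": "#team", "text": f"Found: {', '.join(docs)}"}}
)
print(message)
```

Nothing in the router depends on which model produced the call dictionary, which is the sense in which a shared call shape lets you swap in any model you want.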

And that's why I say super agents are coming, because it's going to be this really big ecosystem that's going to start to emerge in 2025. And when you say super agent, what do you mean exactly? I just made up the term Tim, so. You heard it here first on MoE. A really good agent.

Is that "super" coming from superintelligence, or is this your definition, or is it in the sense of, like, a Hollywood super agent? Actually, thanks for the save there, Maya, right? I'm going to define a super agent as the combination of the reasoning models, the inference-time compute models that are coming out just now, combined with tool access. So therefore they're more powerful than the agents that you have today.

So there you heard it first. You're right, Tim. That's what a super agent is.

Very nice. Maya, you had a funny phrase when you were giving your reaction to my first question, which is, you know, next year agents are going to be everywhere, but it's also going to be the year we discover where the barriers or the limitations are. Basically, the full force of agents is going to come crashing into reality, and I think we're going to learn a lot. And I guess one question I've been asking a lot of the panelists for this episode is, what's underrated? What are people not thinking about that's likely to be a big hurdle for agents going forward? Number one answer, security. Super underrated.

I think it's already being reported that a lot of the existing players in the space are leaking sensitive data. And I see agents as a way of exacerbating these inherent risks of LLMs. And I think we're underappreciating what it takes to get it right. I think the other thing is how to nail the right human interactions when you have this ability to automate more complex tasks. What are the things that you still need to delegate to the human? How do you need to have a human in the loop? How do you avoid an overtrust issue? My team has done a number of user studies, and when information is presented neatly by an actor that looks and seems intelligent, it's really easy to take everything at surface level for granted.

And I think there's a whole new paradigm of human-computer interaction, or maybe human-agent interaction, that will be unlocked. And I'm really excited for what's to come because I think this is inherently a creative exercise. How do we retain our creativity, retain our ability to do critical thinking, and yet automate certain parts of processes with AI? That will be a really interesting paradigm to get right. Yeah, I think that delegation problem is going to end up being super, super hard. I think, yeah, it's very easy to be dependent on even people who sound smart when they're actually not. It's no different, I guess, for agents as well.

Well, I guess put it this way: it sounds like we're very interested, and the big prediction from the two of you seems to be, you know, agent marketplaces, right? That's going to be maybe the big thing we're going to see next year. I think one of the big questions is also what's going to be the first most popular agent use case, in some ways. You know, you think about the big marketplace.

There's a lot of things that agents could do that may be fun to do, but I think we're almost looking for what's going to be the email of the agent world, right? What's going to be the Slack of the agent world? I'm curious, in both of your experiences talking to customers, whether there are particular things in their hopes and dreams that they really want to see out of agents, and if there's anything recurring there that's worth it for our listeners to know. I think from my perspective, Tim, in that marketplace, I think there are some obvious ones.

Like translation. If I'm truly honest, I don't think language models today have really nailed translation so well. There are some models that do certain languages really well, but then, if you think of the more esoteric languages, for example, the less popular ones, the large models aren't getting that. And then it's going to be specialized models that have been trained in that specific language. So I think that's probably a real opportunity for some of these smaller language models combined with an agent to offer translation services. And again, add that into domain services.

So things like legal, which is something you know very well, Tim, I think that will probably be a big piece of that marketplace, but I'm hoping that it won't just be about these individual agents. I think any piece of information, it could be sports scores, it could be golf scores, it could be information about play, it could be absolutely anything. And one of the things, and this is my next prediction for 2025, is I think we're going to get a shift in the World Wide Web. So today, HTML, et cetera, is the dominant markup language of the internet.

That's not really well designed for LLMs and not well designed for agents. So I wonder if, in order for the agents to exist, it's not just having the marketplaces but having the way to expose that data. We talked about MCP earlier. I wonder if you're going to start to see new types of pages appearing where the content is optimized for consumption by agents, and the resources they expose, as opposed to necessarily humans. So I'm kind of predicting we're going to start to see this shift in the web to a kind of, dare I say, web 4.0 (I'm trying to avoid the term web 3.0), where we have content that is specifically designed for agent consumption.

Yeah, the prediction that's kind of implicit in what both of you are saying seems to be that there'll be so much interest in the promise of agents that we're almost going to be reconstructing the web to make it safe for agents or make it work for agents. And I guess a lot of the stack and a lot of the interoperability stuff that's being built is an attempt to do that in some ways. I don't know, Maya, do you agree with that? You think that's going to be the future, that we'll have a, you know, agent markup language, basically? Uh, A.T.M.L.

I think a lot of the interesting use cases will be unlocked when different agents that were built by different providers and are owned by different organizations are able to interact with each other. And, like, how do you establish a safety protocol? How are you able to do that productively? The promise here is, how do we break out of all these silos of different systems and having to manually architect how each one speaks to the others? And can we get to a universal interaction protocol? This is really an interesting promise.

I don't know if we will fully unlock it next year, but a lot of different actors would like to go in this direction. And there are simple things that we should nail before that. So I know with software engineering tasks, there's a lot of investment going into that space.

I still think no one has nailed it for the average business user. The average business user has to use, I don't know, a dozen different tools on their machine. None of them speaks with the others. Every one has its own onboarding experience. So I see a lot of opportunity to flatten out these complex experiences and make them much more dynamic and integrated. And this is the true promise of this technology. And it's the ultimate dream, I guess.

I mean, because the world you're describing is almost like the agent becomes your entire interface for all these applications. They stay independent, but, yeah, the operating system in the future really is the agent that's doing things on your behalf.

It's natural language. LLMs changed our perception of how we interact with the digital world. We expect everything to be in natural language, or you could do a form and then there's an option to do natural language interaction. And I think that expectation is going to widen.

Yeah, no, I think that makes a ton of sense.

I guess maybe the final turn that we should talk a little bit about is on the engineering and coding side, right? I was thinking this year that the coding assistants have gotten really, really good. But the dream is that you eventually have agents where it's like, I'm envisioning a software code base that looks like this, and it's able to build and operate on all parts of your code base. What do we think are the prospects for that kind of automation and agentic behavior?

I'm going to kick off here, and I'm going to be controversial as always. And here is something for people to think about, which is: programming languages today are designed for human beings, right? Think about things like loops, while loops, for loops, etc. You have however many versions there, and the same with conditionals, if statements, blah, blah, blah.

But you know what? When you get down to the assembly level, none of that exists, right? It's all back to branches and, you know, jump statements, etc. And therefore, in an agentic world, we're getting them to program in a language that is designed for humans. And the big challenge, I would say, that I think is going to come over the next few years is that you're going to have a more agentic-native language. Something that is designed more for LLMs, and therefore with less of the syntactic sugar that you need to satisfy humans. So I think there's going to be an evolution in programming coming.

And you can see it already today, right? The LLMs are already generating, you know, here's another Fibonacci function. I don't need another Fibonacci function in my life, right? We got those. Exactly.

So I then think there'll be the equivalent of NPM or something like that, where you have a big, massive AI library that you can pull the functions you need from. So I think, like your AI operating system, we're going to get AI programming languages and libraries that are going to be a little bit more native, and then that's going to help the development of coding. So I think that's an interesting term. Will it be 2025? Maybe, maybe it's going to be '26, but I think that's where we're going.

With the current technology we have, I'm super impressed with what I've seen with Replit, with the ability to stand up full stack applications. On the project I'm working on with Bee, it's been such an interesting paradigm, like chat to build applications.

I really see the ability to create digital interfaces and code bases being democratized in a way that hasn't been possible before, purely powered by the current technology of agents that we have. I just think there's this last mile problem to nail, and I think next year this is going to blow up in a major way.

Nice. Well, you heard it here first. That's all the time that we have for agents. Uh, that was a lot to cover in a short period of time. Chris, Maya, thanks for coming on the show and we'll see you next year.

I want to move us on to talk about the hardware that powered AI in 2024, and I couldn't have picked a better duo of people to help explain it than the two that I have online with me today. Khaoutar El Maghraoui is a Principal Research Scientist, AI Engineering, AI Hardware Center, and Volkmar Uhlig is Vice President, AI Infrastructure Portfolio Lead. Welcome to the show. Volkmar, maybe I'll turn to you first. So, you know, as we talk about hardware in AI, it's almost become synonymous with saying that we want to talk about NVIDIA. And I'm curious about what you thought the biggest stories were this year from NVIDIA.

I mean, the one that strikes me is the announcement of the upcoming GB200. But I'm curious if there are other things on the radar for you as we think about, you know, what were the big stories in 2024?

NVIDIA made a big splash with the GB200. And I think we are seeing a big shift towards more integrated systems, in particular on the training side. Very large, like rack scale computers now. Liquid cooling is coming. So all the things we've seen over the years: how to cram more compute into a smaller form factor, you know, making it faster, better networks behind it, etc. And I think NVIDIA is really trying to push hard on staying the leader.

And then we are seeing upgrades, which are kind of a reflection of what models now look like. So we have 70 billion parameter models. And, you know, with 70 billion parameters, even if you quantize, it's 70 gigabytes at 8 bit. It's 140 gigabytes at 16 bit. Now, you don't want to have to buy full cards.
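The figures quoted here follow from simple arithmetic: weight memory is roughly the parameter count times the bits per parameter, divided by 8 to get bytes. Here is a minimal Python sketch of that calculation. The function names and the 80 GB card capacity are illustrative assumptions, not something stated in the conversation, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
import math


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


def cards_needed(num_params: float, bits_per_param: int, card_gb: float) -> int:
    """Minimum number of accelerators whose combined memory holds the weights.

    card_gb is a hypothetical per-card memory capacity chosen for illustration.
    """
    return math.ceil(weight_memory_gb(num_params, bits_per_param) / card_gb)


if __name__ == "__main__":
    params = 70e9  # the 70 billion parameter model mentioned above
    for bits in (16, 8):
        gb = weight_memory_gb(params, bits)
        n = cards_needed(params, bits, card_gb=80)  # 80 GB is an assumed capacity
        print(f"{bits}-bit: {gb:.0f} GB of weights, at least {n} card(s) of 80 GB")
```

Under these assumptions, halving the precision halves the footprint, which is why larger per-card memory and quantization both reduce the number of accelerators a deployment needs.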

So we see an increase in memory capacity across the board for all the accelerators. But not only NVIDIA is here; we also see new entrants, or the other players in the market. AMD is announcing a pretty good roadmap of their products, all with very, very large memory capacities and memory bandwidth to address those large language models and fit more model into less space or less compute. And Intel is playing in the market as well.

And then you have a handful of startups, where we also saw, you know, really interesting technologies coming onto the market. So if you look at Cerebras, that's wafer scale AI, which, you know, a year ago they were talking about, and now you can actually use it as a cloud service.

You have Groq being a player, there are other companies coming up, there's d-Matrix, which will have an adapter coming out at the beginning of next year. So I think there's a good set of players in the market. And then there are new entrants, right? We just saw the Broadcom announcement, I think it was pretty much last week, with very large, you know, revenue targets, and the relationship with Apple. And then Qualcomm is also in the game and has a chip architecture coming, you know, some of them are available and there's a good roadmap for them.

So I think the market is not only NVIDIA anymore, which is, I think, good for the industry, and it's moving extremely fast. And we see training systems there, but there's an increasing focus on inferencing, because from my perspective, it's kind of where the money will be made.

Yeah, for sure. And I guess, Khaoutar, I don't know if you want to talk a little bit about that bit. I wanted to make sure that we did talk a little bit about the big trends in inferencing this year, because it feels like that was actually a big theme of how this market is developing. So if you want to speak a little bit to that and where you think things went in 2024.

Yeah, so of course, there's a lot happening, especially around inference engines and optimizing inference engines. A lot of hardware-software co-design is also, you know, playing a key role in that. So we see technologies like vLLM, for example.

We also see things like all the work around KV cache optimizations and batching for inference. So a lot of innovation is happening in open source around building and scaling inferencing, especially focusing on large language models. But a lot of these optimizations we see are not specific only to LLMs; they can also be extended to other models. So there's a lot of development happening in vLLM. There is work, you know, even at IBM Research and others contributing to open source, to bring in a lot of these co-optimizations in terms of scheduling, in terms of batching, in terms of figuring out how best to collocate all of these inference requests and get the hardware to run them efficiently.

Yeah, absolutely. Volkmar, do you want to give us a little bit of a peek into 2025? I mean, it kind of sounds like with this market becoming increasingly crowded, I think everybody's coming after NVIDIA's crown here.

You know, what do you expect to happen in 2025? Does NVIDIA largely still stay in the lead? Or do we end in December 2025 with, you know, the market becoming a lot more divided and diversified than it has been traditionally, particularly on the training side?

So I think the training side will be, that's my prediction, still very strongly in the hands of NVIDIA. I think AMD and Intel will try to break into that market, but I think that will probably be more in the 2026-27 timeframe. The reason why I'm saying this is that the architecture you need to build a really successful training system is not the GPU, it's a system. So you need really good, low latency networking. You have a reliability problem. There's a strong push to actually move compute into the fabric, to further cut down the latency and more efficiently utilize the hardware. And NVIDIA, with their acquisition of Mellanox, effectively bought the number one network vendor for high performance computing, which, you know, training is.

And so there's, you know, a bunch of consortiums coming up. There's Ultra Ethernet, where, you know, they're trying to get to capabilities similar to what you have with InfiniBand. And with InfiniBand, despite it being an open standard, there's pretty much only one vendor on the planet, which is Mellanox, which is now owned by NVIDIA.

So I think NVIDIA has a good, you know, lock on that side of the market, and therefore a lot of the investments where other people are playing are more in the inferencing market, which is much easier to enter, you know, because you intrinsically don't only have NVIDIA systems. Like, you don't have NVIDIA on cell phones, you don't have NVIDIA on the edge. And the software investment you need to make on inferencing is much lower than what you have on the training side. So I think training is in very safe hands for NVIDIA. So unlocked, yeah.

But I think there is now enough, with Gaudi 3 coming online, which has integrated Ethernet, you know, and what AMD is putting on the market. I think there will be a slow creep into that market. And I think, you know, in 2026 we will probably see that there is a major break into that market, and NVIDIA loses that very unique position it has right now.

Yeah. It's going to be a big transition. Khaoutar, do you agree with that for the 2025 prediction?

Yeah, I agree with that. Of course, there's rising competition in AI hardware. Like Volkmar mentioned, companies like AMD, Intel, and startups like Groq and Graphcore are developing competitive hardware. IBM is also developing competitive hardware for training and inference. The problem with the NVIDIA GPUs is also the cost and the power efficiency. The NVIDIA GPUs are very expensive and they're power hungry, making them less attractive, especially for edge AI and cost sensitive deployments.

So the competitors, like AWS Inferentia, IPUs, offer specialized hardware that's often cheaper and more energy efficient for certain applications. And I think, you know, the open standards, for example OpenAI Triton and ONNX and, you know, these new frameworks, are also working a lot on reducing the reliance on NVIDIA's proprietary ecosystem, which makes it really easy for competitors to gain some traction here as well. And if we look at inference specific hardware, there is, you know, the rise of these dedicated inference engines, like I mentioned vLLM before: vLLM, SGLang, Triton. They highlight the potential for non-NVIDIA hardware.

So they're opening up the door for the competition, for easy entry. And they also allow them to excel in inference scenarios, especially for large language models. So we'll see this widespread emergence of edge inference solutions powered by ASICs. And I think this is challenging NVIDIA's role in this rapidly growing edge AI market.

Yeah, and I think the edge is the last bit I wanted to make sure that we touch on before we move on to the next segment. You know, Volkmar, it seems to me that obviously one of the big stories was Apple moving into Apple Intelligence and making sure that all the, you know, devices essentially have AI chips on them. I assume that's going to continue in 2025, but I'm curious, for our listeners that are less involved in watching the hardware space day to day, if there are any trends that you think are worth paying attention to as we get into the next 12 months.

I think the Apple model is very elegant, in particular when you are in a power constrained environment. So, you know, whatever you can do in that power constrained environment with less accuracy, you do on device. And then whenever you need more, you go somewhere else.

I think also the Apple architecture, where they run in the cloud on the same silicon as they run, you know, on their phone, is a very interesting architecture, because it simplifies things for the developer and it simplifies deployment. And so I think that we will see more of that type of separation, and I think we will see more compute happening on edge devices. And as silicon matures, and, you know, there are more choices and you don't need a high powered card anymore, and the silicon gets more and more specialized for that, you know, simple matrix multiply, I think we will see pretty much every chip that leaves a factory effectively containing AI capabilities in one form or another. And then it's really this hybrid architecture of on device and off device processing, which allows you to have, you know, silicon live for a long period of time. Because if you're on an edge, you know, and edge is not only a phone, it could be an industrial device where your life cycle is five to ten years, you don't want to have to go and swap out the chip every two years just because you want to train another network.

And so I think the architecture Apple put out will be more solidified, and we will see, you know, software ecosystems being built around that.

Yeah, that's great. Well, Khaoutar, I'll let you have the last word here.

I've been asking most panelists as they've been coming on, what is the most underrated thing in this particular domain? So for AI hardware, are there things that people are not paying attention to? You know, there's a lot of hype in the AI hardware space. So I'm curious if there are any more subtle trends that you think are important to pay attention to?

Yeah, that's a great question. So I think there is a lot of work around real time compute optimizations. Technologies, for example, like test time compute, which allows AI models to allocate additional computational resources during inference. This is something that we saw with the OpenAI o1 model. I think it really sets some precedent here, and it allows the models to break down these complex problems effectively and also kind of mimic what we do in human reasoning.
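One simple way to picture spending more compute at inference time is majority voting over several sampled answers. The Python sketch below illustrates only that general idea. It is not a description of how o1 works, and `noisy_solver` is a hypothetical stand-in for a single sampled model answer, with an assumed 60% chance of being right.

```python
import random
from collections import Counter


def noisy_solver(question: int, rng: random.Random, p_correct: float = 0.6) -> int:
    """Hypothetical stand-in for one sampled model answer.

    The true answer is defined as question * 2; the solver returns it with
    probability p_correct and a nearby wrong value otherwise.
    """
    truth = question * 2
    if rng.random() < p_correct:
        return truth
    return truth + rng.choice([-2, -1, 1, 2])


def answer_with_budget(question: int, n_samples: int, rng: random.Random) -> int:
    """Spend more inference compute: sample n_samples answers, return the most common."""
    votes = Counter(noisy_solver(question, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of questions answered correctly at a given sampling budget."""
    rng = random.Random(seed)
    hits = sum(answer_with_budget(q, n_samples, rng) == q * 2 for q in range(trials))
    return hits / trials


if __name__ == "__main__":
    for budget in (1, 5, 15):
        print(f"samples per question = {budget:2d}  accuracy = {accuracy(budget):.3f}")
```

In this toy setup, accuracy rises with the sampling budget even though the underlying solver never changes, which is the trade the speaker describes: more work per request at inference time instead of more training.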

And it also has implications for the way we design these models and the way the models interact with the hardware. So it's kind of pushing for more hardware-software co-design in this context, where there's processing during inference. I think another trend I see is hardware accessibility for all. I think we see that with the Llama 3 series, which illustrates how new hardware ecosystems are evolving both for high end research models and for consumer grade applications. So the Llama models, they released, you know, multiple versions, the 400, the 8 and so on. So that's also an important trend that we're seeing. So we can kind of bridge the gap with the high end, the data centers where you have access to this high end compute and infrastructure, which is not accessible to everyone.

So pushing towards that would be really important. The other thing is the open source and enterprise synergy. IBM released Granite 3, which I think is a great step in the right direction, and which also highlights the importance of open source AI and its ability to maximize performance on enterprise hardware. But there are still hardware design challenges. For example, what we see with NVIDIA's Blackwell GPUs and the issues that they have around thermal management and server architectures.

So this hardware, you know, needs to scale to meet the demands of next gen AI. Power efficiency is becoming critical. So if I were to sum up what's going on around these trends, I think the year 2024 showcased the importance of hardware-software co-design and the industry's pivot towards specialized AI accelerators, open source adoption, and real time compute innovations, which are really very important and are setting the stage for further breakthroughs.

Yeah, that's a great note to end on. Well, that's all the time that we have for hardware. Khaoutar, Volkmar, thanks for joining us, and for all your help in 2024 explaining the world of hardware, and we'll have to have you back on in 2025.

Finally, to round out our picture of 2024, we need to talk about the product releases that stunned us, amazed us and gave us something to think about. To help me do that are Kate Soule, Director Technical Product Management for Granite, and Kush Varshney, IBM Fellow on AI Governance. Kate, maybe I'll turn it to you first.

Obviously, you know, the schedule was crazy this year in terms of product releases. It felt like every other week there was something. But I guess looking back on the last 12 months, I'm kind of curious, what did you think were the biggest things, right? The stories that we'll look back on from 2024 and be like, yeah, this is the year that, you know, that happened.

As the director for technical product management for Granite, I feel like I have to celebrate what our team at IBM accomplished in launching the Granite 3.0 model family, focused on Apache 2 licensed models that are transparent, with ethical sourcing of the data that went into them, which we share all the details about online in our report. So I'm really excited about being able to continue that commitment to open source AI and being able to create, you know, state of the art language models in the 2 to 8 billion parameter size that we can put out there under permissive terms for our customers and for the open source communities to leverage more broadly. Looking outside of just IBM, you know, I think the release of the GPT 4.0 family of models and product was really exciting. I think it launched a new wave of interest in how we continue to improve performance without just spending more money on our training compute.

So I think that really is ushering in this next wave that we're going to see in 2025, of how we can spend more at inference time, allowing models and products that use these models to have more advanced computations and inference calls that get generated to improve performance, beyond just, let's throw more money at the training, let's throw more data, let's scale, scale, scale. So that's more broadly something I was pretty excited to see.

Yeah, we should definitely talk about both of those themes. I mean, I think on the first one, you know, 2024 was really the attack of the open source. You know, it felt like for a moment there all the closed source models would really be winning the day.

And the explosion of activity in open source has been really, really exciting to see. And then I think the second one as well is kind of the, you know, play smarter, not harder kind of world, where I think there's a bunch of new techniques that we're seeing play out in a lot of places. Maybe, Kush, we'll start with that first theme. You know, in the open source world, of course, this is also the year of Llama 3. There's just been a lot happening in open source land. And I'm curious as you look back, on either of the themes that Kate pointed out here, either on the open source side or in the different methods for doing AI, if there are things that you'd want our listeners to remember from 2024.

Yeah, I mean, I think your phrasing of it, open source returns, or the return of, whatever you want to call it, I think that's the right way to frame it. I think we're realizing, when we talk to customers across the board, that in 2023 it was all about POCs and this sort of thing, getting people excited within their own companies that maybe generative AI has a role to play. But then over time, they realized that actually we need to worry about the copyrighted data, other governance sorts of issues, the cost, just how to make these operational. And I think watsonx, the IBM product, kind of shined with that, and the Granite models obviously as well. So the science experiment that we had in 2023 was being used more this year, and now, going into next year, it's all about being as serious as possible, I would say.

Yeah, for sure. And I think now that you're on for this segment, it's a good time to ask too. You obviously spend a lot of time thinking about AI governance, right? And there were a bunch of stories in that vein this year. I don't know if there are ones that you'd want to call out for 2024.

Yeah, no, I mean, I think just the fact that the whole AI safety world convened, right? I mean, we had this Korea summit, we had the summit in San Francisco in November. And yeah, I mean, this is now the topic. I think it's the thing that we need to overcome, because just having generative AI out there without the safety guardrails and without the governance, it's just dangerous. I think the promise of the return on investment is only a promise until you can overcome the hump of the governance issues.

Yeah, for sure. Do you have any predictions for where we go in 2025 with all that?

You know, I'm detecting a theme here, which is that 2024 almost set up a lot of stuff, and in 2025 we're going to see how it plays out. I mean, both in open source and in governance, it seems like.

Yeah, no, I think the prediction is, I mean, the earlier segment in the show was about agentic AI. So I think that's going to really explode as well. And I think the governance there is going to be what drives the governance back down to other use cases as well, because when you have autonomous agents, then really the governance, the trust, is extremely important. You have, I mean, very little control over what these things might do. The stuff that Kate was mentioning, the extra inference cycles that you're going to see, are going to be, I think, mainly for the purpose of governance. It's to make these things kind of self reflect a little bit, maybe think twice about what answers they're putting out there and so forth.

So you're going to have more tools for governing the agents as well. So the Granite Guardian 3.1 release that just happened actually has a function calling hallucination detector in there. So that's one of the things that agents actually do, right? As part of the LLM, they actually will call some other tools, some other agents, some other function, and if that itself is hallucinated, the parameters, the type of the parameters, the function names, all of these things can kind of go wrong. So we have ways of detecting issues there.

Kush, I'm curious, you said the inference runtime is going to be used more almost for governance and self reflection. But I think you had even shared a paper recently about how it also opens this whole can of worms of other risks and potential security issues, right? When the models are running all these loops offline and people aren't naturally able to observe what's going on in the

Yeah, I mean, I think this whole, I mean, self reflection, you can call it metacognition, you can call it wisdom. I mean, I think these are going to be things that are going to be part of what happens. But yeah, I me

2025-01-02 08:44
