Google’s AI Overviews, Golden Gate Claude, the "whale computer" and scaling laws
Hello and welcome to Mixture of Experts. I am not your host Tim Hwang — we have regrettably let Tim go on vacation this week, so I'm going to be doing my very worst impersonation of him. Thank you all for bearing with us. I am Brian Casey, and I'm thrilled to be joined by three distinguished guests who are going to help us cover the week's news across product announcements and new research.

This week we've got three topics on deck. First, we're following up on a segment from two weeks ago, when we talked about the introduction of Google's AI Overviews. Those have now been out in the wild for two weeks, and the market reaction to them has at times also been wild, so we'll discuss how the market is responding to what, for some folks, is probably their first experience with generative AI. Second, we're going to be talking about a model that turned itself into a bridge — the Golden Gate Bridge, specifically. That's Golden Gate Claude, and we'll get into the implications for interpretability and safety, and how we might eventually find a different sort of bridge between plausibly useful and actually useful when it comes to this work on interpretability. And finally, every week feels like a good week to talk about scaling laws, but with Nvidia earnings, with Microsoft introducing what has become known on the internet as the "whale computer," and with the recent discussion on the web about running out of data for pre-training, now is as good a time as any — and maybe we'll take a slightly different approach than we have in the past.

As usual, we are joined by a distinguished group of researchers, product leaders, and engineers. I'm joined by Kate Soule, program director, generative AI research — welcome to the podcast, Kate. Thanks, Brian. Chris Hay, distinguished engineer and CTO, customer transformation — welcome back, Chris. What's up. And a newbie on the show, Skyler Speakman, senior research scientist — welcome to the show, Skyler. My first time here, I'm looking forward to it.

Thanks, y'all, for being here. We will start with AI Overviews. As I mentioned, two weeks ago Google said they were going to roll these out across the United States, and they did in fact do that. Very predictably, the first thing the internet did was latch on to every example that was funny or troubling — the various hallucinations that were happening — and of course those things went viral across social media. I wrote down some of my favorite examples, which included Google recommending that the correct number of rocks to eat is a small number of rocks; that a pair of headphones weighs $350; that certain toys are great for small kids when they're actually potentially fatal; and finally one that I think is yet another example of the problems here: when asked which race is the strongest, Google said that white men of Nordic and Eastern European descent were in fact the strongest. (I had not heard that one.)

I do want to start by adding a little context, which is that Gemini is a very capable model, and the thing we're not seeing on the internet is all the things that are actually going fine and well. People are, to some extent, cherry-picking examples that are particularly comical or troubling.
One of the things I'm reminded of is that Twitter is not real life. But it does feel like a different level of visibility for this content than when it was hidden behind a chatbot you had to consciously sign up for. And even if LLMs are hallucinating, let's say, 1% of the time — it's more than that, but say it were only 1% — given how much search volume runs through Google, that's still a staggering volume of hallucinations happening every day.

So Chris, maybe I'll start by turning it over to you to get your initial reaction, and maybe comment on what you think is the right way to think about this problem. Is this a nines-of-reliability problem? Do people need to start treating machines more like they treat humans — not trust exactly, but trust-but-verify? Or is the market just cherry-picking examples here, and it's actually going mostly fine and will continue to get better over time?

I think it's a really interesting question, because we've all been doing retrieval-augmented generation (RAG) for a while, but this is really RAG on a global scale. The big issue is that when you're doing the AI Overviews, the system really can't tell the difference between what is truth and what is satirical or made up or just a fun article — and the internet is full of that. If we take the rock example you had there, Brian, that actually came from a satirical article in The Onion, but Google couldn't differentiate. And I think that opens up a whole other thing: it's one thing for The Onion to have a satirical article that you click on knowing it's satire, but when Google takes that, produces an overview, puts it at the top of the page, and says "this is the answer to your question" — is it Google speaking at that point, or is it really just providing a summary of what it found? That's where I think there's a real fundamental difference in what's going on here. This ability to distinguish what is truth, what isn't, and what is really just a fun article — that's the challenge they've got ahead of them. If we look at something like Perplexity, they seem to have solved that problem, so I have no doubt Google will solve it in time, but I think it comes down to being able to distinguish between the results.

I'm glad you brought up the RAG analysis, because I wanted to jump in there. I think there is a difference between referencing incorrect information and a hallucination where the model is generating it, and I'm not quite sure yet, for Google's AI Overviews, how much of it is incorrect references from a RAG system and how much is truly novel — incorrect but novel — generated text. I don't know that we know the inner workings of that yet, but there is a difference between those two types of mistakes in these AI Overviews.

I was going to say, when you do RAG anyway, depending on your settings you're going to have a little bit of creativity in there regardless — so it's really a question of how much they crank that up or down over time.
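Skyler's distinction between "the model referenced something wrong" and "the model made something up" can be probed mechanically, at least crudely. Below is a minimal sketch, in Python, of a naive groundedness check: it flags generated sentences that share little vocabulary with any retrieved passage, a rough proxy for "this claim did not come from the sources." The function names, the overlap heuristic, and the threshold are all illustrative assumptions — this is not how Google's system works, and note that a "grounded" sentence can still be wrong if the source itself is satire, which is exactly Chris's Onion point.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately crude stand-in for a real embedding model."""
    return set(re.findall(r"[a-z']+", text.lower()))

def support_score(sentence: str, passages: list[str]) -> float:
    """Fraction of the sentence's tokens that appear in the best-matching retrieved passage."""
    sent = tokenize(sentence)
    if not sent:
        return 1.0
    return max(len(sent & tokenize(p)) / len(sent) for p in passages)

def flag_ungrounded(answer: str, passages: list[str], threshold: float = 0.5) -> list[str]:
    """Return answer sentences that look unsupported by any retrieved passage."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if support_score(s, passages) < threshold]

# Hypothetical example: the first sentence is grounded (in a satirical source!),
# the second never appears in any retrieved passage.
passages = ["Geologists say you should eat a small number of rocks per day, according to The Onion."]
answer = "You should eat a small number of rocks per day. Rocks are rich in vitamin C."
print(flag_ungrounded(answer, passages))  # -> ['Rocks are rich in vitamin C.']
```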
It's actually interesting you mention that, because there were examples — take the one about the children's toy that was actually a potential safety hazard if swallowed. The funny thing is, there was a thread that went somewhat viral about that, and the first post in the comments was somebody referencing the number one result on Google, which had almost that content verbatim. What was interesting is that when it was Google showing the result versus it just being a link on the internet, the reaction was totally different. When it was Google, it was this massive, crazy problem; when it was just the first result on the internet, people were like, "oh well, it's just content, that happens all the time, and people have to know not to trust that stuff." So people do seem to be approaching this with different expectations than they would normal content.

I think everyone is kind of cued to assume that if they're reading a statement that is presented as though it's a fact — "this is what the facts are" — then some due diligence and reasoning has gone into evaluating it, and that's not quite how these systems work, at least not yet. So I think a degree of skepticism is going to be needed in the near term when looking at these kinds of results — making sure that, as Skyler pointed out, just because something is on the internet and being shared doesn't mean it's a hallucination; it just means it's an example of what's on the internet.

One question I wanted to follow up on touches on some of the stuff we talked about on the show last week, around UX. One of the interesting things is that the place on the page an AI Overview takes up is the space traditionally occupied by the featured snippet, if you live in the search world. Where Google historically sourced that data was one of the top two or three most authoritative, widely cited results on the web, taken verbatim and placed in the snippet. Google is now putting AI Overviews in the exact same place on the page where that content used to be, and it struck me that maybe one of the challenges is that people aren't treating the content as being sourced differently — it's in the same place on the same page, so they think it's the same thing. That started to make me think — and Kate, maybe you can take this one — that we almost have three different types of things: human-generated content, LLM-generated content, and traditional answers, like from a calculator, that you can trust essentially 100%. Do you think we actually need to do more to distinguish the user experience between those, rather than merging it all together and deeply embedding LLMs and AI into everything we do — making it very clear to users where they're seeing features and content that are sourced differently than they have been historically?

Absolutely, and I think it goes beyond consumer use cases. It's super important for regular consumers doing Google searches, but especially when you look at enterprise applications: the theme of being able to cite your sources and decompose a bit of what is going on inside the black box is, I think, increasingly going to be critical for any sort of real adoption — being able to move beyond "okay, this is a fun toy" to "this is something I can actually use day to day." So I really hope we start to make some progress there on some of these more consumer-facing chatbots, because in the enterprise setting that's increasingly the norm: in RAG patterns you want to return "here's the source where I got my answer from," and that's becoming increasingly important.
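As a concrete picture of the "return the source with the answer" pattern Kate describes, here is a minimal sketch of a RAG response object that keeps the provenance of each retrieved passage attached to the generated answer. The `retrieve` and `generate` callables, the prompt wording, and the citation format are all placeholders for whatever retriever and model an application actually uses — the point is the return type, not the specifics.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source_url: str                  # provenance travels with the passage

@dataclass
class CitedAnswer:
    answer: str
    citations: list[Passage]         # the exact passages the answer was conditioned on

def answer_with_citations(question: str, retrieve, generate, k: int = 3) -> CitedAnswer:
    """retrieve(question, k) -> list[Passage]; generate(prompt) -> str. Both are assumed, pluggable components."""
    passages = retrieve(question, k)
    context = "\n\n".join(f"[{i + 1}] {p.text} (source: {p.source_url})" for i, p in enumerate(passages))
    prompt = (
        f"Answer using only the numbered sources below and cite them like [1].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return CitedAnswer(answer=generate(prompt), citations=passages)
```

A UI — or an enterprise audit log — then gets both the answer and the passages behind it, rather than a bare string.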
One of the things that opens up in my mind, Kate — and it would be interesting to get your perspective — is that this is fine from a web interface, where you get your result, you get your overview, and then you've got all the links and references. But as we talked about in a previous episode, we're moving into multimodality, and you're going to be chatting with what is arguably a human voice at that point. You're probably not going to want somebody giving you the answer and then saying "and by the way, I got this answer from here, here, and here, and you can visit it on XYZ" — you're going to switch off. So I wonder what the best user experience is for voice, for that sort of helpful chatbot, while still being fair and transparent that it's AI-generated.

I honestly question whether chat — regardless of whether it's voice or text — is the right mechanism and mode for this. One of the things I'm really excited by with AI Overviews is that it seems like one of the first consumer-focused use cases really taking off where it's not a chatbot: we're using generative AI to drive information distillation, gathering lots of different sources and providing results without a multi-turn conversation — without asking "are you sure about this answer, where did you find it, can you give me more sources?" That's a very unintuitive flow, but I think we've been so trained that chat equals generative AI that we all assume it has to work that way. So I would actually say I don't think voice is where this is hopefully going. I think there's a lot of opportunity to think through what new types of non-chat applications look like, and how we can embed the decision-making criteria, sources, and other things that are needed to really drive value along the way, without it being a multi-turn interrogation of an agent.

What do we think Google is collecting on the usage patterns of these? Way back in the day they had search and obviously collected clickthrough — what are you clicking on. Any guesses as to what metrics Google is collecting as people interact with these AI Overviews? That's not in my space at all; I'm just guessing someone in there is watching how we interact with the AI Overviews presented to us.

Ironically, this is the one question I'm qualified to answer. When Google first introduced AI Overviews — they had been in beta for a while, and then they said they were bringing them to prime time — two of the things they talked about were really messaging to publishers,
because publishers have been hysterical about the impact of this — and what's been really interesting is that the impact on organic traffic to publishers has been almost negligible. Everyone thought it was the end of the internet, and then almost nothing happened in terms of traffic. But two of the things Google said were, one, that the content surfaced through AI Overviews was actually getting more clickthrough and more traffic than the same content in the normal results page, the idea being that those links were presented with more context. I think Sundar did another interview not long after where he talked more about generative UIs — how you take a user query and generate a UI that places links and information in context better than a flat list, which is sort of what they do today; they would say they don't quite do that yet, but there's still some of it. That was one thing. The other thing they talked about — I'm sure they measure more — is whether people who are exposed to AI Overviews start using search more: is this something that increases their usage of the product over time? Because the other audience terrified of this is obviously shareholders, and people want to know: are you going to kill search, and in the process, where does all the ad revenue go? So one of the other things they were very clear about is that people exposed to this actually use the product more over time. I think they were reassuring some of their other stakeholders a little bit there, but those are at least some of the metrics they've publicly discussed.

Last week, Anthropic released a novel version of its Claude 3 Sonnet model, and this model did not believe it was a helpful AI assistant; instead it believed it was the Golden Gate Bridge — which is a fun thing to have happen. But really that was a demo of research Anthropic, and the industry more broadly, has been pursuing for a long time: interpretability — and within that space, Anthropic has been doing a lot of research on mechanistic interpretability. Part of the problem here, Kate, to the comment you made earlier, is that these models are a black box today: you put in a pile of all the data on the internet and some linear algebra, and out spits something that somehow appears to know a lot about the world, but nobody knows how that's actually happening — not really. Interpretability is a space trying to answer some of those questions.

What was interesting, and why Golden Gate Claude matters, is that Anthropic demonstrated they could identify the features within the model that activated when either text about or a picture of the Golden Gate Bridge was presented — they knew the combination of neurons and circuits that together represent the Golden Gate Bridge. Perhaps even more importantly, by dialing that feature up or down they could influence the behavior of the model, to the point where, if you dialed it up high enough, the model thought it was the Golden Gate Bridge. If you read the paper, this wasn't the only example either, and I'll share one other:
they had another feature that would fire when the model was looking at code and would detect a security vulnerability in that code, and they showed that if you dialed that feature up, the model would actually introduce a buffer overflow vulnerability into code it wrote. So when you think about the ability to dial features up and down within a model fairly surgically, that's potentially pretty important for the steerability of the model, and you can understand why folks in the AI safety community in particular have been focused on this interpretability space.

I personally find the space super fascinating, so Skyler, I want to turn it over to you to kick us off: maybe talk about your general reactions to the paper and the demo — what you found interesting, how important you think it is.

Great, I'm happy to talk about this space without droning on too long. I have to describe what I do to my kids — a ten-year-old and a seven-year-old — and they know I work with AI. Their understanding is: text goes in, and text comes out. That's their view of these large language models. Where I try to tell them our team works is actually in between: what happens to the text when it goes in, how does it get manipulated, before it gets spit back out? This has been emerging as an area called representation engineering, and I would call this paper — the Golden Gate example — a great example of representation engineering. They're not manipulating prompts; they're not coming up with a new metric of how well their model performs; they are messing with the representation inside the model. I think that's a really cool, emerging, perhaps even underrepresented area of research compared to prompt engineering. How could we prompt the model in just the right way to make it convinced it's the Golden Gate Bridge? That would be a very different approach from what they did with this Golden Gate example.

It's a fun example. They took it down — I think it was only available for people to use for about 24 hours — so it's already been with us and taken away too soon. But what I would like to get across to the larger audience is this: they did not create a new large language model by training it only on Golden Gate Bridge data, and they did not insert a little prompt that says "every time you answer a question, pretend you're the Golden Gate Bridge." They really did identify the inner workings of these models and then crank a feature up, as Brian described. And what excites me about this representation engineering space is that it doesn't take the latest, greatest technologies to find these insights — things like principal component analysis, things like a sparse autoencoder. These are decades-old, or at least ten-year-old, analyses, but applied to the inner workings of these large language models they open up a new, rich space of representation engineering. So I like the paper both for its findings and for how it presented them — Chris Olah, one of the authors, is a visualization genius, and in their publication they've got some really, really cool visualizations of what they found.
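The sparse autoencoder Skyler mentions is, at its core, a small two-layer model trained to reconstruct a transformer's internal activations through an over-complete, mostly-zero code; each dimension of that code is a candidate "feature." Here is a toy sketch in PyTorch of the basic idea — the sizes, the L1 penalty weight, and the random stand-in activations are all made-up illustration, and Anthropic's actual setup is considerably more involved.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Reconstruct d_model-dim activations through an over-complete sparse code of n_features."""
    def __init__(self, d_model: int = 512, n_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        codes = torch.relu(self.encoder(activations))   # non-negative feature activations
        return self.decoder(codes), codes

# Toy training loop on stand-in activations; real work would use activations
# captured from a specific layer of an LLM while it processes text.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3  # sparsity pressure: most features should be silent on most inputs

for step in range(100):
    acts = torch.randn(64, 512)                  # placeholder batch of residual-stream activations
    recon, codes = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_weight * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The reconstruction term keeps the code faithful to the activations; the L1 term pushes it toward features that fire rarely, which is what makes individual dimensions interpretable as things like "Golden Gate Bridge."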
So that's probably my first takeaway I'd like to spread to a broader audience: large language models are not just text in and text out — there's a lot of rich science to be done in that representation space, and the Golden Gate Bridge paper is a great example of it.

That's great. Can you talk a little bit about the safety community? They're particularly interested in interpretability, and I think they feel some urgency around it given how capable — and how quickly capable — some of these models are becoming. Why is it so important to the safety community, and what are other applications and domains where interpretability promises to make a difference? It could be on the capability side, but other places too.

Right. A clear example: I was reading a blog after Golden Gate Claude had been brought down, and some people noted that when the Golden Gate feature was highly activated — when Claude 3 was turned into Golden Gate Claude — it would respond to tasks it previously would not. "Please, can you write a scam email?" Normal Claude would respond, "sorry, I can't do that." Golden Gate Claude would proceed and generate the scam email. That has nothing to do with the Golden Gate analogy itself, but it's an example of how, when you mess with these features, other guardrails that were previously thought to be built in are no longer as strong. So I think that's going to be another really interesting area of work: you may have well-intentioned people manipulating these features, and we don't know which guardrails that previously worked will stop working after a feature has been manipulated — because who would have thought that amplifying the idea of the Golden Gate Bridge would make Claude more likely to comply with an illicit task? The safety community might not care about a large language model identifying as the Golden Gate Bridge, but they will definitely be interested in the jailbreaking behavior that emerges when people start manipulating it.

Skyler, I've got a question for you based on that: what implications does that have for open-sourcing models and releasing weights? A lot of times model providers do safety reinforcement learning and other protections on top of models before they're released, to help manage some of those behaviors. Could you see some of that now being at risk, eroding the willingness to open-source?

Is that what you mean by being at risk — the willingness for companies to open up? Yeah, take it as you will: the willingness for companies to open-source, given the risk introduced by releasing model weights that can now be, shall we say, exploited in ways the model designer and builder didn't anticipate.

Really good question. Anthropic themselves have a much larger blog you can read where they defend why they have not open-sourced these kinds of models. I imagine people around the AI community right now — probably over the weekend — are busy running their own version of Golden Gate: they're going to find their own features and start manipulating them.
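For a picture of what "manipulating a feature" might look like mechanically, here is a hedged sketch using a Hugging Face model and a PyTorch forward hook: it adds a scaled direction vector to one layer's hidden states during generation. The model choice, the layer index, the steering strength, and the use of a random unit vector as the "feature" are all assumptions for illustration — Anthropic derived their directions from a trained sparse autoencoder, and their exact intervention isn't public in code form.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                  # stand-in; any causal LM with accessible hidden states would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

d_model = model.config.hidden_size
feature = torch.randn(d_model)
feature = feature / feature.norm()   # pretend this direction came from a sparse autoencoder
alpha = 10.0                         # steering strength: how far to "dial up" the feature

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # returning a value from a forward hook replaces the module's output.
    hidden = output[0]
    return (hidden + alpha * feature.to(hidden.dtype),) + output[1:]

layer = model.transformer.h[6]       # an arbitrary middle layer
handle = layer.register_forward_hook(steer)
ids = tok("I am", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20)
handle.remove()
print(tok.decode(out[0]))
```

With a random direction the output is usually just degraded; the interesting behavior comes from directions that correspond to a real, interpretable feature.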
So I think we'll probably see some of those results showing up, hopefully on arXiv or in blog posts, within the week.

On that, Skyler — I did a YouTube video about three or four months ago where I took the Gemma model and the Mistral model. It's not at the feature level they worked at: what I did was lop off everything except the input embeddings layer, so the model was only the input embeddings layer, nothing else. Then I ran a cosine similarity search across the tokens in the input embedding layer and did a visualization of which embeddings were close to each other. When I did that, it was incredible — you can go check out that YouTube video. You would see, just in the input embeddings layer, nowhere else, that misspellings of words were super close to each other: London with a capital L, London with a small l, and London with a space after it would all cluster together. But not just that — cities themselves would cluster together, so you would see London, Moscow, Paris, and in fact you would see almost a distance similarity in the visualization, which was fascinating. You saw the same thing with celebrities clustering together, and with computer programming terms — the various loops, a for loop, a while loop, et cetera, would all come together.

The reason I ran that against the Mistral model and the Gemma model is that Gemma has a vocabulary of something like 256,000 tokens, whereas Mistral has about 32,000, so there's a lot of splitting of tokens in Mistral and not a lot in Gemma — which means you get much closer matches on the similarity. When I did that I was absolutely blown away, and like the Anthropic team I wanted to go to the next layer, because I had the same theory that if you go down into the next layers you would start to see these features activate — I could see it already just in the embeddings layer. One of the theories — and I'm glad to say I think it has been proven right — is that you may have noticed that as new models come out, everybody is increasing their tokenizer vocabulary, everybody is increasing their input embeddings layer. The reason, I believe, is that it's easier for the models to generalize as you go up the layers if you get things pretty close in the input embeddings layer. So when I looked at the Anthropic paper — bringing it back — and it talked about cities, locations, and computer programming terms, I could see that just in the input embeddings layer in my visualization, so I can absolutely see how that would translate into features as the layers get stacked up and the representation becomes richer and richer with semantic meaning.
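What Chris describes — looking only at the input embedding layer and asking which tokens sit close together — is easy to reproduce in a few lines. A rough sketch along those lines is below; the model choice (a Gemma checkpoint, per his description), the probe words, and the sub-token averaging are just illustrative assumptions, and you would need access to the model weights to run it.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "google/gemma-2b"       # assumed; any model with a word-embedding matrix works
tok = AutoTokenizer.from_pretrained(model_name)
emb = AutoModel.from_pretrained(model_name).get_input_embeddings().weight.detach()  # (vocab, d_model)

def embed(word: str) -> torch.Tensor:
    # Average the sub-token embeddings if the tokenizer splits the word.
    ids = tok(word, add_special_tokens=False)["input_ids"]
    return emb[ids].mean(dim=0)

def cosine(a: str, b: str) -> float:
    return torch.nn.functional.cosine_similarity(embed(a), embed(b), dim=0).item()

# Per Chris's observation, cities should land closer to each other than to unrelated tokens.
print(cosine("London", "Paris"))     # expected: relatively high
print(cosine("London", "london"))    # casing variants cluster together
print(cosine("London", "while"))     # expected: lower
```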
Yes — I'm going to geek out here a little bit. The official papers behind the Claude Golden Gate work are all plays on the word "monosemanticity," which is a really big word getting at the idea: can we find a single part of these huge large language models that has one meaning? They were able to do that for the Golden Gate idea, and then the question was, what happens if we take that one part of this huge model and crank it up tenfold — and then you get Golden Gate Claude. But Chris, your description of how these words or tokens come together — the tech behind Claude's Golden Gate basically, okay, "weaponized" is a bit dramatic, but it really emphasized: can we take this richer embedding space and create a million features from it? And once they had those features, you get the ones like the Golden Gate Bridge, the security vulnerability, and I think there was one on tourist attractions, et cetera. It's all getting at this idea of finding a monosemantic part of these large language models. So again, an exciting space — and I'll come back to it: I love it when the research gets into the inner workings of large language models. I think that's fascinating.

So, also last week — last week was another big week of announcements across the industry — Microsoft, as I mentioned, introduced what has become known on the internet as the "whale computer," because they used an analogy from marine life to explain the orders of magnitude in the size of the infrastructure they're building to support AI workloads. They used three steps: shark, orca, and then whale. What's funny is that this morning I was Googling how much a shark weighs — sharks are roughly 800 pounds, an orca is about 8,000 pounds, and a whale is around 80,000 pounds — so each step really is an order of magnitude. They were looking for an interesting, fun, maybe even memeable way to visualize and communicate an order of magnitude, and they certainly achieved that.

In some ways this is just classic scaling laws. It goes back to the original 2020 paper that says, if you're trying to improve the capability of these models — reduce the overall loss — you want to increase your compute, your data, and your parameter count by roughly similar orders of magnitude from one generation of the model to the next, and that improves the overall general capability of the thing. If you look at Nvidia earnings, that has held pretty true up to this point. But maybe where I want to jump in is a comment you made, Kate, I think last week on the show, to the effect that enterprises, for a lot of these use cases, may not need an artificial general intelligence — they may not need all the capability that exists right now. So it would be great if you could talk a little about the scaling laws, but from a different perspective: the scaling laws idea, to me, is really from the perspective of a model provider trying to build AGI, not an enterprise trying to get ROI. Can you talk about some of what you see in terms of cost and size trade-offs — does bigger mean better all the time?

I think what the scaling laws, as you say, do a good job of is serving model providers — people actually training these large models. One of the big breakthroughs was: look, you can't just increase your model size; the most efficient way to improve performance is to also increase the amount of data that's used.
But just because you now know, let's call it, the most cost-effective way to train a model of a given size, does that mean it's economically incentivized to train that model? Will the actual benefits you drive from that model justify the cost? That's an entirely different question, and one the scaling laws don't answer. To this point there has been enough excitement, and enough clear use cases and value, that there's been a clear economic driver to support training bigger and bigger models, and that's gotten us to where we are today. But I do question some of the statements and claims out there about how we're always going to have to keep investing in bigger and bigger models. I'm going to put the science of it aside — exploring and determining what's next — but if we look at what's actually economically incentivized, I think we're going to start to see performance plateau, and when we look at what the real use cases and value drivers are, I don't think we're going to need models that are 100 times bigger than what we have today to extract most of the value from generative AI and a lot of the low-hanging fruit. So it's still a huge area of exploration — even the scaling laws themselves keep changing. It's still the concept of needing more data for bigger models, but I'm hopeful we'll start to see more work on what will be economically incentivized to build, as well as on other costs that aren't reflected in these scaling laws. Costs like data — you mentioned concerns about pre-training data disappearing; if we need more data to train bigger models, at some point we're going to run out of "real" data, and that's a whole different frontier: data costs, and what role synthetic data could play. There are also costs around climate and the actual compute, and the question of whether those will start to be better reflected in what's charged to model providers and to the people using these models. All of that will maybe start to change the narrative a little bit about where the future is going as we continue to learn more.

Maybe one follow-up to that: I remember the reaction in the market when the Llama 3 models came out — the 8-billion-parameter model in particular, which I believe was trained on something like 70 to 75 times as much data as you would use if you were just trying to train a compute-optimal model. That obviously isn't the approach they took; instead they tried to build something small and capable that you could run on your laptop, that was cheap for inference but still had a ton of capability. Do you see more of that happening?

Definitely. Right now the main scaling laws everyone uses are for model providers, and they don't really account for another cost that isn't yet reflected: the model life cycle and its full usage. Think about the fixed cost of creating the model once versus the marginal cost every single time you run inference. You're incentivized to build smaller models if you're going to have a long model life cycle and hit that model millions and billions of times — you want to get that marginal cost as low as possible.
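Kate's fixed-versus-marginal-cost point can be made with back-of-the-envelope arithmetic, using the standard approximations of roughly 6·N·D FLOPs to train a model with N parameters on D tokens and roughly 2·N FLOPs per token served at inference. The specific numbers below — model sizes, training token counts, and lifetime inference volume — are illustrative assumptions, not anyone's actual costs.

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens            # common rule-of-thumb approximation

def inference_flops(n_params: float, tokens_served: float) -> float:
    return 2 * n_params * tokens_served

lifetime_tokens = 1e13                        # assumed tokens served over the model's deployed life

# A "Chinchilla-style" 70B model at ~20 tokens/parameter vs.
# an over-trained 8B model at ~1,875 tokens/parameter (roughly the Llama 3 8B ratio).
models = {
    "70B compute-optimal": {"params": 70e9, "train_tokens": 1.4e12},
    "8B over-trained":     {"params": 8e9,  "train_tokens": 15e12},
}

for name, m in models.items():
    fixed = train_flops(m["params"], m["train_tokens"])
    total = fixed + inference_flops(m["params"], lifetime_tokens)
    print(f"{name}: train {fixed:.2e} FLOPs, lifetime total {total:.2e} FLOPs")

# The small model is "over-trained" relative to its compute-optimal point, but its per-token
# serving cost is roughly 9x lower, which dominates once inference volume is large enough.
```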
And that's where the Llamas are going. If you look at the Phi model series as well, they're training on incredibly data-dense ratios of tokens per parameter: where Chinchilla calls for something like 20 to one — 20 tokens per parameter — they're now in the hundreds and thousands of tokens per parameter. So I think we're still really understanding that trade-off, and that's where everyone is headed: maybe it hasn't been articulated fully as a scaling law yet, but people are trying to optimize the total life cycle — when this gets deployed, we need to be able to run as small a model as possible for it to be cost-effective.

And Kate, to that point, I think one of the questions you need to ask in general is: how much reasoning do you need from your model? I like to use a cooking analogy. If I go to a Gordon Ramsay restaurant, I'm not expecting Gordon Ramsay to cook my meal, and I'm not expecting him to invent a brand-new meal there and then. What I want is a recipe he invented at some point; then some sous-chef is going to cook up that meal and serve it, and I get the Gordon Ramsay experience. When you're looking at the larger models — hundreds of billions of parameters, even 70-billion-parameter models — you're asking for Gordon Ramsay: invent the recipe, cook the recipe, and serve me the meal, all at once. But actually you can use the bigger model to do the reasoning — figure out what the good answer is — and then pass the pattern on to the smaller model to do the sous-chef part. I think that's really the big question for people when they're taking POCs to production: they use the bigger models to begin with because they're trying to figure out what the answer is, but in production, as Kate beautifully said, you need to keep the cost low, so they switch to the smaller model — they want the decreased latency, they want the lower cost — and since the pattern has already been figured out, you just want that smaller model to rinse and repeat, which it's really good at.

Absolutely. And there's this concept of using bigger models to teach small models, which throws some squirrelly math into the scaling laws if you need a big model to get a good small model. But moving past that, I think there's also a real opportunity in model routing: figuring out which tasks you actually need the big model for — when do you need Gordon Ramsay to tap in, versus when can you pass this off because maybe you just need to go to McDonald's for a quick bite? If something is really easy and low value, it's not worth spending an insane amount to accomplish it. That, again, is where a lot of the "what will be economically incentivized" question comes in: figuring out how much these tasks are actually worth to you. If you can get reasonable performance out of a 10-million-parameter model or a 3-billion-parameter model, no one's going to pay to send it to a multi-hundred-billion or trillion-parameter model instead.
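The routing idea Kate describes — only "tapping in Gordon Ramsay" when the task actually needs it — often looks something like the sketch below: a cheap heuristic (or, in practice, a small classifier or the small model's own confidence) decides whether a request goes to the small or the large model. The difficulty heuristic and the model handles here are placeholders, not a recommended production design.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    small_model: Callable[[str], str]    # cheap, fast; handles routine tasks
    large_model: Callable[[str], str]    # expensive; reserved for hard requests
    is_hard: Callable[[str], bool]       # placeholder difficulty heuristic

    def __call__(self, request: str) -> str:
        return self.large_model(request) if self.is_hard(request) else self.small_model(request)

# Illustrative heuristic: long or open-ended requests go to the big model.
def looks_hard(request: str) -> bool:
    return len(request.split()) > 100 or "explain why" in request.lower()

router = Router(
    small_model=lambda q: f"[small-model answer to: {q}]",
    large_model=lambda q: f"[large-model answer to: {q}]",
    is_hard=looks_hard,
)
print(router("Classify this support ticket: my invoice is wrong."))
print(router("Explain why our Q3 churn rose and draft a retention plan."))
```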
Maybe one final question on this topic. It was funny — there was an interview where people were talking to Jensen and asking his opinion on how long this would hold, and they were poking at things that were really about the TAM of Nvidia long term. He paused, because anything he says sends the stock price all over the place, but he started to talk about the opportunity being the entire — I think it was a trillion-dollar — data center market. There's been a lot of discussion about whether all workloads will become accelerated workloads going forward, and, for every application and every company, what the blend is between what they run on traditional CPUs versus accelerated hardware, and how they hand off between the two. I'm curious, Chris, from your perspective and from the client conversations and scenarios you're working on, how people are thinking about that. I know a bunch of inference is still done on CPUs today, but for some of the really low-latency examples people are talking about putting more of this on GPUs. From an application-architecture perspective inside an enterprise account, how are people thinking about inference and the trade-offs between CPU and GPU computing?

I think it's a really interesting area, and a lot of customers are thinking about this all the time. It's an architectural consideration, just like any other non-functional requirement: am I going to go SaaS here, am I going to go on-premise, how do I manage my cost, what's the safety of that? If I'm honest, most enterprises are being pretty cautious. They want to do a classification task, they want to do summarization; they don't want the model to make up some new classification — they know what their list of 30 classifications is, go do that; they know what their examples of summarization look like, go do that. They want to take that low-hanging fruit, and they're approaching it quite cautiously. Where that probably changes over time — and it's more of a discussion for a future episode — is when we move into agentic workflows: how do I organize the information within my enterprise so the AI has access to the right knowledge bases, which tools will it have access to — a much wider architectural discussion. A lot of clients are starting to think about how gen AI fits into their overall enterprise architecture and how they need to evolve their traditional architecture for the AI to be able to use it, but that's quite a slow path. Generally I don't think things have moved on too much from classification, summarization, et cetera — and then of course code generation is a big productivity lever everybody is leaning into just now.

One maybe-final thought on the scaling laws I wanted to bring up: a lot of these scaling laws also assume the class of technology remains the same. We talked about these being scaling laws for model providers basically in search of AGI, but do we really believe
this class of technology is what's going to unlock AGI? I think there's a lot of thought out there that probably not. If you look at how these technologies evolve, there's a curve, but that curve is driven by multiple different technologies coming in and introducing their own mini-curves on top of it. And human intelligence requires far less energy for the amount of decision-making it does. So if we're promoting these scaling laws on the grounds that, even if the business use cases aren't incentivized, unlocking AGI will be, I would argue these scaling laws probably don't reflect how whatever technology we converge on for AGI might scale. It's still a bit of an unknown.

Kate, imagine a world where we did have AGI, or even ASI at that point — but then you took that superintelligent being and said: you don't have any access to documents, you don't have access to any tools in your organization, because it's all locked up on somebody else's hard disk or in a Box folder or something. How effective would that AGI be in an organization? I don't think very effective.

I think you're reading between my lines, which is: is AGI really ever actually going to be incentivized, at least economically? There's a big question mark there, I think.

I think as soon as AGI is achieved — if it's achieved — it's going to be put in a box, and we're all going to go to the AI zoo, look at it, and have a chat with it. That's what I believe AI's first task is.

What is the TAM of an AGI zoo — that's what we need to answer on next week's episode. I know we're basically at time here. Thank you all for joining us on this week of Mixture of Experts, and we will be back next week — same time, not the same people. You've suffered through one episode of me; I'm out of here, and Tim will return. Thank you all for joining today and for listening. Kate, Chris, Skyler, thanks for joining. Thanks so much. Thanks, Brian — it's been a lot of fun, man. Thanks.