Mixture of Experts: Rabbit AI hiccups, GPT-2 chatbot, and OpenAI and the Financial Times
[Music]

Hello and welcome to Mixture of Experts. On this show we're going to be meeting weekly to review the deluge of news that's happening in the world of AI. The goal here is to distill it down. It can be really hard to keep track of everything that's flying around on a weekly basis, but the hope is that by bringing together a group of experts we can give you an understanding of what's happening in the world of AI and what to look for in the week ahead. Today I'm joined by a great panel of three experts hailing from different areas of the AI world. Just to quickly run through them: Chris Hay is a distinguished engineer at IBM and the CTO of their customer transformation operation. Chris, welcome to the show.

Hey, thanks for having me. Looking forward to this.

Yeah, absolutely. Kush Varshney is an IBM Fellow working on AI governance issues. Kush, welcome.

Yeah, thanks, Tim.

And Shobhit Varshney is a senior partner in Consulting, running the AI and IoT business in the US, Canada, and Latin America. So welcome to the show.

Thanks, guys. Thanks for having me.

Well, let's go ahead and get started. We're going to cover three really big stories of the last few weeks. The first is the recent release of the Rabbit AI hardware product; we're going to talk a little bit about some of the trouble they've been having on the rollout and what it all means for the future of AI-enabled hardware. Second, we're going to talk about what's happening with gpt2-chatbot, a mysterious chatbot that has just appeared on Chatbot Arena: what it is and what it really tells us about evals in the AI and LLM space in particular. And finally, we're going to talk about OpenAI concluding a deal with the Financial Times to license their data for training purposes.

[Music]

So I'd like to start with the Rabbit story.
So Rabbit, if you've been watching, is a widely discussed hardware startup whose bid is basically to say that in the future we're going to have AI-first hardware. The Rabbit is effectively a little device that is intended to be a kind of AI companion for you. They rolled out just recently and have immediately run into a number of problems. They had to push a firmware update to deal with a battery problem: the battery was draining too quickly. They've also been criticized recently because it turned out that their product was essentially an Android app that could run on a phone. So they've received a lot of criticism, and I wanted to bring up this story because it feels like it's the second data point. There was the release of Rabbit, and Humane, another company that released an AI-enabled pin, has similarly run into a lot of criticism, with people saying: why would I buy this? Is this a good product at all? What is this all for? So Shobhit, I was hoping to bring you in to kick us off here. Personally I'm very excited by the future of AI hardware. I think there are so many cool things that can happen once AI is on device and it's a thing that you can carry around with you. But clearly some of the first forays, some of the most talked-about forays happening today, are having some teething issues. So I want to get your take, as someone who's deep in the AI and IoT space and thinking about the relationship between AI and hardware, on how you see these recent stories and what you think they tell us about how this market is evolving.

Thanks, Tim. I had the pleasure of playing with the Rabbit R1 at CES this year, and obviously, as a geek, I did drop my 200 bucks. I received my Rabbit R1.
Right, so it is a fantastic effort. If you think about the direction that AI is moving, it will go to the edge more and more. You're seeing the models getting smaller, and there's a lot of work happening in on-device computing: Apple breaking out of its walled garden and open sourcing its OpenELM, Google with its own models. All of those are moving closer and closer to the edge. So I generally love the direction that it's taking. That way you get to address things around privacy: your data is being computed on the device, and it stays with you. So the direction that the tech is taking is fantastic. I'm all for it. I think there's a lack of appreciation of what problem you are really trying to solve, and whether there are other devices that are better suited for it. When you start to react to a device like that, in your brain you're trying to create a set of things that you're going to evaluate it on. I think that's where the problem is with the R1, or even when you look at Meta's Ray-Ban glasses and the Humane pin. We have a set of things that we are looking to evaluate it against. As an example, I would appreciate that it understands me as a person really well if it's attempting to be a personal assistant. I would appreciate that it has instant responses; if I can do something half a second or a split second faster on a regular mobile phone, I'm going to tend to go do that. So we're all optimizing for how to make it more effective in our own lives. Then you start to look at things like: I'm already carrying a cell phone in my pocket, so it has to be a net-new positive. For example, the watch I'm wearing is adding something to the ecosystem. When you create that set of criteria and then start to evaluate Rabbit's R1, it starts to fail on some of the basic capabilities that we're expecting from it. The direction is great, but I think the battery life being very low, the fact that
the screen itself is, they're teasing you with certain things you can do with a touchscreen, like typing on the terminal, but you can't really interact with the menus. The menus kind of remind you of the old scroll-wheel iPods: those were amazing for scrolling through music, but they were terrible at changing settings and things of that nature, and we see that paradigm shift over to the Rabbit R1 as well. There are a few things around taking images; the visual recognition of what's in front of you has been pretty decent. I've had good responses when I'm pointing it at certain things in front of me. Documents are hit or miss right now, and handwriting recognition is still taking a lot more time.

So yeah, I was curious: what's your most magical experience so far? I'm almost interested in the steel-man case: what's the most exciting thing you've done with it so far? It's gotten so much hate online that it's almost interesting to think about the killer application. I remember when I bought my smartphone for the first time, I was like, this has maps on it, I'm literally never going to be lost again, and that's a huge deal. I guess in the AI space I'm still kind of waiting for that, and I'm curious, as someone who owns it and uses it and is playing around with it, if there are things where you're like, oh, this is starting to be really cool.

Yeah, so I think the promise of the large action model is pretty cool. It has solved this to a decent extent with a few apps like Uber and others. But a lot of the services that we use today are hidden behind applications, and not all of those capabilities are exposed through APIs, so it's difficult for, say, a personal assistant like Siri or ChatGPT or others to go call those and take some actions. So the large action model, I think, has a lot of promise. The training data becomes a constraint for
them; that's their Achilles' heel. So far they've had hundreds of people manually go and train these models, and they're going to open up this catalog of hundreds of different models over time, but in the current form it's very limited in what actions I can take on it. But the fact that you can delegate an end-to-end process that's very complex, and that otherwise you couldn't have done with APIs, that's what really excites me. I see a lot of applications.

I mean, I don't have one of these. I'm not as much of a gadget guru as Shobhit is. But I think there are going to be fits and starts with any sort of new paradigm, and things have to start somewhere. I'm more of an optimist on things generally. To me, what this is leading to is actually a fourth paradigm of how we interact with computing. There were punch cards, there was the command line, then there were GUIs, and now we're in this fourth era: natural language interactions and so forth. Maybe there's no killer app yet, but the killer app maybe is the fact that we have this new way of interacting, and that's the road these devices are going to start us down. Having this more mutual theory of mind, where the system interacts with us, it understands us, and we understand it, I think that's where we're headed, and the more we can keep going down that road the better. Of course the first instantiation of anything isn't always the most perfect or the best, but that's where I'm optimistic about it.

Yeah, and I think there's this interesting sort of hill climbing. My friend was like, this is Google Glass all over again: you're going to have a couple of products that have such a bad rep that they taint the entire market for, you know, a decade
plus. But Kush, I was kind of agreeing with you. I was like, well, it's not like these products are failing so hard. If you remember when Google Glass came out, people went into bars and got beat up because they were wearing Google Glass. We're not there yet, so it feels like we are more in this hill-climbing scenario.

I mean, I don't have the device, but I think it's utter nonsense, if I'm honest.

Well, tell us why.

Well, if you think about what it is, what do you need to have here? A camera. A touchscreen. You need access to Wi-Fi, and then for it to be useful you need a cell connection as well as you move around. It's going to do image recognition, and then it needs AI hardware on board. What is it? It's a phone. So this is why you can't find a killer app, because the killer app is a phone. And I'm going to give a practical example. Apple silicon is absolutely incredible. Last night on my M3 I fine-tuned the Mistral 7B model with my own dataset in 15 minutes at 250 tokens per second. The GPU is incredible, and that same technology is coming into the phones. Apple's going to go on device; they've got the hardware with Apple silicon, and then the mobile phone manufacturers are going to do the same. So as far as I'm concerned, and I agree with the paradigm, it's like trying to sell a pager to somebody today. It's like, here's this thing that's got the things you need, you can get messages, but nobody has a pager, because it was replaced by the phone. So I do think there will be AI on hardware devices. I just don't get that one.

Yeah, and I think you're also raising one final point I wanted to hit on before we move to the next story, which is that obviously the thousand-pound gorilla in the room is Apple. It's not even a thousand pounds; it's like the 100,000-pound gorilla in the room, because
it's got the hardware, it's got the data, and it should be able to execute on all this. They haven't yet, really, and this is I think where all the other hardware companies see an opening: Apple's going to be so conservative that there's an opening in the market to at least get in and at least maybe be a good acquisition target. Shobhit, just to bring it back to you, I'm curious if you've got any thoughts on Chris's attack on this whole idea, because clearly you were bullish enough to buy the product and experiment with it. I'm curious if you've got the Chris takedown here: what's the thing he's not seeing?

No, he's on the right track. In the current state it's not a great product. But being an optimist about where the tech is going, I'm more on Kush's vibe: I see the promise of what this can bring. I think these devices will evolve, and Apple takes a while to come into this industry. The same thing goes for the Vision Pro glasses. Again, I was a big fan of them and I bought them early on, and 30 days in I did return them. I found some experiences and the promise of where this can be; at this point I'm just waiting for the next version to come out. But I'm with you, Chris: in the current state, yes, the $200 I would have used elsewhere. But I'm just a sucker for good tech, man.

Yeah, me too, Shobhit. But maybe my challenge back to you is: let's fast forward six months, post WWDC, when all of the AI capabilities start to move onto the regular iPhone, which I think has already got the hardware it needs to do these scenarios, and let's see if the Rabbit actually comes out of your drawer at that point, or whether you're just doing those same scenarios on your phone.

Yeah, I think one of the funny scenarios I was thinking about too is that people right now obviously are focused on the top end of the market, which is
like, who's willing to pay hundreds of dollars for the Rabbit? I also think that as models get more efficient and more energy efficient, we may just end up putting really small models in all sorts of existing technologies. I think this could be an interesting thing and also potentially a bad thing: it's 2035 and you're arguing with your toaster to get it working, because at some point someone made language the only interface for it.

[Music]

So let's go ahead and move to our second story. I wanted to talk a little bit about gpt2-chatbot. If you're not familiar with this, essentially this mysterious thing happened. There's this platform called Chatbot Arena, which has become in some ways the gold standard for evaluating models. It's a really simple idea: you have people talk to two models and tell you which one they like more. This has allowed the cross-comparison of a lot of different open source and proprietary models that are floating around the space. And this mysterious one emerged, gpt2-chatbot, which everybody claims is incredible, it's amazing, and I agree: playing around with it, it's actually quite impressive. It was accompanied by this mysterious, opaque tweet from Sam Altman saying that he also has good feelings about gpt2. So it has immediately led to this kind of fan fiction, if you will, about what this model is and whether or not it is a Trojan-horsed, quiet, stealth release of what could be GPT-4.5 or GPT-5. So Chris, I want to throw it to you: what are we seeing here? Is gpt2-chatbot really the next-generation model? If you've got any theories about that, I would just love to get your take. Do you buy the hype?

I don't know. I mean, I had to play with it. It's pretty good.
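A minimal sketch of the Chatbot Arena idea described above, assuming the human votes arrive as (winner, loser) pairs and using a textbook Elo update to turn them into a ranking. The model names, the starting rating of 1000, and the K-factor of 32 are illustrative assumptions; this is not a description of how Chatbot Arena itself computes its leaderboard.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Move both ratings toward the observed outcome of one pairwise vote."""
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * surprise
    ratings[loser] -= k * surprise


def rank_models(votes, initial=1000.0, k=32.0):
    """Replay votes in order and return (model, rating) pairs, best first."""
    ratings = defaultdict(lambda: initial)  # every unseen model starts equal
    for winner, loser in votes:
        update_elo(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred model, other model).
    votes = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
    ]
    for name, rating in rank_models(votes):
        print(f"{name}: {rating:.1f}")
```

Because each update adds to the winner exactly what it takes from the loser, the total rating mass stays constant, and beating a higher-rated model moves the ratings more than beating a lower-rated one. Replaying votes in order makes the result depend on vote order, which is one reason a production leaderboard may prefer a different fitting method.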
Actually, to be fair, is it GPT-5? I don't know. I think they've hyped GPT-5 so much that, if that is it, at this point it has to be AGI or it's not even going to impress us. Exactly. So maybe it's GPT-4.5. But I read a theory online, I can't say who said it, but I actually like it. Somebody said to take the GPT-2 LLM, which they've open sourced, you can download it on Hugging Face, and they reckon that they may have trained GPT-2 on the latest data that trains GPT-4. I think that's an interesting theory: GPT-2 with GPT-4 data. So maybe it's something like that. I don't know. But I don't think it's GPT-5. It probably is GPT-4.5, and as you say, you've got to put it in some sort of arena to see how well it's actually performing. They'll have run all the MMLU benchmarks and the 20,000 other benchmarks that are out there, so sticking it in the Chatbot Arena to see how it performs there is probably quite a smart move. It's a good way of testing how good that model is. So seriously, I think it's probably GPT-4.5, but I really like the idea that it's GPT-2 with GPT-4.5 data. I think that's a cool theory.

Yeah, I think there are two things there. One of them is that if that actually turns out to be the case and the model's performance is really good in the arena, it's like, do these architectures really matter? Is it just that you have enough data and you can make this amazing, so that data ends up being the bigger lever? I do have a follow-up question here, but just to quickly pause: Kush, I'm curious if you've got thoughts. First, do you just buy the hype? Do you think this is the next model, or is it all overhyped?

It could be. I mean, anyone's guess is as good as anyone else's. I'm sure it is something that's coming up next, but why speculate? I mean, I'm sure
they'll tell us pretty soon.

Exactly. If you guys followed the talk by Andrew on how agentic flows are going to be the way we get to AGI, I think over time the next set of models that come out will have a decent router that will go pick the right models. You're seeing these kinds of things come out from OpenAI already: they're automatically picking the right model based on the queries and things of that nature. So I think gpt2 would be a step in the direction of getting to 4.5 or 5, but I think it will not be just one big model that's going to be able to solve all of that. I think they may be testing it out in public and getting some feedback on how people are reacting, what kinds of questions people are asking, and things of that nature in these open LMSYS arenas. It's very entertaining. It's great drama in the AI world. I love it; I just pull up popcorn and enjoy what's happening. There was somebody who posted that Sam Altman had originally tweeted it as "gpt-2" and then edited it to be "gpt2", and they're just leaving breadcrumbs to make this more entertaining. So I love the direction this is going in. I think over time it will not be one big 4.5 or 5
model. You'll end up with a mixture of experts as the way that they will solve for this.

Yeah, for sure. And just before we move on, you mentioned Andrew. Who is Andrew, and is that stuff public if people want to check it out, or is that internal?

Yeah, Andrew is the god of AI. If you look at DeepLearning.AI, he started Google Brain and whatnot, and he co-founded Coursera. He is a great, great guy to follow on AI.

Yeah, and it's also very funny to me, it occurs to me that whether it's Sam Altman or Taylor Swift, both are basically dropping breadcrumbs on social media as a way of driving engagement around their products. And just for folks, that's Andrew Ng. If you want to check out his stuff, he's great, I agree. So I want to put on my tinfoil hat for a moment to go on the next turn of the screw with this story. Let's assume for a moment that gpt2-chatbot is the next greatest thing that OpenAI is going to release. I think it's actually very indicative that one major way they want to do an evaluation of this model is to release it on Chatbot Arena. One of the interesting things I see evolving in the space is that you meet a lot of people who are MTSs in ML, basically real deep machine learning people, who I think desperately hate the idea that in order to tell whether or not a model is good, you just have people talk to it for a bit and then they tell you whether it's good or not. They would prefer to have some much more structured evaluation for measuring conversational quality. But as it turns out, Chatbot Arena is dominating the space over time, and it leads to this very interesting world where it has become more and more difficult to quantify model quality and we're almost just kind
of falling back to almost the most one-brain-cell way of evaluating models, which is: well, I don't know, you talk to it for 10 minutes and then you say whether or not you think it's good. I was joking with a friend recently that what we should do is start reviewing models like we review fine wine, where you're like, oh, this is a model with oaky overtones and it's a little bit more conversational. I think we're moving in that direction. But I'd be curious to hear, particularly from Chris, whether or not you agree that that is the future, because it is so funny that you have this super advanced technology but our eval methods remain very rudimentary. I think some people would say that's good; some people would say, well, that's not how it's always going to be.

I think it's a good thing. I mean, is it really that different from the original Turing test? That's what Alan Turing said: you go have a conversation, and if you can't tell the difference, then is it human or not? And actually, if I think about the problems with the benchmarks, this is why I quite like the Elo ratings. Again, it's not perfect. LMSYS, I think they do a good job there with the leaderboard. But the problem is that because all the benchmarks are published, we know everybody is fine-tuning to the benchmarks. So how valuable are the benchmarks really? Everybody's like, I'm an 84, or I'm an 85, but if I then ask a query that's completely different, that's not on a benchmark, then it starts to mess up. One of my favorite tests is that I play Hangman with the various models. Sometimes I'm playing the game and sometimes I'm choosing the word, and I can tell you straight up, none of the models play Hangman very well, including GPT-4. So if I give it the
word, so I use "cheese" as my test, and it very quickly guesses the E, so you get blank, blank, E, E, blank, E. There are no other words, and then every model is like, I don't know, an R? I'm like, what? Why are you guessing an R? So that sort of vibes test that you get from the arena is really important, because those are the sorts of creative tests that we'll have: we'll play Hangman, we'll play tic-tac-toe, we'll ask different questions. But if you are literally training the model within an inch of its virtual life, fine-tuning it to the benchmark, then how valuable are those benchmarks really? So I think the future has got to be tests where you can't fine-tune, i.e. you don't know what the questions and the answers are in advance, and it's got to be a little bit more creative. Whether that turns into benchmarks, whether it turns into LMSYS as we're doing today, or whether it turns into, as you're saying, this model has this kind of vibe, it's a little bit chatty, it's good at classifications, etc., I think you're right, it might move in that direction. But I think at the moment the arena is probably the only sensible place where you can actually rank these models.

Sure, and I hadn't really thought about that. If I hear you right, one of the arguments is that basically all of the benchmarks we've been using are now kind of useless. It's almost like there's been this collective action problem where everybody's gaming the benchmark, and so the only thing you can really trust ends up being, I don't know, you have a 12-year-old talk to it for a little bit and tell you whether they like it or not.

You can't trust it if everybody's at the same level on the benchmark; that's no indication model to model. But obviously if a model is low, then you can say
they've got a little bit of work to do on that model. But at the higher end, if you go, oh, I'm 0.2 better on this benchmark, in reality, who cares?

Yeah, that's right. And I also buy that, which is that maybe it's actually a sign of the success of these models that a lot of the leading ones are so good now that the benchmarks are a lot less useful, because we're talking about gradations that, in terms of the actual experience of the model, are very limited.

Most of my accounts, when I'm partnering with clients, are Fortune 100 companies, where we're doing gen AI at scale and putting things into production. There's a much higher threshold for what good looks like and how you define it, especially in the regulated industries that I work in: financial services, federal government, and things of that nature. In those cases you have to be very precise about how you measure accuracy and how you measure whether the answer is correct, whether it's grounded, whether it's hallucinating, things of that nature. So for the majority of my Fortune 100 companies, when we partner with them, we create a system of benchmarks that are very tailored to the use case that they're putting into production. For example, if you're looking at a RAG pattern, you're looking at: is it pulling the right content? Is that content correct? Of those snippets, are they ranked the right way? Given those snippets, can I reliably create an answer that is grounded, and what's the groundedness score? And given the answer that you retrieve, does that really answer the question that was asked? Across each one of those, it's based on the kind of domain they're working in. If you're looking at, say, contracts, and you're trying to analyze whether the answers it's pulling out are correct or not, the question itself will define what a good metric is. I may ask a question like: given a contract, tell me if I can
order this particular part under that contract or not, which means it's looking at the top of the contract, it's looking at an exception on page 19, and so on and so forth. So it's more of a chain of thought to understand how things are connected. But then if I say, can you contrast these two contracts, that's no longer a RAG pattern. You're now asking a question where it's pulling out the right information, comparing it, and giving it to an LLM. So each query type has its own set of metrics that we need to evaluate. We've created some very robust metrics to evaluate these models. Whenever we have a new model, like Llama 3 and Snowflake Arctic, which came out in the last few days, we need to plop that model into that workflow. In a 10-step process where at step number four I'm going to call an LLM, we need a good set of metrics to evaluate everything that comes before and after. The majority of the big Fortune companies we're working with kind of ignore that. They look at the metrics and say, hey, a new cool model came in, so there are the vibes that a new model came in. But for the evaluation, we do not look at HumanEval scores; we do not look at the scores that are public in nature, because those are not as meaningful to enterprise use cases. So, partnering with Consulting, our clients have built these really robust benchmarking mechanisms, and that's how we've been bringing these to production: during experimentation and in production, it's continuously evaluating throughout the day.

Yeah, totally. I think it's one of the interesting things. I was talking with a friend recently and I was like, in the future we're probably going to have these agencies that just focus on evaluation. It feels like an emerging business: as models and pre-training become more and more commodified, the big question will be, well, which one should I actually choose? It feels like there's a whole industry to be built in terms of bespoke evaluations, even
in the curation of people who evaluate your model, which seems like a critical question.

Yeah, we're going to start interviewing models like we interview humans. You, model, are applying for this HR job, so you are going to be evaluated on your HR skills. You, model, are a developer; let's see what your React coding skills are like. And actually I think it's a fair point, Shobhit, which is that it doesn't matter how good a model is on a benchmark. What matters is: is it good at the task you need it to do? If you need it to do legal contract comparisons, it doesn't matter if it's the best poetry writer in Snoop Dogg style. What matters is, can it evaluate contracts, because that's the job you want it to do, and can it do it reliably?

One point I just wanted to make, related to what Chris was saying about people fine-tuning to the benchmarks: one thing that the Haifa research lab of IBM is doing is putting together this kind of hidden benchmark, so not releasing it anywhere. They have this thing open sourced called Unitxt, which is a way to construct these very quickly, very easily, and so forth. So I think one direction that is also going to be emerging is that job interview also being hidden away, so that people can't train to it and things like that. And I think what Shobhit was saying is precisely right: these have to be right on point, on task, for the sort of usage that you want. One thing we talk a lot about with customers is something called usage governance, and that is precisely that. You don't need to care about what else this model is doing, just about what is important for your application, for your industry, and things like that. So I think it's a great area, and government regulations are going to require a lot of this third-party testing and evaluation too, very soon. So I think
everything is headed in that direction.

Yeah, totally. One of the stories I was thinking about covering, which we probably will end up doing in a future episode, and Kush, it'd be great to have you back for it, is NIST and the development of all of these federal standards in the space and what that's going to look like. I hadn't really thought that it's going to look like a standard HR interview. I love the idea that in 2035 you're basically asking the model, what's your greatest weakness? Or, you know, I'm going to need you to do this bubble sort, as the question you're going to ask the models. And hopefully models will find it as frustrating as humans do.

Yeah, and I think models are going to evaluate models too. We're already seeing that quite a bit. One thing our team is working on a little bit: the arena is actually just a pairwise comparison with a human judging two things, but if you have three models, we can figure out smart ways of having models figure out which ones are good. When you have three models, let's say one is an expert, one is a novice, and one is intermediate, the expert can know which one's the novice, and the intermediate one can also know which one's the novice. So by using three at a time you can actually figure out a total ranking of models. So it's a fun game to be in.

Fantastic. Hey Kush, yesterday we had a new paper by Cohere talking about the panel of LLMs, PoLL. That's a very interesting way of looking at it: instead of having one LLM as a judge, when you start to mix different LLMs as panelists you get better accuracy in being able to define that. I also think that, for the tasks that we are asking an LLM to do, we fundamentally need to have a better appreciation for which steps an LLM is better at versus humans. So I feel there's a little bit of a flaw in our
benchmarking systems today in how we are evaluating. If you look at a 10-step process, there are certain steps along the way that humans do that are incredibly easy for an LLM to take on, right? They just slam through those. And then something very, very fundamental will be so darn hard for an LLM to get right. So I think we are projecting what we are good at as a good benchmark to evaluate LLMs. I think as we go play around with these more, we'll have a better understanding of what we should evaluate that LLM on. So I think the benchmarks themselves will start to evolve and change. So I'll probably not ask them what their strengths and weaknesses are, or "tell me a story," and things of that nature, but I'm sure we'll have a better sense of what kind of questions we should be evaluating these LLMs on, in context and grounded in the use cases that are in production for our enterprise clients.

[Music]

Let's move on to the final story. So this will be a quick final conversation, but I think it was a big enough story that it's worth bringing in. The news broke earlier this week that OpenAI had signed a licensing deal with the Financial Times, basically to license their content for training purposes. And obviously this happens against the backdrop of OpenAI getting sued by the New York Times and a number of other rights holders, right? And the question about what you are allowed to train on, is it copyright infringement, what do companies like OpenAI owe to people whose data is integrated into their models, is a really big one. And Kush, I know you work on AI governance, so I kind of want to throw this over to you for the first take. I'm curious if you have any reactions to this news. Do you think we're going to see more of these types of licensing deals going forward? And if so, why? I'm sort of interested in what's driving the business
decision here.

Yeah, so, I mean, the content creators certainly need to receive something in order for them to just exist, right? Because pretty soon we'll have run out of all the tokens in the world for these things to be trained on, right? And so the new content needs to come from somewhere. It can't be just fully synthetic generated data; I mean, that will lead to model collapse and all sorts of other things. So how that happens... I mean, copyright was always meant not to be a permanent sort of thing. It's just to protect those creators during their lifetimes so that they have some livelihood and so forth, right? So I think that's the idea that we need to keep going with. And so it might just lead to a completely different business model. Local journalism has kind of died in the world, and maybe this is a way to resurrect it, because you need to have this information that is coming from somewhere. When we have a RAG pattern or anything else, there needs to be timely information as well. So I think just the fact that content needs to be there, and we need to have a way to have it incentivized and so forth, is the crux of it. At IBM, I mean, we do a great job trying to eliminate all copyrighted content out of our Granite models, and we do a lot of stuff there. But eventually, I think it's not a question of who's infringing, who's getting sued, who's doing licensing deals, but how do we just make an ecosystem such that the creators are valued as much as anyone else?

Yeah, for sure. And I think this is actually really at the crux of whether or not this AI economy can work, right? Because I think that, you know, if you want to use Google as an earlier template, right, it was sort of this interesting moment where we said, okay, you'll allow us to index all of the web, and in exchange we're going to send traffic to you
because we're a search engine, right? And that actually created a trade by which an ad economy could work. It feels like here the challenge is that we haven't yet built that infrastructure to create that symbiosis, right? And so essentially there's nothing incentivizing the creation of new high-quality tokens, which is going to be a structural issue for the market ultimately. I guess, Kush, maybe the question I'd have for you, to push back a little bit, because I was debating this with a friend of mine. My friend was like, this is all just window dressing, right? Because it turns out that Financial Times tokens are just not that valuable to OpenAI: (a) it's not a whole lot of tokens, and (b) they already have a lot of news stories, right? Do you kind of buy that? How valuable are the kind of tokens that we're talking about here when it comes to a newspaper? Or maybe, is it more valuable to get, you know, movie scripts than it is news stories? I think this ends up being a really interesting question about where the most valuable tokens are coming from. And I'm just curious if you think that these journalism tokens, which have been the focus of so much attention, really are where it's at.

Yeah, I mean, journalism tokens have been in the news, but comedians have been suing as well. It's not that it's one or the other, right? So I think it's just the fact that it has to be new tokens. And, I mean, there's distribution shift, right? The world is changing. Now we have whatever we talked about at the beginning, right, these new gadgets. The terminology for those isn't going to exist in any historical documents. So we need to keep up with the world, the way the meanings of words change, any sort of new thing that comes up, right? And news tends to be one place; I guess your comedy
routines tend to be another place. I mean, wherever the world changes, however we can bring that in, I think that's where the value is, because it's not about the number of tokens, it's the quality, and the quality in terms of how to get these models to keep adapting to the world as it exists.

Yeah, yeah, for sure. Well, we're probably in wrap-up mode right now. Chris, Shobhit, I'm curious if you've got any final takes on this before we close up.

I've got one take, and it's probably going to be on the opposite side, and we had this chat before, Tim, which is: I think there's going to be a whole business of data washing coming out there. Because if you really look at this, and yeah, I think there will be some folks like the Financial Times that will license their data, and that's great. But, you know, if I take five news articles on the same subject, I run them through a model, I get them summarized, and then maybe I open-source that data set, right, and then somebody else who's training a model goes and uses that open-source data set, right, you know, where that data originally came from is gone, right? And as far as the model trainer is concerned, it's like, "Oh no, I used this open-source data set, which is MIT licensed, etc., that I pulled off the internet." You're now one step removed from the original content sources. So I think that's going to become a big thing, and I see people doing that commercially as well. So as much as we're all good people and we want the provenance to go around, I just don't see a world where everything is so lovely and we're all high-fiving each other on how good we are, right? I see this data washing world coming really, really quickly.

Yeah, for sure.

I have a slightly different take on this. I think it is quite dangerous. I'm a big proponent of decentralized AI. I'm in Emad from Stability AI's camp, right, and he left Stability AI with
his mission of decentralizing AI. I think the fact that OpenAI is making a decision to partner with one news outlet... if they picked Fox or they picked CNN, they would have a different set of news articles being trained on, right? So I think there's quite a bit of bias in the media itself, right? If you look at the elections coming up and whatnot, right, what decisions you're making on what data you think is high quality, it's a single entity's definition of what is high-quality data, whereas if you look at human beings, we're being exposed to both ends of the spectrum of news articles and stuff, right? So I think decentralizing is going to be very important, and this also goes to speak to open models, where people can add data on the fly and they can personalize it more, and things of that nature. I think it is a little bit dangerous when large organizations that have accumulated power with AI are making decisions unilaterally on what good looks like. What FT produces as news articles may not be representative of the culture or the demographics of a particular country or particular region. So I think there's a little bit of a question of, hey, if I'm using a gen AI model from a particular vendor, do I just get to go personalize it and say, hey, I'm left-wing, I'm right-wing, or these are my thoughts on particular topics, so that it gets more and more personalized to me, and that defines how it responds and becomes my personalized version of it, right?

I wonder, Shobhit, if the training is actually the important part for the OpenAI piece. I wonder if actually it's just going to be RAGing that data that is the key thing for them, because you're giving the up-to-date article, and therefore, as you move into those agentic platforms, training the model's not going to be that valuable, but actually being able to serve up and say, "This is the latest news from the Financial Times, and it's valid, and it's not hallucinating," I
think that's probably valuable.

Chris, I think it's just like kids, right? It's nature and nurture, both of them. So I think that's where we're heading, right? Like, what was the nature of the kid that was born, and how were they nurtured? But thanks so much, Tim. This is extremely helpful. Great set of questions.

Yeah, absolutely. Well, thanks, everybody. I could not have asked for a better panel to start with for our inaugural episode. So Chris, Kush, Shobhit, thanks for joining us, and we hope to have you on a future episode.

Thanks, everyone.

[Music]
2024-05-05 11:57