The state of open source, InspectorRAGet, and what’s going on with Kolmogorov-Arnold Networks
Hello and welcome to Mixture of Experts. I'm your host, Tim Hwang. Each week we bring together a panel of researchers, product leaders, engineers, policy experts, and more to discuss, debate, and distill down the week's biggest news and trends in AI.

Today on the show, three stories. First, the state of open source: what are the biggest trends in open source models, and how will they shape the business of AI? Second, the future of retrieval-augmented generation, or RAG: it has come so far, so where is it going to go next? And finally, Kolmogorov-Arnold Networks, or KANs: what the hell are they, why are all the nerds suddenly talking about them, and should we buy the hype?

Today I'm ably supported by an incredible panel of experts. First off, Marina Danilevsky, Senior Research Scientist at IBM. Marina, thanks for joining.

Good to be here.

And particular thanks to you for joining us so early Pacific time. David Cox, VP of Models and Director of the MIT-IBM Watson AI Lab. David, thanks for joining the show.

Pleasure to be here.

And returning for a second episode, we were joking that we should just make this the Kush Varshney show going forward. He's unfortunately declined that, but: Kush Varshney, IBM Fellow working on issues surrounding AI governance. Kush, welcome back.

It's great to be here. And yeah, I'm the Varshney with the hair.

So yeah, we'll use that as a little mnemonic.

Well, great. So let's start with the first story we want to cover today on Mixture of Experts. From where I'm sitting, there has been so much happening in the world of open source. Meta, of course, released Llama 3 a few weeks back. Apple, in a very big move for them, released the OpenELM on-device models. And IBM just recently released its Granite family of models. So David, I want to give you a chance to first plug Granite: tell us what it is and what you have been working on. And then I want you to go into
why IBM decided to release Granite as open source and why it thinks that doing this matters. From there we can talk more broadly about what's happening in open source, but I wanted to give you a shot to talk a little bit about the work that you and the team have been doing.

Sure, happy to. We actually had two major open source announcements. This was a big week for us across IBM and Red Hat. The first was that we open sourced the Granite code family of models. These are models in a variety of sizes (3, 8, 20, and 34 billion parameters) trained on 116 programming languages. They are state-of-the-art models, competitive with the best in the field. One of the areas we really optimized for enterprise users, because ultimately IBM is interested in supporting the enterprise, is all-around capability: not just Python code generation, which is often the focus for the academic community, but also Java and Rust and all kinds of other languages, and also things like code fixing and explaining. There are a lot of things you can do with code models. They are being integrated into the fabric of how we do software development, and software is integrated into the fabric of everything we do in society. We wanted to release these because ultimately our position is that open wins. Communities will build around these models, people will build things that we wouldn't expect, and they'll be able to extend the models. That's super powerful.

That leads a little bit to the second announcement, which happened through Red Hat. We have a technology that we developed for aligning models, which we call Large-scale Alignment for chatBots, and that gave rise to a project called InstructLab. InstructLab is a way to aggregate community contributions to the
instruction tuning of a model. So now any developer anywhere in the world can submit new skills and new knowledge to a model, that gets integrated, and then we do a weekly build of the model. It's a different cycle of development and a different kind of community: we're not just forming around a model and building inference tools and things like that, we're actually able to merge all those contributions and update the model every week. We're really excited about this. It's been a fantastic partnership with Red Hat building this out; they know open source better than anyone. This got announced at Summit on Tuesday by Matt Hicks, the CEO of Red Hat.

Yeah, that's awesome. I don't know if you'd agree with this, but I see what IBM has done with Granite and with InstructLab, and I was joking with a friend the other day that it's open source putting its big boy pants on. Open source is moving into being something that enterprises will actually use, and I think that's really changing what we mean by open source. The big trend even a few months ago was, "Oh my God, these open source models are just getting so big." They're huge-parameter models, and the exciting thing was supposed to be that open source models would be on par with, as sophisticated as, the state of the art. What's interesting here is really twofold. With Granite, you're releasing a class of models of different sizes, on the idea that not everybody is going to need the chunkiest model in the whole world, which I think is really interesting. And it's also on par with what we see out of the Apple announcement, the OpenELM announcement: these are
not the biggest models, but they're on-device models. I don't know if you'd agree with this, David, but it feels like open source is finally responding to market need. Enterprises are asking, "How do we actually apply this stuff?" and the open source community is now trying to adapt to actually provide solutions to that. But I don't know if that's a characterization you would agree with.

Yeah, no, I think you're spot on. There isn't just one thing that people want to do with LLMs, so there's not going to be just one LLM that wins the day. We have our models running on laptops, and there are really interesting advantages to doing that. If you want to be on-prem, if you don't want to be sending data over the network because it's proprietary and you're worried about IP, you can run these models, in many cases, on your laptop. For other applications you want the very best performance. Say you're doing an application modernization that's just going to happen once and needs to be the highest quality; then you can move to one of the larger models. So we really are trying to be responsive across the spectrum of different needs.

And yeah, at IBM we are trying to be, I think you said, the big boy pants. We're very transparent about what data we put in the models, which is not always true elsewhere but is very important if you're an enterprise that wants to use these. The other point of differentiation for our models is that we release them under the Apache 2 license: just a clean Apache 2, no additional restrictions. This can be really important for adoption, and we knew it was something our customers would ultimately need and want. So that's how we're evolving our approach to open
source and, again, like you said, meeting the customer needs.

Yeah, definitely. Kush, Marina, I'm not sure if you've got views on this, but I think we can use this as a springboard to talk about how this is going to shape the market as a whole. If I'm an enterprise thinking about how to integrate LLMs, it feels like there are increasingly more options: we can try to do it all ourselves in house, or we can go with the big proprietary models. And it also feels like there's going to be a range of new businesses that emerge here. The whole business of "you come to us with a problem and we fine-tune open source models for you" seems like it will increasingly become a big part of the ecosystem. I'm curious, from your point of view in the research space, and thinking about where this all goes, whether you've got views on how this will impact the ecosystem as a whole.

Yeah. One great thing that InstructLab enables is really shifting power to value creators. As David said, it allows this whole community to congeal around this thing and make these models authentic for themselves. It's a sort of commitment to locality as well: whatever you need for your enterprise, for your organization, you can really make things yours. So I think it's an awesome thing.

Yeah, I really appreciate being able to add skills as you find them needed for your own use case. The thing with all of these models is that it's very hard to predict, when you put them out, what they're actually going to be used for. So it matters to have the flexibility to say: oh, I've realized I have a use case, I
need to adapt quickly, I need to make the model adapt quickly. Sometimes, with something that's proprietary or somewhere else, you just don't have the ability to move that quickly, or even to stress test and check: is this going somewhere, or is this not going to be helpful at all? From that perspective, the way that we've released InstructLab is very good. It's very effective for checking these cases.

It's very rare that any given company's or enterprise's needs would be represented, would be top of mind, for the developer of any given base foundation model. Does Meta care about an insurance company? Well, they probably do, but it's not their primary concern.

They're not thinking about it, just being like, "I wonder what AIG thinks about this."

Exactly, exactly. So having a base that's built for enterprise, but then giving the ability to customize and really focus and bring in knowledge and particular things you want to do that are specific to that industry, can be really powerful.

So we have a few more minutes on this topic. Can I play jerk for a second? One of the most interesting things about open source is that, early on, if you were a government or someone worried about AI ethics or AI safety, you'd basically say, "Well, the rise of these few leading companies with proprietary models is really good for us, because we only have to go to a few companies and change their policies in order to secure the ecosystem." And you might say that one of the issues with this increasing proliferation of open source models, and the fact that everybody is going to be running their models on premises, is that there's a lot more room for people to misuse these models. You might also think that they create all of these supply chain security issues. I'm thinking about
how, with npm and other instances in which open source has really taken off, security ends up being this really big problem, because the provenance of any particular component is really difficult to establish and your stack might rely on hundreds of open source components. I don't think anyone's got a good solution to this. And look, I came up as a free software advocate, so I'm on the side of what's going on here. But I'd love for the panelists to offer an opinion about that. Do you buy that those are risks? Are there smart solutions you are thinking about? Wrestling with that a little bit is, I think, one of the most interesting parts of this development.

Yeah. One thing, just to start off on the security issue: history has proven in open source software that open source ultimately ends up being safer, not less safe. There were efforts, for instance, to create private versions of the Linux kernel, and it turns out it's just hard to keep those safe, because more eyes mean more people who can find problems, understand problems, and fix them. So I think having that transparency, and enabling the academic community to get involved and build solutions for many problems that we may face, is super important. I will also say we're very careful about what we release. We're very careful about what data goes into these models before we release them, minimizing any risks around potentially dangerous activities. We're not releasing models that we think could be used with ill intent.

Of course not, yeah. And I do think there's going to be a need for almost a Consumer Reports or a Wirecutter for these models at some point, where it's basically
like, there are going to be so many models out there that it's literally going to be, "Well, we had a couple of experts spend a few hours really testing this thing." That's an important part of the ecosystem. Kush, it looks like you might want to get in.

Yeah, we actually do work on exactly that, the Consumer Reports sort of idea. We call it AI FactSheets and model risk assessment. It is exactly a way to analyze these different models that are out there and give them different scores along different dimensions. As a consumer, you can look at different vendors and different options and get a good sense of what's available. This is actually something already available through watsonx.governance, one of our flagship products.

Yeah, I imagine some kind of future when I quit my job as a podcast host to be a model sommelier. It's just, "Have you considered this model for your use case?"

Fine vintage.

That's awesome. Yeah, a fine vintage. Exactly right.

Really good oaky overtones on this.

2024 was a good year for LLMs. Yeah, exactly. Marina, any final thoughts before we move to the next topic?

Yeah, I would say that it's still very early days with this technology and everything we're going into. Especially as scientists, we would like to try not to have the hubris of thinking, "Yeah, we've got this, leave it with us, we've sorted out the rest of this." There have been so many interesting developments and surprises in this technology in the last few years, and we think that will continue for sure. In that sense, open source is actually going to be more efficient, even from a market standpoint: more eyes means more ideas, which means more places where this is going to develop in unexpected and interesting ways. So I think it's actually more efficient, besides whatever thoughts we may have about the morality of it as well.

Yeah, no,
for sure. And again, I'm kind of arguing against myself, because I'm very pro open source. I think it's just a very interesting set of considerations as the whole architecture of the industry shapes and changes.

Well, this is great. So let's move to the second topic today. I really want to talk about retrieval-augmented generation, or RAG. If you're not familiar with it, RAG is basically one of the hot things in AI. If you look at the papers at ICLR this year, or ACL, there are a lot of papers using RAG methods. Marina, keep me honest here, but I think one of the reasons it has been so prolific and of so much interest is that RAG seems to open a window for solving a lot of the problems we have with language models. We can't pre-train these models all the time, but if they're really good at pulling data from elsewhere, that's a good way of keeping their responses up to date, and a good way of ensuring that they're potentially more factual. I'm curious, because I know your group recently released a paper thinking about and using RAG. So maybe as a springboard for the conversation, you could quickly talk about that, and then we can talk more generally about what you see as the existing limitations of RAG and the big technical problems that need to be solved.

Sure, that sounds great. The paper you're referring to is a description of a methodology and a system for trying to evaluate RAG more deeply. Again, the point of RAG: it's one thing to be able to have a conversation with an LLM in which you ask it to write a haiku about frogs. They're great at that, no problem. Here we live in business use cases. So it's very important: when you have business use cases that rely on factual information and
it's really a problem if you get things wrong, this is where you get into RAG. Like you said, it means being able to point to a reference and say, all right, the reason I'm giving you this answer is because this is the content that I am relying on, whether it's informational or it comes from a knowledge base, whatever. Then you want to actually go and double check: is this going to act the way that I expect it to act? It's one thing to test these LLMs against large benchmarks; there were some good comments last week about benchmarks and their usefulness as time goes on. It's another thing to actually see what happens in a customer's use case. This is an old data analysis necessity. You have to go into the test cases that you've created: what are you testing? Where did your data come from? What are the documents? How have you managed, without knowing it, to introduce biases into the evaluation because of the way your annotations are done, because of the way you defined your metrics, because people have different understandings of what is acceptable and what is not? Have you overcorrected for a particular query type, or for a particular way of responding? This is all analysis that you need to do to have confidence in the solution you put out, which includes an LLM but is not just the LLM by itself; it's the LLM as a part of a solution. That's something my group does a lot, and I know Kush's group does it a lot as well: diving into the details, especially how we take our customers through getting confidence in what it means to deploy their LLM. And our system has a fun name, for those of us from the 90s: we called it InspectorRAGet.

Yeah, Inspector Gadget.

And it really is a way to make sure that you can take yourself through that analysis and feel confidence in what you're getting, not just the aggregate number.

Yeah, it's funny about the 90s. I was in a class that a friend was teaching
today, or earlier this week, and one of the kids was like, "I hear back in the day there was this thing called GeoCities. I hear it was really cool or something like that." I was like, oh my God, I've got to get out of here.

So there's so much to go into there, and I think there are maybe two topics we could dive into. The first, Marina, that I'd love to get your thoughts on: one really great theme that came up in last week's episode was the idea that AI is in this weird period of benchmark bankruptcy. There are all of these benchmarks that no one cares about, and the benchmarks that people do care about are so thoroughly gamed that they basically provide no valid information anymore. One outcome that I think Shobhit was describing on the episode was that the solution now is to just talk to the model for 15 minutes and then figure out whether it's good or not. It strikes me that, if you put InspectorRAGet in this context, there's also a switch from benchmarks to monitoring as the way that we really assess whether models are high quality. I don't know if you'd buy that, because I take your group's work as an attempt to say: benchmarks are a useful guide, but in practice what most people want is to see lots and lots of telemetry about their models, and that's how we approach this problem. I'm curious to get your response on that. Do you buy the idea that AI is in a benchmark bankruptcy, and do you see InspectorRAGet as a solution or an answer to that?

Yeah, I think you should think of benchmarks as something that you should iterate on rapidly and evolve. Now, the problem with talk to the model for 15 minutes
and just get a vibes kind of feel for it is that people are not very good at consistently coming up with the right thing to talk to it about for 15 minutes. They will only think of whatever came into their head, whatever they were talking about with their customer last week, and they will introduce, like I said, a lot of biases in what they thought of. Then you end up being very nastily surprised when you actually go ahead and deploy your model, and they're like, "Well, that didn't work, but I talked to it for 15 minutes and it seemed fine." You also wouldn't deploy a representative to a customer after talking to them for 15 minutes and thinking, "That seems fine." So realistically what you want, and what I hope the point is of approaches like InspectorRAGet, is constantly evolving benchmarks. Yeah, talk to it for 15 minutes, and then go check yourself: what data did you end up actually putting in? What kind of questions did you end up putting in? Do you realize that you didn't do the right vibe check? Do that a few times, and your vibe check turns into something systematic. But it's something that is iterative, that is interactive, rather than "some academic somewhere put out a benchmark, and I don't know what this has to do with my data." Make your own, make it iterative, and constantly check yourself on whether what you're doing is actually proper quality. That ends up being the right thing to do. So move yourself, shout out to Daniel Kahneman, from that System 1 thinking to System 2 thinking. Then you're going to have confidence in what you're actually deploying.

Yeah, I think this is so interesting, because what you're describing is going to be an enormous need across every company that attempts to adopt this stuff. I was joking earlier about being a model sommelier. I think my other business proposal is an eval atelier, where you're
basically like, "We help craft finely crafted evals for what you need." That's kind of what you're talking about, because the art of creating a good benchmark and evolving that benchmark...

You're describing a lot of our jobs, actually, here at IBM. That's literally what we're doing for our [unclear].

So David, Kush, I don't know if you've got responses you want to jump in with.

Maybe I can be a little bit controversial. People talk about RAG being a solution for hallucination, for lack of factuality, lack of faithfulness, lack of groundedness, these sort of things. To me, it's part of the solution, but I don't think it's the full solution, because even when you get the retrieved documents, there's a model in between. It can ignore those documents, it can get confused by them, it can just hallucinate anyway, all sorts of things. So to me, what Marina is talking about is very important, not just over time, but as part of the process initially as well, or at runtime, at inference time: checking for hallucination separately, thinking about whether we can trace back where the information came from in those documents, and whether we can even come up with new architectures that don't hallucinate by design in some ways. RAG gets a lot of play right now, but I think it's a stepping stone. I don't think it's the end of the journey.

I actually completely agree with you. RAG has not fixed it; it's just an additional step in that direction.

Interesting. It's always tough to predict these things, but do you think in two years we'll still be talking about RAG?

I think we will, because RAG is search, right? And a lot of the companies that are in the LLM game are search companies at the end of the day. So I think it'll stick around, and it'll have a lot to do. But yeah, I think for
enterprise use cases maybe it'll get a little bit less emphasis. Maybe not, I don't know.

Well, for freshness of data, some kind of retrieval can be helpful: you just added it to the database, and you can retrieve it immediately. So there is more than one problem that RAG solves. And I agree with Kush and Marina on hallucination: RAG helps a little, but it's a separate problem that we need to address in lots of different ways. But the ability to access new information, the ability to customize quickly: we're starting to get layers of technology that allow us to address that. InstructLab is one of them. If you want to ingest knowledge and build it into the LLM itself, you can do that, but you probably still want to be retrieving things. There's going to be a balance, and we're going to figure that balance out over the next couple of years.

Yeah, as it evolves. I think it's definitely the much more realistic pathway that I've heard. The other alternatives I've heard are like, "Well, at some point the model will become so big and know everything, and then we'll be able to pre-train it frequently enough." And I'm just like, really? How many H100s are you going to buy to pull this off? It's just not within the realm of possibility.

Yeah, and it's not a concept... go ahead, David.

Sorry, I was just going to say that not every company is going to give their proprietary data over to OpenAI. It's a real problem.

Yeah. All I was going to say is that these sorts of ideas, like having multiple levels and layers, are part of computer engineering: caching, different types of locality. This is all very much the sort of thinking that computer people have had, so it just needs to come into this too.

Yeah, orchestration, right? And making sure that there's
routing involved, there are decisions involved, there are different checks, there are different guards. That's not going to go away. I don't care how long you've trained the LLM, that's not going to be fixed.

Yeah, for sure. So we have a few minutes left on this topic. The last area I want to push us into: Marina, you made a really interesting comment when you were explaining InspectorRAGet, which is about this feature of trust. Essentially, what does a user need to be shown to trust the model? What I love about that topic is that in some ways it pushes you into another realm. It turns out people trust models regardless of whether they're a huge-parameter model or a tiny model. I heard this great anecdote where an MLE was telling me this story: "We were doing an eval with side-by-sides, and what we discovered is that the users we were testing against just felt that longer outputs were more credible and trustworthy, regardless of any content they included." And I was like, oh, that makes sense, because you're telling them, "Do 500 of these tasks in the next hour," so they're just using visual heuristics to evaluate text. One visual heuristic that we use to tell whether something is more substantive is: is it long, and does it look dense? I think this is such an interesting thing, because if you go down that route, I guess it's a prescription for madness. You're basically asking, does font choice influence how trustworthy people think their models are?

Yeah, exactly.

So Marina, I'd love for you to riff on that a little bit. How far does this rabbit hole go? Once you move away from benchmarks and say, "We're going to give you a dashboard of different things," you're now almost in the theater of trust: what do we need to show
you, what is the metric that drives the most trust with the user, and is that trust justified or not? I would love to get your thoughts on that as someone who's working in the space.

There's some interesting psychology here. We know that people get extremely mad at computers when they make one mistake, but they're much more okay with it if they are told that it was a human. So that's an interesting psychology. There's another fact: these models are fabulous snake oil salesmen. They will tell you something, and you will read it and believe it, even if in the back of your mind you're like, "Wait, isn't that not what I thought that was?" But it will sound so convincing and so accurate that you're like, "Oh yeah, that's the right answer, I have no further questions." They're very good at that. In that sense, even human evaluation is very challenging; people are bad at catching these kinds of things. On the other hand, you find yourself in an enterprise situation, and that's risky. If you really did give incorrect information, that is very risky. It's a good reason that you can't really deploy these models by themselves with no support.

But I think there's a lot of psychology in setting expectations, in the same way that when we first had Wikipedia in the world, when we first had Google search, people thought, "Oh well, if it's on the internet, it's true," and then people learned. I think that over time it's going to be the same thing: you're going to learn what the right way to interact with these models is, what the safe way is, what the Consumer Reports sort of appropriate way is. So some of this is technology, and a lot of this is people, people psychology. I cannot give you enough data points and tell you I will never, ever, ever make a mistake in this model. It's not possible. So we're going to have to figure out how to set people's expectations. If people are allowed to sometimes
make mistakes and you ask for a clarification how do we get to that state of the world also with the use of the technology yeah for sure yeah I think this pushes into a sort of interesting direction I'll just throw out the hot take which is under certain conditions basically optimally safe performance of the model is not necessarily optimally easy to use I guess right so that is to say a perfectly articulate model may actually signal more trustworthiness than is warranted so there actually may be weird situations where there's kind of this trade-off which is like we actually want it to perform worse because it inspires an optimal level of doubt in the user right would be sort of the theory that I guess I'm arguing well short of that you don't have to have it perform worse if it could just express its own uncertainty that would already be a big improvement because people really aren't great at dissociating things like fluency and you know convincingness of discourse from actual truth and particularly in contexts that enterprises are working in with RAG where you're taking enterprise documents HR documents and policies and you have to be correct about them unless the person who's vibe checking that model really understands those policies in perfect detail it's very tricky to evaluate and people tend to be fooled very easily Kush you want to jump in I see you kind of um no I mean I think the word Marina used earlier humility is the key I mean the AI needs to be humble and we need to be humble as well so I think that combination is the right way to go yeah for sure um yeah and I think it's part of the problem too you know I was talking with my friend who kind of relayed this story as well is basically that people have all these expectations that are built up around computers which makes this particularly difficult and like the language
model behaves in such a fundamentally different way that it's violating our expectations where you're like it's good at poetry but bad at math like that literally flips everything we have built up in terms of intuitions about computers for like the last 20 years and so the adage I've been using is like everything you think computers are good at LLMs are bad at everything that LLMs are bad at you know computers are good at and there's kind of this weird mismatch that we're sort of navigating at the [Music] moment all right so Kush you and I are going to kick this topic off this is going to be the big challenge of the episode which is if you've been watching the more nerdy channels of the AI discourse online people have been very recently excited by something called the Kolmogorov-Arnold representation theorem which is giving rise to a paper that proposes a kind of Kolmogorov-Arnold Network or KAN for short and it's very difficult to tell from the outside I think if you're not a technical person as to what it is and why it's exciting which worries me because it has all of the indications of being like oh do you use blockchain cuz like blockchain will solve this right like I think we rapidly go down that direction so what I kind of want to do over the next few minutes as we close out this episode is to basically give the clearest easiest to understand explanation of KANs that anyone has yet articulated on the internet and we're going to do this together right okay um so Kush no pressure on this no pressure um so I think Kush maybe the best place to start reading some of the papers is can you give a quick explanation of why models large language models approximate functions what does that mean exactly yeah that's precisely the right place to start so I mean a mathematical function it's living in some space right so in middle school or high school we saw these one-dimensional
functions um if our data was just one-dimensional what that function is trying to do is fit that data and by that use the function rather than the data to predict the next thing so by doing so we're actually able to so-called generalize so data tells us the pattern the function describes the pattern so that the next time we want to make a prediction we use the function instead of the past data so I think that's a very key point and then we can go into what those functions are how to represent those functions and how to compute those functions but yeah that's the starting point right and a prediction here just to make it very simple is like tall people that's one variable tend to be heavier is that right like for example that's a prediction that you could build a function around yeah exactly and you can be very quantitative about that so if I'm 6 feet tall maybe that predicts that I'm 180 lbs or something like that so um yeah so then I think the next step we're going to take this step by step we think through this problem step by step right is basically so as I take it one of the things that machine learning has really done better than traditional models of AI or traditional models of you know computer programming is that we frequently tried to hand craft all of these rules right so like you want to write an algorithm to divide you know pictures of cats from pictures of dogs you would do feature engineering right you get a bunch of people together to write these equations these functions out right and I guess is it right to say that what machine learning has been really good at doing is coming up with these functions on its own right like to basically come up with those rules on its own to do this prediction yeah I think that's a good way to put it I'll be a little bit more specific though so we as humans the algorithm designers etc have been the ones
who come up with what the functions are the functions have parameters and it's the learning algorithm that's figuring out the parameters to best fit the data but at some level we the computer scientists the folks are the ones who decided what the functions were in this library or this universe of possible functions and then let the algorithm figure out the nuance the parameters and so forth yeah for sure and so what we do hear about I guess right now in the world of AI is multi-layer perceptrons right which is this very particular way of implementing AI that is doing the approximation of these complex functions on its own and can solve all of these magical things right like it can have conversations with you it can sort pictures of cats from dogs whatever you want do you want to talk a little bit about maybe the trade-offs of that like what do we need to do in order to achieve that magic right yeah so there used to be this car dealership that had a commercial in the '90s um dating all of ourselves here yeah so they used to say stack 'em deep sell 'em cheap in terms of their cars so with the multi-layer perceptrons or the feed-forward neural networks the trend has been there's this one kind of these layers what they do is they multiply some inputs by some weights add them up and then apply a nonlinearity something which is often called a ReLU function a rectified linear unit which kind of changes the output right so you have those you layer them on top of each other you stack them really deep you keep doing this keep doing this keep doing this and that way you actually end up with the ability to end up with almost any sort of nonlinear function to describe the data so you talked about this universal representation or approximation theorems and stuff so you can
actually prove that through even not very deep neural networks you're able to represent any function that you have in front of you yeah so that seems like we're at the magic right like this is how it works um I guess one result of that right is that these models have been really expensive to make like you need a lot of energy and a lot of chips and a lot of data and a lot of computing time and so along come Kolmogorov and Arnold I actually don't know who those people are I just know them from their representation theorem you should know Kolmogorov from a lot of other things yeah oh he's that one okay same guy okay all right yeah I know that guy uh it's the big labasi yeah know um so tell me about the representation theorem right what does it tell us what's the big deal yeah so like we just were talking about when you have some function that you're trying to represent mathematicians have come up with all sorts of ways to decompose that goal of representing this function I have in front of me in terms of more primitive other functions right so I mean as an electrical engineer I see it like Fourier transforms or Fourier series are ways of decomposing a function into sine and cosine functions the same way with the multi-layer perceptrons it's into that particular structure you're decomposing into these weights and these nonlinearities so what the Kolmogorov-Arnold representation theorem is about is decomposing again any function into some other functions in this case they happen to be one-dimensional so they could be splines or some other smooth one-dimensional function and by combining them in a particular summation you can again represent any multi-dimensional nonlinear function and that's the proof is what Kolmogorov tells us that in this way of taking 1D functions you can come up and represent any
multi-dimensional function yeah which is pretty wild right because what you're sort of telling me is basically that the machine learning models that we have now can work this magic right they basically come up with a magic mathematical formula that can help you tell pictures of cats from dogs let's just take that example and then what you're saying is that we can take that magic formula and represent it like we can break it down to these tiny tiny tiny Lego blocks right like all the way down to what we were just talking about like tall people are heavier like single variable stuff right and kind of the theory and Kush you keep me honest you're the one who actually understands this stuff is like any I don't know or like most very complex formulas can be reducible in this way I think is what the theorem is saying is that right exactly okay so we're there the KAN then is a network that attempts to do that yeah exactly so when in the regular neural network we had these weights on these edges that are multiplying the inputs here we're applying that function the splines or the other sort of wending sort of nonlinear function and then instead of I mean you add in the same way so the KAN is also adding up the inputs but then there's no additional nonlinearity afterwards like the ReLU that we see in the normal neural network so all the nonlinearity is done before you add things up so it's just more complication in one place rather than in a different place and so just by doing that change you can reduce the number of parameters because this more complicated thing on the edges is actually more able to represent these weird various sort of behaviors so that's kind of what it's trying to do nice so we're there deep breath yes two last questions yes does it matter what would KAN models what's the promise of KAN models if
that you can do this it seems like you've just taken this complex thing and turned it into a bunch of Lego blocks so from my one brain cell standpoint I'm like isn't that kind of the same thing yeah um so it's true I mean you're just shifting the nonlinearities from one place to a different place one thing that the KANs are able to do a little bit better is more interpretability so when you look at those splines they actually make sense to us so that height and weight sort of relationship or any of those other things we can understand better so there's this interpretability method called SHAP which people have been using for a while this is like automatic SHAP without having to do SHAP in a sense interpretability there for some folks who are not maybe as into the papers is just simply understanding why the model is doing what it's doing yeah exactly exactly yep and so I mean that's one advantage the disadvantage though is that our hardware infrastructure has not been optimized for these sort of things for these splines and so forth whereas the matrix vector computations for neural networks are very highly optimized through those H100s and so forth so that's I think the difference we might if this catches on I mean develop some hardware for this type of thing as well and I mean the last point that I'll make is these are not new ideas I mean this is something that's been around even our team like a couple years ago we developed something called CoFrNets it uses continued fractions which are a third way of representing functions also it has a universal approximation theorem associated with it uses continued fractions that have been known since antiquity like the ancient Indians and ancient Greeks knew about all this stuff so I mean all this fancy math is great and it's just different ways of putting
together I mean different of these functions together and then at the end of the day they all let you I mean kind of represent these different nonlinear functions how you train them might be more or less costly where the interpretability is might be more or less easy or hard so I mean it could turn into something it might just be another option so we'll see yeah for sure yeah it's fascinating and that I think opens up definitely a direction that I hadn't really thought of because I think the main thing I had heard is well you can make much more energy efficient models right but it seems like two things you're pointing out one of them is like we might be able to understand why these models are making the decisions they do at a much closer level of depth than we have in the past which seems huge and then I think the second point is actually that this is not new stuff right like I mean much like neural nets themselves this is like we're just pulling all this old stuff back again and being like oh I guess it works now you know ultimately well I think that resonates really well with Kush's point about the hardware match I mean oftentimes you know we get success and the field moves not because something is the mathematically optimal thing but it's something that can be done sort of at a rational scale with a rational speed and as you say deep learning was you know kind of a rebrand of artificial neural networks it was around for decades before it caught on why did it catch on it's not because there was some mathematical breakthrough it's because the hardware like GPUs by accident were really good at doing this and then that just sort of set us on a path so obviously all these new developments are really exciting and we could build different hardware potentially but you know any new idea like this is going to compete against how wonderfully good GPUs are at doing the basic
computations needed for deep learning so you know it's an interesting battle an interesting set of trade-offs there yeah that relationship between hardware and what's happening on the model side I think is one of the most interesting aspects of this and like how long does it take for a model to influence hardware you know are we just locked into CUDA for the rest of our lives all these things are very very interesting questions um so Marina any last thoughts before we close up today yeah an even more general comment continuing what Kush and David said is representations of data are not created equal so yes it's the same information but when you change the way that you represent it you're able to do things with it that you weren't able to before so even with something like a large language model you're representing data that exists let's say on the internet but you're representing it in such a way that you can access it in a way that you couldn't before same thing with for example KAN versus MLP the representation changes there's going to be trade-offs but it's always very interesting to try this the fact that we now have more of these options open to us because the hardware has caught up to the math that has been around for years or decades or centuries yeah that means try again try again try again see what new things will come up data representation is really one of the things underlying driving this current era of AI so more work in this direction is just going to continue to drive things to an interesting place that's great yeah well I can't think of a better note to end on Kush MVP thank you for coming on the show again and Marina David hope to have you on the show again and I hope all of you listeners out there join us next week for another episode of Mixture of Experts thanks everyone thanks thank you appreciate it
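The contrast drawn in the discussion between a regular neural network layer (weights on the edges, add up, then a ReLU) and a KAN layer (a one-dimensional function on each edge, add up, no nonlinearity afterwards) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the KAN paper's implementation: piecewise-linear interpolation stands in for the splines mentioned in the conversation, and the names mlp_layer, kan_layer and piecewise_linear, along with all the numbers, are hypothetical.

```python
from bisect import bisect_right


def relu(x):
    """Rectified linear unit: pass positive values through, clip negatives to 0."""
    return max(0.0, x)


def mlp_layer(inputs, weights, biases):
    """MLP-style layer: a scalar weight on every edge, sum, then a fixed ReLU."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def piecewise_linear(grid, values):
    """Build a 1-D function through the points (grid[i], values[i]).

    Assumption: this is a simple stand-in for a spline. In a KAN-style
    layer the entries of `values` would be the learnable parameters.
    Inputs outside the grid are clamped to the end values.
    """
    def f(x):
        if x <= grid[0]:
            return values[0]
        if x >= grid[-1]:
            return values[-1]
        i = bisect_right(grid, x) - 1          # segment containing x
        t = (x - grid[i]) / (grid[i + 1] - grid[i])
        return values[i] + t * (values[i + 1] - values[i])
    return f


def kan_layer(inputs, edge_functions):
    """KAN-style layer: a 1-D function on every edge, then a plain sum.

    All the nonlinearity lives on the edges; nothing is applied after the sum.
    """
    return [sum(f(x) for f, x in zip(row, inputs)) for row in edge_functions]


# Hypothetical example: two inputs feeding one output node.
inputs = [1.0, -2.0]
grid = [-2.0, 0.0, 2.0]

mlp_out = mlp_layer(inputs, weights=[[0.5, 1.0]], biases=[0.0])
kan_out = kan_layer(inputs, [[
    piecewise_linear(grid, [4.0, 0.0, 4.0]),   # edge function for input 0
    piecewise_linear(grid, [2.0, 0.0, 2.0]),   # edge function for input 1
]])
```

In this sketch each edge function is fully described by a handful of grid values, so it can be plotted and read as a single curve (for example height against weight), which is the interpretability point raised in the conversation.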
2024-05-15 19:54