Llama 3.2, AI Snake Oil, and gen AI for sustainability
"What comes next in open source?" "If you just combine this recipe and map it to other models, I'm expecting a lot of very powerful models." "Because AI's prediction is just pretty limited, right?" "I guess I might take a bit of issue with the idea that AI is fundamentally about prediction." "Why exactly are people so excited about the use of AI in sustainable development?" "So you can see how people are trying to wrangle: how do I balance the compute that's needed versus how do you look at the energy consumption?" All that and more on today's episode of Mixture of Experts.

I'm Tim Hwang, and I'm exhausted. It's been another crazy week of news in artificial intelligence, but we are joined today, as we are every Friday, by a world-class panel of people to help us sort it all out. Maryam Ashoori is Director of Product Management at watsonx.ai, Shobhit Varshney is Senior Partner Consulting on AI for US, Canada, and Latin America, and Skyler Speakman is a Senior Research Scientist.

So the way we're going to begin is what we've been doing for the last few episodes. I think it's just a fun way to get started: asking each of you a simple round-the-horn question. For all the listeners, the guests have not been prepped as to what this question will be, so you'll be hearing their unvarnished, instinctual response to a really difficult question. So here's the question: in 2025, a mere few months from now, will there be an open-source model that is absolutely better than any proprietary model on the market? Shobhit, yes or no?

It'll get close.

Okay. Skyler?

I'm sorry, what? No... uh, yes, there will be.

Great. And Maryam, what do you think?

A big yes.

Okay, whoa. All right, nice. Very exciting. Well, that's actually the lead-in for our first segment today. One of the big announcements, of course, is the release of Llama 3.2. If you haven't been following the news, or have been living under a rock, Llama is the best-in-class open-source model that Meta has been really helping to advance in the marketplace, and their
release just earlier this week featured a large range of different models, small ones and big ones. And Maryam, I understand you were actually involved in the release. Do you want to tell us a little bit about your experiences and how that was?

Yes, it's just so exciting to be part of that market moment. On the first day, when the models are released to the market, it's available on the platform. The excitement is just amazing.

Yeah. I think from the outside, one thing it'd be helpful for our listeners to learn a little bit more about is what's different with the 3.2 release. Is it just more open source? What should we be paying attention to?

Well, there are really three things that they released with 3.2. The first one is lightweight: unlocking all the IoT and edge use cases with the release of Llama 3 billion and 1 billion. The second thing was the multimodal vision support. It's image in, text out. You can think of unlocking use cases like image captioning, chart interpretation, and visual Q&A on images. And the beauty of that is the way they did it: they separated the image encoder from the large language encoder and trained that adapter in a way that the model is not changed compared to 3.1, so it can be used as a drop-in replacement for the 3.1 Llama 11 billion and the 70
billion variants, but the image encoder that is added now is going to enable the model to process image in and text out. So that's the second thing. And the third thing they released on the model side is Llama Guard for vision. The safety of these multimodal models matters, and they released the Llama Guard, which is also available on our platform for customers.

Yeah, that's awesome. So there's a lot to go through here. Maybe to pick up on that first theme: Shobhit, I know the drum you always beat when you come on Mixture of Experts is that the models are going to get smaller, and it's a good thing. Do you want to talk a little bit about how this matters for people who are implementing this kind of stuff in the enterprise?

Yes. So with a lot of my clients, we are deploying these small language models on device, and quite a few times it's just because they don't have good internet access on the factory floor, or for people who are running around in the field, things of that nature. So we have to do a lot of that computation on device, especially if you're looking at our federal clients or manufacturing and so on. For the last few months I've been super impressed by the momentum we have had in this AI space going towards much smaller, more efficient models. In the 1 billion to 2.5 or 3 billion parameter space we've seen an influx of a lot of models. I have been running Google's Gemma and Apple's OpenELM, and we've had Microsoft's Phi-3.5. There have been some amazing models that have delivered quite a bit of value. We now have the 1 billion parameter model from Meta. I was able to download that just before I took a flight, so I was able to experiment for the next three hours with these small models. And by the way, I was watching Meta Connect using the Oculus glasses; it was a complete experience, being there live. So I got a chance to go experiment with these models. There are
certain things that we do for our clients where we add another layer of fine-tuning to these models. Because they are small, and I can fine-tune them because they're open, I'm able to deliver much higher accuracy with a much, much smaller footprint. I think that's where you get gold: the return on investment you get from these small models that you can fine-tune and then run on device. That opens up a whole lot of use cases for our clients that you've not been able to do if you're calling an API back and forth.

Yeah, definitely. And Skyler, I guess this kind of response puts your answer to the round-the-horn question into context. I asked whether we're going to have an open-source model that's better than the best model in the world, and I guess that's not what you think is exciting about this release, right? I feel like you're champing at the bit to talk about how great these are.

If they had come out with a 500 billion parameter model, that would have been "meh" for me. But if they're emphasizing the 3 billion and 1 billion parameter space, that gets me so excited, because it's away from the bigger-is-better idea. That idea has crowded out other really cool research problems that probably should have been worked on while people were scaling larger and larger. So to see a major player like Meta come out and make some noise about a 3 billion and 1 billion parameter model, I think that's just some really outstanding work. In the larger context, it also means decision makers are not gated behind the ones that have access to running a 400 billion parameter model. I think that kind of power dynamic, if open source is continually getting to these smaller scales, is just a really good direction. So kudos to Llama for coming out and saying the 1 billion and 3 billion parameter space is showing skills. And again, being able to download
right before, as you said, you hopped on a plane: that type of thing is a really great direction to see these types of foundation models going.

So there are a couple of other things in this space as well. The 128K context window was pretty surprising to me for such a small-size model.

Why is it surprising? I think some folks might not actually have familiarity there, and it's worth it for them to hear that subtlety.

Yeah. So the fact is you can put more context into the prompt that you're asking. It's 128,000 tokens that I can pass in as context. So if I'm looking at a whole email thread on device, I can pass that in. And eventually we'll start to see more models that can handle images too at this small size. Currently Pixtral at 12 billion parameters or Meta's 11 billion are the ones that are doing images, but I'm very hopeful that soon we'll see more image capabilities come down to these 2 to 3 billion parameter models as well. So doing that on device, when you're walking around taking a picture of equipment and asking what's wrong with this, or what's the meter reading, things of that nature: I'm super excited as the capabilities increase. There are a few things that are lacking that I would like to see come out in the future, things like function calling, being able to create a plan, and having more agentic flows between these smaller models. I'm very excited about the future iterations of these models as well.

Maryam, we have been working on Granite models for a while, and we've always been focused on small models. Can you give your perspective on small model size? What are you seeing as a good size, like 7 billion to 2 billion? Where do you see the great threshold of performance and size?

Well, it depends on the use case, right? If you have an IoT or edge use case, the smaller the better. But also the smaller the better in
the sense that it has an impact on latency (it's faster), on energy consumption and carbon footprint, and on cost. So if we can get the performance that we need from a smaller model, that's well suited for that use case. But Skyler, to your point, what excites me about this release and the lightweight models is the way they achieved them. If you look into the paper on how they did that, they grabbed the Llama 8B and structurally pruned it, so it's like cutting the network, making it smaller. But then they used the very large general-purpose model, the 405B that they had, as a teacher model for distillation to bridge that gap. If you just combine this recipe and map it to other models, I'm expecting a lot of very powerful models coming to the market moving forward, just with a combination of distillation and pruning.

Yeah, for sure. And I think one of the most interesting things is that as it gets cheaper and more available, we'll also see lots of use cases. So far we've been gated by how much investment you need to put into these models and how expensive they are to run. But as it becomes more accessible, we'll also just see, "Well, why not just plug a model in?" It'll end up being something you can apply to all sorts of applications that we would have thought ridiculous a few years ago, because it would have been too expensive to even think of doing.

Hey Maryam, just on the latency part, I was stunned. I'm on the flight, I have a 1 billion parameter model running, and it's giving me 2,000 tokens a second in response. That's like 1,500 words it's generating per second. That's what I want when I'm looking at a model on my phone responding. I just became a believer when I saw that speed of response, the latency.

Yeah, the vision of you on the plane with the goggles
using a model... I just picture your seat neighbor being like, "Who's this guy playing with LLMs?"

Exactly. I'm waiting for the new airline documentation to come out that says, "Please do not run LLMs on devices while the plane is in flight."

So Maryam, before we move on to our next topic, what comes next, do you think? Are we going to see more releases of this kind? Is this going to be the big release for a while? What should we expect?

I'm expecting to see a lot of movement in open source and open community. Listen, the future of AI is open. This openness drives innovation, and it gives you three things. One, making the technology accessible to a wider audience. And when you open it up to a wider audience, it gives you a chance to stress-test your technology, so we can advance the safety of these models together with the power of community. It gives you an acceleration in innovation and contribution back to building better models for different use cases. So a combination of accessibility, safety enhancement, and acceleration in innovation is what I'm expecting to see in the open community. And because of that, we are going to see a lot more powerful smaller models emerging in the next six months.

Two researchers, Arvind Narayanan and his collaborator Sayash Kapoor, came out with a book called AI Snake Oil. It's basically the book adaptation of a wildly successful Substack they've been running for a while, where they essentially point out all the places where AI is being oversold, overhyped, or deployed in ways that are not necessarily the best use of the technology. And what's so fun is that Arvind took to the internet to basically say, "We're so confident of our arguments here that we want to put a bounty out. If you think we're wrong on anything that we're arguing in this book, tell us, and we can put a bet on it, in two to five years." And their argument is
that the kinds of critiques they're pointing out about AI systems are things that don't have to do with technological capabilities, and have more to do with what we can actually predict in the world. So one of the things they say is that AI really can't predict individual life outcomes, or the success of cultural products like books and movies, or things like pandemics. They're arguing that prediction can only go so far, and AI is ultimately a prediction machine, and so there's only so far this technology can go. I just wanted to first start there: I'm curious if the group buys that argument. Do we think that this prediction thing is just limited in a certain way, and that it actually caps what AI can be used for, or should be used for? Skyler, maybe I'll throw it to you if you've got any responses there.

I guess I might take a bit of issue with the idea that AI is fundamentally about prediction. I think the gains that we have seen recently rest on this idea of the transformer being used to do next-token prediction, so in that sense, yes. But because it's able to do that next-token prediction, there are so many other use cases that are not prediction-focused. So yes, we have to understand what this context of data is, and underlying it, that transformer model does rely on prediction, but it is so much bigger than just prediction. So I would probably take issue with that. Prediction is very difficult, but the other downstream tasks that you can do after that prediction task are really what has moved this space forward. So don't get too hung up on the prediction capabilities of a model.

Yeah, I'm with Skyler on that. If you look into traditional ML, prediction was key, and the majority of the enterprise use cases that we were using traditional ML for were a
reflection of prediction, really. But when it comes to generative AI, the prominent use cases are the productivity unlocks, which are a function of content generation and code generation. It can be prediction in a sense, as Skyler said, like the next token, but I don't think that's prediction as a use case. So for that reason, I don't 100% agree that prediction is the primary use case that AI is designed to deliver.

Yeah, that's actually very interesting. I hadn't really thought about it like that. This has come up in some of the episodes we've done before, and it's one of the debates I find most interesting: at some point machine learning diverged from computer science, because the way you program a computer is quite different from the way you test, evaluate, and fine-tune a model. You're almost saying that there's another distinction to be made, which is that traditional machine learning, if you will, has diverged a little bit from the kinds of concerns we have in generative AI, or whatever you want to call it. This current generation is so different in kind that there's almost a different set of problems. I don't know if that's what you both are chasing after.

I do think there is a divergence away from classical machine learning. Take all of your decision trees, your regressions, all of those, and then generative AI: those have diverged, and I'm trying to keep up with it. My previous background was in the classical machine learning space, and man, we're in for a wild ride on generative AI.

So Tim, this being a podcast, let me just quickly recap the book. I had the pleasure of listening to the audiobook on the flight while I was hacking.

Oh, you did? Okay, you did the homework.

I was in a very meta phase, because I'm trying
to hack something while I'm listening to this book on AI. The two authors are brilliant. They are two of the 100 most influential people in AI according to Time magazine. There are five points they make in the book. The first is that AI predicts but doesn't truly understand the context. The second is that AI will reinforce our biases in areas like policy, hiring, things of that nature. The third is that you have got to be skeptical about anything that's a black-box AI solution, the point that Maryam just made about openness being the future direction. Then there should be stricter regulations and accountability, especially when an AI is producing an outcome that could have an adverse impact elsewhere. And ethics in AI has to be a focus beyond just the technical capabilities that we are building. So none of these are groundbreaking statements that we've not heard before. But the very first one, which I think is where Skyler started, was that AI is making predictions. In a lot of cases we expect an intern or a junior person to make a prediction: look at a pattern and raise their hand when they see something that's not working. My wife is a physician. She spent 14 years in medicine becoming a doctor. She does critical care, lungs, and sleep medicine. She has a set of medical assistants, MAs, or nurse practitioners who are helping patients as well. She expects them to raise their hand when they see a pattern break. Here are the stats they've had from all their tests, a patient comes to them, and they say, "Hey, something looks different here." So all she's asking is: recognize the pattern and call me as an expert. I think that's where we should be with AI. AI is augmenting us. We should be very precise in saying pattern recognition is a good thing. I want AI to do patterns. And I think there's a big gap between pattern recognition and getting to the root-cause analysis of
what caused this. That causal modeling requires years of experience, and I think that's the relationship I would like to have with our AI: be able to find patterns, raise your hand, and come to me for expert advice. So I think we're heading in a good direction. The name of the book is very catchy, but I think the points they're making are pretty grounded in what we see in reality today.

Yeah, for sure. And to pick up on that point, I agree. I think that's the dream of how this technology should be deployed. Part of their worry is that they feel the market's not going to provide that, that there will be a tendency to say, "Yeah, let's just implement the AI and it will do everything for us." And I guess a question I'd pose back to the group is: how do we do a good job fighting that? Because Shobhit, I want to live in the world that you're describing, but I think a lot of people who are getting used to the technology, or are new to it, have a tendency to apply it for that causal stuff, which is actually where we want to preserve the human role. So I'm curious, in people's conversations with friends and family and others, are there things they've done to help level-set with the technology properly?

I think an example of this has come up in our conversations recently. My parents were both public school teachers, and we were talking about whether AI is going to replace teaching. Similar to the healthcare ideas, I would really like to see AI be very measured in education, because there's got to be a human connection there that comes through. And so to back off a little bit in that space, similar to Shobhit's analogy with the medical situation, is about where we really see these specific roles. I think an AI instructor would actually
be terrible. I wouldn't want that world. But having AI able to assist students, and assist that interaction between a human teacher and the students, I think that would be a really cool example of where we'd want to pull back a little bit and not go full automation, in education and probably in health as well.

I will push back a bit, Skyler, on the whole education piece. If you follow Salman Khan and Khan Academy's Khanmigo, I think the impact he's having is surgical with AI. He's figured out a good blend between teachers, students, and where AI becomes a copilot for them. So to your point of creating the human connection, 100%. My mom was a teacher as well, growing up, and unfortunately she was also the principal of my school, so that did not go well for me.

But wait, while you were at...?

When I was at the school, so...

Oh my. Nothing went unpunished.

But the fact is that you can understand the nuances. Today a teacher is addressing 60 kids in a room, and she has to talk at the same level for each one of them. So you can't adapt the teaching to people who come from different language backgrounds, as an example. Or there are certain sections in the book that some people will take longer to understand and some will take a shorter time. So I think AI can do a great job adapting the teaching curriculum to that student. You can take great physics professors from MIT, take that coursework, and translate it into Kannada for some person in a village in India. I think AI can play a very positive role. And back to what Tim was saying, we need your parents, Skyler, to tell us where AI should be augmenting, like taking the same lesson and creating multiple flash cards, adapting that lesson, and things of that nature. There are lots of things that you can do with AI in that space of teaching too.

So next week my parents will be on the podcast, and
they'll...

We should definitely do a parents episode, where it's just everybody's parents but none of the usual guests. That would be so much fun.

From this I've learned I need to check back in with Khan Academy. I think the last time I was there, they were YouTube videos, so maybe that space has really expanded. I need to go check back into that.

Yeah, for sure. It's cool. They're doing a lot of interesting experiments.

I want to make sure we get time for the last topic, which is a really broad one, but I think it connects a bunch of stories that have played out over the last few weeks, and it isn't really anything we've covered in much detail on Mixture of Experts in the past. The topic specifically is the relationship between generative AI and sustainability. This week was the UN General Assembly, and it was very interesting to me that the US State Department said, "We're going to bring a bunch of people together, all the CEOs of all these companies, to talk about how AI is going to be used for the Sustainable Development Goals." And similarly, IBM just released a paper fairly recently talking about some collaborations they've been doing with NASA, specifically around predicting climate and building climate models that are available. And Shobhit, I want to turn to you, because my understanding is you gave a talk or were on a panel recently specifically on this topic. I'm wondering if you can give our listeners a sense of how this connection is evolving, using this technology for these really, really big problems. As someone who hasn't been as deep in the space, I'm kind of like, "How does ChatGPT help save the world?" I know that's not the case, but if you can give us a little more color on how people are using this tech in this space.

Absolutely, Tim. IBM does a lot of work in the space. We have our own
commitment to being carbon neutral by 2030, and we're doing a great job against that already. This week I spent a lot of time in New York with a lot of global leaders and celebrities in the space, and got very humbled by the kind of problems that everybody's dealing with. The entire conversation is focused around the idea that AI can help solve some sustainability goals for us, and we need that compute power to be able to solve these gnarly problems: making predictions on what happens to climate all over the world at a very granular level, how do you forecast what events may happen, and things of that nature. There's a lot that happens in that space, like how do you optimize the cost envelope of running businesses. On the flip side, there is a climate and environmental cost that comes with running these models. Just to give you a few data points: if you ask ChatGPT, or a massive model like that, a question to go create something, it consumes a 500 mL bottle of water to answer that question. That's just the water consumption that goes behind these things, just to cool down the data centers and whatnot. Bloomberg came out with a study: all the data centers together would be the 17th largest country in energy consumption. Countries like Italy use less energy than the data centers do today. In countries like Ireland, which have become a hub where all these international tech firms have their data centers, the data centers use 12% of national energy consumption. It's more than all the households combined. So you're starting to get to these numbers where, if you look at any of these graphs of energy consumption and see where we are today, you get to a stage where companies like Microsoft are now partnering with nuclear reactors, things that had melted down, and we're now trying to resurrect them so that they can power...

It was Three Mile Island, right? Which
famously had some trouble a little while back.

So you can see how people are trying to wrangle: how do I balance the compute that's needed versus how do you look at the energy consumption? My talk was about how we have to be computationally responsible; that was the title of the talk. We were talking about how you figure out the right balance, from the chip level all the way up to how you end up using the models. And I was suggesting something like how cars come with an MPG, miles per gallon, sticker: that one number somebody can look at and say, "Yes, this is what I'm doing." When you're booking a flight, I know the carbon emissions. So as part of that, we need to be very conscious: if I'm using ChatGPT as a calculator to add two numbers versus using an actual calculator, there's a huge delta between the two...

And it'll get the answer wrong.

Exactly, right. I think there are some really good use cases where AI has been helping augment. We do a lot of work with forestation, we look at how land use has increased, we are predicting catastrophic events with governments all across the world, and we're trying to help them with wildfires and things like that. So I'm overall very impressed with how IBM has taken a position on sustainability, using AI for good. We are super focused on smaller, energy-efficient models, all the way down to how we optimize our compute. And this is also part of our whole AI Alliance with all the other companies, where we are collectively trying to reduce the threshold required to implement AI across the world, especially in Africa, in parts of Europe and Asia, and so on.

Shobhit, I like that bottle of water analogy. There was a paper that came out from Signal and Hugging Face just this last week on sustainability and the energy that's being used here, and one of the units of analysis they used is how many cell phone charges this
thing, the query, would use. The highest was image generation: a query to an image-generating model is getting close to a cell phone's overnight charge. I just really liked that kind of unit of analysis, because it brings it home so much more. Okay, I put in that query for an image generation, and now I have to think that that's the power of a cell phone for a day or two. So I think it's really cool to think about more creative metrics for presenting to the world just how power-hungry or water-thirsty these models are. Otherwise I see megawatt-hours, and I'm not an electrical engineer, so I don't really appreciate it. But you tell me how many bottles of water it is, or how many cell phone charges, and it clicks.

Yeah, that's interesting. Would you want it to be metered, so as you're using Claude or something, it's like, "Here's how much power you've used"?

Yeah, that would be really useful.

Maryam, we've done a lot of work with Granite models, and we open-sourced them. Do you want to share with the audience what we're doing with our Granite models?

With Granite we are focusing on smaller models, for the exact same reason that you mentioned. Let me just share some data points. If you look into hosting a 500 billion parameter large language model on A100s, roughly you need 16 A100s for that hosting. If you look into a 20 billion parameter model, it's just one single A100. So the API call that you send to a 20 billion model versus a 500 billion model is 16x more energy efficient, just because it's 16 times less GPU, ignoring all the cost and latency and other concerns, just for sustainability. Because of this, what we see emerging in the market is companies looking for the smallest model that makes sense and customizing it on their proprietary data. That's the data about their users, that's the
domain-specific data, to create something differentiated that delivers the performance they need on a target use case for a fraction of the cost. And by cost I mean cost in terms of energy, carbon footprint, and everything together. Those are the guiding principles for Granite. We've been focusing on smaller, enterprise-ready models that are rooted in value and trust, and that allow companies to use their own data on Granite to make a custom model. If you look into our Granite open-source models, they are released under an Apache 2.0 license. What it gives enterprises is the freedom and flexibility to customize those models for their own commercial purposes with no restrictions, which is really the power of Granite.

I love that. And Maryam, this week we also released our Prithvi next-generation models for Granite, right? Just to share with the audience: we as IBM have been partnering with NASA. As for the problem we're trying to solve: generally we have these machine learning models that make predictions, forecasting weather patterns and things of that nature. This is the first time it has ever been done where we have created a foundation model where a pixel, or a square inch of the Earth, is used as a token, and we're trying to predict what will happen next, instead of using text. So we have built this foundation model that combines weather data and climate data together in one model, and that model can then be adapted for various use cases. In the current state, if you want to do forecasting for rainfall in Florida, there'll be a completely different model; if you're trying to do deforestation somewhere else, it'll be a completely different model. So for the first time we have built a combined model that can be easily adapted, like the foundation models that we've built. And, as a mic drop, it's completely open-sourced to the community. So now you can go and take these Prithvi models from Hugging Face and deploy them
for multiple things with the same model. The next iteration, where I think we will hopefully go, is starting to do what multimodal models did. You used to have one model that did text and one model that did image, and then, just like Meta's Llama 3.2, now we're combining the two together, so the same model can do both. I'm hoping we'll come to that point with foundation models for weather and climate. We can then start to connect what's happening in two different places: the climate patterns are changing, the forestation is changing, and it'll be able to think through and combine those two. So we've made the first step towards a new future where foundation models will be able to combine all of this data together, and the same model can answer all of these questions.

Exactly. I got super excited about these models. And think about it: 40 years of NASA satellite images are at our fingertips now, with these models, to use for weather forecasting, climate prediction, and seasonal prediction, and to inform decisions for planning mitigations for climate.

That's super exciting. It's a great note to end on, because it's a model that's open source; listeners, you can go and download it and play with it if you want. And I think it's a great application. As Shobhit was talking about earlier, it's so useful to get beyond simply, "Oh, how does a chatbot save or gain sustainability?" There are all these other aspects that I think people don't think about when this topic tends to come up. Well, great, everybody. That's all the time we have for today. Thanks for joining us. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. Shobhit, Skyler, Maryam, thanks for joining us, and we hope to have you on sometime in the future.
2024-10-03 21:54