The Institute for Experiential AI is proud to invite you to our spring seminar series, featuring leading AI experts and researchers sharing cutting-edge ideas. You can expect AI leaders from a broad range of industries sharing perspectives on defining and applying ethical AI in our Distinguished Lecturer Series, and EAI faculty and Northeastern University researchers discussing their AI research in our Expeditions in Experiential AI series. Up next, join us with Jimmy Lin, professor of computer science at the University of Waterloo, for his seminar "Information Access in the Era of Large Pre-Trained Neural Models," on Wednesday, April 26, 2023 at 1 pm Eastern, online or in person at the Curry Center, room 440, at Northeastern University. Registration is required, and the seminars are free to attend. We hope you'll join us for an invigorating spring semester.

Hey, so my name is Ken Church, and I'm here to welcome Jimmy Lin. It's my great pleasure to have Jimmy here. Jimmy and I go way back. I remember when I was at Hopkins and you were at Maryland, you asked me to give a guest talk in one of your classes. Do you remember the first question that was asked? I'll never forget this question. "I actually don't." I'm sure you don't. So, the student wanted to know whether anything I was going to say was going to be on the final. And what did I say? "I think you said it wasn't." Right, and then I lost the audience; they just went to sleep afterwards. Anyway, none of this is going to be on the final either, so please take it away.

Okay, so that's permission for you all to zone out, I guess. All right, let me start at the top. What is the problem that we're trying to solve? And by "we" I mean myself, my research group, and the research community that I'm part of. It's rather simple: we want to figure out how to connect users with relevant information. Now, as it turns out, this is a relatively modest endeavor, in the sense that I really don't care that much about building AGIs, artificial general intelligences, whatever those are. I'm not particularly interested in unraveling the mysteries of consciousness, or how babies learn from limited information. I want to tackle pretty simple tasks: you've got users, you've got information they need; how do we connect them? In terms of academic silos, this is generally referred to as the information retrieval community, dealing with search, but it also encompasses things like question answering, summarization, recommendation, and a lot of other capabilities, so broadly I like to use the term "information access" to refer to this bundle of capabilities. Ideally, we want to do this over the myriad types of information that are available (text, images, video), and ideally we'd like to do it for everybody, ranging from everyday users to domain experts such as medical doctors, lawyers, and the like. For the purposes of this talk, however, I'm going to be mostly focused on text and everyday searchers.

All right, let me deal with the elephant in the room, and the elephant in the room, of course, is ChatGPT, large language models, and all the noise that we hear coming from The Atlantic, the New York Times, and the we're-all-gonna-die crowd. Over the past several months, I've had to play therapist to many of my graduate students experiencing a sort of existential angst about what's going to happen, and I'm going to share some of my thoughts on those topics with you right here as well.
All right, so the TL;DR is: none of this is fundamentally new. However, we have more powerful tools to deal with all the problems we've wanted to solve before, and it's a great and exciting time for research. It's not a time to panic; it's a time to look forward to a bright future. The bottom line is that we should keep calm and carry on. That really is the message, and I've delivered it, so since this is not on the final, feel free to file out of the room and I'll see you all for coffee later. But since you're here, I'll continue with the talk and try to offer a little more context to justify and back up this top-level message.

Okay, so I'll start with the question of where we are and how we got here. Here's where we are: this is the new Bing search engine, code-named Sydney, that was revealed a couple of months ago, and it is search as chat. This is the example Microsoft gave in the blog post launching the whole enterprise: somebody asks for a trip for an anniversary in September. What are some places the search engine can recommend that are within a three-hour flight from Heathrow? And the search engine gives quite a remarkable result. It says, here's a summary: if you like beaches and sunshine, consider this; if you like mountains and lakes, consider this; and if you like art and history, consider that. It's able to pull a lot of results from the web and synthesize everything together, complete with citations. Wow, amazing. This is the future of search, right? Okay, I'll get into that, but let's start with a history lesson.

So this is where we are now; let's talk a little bit about how we got here. Before Bing, the dominant mode of search was this. Well, it was Google. This is actually google.stanford.edu from circa 1998 or so. Prior to search-as-chat à la Bing, we had search as ten blue links: you typed in a few keywords, you got back a list of results, and you had to sort through it to figure out what was going on. Unwinding the clock a little before that: the Google era was a time when there was digitized information on the web that you could access. Before that was a time when digital materials were available, but not from the web. Most of you are probably too young to remember this, but in the 90s there was a period in which we got CD-ROMs, put them into our machines, and booted up something like the Encarta encyclopedia, because Wikipedia didn't exist; that's how we got information. Going back even further, before information was widely digitized, there were books, and we were at that stage for several hundred years. Libraries date back several more centuries before that. This is an image from Trinity College Dublin, circa the 18th century; it may look familiar, because if you're a fan of Star Wars, this is where they got the inspiration for the Jedi Archives. And if you unwind the clock back even more, you arrive at what I claim to be the first information retrieval problem. This is a cuneiform tablet from the Babylonian period, circa 700 BCE, and it is essentially a record (paraphrasing, of course) of "Jack owes Matt two sheep." There were accounting records of debts, and they put them in a storehouse,
in a granary somewhere, and then sometime later, when they had to settle the debts, they had to go and pull out those records. The first information retrieval problem; the first search problem. So we've been at this for a long time.

The next thing I'll do, to at least bring some credibility to the things I'm going to tell you, is explain where I fit in this grand scheme of things. I've been working on this problem for a while, and a lot of what I'm about to share I've experienced firsthand. So where do I enter the picture? Somewhere between here and here: slightly before this, and slightly after this. My own journey began in 1997, when I began my studies at MIT, across the river. This is an image of 545 Tech Square; the building doesn't exist anymore, but it used to be home to the MIT AI Lab, and Tech Square does not look like that anymore. My former advisor's claim to fame was developing the first question answering system for the web, and I remember being absolutely floored by it. Remember, this was the era of Encarta and the very beginning of Google, and here was a system where you could ask a question in natural language, like "Who wrote the music for Next Stop Wonderland?" and get back the answer. Or you could ask, staying with the entertainment theme, "Who directed Gone with the Wind?" and get back the answer, without a bunch of links that you then had to sort through. This was amazing, it blew my mind, and I've been working on these types of systems and capabilities since around that time. One of my proudest early accomplishments was the following. IJCAI is an annual international conference on artificial intelligence, and in 2001, over 20 years ago, it happened to be in Seattle, where Bill Gates was a keynote speaker (a very much younger Bill Gates). One of the things he talked about was AskMSR, an automated question answering system built on information from the World Wide Web, and that was my internship project. I thought that was the coolest thing. So that explains my place in this long narrative of information retrieval capabilities.

I guess this is all a slightly roundabout way of saying that the technologies have changed, but the major problem we're trying to tackle hasn't: it's still about connecting users with relevant information. Okay, so let's get into a little more detail. We computer scientists like to think in abstractions, and that's the perspective from which I'll walk you through how these capabilities developed and came to be. At the end of the day, what we want to build is a black box that looks something like this: in comes a bunch of documents. Here I'm leaving aside what's generally known as the content acquisition phase; you've got to get the content from somewhere, by crawling the web, doing data cleaning, and a whole bunch of other data preparation steps, so let's set that aside for now. You get a bunch of documents, you shove them into this black box, and sometime later, on the other end, you put in a bunch of queries and you get out some results. And lo and behold, they're amazing, a light bulb goes off, and you're happy. So how do you build this system? As it turns out, we've known how to build systems like this for many, many years, for decades in fact.
This is the vector space model that some of you are already familiar with for doing search. It was proposed in 1975, although some of the ideas date back even further. If you zoom in on some of the diagrams presented in that paper, the figures will look familiar; here's an example of the vector space with query and document vectors. And if you pull up the text, what does it say? It says you're going to represent documents by t-dimensional vectors, where the j-th entry represents the weight of the j-th term, and then you're going to use the inner product to compute query-document similarities, and you're going to do search that way. And that was in 1975. So we've had some idea of how to do this reasonably well for quite a while.

So a first cut at this black box would be something like the vector space model. Let me go into a little more detail on what that actually entails. The documents come in, you run them through some type of term weighting scheme, and you get out a bunch of vectors. Here's an example passage; it's about the Manhattan Project and the effort to build the atomic bomb during World War II. From it you would get a bag-of-words representation that looks something like this (I've just written it in JSON format): you get all the terms, with weights assigned to them by the term weighting function. You take all these document vectors and shove them into an inverted index, and then you're happy for a while, until the queries come in. The queries get represented as multi-hot vectors in the same vector space, you perform top-k retrieval, and you get out the top-k results. And then you're happy.

So the TL;DR, and of course I'm being incredibly unfair here, is that research in information retrieval from the 70s through, say, the 90s was mostly about how to assign term weights, and at the end of the day, BM25 is what the entire community converged on. Here's the formula for BM25. The details aren't important, except to recognize that the term weights are a function of things like term frequencies, document frequencies, and document lengths; that is, statistical characteristics of the terms and the documents they occur in. So that's a high-level overview of what's going on inside this black box.
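To make that black box concrete, here is a minimal sketch in Python of the kind of system just described: a term-weighted inverted index with top-k retrieval via BM25-style scoring. The whitespace tokenizer, the toy documents, and the parameter values (k1 = 0.9, b = 0.4 are common defaults) are illustrative choices only; real systems differ in the exact BM25 variant they implement.

```python
import math
from collections import Counter, defaultdict

# A minimal BM25-style index: documents go in as weighted bags of words,
# results come back out through top-k scoring, exactly the black box above.
class BM25Index:
    def __init__(self, docs, k1=0.9, b=0.4):
        self.k1, self.b = k1, b
        self.docs = [d.lower().split() for d in docs]
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for i, doc in enumerate(self.docs):
            for term, tf in Counter(doc).items():
                self.postings[term][i] = tf

    def search(self, query, k=10):
        scores = defaultdict(float)
        for term in query.lower().split():
            df = len(self.postings.get(term, {}))
            if df == 0:
                continue
            idf = math.log(1 + (self.N - df + 0.5) / (df + 0.5))
            for doc_id, tf in self.postings[term].items():
                dl = len(self.docs[doc_id])
                # Term frequency saturated by k1, length-normalized by b.
                tf_norm = tf * (self.k1 + 1) / (
                    tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl))
                scores[doc_id] += idf * tf_norm
        return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

index = BM25Index([
    "the manhattan project developed the first atomic bomb",
    "hydrogen and helium are the two lightest elements",
])
print(index.search("atomic bomb"))  # -> [(0, <score>)]
```

Note how the score is just a sum of per-term contributions, which is precisely what makes the inverted index such an efficient data structure for this family of models.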
Okay, let's hop in our time machine and skip ahead a few years. I realized after making my slides that many of you are probably too young to get what this means: this is the DeLorean from Back to the Future, the time machine from a classic movie, but these days when I show it I just get blank stares. So anyway, imagine a time machine. We're hopping from the 70s to not quite the modern day, because this is February 2023, but to just before the previous major innovation. And what was that? That was BERT. BERT popped onto the scene circa 2018.

So what is BERT? BERT is Google's magic pre-trained Transformer model. When it came out, the authors said it could do a variety of things: classification, regression, named entity recognition, question answering. It does your homework, and of course it walks the dog too. Okay, but what does it actually do? Google wrote a blog post in October 2019 that sketches out, at least at a high level, how they applied BERT to search. Not to be outdone, Bing came back the next month and said, oh yeah, we're using some of the same models too, and it led to all sorts of increases in how effective Bing was. So, yay BERT, yay Transformer models. But what's actually going on? Let me share some thoughts on that.

Instead of having a single black box, the architecture we as a community basically settled on is a two-stage architecture: in the first stage you select some candidate texts, and in the second stage you try to "understand" those texts. I put "understand" in quotes, because the first stage really means using BM25 (maybe not literally, but something like BM25), and the second stage really means taking the results from keyword search and doing some type of re-ranking on them. What does that mean? Let me talk you through it. This is the use of BERT in a technique known as a cross-encoder. Here's the complicated architectural diagram of the Transformer layers in BERT, but at a very high level, what BERT does is compute a function of two input sequences. In the original paper, it's sentence one and sentence two, and the output is whether the two sentences are paraphrases of each other. Applied to information retrieval, you feed in a query and a candidate text instead of the two sentences, and you ask BERT for a probability of relevance; you're essentially using it as a relevance classifier. The re-ranking works something like this: you get the candidate texts from the first, keyword-based retrieval stage (here's BERT peeking in from the side), you iterate over each candidate, and you ask, how relevant is candidate one, how relevant is candidate two, how relevant is candidate three, and so on. Based on the answers given to you by the BERT model, you re-rank and get a better set of results.

The roots of this idea in fact go back many, many years. This is a journal article from 1989, and honing in on the abstract, you'll see it's by an author from a country that no longer exists as an independent entity, just to show you how old it is. If you read the abstract, it talks about estimating the probability of relevance given some feature representation of the documents. That's exactly what I just told you about, except now it's done with BERT. Today we'd call this pointwise learning to rank: back then they were using relatively primitive features, while today we use BERT for the same task, which is much more sophisticated of course, but the underlying principles are the same. BERT is a neural network, but neural networks have been applied to re-ranking in information retrieval for a long time as well. This is one of the earlier examples (the idea probably dates back even further), but already in 1993, people were talking about using three-layer feed-forward neural networks for estimating query-document relevance.
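As an illustration of this re-ranking step, here is a short sketch using the sentence-transformers library. The particular MS MARCO checkpoint and the toy candidates are illustrative choices, not the models Google or Bing actually deploy.

```python
from sentence_transformers import CrossEncoder

# Cross-encoder re-ranking: each (query, candidate) pair is fed *jointly*
# through a BERT-style model acting as a relevance classifier.
# The checkpoint below is a public MS MARCO re-ranker, chosen for illustration.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "who developed the atomic bomb"
candidates = [  # imagine these came back from first-stage keyword retrieval
    "The Manhattan Project produced the first nuclear weapons in World War II.",
    "Helium is an inert gas often used in balloons.",
    "Oppenheimer directed the laboratory at Los Alamos.",
]

# One relevance score per (query, candidate) pair; sort candidates by it.
scores = reranker.predict([(query, passage) for passage in candidates])
for score, passage in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.2f}  {passage}")
```

The key design point is that the query and the candidate attend to each other inside the model, which is what makes cross-encoders accurate but too slow to run over an entire collection, hence the two-stage architecture.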
All right, so we've been doing this for a while, or at least we've had the aspiration to solve the problem in this particular way for a while. Okay, so BERT was the previous major innovation; how does it relate to this Bing thing? Well, the connection is quite clear: they both draw their intellectual ancestry from Transformers. This is the typical architecture diagram of a Transformer: you have the encoder, which reads an input and generates latent semantic representations from it, which you then feed to a decoder that generates output. BERT and all the models I'm going to briefly talk about trace their intellectual lineage back to the 2017 paper "Attention Is All You Need"; the last time I checked, that paper had something like 70,000 citations. So the Transformer architecture was proposed in 2017, and the way they trained the model was from initially randomized weights. The next innovation after that was, of course, GPT. This is the original GPT: now we have GPT-4, before that GPT-3.5, before that GPT-3, but this is the original. They basically said, let's take the Transformer architecture, lop off the encoder part so it only has the decoder, and add pre-training using autoregressive language modeling on top. Around the same time came BERT, and BERT was essentially the opposite: encoder-decoder is too much, so let's lop off the decoder, work only with the encoder, and pre-train it using masked language modeling as the objective. And then finally, in 2019, came the T5 model, which was a pre-trained version of the entire encoder-decoder network. So all of the models we're hearing about today share a common lineage dating back to the original Transformer architecture.

All right, back to this two-step process of doing search, of doing information access: you select candidate texts, and then you re-rank them somehow by trying to "understand" them (again in quotes) to get a better result. These days this is mostly done with Transformer-based architectures: encoder-only, decoder-only, or encoder-decoder. Now, what about innovations in the first stage, selecting the candidate texts? As I told you, that used to be done with something like BM25, a bag-of-words term weighting scheme; as a reminder, you feed it a document and you get back a feature vector that looks like this. As it turns out, you can simply rip out the BM25 weighting function and replace it with encoders based on some type of Transformer architecture. This works by feeding the Transformer a large number of query-document pairs and essentially asking it to do representation learning: when you see a query and a relevant document, you want their representations to be close together, and when you see a query and a non-relevant document, you want to push their representations apart. By doing that with neural networks, you learn good representations for queries and documents. It's very much the same high-level design: instead of a bag-of-words representation, you feed the model a piece of text and you get out a dense vector representation, typically something like 768 real-valued dimensions (to pick a number), where, putatively, the dimensions of the vector represent some type of latent semantic meaning. But don't read too much into that.
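Here is what that looks like in code, again as a sketch with the sentence-transformers library. The checkpoint is a small public encoder picked for illustration (it happens to produce 384-dimensional vectors rather than 768), not a production search-engine model.

```python
from sentence_transformers import SentenceTransformer, util

# Bi-encoder dense retrieval: queries and documents are encoded *separately*
# into dense vectors, and search is nearest-neighbor comparison of vectors.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "The Manhattan Project developed the first atomic bombs.",
    "Helium is an inert noble gas; hydrogen is highly flammable.",
]
doc_vectors = encoder.encode(docs, convert_to_tensor=True)  # built offline

query_vector = encoder.encode("who built the atomic bomb",
                              convert_to_tensor=True)
for hit in util.semantic_search(query_vector, doc_vectors, top_k=2)[0]:
    print(f"{hit['score']:.2f}  {docs[hit['corpus_id']]}")
```

Unlike the cross-encoder, the documents never see the query at encoding time, so the document vectors can be indexed ahead of time and searched at scale; that independence is the whole point of the design.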
Okay, so when you hear all the discussion about "vector search," this is what people are talking about: nearest-neighbor search over document vectors using a query vector. You have a bunch of document vectors encoded by a Transformer-based encoder, a query vector comes in, and you're still doing top-k retrieval. So Transformers have been used in search since 2019: even though you only heard about ChatGPT a few months ago, the same types of technologies have been deployed at Google and Bing for several years now.

All right, the next question I want to address is, what really is the big advance over what came before, the mode of searching we know by the phrase "the ten blue links"? I like to talk about abstractions, so let's go back to boxes and filling in boxes. At a high level, what is ChatGPT? It's a large language model, fed a pre-training corpus and some instructions: step one is autoregressive language modeling, and step two is instruction fine-tuning using reinforcement learning from human feedback. You build the large language model, in goes some type of prompt, and out comes some type of amazing response, a "completion" as they call it. So how does Bing work? The new Bing search is using a large language model; we now know it's GPT-4 behind the scenes. In comes a query, and out come amazing search results, like the example I showed you a few slides ago. But how does it actually work? It works by talking to a retrieval model. The query comes in, the language model does something, it sends a query to the retrieval model, the retrieval model does some searching and returns results, and the language model post-processes them. I'll give you an example of exactly how that's done in a bit, but first I want to open up this black box and ask, what's going on in here? Well, in fact, I just spent the last twenty minutes telling you what's going on in that box. All the things I discussed, the Transformers that have been used in search since 2019: all of that is now squeezed into this box.

This is the approach generally known as retrieval augmentation. The large language model is not doing everything by itself; it's calling out to a retrieval model that supplies it with candidate texts, which it then post-processes in some way. Retrieval, in other words, forms the foundation of information access using large language models. And you don't have to believe me: this is exactly how Bing works. The URL got chopped off the slide, but Bing's blog post describes exactly this type of architecture. They have an orchestrator on top, but it's essentially GPT talking to a retrieval model, where the retrieval model is the Bing search engine. Bing calls this "grounding," and I'll illustrate the effect of grounding with an example. Without grounding, without retrieval augmentation, you could just ask the large language model something like "Tell me how hydrogen and helium are different," and it will give you something reasonable. If you want something better, what you can hand the large language model is a prompt that looks like this: "Given the following facts, tell me how hydrogen and helium are different," and you put in a bunch of facts about them. It will then construct an answer that is much more to the point, because you've told the large language model: I want the answer, but I want it grounded in this.
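A minimal sketch of that grounding pattern follows. The retrieval step and the actual LLM call are left abstract, since any search component and any chat-completion API slot in here; the facts and the prompt wording are purely illustrative.

```python
# Retrieval augmentation in miniature: fetch grounding passages first, then
# splice them into the prompt so the model answers from those facts rather
# than from its parametric memory alone.
def build_grounded_prompt(question, passages):
    facts = "\n".join(f"- {p}" for p in passages)
    return (f"Given the following facts:\n{facts}\n\n"
            f"{question}\n"
            "Base your answer only on the facts above.")

retrieved = [  # in a real system these come from the retrieval model
    "Hydrogen is the lightest element, with one proton per atom.",
    "Helium is an inert noble gas with two protons per atom.",
]
prompt = build_grounded_prompt(
    "Tell me how hydrogen and helium are different.", retrieved)
print(prompt)  # this string, not the bare question, goes to the LLM
```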
This is what retrieval augmentation provides. So, rolling it back and not talking about Bing and ChatGPT in particular, I'm going to spend the next few minutes talking about this interaction a little more: what are the roles of large language models and retrieval?

To explain that, I have to make a confession, and the confession is that I lied. I lied mostly by omission; yes, I hallucinated too. What is the problem we're trying to solve? On the first slide, I stated that I want to solve the problem of how to connect users with relevant information. What I forgot to add is why we are doing it. That's the critical omission. Some of our colleagues have expounded on this at length. Nick Belkin, a senior information retrieval expert, would say something like: we want to connect users with relevant information to support people in the achievement of the goal or task that led them to want that information in the first place. Some people would say we want to connect users with relevant information in order to address an information need. I think that's accurate, but it's kind of circular: you want information because you have an information need. It fits, but it's not ideal. Some people would say we want to connect users with relevant information to support the completion of a task. That's very practical and accurate, although not particularly aspirational: we want to connect users with information so they can do a task. What I really like is what my colleague Justin Zobel says: we want to connect users with relevant information to aid in cognition. Now that's something aspirational we can all latch onto. Now you say: cognition, seriously? Isn't that too grandiose? Absolutely not. If you look up the dictionary definition of cognition, here's what it says (this is from Merriam-Webster, I believe): "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses." That exactly describes what we're trying to do.

So looking at the broad picture, I've focused on this black box, but in fact I've drawn it too big. This, cognition, is the real process, and the box should actually be cut down to size, because the process by which you get results exists within a broader set of activities, of which it is just a small piece. Let me walk you through some of them. For example, the query has to come from somewhere: it comes from somebody with an information need, and that need has to go from some abstract information itch that I want to scratch to something I actually type on a keyboard or speak into my device. We haven't invented mind-reading computers yet, not yet, so we still have to go through that query formulation process. And how often do you get the results and think, that's it, I'm done, I got exactly what I needed? No, that doesn't happen. There's an interactive retrieval process, where you read the results and think: ah, that's not quite right, I don't want that; oh, here's an interesting keyword, let me put it back into the search. There's an interaction loop that goes on. And when you're doing this,
you're doing it in the context of some task you're trying to accomplish. This (it got cut off on the slide, again) is a task tree diagram from a recent paper by Chirag Shah et al., and it shows that the things we're trying to accomplish can be hierarchically decomposed. For example, say you're trying to plan a vacation. What are the subtasks? Well, you've got to figure out where to go first, you've got to research each destination, then you've got to book the flights and the hotels and the activities, and so on; it decomposes into a tree-like structure. Each of these subtasks might happen on a different device: I may start in the browser, then come up with a brilliant idea and ask some questions on my phone, then move back to the computer for other types of tasks. And this may occur temporally distributed over days, if not weeks. And of course there's synthesis: you have multiple queries, you've got to look at the results, pull in the relevant portions, and synthesize them together, and all of this goes round and round in an iterative manner before you finally complete the task and have that light bulb go off in your head.

Previously, we focused mostly on this box. Why? Because it's the part of this whole broader process where we could make headway as computer scientists. But the cool thing is that with large language models, we now have the tools to tackle the other parts surrounding the core retrieval black box. And in fact, this is where the Bing interface is trying to go. Let me talk about some specifics. Before large language models, in the Google ten-blue-links view of search, you had to come up with query terms yourself: you looked at the results and said, hmm, that's a good query term, I'll plug that in; oh, that's a bad term, I'll throw that out; and you iterated with another query. With large language models, you get much more natural interactions through natural language; that's one key way LLMs may have improved the process. Here's another: before, you had multiple queries and multiple results, and you all know this one, each appearing in a browser tab, and your browser window has way too many tabs, and you're flipping through the tabs trying to synthesize the results. Now you get some level of automated synthesis, aided by the large language model. And another: before, you had to manually keep track of subtasks. To plan a vacation, you had to do some preliminary research, dive into each of the options, and finally make the reservations. LLMs will try to guide you through the process in a helpful way, which alleviates a lot of the burden of navigating this task tree. So this is where Bing wants to go.

But of course, none of this is fundamentally new. Let me give you some examples. Here's a paper from the Computational Linguistics journal in 1998, 25 years ago, about generating natural language summaries from multiple online sources, by Dragomir Radev and Kathleen McKeown. Here's another; I don't need to tell you the date because it's already up there, over 15 years ago. This is the task overview from one of those community-wide benchmark evaluations, and if you look at one of
the paragraphs, which I've blown up: what is the task? The task is answering a complex question by synthesizing a well-organized, fluent answer from a set of 25 to 50 documents, where the answer has to be more than just a name, date, or quantity, and the summaries are evaluated for both content and readability. Fifteen years ago, we were already trying to do this. And even the idea, the metaphor, of information retrieval as chat goes back further still; it goes back decades. There was a time when you couldn't use the search system yourself: you had to go to a library and talk to a librarian, who carried out a mediated interaction with you. They conducted what's known as a reference interview: they asked you questions about the thing you were trying to find, and they were the ones who accessed the online databases behind the desk. So Bing is taking its metaphor from something that is decades old.

Coming back to this: none of this is fundamentally new, but the key point is that these large language models allow us to do everything better. Before, these goals and capabilities were more aspirational; today, with large language models as a tool, we actually have the capability to execute on this vision, and that's why I think it's really, really exciting. Let me give you an example. Back in 1998, when Radev and McKeown were working on multi-document summarization, it worked, at a high level, because they did a lot of manual knowledge engineering. The system had a lot of templates, essentially ontologies, that were manually engineered. It was designed to summarize a specific class of events, related to terrorist acts and the like, and it was only with the aid of these knowledge templates that they were able to make headway on the problem. As soon as you fed in human-interest stories or something like that and wanted the system to summarize those, it basically fell apart. Today, we don't need that knowledge engineering: LLMs allow us to execute on the same idea, but in a much better way. So the message I'm trying to convey is that before, we focused primarily on this box because it was the part of the broader information ecosystem where we could most easily make progress, but with LLMs as a tool, we can now tackle the entire problem.

Okay, let me start to wrap up with a few examples. Here is an example of Bing search not doing so well. This is from a tweet thread I pulled off Twitter a few months ago. The question it's trying to answer is: recommend me some phones with a good camera and battery life under a particular budget. And you see that in the answer, it makes a lot of things up; it just doesn't get it right. It gets the price wrong here, and the price wrong here; it gives one camera the wrong megapixel count, another phone the wrong battery size, and so on and so forth. We know this today as the hallucination problem. It's gotten better, but it's still not solved. I think we have the tools to solve it, though: retrieval augmentation, which I described to you, is the most promising solution to this challenge, in my opinion. To recap: instead of asking how hydrogen and helium are different, in the working example we tell the large language model to tell us how hydrogen and helium are different given all these facts, and if you hand it a bunch of facts,
it's far less prone to hallucination. And where does that bunch of facts come from? Well, that's this box. So this is what I say to my students: we've been working on this for a while, and today it is not only important but critical, because we all know the garbage-in, garbage-out phenomenon. If the grounding, the facts you feed to the large language model, is garbage, you're going to get garbage output, which makes the retrieval component, as I said, not only important but essential. But this also means that the large language model, once it's been given the facts, shouldn't screw them up. That's another, complementary line of research we need to explore further. So: none of this is fundamentally new, LLMs allow us to do it better, and there's still plenty left to do.

Now let me circle back to the problem we're trying to solve, how to connect users with relevant information to aid in cognition, and step up to an even higher level. At the end of the day, we start with some artifacts, digital or maybe physical, we go through some process, and out the other end comes some type of cognition, some type of greater understanding. In the beginning, this was mostly manual human effort. What we've noticed is that with each new generation of technology, we've gotten more and more system assistance. So the interesting question is: now that our tools have gotten better, what happens? Do we all become fat and lazy, like in WALL-E, or is there an alternative future? One simple answer is that we become far more efficient. But I think the better answer is that these tools free us to do more, to tackle tasks of greater complexity. I don't think it's about AI, artificial intelligence; it's really more about IA, intelligence augmentation. All this talk I've been hearing about AIs replacing humans: that is not the right discussion to be having. It's not about replacing; it's about assisting and augmenting human capabilities.

So, coming back full circle: with so much noise from ChatGPT, LLMs, and the we're-all-gonna-die crowd, where are we? You already know the answer. The TL;DR is that none of this is fundamentally new. People have needed access to information for literally thousands of years. Transformers have been applied to search since 2019. Multi-document summarization is at least 20 years old, and the whole idea of search as an interactive dialogue dates back even further. Technology has augmented human cognition for centuries. The key difference now is that we have more powerful tools, tools that make us more productive and expand our capabilities, and there's still plenty left to work on. So the message is "keep calm and carry on," but the more optimistic version is that it's an exciting time to do research. That's all I have; I'll be happy to take any questions.

Okay, do we have any questions? The rules here are that you've got to speak into the shiny part.

Hello. You mentioned something about query reformulation. In the beginning, we used to type something into Google, find something else, and not be satisfied. With large language models, we have a very similar procedure with prompt engineering: you try to engineer the prompt, get some results, you're not happy with what you
get, and then you re-engineer the prompt. So how exactly does this solve the problem of query formulation?

Ah, so the question is: before, you just put in keywords and changed the keywords, and today you try a prompt, the prompt doesn't work, and you've got to try another prompt; so how have things gotten better? I think things have gotten better because the models are better able to understand your intent, so the amount of low-level fidgeting with queries has decreased. What you're trying to do with a prompt is formulate what you're trying to accomplish at a higher level: it's not about choosing the right keywords, it's about, say, "that tone is not quite right, so I want to change the tone." In that way, I think it's an improvement; that's point number one. Another way to address it is that, look, prompt engineering didn't exist three months ago. We're at the beginning of this revolution, and learning how to do better prompt engineering is like learning how to better search Google in 2003, which was a valuable skill; people didn't know how to search Google. So in some ways things have changed, and in some ways they haven't. Does that answer your question? "Absolutely."

Okay, over here. (Excuse me... oh gee, I could really mess things up here. Can you pass this back?) Thanks so much; very interesting, and I like how you laid it all out. In the beginning, you mentioned there's a process of document collection, and I'm wondering: it seems that plays an even more important role when you get to specific use cases. A travel company, say, might have to collect its documents in a particular way and fine-tune the model in a particular way. I wonder if you have any thoughts about that process and the role it plays in the development of these systems.

The answer is: you're absolutely right, and this all goes back to garbage in, garbage out. People have expressed concern about the data being fed into these large language models. Number one, we actually don't know exactly what GPT-3.5 and GPT-4 are trained on. We have some idea, and we're pretty sure they've ingested a large portion of the web, including all the toxic material found on the web, and that is a point of concern. But as we move forward, and these technologies become more commonplace, more commoditized, I think it will become more and more practical to essentially train your own models. There are open-source models, not quite as capable but fairly capable, that you can download: the so-called jailbroken LLaMA models, and other open-source alternatives like Dolly that people have been playing with. You can download them, further pre-train them on your own internal data, and change the alignment by doing your own instruction fine-tuning. So I'm optimistic, because there are a lot of options for solving all these problems.

Okay, next. Hi Jimmy, thank you for the really interesting talk. You were presenting the idea of using retrieval to improve, or at least manage a little, the hallucinations of large language models. So what went wrong with Bing? Because Bing has this very architecture you were describing, but it still had all these factual errors.
Okay, so there's a separate question of whether these technologies were deployed prematurely. I think that's one of the concerns here: these large companies may perhaps have rushed the models out to market before they were quite ready. But you'll notice in the slide I have up that there are actually two steps. The first is that you've got to give the model good grounding: if you retrieve misinformation, the language model is going to spew out more misinformation. If you actually gave it articles from the New England Journal of Medicine and asked it about vaccine effectiveness, for example, it would do a reasonable job. But I was careful to say, on top of that, that we've got to make sure the language model doesn't screw up the facts you feed it via retrieval augmentation. Things are getting better, but we're not quite there yet. I think this is a technical problem we can make progress on. Does that answer your question?

So I'm going to queue up a question from the web, the last one we saw, but before I get there, I want to set it up. The hallucination problem is clearly a big problem, and I think you were outlining a solution: we want to identify each of the facts in the statements, and then we want to attribute all the facts to something credible, so it's sort of an attribution problem, or a retrieval problem. Now, could you read the last question we have coming in? "With enough precise context included, LLMs might give a better factual response, as there are numerous knowledge graph databases and other forms of knowledge. Do you feel we can make LLMs respond with facts efficiently, and what's your suggestion for dealing with this?"

Oh, absolutely. I have to thank the asker of the question; in preparing these slides, there are always things that end up on the cutting room floor. The more general form of a large language model depending on a retrieval model is a large language model depending on external knowledge sources and external APIs, of which a search engine or a QA engine is just one. There's nothing to prevent the large language model from querying a SQL database, from issuing a SPARQL query to a knowledge graph, or from issuing a query to a real-time feed of the latest scores from last night; in fact, that's exactly how Bing is able to answer questions about the game last night. So that's a very promising area of future work.
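To sketch what that could look like, here is a toy orchestration loop where the LLM is stubbed out. A SQLite table stands in for any external source (a SPARQL endpoint or a live scores API would work the same way), and every name and prompt in it is hypothetical.

```python
import sqlite3

# Toy version of an LLM calling an external knowledge source: the model
# emits a structured query, an orchestrator executes it, and the rows are
# fed back as grounding. llm() is a stand-in for a real chat-completion API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (home TEXT, away TEXT, home_pts INT, away_pts INT)")
conn.execute("INSERT INTO games VALUES ('Celtics', 'Raptors', 121, 102)")

def llm(prompt):  # hypothetical stub; a real model would generate these
    if "write a SQL query" in prompt:
        return "SELECT home, away, home_pts, away_pts FROM games"
    return f"Grounded answer based on: {prompt}"

question = "What was the score of last night's game?"
sql = llm(f"Given the table games(home, away, home_pts, away_pts), "
          f"write a SQL query to answer: {question}")
rows = conn.execute(sql).fetchall()       # execute the model's query
print(llm(f"Facts: {rows}. Now answer: {question}"))
```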
Wonderful, thank you for the very nice talk. I wonder, and this is more on the retrieval augmentation stuff: we have these two components, these giant LLMs and retrieval models that do a pretty good job, but they're totally decoupled. I completely agree that retrieval augmentation is likely a path forward for addressing some of the hallucination issues, but nothing in the standard LLM objectives encourages or necessitates that the model actually pay attention to the context you're prompting it with. So I wonder if you have thoughts on, say, better pre-training objectives for the LLM component that enforce that as a constraint by construction, or other thoughts on how you actually realize that kind of criterion.

That's a really good question, and it's something I've thought about. I can at least restate the problem in a slightly different way: whenever you have these decoupled components, you pretty much lose differentiability, and as soon as you lose differentiability, you lose a lot of what makes these techniques work, namely the ability to train end to end. Various people have been trying to have their cake and eat it too, with attention mechanisms that allow you to access the external knowledge sources directly. There have been various attempts along those lines; they're all promising, and this is another example of why it's such an exciting time to do research right now. Does that help? I think I know what the problem is, but I don't have a solution, and if I did, it would be my next NeurIPS paper or something like that.

Thank you so much; it was really nice to hear your views. I would also like to ask: without losing differentiability, is there a way to get the biases that generally creep in out of these models? I'm sure you were expecting that question.

Yes. I think what gets lost in this discussion about bias, and more generally the question of alignment, is that we often lose sight of the fact that, at the end of the day, we're going to need models that are aligned differently for different tasks and different audiences, based on different cultural backgrounds, different expectations, different domains, etc. So I think it's less helpful to talk about how we get bias out of the models in the general case, and more helpful to focus on how we get bias out of the models for medical diagnosis, for job recommendations, for particular downstream tasks. If we start to look at the problem from that perspective, it becomes a little more tractable. Does that help?

All right, I think we're about out of time, but I did want to end on one note. I have some students who are feeling a little discouraged by all the hype they're hearing, and they're wondering whether it's pointless to keep studying this stuff because it's all been solved. But I suspect that your program, the IA program, is probably going to take about as long as the other programs you've described. So do you think it's going to happen before you retire? "I don't think..." You're on the record. "Yes and no. Yes, in the sense that our tools will become more capable and we'll be able to do more, so from that perspective, the process will be never-ending. And yes, in the sense that the problems we think of as problems today will, I think, be solved by the time my career comes to an end. For example, the hallucination problem, and I'll say this on the record, is not going to take as long to solve as we think, because the way I've formulated it, as a retrieval augmentation problem where the language model's job is just not to screw up what the retrieval module gives it, makes it a much more tractable technical problem. And with respect to bias, if we start thinking about properly aligning models for particular downstream tasks, we're going to see these problems solved in a much more practical manner."

You'll have to repeat any of that... ah, so what about AGI? I anticipated this question, and I addressed it at the very beginning. For the purposes of my own research program, I just want to connect people with the
relevant information they want. To be honest, I don't even know what an AGI is; I have to this day not heard a precise definition of one. So I say, people can do other things, like unravel the mysteries of consciousness and build AGIs; I'm going to focus on something I think is much more practical, but still very impactful.

So let me end by going on the record and saying I think you're wrong. I remember there are recordings of people in the 50s saying that all the problems in machine translation would be solved in five years, and it's more than 50 years later and we're still working on it. I have a feeling this is a good time to be studying this stuff, because there's no problem with employment: you're going to be busy. And I think you'd be lucky to get the progress you have in mind before you retire. "Okay. All right, but anyway, we'll see if you're right." We'll see if I'm wrong. When you retire, at the retirement party, we'll have this discussion. "Okay, great. Let's thank the speaker."

[Applause]