Workshop on Foundation Models, Session I: Opportunities and Responsibility


So the first session is on opportunities and risks for foundation models. We'll start with our first keynote talk by Jack Clark. Jack is a co-founder of Anthropic, an AI safety and research company working on building reliable, interpretable, and steerable AI systems. He is also co-chair of the AI Index, which tracks AI trends over time. Previously, he was a policy director at

OpenAI, where he helped shape policy around efforts like GPT-3. So Jack, the floor is yours.

Thank you very much, Percy. And I promise that Percy and I didn't coordinate, but I am going to give a presentation that makes a couple of the points that Percy made, albeit in perhaps a slightly blunter form. What I'm going to talk about today is how big models (and I'll get to why I'm using the term "big models"), foundation models, are becoming parts of the ecology of the internet. They're going to change the environment that we all operate in. And at the same time, they're going to be

influencing the power relationships between different AI actors. This presents opportunities for coordination. It also presents some risks which we need to pay attention to. And I think that these models represent a real grand challenge for society to try and analyze and integrate safely. As Percy alluded to, these can be incredibly useful tools, and they can also be harmful via bugs, or harmful precisely because of how capable they are.

So I'm going to give a sort of overview talk which tries to touch on some of these issues, and where Percy has covered the same ground I'm going to go relatively quickly. First, what are these models? Well, the foundation models paper from Stanford defined them as models trained on broad data at scale that can be adapted to a wide range of downstream tasks; examples are BERT, GPT-3, and CLIP. I think that's pretty good. I prefer a slightly different definition, which is that I think of these models as big models, and the reason I think of them as big is that one of the defining traits of these models is their resource intensity. They cost a lot to train. They need a lot of data to be poured into them. They require a lot of engineering resources, and it's that bigness, that scale, which relates to the power that these models have and how they change power relationships in AI development. So that's why I'm using this term, but I think "foundation models" also captures

the idea quite well. So let's just quickly refresh on why we care about these models. Because, as Percy said, these models can do few-shot learning, they can generally be adapted to a very broad range of downstream tasks. And they also natively have just a huge range of

capabilities. We have models that can do, within the same model, text generation, text classification, and string prediction. The same techniques you use to train foundation models for text you might use to train models to do material science or chemistry analysis, and it all leads to a similar property. They can do data transformation, and as people likely know, data transformation is an incredibly economically valuable thing, which partly explains why these models are being deployed so broadly.
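To make the few-shot, data-transformation point concrete, here is a minimal sketch of the kind of prompt involved; the task, the worked examples, and the complete() stub are illustrative placeholders rather than any particular vendor's API.

```python
# Minimal sketch: a few-shot prompt for a simple data-transformation task.
# The examples and the complete() stub are placeholders, not a specific API.

FEW_SHOT_PROMPT = """Convert each date to ISO format.

Input: March 3rd, 2021
Output: 2021-03-03

Input: 4 July 1999
Output: 1999-07-04

Input: {query}
Output:"""


def complete(prompt: str) -> str:
    """Placeholder for a completion call to a large language model."""
    raise NotImplementedError("wire this up to whichever model you use")


def to_iso_date(date_string: str) -> str:
    # The same pattern (a handful of worked examples plus a new input)
    # adapts the model to many other text-to-text transformations.
    return complete(FEW_SHOT_PROMPT.format(query=date_string)).strip()
```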

These models can also be plugins to other models: they can become world models for other systems, like robots, or other things that other panelists are going to talk about. On the flip side of these capabilities are issues. Percy did a good job of going over those, so I'm going to go quickly here. But as we know, they have biases, they have the potential to give inappropriate responses, and they have the potential to give dangerous responses, because

you could ask a model how to build a bomb, and eventually it might tell you how to build a bomb, which raises a broad set of confounding issues. They have this broad potential for utility and also this broad potential for misuse. And

probably most importantly, they're currently quite difficult to interpret. We know that these models can do a broad range of things, but we don't necessarily know what is going on inside a model that lets it do these things, and that also relates to our ability to characterize capability emergence. We know that new capabilities emerge out of these models as you train them, many of which are desirable, but we don't really understand the process by which that emergence occurs. And we can't really predict what capabilities are going to come out of a big,

a big training run. So this presents huge opportunities and challenges, and a good conceptual frame I've found for thinking about these models is to think of them as funhouse mirrors. What I mean by that is that any sufficiently large model will take in a huge amount of data, and it will magnify some of that data and minimize other bits.

What this means in the context of these models is that certain types of culture get magnified by these models and certain types of culture get minimized; certain identity groups get magnified and certain identity groups get minimized; stereotypes get magnified and stereotypes get minimized; you get the picture. And this is a huge issue. It means that these models behave differently for different parts of the landscape they've been trained on, and where these models intersect with culture is where a huge range of really complicated issues start occurring, to do with bias, to do with how these models behave for different groups of people, to do with effectiveness. I think that using this frame of a funhouse mirror gets at some of the challenge here: we know what these things are, but they have this very challenging property. And as Percy said (and we really do need to emphasize this), these models are being deployed right now. It's very much not an academic concern anymore. It's an economic concern. And when things shift into economic concerns, power dynamics change,

and you start to enter a world where huge amounts of capital have an interest in the deployment and development of these models. And when capital has an interest in something, it tends to want to benefit the most capital-oriented entities, which doesn't necessarily include academia or civil society or government. So we're currently in an incentive structure that points towards these models being developed primarily by capital-oriented actors and not by the whole of society. And as I'm going to get to later in this talk,

I think that's one of the really big issues we need to grapple with at this conference today.

So who's building them? The GPT-3 replications that exist are by OpenAI, the originator; Huawei, a Chinese telecommunications company with a good research team; and AI21 Labs, an Israeli startup. There are also replications being worked on by Sberbank, a really big bank in Russia with a good AI research team, and by some other entities, many of which are companies; indeed, all of these public replications are by companies. There have also recently been code models developed, and these code models, much like foundation models,

trained on very large amounts of code, show capability emergence, have broad utility, and have some challenges, but all of the people developing them so far are companies: specifically Microsoft, GitHub, and OpenAI via Copilot, and OpenAI via its own services. And Google just published a paper on program synthesis with large language models, which indicates it's thinking intently about this area as well.

This fits into a broader pattern that we really need to emphasize, which is that if you look at recent breakthroughs in AI research (and by a breakthrough I mean something that was notable to people in the field and/or economically useful and was subsequently integrated into business), you notice that there's a correlation between these breakthroughs and compute usage; specifically, you'll see the amount of compute used to train these things go up over time. At the same time, we're seeing the actors developing these models change from academia to companies. Here's a graph from OpenAI, which my colleagues and I made when analyzing the compute used by these large systems, and all I've done is mark up which bits were developed

by companies and which bits were developed by academia. And it's quite striking: the majority of these things in recent years were developed by corporate actors and not academia. If you go back in time, it didn't used to be this way. If we look at the earlier era of AI development, important things were developed predominantly by academia: academia built early self-driving car prototypes, it built early speech analysis systems, and it built a lot of the early primitives for deep learning, via Geoff Hinton and others. You do see some corporate activity, like IBM building a good system that can play backgammon, and Yann LeCun at Bell Labs building

LeNet, but it was actually a more even world. We have to focus on this because the actors that get to develop this stuff determine the future of it, in large part, and currently the actors developing this stuff are predominantly companies. So why is that? Well, speaking a little bit from experience at OpenAI and at my current place, Anthropic, it's for some fairly obvious reasons. These models are expensive, costing hundreds of thousands to millions of dollars to train, so they need someone willing to allocate that money.

They're also models that take time: these models take weeks to months to train. That means that you need to babysit them during training. You need to wake up in the middle of the night, notice that your loss curve has exploded, roll your model back to an earlier snapshot, and do all kinds of maintenance like that, which is not particularly glamorous and is also something of an artisanal science right now that isn't that well documented.
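The babysitting Jack describes can be pictured as a simple monitoring loop: snapshot the model periodically, and if the loss blows up, restore the last healthy snapshot. The sketch below is a toy illustration of that loop, not any particular team's training code; the loss dynamics are simulated.

```python
import random

def babysit_training(steps=1000, eval_every=100, spike_factor=3.0, seed=0):
    """Toy monitoring loop: snapshot periodically, roll back on a loss spike.

    The "model state" here is just a single loss value; in real training the
    snapshot would be the full model and optimizer state written to disk.
    """
    random.seed(seed)
    loss = 10.0
    snapshot = loss                    # last healthy snapshot
    for step in range(1, steps + 1):
        # Pretend training: loss usually drifts down, very occasionally explodes.
        loss *= 50.0 if random.random() < 0.005 else 0.995
        if step % eval_every == 0:
            if loss > spike_factor * snapshot:
                loss = snapshot        # roll back to the earlier snapshot
            else:
                snapshot = loss        # loss looks healthy: take a new snapshot
    return loss

print(babysit_training())
```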

In addition, these models hold less inherent appeal under certain academic incentive structures because, as Percy said, they are simple: they're based on a roughly decade-old technique, and we've just scaled up the data and compute a lot. So they don't have the kind of inherent novelty that academia rewards, and there's just a lot of engineering work rather than pure theory work. Industry rewards that, in the form of money and career advancement; academia, at least in ML, doesn't so much, although I think some of that is changing.

So this leads us to what I think is the complicated issue we need to work on at this workshop, which is that the people who get to use these models are the private sector actors who build them, plus the people the private sector actors sell their models to, namely developers via an API, and academia, but only via access programs. And I'm not picking on Facebook here, I just think Facebook generates a lot of these examples, but Facebook is an example of a large tech company that periodically runs access programs for academia to look at certain bits of its data; then academia finds out something that doesn't play favorably for Facebook, and Facebook revokes the access. It would be naive to think this won't happen in the AI community. And I'm

not saying that to suggest that companies doing academic access don't have good intentions. They have good intentions; they're run by people who are reasonably nice; we're all human. But they have incentives, and these incentives are generated by capital, and these incentives will at some point mean that access gets taken away. And if access is taken away and academia isn't building these models, a really bad set of things happens, to do with a lack of accountability, unconstrained deployment, and a very uneven world.

Additionally, access programs make academia in some sense dependent on the private sector. And that's a bad thing because, as Percy said, these models have issues. They have really complicated issues which need to be worked through, and these issues are not going to be that fun to work through. They're going to touch on some of the problems that we as societies have completely failed at solving so far, like bias. So for that conversation to have the best chance of going well,

you want all of the people in that conversation to have leverage. And currently all of the leverage is held by a small set of private sector actors, which means that they are not incentivized to have an equal conversation; they're incentivized to exploit that power differential. And again, we should expect that this will happen based on the incentive structure.

So why should you build these models, even though it's very expensive and challenging? Well, as I've mentioned, it's all about leverage. Information is power. We want more information about these models to be broadly available, and we want it to be broadly distributed. If we don't do this, then students go to industry to learn how to build these models, and many of them probably stay in industry and don't cycle back into

academia. Academia gets trapped into dependency on the private sector for model access, which is not going to lead anywhere particularly great in the long term. The private sector gets to shape the regulatory environment, because it will have the models, and it will generate information about the models, and it will do its own studies, and it will partner with friendly institutions, and it will use all of this to shape the policy environment in which these models are deployed.

And academia won't have as much leverage; it will be able to critique, but not necessarily to propose alternatives. It's easier to make an alternative become real if you can prototype it, and prototyping here requires developing, or analyzing, models of this class in terms of scale and magnitude of resources.

So, a final couple of points, and then we'll go to Q&A: what will these models do in the world? Well, as I said, the internet is an ecology of systems. The interactions

of these entities in this ecology lead to societal changes. For good examples, just look at the internet over the last ten years or so, where the emergence of platforms like Facebook and Twitter, and aggregators like Google, has centralized human eyeballs onto a small number of platforms, mirroring the economy-of-scale effects you see everywhere else. And at the same time, these platforms are starting to recommend stuff to users

via increasingly complicated and inscrutable AI models. And we know that this is actually changing things. My media habits have changed because of the YouTube recommendation algorithm, and I bet yours have as well, and this has effects ranging from the benign

(I now know a lot about how cheese gets made, because I like watching cheese-making videos for some reason, thanks to the ML algorithm) to the dangerous: we know that these recommendation systems also drive political polarization. Something that's just starting to happen now is the emergence of AI tools for altering and augmenting our own subjective reality: Snapchat filters, Facebook filters, deepfakes. And again, these get used for a range of things, from the fun and interesting to the harmful. Deepfakes get used to generate misogynistic revenge pornography. They also get used throughout [inaudible]; they get used for things, some of which are clearly, if not illegal, then right on the border, through to things that have commercial value.

What this means is that the world we're in is already being influenced in a major way by non-trivial, large-scale AI models that are changing culture. They are changing the culture that we all exist in, and that will have a huge effect. And to tie it back to what I've been saying earlier: if we're changing culture, we need a large set of people to be able to analyze the things that are changing culture, because that has a big effect on human civilization. As it currently stands, these models benefit capital, because capital builds them. Companies build these models, and they get to deploy them and they get to make money. That doesn't leave quite as much room for different approaches and different stakeholders to benefit.

Obviously, if these models have benefits, they're going to be used to help people: they're going to be therapeutic chatbots, they're going to be better search engines, they're going to be aides to designers and artists, they're going to be amazing tools to help people who are bad at programming, like myself, program better by using natural language interfaces into code models.

There's a whole bunch of benefits out there, but because of the funhouse mirrors, they aren't all going to benefit everyone equally by default. That's going to take work on our part, and work on the part of the academic community. And because these models may

cause harm to people who are less represented, or who are represented via stereotypes, we need to be able to develop the tools that let us interrogate that, and at least know that it might be happening. So my final point is just: what are some things we can do to make this go well? I think it's all about reducing asymmetries. There is a power asymmetry in model development right now, and I believe it needs to be worked on by broadening the range of actors that can build foundation models. I believe that universities should be building these models, either individually or as consortia.

And I think beyond the universities, as Percy said, we can look into things like distributed training and BigScience and other ways of skinning this cat. We also need to deal with the inherent information asymmetries here. There must be more tools

available to assess, measure, and analyze foundation models, partly to encourage accountability and partly to discover capabilities embedded in them that we don't yet know exist. We need to create things like datasets to test for types of benefits and harms. And we need to create leaderboards that track some of these societal-impact or ethical dimensions of the models, the same way we have leaderboards that look at raw performance, or at reducing a loss curve, for something like SuperGLUE. And finally (and this is something I'll have some research coming out on soon), governments should invest in tools to audit, measure, and analyze models that are deployed, because we need to create a shared, accessible set of societal information about these models.
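As a rough sketch of what one column of such a leaderboard could look like, the snippet below reports a per-group accuracy gap alongside raw accuracy; the records, group labels, and field names are hypothetical placeholders.

```python
# Sketch of a leaderboard-style metric that reports a societal dimension
# (a per-group accuracy gap) next to raw accuracy. The records are hypothetical.
from collections import defaultdict

def accuracy(records):
    return sum(r["pred"] == r["label"] for r in records) / len(records)

def leaderboard_entry(records):
    by_group = defaultdict(list)
    for r in records:
        by_group[r["group"]].append(r)
    group_acc = {g: accuracy(rs) for g, rs in by_group.items()}
    return {
        "overall_accuracy": accuracy(records),
        "per_group_accuracy": group_acc,
        # one simple "societal" column: the worst-case gap between groups
        "max_group_gap": max(group_acc.values()) - min(group_acc.values()),
    }

example = [
    {"pred": 1, "label": 1, "group": "A"},
    {"pred": 0, "label": 0, "group": "A"},
    {"pred": 0, "label": 1, "group": "B"},
    {"pred": 1, "label": 1, "group": "B"},
]
print(leaderboard_entry(example))
```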

If we want this to go well, then as long as we're building them, we need to equip government with the tools to see what is being deployed and to be able to automatically analyze and, to some extent, interrogate these models as they're being placed onto the ecology of the internet.

So that's my talk. I think I've got time for some very brief Q&A before we go to the next section. You can reach me on Twitter at @jackclarkSF, and I write a weekly newsletter called Import AI at jack-clark.net. In the weeks since I said I was doing this talk, I've had a lot

of robust discussions with people on Twitter and other places about this. So thank you; there's a huge amount of interest in this subject from a large set of people. Thanks very much, and I hope that we can do a quick bit of Q&A now.

All right. Thank you, Jack, for that really insightful talk. I really like how you've laid out the academia-versus-industry research gap and the issues with power. So let's do some Q&A. I wanted to start by asking you a question about incentives within industry. There's

a big difference in terms of resources between, let's say, a Google and a startup, and right now it's kind of hard for a startup to break into the search engine market. So do you think startups will be able to compete in five years, and what needs to be done right now, even putting academia aside?

I think that if you look at things like neural machine translation, that was a system that Google developed and it replaced a load of very expensive, specifically developed translation systems at Google. Similarly, the large generative models being trained now are incredibly good at doing things like search, among other things. So I actually expect the emergence of these models to shake up the game board a bit in terms of dominance in different areas. Once you train these models (and you could imagine sets of startups teaming up to train one, and then all using their access to it to compete in different business areas against larger companies), you may be able to change some of the power dynamics there. I think that's an opportunity that's out there.

Yeah. Great. So now let's talk a little bit about academia.

How should we get funding for academia? I'm very curious about this one.

Well, there's the National AI Research Resource, which is being worked on by the OSTP in the U.S. White House; there is an RFI out for that which expires on October 1st, and they're looking for ideas about what it should look like. We also just passed the Endless Frontier Act, or the U.S. Innovation and Competition Act as it was renamed (and by "we" I mean the U.S.), which unlocks something like 200 billion dollars of funding that we can potentially use to increase funding at places like the National Science Foundation and others. So I think money

is actually out there, and what we need to do is develop the payload for that money to go into. Also, speaking frankly, whenever I go to Washington (and I do a lot of work in policy), I talk mostly about this problem, and I talk mostly about this problem to the U.S. government, and say they need to significantly change resourcing here so that academia has the ability to do this. I don't know how effective that is, but it's what I spend a lot of my personal credit on.

Oh, that's great. All right. So unfortunately we have to move on. Let's thank Jack again for that great talk, and we'll chat more about this at the panel. Thank you. So now

we'll have two ten-minute talks from two Stanford professors whose primary research area is not AI, but whose worlds have been colliding recently with foundation models. First is Michael Bernstein, an associate professor of computer science at Stanford and a member of the HCI group. Michael builds social computing systems involving anywhere from small teams to large crowds to help groups achieve collective goals. He has specifically looked at governance in online communities and in machine learning, and recently he led the effort at HAI to design an ethics and society review board for AI research. Michael, take it away.

All right. Thank you very much. I'm going to talk from the perspective of a human-computer interaction researcher; this is sort of my day job in some sense. Let's talk about what I'm going to call threshold effects. I'm going to tell a story of ancient history to set the stage: a long, long time ago, tens of hundreds of days ago, it used to be really, really hard to make public content on the web. You had to own a server.

You had to figure out how to set up Apache config files. You had to learn this crazy tongue of hypertext markup language. It was a big deal. But over time, tools arose that made it much easier to publish; you can think of early examples like WordPress or MediaWiki.

And then eventually Web 2.0 came along (now I can plug my SoundCloud), and things changed. What

happened here is well described by what Brad Myers, Scott Hudson, and Randy Pausch described back in 2000 as the threshold-ceiling diagram. I'm going to draw a graph here: on one axis is what I'm going to call the threshold, how hard it is to get off the ground to do something, and on the other axis is the ceiling, the sophistication of what you can create with that thing. Early on, you had to basically be fluent in the entire server stack to get anything public on the web. But over time, starting when Tim Berners-Lee created the web, we had HTML, which dramatically reduced the threshold, how challenging it was to make a webpage.

Eventually we started seeing things like Markdown; I can write posts on Medium, or I can just fire off something on Twitter within seconds. This has dramatically reduced the threshold. Now, it's also changed the ceiling: I can't do anything as expressive on Twitter as I can by writing a full-stack web app, but that threshold has really dropped from months and months to just a few seconds. Likewise, we have really high-threshold, high-ceiling tools like Adobe Photoshop, moving to something like Instagram, where you can again do pretty complex, but not as complex, stuff; or Adobe Premiere moving into something like TikTok, where with filters I can do it now. So we lowered the threshold. Something went from

really hard to get into to relatively easy. And what happens, and this is the central question here as it relates to foundation models, what happens when you lower that threshold to publish things? Well, two things. One, you're going to get a massive increase in adoption of the medium; a lot more people are tweeting than are making full-stack websites. And you also get a really broad proliferation of different kinds of use cases that the original creators probably were not intending or aiming for. Now, this does increase iteration speed in a sense. This is the very classic human-computer interaction cycle, where we have an idea, we implement it, and then we reflect on it. And just like research itself, this practice of design requires reflection and iteration; it's an iterative process. So every time you're lowering the threshold,

you're making that implementation easier, more accessible, faster. You're getting more turns around that cycle, which means you get, in principle, better designs.

It does mean co-option. I'm guessing that when we started the web, we were not imagining

communities of knitters, but it's pretty cool that we have them. We were not imagining the largest encyclopedia in the history of mankind, excuse me, of humankind. We were not imagining that a person could just write tweets and they would go out to the world. At the same time, we were probably also not considering that platforms would be used to convene the alt-right, that there would be hate activity going on online, that there would be cyberbullying, and that all sorts of things would happen because people co-opt these platforms in ways that, I'm relatively confident, the original creators were not intending or even agreeing with. We had this entire thing happen as the threshold lowered.

And this entire time, though, one thing remained stubbornly high-threshold: AI. It was

incredibly difficult to get going, to really build an AI. You generally required some sort of advanced coursework or an advanced degree, an internship with Andrew Ng, and a lot of effort to collect data and to train the model. So when we talk about foundation models, I feel like we often put our focus on the fact that the ceiling has gone up: performance is improving on a series of tasks, this approach has moved our benchmarks, and that is raising the ceiling. But what I want to argue is that I think the real action is going to be in lowering the threshold, because these foundation models have lowered the threshold. The few-shot nature of them and the natural language prompts basically make it so that, all of a sudden, people who previously could not train an AI, or would have required a ton of effort to do so, can start crafting AIs.

And we can start to see that there are lots of opportunities here. Here I'll point to work by a PhD student, Joon Park, finding ways to help scaffold new, non-expert users in writing effective prompts. Here's a particular challenging task that GPT-3 doesn't do all that well on: a few-shot learner does barely above chance at trying to predict whether a community is going to moderate a comment. But if we start to decompose these concepts, so that people can basically take this question and decompose it into a few different questions that are more amenable to GPT-3, then all of a sudden, on some tasks, we start to see better F1 scores. So there's a real lever here for human insight. It also means we're going to start seeing a bunch of different kinds of interactions.
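As a sketch of the decomposition idea (the sub-questions, field names, and the ask_yes_no stub below are hypothetical, not the actual prompts from that work): instead of asking the model the hard question directly, you ask several simpler yes/no questions and combine the answers.

```python
# Hypothetical sketch of decomposing a hard judgment ("will this community
# moderate this comment?") into simpler yes/no sub-questions for a few-shot model.

SUB_QUESTIONS = [
    "Does the comment contain insults or personal attacks?",
    "Does the comment contain spam or off-topic promotion?",
    "Does the comment break any of the community's stated rules?",
]

def ask_yes_no(question: str, comment: str, rules: str) -> bool:
    """Placeholder for a few-shot yes/no prompt to a large language model."""
    raise NotImplementedError("call your model of choice here")

def likely_moderated(comment: str, rules: str, threshold: int = 1) -> bool:
    # Combine the easier judgments into the harder one.
    votes = sum(ask_yes_no(q, comment, rules) for q in SUB_QUESTIONS)
    return votes >= threshold
```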

Now, you'll notice that these citations are actually from before GPT-3; in the bottom left, you can see that the HCI community has basically been working on Codex for the last five years. So thanks, HCI community. But imagine interactions ranging from being able to use natural language to generate code, to editing an image in a defined fashion, to having the ability to automate tasks on your mobile device, to having ubiquitous computing systems that can do some activity recognition, so that, for example, it'll start making your coffee when you wake up. There are going to be all sorts of new forms of interaction that we haven't seen, for example ones emphasizing this prototyping approach. So if we make it faster and easier to prototype, what are we going to see?

Maybe we can start to see, on the positive side, things like social network designers taking advantage of the fact that these foundation models have memorized a bunch of behavior on these social networks, so that they can use natural language prompts saying something like, "here's a person who's upset about a breakup," and actually populate a space with what might happen. I could say that my colleague James sees that post and decides to reply, and these are actual responses generated by GPT-3. So we could start to populate these spaces not just with what might happen in a social space, but also say, "here, Trevor decides to be a troll." We could foresee the kinds of negative behaviors that are going to arise before they actually arise. If we can do this, I think we can do a much better job of forestalling some of the negative outcomes on these platforms.
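A rough sketch of what populating such a space could look like in code; the personas, prompt wording, and generate() stub are illustrative assumptions rather than the actual system being described.

```python
# Illustrative sketch: seed a post, then generate replies conditioned on personas,
# including a deliberately antisocial one, to preview how a thread might unfold.

PERSONAS = [
    "James, a supportive colleague",
    "Trevor, who decides to be a troll",
]

def generate(prompt: str) -> str:
    """Placeholder for a text-generation call to a large language model."""
    raise NotImplementedError

def simulate_thread(post: str) -> dict:
    replies = {}
    for persona in PERSONAS:
        prompt = (
            f'A social network post: "{post}"\n'
            f"Write the reply that {persona} posts:\n"
        )
        replies[persona] = generate(prompt)
    # Inspecting the troll's reply helps anticipate moderation needs in advance.
    return replies

# simulate_thread("I'm really upset about a breakup.")
```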

We don't necessarily have to imagine what's going to happen; we can essentially simulate some of it and see where we need to build protections for this kind of thing. But again, I'm going to bring up the "but", and here I think I'm beating the drum that the first two speakers already spoke about. We saw with publishing that when we lowered the threshold, we got many of these positive and creative applications, but also a number of really negative and harmful ones that we were maybe not, as a society, able to grapple with in advance. So I think we need to say that the same thing is going to happen here as we lower the threshold. There's going to be co-option, not just of these applications that I think will be handy and useful,

and that HCI as a field will be really excited to explore. But also, as Percy mentioned, this will become an endless fountain of misinformation. It'll become a troll generator that's really difficult to detect: I can just pull up a thing that says, "Hey, this person, I don't like them, they broke up with me, go mess with them." We can reword articles for higher emotional valence and virality, and we know that these kinds

of articles get spread further on social media, so I can have a thing that makes me sound more viral. With hyper-curated online profiles, we're going to be able to get feedback that makes us look a particular way, the way we want to look, but then everything starts to feel more fake.

Jeff Hancock in the communication department has been thinking a lot about this kind of work. More targeting with less data is a foundation model risk, as are risky routine classifiers: we're seeing things like your insurance rates going up because we can classify what you're doing with less information about you. And then there are all the things that we probably haven't yet foreseen.

So what's going to happen here? We're going to see a wide variety of new interactions on the positive side, but I think really we want to focus on what these risks are and how we can mitigate them. What principles should we be following in the design and deployment both of the models and of the applications? What should the applications' contract be? And I'm going to hand that question on contracts off to my colleague Dan Ho, who's going to take the next piece. I want to thank the students who have been working on the work I mentioned, my colleague Percy, who is co-advising some of this work, and our funders.

All right. Thank you, Michael. I love the framing of foundation models reducing the threshold, and

we'll come back to questions in a bit; let's go on to our second talk. Dan Ho is the William Benjamin Scott and Luna M. Scott Professor of Law here at Stanford, a professor of political science, and an associate director at Stanford HAI. His scholarship centers on quantitative legal studies, focusing on administrative law and regulatory policy, anti-discrimination law, and courts. Recently Dan has been really interested in using NLP for legal applications, and his talk will intertwine these two themes. Dan, please take it away.

Well, thanks so much, Percy. I think the work

here in bringing this group together has really been quite engaging. I'm going to talk about two topics. One is the use of foundation models for law as an area of application and the potential promise there; but then also the law of foundation models, that is, what kinds of legal constraints may actually affect what kind of foundation models can be built. This is going to be, admittedly, an American, U.S.-centric perspective, in part because of my own perspective and the perspective of the wonderful collaborators who helped with the white paper: Peter Henderson, Neel Guha, Mark Krass, Julian Nyarko, and Jenny Hong. On the first topic, it's widely acknowledged that while the U.S. legal system strives for justice for all, there are pretty profound issues with access to justice.

In 1978, President Carter gave a speech to the American Bar Association that said, quote, "We have the heaviest concentration of lawyers on earth. Ninety percent of our lawyers serve ten percent of our population. We are overlawyered and underrepresented." That situation has not changed much to the present day. For instance, look to the systematic underfunding

of public defenders, or the file room of one of these adjudicatory agencies, which looked something like this just a few years back, where it took about five to seven years for a veteran, for instance, to have an appeal resolved. One estimate has it that about 7% of veterans pass away while waiting for their appeal to be resolved. At the same time, one of my colleagues, who has spent much of her writing talking about access to justice, pointed to a less noticed revolution of legal services aimed at America's low- and middle-income consumers: technology is replacing lawyers wholesale in areas like preparing wills or forming limited liability companies.

In the white paper, we spell out a range of different applications across the stages of a civil lawsuit. We note that this is already quite prevalent in areas like discovery, which is the process by which litigants seek to find facts. The conventional way that was done was, for instance, by providing these large bankers boxes of files, and that process has been revolutionized already by the use of natural language processing. That said, one of the historic challenges in even thinking about applying these kinds of tools to law is that labels are really expensive.

The law has not had anything like the kind of large-scale benchmark datasets that have powered NLP development. And so the basic question, the basic structural challenge, is that law is expensive.

It's hard to hire lawyers to label legal decisions. One of our graduates, Pablo Arredondo, figured out that there actually is a place where we can look for high-fidelity labels of legal decisions, which is that lawyers have come up with extremely complex sets of rules about how to cite prior precedent. One of those rules turns out to be about the use of particular parentheticals; the Bluebook literally states the conditions, the grammar, and how to characterize holdings of prior court decisions. Holdings are a basic task in the first year of law school: the holding is the part of a judicial decision in

a common law system that can be relied upon as precedent to cite to. And so what we did is some work here leveraging this really detailed set of rules to extract these kinds of holding statements out of existing case law.
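A crude sketch of the extraction idea; the regular expression and the example sentence below are simplifications, and the actual Bluebook rules and extraction pipeline are far more detailed.

```python
import re

# Crude approximation: Bluebook-style citation parentheticals that characterize
# a prior case's holding often begin with "holding that ...".
HOLDING_RE = re.compile(r"\((holding that [^()]+)\)", re.IGNORECASE)

text = (
    "See Smith v. Jones, 123 F.3d 456, 460 (9th Cir. 1997) "
    "(holding that officers are entitled to qualified immunity when the right "
    "at issue was not clearly established)."
)

for holding in HOLDING_RE.findall(text):
    print(holding)  # extracted holding statement, usable as a weak label
```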

In the example, the context may appear in blue, then there are the purple legal citations, and the holding statement is something stated in the parenthetical that characterizes the key precedential value of the prior case. It turns out we can then construct something like one of the classic Q&A tasks, where we find very similar holding statements, here about qualified immunity for police officers, and turn that into a kind of question-and-answer task. And here's where we start to see some of the promise of foundation models for the law: foundation models boost performance pretty significantly on F1. What we then did was compile, through the Harvard Caselaw Access Project corpus, about 3.4 million federal and state court decisions and do some domain-specific pretraining. One important thing we found here was that it was really important to create custom vocabulary and sentence segmentation that paid attention to that complex system of legal citation.
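For the custom-vocabulary point, here is a minimal sketch using the Hugging Face Transformers library; the added tokens are just examples of citation vocabulary, not the actual vocabulary built in that work.

```python
# Minimal sketch: extend a BERT tokenizer with legal-citation vocabulary before
# domain-specific pretraining, then resize the model's embeddings to match.
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Example citation-style tokens; a real legal vocabulary would be derived from the corpus.
legal_tokens = ["F.3d", "F.Supp.2d", "U.S.C.", "S.Ct.", "Cir."]
num_added = tokenizer.add_tokens(legal_tokens)
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```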

And that turned out to perform best. That's where you see some of these really interesting potential gains: the gains from domain-specific pretraining of these foundation models seem to be largest when the training dataset is smallest. That said, we have a long way to go, and an example offered by Julian Nyarko is about GPT-3 and its limits in discerning legal reasoning. A simple prompt might be: are liquidated damages clauses enforceable? Liquidated damages clauses typically appear in contracts to specify what the damages are if a breach of contract occurs. And it is absolutely the case that GPT-3 can parrot what

the black-letter legal statement is: liquidated damages are generally enforceable unless the sum stipulated is exorbitant or unconscionable. But if you feed GPT-3 anything like the kind of fact pattern that we train our first-year law students on, like a contract over a Toyota Corolla where the liquidated damages are posited to be $1 million, GPT-3 is not able to actually perform that simple form of legal reasoning, nor is it able to do so even if you specifically state that $1 million is exorbitant or unconscionable.

So there's a long way to go, which leads me to the ways in which the law may actually constrain the construction of foundation models, given all of the basic worries that Percy and Jack covered about the performance, bias, and mechanisms of these kinds of models,

and I'll only scratch the surface here, but I'll highlight a few. One is obviously the worry about the extent to which foundation models may bake in bias and lead to disparate treatment or disparate impact. Another is the question of due process when these are used within any kind of decision-making system, where due process doctrine typically requires sufficient process when an adverse decision is made; as Danielle Citron wrote, in a kind of logic-based AI system there were computer programmers encoding benefits decisions who actually violated the federal regulations on this. And that of course becomes much harder to audit when we're talking about the kinds of decisions that may be embedded in more opaque foundation models. Second, there are questions about input liability. Percy noted that he didn't want his vacation pictures used; think here of GitHub Copilot, the machine-learning-based programming assistant, and the basic questions about licensing terms and fair use with respect to the data Copilot is trained on, for instance code hosted on GitHub,

and the biggest tension here lies in big questions about accountability. It was only this past term, in the Van Buren case, that the Supreme Court pared down an interpretation of the Computer Fraud and Abuse Act that could have criminalized a lot of the web scraping conduct used to bring data into these models if you were violating the stated terms of service of a particular website. That obviously had major implications; Facebook was involved in one of the underlying CFAA cases in terms of the ability to actually access this kind of data. Last, we also spell out the fact that there may be protections for model outputs, or inference outputs, from these kinds of models.

But at the same time, while there may be legal protections, the interesting thing from the perspective of law is that this may actually challenge some core tenets of legal doctrine itself. My colleagues Toni Massaro, Helen Norton, and Margot Kaminski wrote a really fascinating piece arguing that, quote, if we take the logic of current First Amendment jurisprudence and theory to its natural conclusion, Tay, the 2016 chatbot that was taken off of Twitter after 16 hours due to inflammatory tweets, described by one newspaper as, quote, "artificial intelligence at its very worst," may actually have First Amendment rights under current First Amendment doctrine. And that is something that may not just affect foundation models; it's also the kind of cross-fertilization that I think is so overdue, because it may be an area where the emergence of foundation models may actually be teaching us something about the limitations of current legal doctrine and inspire a significant amount of rethinking. Thank you, and I think we'll now turn to Q&A.

All right, thanks, Dan. Now we will go to Q&A; I can't seem to turn on my video anymore, but that's okay. So, if we're talking about building foundation models for law: Michael talked about lowering the barrier to entry, the lowering of the threshold, but there's also a barrier to entry for law.

For example, I can't really act as a lawyer. So how do these two types of barriers interact with each other? Maybe this is a question either of you could speak to.

Yeah, I'm happy to take a first stab at that. We did this other report where we tried to understand how government agencies have been experimenting with forms of AI. One of the themes coming out of that report is that while nearly half of federal agencies, for instance, are trying to do this, the best use cases came where you had core staff members within those teams who had both technical insight and deep domain knowledge. In the Social Security Administration, for instance, they built a sort of NLP system, and it was really built out by some of the folks who had been adjudicators for several years and figured out what kinds of tools they really wanted built for themselves in the process of adjudication, which I think exemplifies some of the interesting HCI components of this. But Michael, I'd be curious for your perspective on it.

Yeah, it's interesting. One of the first things I got asked to do by someone in the government, when I started working on crowdsourcing, was to help them crowdsource more interpretable versions of public legal documents and policies. I think we have to differentiate here: there's what's internally needed for the legal policy to be correct, and then there's what's projected and made clear to the citizens, and those may be very different, in the same sense that how I would teach machine learning to someone who's just learning is going to be different from the kind of conversation you would have with someone who's an expert. And so I feel like that needs supervision that doesn't exist. We can't just sort of say, oh, here we go.

Because if we just try to translate what's already there, it's going to take stuff that might be, as you suggested, quite technically complicated legal detail and make it simple, but in a way that's not right or that gives the wrong intuition. In some sense, I'm almost more interested in how it might help go the other direction. There's this vocabulary gap: if I don't have the words to describe what I need to know, or the kinds of legal policies I need, then help me get in touch with experts who can help me; just help me cross that chasm.

Right. Thanks, Michael. We have another question: both of you discussed the risks and co-option of these models. So how should we reason about these risks, and what recourse should we take to proactively and reactively mitigate them?

I want Dan to go first on this one.

Well, I think part of the reason why I split my portion of the talk to be both about the application side and also about understanding the legal constraints is that a lot of the time, the way we choose to do this as a society is to take broad ethical principles and then write them down into law. And so some of the ways in which we do risk mitigation have to actually involve writing enforceable rules rather than merely hortatory ethical principles. And so I was very interested,

for instance, in what Jack noted, which is: how do we actually build a workable kind of audit system? And Percy, I think you've made a really compelling case that academic auditing is part of that. But I would also say that that alone is not enough, and there may have to be actual enforceable rules to be able to audit systems, for instance ones that might be used for benefits adjudication, to make sure that they're actually designed and work in a way that is legally compliant.

Yeah. What I'm wrestling with here is that in computing there's a real push for CS for all, to try to lower that threshold, to give as many people as possible access. And yet in other domains, that's not necessarily the case: we

have licensing procedures, we have communities of practice where groups will self-adjudicate, right? So medicine, law, and others, where there are real consequences; you're barred from practicing if you have a severe ethical violation. And so I'm wondering at what point, with these tools (no one says medicine is bad for you, or that it can't be used poorly), we need to start asking more seriously about those kinds of professional associations and what that would mean, and what harm it would do to create that, in terms of restricting access for potentially marginalized groups, versus the benefits of, say, cutting out negative use cases.

That's really what I'm chewing on: where do we stand on that?

Yeah, just to follow up on that: in your talk, you talked about the co-option of publishing, which I guess we've seen play out. Are there any lessons there that we might be able to take and apply to foundation models?

One thing I can say is that early on I was more optimistic about the role that design and research could play in setting examples, think of it as a role model or a path forward: look, we should

be doing it this way. I think I've become far more cynical over the last, say, five years, maybe more in line with Jack, in that while we can show pro-social paths, there's nothing that prevents an anti-social actor from taking that and just ignoring it or doing the opposite. And so in general my sense (and Dan would be the expert here) is that policy tends to be reactive rather than happening before there's a violation. But to the extent that we as a community can do something like what the researchers did in CRISPR, where they came together and came up with essentially a set of principles that they published alongside the technique, saying here's what we think is okay and here's what we think

is a violation. I think that's fairly counter-normative behavior for computer scientists, but I think it would be a really important step if we could, say in this group or elsewhere, convene civil society, researchers, industry, and so on and say: this is what is okay and this is what's not, so that there becomes at least social approbation for that kind of behavior.

Yeah. We definitely need some more professional norms here. All right, thank you both for your talks and for taking the questions. This concludes the talks. Now we're going to move on to our first panel discussion, so let me introduce the panelists. The first panelist is Jack Clark, so please welcome him back. Next we have Su Lin

Blodgett. She's a postdoctoral researcher at Microsoft in the FATE group. She studies the social implications of language technologies, and recently led some really nice critical work on how to think about bias and how to measure it in NLP. Next we have Eric Horvitz. He's a technical fellow and the chief scientific officer at Microsoft; he's led efforts at the intersection of technology, people, and society. He has made foundational contributions to principles for responsible and ethical AI, and has always been a strong advocate for human-AI complementarity.

Next we have Joelle Pineau. She's co-managing director at Facebook AI Research and an associate professor of computer science at McGill. She has worked on planning in partially observable domains, dialogue systems, and robotic assistants, and she has led some really remarkable reproducibility efforts in the machine learning community over the last few years, such as the reproducibility challenge and the NeurIPS reproducibility checklist. And finally we have Jacob Steinhardt. Jacob is an assistant professor of statistics at UC Berkeley. He works on making machine learning

more robust and aligned with human values. Recently he and his group have built several challenging benchmarks for language models across different domains, like language and code. All right, so welcome, all of you, and let's get started. As advertised in the name of the panel, I want to start with two broad questions, one on the opportunities and one on the risks.

So the first one is: which applications or sectors do you think are most going to be transformed by foundation models in the next two years, based on the current trajectory? And what types of populations or types of interactions, the ones we need a bit of imagination for because they don't exist yet, are now made possible? Maybe we could start with Joelle and Eric, since they, being at Facebook and Microsoft, have seen a lot of resources invested in applications of foundation models, and then we can go to the others. So maybe Joelle, you can go first.

I can start a bit. I think there's definitely tremendous potential that goes across all sectors, so it's a little bit hard to pick which one is the most promising right now, to be frank with you. I think the report you put out recently does a great job at highlighting a few of them: education, health care, and law as prime examples. But it could have just as easily talked about some of the potential in transportation, some of what we're seeing not just in smart vehicles but across the whole realm of the transportation industry, and some of what's happening across finance and that industry as well. And I will mention creativity and entertainment: all of this ability to really harness the work that's been done on generative models in AI and use it to enhance human creativity in really significant ways. I find this incredibly exciting and really promising.

Yeah, the ability to generate is definitely something that jumps out at you. Eric, do you have any thoughts?

Yeah, sure. I'd just say that the scale of data, compute, and model capacity that's been enabled by unsupervised learning methodologies has really led to advances in both recognition and generation, per Joelle's comments. So I'm excited about the possibilities of how these technologies will change the lives of people within their professions as well as in daily life. I see really incredible boosts in language tasks, visual tasks, and generative tasks, and I think we'll continue to see surprises. For consumer and business technologies, I think the texture of interaction will change, for example when it comes to multi-step interactions: not just one-shot recommendations or recognitions, but rich interactive dialogue where multiple topics and goals are maintained, and there's a pushing and popping of a kind of stack of interrelated threads, having a conversation with the technologies versus having these one-shot experiences.

I also think there'll be changes in the course of daily life when it comes to extreme personalization, where we've seen that it takes quite a bit of common sense for systems to help guide us on how we spend our time, how we collaborate, and how we get things done. We're seeing some great directions on that front already; some of these services are being offered in our Office product line, for example, and in Dynamics. On the engineering front, there's been some well-deserved excitement about software development, and we can go much further. There's impressive engagement we're seeing in the private preview of Copilot, and so I think that'll be a game

changer in some ways for software development and healthcare. One key area that I've been looking at is scientific discovery: I think we're going to see platform models, or foundation models, for molecular simulation change chemistry and biology and physics in fundamental ways, speeding up our ability to understand what molecules do when they interact by many orders of magnitude. We've seen signs of that coming our way, and with the recent work in the biosciences, with UniRep and ESM, these pre-trained platforms, these embeddings for understanding proteins, not just their structure but their function, I think will be game changers for science.

Science is one of the applications we maybe think less about, but I think it's really exciting as well. Jack, did you want to add anything?

I just want to do a really quick shout-out for really intuitive search in very specific domains. Because you can generate and embed things, it makes it easier to do things like find recommendations for similar scientific papers, if you start to use these models in the right way. So I think we'll use them to change how we educate ourselves, at least at the level of when you're beginning to read about something; I think these models will give you better and more intuitive recommendations than a standard search engine.

Cool. Maybe you could talk a little bit about the flip side. All of this sounds great, but as many speakers have commented, there are many risks associated with foundation models. What are the ones that you think we should be most worried about? Are they the short-term risks, because these models are already out and about, or are they longer-term, because the future is so unpredictable? And what kinds of actions should be taken? So maybe, Su Lin, we can start with you, since you've studied social bias in NLP models a lot, which is very pertinent here and has come up many times.

Yeah, thanks. I have a lot of feelings about this, so I'm going to have a somewhat long answer, if that's okay. I am the resident skeptic, so I'm going to be

appropriately skeptical. I think I worry about a lot of harms, but I know "all of them" is not a very useful answer. So I think I'll say that maybe the thing that worries me the most is the rapid entrenchment of these models, the view of these models as inevitable or as an unmitigated good. I think it's a really risky frame because it forecloses research alternatives, and in the end it assumes that development and deployment of these models must happen, which forecloses the situations where we just say, hey, the risks or harms really do seem to outweigh the benefits, let's not use this.

It also ignores the fact that there are already lots of communities that don't want this, communities for whom these kinds of models mean increased surveillance, or whose cultural resources are made available for general cultural consumption

