Yale Engineering Dean's Invited Speaker Series featuring Royal Hansen, VP, Cybersecurity, Google

Yale Engineering Dean's Invited Speaker Series featuring Royal Hansen, VP, Cybersecurity, Google

Show Video

- Welcome, everyone. We're really excited to have our fourth and final Dean's Invited Speaker for this semester. We'll have more to come.

We'll announce those soon. And thanks again to Tsai City for letting us host these here. It's been a great series for us. This is one I'm really looking forward to. Royal Hansen is the VP for Security at Google. He was class of '97.

same class as my sister actually, and also in Saybrook. So you may know her, Susie Brock. Do you know if that rings a bell? - Of course. I would never have put that together. - Yeah, so.

- We'll have to talk about that. That's nice. - That's my little sister, yeah. In any case, so we've just made that connection, which is interesting. - Wow, yeah. - He was a computer science major.

Learned everything he knows from Mike Fisher here in the front row about security, so and. - You can blame him for Google's Foibles, and that's (Jeffrey laughing). - So prior to that, he worked for American Express as a VP for Information. - Same kind of technology, security. - Okay, and at Morgan Stanley. So we're really excited to hear from him about how he's thinking about security in the sort of new era of AI systems and how they're being deployed.

The way that these kind of call on questions of privacy and security as we input our information so that we can get answers to the questions we'd like to know the answers to. Lots of questions needing to be asked, answered, and explored with this, you know, massive scale compute that's going on at Google. So we're kind of relieved to know that a friend, a child of Yale University is at the helm of this bold new effort at Google. And we're really excited to hear from him. He's gonna give about, you know, 25, 30 minutes of remarks, and then we'll open it up for questions.

So you'll have plenty of time to ask and we will address your questions after that. - Great, thanks. - All right.

Welcome Royal. - Thanks, Dean. And it was great, we saw each other not even a week ago, is it? Or maybe it's a little over a week. It's something like that in California. I was super impressed with the collection of Deans. How many deans did you have there? - [Jeffrey] All of them.

- It really was all of them, yeah. (Jeffrey quietly talking) And I mean, apart from that just being impressive, it is one of the reasons I've wanted to get closer to the university again and really, as the computer science department has sort of expanded, I mean, that's probably an understatement. You know, when I graduated, I know if we ask you had to guess, there were eight of us in the major.

So it's sort of, it's like a different era. I mean, it might as well have been in the 19th century, not the 20th. But to see though the full university come together around some of these topics.

I don't, you know, as someone who wants to participate in these discussions and help society work out a lot of these questions, I think Yale has an incredibly important role to play, and I would like to, you know, help in that and be a part of it. So I really do feel like the meetings we've had over the last year and even today with students, professors, not just in the computer science department, I'm super excited for. So as we, I have maybe 15 slides, sort of relatively quick and just hopefully set the table.

But as Dean Brock said, I do privacy, safety, and security, anything bad that can happen at Google. And so we could talk about any of that as you ask questions. The slides are more about AI just 'cause I think it's something that's, it's more topical and sort of front of mind for me at the moment. But happy to go in a variety of directions. Let's see, with that, the rendering across the Google slides instead of PowerPoint probably gets that AI in the wrong place. The first point I'd like to make, and as we go through the slides, I think we sometimes forget that in the security of AI, so this means think about like software security, data protection, cryptography, that's a whole body of work, and I'll talk about that in the second half.

But I've been really impressed and excited about the uses of these large language models in AI more generally in the service of safety and security. So I'm just gonna remind us a little bit of that in beginning so that in our enthusiasms to kind of restrict and regulate. You know, you see lots of, you saw the White House this week and lots of efforts, and it's all good efforts to come to grips with it. But we are doing some incredible things as a society.

You can think of the AlphaFold, just the most recent publication this week, the protein pathways, like we're going places we've never been. And I suspect we're really, really just at the beginning. And those are in the service of health, safety, even if not, you know, and I'll talk about a few examples of cybersecurity, but to me there are two topics. How do we use this stuff to make people safer? And how do we secure it, make it safe itself? So I'll cover both of those. You know, wildfires, just a few real world examples.

And then we'll get into the more technical side that in California in particular, obviously in Canada this year. I was in Greece last month for a democracy forum. And it tells you kind of the, I joined Google because I felt like it was a moment where people would all of a sudden wanna talk, not just about the technical work, but the implications of it.

So in Athens, we were talking about the role of democracy in technology. We went with the government there, and they're busy in the face of both floods and fires, thinking about how they deploy into areas where there's very little electricity, little power, and even internet connectivity. How do you get very lightweight sensors so they can detect early fires and flood conditions to get ahead of these things? But it's interesting that like the, that happened in California, happening in Greece, happening all over the world for climate related activity.

And I think enormous opportunities to use AI there. The next one, you know, obviously we talk about Waymo or Cruise or these different examples of self-driving cars, but the safety benefits of using large models in transportation are incredible. Now my brother-in-law always tells this story he was a member of the American Medical Association when they convinced the airlines to let two years and under children fly on laps because they didn't wanna do that cause they were gonna kill so many children in accidents in the flights. And they made the mathematical argument that you would save children's lives because they wouldn't fly. They would drive those 200, 300, 400 mile trips, and if you just did the actuarial analysis, you would kill more children under two by forcing them to have a seatbelt on a flight than by letting them sit on their parents' lap.

And just give it an example of where it's that kind of analysis, I think we're all gonna have to do as we think about the application of these technologies in different arenas. It doesn't mean that it will always be the right answer, but I think you could, you can make a pretty strong argument that the safety benefits, if we look at it with clear eyes, will be greater in taking some risk with the technology than in being conservative. Again, that's gonna vary, but I think it's a really good example. I talked about AlphaFold, and I'm most excited about the developments in health and, you know, computational biology and the use of AI. I actually think it's an area where Yale, I'm sort of really excited to see where you all go, given the strength of the molecular biology program here and the combination with the sort of the growth in the technology side. I mentioned Deep Mind.

When you go to the Deep Mind or the AlphaFold paper. When you go to the Deep Mind office in London, they don't put up any of their technical papers on the wall, like sort of in the lobby. They put up only their science papers, nature, science, health. They are in this to solve human problems, not just, you know, it's all tools to them to get these things done. And I'm super impressed with that group.

If people have ever heard Demis Hassabis talk, who's the CEO, just incredible. And has become a friend as all of this has come together, but I think I'm really inspired by the work they do in health. And I actually think it's probably the area we'll be most surprised at the developments using AI. Just one slight sidebar on that is that my senior thesis, we couldn't get David Lerner to come to this I guess, but David was my senior thesis advisor and I've tried to look back in, you know, my email records are not great from the yale.edu era, and I need the Google search of that, but he had a multidimensional cluster analysis, really, I think it's called Diana maybe or something like that.

But that very early on they had used to do mammogram kind of detecting anomalies in mammograms. This is, you know, this is 90s, and now we're, you know, really seeing the sort of progression in that work. But, you know, way back when, that's what was going on here, using neural networks in their very basic levels. That's what I started doing at Yale years ago. Super excited about that.

I mentioned cybersecurity. So again, before we even get into securing AI, it's worth noting that, so the example I have on there is the Play Store. So there, every day we scan 128 billion app to device pairs to make sure they're still operating in a way that, you know, we get this sort of safety that no harmful apps found. And so you get that in the Android ecosystem. You can't look at 128 billion apps or app instances without using some sort of large modeling capabilities. And this goes back further than the large language models.

But we're using, you know, machines. Humans are not looking at that stuff. This is all purely done through the training of classifiers and looking for anomalies in behavior of on-device apps. Real world making people safer, you know, across the globe at the billions of people scale.

Gmail, which people would, you know, I think you probably are more than familiar with, and we could take a poll on whether people are happy or sad with the spam filters at the moment. But you know, it's a hard problem because in cybersecurity we have the opposite problem of some of these other areas where you're aiming for a perfect score. Every time you get to a perfect score, and you block all the phishing attacks, again using these classifiers, the adversary is sentient and active.

They are looking at it and changing their approach. So there's no such thing as ever getting to 100%. You get to 100%, they're immediately gonna change their approach. Oftentimes even weaponizing your defenses against you.

So it's in some ways what makes the field really, really fun because it's got, it never ends. It's another thing that makes it really difficult because, you know, I get beaten up semi-regularly because we let too much spam in, in trying to tune this. So it really is like keeping a ship afloat to keep it at the right tolerance for those classifiers because we're, you know, we're looking at literally hundreds of billions of messages over the course of a week. The last one, which I don't think most people probably know about, and by design, and probably the best example, something called Safe Browsing. So years ago Google realized we would learn pretty quickly when a phishing site had been stood up to be the basis of a email and link to get someone to enter a password or download some sort of malware.

You know, then watering hole attacks start to occur where you're just browsing, you're searching, set up a site to look like a site you're used to, and it drops malware on your local device. We would pick these things up just for the scope of what we saw on the internet and start to build a list and block those things from ever, those sites from ever being rendering. Or in the case where we weren't sure, but it looked like it might be a problem, we would give a warning.

We put that behind Chrome, but that's now behind Safari. It's behind Firefox, you know, in some forms it's even be behind the, you know, other browsers, and it's on 4 billion devices. You know, we're sort of talking on the order of magnitude of the number of people on the planet that have access to the internet. Every time they go to a site, whether they know it or not, it's being checked against these known lists. You know, and we start to offer things that allow the time between detection and usage in the browser to even go down to under 30 minutes. So all of that using sort of the what you'd call old school machine learning.

This is not necessarily the large language models, but this is what we've been doing for many years to solve these kinds of problems at scale. So that's using AI to keep people safe broadly, physically as well as in the cybersecurity world. The next sets of slides are about how do you make this newfangled AI infrastructure, whether it's GPUs and TPUs, whether it's foundational models or frontier models that then are distilled and deployed into all these other form factors, chat bots and otherwise, how do you make that secure by default? One of the things I love about Google is we early on decided we didn't want to put too many bells and whistles on the front like that safe browsing. But you're better off as a user if you never have to worry about the security.

If we force you to deploy another piece of software and configure it, we're likely to fail because people are not gonna do it. They're not gonna do it well. So how do we make this AI infrastructure secure by default? And just to give you a taste for the, kinda the problems that we're up against, I talked about the biological sciences. Just to give, we were at Stanford, I guess I can say that. I don't know what the rivalries are these days on these, but you know, but in the spirit of an academic activity with the computer science department and the biology department, and you know, you see these four base pair primitives for DNA, just to give it to the hook to talk about this, increasingly the 3D printing world through model organisms if you wanna think about that, whether yeast or others, you can give a computer the instructions, the DNA to RNA and then protein pathway, just like AlphaFold, instructions to create life in various forms.

In fact, the story they told was, these are sort of the, this is Switzerland, Italy, France in the case of COVID. So if you think back to sort of the very earliest months of '20, this is that, that says march right here. And you see that in Italy it was first, but it, COVID-19 was transmitted from China over the internet in its DNA instructions to the WHO in Switzerland and then printed and existed in Switzerland in that area over 10 days before it arrived via human transmission. So when I talk about doing security, I don't mean it's just for email or websites.

We're headed to a world where the structures of life and society are digitized, and then the whole supply chain needs to be safe. Cause the mistakes here are very different than in one where your email, I miss a spam message. So it kind of gives you a sense of where we're headed, and AI being a really big part of that. One other idea that we've talked a little bit about, and I think again is a good one for the context of Yale is, let's just start with the big picture question, and then explain it. What is a country on the internet? So much of the discussions that I get involved in are, well, does the data resident in the country? Can the government have access to this data? Who, which jurisdiction has sort of sovereignty over this question? And it doesn't really map to the technical world.

You all know this very well. We've spent the last 50 years trying to distribute data and computational power everywhere it needs to be. We didn't, it's not really grounded in the geography of a map. The example that I sometimes talk about, there's a professor at Claremont who wrote his dissertation on the inception or the sort of the development of cartographic technology in the 15th and 16th century and the establishment of the nation state.

There's, you know, historically we thought of France and England and these sort of European nation states coming into form because of Hobbes and Locke and these kind of sort of theory of how to run a state. He makes the case that actually you couldn't have France Before you could build a very specific map and replicate it and distribute it so that you could even talk about the thing France. Just, it's an interesting question, this sort of intersection between the material conditions or the technology developments and the political and societal structures that go with it. My question and I think one we should work on as a community is what is the nation state? I get in these weird discussions with people about the location of our data center, and people have religious discussions about this. I get the spirit of it, but that's just not the way the internet and applications and software work.

You want these things to have interconnectivity. People are not gonna go to the data center with a forklift and get the rack and take it away and then use forensics software to get the data. There are much more normal ways to get at that data if you wanted to do it. So I think this discussion of what is a country in an era of the internet is super interesting and related to then how we solve questions, questions of security, privacy, safety.

Because you know, you're gonna have legal and regulatory, other activity on top of that. The other example related to that, just against some images that'll recall it, is I was in Paris, don't tell Lem this, my friend in the political science department, talking about a new cybersecurity related law. And one of the ministers said to me, you know, we are Latin in our sort of history like so the Latin Roman tradition of law, and we're addicted to writing laws. Like we really like it, which is great, and you know, so you get to how people sort of go after their governance and things. But I said, what will happen when the laws we continue to write are now expressed or written or mapped to COBOL, and we're 50 years down the road and someone needs to change that law. You're not gonna be able to pull it off the shelf and edit it and publish it back and have people obey it.

Somebody's gonna have to know how to change the COBOL. I just use that as sort of an antiquated version. People even know what COBOL is here anymore? Okay, good.

Maybe I should use something, another example for the, depending on the crowd. Fortran, I don't know what the, my prologue interpreter written in LISP. What was the, that was my favorite project, my favorite problem set with, I can't remember who had us do that, but really you in technology, we in technology will be coding the laws of today and tomorrow, and then changing them will be a availability risk.

Things will break. Let alone will they, will the spirit of the law actually meet the spirit of the regulation. So again, in all these areas of AI, or I mean AI is gonna make this even more complicated, but it just, it happens today in the context of software. And just to maybe bring that a little bit closer to home.

I was in, I've spent a lot of time in Poland over the last 18 months after the invasion of Ukraine. And you know, we've gotten to know the cybersecurity community in eastern and central Europe really well because it's, I mean, the truth is that the attacks on Ukrainian infrastructure go back to 2014, and the Wiper and many of the kind of destructive malware you've seen are a result of attacks on Ukraine from Russia and threat actors back at 2016 and beyond. The latest developments though that when I was last in Poland with the cyber command group there, you now have weapons. We saw what was, I think the Dutch just announced something about a aircraft that's being, you know, that's being delivered. But they pointed out to us that the supply chain for ships, trucks, warehouses, for all these weapons or supplies that are ultimately flowing through Poland and into Ukraine are the, that's the underbelly of the cybersecurity issue because Russian threat actors track that. If they can figure out which device, which weapon, which supply is going where, when, strategic advantage, not just in the battlefield, but in sort of the broader context of war.

And that's got nothing to do with the military or the government. These are like mom and pop warehouses and trucking companies who know nothing about Google or Morgan Stanley's level of cybersecurity. These people are as vulnerable as anyone on the planet to these attacks. I say all of those things then to say we are working on this framework called the Secure AI Framework.

And just the spirit of it is to not throw out in this new rush to AI all of the improvements and foundational security work that we've built up over the last couple decades. Think of supply chains, think of kind of the full stack of a piece of software. I'm not gonna drain it here because it's more of a framework, but the idea is that the way we handle data, the way we write, you know, the logic, the business logic of a piece of software, the way the input is validated, think in the context of AI, historically for a website, you would validate the entry that a user put in before you passed it along to the database.

People have heard of things like SQL injection or sort of where the code is passed to the database, and if not checked, can extract more data than you would think. Sort of simple approach to it. But in the world of the large language model, there's no intermediation of the language because the language is the code.

The model interacts with the English that the user types in. So your ability to constrain that or to check it is dramatically changed from where you've got JavaScript or something on the server side. So all of a sudden we've got a whole new class of ways of trying to solve this input validation or data lineage. Think of the other thing that, how do we know which data is being used in training? How do we know where the data goes that's entered into the models? That's what this is designed to do is to take the best of the last 20 years and make sure in the flurry of activity, we don't forget all these foundational things, things we learned in the cryptography courses. We don't throw those out just because, you know, you do have this risk every time there's a massive innovative moment that people build it from scratch very quickly to get to market, and they forget about all the work we've done over the years to, on security. So I hope that's at least a table setting.

And I think the key for me is that, and the reason I came to Google was I could never solve the problems we're talking about or participate at one bank or one, you know, institution. You had to be part of this fabric of technology, whether it was data centers and cloud, whether it's mobile devices, whether it's email and video rendering. I think that we're at a unique moment where the infrastructure for the 21st century is being laid down.

If we mess it up now, it'll be like the railroads of a previous era and you won't change those, the gauge of the railways again. So we'll live with whatever infrastructure, security we build right now for AI. I'd love to answer questions. That'd be great. Thanks.

(audience applauding) - I'm sort of fascinated by the framing you give, Royal, about the need for a certain level of scale, and I think, you know, we wrestle with this in universities as we do strategic planning and think about how we can, you know, we're not gonna be in the business of competing on scale with Google, but we do want to feel like we're part of a conversation that involves leadership where the scale exists. What do you see the right kind of way for universities to engage with these questions, which clearly are affecting and asking questions that leaders in academia should be thinking about? - Yeah, I mean the good news is I think on work like the data centers, networks, AI, kind of at the foundational levels, there's actually a really rich exchange between academia and these tech companies. I think the problem or the question or really for the next step is that's been siloed on our side and somewhat siloed I think in the academy. And so finding concrete, and you told me this, I think the first time we met, which I think was, it helped me frame some of my thinking. It's one thing to talk in the abstract, it's another thing to have a joint project on a particular question, a legal or regulatory question. The good news is the Googles of the world are now much more in the middle of those legal and regulatory questions, whether we like it or not.

So now we're I think more open than, you know, I don't think 10, 15 years ago it would've been even interesting or relevant to some of these companies to be in those discussions. I think it is now. How do we bring the other departments around the university to the table in concrete enough ways? It struck me, we talked about this before, like as the Deans were all around that room, just have to have one conversation in that room, but could it become 15 conversations about regulation, healthcare, you know, different areas.

So I would, I think you can help us by convening that cross-disciplinary work in ways that we would never even know how to. - Yeah, I'm looking forward to that. So first of all, and I think second of all, it is interesting how we are already seeing kind of calls on computer science and on the technical side from engineering from the other schools. You know, we need not just to apply algorithms to say medical data or problems like this, but to really understand mechanistically how these things are working and how they're processing information. So I think that's a key point. The other thing though that sometimes we hear internally to engineering is, well, you know, we saw neural networks, and then there was a long winter, and then there was a little sort of data science blip, and then there was another long winter, and this is just gonna be another hype cycle, and there'll be another long AI winter.

Do you see that as? - That's a great question because I, you know, I've seen a number of those along the way. I, you know, just personally and the more I work with the Deep Mind folks and others in the kind of work we're talking about, whether it's for the malware analysis or safety filtering on the internet, people I really respect that are not into this for the investment side, 'cause I think that, you know, that's sort of a, it's helpful but not always the right signal, are surprised at what this is able to do in ways that I think make me feel like we are, there's still a lot to go here. And just my thinking on it, just for what it's worth and talking about it is that the big difference is gonna be can these large models or the next layer up from them become the equivalent of the operating system? Instead of you going, you know, all these problems I talked about before, you go and solve one use case at a time, and you train data, and you tune the data, and then you put it at a particular point in the logic of the code to make a decision. I'm starting to see instances where the user themselves in interacting with that next layer of, you know, model, tune the model as they use it, and so you, that'll be far more effective and efficient in writing software than having a software developer in the middle of that activity. If we can do that, I think you, that is a very different world than one of a bunch of Linux boxes strung together running some middleware.

- [Jeffrey] Right, right. - And I think that's, that's what people are working on. - Yeah, that, I mean I think that's one of the things we heard when we were out west is that this, this changes the way anyone can interact with a computer. Whether it's someone who wants to write a technical piece of code or someone who just kind of wants to take an idea and turn it into something that can be deployed technically.

- I mean the interesting, and I'm really enjoying this, I talk about my, you know, back to the neural network with Professor Lerner was that you know, so much of the work historically went into training, you know, or into labeling data. Obviously the deep learning quickly allowed us to label, auto label data and train these models. But what I didn't realize is this reinforcement learning with human feedback and different versions of that, you know, if you can get an expert to react to the output of one of these models, you can immediately use that feedback and the model gets exponentially better, not just a little bit better. And so all of a sudden the user becomes the most valuable piece of the equation.

It's why I'm actually not worried about the labor questions in many ways. I actually think people are gonna find this remarkably democratic and democratizing 'cause they're gonna be able to do things they were never able to, and they in fact will add what they know that nobody else knew. And the model will get that much better that quickly. But that's not the way we write software, and that's what, that's the work that's gotta be done. - Right.

- Because you can't just say, okay, let's use it like an operating system. That isn't, like that's not works. - Interesting. We were talking a little bit earlier about how as products of Yale, you have a kind of a latent humanism that is kind of baked into your brain regardless of what you end up doing. What, you know, as we think about strategically what makes the most sense for us to be doubling down on here, how do you see your own experience at Yale as informing your work on the one hand? And two, do you think we have kind of a privileged position or you know, strategic advantage in terms of bringing these broad strengths of the university across the humanities, arts, law, policy, et cetera to this discussion? - I definitely think you do. And that's something I would like to be a part of and hope, you know, I can help in any small way.

But I think, and again going back to what you said, and this we need to spend more time, on the concrete activity between departments. I think we'll have lots of great discussion, right? The beauty of this place, it could convene and you can do things like this and many others, but to make it concrete and progress the discussion and then, Ruzica, I think we met with this morning who's great. Yeah, we sort of couldn't stop. - She's fabulous, - She's great, but we really sort of mind melded on this idea of abstraction layers. And so I spend a lot of my time at Google trying to convince these engineers that it's, you know, in the early days of Google, if you dealt with an abstraction layer, you were a loser cause you couldn't code. Like you had to talk about these things like you were, like this and Jeff's probably the patron saint of that stuff, right? Like these people were hands on in ways that was just not true.

But we've gotten to a point where I would argue you can't embed all those laws in code. You can't. - Right.

- That becomes a weakness if you can't abstract and interact, I mean APIs being this sort of, web services being some instances, but I think we need to take it even further in the abstraction. And that was the point. So I think the law school, the health, like can we come up with a vocabulary that matches the abstraction layer that is concretely mapped to technology and to policy, law, scientific, you know, laws. To me that's the work that has to be done. And so instead of just talking about it, can we pick these domains and say we're gonna, and that's why I like the SAFE Framework. It says like find a, each of these domains is gonna need a technical abstraction layer that they can reliably do their work on and still have it map to the technical implementation.

Right now that happens through your software developer somewhere or some other company. So there's almost like a vocabulary, I mean, which I think the academy's good at, right? - Right. - Like this is the whole scientific, you're sort of genus and species. That's what we're talking about. - And I mean we've had a couple of conversations with AI folks about what this is revealing about the structure and indeterminacy of language itself, you know, almost from a mathematical perspective.

Not that we haven't been studying language mathematically for many years, but the mathematical questions become part of how the algorithms work. I mean in terms of how you do search through language and how you represent language. - Totally. - All of these things you're talking about. - I mean one just quick example of that that sort of, I mean, makes sense but surprised me was that if you train a model on just text, sort of one mode, and you train even with a smaller amount of the image and text, that second model is better at text only because to your point, our ideas and what we understand are not perfectly abstracted to language. - [Jeffrey] Right.

- And in fact it learns something that we don't express directly. It's loss-y right. Language to ideas loses information.

- Right. I have, I'm sorry, I just have one more question I have to ask him, and then I'll open it up. So on this point, kind of the way that that tech, that images can improve the way language interpreters or language models work, do you see projecting forward a future where in some sense by say representing by ingesting video and having this be part of the language, which I know we're already doing, but that there's an opportunity to explore how these models actually learn like something like physics? Like watching balls bounce and things like this. I mean, is that, is that something you're thinking about at Google? - Yeah, you know, and this would be, now you're getting, you know, beyond my primary experience, but what in my world version of that, the modality, I think we need to think about what a modality is.

- [Jeffrey] Right, right. - Just 'cause you can write it in English, think of that like a ladder. All of these things will ride on the language and image. Then they need the modality of the arena.

You know, in my case logs. Like we're gonna train these large language models with an additional modality of logs. Cause logs are, yes, they're text, but they're not language. - [Jeffrey] Right, right.

- And I think for each of those domains, there will be a refining modality, which maybe it's dimensionality, and it may be some geometric modality that needs to be introduced. But the thing about it is it's not, don't think of those as separate. Back to the operating system. Language I think is a fantastic, language and visual are fantastic foundational models that mean you don't have to, you're not starting from scratch in something like physics or biology. - [Jeffrey] Right, right. - You can still use whatever's baked in to the understanding of language.

- So I mean, the fact that we're even able to have a conversation as bizarre as this one, I think represents a turning point in the evolution and history of these technologies. But I wanna open it up to the audience who took time out of their day to come here. Yes, right here. - [Audience Member] Thank you. - Oh, we have a microphone actually, which would be great for the video.

- [Audience Member] So I got maybe a little bit of a boring question. You probably get this a lot, but you're, you know, you're involved with security at Google, dealing with AI risk. How big do you think existential risk of AI really is? How should Google be taking steps to sort of, you know, prevent that from getting out of hand? How are you thinking about that? And then also, you know, in computer science there's often this dialectic between like making something open source and making it secure, and that's why everyone like, you know, loved blockchain for the time that we had it, right? Where it really seemed to be very promising. So how do you ensure that when you're securing everything and making AI secure to prevent that existential risk, you're not also preventing small players in the marketplace from actually being able to do cool stuff with AI and kind of drawing a moat around, you know, meta Google and Microsoft and Open AI. - Right. I may need a disclaimer before I answer some of these questions.

Huh, yeah, nothing going on in the world on that front at the moment. Let's see, let's see. The first one was on, sorry, I lost the, as you got started about antitrust, I got thinking about that, sorry. Existential risk, yeah. So again, I have not spent my life like some of the folks in the AI research, and I think we should respect and do think about alignment in these questions. But from a practical perspective in this security, the thing that gives me some comfort and sort of there's a downside to it as well is most of the world is not automated very well.

I mean, think about how much of the world you still, you know, the technology stops well short of full automation, and that's gonna, that means that like even the best ML model in the world knows how to kill everyone, whatever that is. It's not gonna be able to do it because the things aren't automated. So we're gonna have to, like we still got a lot of work in the world to automate even the very basic things.

You can all imagine things that you could do much more automated with your phone or things. So I think we've got a long ways to go in not the ML work, just continuing to instrument and automate everything, you know, the world we live in. You know, if you lived in a perfectly automated world where everything you thought could be done, you know, then I think you, then you begin to have those. We've got a lot of work. I mean, as someone who lives, you know, 30, you know, we got 180,000 people at Google, and, you know, it's remarkably slow how much is still not automated, even in our world.

So I think, I mean, I don't say that with pride. I just think it's a reality of where we're at. It's not just, the model is not on its own able to automate things that are not automated. Someone's gonna have to do that. And I think it's a good, you know, it's a good example of where the practitioner, somebody said it like, the map is not the territory if that makes sense. A person can know the map, the model can know the map perfectly.

That doesn't mean they can manipulate the territory. On the second question, you know, I don't know. It's a good question on where open source models, closed source, frontier and foundational models versus distilled.

Again, my only thing would be to say, I think over time these foundational models will be more like operating systems, and people will ride on them, whatever form of getting there, whether open source or not. I think the constraints in many ways are more just the work it goes into building those frontier and foundational models, whether, you know, whether the open source or not. But I'm not, this is not, I don't know enough about the domain specifically to be, I'm just charged with securing it if like, if it goes wrong. - [Jeffrey] Great, another question, over here? - [Audience Member] Thank you.

I was just curious, with all of these new frameworks coming out from different companies and governments, they have a lot of different pillars, learning from history. I'm curious what you think are the largest areas that they're missing or that they're not putting enough emphasis on or any shortcomings that you see in these new pillars of safety and ethics? - Yeah, I think the two hardest areas, it's not because people aren't saying it, but are the data questions and then the interface, the user input. I don't think that people are missing that. I just think there's still work to be done to codify and commoditize the controls.

So everybody can do that. But I'm not, maybe back to the question, I think maybe the people are over-indexing at times to the third, the other question, the kind of larger risks. I'm pretty focused here on the plumbing.

Like how do we make this thing stable, solid, reliable, repeatable for everybody? And so maybe to the degree that people are distracted by some of the longer term risks from the short term, that's what I'm worried about. - [Audience Member] Thank you. - [Jeffrey] Other questions? Yeah? - [Audience Member] So you said AI has been using very like different fields including the economics and also like the technologies. But how about the fields that has relatively low tolerance of error, for example, the healthcare and the autonomous vehicles.

What do you think, how reliable and how secure is in those fields? - That's a great question. You know, the hallucination is one thing in a chat bot. It's a totally different thing in a healthcare device. Is that the, you know, the spirit of the question? I think, look a little like quantum computing, the same thing. Most of the work, and it's even back to this question of risks of input and output, I think we've got more work to do to make those safe in more and more sensitive worlds. But that's the work that's going on.

Just like in quantum computing, the real question is can you make error correction efficient? I think the same thing is gonna go on here. How far down can we drive the error bars on those hallucinations in different arenas? But the good news is, this is back to my point of like, if you ride on a foundational model and then you keep distilling it, it actually is, you know, chess is the simple example, but Alpha, I think it's already proving that it can do these things very, very well if you constrain it enough. And the question is, can you relax the constraints and get the precision you need? That's the front line of the work right now for sure. - [Jeffrey] Yeah, right here.

Go ahead. - [Audience Member] Thank you. So in terms of regulation and security with newly developed AI models, and you mentioned that you think that the user experience is integrated within the software. What's your approach with regulating or your conceptual approach with regulating it in terms of like starting broad and then specifying it down to case models when my experience and my use of an AI may be different than someone else's? Do you start broad? Do you start with more specific instances and then connect those to like a broader umbrella? Or is there a specific path that you would think would be optimal for taking? - It's a good, you know, it's a good question in a couple of ways. One, my first answer is, and this is informed from living in banks for a while.

I think it's important not to sort of over, make cybersecurity or AI overly precious, and to remember that these things are done in the service of a particular domain. You know, financial services, transportation, healthcare. And what we really wanna regulate is the outcomes in that domain.

The problem is that, you know, and this happened in cybersecurity to us that everybody rushes to the newfangled area and wants to regulate it as a class onto itself. I think the mature state will become healthcare will use AI in lots of interesting ways. And there will be regulation that is specific to healthcare. I think it's tricky to imagine all the implications or knock-on effects of AI writ large with enough precision to be a helpful guardrail. That doesn't mean that, I think what's useful now is lots of people at the table learning and getting better, but my hope would be that they become industry specific, and they're agnostic to AI.

What we care about is outcomes and risks. Because the thing's gonna change so quickly, I don't think, I think we always get in trouble when we over specify regulatory action to technology that changes faster and faster every year. The other one though that, I think this is back to the the university, but there's a whole, I mean, political theory is all about this stuff. Like how do you regulate well? It scares me that when the tech, like I don't want just the tech industry doing this. I want good, thoughtful regulation.

I don't want reactive regulation as a weapon. I want it as a collaboration for healthy innovation and outcomes. So I think the university, you know, instead of this being tech against somebody else, how do we together write good regulation that promotes the level of innovation and competition? Like that's the right thing.

You know, you've seen lots of these, what was the video recently about the telecom industry? Like we're pretty good at doing regulation bad in many ways. It's the whole concept of regulated capture and all this stuff. So how do we do it better? I don't know that we are the, we do not have the corner on on that question. - Can you speak just for a second about how this intersects issues of intellectual property when you're training on, you know, vast amounts of data, some of which may or may not be owned and then reproduced, and, you know, that's a regulatory question, but it's also just a, it's a technological question in terms of what the mashup is that is spat out by the algorithm of PM. - It's a great topic that will be more than just technical.

And I think that's the, like, so on the technical side, lots of work going into data lineage and which data is used, even at scale. And then the other side, which is as it's produced, you know, how do you water market or identify it. But as you all know with like certificate authorities and websites, it's like the least interesting thing about the internet. And I worry that, you know, I think what we need is societal and legal norms that go with that. The technical implementation is not gonna be either the answer or the impediment. And I think we're gonna need new norms of when you modify someone else's, not even the technical platform, but you modify someone else's material.

How's that gonna work? - [Jeffrey] Right. - How will licensing work? So I think all of that, back to the question of regulation has to be a whole of society question. I think we get in trouble when, you know, we as engineers or we as society think there's one answer to these things, and we've just not found the mathematical equation.

These are actually ongoing processes that need feedback loops and healthy, legitimate outcomes that people respect 'cause they'll evolve. - [Jeffrey] Yeah, where there a question over here? - [Audience Member] So I kind of wanna return back to the idea of like regulation under industry verticals. So something that I'm really interested in is that like it seems really easy for all these companies to talk about like these big lofty ideas of like we need to be regulated.

But when it comes down to the actual language of like what these regulations are, it becomes really tricky. And like the specific instance that I'm thinking of is like when Sam Altman testified in front of Congress and then like saying that we need more regulation, but then a couple days later the EU came out with a package and OpenAI just completely rejected it. So do you have any like ideas in mind to like how to resolve this kind of issue that we see? - Yeah, and I, again, back to my banking history, I think we need to get to a place where there's a healthy and informed ongoing refinement of the approach. And so to your point, I think there's, you know, this is new, and people are posturing a bit in all kinds of ways, but what we want is industry, government, this public sector in whatever form iterating towards something that works. I guess my point is I don't think there's a single group within that who knows enough to write the perfect regulation even if it existed.

And so what we need is a healthy way, to your point, you know, we need a healthy way to have the dialogue, not to sort of reject things out of hand, but to refine them as we go. Because look, I just said, we talk about, I don't know what it'll look like in a couple years talking about a large, a frontier model. Will that be, like that's a concept, but is it well-defined enough to be the basis of a regulation? That's a good question. And so I think we need to use that. We need a vocabulary, and it's partly why I go back to the domain specific, because again, when we operate in these really high level abstractions, I'm not sure that it's even implementable in some cases, let alone, you know, to people's reactions to it. Whereas you can look at outcomes in a domain, even if the tech changes under the hood.

- All right, I think we have time for maybe one more, and if there're none, I have one, and you knew it was coming. Right here? - [Audience Member] Yes, thank you, Royal, your analogy, - Oh, we need a mic for you, I guess. I can hear you, but yep. - [Audience Member] I enjoyed your analogy with cartography, the definition of a country, and it made me think of digital and data sovereignty.

- Yeah. - [Audience Member] Is it correct to think that it's not too late to define a sovereign nation with respect to data and digital, and if so, what criteria would you, can you think of, what major high level criteria would you focus on? - You know, back to the crypto discussion, we'll come back to my education here, but we have built just as one example of an attempt on this, what we call a sovereign cloud offering where instead of it being, you know, we do, you can do it in various locations, but we use crypto to define the scope. And so the cryptography defines the sovereignty completely. Really it's geographically meaningless really, but it gives the basis for discussion of this, the scope of the data. And I think you still have questions though of what data, is any data about me because I'm an American, American data? You know, I think there's lots of use cases, but at least if somebody can say, this is what we wanna consider the data of that country, I think cryptographic work provides one possibility for how to answer that.

The problem is, the practicality is no data lives on its own. I mean very little. It needs to interact with other data.

And so if you're using crypto to do it, you need some key exchange or some means of combining data, and all of a sudden you make people manage keys, and people stink at that. So yes, there's some offerings, and I think crypto provides it. I don't think that's the long-term answer though. I think we've gotta come up with more durable conceptual models of what the country, its people, its companies, its technical infrastructure. But I think that's work I'd love to be doing.

I mean, I honestly think that's a question that the university should begin to have a thesis about, not necessarily Google. - [Jeffrey] Yeah, So last question. - So I think - Michael has one? We gotta give Michael the mic.

- Yes. - Yes. - [Michael] Okay. - Turn it on. It's on, okay, fine, great. - So I just wanted to say that it's very easy for technologists to think about, we have to reach out into these other areas and incorporate law and medicine and political science into our thinking and into our technologies.

But I think you made a good point that it's the players in those fields that are going to be determining our future. And my question is can you give any advice to universities about how we can best get the technological foundations out to the people that are going to be making these decisions? In other words, what should we be doing to help our related colleagues better understand the technology and where it's going and how it's going to affect them? - It's a great question, and I think the tension is, I mean, I think the best way, and we come back to I will keep doing this, is concrete questions that we can work on together. Because I think you learn then, not just the theory of it, but the practice. And I think that's what is required to really learn and to develop that abstraction and language that we can then use for the next instance and the next instance.

So I think concrete work where we answer questions like what's a country on the internet? And then we force each other to learn, well, what do you mean, a data center? Like at some level people don't really understand what that or how caching works, right? And I think you get into these sort of concepts, concrete things. The one thing I would say, and this is one thing I love about Yale, I do worry sometimes if the over enthusiasm for data, so one way this all happens is everybody gets excited about data, right? All these fields are now quantitative in ways that seem ridiculous to me somewhat. But, so I think that the key is not to be, not to think of data as, data is a way to communicate. It is not the ideas in and of itself. So I think one of the things the university has to continue to offer is to remind us that these ideas are longer standing and these concepts go back much further. The data is a way to have a conversation.

It is not the answer. It does not solve the conflict. It does not solve the partisanship in and of itself. So that's my, I guess my one back and forth on the advice is, yeah, let's work on things.

Sometimes that means people get data, and they work together. I worry that sometimes that, you know, stops people from digging in at the real roots of these problems. - Yeah, well, I wanna thank Royal again for joining us and enlightening us. (audience applauding) - Thank you. - One of the things I loved about his talk and about this conversation is it kind of reaffirms, I think, a belief that we share here that there's work going on in the world and technology companies and lots of different places that should be in deep conversation with what we're doing here in academia.

The research we're doing, the teaching we're doing, that walling ourselves off from these advances does us and society no good. So I'm thrilled that there seems to be mutual enthusiasm for continuing this conversation, and I wanna thank you all for joining us here today. So more next semester. Thanks again.

- Thank you. (audience applauding) (soothing tones)

2023-12-17 16:55

Show Video

Other news