Dan Jurafsky: How AI is changing our understanding of language
(upbeat music) - Reading and writing, talking and listening. These are special skills typically associated with human intelligence. But ever since computers were invented, and this is almost literally true, there has been an interest in using them to analyze language. The field at the intersection of linguistics and computer science called natural language processing, or NLP for short, is devoted to developing methods for computers to understand human language and even generate it. One challenge has been translation, and this has made astounding progress, as you know if you've used any of the online translators.
Another challenge is giving simple commands to your TV, or your phone, or your refrigerator by talking to it. And again, great progress: Siri, Alexa, your favorites. But there are many other potential uses of NLP that are being considered and pursued.
Can we analyze the use of language to understand the mood of the speaker or the writer? Can we use language to understand who's in power and who's not, who's being subjugated? Can we use language analysis to persuade and sell things? For sure, that's of great interest for both ideas and products. And can we understand the hidden meanings or goals on social media or in discussions and debates? Computational analysis of text benefits from the huge amount of text that is available on the internet, and this is used to train these programs to detect what is being said and how it is being said. However, it also means that the programs may be exposed to text that contains falsehoods or lies or hatred, or is meant to incite trouble. So scientists building these systems for analyzing text have many challenges in making sure that their systems perform in predictable and appropriate ways. Dan Jurafsky is a Professor of Linguistics and Computer Science at Stanford University.
He works in this field of NLP and has recently been focusing both his research and his teaching on ethical and social issues in analyzing texts and language. Dan, what has caused you to focus on these ethical and social issues? - Thanks, Russell. That's a great question.
I'm kind of excited about this question, and really, I would say it's the students. I feel like this generation of Stanford students, but really students everywhere, are really focused on ethical issues in linguistics, in computer science. And they're really pushing the middle-aged people to be thinking about the implications of the kind of research we're doing. But I think it's also the case that for a long time, natural language processing, NLP, was focused on really hard problems of understanding sentences and translating sentences.
And now things like parsing sentences and translating a simple sentence work well enough that we're dealing with the fact that language is a technology that was invented for people to use to interact with people. And the people side of language as a technology is much more relevant now, especially since the web is made of words; everything we do computationally online, we're surrounded by language. So these people issues are now coming up. - So for somebody who hasn't thought about this very much, can you give us an example of potentially harmful things that could happen with natural language understanding or translation, and why it raises ethical issues? - Sure, I mean, there are a couple of core examples.
So one is the way that the kinds of biases we might have in society get coded into language. And if you're building a big NLP system, these systems are all built on machine learning, meaning that they read a bunch of text and they learn a bunch of associations. And if there are biases and stereotypes and negative things in the text, they're gonna pick this up. So for example, recent papers have pointed out problems with sentiment analysis, the task of deciding if a sentence is positive or negative, is this review of a product a positive review or a negative review, and with toxicity analysis, the task of deciding whether a sentence is toxic. If I'm a big online social media giant, maybe I wanna get rid of the toxic sentences, or censor them in some way, or eliminate them, or suggest to people that they reword things. The problem is, how are the sentiment and toxicity detectors trained? They're trained on the texts that we have.
And it turns out that, well, for example, when people talk about people with disabilities or people with mental illness, and my sister is mentally ill, the kind of language that people use around mental illness, let's say, is very negative. And so the toxicity detectors pick up on this negativity, and now they assume that if you're mentioning mental illness you must be being toxic. So they just censor any discussion of mental illness, which of course is not the goal. They pick up on these latent, hidden associations that we don't notice as humans, but they're there in the text. - So this suggests that going to the internet and grabbing all the text and building some sort of system could be treacherous.
And then it suggests that you're gonna have to carefully curate what you actually expose these programs to so that they perhaps have a worldview that is more, and it's tricky for me to even say this, because you actually used the word censor. And I think you were using the word censor in a good way. Like, you wanna censor the bad stuff, but other people might say, well, wait a minute, you're censoring my speech, or my computer's speech. So how do you navigate this freedom of speech versus the responsibility for non-hateful speech? - No, this is a great question. So there's a technical situation and there's a social and political situation.
So technically, we know that these models do what's called bias amplification. If there's some kind of bias in the text, NLP models, or really any machine learning models, have been shown to just exaggerate that bias. So again, my mental illness example. - That's fascinating, because that means it's not just replicating the bias, which might be regrettable, is regrettable. It's actually amplifying it, making it worse. - Yes, and we don't really know; I don't think I've seen a convincing explanation of why this is.
And so that's kind of a hot research topic among the young people, but it's clearly a problem. So even if there were only little biases, they're gonna get amplified. These systems behave in ways we don't completely understand. So there's a whole side of NLP that's just focused on what's often called explainable AI: can we understand why the model is doing what it's doing? And for, I would say, a third of my students, that's the focus of their research interest. - Right, because before you can fix it you need to have a kind of fundamental technical understanding.
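To make the mechanism Dan describes concrete, here is a minimal, hypothetical sketch of how a toxicity classifier can absorb a spurious association from its training data. It is not any production system: just a toy bag-of-words logistic regression trained on a few invented sentences in which mentions of mental illness happen to co-occur with the toxic label, so a perfectly neutral sentence about schizophrenia gets an inflated toxicity score.

```python
# Toy illustration (invented data): a bag-of-words toxicity classifier
# that absorbs a spurious association between mental-illness vocabulary
# and the "toxic" label, because of how the training text is skewed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "you are a worthless idiot",                  # toxic
    "people with schizophrenia are dangerous",    # toxic, mentions mental illness
    "anyone with a mental illness is a burden",   # toxic, mentions mental illness
    "have a wonderful day everyone",              # non-toxic
    "this restaurant was lovely",                 # non-toxic
    "thanks for the helpful advice",              # non-toxic
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = toxic, 0 = not toxic

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# The supportive sentence that mentions schizophrenia gets a higher
# toxicity score than the otherwise identical sentence, purely because
# of the lexical association learned from the skewed training set.
tests = [
    "my sister lives with schizophrenia and is doing well",
    "my sister lives in portland and is doing well",
]
for t in tests:
    print(t, "->", clf.predict_proba([t])[0][1])
```

With real corpora the effect is subtler, but the principle is the same: lexical features end up standing in for the label.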
I am reminded, and this is very interesting, because as you know very well, there's also been a literature on how fake news spreads faster than real news, like on Facebook, and I wonder if they are related phenomena, I'm just making stuff up here. But humans are so good at amplifying fake news, I wonder if there's some kind of parallel in what's going on inside the technology, where it seems to be amplifying the kind of funky stuff.
- That's a very cute hypothesis. Actually, I had not thought of that, and I'm gonna go talk to some students and see if we can check for that. That's a really good idea. - Well, thank you. So, but that's the technical side, and you were gonna also discuss some of the social side of it. - I mean, there's a huge social question, which is, when our models are generating speech, what kind of speech do we want them to generate? And what if our models are giving advice to people? Like, for example, lots of writing tools give you advice on whether you should use a passive in that sentence or not.
So this automatic writing advice has been around for a long time, from Google and others, and it's pretty good now, but should we be giving people tone advice? Like, maybe what you just wrote is really rude and insulting; maybe you wanna reword that. It could even be a useful tool. - And I also know the students are very sensitive to, like, majority ways of writing, majority ways of making arguments, and the idea, which I think is reasonable, is that there should be a plurality of ways to make arguments, and having Microsoft Word make you sound like, now this is my language, not anybody else's, an old white guy might not be what your strategy for writing wants to be. - This whole idea of diversity actually comes up in all sorts of ways. So, in a recent paper that Allison Koenecke, who's a Stanford student, wrote and that I helped out with,
we looked at this whole idea of language variety from a different perspective, which is through speech recognizers. We took five commercial speech recognizers, the big online systems for doing voice to text, and looked at how they act when they're given speech by African-Americans. So the African-Americans might be speaking African-American Vernacular English, African-American Language, the dialect of English or the variety of English that's often spoken by African-Americans.
And what we found is that all of these recognizers have about double the error rate when they're trying to transcribe African-Americans compared to non-African-Americans, mostly related to these variety issues. So the fact that people speak differently plays a huge role in how well our technology works. And that's a problem that we're trying to solve.
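The disparity Dan mentions is measured with word error rate (WER), the standard edit-distance-based metric for speech recognition. A minimal sketch, assuming the `jiwer` package and made-up transcript pairs; the Koenecke et al. study used thousands of real utterances, so treat this purely as an illustration of the bookkeeping.

```python
# Sketch: comparing word error rate (WER) across speaker groups,
# assuming the `jiwer` package and invented (reference, ASR output) pairs.
import jiwer

group_a = [  # hypothetical pairs for one speaker group
    ("he is going to the store later", "he is going to the store later"),
    ("we were talking about the game", "we were talking about that game"),
]
group_b = [  # hypothetical pairs for African-American speakers using AAVE features
    ("he be going to the store on fridays", "he been going to this door on fridays"),
    ("we was talking about the game", "he was walking about the game"),
]

def group_wer(pairs):
    refs = [ref for ref, _ in pairs]
    hyps = [hyp for _, hyp in pairs]
    return jiwer.wer(refs, hyps)   # aggregate WER over all pairs in the group

print("group A WER:", group_wer(group_a))
print("group B WER:", group_wer(group_b))
```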
- Fascinating. This is the Future of Everything. I'm Russ Altman, I'm speaking with Dan Jurafsky. Dan, I know in addition to that work you've done work using text technologies, I guess we'll call them, as a window on social issues.
And even as a way to look back historically at the evolution of attitudes and approaches, and I find this really fascinating. So can you kind of tell us the motivations for that work and the kinds of things that you found? - That's great. So this is work by Nikhil Garg and my colleagues James Zou and (murmurs), and Nikhil had this idea, which is: we know that our technologies are mirroring and amplifying this kind of social bias. And so Nikhil's idea was, well, gee, could we use this fact about bias not as a negative thing, but as a tool? Could we go measure bias? And in fact, could we measure bias in the past? So if we go look at text written a hundred years ago, we can see what kinds of biases people had a hundred years ago. We looked, for example, at newspapers and old data over the last 150 years, and looked at, for example, how women are talked about, or, in America, how Asians are talked about, how Chinese Americans were talked about, from 1850 all the way to the present. And what Nikhil found is you can sort of trace the framings and stereotypes of how people thought about other people just by looking at how they talk in text.
I mean, if you think about it, the text we write reflects our attitudes. So texts about Chinese people in 1890, very racist, talked about Chinese people as monstrous and bizarre and alien. And you see that slowly dying out as we have more Chinese immigration and things normalize, and especially in the sixties, after you get, I think, the change in the immigration law. So you can watch these changes. The same thing with women: you can watch adjectives for being really competent or smart being much more associated with men in these old texts, and slowly, with the rise of the women's movement, we see that these adjectives are now being used more equally for men and women.
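A minimal sketch of the kind of measurement this line of work relies on: train or load separate word embeddings for each decade and compare how strongly a set of adjectives associates with words for one group versus another. It assumes gensim and per-decade vectors saved as `KeyedVectors`; the file names and word lists below are placeholders, not the actual data or lexicons from the paper.

```python
# Sketch of a decade-by-decade embedding association measurement, assuming
# gensim and hypothetical per-decade vector files (placeholder names).
import numpy as np
from gensim.models import KeyedVectors

def association(vectors, adjectives, group_a_words, group_b_words):
    """Mean similarity of the adjectives to group A minus group B.
    Positive values mean the adjectives sit closer to group A's words."""
    def mean_sim(word, group):
        return np.mean([vectors.similarity(word, g) for g in group if g in vectors])
    diffs = [mean_sim(adj, group_a_words) - mean_sim(adj, group_b_words)
             for adj in adjectives if adj in vectors]
    return float(np.mean(diffs))

competence = ["intelligent", "competent", "logical", "skillful"]   # placeholder lexicon
men = ["he", "him", "his", "man", "men"]
women = ["she", "her", "hers", "woman", "women"]

for decade_file in ["vectors_1910.kv", "vectors_1990.kv"]:   # hypothetical files
    vecs = KeyedVectors.load(decade_file)
    print(decade_file, association(vecs, competence, men, women))
```

The design point is the one Dan makes next: each decade's embeddings are trained only on that decade's text, so the association score reflects that period's usage rather than modern connotations.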
- That's amazing. But let me ask you, this is a little bit of a technical question: do you look at those texts in the context of models that have been trained on modern sensibilities and modern connotations, or do you also look at them in terms of what their local connotations were in 1890 or in 1870? I don't know if you can do that, but it seems to me that, like, I am not being an apologist for the views of the 1890s, but somebody from that time might say, well, you're using language from 100 years in the future, which is not the way we think. And I don't even know if I would believe that, but it might be a reasonable protest.
And I'm just wondering how you dealt with the changing connotations of words. - No, that's a great question, and one that we think about a lot technically. And the simplest answer is we don't look at the modern text at all when we do these studies. There's so much old text lying around.
We completely retrain the models on data from 1890. So the model looks like a text reader from 1890, and we can ask, exactly in the context of that model, how are they thinking about, you know, my grandparents were immigrants in the 1890s, how are Jewish people, Russian Jews, talked about in the media, how were Chinese people talked about in the media. We can see how women are modeled.
So we have this kind of microscope onto the past. And it happens that, luckily, psychologists as early as the 1920s and thirties were doing experiments, as psychologists often do, on college freshmen to understand how they felt about people. And what we found is that the text written in the twenties and thirties mirrors the psychological attitudes expressed in these experiments, so we have sort of a test. - So, like a positive control to calibrate your methods.
I'm glad I asked that question, then. So you really are able to kind of control for the evolution of language, to some degree, in doing these. So that was a historical analysis, but I know that you're also doing work looking at current trends, and I can't help but ask you if this might be useful: are there clues in language to tell you, well, this is hatred, or this is fake?
Is that something that text analysis might be able to help us get our arms around? - It's hard. I would say it's a hard, unsolved problem for both of these. So, toxicity and hate speech: like the Supreme Court said about pornography, you know it when you see it. As humans we're relatively certain we know it when we see it, but it's very hard to detect automatically. So, just one example, again, what I started with, the mental illness or disability case: these systems accidentally, because of just the negative attitudes in the texts they're trained on, treat discussions of mental illness as toxic, or treat text written by African-Americans as more likely to be toxic.
So the task of deciding what's toxic is much easier for a human, though still hard for a human, but really hard for a machine. And fake news is really hard, like propaganda. So there's the whole question of propaganda, which is of course closely related to advertising, and anytime you're trying to convince somebody by emphasizing some aspect or saying something that's just not true. That's really hard for us, because how do I know it's not true? You have to know facts about the world, and getting facts about the world into our systems is quite hard. They're trained on language.
So they're very good with facts that are mentioned in texts, but of course not all of those are true. And so I would say the field of automatic fact-checking is kind of in its infancy. - I recently had a fascinating conversation with a colleague of ours named Julie Owono, and she's very interested in content moderation.
And her thought is that part of the answer is going to the local area where the text is being used and generated to either persuade or mislead, and going to local organizations on the ground who understand the nuance of why those arguments are being made, and who's hurt by them, and who gains from them. And those local understandings can shed light. And it was a very interesting model, because it's not as if anybody sitting 3,000 miles away would even be able to recognize that hate speech, let alone do anything about it. But locally, where it's being put out, a lot of the local organizations know exactly why it's going out and what the positions of the players are.
And so she's advocating for a local understanding of these phenomena, which I found to be very interesting. - I think that's very wise, and I mean, we've certainly found related things. One finding, for example, is that if you're gonna have people label toxicity, they need to know something about the language they're labeling. So if you're labeling African-American speech, you'd better train your labelers: either they're speakers of the African-American dialect, or you train them in the variety so that they know what to look for. So it's very important that the humans who play a role in this either come from the community or are trained by the community.
And this is maybe true on a larger academic level, which is, we found that in doing this kind of research, computer scientists and/or linguists shouldn't be working in a vacuum. If you're studying social harms, you'd better collaborate with somebody who understands the particular social harms. We've been working on the language of the police, looking at body camera data. This is a project led by Jennifer Eberhardt, my colleague in psychology.
And she understands the context of policing and African-American interactions with the police so much better than I do. And so, we were bringing in these different kinds of knowledge. It's really important.
- It totally makes sense that the micro context of a statement may lead people to understand pregnant words, pregnant statements, that those of us who are not from that environment don't understand. This is the Future of Everything. I'm Russ Altman, more with Dan Jurafsky about natural language processing and its potential next on SiriusXM. Welcome back to the Future of Everything. I'm Russ Altman. I'm speaking with Dan Jurafsky about understanding and generating texts. And Dan, in the last segment you touched upon, but I'd like to dive a little bit more into, the medical applications of text understanding or text generation.
So I know you've done some work in mental health and other areas. Can you tell me what the approach has been in these different domains? - So again, I mentioned my sister has schizophrenia, and that's caused me to take an interest in that. And so one project we did with Jong Yoon in the med school and my student Dan Iter was to look at intake interviews for schizophrenia to see if we can help doctors do a better job of identifying the kinds of linguistic features that happen in schizophrenic speech. So one thing people have talked about linguistically in schizophrenia language is a kind of incoherence. You tend to move from topic to topic in ways that don't seem coherent in the same way that non-schizophrenic speech is. And coherence is just the kind of thing you could imagine an NLP system would help with, because we can automatically measure, like, are we on the same topic from this sentence to the last sentence, are the words flowing in the same way that they flow in natural speech? We have a lot of non-schizophrenic speech we could train on, and we could look at the difference between that and schizophrenic speech.
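A minimal sketch of the adjacent-sentence coherence idea Dan describes, assuming the `sentence-transformers` package and invented example passages; the clinical study used its own features and interview data, so this is illustration only.

```python
# Sketch of an adjacent-sentence coherence score, assuming the
# sentence-transformers package; example passages are invented.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

def coherence(sentences):
    """Average cosine similarity between each sentence and the next one."""
    emb = model.encode(sentences)
    sims = [float(cos_sim(emb[i], emb[i + 1])) for i in range(len(emb) - 1)]
    return sum(sims) / len(sims)

coherent = ["I went to the store this morning.",
            "I bought bread and some coffee.",
            "The coffee was cheaper than usual."]
tangential = ["I went to the store this morning.",
              "My uncle's radio only plays static.",
              "Birds are technically dinosaurs."]

print("coherent passage:", coherence(coherent))
print("tangential passage:", coherence(tangential))
```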
- Fascinating. So now what is the goal? You do that analysis, what do you wind up with? Does it help the patient? Does it help the doctor? What do you do with the output of that analysis? - Here the goal is to build something that might help the doctor. So, as Jong has explained to me, these are long hours for doctors; they spend hours doing these interviews, and getting multiple people to look at the data to decide exactly what the diagnosis is is hugely time-consuming.
And you can imagine some kind of linguistic markers, indicators that might say, well, notice that this patient did more of this in the interview, or when you mentioned this topic, this kind of incoherence happened. So some kind of tool that might point out linguistic features, making it easier to get a more accurate diagnosis, or for the doctors to be able to mark specific features of the language, could be very helpful. Still a research question, though. - That really makes sense.
In my medical training, they taught us about this concept of the flight of ideas. And if a patient had a flight of ideas, it was along the lines that you're describing, where there were too many ideas, too fast, in too many different directions. And we labeled it that way, and we would report it to our attending physician if we detected it in the patient. And I can imagine that you could get much more specific about, well, exactly what did you see, and how much flight was there, and how many ideas were there. - Right, and if you were doing longitudinal studies of a patient, could you notice them getting better or worse? And that's (murmurs) - Right, whether the treatment is working. So you've also looked at patient perceptions of their physicians, I believe.
Tell us a little bit about that. - This is a very fun project. This is Heidi Chen and Emma Pierson, who were the first authors.
And this is a collaboration with Nishita Qatari in the med school, and a bunch of other collaborators. And here we're looking at what people say about their doctors afterwards. - Excuse me for laughing. As a physician, I would just, it's like, be careful what you ask for. - No, I mean, we have this too: we get student reviews, of course, on every class.
And at the end of the quarter you get your student reviews, after the course is over, and you sit down and you make sure that you have your favorite breakfast before you open that window, because they can be devastating. And they're always helpful, but, yeah. Helpful, but painful. - Fortify yourself first. - Because they figure out what you're doing wrong and they tell you. - And it can be. And getting back to our topic, it sounds like the language they use can be detected. - So for example, when patients are talking about doctors, if they describe them with more communal language, so if they describe them as being more caring or more inclusive or more family-oriented, these kinds of words of a communal nature, they tend to give them higher ratings.
So doctors who are friendlier and more caring, patients like them better, not shocking. And patients are more likely to describe women doctors in that way. So being communal is a positive thing according to patients, which, although it seems very sensible, in the past I think doctors were not trained in that way.
They were trained to seem professional and smart and knowledgeable. And it turns out that patients value these other things. - Right, like an authority figure at a distance would have been the model from the olden days. - That makes sense, but it turns out that's not what patients want. - And that's interesting, because, as a physician I'm thinking, that is almost entirely divorced from the quality of the health care that the patient receives. Like, we don't actually know if these friendly, communal doctors are delivering good care, but in some ways it threatens the alliance between the patient and the doctor if there's not this trust and this kind of mutual respect.
And it sounds like that's a useful piece of knowledge. Just because we have a couple of minutes left: you've also done work in text summarization. And I find this interesting, because here the computer is looking at text, but then it's actually generating some sort of summary itself. And it sounds useful, but potentially treacherous. So what's the status of that work and that field? - That's a great question, because, and you probably know a lot about this, maybe six or seven years ago we had a very large shift in how our natural language processing systems were built, with the rise of deep learning and these large neural networks.
And I would say, before that, lots of our work had been on understanding text. You have some text and you wanna measure some property of it, or build a parse tree, or understand whether the text is positive or negative, or extract information from the text: understand it. And text generation was sort of the poor stepchild of the field, because it wasn't clear what you were generating. It was very hard. It was a hard problem.
I think it was the hardest problem in the field, and our methods didn't work that well in generating text. Oftentimes in NLP applications, if you have, I dunno, a chatbot, like your Siri-type chatbots, for the generation of text humans would just write all the text out, and the system would just repeat that text. That was the state of the art 10 years ago.
And what happened with the rise of neural networks is that these networks seem to be able to generate text better. They form some representational understanding of the meaning of the sentence, and they're able to convert that back into words in a way that we weren't capable of doing before. So the field has been extremely exciting, with all the applications for which we have to generate text: translation; dialogue, where you're having conversations and the system has to generate something; summarization, the one you brought up, where I'm reading a whole news article and giving a one-sentence summary. All of these are cases where before we didn't have a really good handle on the generation part of the problem. And now they're working much better, but it's still hard.
So these neural systems have a number of problems. They are very repetitive. They tend to say the same thing multiple times. They repeat themselves.
They're redundant, and I'm going to... - Very good, touché. - Thank you. So they do that, and they also say things that are false. So we're back to the old fact-checking problem:
they generate things they think are true, which can be wrong. So how do you fix that? So these two issues, the repetitiveness, that they say boring things that are too obvious, and that they say false things, that's sort of what the field is trying to figure out right now. - Well, although it does give me hope, because this would also describe how my eighth graders used to write, and how by high school and college they removed the redundancy. They got more concise in their writing. So maybe we're just at eighth grade and we're headed towards high school and college. - I like that, you're an optimist. I like it.
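The repetitiveness Dan describes is easy to see in off-the-shelf generators, and a common decoding-time mitigation is to block repeated n-grams or penalize reused tokens. A minimal sketch, assuming the Hugging Face `transformers` library and the public GPT-2 model; this is not any system discussed in the conversation, just an illustration of the general technique.

```python
# Sketch: suppressing repetition at decoding time with n-gram blocking
# and a repetition penalty, using the public GPT-2 model as an example.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The study of language is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Plain greedy decoding often loops on the same phrase.
plain = model.generate(input_ids, max_length=60, do_sample=False,
                       pad_token_id=tokenizer.eos_token_id)

# Forbidding any repeated 3-gram and lightly penalizing reused tokens
# suppresses the most obvious repetition.
constrained = model.generate(input_ids, max_length=60, do_sample=False,
                             no_repeat_ngram_size=3, repetition_penalty=1.2,
                             pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(plain[0], skip_special_tokens=True))
print(tokenizer.decode(constrained[0], skip_special_tokens=True))
```

Decoding constraints fix the surface symptom but not the factuality problem Dan raises; that still requires getting knowledge of the world into the system.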
- Thank you for listening to the Future of Everything. I'm Russell Altman. If you missed any of this episode, listen anytime on demand with the SiriusXM app