Machine Learning, Ethics and Fairness
I'm Natalia Levina, the director of the Fubon Center for Technology, Business, and Innovation. I'll just say that this event is part of a series, and we may have yet another event in May, so keep your eyes open. It's not yet fully announced; it would focus on technology strategy, but we will send you an invitation by mail if you are on our mailing list. Without further ado, it's my great pleasure to introduce Professor Foster Provost, who is a co-director of the Fubon Center and who heads our focus on data science and artificial intelligence. He is a professor here at Stern, and he will lead the conversation with Professor Barocas. Please welcome Foster Provost.

After the fireside chat with Pedro Domingos last time, which was really enjoyable, for me at least, one of the pieces of feedback we got from the audience was that we spent too much time on introductions. So, very quickly: tonight we have Dr. Solon Barocas, an NYU alumnus and really the top scholar in the world at the intersection of ethics, law, and machine learning, which is pretty much what he's going to talk to us about tonight. He is currently a researcher at Microsoft Research in New York and a faculty member at Cornell University. And here he comes. I have to stop myself from listing the rest of his accolades. Thanks so much for coming and spending time with us this evening. Solon is going to give a little talk, and then we'll have a chat.

Thanks, everyone, it's very nice to be here, and thank you, Foster, for the invitation. It is a true pleasure to come back to Stern. I did not get my degree from Stern, but I spent a lot of time here; in fact, I feel like most of what I actually know about machine learning originates from very generous conversations I had with Foster many years ago now. What I'll be talking about is work that started when I was a doctoral student and that I've continued to pursue over the past few years. Rather remarkably, in the past five years in particular, what was a niche concern has suddenly become a gigantic topic of debate, so I'll try to give a survey of this emerging discussion around ethical issues in machine learning.

I'll start with a somewhat cheeky introduction. This is an article from the New York Times from 2015, a couple of years ago now, which posed what at the time seemed like a quite reasonable question: can an algorithm hire better than a human? It rehearsed what you would think is the standard argument here: that there are frailties in human decision-making, including bias, which we might be able to overcome if we formalize the process, if we standardize it, if we make it evidence-driven, or force those decisions onto some kind of evidentiary standard. The article rehearses many of the arguments we would probably have been familiar with already, including the following: hiring could become faster and less expensive, and lead recruiters to more highly skilled people who are better matches for their companies. Another potential result: a more diverse workplace. The software relies on data to surface candidates from a wide variety of places and match their skills to the job requirements, free of human biases. Now, this is actually, I think, a quite compelling vision.
Many of these things are in fact probably true. But this was a statement made at a time when there was a fair amount of naivety around these kinds of questions: a belief that the move to algorithms alone might somehow purge the process of any kind of human bias. The few of us who were working on these topics reacted to this article, and some of us actually had a chance to speak with the journalist. Just a few weeks later, the same journalist ran a piece quoting some of the key figures in the field, spelling out some of the risks: that when we do use data to drive decisions, it doesn't mean those decisions are suddenly neutral or objective or fair, and that they are vulnerable to inheriting many of the same kinds of biases we thought they might stamp out.
It said something like this: but software is not free of human influence. Algorithms are written and maintained by people, and machine learning algorithms adjust what they do based on people's behavior; as a result, algorithms can reinforce human prejudices. So in the span of a couple of weeks, I feel like we observed what has been the shift in discussion over the past few years. There was a lot of enthusiasm and hope about the possibility of machine learning, and algorithms more broadly, being a force to advance civil rights and protect people against discrimination, and increasingly there is a realization that this is not an automatic feature of using the technology; there are many ways this can end up having the same types of problems as the human decision-making it might replace. So I'll try to talk about some of the ways that's true, and I'll give you a little bit of an introduction to this field.

Following some of my work as a doctoral student, I set up a workshop with Moritz Hardt, who is now a professor at Berkeley in the computer science department. The workshop was called Fairness, Accountability, and Transparency in Machine Learning, and it was attached to NeurIPS, one of the leading annual machine learning conferences, in 2014. Foster was actually one of the participants, but we had a group of maybe 30 people, and maybe even that's generous; a small group. This was definitely a fringe question for the field at the time. Some important academic figures were able to attend, but it was not seen as a fully legitimate area of study. If you fast-forward through the sequence of workshops, this has now become a core topic within the more technical parts of the community, and increasingly even a topic of popular conversation. The heuristic I have is that I went from working on things my parents did not understand to something they actually talk to me about routinely.

And to drive this point home: just last week, Senators Wyden and Booker introduced a federal bill called the Algorithmic Accountability Act of 2019. So in a pretty short amount of time this went from being not particularly well understood, or not taken terribly seriously, to something that is actually being introduced as legislation by a presidential candidate, and I also know there are similar conversations happening with other presidential candidates. What's interesting about this, and I'll say more as we go along tonight, is that this is a law that would apply broadly. It's not limited to the traditional domains that have been subject to discrimination law, which I'll mention in a moment; this is a new bill that would potentially regulate all sorts of business decisions, not just those traditionally covered by discrimination law. Whether such a bill could actually be passed is not so clear,
but it's actually interesting to see that there's movement in this direction.

Okay. Let me try to get some important foundational questions out of the way. When people talk about bias or discrimination or fairness in this area, especially if you come from a slightly more technical background, it can be confusing what the term even means. For those of you with a background in statistics or computer science, bias could invoke the following things: the idea that the way you collect data might be biased, in the sense that you're not collecting a representative sample of the population; or the bias of an estimator, meaning the predictions you're making are systematically off the mark; or, especially for those of you in machine learning, inductive bias, which is the way we focus on certain types of patterns in the data from which we're trying to learn. These all have the possibility of contributing to problems with what I will call unfair bias, but what I'm going to point to tonight is something slightly different. I'm going to focus on what I'm calling unjustified discrimination, or unjustified bias. You could think of this as a way of responding to the quite reasonable point that the whole endeavor of machine learning, or classification, or supervised machine learning, is to discriminate: to make distinctions between people, to figure out which class a particular example belongs to. What I want to focus on tonight are instead unjustified bases for differentiation, where we might say that some particular feature about people, for example their gender or race, is simply not relevant to the decision at hand. And interestingly, as I'll explain in a moment, even when certain features like gender or race might be predictive of some outcomes, we have decided by law that they should not be considered, because we deem them morally irrelevant even if they are predictive.
So I'll try to explain how, when we're speaking about bias or fairness in machine learning, these are the kinds of situations we have in mind, not the kinds of statistical bias I mentioned a moment ago.

Okay. To continue laying the groundwork, it's important to understand that when we're thinking about unfair discrimination, discrimination is not a fully general concept. It doesn't really make sense to say, at least in normative terms, that you discriminate against people based on the color of their shoelaces. We usually mean that discrimination is happening in a particular domain, which I'll describe in a moment, and that it is discrimination on the basis of very specific features, which I'll also describe. Let me start with domains. In the United States, we've passed a series of civil rights laws focused on regulating decision-making in a number of high-stakes domains: credit, education, employment, housing, and public accommodation. Most of these are familiar to you; the one that might be less so is public accommodation, which refers to the idea that businesses can't refuse to serve certain customers. You can't refuse to serve people at your hotel, or at the movie theater, based on their race, for instance. So the restrictions on how we can use certain features in making decisions are determined by the domain you're in: when making credit decisions, when making education decisions, and so on. It's tightly scoped; there's no such thing as a general discrimination law. These are all specific to different domains. And I should explain that in many cases, even though these laws regulate the way we make decisions in those domains, they can extend to things like marketing and advertising too: if our advertising is purposely trying to exclude certain people, or to seek out certain people to the detriment of others, these laws might extend to that as well. And because this is a business school, I can happily set aside all the much more complex laws that actually govern government
decision-making, so tonight we'll only be talking about decision-making in industry.

In terms of which features we actually care about, these have been added over time, reflecting changing social understandings of what counts as an unjustified or objectionable basis for making important decisions. Many were first enumerated in the Civil Rights Act of 1964, things like race and color, and you can see that the list has grown with time. You'll notice in particular that disability was added in 1973, and that the most recent discrimination law concerns genetic information, governing employers' use of genetic information in decision-making. So again, this is not to say that discriminating on shoelaces for some trivial purpose counts; the law names very specific features in very specific domains, often because these features have been the basis on which certain populations have been kept in a subordinate position, often for unjustified reasons, and because the decisions in question are really high stakes: they determine people's life course and their ability to access fundamental opportunities.

I'll round this out by giving you some sense of how discrimination law works, and this largely applies to all the laws I just mentioned. There are two different ways to conceptualize discrimination in the law, and this will be important for understanding how it applies to machine learning. The first, probably the more familiar one, is called disparate treatment. This is the idea that simply by considering something like gender in making one of these decisions, for instance about employment, you have immediately violated the law; considering that fact alone, in that context, is illegal. The other version of this is that you find some very convenient proxy, a factor you know to be tightly correlated with gender but that is not gender itself, and you use it in your decision-making precisely because you know it to be a proxy. That would be intentional discrimination, even though it doesn't directly consider the proscribed feature. So that's the traditional way most of us think about discrimination. But there's another doctrine in the law called disparate impact, and it's designed to do a complex set of things. The first is to say that even if the factors you're considering are what lawyers call facially neutral, meaning that on their face they seem perfectly benign, it may be that making decisions on that basis still produces a disparity in outcomes along the lines of gender or race or one of these other categories. And unless, as a decision-maker in one of these domains, you can justify the use of those factors in making such a decision, you might be liable. This is a way of saying that unless there is good reason to be making decisions in this way, given that the decision is producing a disparity in outcomes, it shouldn't be allowed. There's a slightly different part here too, which says that if the disparity is avoidable, you
also might be liable: if there's another way to pursue the same business objective that reduces the disparity in outcomes, that too might be a reason the decision-maker is liable. So this amounts to an injunction to do two things. The first is to say there should be equal opportunity for people regardless of their gender or their race; that's the "don't consider these factors" part. The other is a principle of minimizing inequality of outcome as much as possible: not that you need perfect parity in outcomes, but that if there is a disparity, you should try to minimize it as much as possible, subject to your business objectives.

Okay. Having reviewed all this, I'll try to explain why, just based on these principles, you can see that to the extent that a machine learning model does not explicitly consider something like gender, it will never violate disparate treatment. It's pretty easy for a decision-maker to satisfy that requirement by making sure the model doesn't have access to that information. So if there's a place where discrimination law is relevant, it's in the second category, where facially neutral features produce a disparity in outcomes and you might want to ask whether it's justified.

But let me still explain why there was so much optimism, and why it's probably justified to remain quite optimistic, about the use of machine learning for a lot of these high-stakes decisions. To give you a sense of the severity of the problem, at least in employment, there's some seminal work using audit studies. These are studies where you send employers resumes that are basically identical, and the only thing you vary is a signal of the applicant's race; in the work I'm citing here, they varied the names of the applicants so that some names would very clearly read as belonging to a white person and others to a black person, with everything else held constant. It's a very nice design: it lets you determine, just based on the perceived race of the applicant, how much the callback rate differs, meaning how likely employers are to reach back out to the applicant given their race. And what they find is extremely damning: the callback rate for applicants with white-sounding names was 50% higher than for those with black-sounding names, everything else about them being the same. That work was done in the early 2000s, but a meta-analysis, a more recent paper surveying a huge range of audit studies over the past 25 years, found that there has been very little change in this rate, despite a huge amount of investment on the part of industry in trying to deal with things like implicit bias, which is very discouraging.
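The disparity these audit studies quantify, and that a disparate impact analysis asks about, is straightforward to compute once decisions and group membership are recorded side by side. Here is a minimal sketch in Python; the records below are made-up placeholders, not data from the studies cited above.

```python
# Compare the rate of favorable outcomes (here, interview callbacks) across
# groups. The handful of records is invented for illustration only.
decisions = [
    {"group": "white-sounding name", "callback": True},
    {"group": "white-sounding name", "callback": False},
    {"group": "white-sounding name", "callback": True},
    {"group": "black-sounding name", "callback": True},
    {"group": "black-sounding name", "callback": False},
    {"group": "black-sounding name", "callback": False},
]

def callback_rate(group):
    rows = [d for d in decisions if d["group"] == group]
    return sum(d["callback"] for d in rows) / len(rows)

r_white = callback_rate("white-sounding name")
r_black = callback_rate("black-sounding name")
print(f"callback rates: {r_white:.2f} vs {r_black:.2f}; ratio = {r_black / r_white:.2f}")
```

The same two lines of arithmetic apply whether the "decision" came from a recruiter or from a model's predictions, which is exactly why this kind of disparity check has become a standard first step when auditing a deployed classifier.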
We might feel like we've become much more sensitized to these issues, but the evidence suggests there hasn't been much progress. So you might think that one way to deal with this, and there is actual evidence to support it, is to formalize decision-making even more than we already have: to limit the degree of human discretion in the process, since discretion might be the avenue through which biases enter these decisions.
And I think for that reason many of us thought, well, isn't machine learning going to be the pinnacle of formal decision-making? It's not going to be us sitting down and writing by hand some heuristic about what we think a good employee would be; we're going to learn what a good employee is from the data. It's all going to come directly from the data. We can make sure it isn't discriminatory in the disparate treatment sense I mentioned before, because we can withhold that feature from the model entirely, and we can eliminate, in some sense, any role for subjective assessment, because the process can be fully automated and doesn't have to involve a human decision. I think that's the impulse the journalist probably had when writing that first story, and I think there are reasons to believe this still holds promise.

But what I'll spend the next few minutes talking about is how, despite all that, there are still real risks, and I'll show how some perhaps now well-known but still quite subtle problems allow machines to reproduce many of the same biases we see in human decision-making. This comes from some work I did with my longtime collaborator Andrew Selbst, and I'll quickly walk through a few examples. I don't know what the norms are here, but if people have questions as I go through this part in particular, please feel free to jump in; is that okay? Excellent, great.

So, as I mentioned, selection bias is one notion of bias, and selection bias in the statistical sense can produce what you might call a normative bias. Some really interesting work by Lum and Isaac looks at the way arrest data are used to train models for predictive policing. What's really important here is that arrest records are not a perfect representation of the incidence of crime in society; they are a very particular way of recording that a crime has occurred. I thought this quote was very nice: arrest records, or police records as I'd call them, are some complex interaction between criminality, policing strategy, and community-police relations. They don't capture crime as it occurs everywhere; they capture crime where police happened to see it, the kind of crime people are likely to report, and the kind of crime people are likely to report given their relationship with the police. If we build models using that data, we're not necessarily building a model that can predict crime wherever it may occur; we're building one that predicts crime where police have been able to observe it in the past. And this can have a feedback effect: you build a model, it suggests deploying police in a particular area, and those predictions are confirmed because you do observe crime there, but you've also pulled police from other areas, so there's less opportunity to observe crime where they might have been looking previously. The system never gets disconfirming evidence; it doesn't know about its false negatives. And so this kind of thing compounds over time: the predictions keep being confirmed, and there's no way for the system to learn that its failure to look somewhere was unjustified.
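To make that feedback effect concrete, here is a toy simulation of my own (not code or data from the Lum and Isaac paper): two areas with identical underlying incident rates, where most patrols go wherever more incidents have been recorded so far, and incidents are only recorded where patrols actually go.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rate = [0.3, 0.3]            # identical underlying incident rates
observed = np.array([5, 4])       # arbitrary small difference in initial records

for day in range(50):
    # Concentrate patrols wherever more incidents have been recorded so far.
    hot = int(observed[1] > observed[0])
    patrols = [20, 20]
    patrols[hot] = 80
    # Incidents are only recorded where patrols are present.
    for area in (0, 1):
        observed[area] += rng.binomial(patrols[area], true_rate[area])

print("recorded incidents per area:", observed)
```

Area 0 ends up looking several times more "criminal" than area 1 purely because it started with one extra recorded incident and therefore kept receiving most of the patrols; the allocation never generates the disconfirming observations that would correct it.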
Okay. This next issue is a little different from what you might think of as the standard example. With machine learning, we take a large set of training data and, although we can be sensitive to this possibility, we often accept it as ground truth: whatever is in the training data serves as our ground truth for how to build the model. But there might be reasons to be suspicious of the quality of that training data, in the sense that it isn't actually capturing the thing it claims to capture. Is it correctly measuring the quality it's standing in for? Let me give an example to make this concrete.

Imagine we're training a model on data about past hiring decisions in order to predict whom to hire in the future, and imagine that, despite its best efforts, this institution had some degree of discrimination or bias influencing its previous decision-making. I'll start with the silliest possibility; I originally thought it would be very uncommon in practice, but I've learned it is weirdly common. Maybe the goal is simply to predict what the human decision-maker decided when looking at applicants in the past, so the labels are basically "was this person hired or not." That seems like not the best thing to predict, because what you're training the model to do is replicate the decisions the humans made in the past; at best, it will reproduce what the humans were already doing. That might have other benefits, like eliminating your HR department, because you have to pay them, and there may be ways in which it's still useful to learn things that are not completely tied to one subjective human decision-maker, but it's the least creative way of doing things, and of course, to the extent that the human decision-makers were biased in their previous assessments, the model will simply learn to reproduce that bias.

Okay, but let's try something slightly more sensible: let's try to predict who will do well on the job and hire on that basis. Here you might ask what the target variable is, the dependent variable we want to predict in order to make a hiring decision, and you might say, well, we have annual reviews, and in many cases those are scores, and maybe that's a nice thing to focus on. So we try to predict those scores. I didn't mention this earlier, but there's a fair amount of research showing that despite our attempts to formalize the review process, implicit bias still corrupts the assessment, and those annual review scores end up being influenced by subjective judgment as well. So again, what our model is learning to do is not to predict a person's actual performance on the job; it's learning to predict how a human manager would assess that person, which is maybe better than the first scenario but still not a complete departure from human decision-making.
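As a hedged illustration of that worry, here is a small synthetic sketch (invented data, nothing from a real HR system): the label is the past human decision, which penalized one group, and a facially neutral feature acts as a proxy for group membership even though the protected attribute itself is withheld from the model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

group = rng.integers(0, 2, n)                  # two demographic groups, 0 and 1
skill = rng.normal(size=n)                     # "true" qualification, identical across groups
zip_code = group + 0.3 * rng.normal(size=n)    # a facially neutral proxy for group

# Past human decisions: driven by skill, but with a penalty applied to group 1.
past_hire = (skill - 1.0 * group + 0.3 * rng.normal(size=n) > 0).astype(int)

# Train only on the "neutral" features; the protected attribute is withheld.
X = np.column_stack([skill, zip_code])
model = LogisticRegression().fit(X, past_hire)

# Probe the model with applicants of identical skill from each group.
probe_skill = np.zeros(1000)
for g in (0, 1):
    probe = np.column_stack([probe_skill, g + 0.3 * rng.normal(size=1000)])
    rate = model.predict(probe).mean()
    print(f"group {g}: predicted hire rate at identical skill = {rate:.2f}")
```

The predicted hire rate for group 1 comes out far lower than for group 0 even at identical skill, because the model has learned, through the proxy, to reproduce the penalty embedded in the labels.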
Now let me talk about the final option: say we choose a genuinely objective measure, something that isn't influenced by a person subjectively assessing the employee, like what your sales were for that year. You might say that if there's a predicted difference in sales, that's a justified basis for deciding whom to hire. But I'll give you two possibilities, and you might have different feelings about the two. One: imagine that women have lower sales figures overall, but it's because the work environment is more hostile to them; something about the institution itself makes it a less welcoming, less supportive environment in which to succeed. Or imagine the reverse: your sales staff is dealing with customers who themselves have biased tastes, and it may well be that women have much higher sales figures because the people they're interacting with are more susceptible to their persuasion, or something like that.
It's an interesting question whether we'd see that as troubling from the point of view of the business. But either way, what's happening is that you're not predicting what their sales would be given the same underlying capacity; you're predicting how they would do given those particular circumstances. And depending on how we feel about those circumstances, we might not be so willing to use a model that assesses people as if those circumstances can never change.

Let me speed up a little. Another possibility is that the problem isn't the examples but the features: the factors we're looking at in making these decisions. It may well be that certain features we have access to end up being more informative for some populations than others, meaning I can make higher-quality predictions about one part of the population than about another. An example might be thin credit files, or no credit files: there are parts of the population for which we have either poor information or no information, so we're not as effective at making predictions about them, and that in itself can affect the distribution of outcomes. Moritz Hardt, whom I mentioned earlier, had a really interesting insight years ago: by necessity, a minority population, meaning some part of the population that is smaller than the majority, will constitute a proportionately smaller share of the training data. Our models tend to do better the more training data we have, so a minority population that doesn't follow exactly the same patterns as the majority will simply be harder to assess, just by virtue of there being fewer examples. It's a really straightforward point that has, I think, genuinely troubled a lot of people.

And the final thing I'll mention is that we also have to cope with the problem of proxies: even if we don't consider the sensitive attribute, like race or gender, those attributes are increasingly reflected across the many other factors we do consider. It's not always clear how to determine whether something being correlated with the sensitive attribute is legitimate or not. Does it have to be a perfect proxy, or just above some threshold? There isn't really a principled way of deciding, and the richer the data set becomes, which is increasingly what businesses confront, the more difficult it will be to ever avoid a situation where these attributes are redundantly encoded in the other factors. Okay, so that's a quick summary of what I wanted to say. I think this presents three quite different problems for people trying to confront these issues. There's an entire area of computational work trying to address them, which I'll spare you for now.
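On the proxy point, one crude check people sometimes run is simply to measure how strongly each supposedly neutral feature correlates with the sensitive attribute, with exactly the caveat raised above: there is no principled threshold for how much correlation is too much. A sketch on synthetic data, with invented feature names:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5_000
race = rng.integers(0, 2, n)  # the sensitive attribute, held out of the model

# "Neutral" features; two of them are constructed to track the attribute.
features = pd.DataFrame({
    "zip_code_income":  50_000 + 20_000 * race + rng.normal(0, 5_000, n),
    "years_experience": rng.normal(8, 3, n),
    "commute_distance": 5 + 10 * race + rng.normal(0, 4, n),
})

# High absolute correlations flag likely proxies; deciding what counts as
# "too correlated" is a judgment call, not something the numbers settle.
print(features.corrwith(pd.Series(race)).round(2))
```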
What this suggests is that there's a fair amount of work to be done in understanding how our models are actually not doing as well as they're reported to do. When we evaluate models, we often do it with holdout data: we set aside a portion of the data and ask how accurately the model performs on it. But if the holdout data was collected in the same way as the training data, it will have many of the same problems, and the model will report a level of accuracy that holds on this particular data set but is not true when applied in practice. This is pernicious because there is often no obvious or easy way to know that the model will do less well in practice than it does on the holdout set. The second problem is that the reason our model does less well for some populations than others may be that we simply don't have enough information, either in terms of the features we have or the set of examples, and this can be a difficult problem to solve, both because collecting more information can be very costly and because it may not even be obvious how to collect the features that would let us do as good a job. And finally, if there are proxies, you might ask: why is it that certain features end up correlated with the outcome I care about? Is the reason they're correlated something I feel comfortable with, or is it some kind of historical injustice? The way I tie my shoes is a bad example, but you get the idea: if certain things are correlated for reasons having to do with past injustice, we might feel uncomfortable making decisions on that basis.
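A small simulation can illustrate two of these points at once: evaluating on a holdout set drawn from the same skewed collection process makes the model look better than it will be on the broader population, and the gap is concentrated in the group for which the features carry less signal. This is my own toy example, not data from any of the studies mentioned.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def sample(n, minority_share):
    """Toy population with two groups; the features carry a weaker signal
    for the minority group (think thin-file applicants)."""
    g = (rng.random(n) < minority_share).astype(int)
    x = rng.normal(size=(n, 3))
    signal = x[:, 0] + 0.5 * x[:, 1]
    noise = np.where(g == 1, 2.0, 0.5) * rng.normal(size=n)
    y = (signal + noise > 0).astype(int)
    return x, y, g

# Data "as collected": the minority group is only 5% of the records.
X, y, g = sample(20_000, minority_share=0.05)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

print("accuracy on holdout drawn the same way:", round(model.score(X_ho, y_ho), 3))

# A fresh sample in which the two groups are equally common.
X2, y2, g2 = sample(20_000, minority_share=0.5)
print("accuracy on the balanced population:   ", round(model.score(X2, y2), 3))
print("  majority group:", round(model.score(X2[g2 == 0], y2[g2 == 0]), 3))
print("  minority group:", round(model.score(X2[g2 == 1], y2[g2 == 1]), 3))
```

Nothing in the holdout number alone warns you about the drop; you only see it by evaluating on data drawn differently from how the training data was collected, and by breaking the score out by group.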
So all of this, I think, applies to the decision-making I described earlier, in the sense that credit, employment, education, and so on are all areas machine learning has entered quickly, and where there were significant hopes it would reduce bias. But these are pretty significant problems confronting everyone in the area, and they're not purely technical problems. They're increasingly legal problems: there are many lawyers trying to figure out how to bring cases about these questions, and in fact just this past month Facebook has been involved in a sequence of lawsuits broadly in this area. So I can imagine that in the next couple of years this will not just be an academic exercise or a technical problem but something companies have to think about in terms of complying with existing law.

What I want to spend the remainder of the time talking about, though, is a whole set of issues that don't fit into these regulated domains: applications of machine learning that are not covered by discrimination law. What I'll describe here is work I've done with colleagues at Microsoft Research, including Kate Crawford, an intern of ours from the University of Pennsylvania, and Hanna Wallach, who is also at Microsoft Research. We call it the shift from allocation to representation, and hopefully I'll make that clear with an example to start off with.

Latanya Sweeney is a computer scientist and professor at Harvard. She did one of the earlier studies in this area, one that is extremely well known, demonstrating that ads suggesting someone had an arrest record appeared more often when searching Google for distinctly black-sounding names than for distinctly white-sounding names. This is a screen grab from her own paper: you would type in her name and these kinds of ads would come up. To be clear, Latanya has no arrest record, and as far as I understand, the advertisers were not targeting on that basis. It seems likely the mechanism was the way Google decides which ads to show. If anyone knows how this works (I actually learned it in Foster's class, so I should point that out), it's an expected value calculation: the bid price, how much the advertiser is willing to pay for the ad to run, times the likelihood that it will be clicked on. And the way Google estimates the likelihood that an ad will be clicked is to actually run the ad for a short period of time and observe how much users interact with it.
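In other words, roughly: show the ad that maximizes bid times estimated click-through rate, where the click-through estimate comes from observed user behavior. A heavily simplified sketch (real ad auctions are more complicated, and these numbers are invented):

```python
# Two hypothetical ads competing to appear next to a name search.
ads = [
    {"copy": "Name X, arrested?",       "bid": 0.40, "estimated_ctr": 0.09},
    {"copy": "Contact info for Name X", "bid": 0.60, "estimated_ctr": 0.04},
]

def expected_value(ad):
    # Expected revenue per impression: bid * estimated click-through rate.
    return ad["bid"] * ad["estimated_ctr"]

winner = max(ads, key=expected_value)
print(winner["copy"])
```

Note that the arrest-record ad wins here despite its lower bid, purely because users are estimated to click it more often; if users click such ads more for some names than others, the ranking inherits that behavior.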
What Latanya hypothesized, and it seems likely to be the case, is that when these types of ads were shown, depending on the name of the person being queried, users themselves were more likely to click on the arrest-record ad for a black-sounding name than for a white-sounding name, something along those lines. This has been complicated considerably in subsequent work, but that's the general idea. The reason I mention it is that Latanya's concern was not really about, say, online dating, where you're about to meet someone, they type your name into Google, and they see that you supposedly have an arrest record. Her more serious concern was employment: she worried that employers would see this and discount the candidate out of hand.

But I think there's something else we can learn from this example. Everything I've talked about so far involved allocating scarce resources: how do you decide whom to hire, there are questions about using these tools in criminal justice, such as whom you keep incarcerated pending trial, and questions about employment and credit are of the same kind, scarce resources we're trying to allocate. In another sense, though, there's a question not just about allocating resources but about how these systems are involved in representing certain kinds of identities. The sequence of steps Latanya has in mind is that these ads represented black criminality in a stereotypical way, and that representation then reinforces people's difficulty in the labor market: their name gets searched, and it becomes harder for them to find a job. But we can also stop at the first step and ask: what role do these systems play in representing identities in ways that can be harmful, even setting aside the downstream effects on someone's ability to land a job?

So I'll discuss a few of these examples, and I'll suggest that the reason we've generally focused on the allocation problems, aside from the fact that they are covered by law, is that they're often much easier to formalize; we can think about them in more discrete terms. In the allocation context, like an employment decision, there's an immediate decision, hire this person or not, and we can quantify the difference in outcomes in ways that lend themselves to these measures. They tend to be discrete, transactional decisions. In the representation scenarios, which I'll illustrate in a moment, the harm can be much more long-term. There's no single decision moment; information is simply being presented in ways that might contribute to certain beliefs about certain groups, and it can be very difficult to formalize exactly what the problem is. What we're talking about isn't a specific transaction; it's sort of culture: how do these things contribute to our popular understanding of people? To give you some examples: this comes from work many of you may know, the Gender Shades project,
which was an attempt to show that a number of standard facial analysis, or let me get this right, gender recognition, algorithms did much worse on darker-skinned faces, and in particular on darker-skinned female faces. In the results, DF means darker-skinned female, DM darker-skinned male, LF lighter-skinned female, and LM lighter-skinned male, and what they show is dramatically different accuracy depending on which group you belong to, across a number of standard packages: Microsoft's, Face++ (I'm not sure who the developer is), and IBM's. This has become a seminal example in the field, showing that commercial offerings, which clients use for many different applications, suffer from pretty profound differences in performance depending on the race and gender of the people they're applied to. Again, this is not necessarily illegal; these systems aren't being used to make decisions about employment or housing or anything like that. But we might still be very concerned about their application, and it can be as simple as: does the thing work? Does your computer's way of authenticating you as a user, if it relies on face recognition, actually work as well for some populations as for others? Not illegal, but certainly something a business should be sensitive to.
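The methodological move in that work is simple to emulate: report error rates per subgroup rather than a single overall accuracy number. A minimal sketch, with placeholder labels rather than the project's actual data:

```python
from collections import defaultdict

# y_true / y_pred are the reference and predicted gender labels; subgroup is
# the intersectional group each face belongs to. All values here are made up.
y_true   = ["F", "M", "F", "M", "F", "M", "F", "M"]
y_pred   = ["F", "M", "M", "M", "F", "M", "M", "M"]
subgroup = ["dark_female", "dark_male", "dark_female", "dark_male",
            "light_female", "light_male", "light_female", "light_male"]

totals, errors = defaultdict(int), defaultdict(int)
for yt, yp, g in zip(y_true, y_pred, subgroup):
    totals[g] += 1
    errors[g] += int(yt != yp)

for g in sorted(totals):
    print(f"{g:13s} error rate: {errors[g] / totals[g]:.2f}")
```

With a single aggregate accuracy figure, large differences between these per-group numbers can remain completely invisible, which is the core of the Gender Shades critique.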
There are much longer-standing concerns around things like autocomplete. This is some work from 2014 in Australia. Maybe you've had the experience where autocomplete offers suggestions that are stereotypical or offensive, denigrating toward the particular identity they attach to. For years, stories about this have come in waves, affecting different groups, and it's often very sensitive to what's popular in the news. But again, it raises interesting questions: to what extent do we want tools that are trying to help people find information to reinforce some of these harmful beliefs?

Or take some work from colleagues at the University of Washington, who looked at image search results. Can anyone guess what the search term was? It helps if you can see the actual word: CEO. What's really striking about this, and "great" is the wrong word, is that the very first appearance of a woman on the entire page is in the bottom right, and as people can probably see, it's Barbie. This has since changed; most of the search engines have tried to address the problem. But it's an interesting question: when doing this kind of information retrieval, what should the composition of the results look like? That work explores this quite nicely across a number of different platforms.

And some colleagues at Princeton, a few years ago, were looking at word embeddings, which are commonly used for all sorts of NLP tasks. One of the authors is Turkish, and they showed that language translation suffers from some interesting biases. Turkish has a gender-neutral pronoun; there is no "he" or "she." So when you translate from English to Turkish, the gendered pronoun becomes gender-neutral, and if you then translate back, it gets swapped in the gender-stereotypical way. This is perhaps not surprising if you know much about how these systems work: the models are statistical, in the sense that they look for the regular ways in which these terms co-occur.
The output basically just reflects the fact that the corpora on which these models were trained probably used these terms in that way much more frequently than the reverse. So there's a way in which training our language models on corpora of text that themselves reflect very gendered, very troubling ways of speaking about the world can encode those same stereotypes.

Okay, in the remaining few minutes let me quickly talk about some possible ways to think about this; the list is by no means exhaustive. One response is to do nothing: maybe when we're confronted with these kinds of results, we're learning about something objectionable in society that we should address head-on, and perhaps it would even be wrong to change these things if that denies people the opportunity to know that these continue to be commonly held beliefs. Or you could take the position that this is really just a problem of accuracy. In the case of gender recognition or facial recognition, one way to respond to the differences in accuracy is simply to make the model as accurate as possible for everyone, where the goal is perfect accuracy. That response has itself led to a pretty contentious debate within the community; some people are upset that the normative debate has shifted toward improving accuracy and away from whether we even want to use these tools at all. But you could imagine that, on this view, the appropriate way to solve the problem is just to keep improving accuracy, much as you would have before.

There's also a more manual approach, and this is certainly the common one for things like autocomplete. I don't know if anyone has noticed, but there's even a way to report an autocomplete suggestion as objectionable. What we're basically saying in that case is that this is an impossible task to handle purely computationally: how would I enumerate all the possible offensive autocompletes? That doesn't seem like something you can solve with a heuristic, so the best we can do is rely on the wisdom of the crowd and crowdsource the identification of terms we see as objectionable. There are, of course, attempts to use computational methods too. For the word embeddings and NLP applications I mentioned earlier, some of the thinking people have advanced is that maybe we can preserve most of the information we learn from these corpora but sever the specific relationships in the model that we don't want to preserve: for instance, can we disassociate certain occupations from certain gendered words, such that the model maintains the words' other associations but not those? We would be breaking the associations that are gender-stereotypical while preserving the others; the idea is that maybe we can scrub these things to a neutral state.
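A toy version of that idea, in the spirit of the hard-debiasing approach that has been proposed for word embeddings but heavily simplified, and using made-up four-dimensional vectors rather than trained embeddings: compute a gender direction from a pronoun pair and project it out of the occupation vectors.

```python
import numpy as np

# Invented embeddings for illustration; real ones would come from a trained model.
emb = {
    "he":     np.array([ 1.0, 0.2, 0.1, 0.0]),
    "she":    np.array([-1.0, 0.2, 0.1, 0.0]),
    "doctor": np.array([ 0.6, 0.9, 0.3, 0.2]),
    "nurse":  np.array([-0.6, 0.9, 0.3, 0.2]),
}

gender_dir = emb["he"] - emb["she"]
gender_dir /= np.linalg.norm(gender_dir)

def remove_gender_component(v):
    """Project out the component of v that lies along the gender direction."""
    return v - np.dot(v, gender_dir) * gender_dir

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("before:", round(cos(emb["doctor"], gender_dir), 2), round(cos(emb["nurse"], gender_dir), 2))
d = remove_gender_component(emb["doctor"])
n = remove_gender_component(emb["nurse"])
print("after: ", round(cos(d, gender_dir), 2), round(cos(n, gender_dir), 2))
print("doctor/nurse similarity preserved:", round(cos(d, n), 2))
```

After the projection, the occupation words no longer lean toward either pronoun, while their similarity to each other is retained, which is the intuition behind "scrubbing to a neutral state"; whether this truly removes the bias, rather than just hiding it, is itself debated.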
Another possibility, which might feel somewhat obvious, is that at a minimum we should aim for representativeness. That CEO search result probably wasn't even representative (I don't think it was; I hope it wasn't), in the sense that among a hundred results there was only one woman, and the one that was there was Barbie. That doesn't seem representative of the world in which we live, so at a minimum maybe what we should be doing in these domains is striving for a representation that is in keeping with the current state of affairs. But you might also have a more progressive vision: you might think this is an opportunity to display the world you want to exist, that there's a way for these companies to do more than simply represent the world as it is and instead reflect back an image of the world we would like. In this way, a lot of this work comes much closer to traditional thinking in media studies and communication: how do we evaluate the representation of different people on television, in film, in books? These are the same set of questions.

And then the final point, which some folks, including some of the folks at Princeton, have advanced, is that maybe the goal is to be modest about how much we can ever do to address this problem. There isn't going to be some perfect way of eliminating it from the tools, but we can cultivate a kind of critical sensibility: we can help our users understand that they should be as critical of the computational platforms they're working with as they are, potentially, of the media they consume. What's interesting about this area is that I would not say there is some obvious direction anyone is heading. There is increasing evidence that the problems industry is confronting are not just the traditional ones around credit and employment but increasingly things like this, where there won't be legal guidance because there is no law, and where there won't be an obvious answer because it's often contentious, or quite politically sensitive, to figure out what the right thing to do is. My sense is that this is where many of the large companies are struggling; there's a real challenge to address here, and I'd love to talk about this kind of thing in the remaining time too. So I'll stop there. Thanks.
They put my mic on so that I sit on the stage. Well, thanks a lot; that was great, as always. So I've got a bunch of things I thought we might talk about, and I guess we'll get to some subset of them, or else we'll use them up and then we'll field questions from the rest of you.

I thought I might start by sticking over on, I guess, really more the allocation side. When I look at the literature in this area, as we've talked about at length, most of the focus is on either criminal justice or, for business, on cases where it's really violations of the law or things that go against regulations, which I guess could also be seen as violations of the law. And as you were saying at the end, it seems to me that although that is very, very difficult, it's kind of the easy part, technically speaking. Before we get into representation, it seems like even in these more straightforward decisions there's an interesting first level of complexity to add in. In all of the cases you were considering, housing, employment, and so on, if an individual decision were made along these lines, you would see that individual decision as being problematic. So I thought an interesting case might be: can you think of a case where we are completely comfortable with individual decisions discriminating based on any characteristic you can think of, race, gender, sexual preference, anything? Well, not Tinder per se, but our individual decisions about choosing somebody to date, right? If I happen to like people who look a certain way, you should be okay with that. And the reason I bring that up is that here's a very interesting case that starts to put our feet into the deep end. As soon as Tinder starts to build models that apply these criteria, individuals are making these decisions based on whatever is right for them, Tinder learns those preferences using machine learning, and now, as a model, it's going to figure out what it thinks you're going to like and present that to you.
Which may, first of all, not reflect your preferences, especially if you haven't specified a preference along a particular dimension, but may also be discriminatory in all the ways you've described. That isn't necessarily against the law, and I don't know what the laws are for dating sites, but maybe it's very undesirable.

I have to laugh, because I actually have a paper on this. To give some context: there have been discussions for many years about sexual racism on online dating platforms, and one of the better-known cases involves Coffee Meets Bagel, the app that helps you find people to have a coffee or a bagel with. In that case, I think you were allowed to indicate whether you had any particular preference about the race of a potential partner, and even when you did not indicate such a preference, meaning you said you had no preference, the system learned from your behavior that you did, and began making recommendations along those lines despite the fact that you had declared no particular interest. And that raises an interesting question: what does it mean to be respectful here? Should you follow the declared interest or the observed interest? Which is more respectful of the person's underlying preferences? In general, in this domain, although we can make some pretty creative legal arguments for when this might be problematic, for the most part I think most people, and most lawyers, would agree this is not a legally regulated domain, and that there are many, many reasons well beyond the law to understand this as a deeply personal choice in which we should not publicly intervene.

But what's interesting about this area is that we don't necessarily know what people's preferences are; in fact, people don't even know their own preferences. There was a separate scandal many years ago with OkCupid, when they explained that they had done a bunch of experiments to figure out how effective their matching was, because they needed to experiment to know whether their users' experience was actually being meaningfully improved. It turned out they weren't doing a great job: they recommended that people go on dates with people who were predicted to be a bad match, told them they were a good match, and that alone was often enough to improve the reported quality of the date. What's interesting about that, aside from the controversy around experimenting on users, is that it shows that unless we actually experiment, unless we accept that we don't even fully know our own preferences, we can end up simply reinforcing preferences that were poorly understood, or poorly expressed, in the first place. And I think there's a lot of opportunity here, not just to encourage companies to foster more diverse interactions, but to recognize that they should probably be experimenting anyway.
We don't really know how well these things are doing, and so I think this is an interesting way in which some of the concerns around bias interact with a more general point: we should want to make sure we're not just content with our limited observational data, because observational data is going to be very poor in comparison to the wide range of possible things people might be interested in.
Yeah. And this notion of what the observational data actually is also brings up, I think, a limitation of an awful lot of the current work, because the current work basically thinks of machine learning as the following. You have a set of labeled data; let's just forget about selection bias for the moment and say it's a representative sample of the population. So now we're going to learn a model: you put the data into your machine learning algorithm, out comes a model, you use it for prediction, and maybe as you get more data it gets more accurate. This presumes that you have what, in the jargon, we would call labeled data: for the thing you want to predict, which we call the target variable, you have values in this representative sample. But only a subset of the possible applications we see are going to have that. Sometimes what we really want to know is whether some intervention is going to cause a certain effect, and we actually never have labeled data on whether an intervention caused an effect. We have whether you got the intervention, and we have the effect; but in order to say whether it caused the effect, you would have had to see the other side of the counterfactual: what would have happened if you hadn't been given the intervention and time had rolled forward in exactly the same place. So we do all manner of things in order to build models that estimate that, but we don't actually have labeled data.
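To illustrate that point with a toy potential-outcomes table (invented values): each row only ever reveals one of its two potential outcomes, so the individual-level effect can never be computed directly from observational data.

```python
import numpy as np
import pandas as pd

# For each person we observe the outcome under the condition they actually
# experienced; the other potential outcome is structurally missing.
df = pd.DataFrame({
    "got_intervention":   [1, 0, 1, 0],
    "outcome_if_treated": [1.0, np.nan, 0.0, np.nan],   # observed only for the treated
    "outcome_if_control": [np.nan, 0.0, np.nan, 1.0],   # observed only for the controls
})
df["individual_effect"] = df["outcome_if_treated"] - df["outcome_if_control"]
print(df)   # every individual_effect is NaN: the counterfactual label is missing
```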
What's even worse is that the cases we're talking about here are limited to what we would call self-revealing problems: problems for which the label, the value of the thing you're going to predict, reveals itself over time, at least for some subset of cases. That happens for some problems and not for others. An easy example is credit card fraud detection, where machine learning has been used for decades, very effectively, much more effectively than humans, not only because it can operate in milliseconds but because it is much more accurate. With credit card fraud, if you think about it, you just wait a little while and somebody is going to call up and say they're not paying that bill if you didn't catch the fraud, so the fraud pretty much reveals itself; it just doesn't necessarily reveal itself on a time scale that keeps the bank from losing a bunch of money. There is also the detection of fraud by individuals within your own firm, where machine learning is used as well. That is not a self-revealing problem, because the whole point is to get away with it. It reveals itself only when something happens: you get fired, you go to jail, you get fined. And so even here there may be very interesting questions about discrimination and fairness if we're applying machine learning models to our employees in order to
judge them as conducting malfeasance. You don't have any labeled data there; you get labeled data by intervening in one way or another, by investigating people, labeling things, and taking a look at them. That investigation is focused somehow, by humans or otherwise, so you're sampling a little over here and a little over there, and it's this kind of extreme case of selection bias. So is there much discussion of that latter situation, where it's not just selection bias in the standard sense but rather there is no labeled data at all, where every data point is the result of some selection procedure?

Yeah, this is a great question. I'll start by saying that it is studied to some degree in the area of fairness in machine learning, to the extent that people have looked at optimization methods like explore-exploit, and even some work in reinforcement learning, where the idea is not to accept that your way of collecting new examples, based on your current model's estimates of where to look, is in fact reliable. We should experiment, in the sense that we should, for instance, send police to areas we think are going to be low crime in order to make sure they are in fact low crime, or, to take credit as an example, although I have no idea whether anyone actually does this, occasionally give credit to people the model currently thinks are likely to default, in order to observe whether they actually do a good job.

A little bank called Signet Bank famously did that at the end of the '80s and early '90s, and became the largest consumer credit business in the world; it became Capital One, basically because it did that. It was then able to learn models that could skim the cream off the top of everybody else's business, because it had more view into the likelihood of default.
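As a rough sketch of what that kind of exploration might look like in lending, and not a description of how Signet Bank or anyone else actually did it, the snippet below occasionally approves applicants the current model would reject, so that repayment labels eventually exist outside the region the policy already approves. The threshold, the exploration rate, and the model's predicted_default_probability interface are all assumptions.

    import random

    def credit_decision(applicant, model, approve_threshold=0.05, explore_rate=0.02):
        # Approve when predicted default risk is low; otherwise, with a small
        # probability, approve anyway so that repayment outcomes are eventually
        # observed for applicants the current policy would always screen out.
        risk = model.predicted_default_probability(applicant)  # hypothetical interface
        if risk <= approve_threshold:
            return "approve", "policy"
        if random.random() < explore_rate:
            return "approve", "exploration"
        return "reject", "policy"

Only approved applicants ever generate a repayment label, so without the exploration arm the training data keeps reflecting the current policy's own choices, which is the extreme form of selection bias being described.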
Their prep their presence someone, committed a crime right, so, like actually sending police to those scenarios actually, seems exactly the wrong thing to do like what you really want to figure out it's like if I deployed police here would it prevent a crime from happening now that's a much more difficult problem and there, are lots of ways to maybe think about how do you design a study to figure that out but. It's kind of puzzling to me that we don't even ask this basic question. One.
One direction this can go, which is not the direction I'm trying to encourage us to go, is to live in a police state where we observe everything and therefore everyone is deterred from crime. But it's still worth asking what is happening here: to what degree are police actually serving as a deterrent, and to what degree are police just being used in a way that catches as many people as possible? These are very different goals with very different implications for the way people will live in that society. So I think this is an example where, if we're not thinking about interventions and are only focused on predictions using observational data, we can end up doing things that are both very foolish and, I think, very bad.

And this seems to be an instance of another thing we've talked about for a decade now, which, as my students, some of whom are here, know, I hope that by the end of the course you agree with me on, because I've shown you this in some way or another: when it comes to success with implementing machine learning systems, I've been