SERC Symposium 2023: Sendhil Mullainathan
What I'm going to do today is talk about a couple of papers connected by a common theme. As Dan said, these papers are about racial bias, but I hope you can see that ultimately the point I'm trying to make is not about racial bias. I've known Dan for so long that he doesn't want me to put a number on it, and the first paper I want to talk about is almost that old. It's a paper I did when I was a junior faculty member at MIT, and it was actually quite fun. What we did was send out a bunch of resumes in response to help-wanted ads, in effect saying, I'd like this job. These resumes were made up; they were made up to be realistic, but they were made up, and there were thousands of them in their variety. The reason we did that was... well, tenure is a low-probability event and I just wanted to cover my... no, no, that's not why we did it. We did it because, before we sent out the resumes, we made one change: half of the resumes went out with names like Brendan Miller, and the other half went out with names like Jamal Jones. These are statistically identical resumes; the only difference is that half of them had a name associated with being white and half had a name associated with being Black.

What we found was that having a white-sounding name led to 50 percent more callbacks. That is just an enormous number. If you translate it into unemployment rates, this alone would roughly explain what we actually see in the United States, where African Americans have about a one-third higher unemployment rate than whites. People have since replicated this; there was an awesome study from a team at Berkeley that did it at massive scale, and you continue to see effects this large. So that's study one: it basically showed large amounts of racial discrimination.
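To make the callback comparison concrete, here is a minimal sketch of how one would estimate that gap from audit-study data; the file name and column names are hypothetical, and this is illustrative rather than the paper's actual analysis code:

```python
# Estimating the callback gap from audit-study data. One row per
# resume, with 'white_name' (0/1) and 'callback' (0/1); the file and
# column names are hypothetical, not the paper's actual data.
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

resumes = pd.read_csv("audit_study.csv")  # hypothetical file

# Callback rate by name group, and the relative gap
rates = resumes.groupby("white_name")["callback"].mean()
print(f"White-sounding names: {rates[1] / rates[0] - 1:+.0%} callbacks")

# Two-proportion z-test for whether the gap is statistically real
counts = resumes.groupby("white_name")["callback"].sum()
nobs = resumes.groupby("white_name")["callback"].count()
stat, pval = proportions_ztest(count=counts.values, nobs=nobs.values)
print(f"z = {stat:.2f}, p = {pval:.4f}")
```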
Now flash forward. This second study is very recent, three or four years old, and it's about health care. To understand it, you just need to know something about one institution of health care. Most of you go to the doctor, you show up, you wait in the emergency room for a long time, and it's a pain in the ass (that's a technical term). But there's something called a care coordination program. These programs exist because some people have such extensive health care needs that they go to the hospital and the health system a lot. Because these patients are at the epicenter of health care, you really need to treat them differently. Health care people really dislike what I'm about to say, but it works: it's like a frequent flyer program for the health system. If you're in a care coordination program, just as in a frequent flyer program, you don't wait in the usual line; there's a separate line, a separate number you call before you even go in, and they tell you don't bother coming in or definitely come in. It's a bundle of wraparound services that gives people with lots of chronic conditions extra resources. Pretty much every health system in the United States has one of these programs.

Now here's the interesting thing most people don't realize: targeting who gets into these programs is largely done by an algorithm. There are different algorithms built by different people, but they're all, in spirit, the same thing, and what's striking is that these algorithms are being applied to about 100 million patients. What the program does is target patients who are in extra need of these resources and give the resources to them.

So what we did was take one of these algorithms and measure the racial inequity in it. You can see the arc: we did it with people and resumes at first, and now, flash forward 20 years, it's the same song again, this time with an algorithm. We had access to a large private-sector algorithm that has consequences for who actually gets into the program, and we wanted to know what kinds of white and Black patients the algorithm chooses. One easy way to see it: the algorithm gives each patient a score, their need or riskiness for the program. If you're in the top three percent, you're automatically enrolled; a little below that, someone looks at you a bit more carefully and makes a choice. All I'm going to show you is, given the score on the x-axis, how sick the patients were. Here's what we found. The good news: the algorithm works; higher score, sicker patient. The bad news: at every level of the score, white patients are less sick than Black patients. Another way of seeing it: at the auto-enrollment threshold, the 97th percentile, Black patients have 28 percent more chronic illnesses, and this is true of many other health outcomes; I just picked this one. In fact, if you want to see how big the effect is, suppose I zoomed into that region and tried to equalize it: take the marginal Black patient, who is sicker, swap them in for the marginal white patient, who is less sick, and keep swapping until the two groups have the same health at the margin. If I did that, the fraction of Black patients enrolled in the program would go from 17 to 36 percent. Another huge disparity: we would double the number of Black patients auto-enrolled in this program. The previous study was about a form of livelihood; this one is about life and death.
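Here is a minimal sketch of the kind of calibration-by-race check just described, assuming a hypothetical dataframe with the algorithm's risk score, a chronic-condition count, and race; the column names and file are placeholders, not the actual study code:

```python
# Calibration-by-race check: at a given risk score, how sick are
# patients in each group? Columns ('score', 'n_chronic', 'black') and
# the file are hypothetical placeholders, not the actual study data.
import pandas as pd

df = pd.read_csv("patients.csv")  # hypothetical file

# Bin patients into score percentiles 0-99
df["score_pct"] = pd.qcut(df["score"], 100, labels=False)

# Mean chronic-condition count by score bin and race
by_group = (df.groupby(["score_pct", "black"])["n_chronic"]
              .mean().unstack().rename(columns={0: "white", 1: "black"}))
print(by_group.tail(10))  # the top of the score distribution

# Headline number: the sickness gap at the auto-enrollment threshold
top = df[df["score_pct"] >= 97]
gap = top.groupby("black")["n_chronic"].mean()
print(f"At auto-enrollment, Black patients have "
      f"{gap[1] / gap[0] - 1:.0%} more chronic conditions")
```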
All right. The first study was an audit study; the second was a health-algorithm study. I'm telling you about these two studies because, in some sense, they're trying to accomplish the same thing: assess a decision-making system and ask how much racial inequity is in it. But there is one key difference. The decision-making system on the left, at the time we studied it, was an entirely human decision-making system. The system on the right was entirely algorithmic, at least the part we studied. So we are assessing two decision-making systems, one made of people and one made of algorithms, both showing striking racial bias. Can we understand something by comparing them, at least from my experience of trying to do these two exercises, and how does that help us think about algorithms and about people? I want to compare and contrast these two experiences, and I hope you'll get the sense that I'm not doing this in a narrow sense; I'm hoping these two serve as a metaphor for something I'm noticing across many different contexts.

Okay, so let's start with the first stage. In any context, the first thing you must do is uncover the problem; here, that means uncovering that there's bias, and in other settings it will be other things. Let's look at what that took in each study. For the audit study, I have to tell you, this is the closest I've ever come to feeling like I was running some sort of clandestine spy operation. I had an army of RAs; we set up fake phone numbers, fake resumes. The RAs from 20 years ago still email me to say how fun it was, but it was a lot of work, because you have to make everything up and run this whole operation. So it was, to put it simply, hard. And not just hard but, at some level, unsatisfying, because you'll notice I could only talk about callbacks. Many of you may ask, what about the interview stage? That's really hard, because my fake resumes don't like to be interviewed. What do I do then? Of course people have run audit studies there too, but it gets a lot more complicated: you have to hire actors and actresses and keep everything consistent. People have done it, but it's hard. The health algorithm was a combination of very hard and very straightforward. The very hard part was getting access to the algorithm; without that you really can't do much. But once we had access, it was a straightforward statistical exercise, straightforward in the sense that we now have tools for this kind of analysis. It would have been even easier if someone had given us not just the prediction algorithm but the training procedure and the data. Still, it's noteworthy that there is a bit more straightforwardness on the algorithmic side, again conditional on someone giving us what we need.

Now let's talk about the next stage: having uncovered that there is a problem, understanding why. Notice that on the human side, 20 years later, I don't think we have definitive answers about the sources of bias. Since that paper there has been a lot of discussion in the employment literature: is it implicit bias? Is it something else? The debate is ongoing, because human processes are messy. If you live in the world of algorithms, it's natural to say these very complex algorithms are inscrutable, but the human mind is extremely inscrutable. People sometimes say to me, well, you can always ask a person. So you're telling me I should ask the person screening the resumes, 'Do you discriminate?' I'll do it, but I can tell you what you'll find.

What about the health algorithm? Where is the algorithm going wrong? That took a little more detective work, but it's detective work in a box. What we did was look at where the algorithm was going right. It turns out I can show you the same graph, score on the x-axis, something on the y-axis, and find no racial bias. What's on the y-axis? Money: cost, expenditures. The error turns out to be almost comical, and tragic. See, in health care, when we talk about people's health, it's all health as measured. And how do we measure health? There are two ways. One is what I showed you in the previous graph: you go and get physiological measures of health. That is not the most common way. You'd be surprised to learn that the most common way of measuring health is care delivered.
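As a sketch of that detective work, one could rerun the same calibration check with the spending label on the y-axis alongside the health label; again, the data and column names here are hypothetical:

```python
# Rerunning the calibration check with cost as the outcome instead of
# chronic conditions. Columns ('score', 'cost', 'n_chronic', 'black')
# and the file are hypothetical; this is illustrative only.
import pandas as pd

df = pd.read_csv("patients.csv")  # hypothetical file
df["score_pct"] = pd.qcut(df["score"], 100, labels=False)

for label in ["cost", "n_chronic"]:
    by_race = df.groupby(["score_pct", "black"])[label].mean().unstack()
    # Average Black-white difference at a given score level, in units
    # of each outcome's standard deviation so the two are comparable
    gap = ((by_race[1] - by_race[0]) / df[label].std()).mean()
    print(f"{label}: mean standardized gap at fixed score = {gap:+.2f}")

# The pattern found in the study: near-zero gap on cost (the training
# label), a systematic gap on chronic conditions (actual health).
```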
But care delivered is not a measure of health per se. It's a proxy for health; it reflects health plus access to care. And when I put care delivered on the y-axis, there is no bias. That appears to be what the algorithm was trained on, and on that target it is very well calibrated. What you find, and this is generally known, is that whites have access to better health care: at every level of actual physiological health, they get more care. But more care makes them look, quote, sicker, while Black patients utilize less health care and cost less, so they look less sick. Hence the problem: accurate cost prediction equals biased health prediction. And once you have that diagnosis, again, given access to the algorithm, you can actually work it out.

Now let's talk about fixing the bias. Here are a bunch of audit studies of the type I've described (I haven't added the latest one, that great Berkeley study from 2021). On the y-axis is the callback gap, and you'll see that almost nothing is changing. The gap today, when you run the audit study at scale, is literally the same as what we found, which is depressing as all hell, because we've known about this; people teach these papers, HR departments teach these papers as part of training, and not much has changed. It is really depressing that you can take the same resume study, run it today, and find the exact same effect.

In contrast, on the algorithm side, now that we know the problem, we ourselves developed a method, and it's a simple one: knowing the algorithm was trained on the wrong objective, you can combine health measures with cost measures, do pretty well on prediction, and get basically the same performance with almost no racial gap. Now a bunch of health systems and algorithm manufacturers want to adopt the fix for their algorithms. I'm not saying that's a panacea, but, for example, regulators now want to step in and make sure people actually implement the fixes. That won't necessarily solve the problem either, but it means there is a machinery that can kick in, and I'm more optimistic that in 20 years we won't be saying, isn't it silly that these care coordination algorithms are racially biased. At this point it's not a technocratic matter; it's a matter of political and policy will. Whereas on the human side, we still don't quite know what to do even if we had the political and policy will.
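Here is a minimal sketch of that label-choice fix, training the same kind of model on a blend of health and cost labels; the blend weight, file names, and model choice are illustrative assumptions, not the paper's exact recipe:

```python
# Sketch of the fix: keep the features, change the label. Instead of
# training on cost alone, train on a standardized blend of a health
# label and the cost label. All inputs and the weight are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X = np.load("features.npy")          # patient features (hypothetical)
cost = np.load("cost.npy")           # next-year cost: the biased label
n_chronic = np.load("chronic.npy")   # next-year chronic conditions

def z(v):
    """Standardize a label so the two can be blended on one scale."""
    return (v - v.mean()) / v.std()

alpha = 0.5                          # illustrative weight on health
y = alpha * z(n_chronic) + (1 - alpha) * z(cost)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
# One would then rerun the calibration-by-race check on this new score.
```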
So at each stage, uncovering the bias, understanding its source, and fixing it, I just want to point out: it's easier to uncover the bias in algorithms than in humans; with proper regulations, it's easier to understand its source; and it's easier to fix it. And I hope you see the lessons are more general than bias. I deliberately picked something optimistic, not because I think algorithms can't do a lot of harm. They can do an immense amount of harm; think of how insane it is that roughly a hundred million people had their health dictated by this poorly built algorithm. That's absurd. But I hope you get the sense that when we talk about algorithms, we should also talk about people. When we say X is hard with algorithms, that algorithms are inscrutable, we should ask: are people inscrutable? We need to understand that there are immense problems with these fragile algorithms, but we live in a decision-making system where decisions are also being made by fallible people, and if we're going to integrate the two, we have to understand both the fallibility of people and the fragility of our algorithms, and figure out how to combine them so we get the best of both.

I'll end on one small note. Look, I'm in no way trying to say let's automate everything; that's not what I'm saying. I just think we need a richer model of people and a richer model of algorithms, and once you have that, you realize the design space, the ways in which we can use algorithms, opens up dramatically. Let me give you one tiny paper to give you a sense of it.

This is a paper on knee pain. Osteoarthritis is the most common joint disorder in the US; quite a few older people have it. It's basically degeneration of the cartilage in the knee. The striking fact most people don't realize is that knee pain is much more common among the disadvantaged in the United States: much more common among Black people, much more common among poor people. We can ask why this is happening, and if you're interested in poverty and inequity, it's one of the more important things to care about, because it means that, above everything else, being poor or Black in America means living with more pain on a day-to-day basis.

People have thought about this a lot, and I would say two explanations have come up. The first is that, of course, more disadvantaged people do jobs that are more physically demanding, so their knees have more problems; there's something different inside their knees. The second perspective is that there's something outside their knees: less access to pain medication, less access to social support. How do we test these hypotheses? Very easily: we get a dataset where we know how severe the knee problem is and we see whether the pain differences persist. When you do that, with severity on the x-axis and pain on the y-axis (and this is not my fault, but the official pain measurement, the KOOS pain score, has the property that the lower the number, the more the pain; that's not on me, so I am not to be blamed for it), what you see is that as the disease gets worse there is more pain, but there is always a gap at every level of severity. So there must be something outside their knees, something inside their heads, some social structure. That has been the interpretation of this literature.

But is that really true? Look at that measure of severity. Who decided that's the measure of severity? It's osteoarthritis as a doctor sees it when looking at the X-ray: the human eye, codified in the medical literature. And if you go back and look at the original KOOS score, or the original KLG score for severity, that was a study done on white people in England. Actually, I can't even say it was done entirely on white people; they don't report the racial distribution, because it was taken as given that that's the racial distribution. Why would anyone want to know?

So let's ask the question: can we actually use algorithms differently here? Let's take a second look. Let's train a ConvNet to look at the X-ray, but this time the input is the image of the knee and it predicts what the person says, not what the doctor says. It's actually beautifully evocative: it's an algorithm listening to the patient.
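Here is a minimal sketch of that second look, a standard pretrained ConvNet regressing the patient-reported KOOS pain score from the X-ray image; the backbone, hyperparameters, and data handling are illustrative assumptions, not the paper's actual pipeline:

```python
# 'Listening to the patient': a ConvNet that regresses the KOOS pain
# score from a knee X-ray. Race is never an input; the label is what
# the patient reports, not what the doctor grades. Details hypothetical.
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone, final layer swapped for one regression output
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(xray_batch, koos_batch):
    """One gradient step: X-ray images in, patient-reported pain out."""
    optimizer.zero_grad()
    pred = model(xray_batch).squeeze(1)   # (N, 1) -> (N,)
    loss = loss_fn(pred, koos_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# xray_batch: (N, 3, 224, 224) image tensors; koos_batch: (N,) scores.
# The trained model's prediction becomes the new severity measure.
```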
Okay, so the key is that the algorithm only sees the X-ray. It does not see the race of the person; it doesn't know anything else. And now we get a measure of severity, as measured by pain, correlated with something physical in the X-ray. When we do that and put it on the x-axis (the left-hand panel being the original severity score), the gap completely disappears under the new measure. What that means is that when people were saying, there's something wrong with my knee, and we said, no, no, it's something else; no, there was something wrong with their knee. Medicine just didn't notice it. It's what patients have been trying to tell us all along, and I hope you see that this is because bias was built into the structure of medical knowledge. I'm not necessarily proposing that we use this algorithm as an aid; rather, it serves as a diagnostic to help us understand that, holy cow, this whole area of medicine has not been listening to disadvantaged people for quite some time, and we've not even tried to create drugs to solve this problem. That, I hope, gives you a sense of the bigger design space here.

There's a lot to love and a lot to lament about algorithms, but there's also a lot to love and a lot to lament about people. A poorly designed algorithm can magnify the worst of our tendencies: if we simply automated what the radiologists say about knees, we would have automated that bias. But well-designed algorithms could end up covering for our mistakes: if we say, don't listen to what I say about the X-ray, look at what is actually in the X-ray, maybe we learn something about ourselves. And that's what I hope is a hopeful ending: there is something good we can do by building algorithms well, so that they cover for our mistakes. All right, thank you.

Moderator: Thanks. Maybe one quick question, if there is one. Sure, I'll ask one. I just wondered whether there was an algorithmic solution to your human problem of resume bias. In particular, I thought people had tried algorithmically redacting the name and gender pronouns from a resume. Is that something more companies could do, and would it have an effect?

Mullainathan: I think it's a great question, and it goes back to thinking about the bigger design space. That kind of thing is a place where the algorithm, if we knew about the nature of the bias, can help us change our decision architecture to deal with it. Redacting the name alone might not be the specific fix, because, for example, later in the process there will be an interview. But it's definitely the case that we could think about re-architecting the human screening process, along with some aids, to undo the bias, which is exactly expanding the design space. All right.
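On the redaction idea raised in that question, here is a toy illustration of stripping names and gendered pronouns before human screening; a real system would need a trained named-entity recognizer rather than this purely illustrative pattern pass:

```python
# Toy resume redaction: remove the applicant's name and gendered
# pronouns before a human screener sees the text. Illustrative only;
# production systems would use a proper NER model, not regexes.
import re

PRONOUNS = r"\b(he|him|his|she|her|hers)\b"

def redact(resume_text: str, applicant_name: str) -> str:
    # Remove the applicant's name wherever it appears
    text = re.sub(re.escape(applicant_name), "[REDACTED]",
                  resume_text, flags=re.IGNORECASE)
    # Remove gendered pronouns
    return re.sub(PRONOUNS, "[REDACTED]", text, flags=re.IGNORECASE)

print(redact("Jamal Jones led his team of five.", "Jamal Jones"))
# -> "[REDACTED] led [REDACTED] team of five."
```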