Hey, thank you so much for the invitation to be here. As an anthropologist, I'm presenting together with the co-author of this paper, recent MLK Fellow and Tufts Professor Valencia Joyner Koomson in electrical and computer engineering, and we're glad for this chance to be in conversation with you all. You may have heard in recent headlines, and Swati just highlighted this example that's become sort of iconic: medical devices that don't work nearly as well for people of color due to unequal design processes. Obviously this is very worrying. There was a recent FDA hearing last year that resulted in new safety advisories, and now the device's ISO standard is being rewritten. But what you might not know is that these recent headlines were actually catalyzed by a collaborative effort to re-examine these devices here at SERC. So today we wanted to start by sharing some of the context for that story, to cover some of the alternatives now being imagined to create medical devices that could work more equitably, since this problem has not yet been fixed in hospitals. And finally, Professor Koomson and I wanted to bring our respective backgrounds together, across computer science and social science, to offer some questions we'd love to see more people attending to; as we heard in the last presentation, this is already happening in a really exciting way.

My own involvement with this issue began when my family got COVID early in the pandemic. I bought this pulse oximeter, known in medicine as a pulse ox, at our local pharmacy in the spring of 2020. At the time, numbers from these devices were being called a biomarker for COVID, and many hospitals were issuing written guidelines to come in at a precise number such as 92 or 91.
So I was spending many late nights staring at this thing flickering on my husband's finger: go to the ER, no, stay home. And I started wondering, not only as a concerned family member but also as a professor who studies the societal side of medical device design here at MIT: what is this thing? How is it actually working? I could see a red light glowing inside of it, so I started wondering how this device had managed to avoid the issues of racial bias that other light-sensing devices, the ones we always teach about in our classes here at MIT, were already known for. Much of this was happening at the end of May 2020, which was also a larger reminder of the unequal stakes of these questions.

So I started by just Googling it, and the studies about racial biases in this device popped up immediately. There were articles out of UCSF from 2005 and 2007, and the authors had explained that in their lab they realized they were testing a color-sensing device on a mostly white population, noted reports of errors historically, and showed that the devices in hospitals didn't meet FDA safety thresholds for people of color at that time. Reading this, I was thinking as a medical anthropologist: we build social assumptions and theories into our material world all the time. Who's an imagined user versus an actual user? There's a huge STS literature on this, and in particular, attributions of troubled breathing have a very long history of racial inequity. I also work on how seemingly tiny device errors matter and can have very high costs for people, in my earlier work on equitable design for diabetes. And of course, errors like this mean so much more than they otherwise might because of everything we know about how implicit bias and inadvertent medical racism often work in hospitals: Black, Indigenous, and other people of color are already much more likely to have their distress go undertreated in so many
areas of health care. Looking at all this in June of 2020 was the point when SERC created a chance for me to bring these questions to engineers. A few of us got together; we were put in touch by the chairs of the MIT Social and Ethical Responsibilities of Computing (SERC) group at the time, including one of the then heads, Professor David Kaiser, who in addition to being an STS scholar is also a specialist in the history of physics, which is obviously very important for understanding the mechanics of a light-sensing device. We were able to have a conversation together with people who really understood the technical side of medical device design, in particular Professor Brian Anthony, who has a lot of experience making optical devices. We looked at the data together; I shared what was then a draft of what later became an article, and they confirmed: yes, this has historically been a real equity issue, and it's solvable, but it hasn't been solved.

To get into the more technical side of what they helped deepen my understanding of, which later helped me explain the issue to the public in my writing: the pulse ox reads out the color of your blood to assess oxygen saturation, looking at the shade of iron-containing hemoglobin. You can see in these cartoons that blood is a cooler blue color when desaturated and a warmer red when saturated. But what images like this don't show is that all this light is also passing through the color of our skin and tissues, which absorb light from the LEDs differently depending in part on melanin and other chromophores in the skin, which is why diversity in safety-testing groups and in calibration matters so much.

I'll always appreciate what a difference SERC made in how the story unfolded from there, because when we went to publish about this, it didn't take long to learn that asking the questions we were discussing really went against medical consensus at the time.
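To make the two-wavelength principle concrete, here is a minimal sketch of the classic "ratio of ratios" calculation that conventional two-LED pulse oximeters are generally described as using. The linear calibration constants below are hypothetical placeholders; real devices rely on proprietary, empirically derived calibration curves, which is exactly where a non-diverse calibration group introduces bias.

```python
# Illustrative sketch only: the "ratio of ratios" used in conventional
# two-wavelength pulse oximetry. Calibration constants are invented.

def ratio_of_ratios(ac_red, dc_red, ac_ir, dc_ir):
    """Pulsatile (AC) over baseline (DC) signal at red vs. infrared LEDs."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def estimate_spo2(r, a=110.0, b=25.0):
    """Hypothetical linear calibration: SpO2 ~= a - b*R.
    Skin pigmentation changes the measured light signals, so a curve
    fit on a mostly light-skinned test group can misread darker skin."""
    return max(0.0, min(100.0, a - b * r))

r = ratio_of_ratios(ac_red=0.02, dc_red=1.0, ac_ir=0.03, dc_ir=1.0)
print(round(estimate_spo2(r), 1))  # prints 93.3 with these invented inputs
```

The point of the sketch is that the displayed number is already the output of a model, not a raw measurement: everything hinges on the calibration constants and on whose bodies they were fit to.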
The New York Times offered me a contract and then ended up pulling the story the day before publication because some of the leading doctors in the world blocked it. They told editors these errors couldn't possibly matter, or that they already knew about them, and yet the problem was actually in medical textbooks at the time; the doctor gatekeepers said, in effect, we never teach that part of the textbook. But those conversations at SERC gave me the courage to stick with asking questions outside the box, because I'd had opportunities to talk directly with engineers making devices, a chance many doctors don't necessarily have, and I had my anthropology and STS training to draw on: how many times the medical consensus of a particular era, when we look back on it, seems wildly wrong. It was a little scary to go out on that limb first, thinking this issue really should just be raised for public debate and examination. Whether our SERC group was right or wrong on this one, the evidence is there in every hospital in the world, and the stakes for people are so high, so let's take the leap and push here; maybe a doctor will find this and look at the data in their hospital more closely.

And that is what happened. A few months later, the first doctors who looked at a huge data set across various hospitals in the Michigan area found that oximetry errors were three times more common for Black patients, and affected about 12 percent of their Black patients severely enough to change the way their care was clinically managed. They reached out to let us know when they were first seeing errors like this. You can see here one of their team's images: a pulse ox reading of 97 alongside an ABG, the blood test, of only 83. So, you know, a 14-point
discrepancy; that's huge. They thought at first it was a COVID thing, but they couldn't figure out any way that it made sense. Then they found our essay and wrote to thank us for the work of putting anthropology and engineering in dialogue, because it helped them learn about studies in their own fields of specialty that they'd never learned in medical school, and to ask new questions of the data. Here's a quick visual of that blood test, called an ABG, an arterial blood gas test, which this data set uses to check the optical sensor's accuracy, and a few more slides summarizing the data from this Michigan team.

But to think, here we are in the age of personalized medicine, and this is one of the most important devices positioned to gatekeep what sorts of interventions people will receive, including oxygen and admission to an ICU. The errors were serious enough to put 16 percent of Black patients in the category for no clinical treatment when they in fact did need treatment. A device that has only 84 percent accuracy for Black patients just isn't acceptable. And it continues: these errors still exceed federal safety guidelines today, set by the FDA. Accuracy is measured in something called ARMS, accuracy root mean square; three ARMS is the most these devices are allowed, that's the federal threshold, and it's not being met for people of color.

So this started a big research movement among physicians in the past few years, using hospital data to try to protect their patients and understand the ways technologies might be working unequally in the clinics where they're using these tools. Later retrospective clinical studies found these sensor errors were also associated with delayed oxygen treatment, increased risk of organ failure, and mortality. The gravity of this news has spurred some important and overdue research to find new approaches to creating sensors that work equitably for everyone.
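For reference, ARMS is just the root-mean-square difference between the device's SpO2 readings and paired arterial blood gas SaO2 values. A minimal sketch, with sample numbers invented for illustration:

```python
import math

def arms(spo2_readings, sao2_references):
    """Accuracy root mean square: sqrt(mean((SpO2 - SaO2)^2)).
    FDA clearance has used a threshold of about 3 ARMS points. Note that
    a pooled ARMS can pass overall while a subgroup's errors exceed it,
    which is why disaggregating accuracy by skin tone matters."""
    sq_diffs = [(s - r) ** 2 for s, r in zip(spo2_readings, sao2_references)]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))

# Invented example: paired pulse-ox vs. blood-gas values
print(round(arms([97, 95, 92, 96], [93, 94, 91, 95]), 2))  # prints 2.18
```

Because ARMS squares the errors before averaging, a few large discrepancies, like the 14-point case above, weigh heavily, but only if the people experiencing them are actually in the test population.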
You'll see the co-author of this paper, Professor Koomson, pictured at the top right of the STAT News piece, where you can read more about her really exciting work on these issues. But alongside remaking the optical devices themselves, she raises important questions about critical studies of clinical algorithms. Even some of the world's very best researchers on this issue, and we would certainly count Professor Ziad Obermeyer among them, given his groundbreaking work such as the Science article noted here, seem to still be treating sensor data as a direct line to patient health, unmediated by structural racism. So we're hoping this presentation, and conversations like the one Swati's already begun for us, are a way to work on raising collective awareness that if design and calibration happen in racially biased ways due to systematic inequalities, these devices aren't outside social issues at all; they're very much reflecting them. Really, anyone working in the health care and computing space should have questions like these on their radar. How are clinical algorithms and AI models trained on such distorted data sets incorporating them today? Have any machine learning models managed to avoid reproducing racial biases after being fed such disproportionately incorrect data? If so, what can we learn from how they managed to identify and address patterned errors? And if not, what are the implications for outcomes and care ahead, and what next steps are needed?

Part of this stocktaking stage is just to try to list out some areas of work where these questions might be relevant. What algorithms use this data? Pulse ox readings are increasingly substituted for blood tests, and what are the implications of that? There's a sort of Russian-doll effect, where pulse oxes already contain many algorithms in how they produce their numbers, and then those numbers are fed to other algorithms. So here you can
see the eight categories into which patients are classified by oxygen level as part of the Rothman Index. That part is public-facing; other parts of those algorithms are proprietary, and those are fed to systems like Epic, which are full of other algorithms. There was an earlier argument to separate data and algorithms, but I think in cases like this that becomes a more complicated picture: in a health care setting, most data already contains algorithms embedded in it in some way. And then think about some of the patterns that physicians have brought to our attention: what will machines make of the fact that certain subsets of patient populations appear to be deteriorating more rapidly, on a population level, through processes like this?

What other conditions? Beyond oxygen administration alone, the pulse ox is used as a diagnostic. Professor Koomson is working on this question for congenital heart defects: the pulse ox is often positioned as the only screening for a newborn baby, so it's especially concerning that these device errors have now been found among children and newborns as well.

What other devices? There are certain infrared thermometers that also use optical sensing and can work unequally, and there are many new devices about to be released, like ventilators meant to replace physician-in-the-loop models with closed-loop models. Many of these devices optimize their settings using AI, but they're tethered to the direct input of pulse oximeters. The three signals mentioned here, SpO2 (the pulse ox reading), the pleth variability index, and the perfusion index, are all three measured by the pulse ox, and all three are subject to these disproportionate errors. I think it really might take black-box audits in certain cases to know: with a discrepancy of four points, or even the less common but very worrying fourteen points we saw earlier, what does that do to the settings of a ventilator?
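To illustrate the downstream classification risk in miniature, here is a toy sketch of how a fixed upward bias in displayed SpO2 can silently move a patient across a treatment cutoff. The threshold and bias values are invented for illustration; they are not the Rothman Index's actual cutoffs or any device's measured bias.

```python
# Toy illustration: hypothetical treatment cutoff, hypothetical +4-point
# device bias. Real clinical algorithms use many more inputs and
# categories; this only shows the threshold-crossing mechanism.

TREATMENT_CUTOFF = 92  # hypothetical: below this, supplemental oxygen

def needs_oxygen(spo2_reading):
    """A downstream rule sees only the displayed number, not true SaO2."""
    return spo2_reading < TREATMENT_CUTOFF

true_sao2 = 89       # patient truly hypoxemic (blood-gas value)
device_bias = 4      # hypothetical overestimate for darker skin tones
displayed = true_sao2 + device_bias

print(needs_oxygen(true_sao2))   # True: treatment is actually needed
print(needs_oxygen(displayed))   # False: the rule sees 93, no treatment
```

This is the mechanism behind the "no clinical treatment" misclassification described above: the error doesn't have to be large, it only has to straddle a threshold that an algorithm, or a clinician following written guidance, acts on.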
What do we need to know to make sure these new models of ventilators, and other similar devices using AI, work just as well for people of color once they're released into clinical use? Since many coders are used to thinking of bodily measures like vital signs as raw data, this is a great space for collective conversation, trying to understand that these are actually processed signals, subject to discriminatory design too. Getting the word out to those who work in computing and health care is also a question of how we communicate this realization about historically flawed sensor data to the machine learning systems that were trained on it.

Some who work on designing medical devices have recently been turning to the image-generating version of ChatGPT as a creativity tool, and these examples are from MakerHealth. If you ask for images of a pulse ox that solves for racial bias, these are the images it generates: an ordinary pulse oximeter covered in African flags and diverse marketing imagery. I think that's a telling reflection of where this social problem currently stands, and of why it's so important not to leave this to a sort of policy version of autocomplete. We really need sociotechnical creative systems-building and conversations across different disciplines to think about how to actually go beyond superficial fixes and marketing optics: to design optical sensors that work equitably for everyone; to trace the ripple effects of such data distortions across the various sectors of computing and health care they might impact; to interrupt the ways these data errors might produce or amplify racial biases in clinical algorithms and devices; and to learn more effective practices from how various models differently interpreted this flawed data.

So we'll wrap up here with a quote from one of our MIT colleagues who's since taken up this issue in his own
work, about why AI researchers need to pay attention to issues like this, just as a reminder of the stakes here. Some hospitals are now starting to publish estimates of how many thousands of patients they believe they inappropriately released or undertreated during the pandemic due to pulse oximeter errors. If we think about that around the world, over almost half a century now since pulse oximeters came to market in the 1970s, the human costs behind these numbers are deeply upsetting. Yet numbers like these are being given more and more weight in our age of computing and health care, so it really matters to address this robustly, immediately, and together. The lack of clarity over the past four-plus decades, I would argue, has really stifled innovation around this issue that would otherwise have been possible; in a sense, there's a lot of lost time to make up for here, to try not to embed these historical device errors into clinical futures.

We'll stop there for today. I'm very grateful for the research support of MIT Anthropology, SHASS, and SERC, and Professor Joyner Koomson would like to acknowledge the support of the NIH Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity. Thank you for listening; we look forward to our conversation together. [Applause]

Thank you very much for your work; it has a very direct effect on my family, and it keeps me up at night. I know that one of the things you found in your research is that the early versions of oximeters developed actually did solve this problem, that they were specifically designed to avoid racial bias as well as gender, age, and disability bias, and that we have quite literally moved away from that in our technological development. Are there lessons we can draw from that process that can help us today with the next steps?

Yeah, thank you, that's such a great question. So the model that was being referenced there was created by
Hewlett-Packard in their work in collaboration with NASA in the 1960s and '70s. It wasn't a pulse ox; it was an oximeter that worked through a different mechanism that didn't involve pulse, and the device used eight wavelengths of light instead of just two, which was one part of why it worked better across skin tones. But it could also be calibrated for each particular patient using a light reading of a drop of blood: if you were worried the default settings weren't working for a particular person, rather than needing an ABG, you could use just a glucose-test-sized drop of blood.

Recently, someone reached out to me who used to work at the Hewlett-Packard center that created that model, right nearby in Waltham, to let me know that Hewlett-Packard didn't just test their model to make sure it worked equitably for everyone; they incorporated 248 Black volunteers into their recruitment process for safety testing, which is 246 more than the FDA currently requires for models in hospitals. And this person, and this is what I love about anthropology and oral histories, and why bringing interviews and stories back into knowledge about these lost technologies can be important, told me these devices weren't just tested on more diverse groups; the device was actually designed by a Black engineer. To me, that really speaks to the question of what lessons we can take away from this: making sure to prioritize people who are, as they say in STS, at the table and not just on the table. The question of who's in test groups and the question of who's designing these devices are inseparable. So I'm really excited to see the responses that have been generated by Black engineers, including Professor Koomson, to address these issues, because they're able to draw from lived experience as well as their backgrounds in computer science and medical device design. I hope to see more of that, and for me,
that's a key takeaway from that history. Thank you for the question very much, Amy.
2023-05-26