Debiasing artificial intelligence: Stanford researchers call for efforts to ensure that AI technologies do not exacerbate health care disparities
Medical devices employing AI stand to benefit everyone in society, but if left unchecked, the technologies could unintentionally perpetuate sex, gender and race biases.
Clinicians and surgeons are increasingly using medical devices based on artificial intelligence. These AI devices, which rely on data-driven algorithms to inform health care decisions, presently aid in diagnosing cancers, heart conditions and diseases of the eye, with many more applications on the way.
Given this surge in AI, two Stanford University faculty members are calling for efforts to ensure that this technology does not exacerbate existing health care disparities.
In a new perspective paper, Stanford faculty discuss sex, gender and race bias in medicine and how these biases could be perpetuated by AI devices. The authors suggest several short- and long-term approaches to prevent AI-related bias, such as changing policies at medical funding agencies and scientific publications to ensure the data collected for studies are diverse, and incorporating more social, cultural and ethical awareness into university curricula.
“The white body and the male body have long been the norm in medicine guiding drug discovery, treatment and standards of care, so it’s important that we do not let AI devices fall into that historical pattern,” said Londa Schiebinger, the John L. Hinds Professor in the History of Science in the School of Humanities and Sciences and senior author of the paper published May 4 in the journal EBioMedicine.
“As we’re developing AI technologies for health care, we want to make sure these technologies have broad benefits for diverse demographics and populations,” said James Zou, assistant professor of biomedical data science and, by courtesy, of computer science and of electrical engineering at Stanford and co-author of the study.
The matter of bias will only become more important as personalized, precision medicine grows in the coming years, said the researchers. Personalized medicine, which is tailored to each patient based on factors such as their demographics and genetics, is vulnerable to inequity if AI medical devices cannot adequately account for individuals’ differences.
“We’re hoping to engage the AI biomedical community in preventing bias and creating equity in the initial design of research, rather than having to fix things after the fact,” said Schiebinger.
Constructive – if constructed appropriately
In the medical field, AI encompasses a suite of technologies that can help diagnose patients’ ailments, improve health care delivery and enhance basic research. The technologies involve algorithms, or instructions, run by software. These algorithms can act like an extra set of eyes perusing lab tests and radiological images; for instance, by parsing CT scans for particular shapes and color densities that could indicate disease or injury.
Problems of bias can emerge, however, at various stages of these devices' development and deployment, Zou explained. One major factor is that the data used to build the models that algorithms rely on as baselines can come from nonrepresentative patient datasets.
By failing to properly take race, sex and socioeconomic status into account, these models can be poor predictors for certain groups. To make matters worse, clinicians may be unaware that AI medical devices can produce skewed results.
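One common way to surface this kind of problem is disaggregated evaluation: reporting a model's error rate separately for each demographic subgroup instead of only as a single overall number. The Python sketch below is illustrative only. The function name `error_rate_by_group`, the tuple layout and the toy records are hypothetical and are not drawn from the Stanford study.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return the fraction of incorrect predictions for each subgroup.

    `records` is an iterable of (group, predicted_label, true_label) tuples.
    This tuple layout is a hypothetical choice made for the sketch.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy data: group "A" dominates the records while group "B"
# is underrepresented, mimicking a nonrepresentative evaluation set.
records = [("A", 1, 1)] * 9 + [("A", 1, 0)] + [("B", 0, 1), ("B", 0, 0)]

# A single aggregate error rate computed over every record.
overall = sum(predicted != actual for _, predicted, actual in records) / len(records)
# The same errors broken out per subgroup.
rates = error_rate_by_group(records)

print(f"overall error rate: {overall:.2f}")
for group in sorted(rates):
    print(f"group {group}: error rate {rates[group]:.2f}")
```

With these made-up numbers, the overall error rate (about 17 percent) hides the fact that the model is wrong on half of group B's cases but only one in ten of group A's.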
As an illustrative example of potential bias, Schiebinger and Zou discuss pulse oximeters in their study. First patented around 50 years ago, pulse oximeters can quickly and noninvasively report oxygen levels in a patient’s blood. The devices have proven critically important in treating COVID-19, where patients with low oxygen levels should immediately receive supplemental oxygen to prevent organ damage and failure.
Pulse oximeters work by shining light through a patient's skin and measuring how much of it is absorbed by oxygenated and deoxygenated hemoglobin, the oxygen-carrying protein in red blood cells. Melanin, the primary pigment that gives skin its color, also absorbs light, however, which can skew readings in people with highly pigmented skin. It's no surprise, then, that studies have shown today's industry-standard oximeters are three times more likely to incorrectly report blood oxygen levels in Black patients than in white patients. Oximeters also show a sex bias, misstating levels in women more often than in men. Together, these biases mean that dark-skinned individuals, especially women, are at risk of not receiving emergency supplemental oxygen.
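One way such a disparity can be quantified is to pair each oximeter reading with an arterial blood-gas measurement taken at about the same time, count how often each group's low oxygen goes undetected, and compare those rates. The Python sketch below shows only that arithmetic. It is an assumption-laden illustration, not the method of any particular study: the function name, the cutoff values and the readings are all hypothetical, and the two groups are deliberately left unnamed.

```python
def occult_hypoxemia_rate(pairs, sao2_cutoff=88.0, spo2_low=92.0, spo2_high=96.0):
    """Share of paired readings where the oximeter looked reassuring but
    arterial blood showed low oxygen saturation.

    `pairs` holds (spo2, sao2) percentages: the pulse oximeter reading and
    an arterial blood-gas saturation measured at about the same time.
    The default cutoffs are assumptions chosen for illustration only.
    """
    # Keep only cases where the oximeter reported an acceptable reading.
    reassuring = [sao2 for spo2, sao2 in pairs if spo2_low <= spo2 <= spo2_high]
    if not reassuring:
        return 0.0
    # Count cases where the blood test revealed hidden low oxygen.
    missed = sum(1 for sao2 in reassuring if sao2 < sao2_cutoff)
    return missed / len(reassuring)


# Hypothetical paired readings for two unnamed groups, X and Y.
group_x = [(94.0, 95.0)] * 19 + [(94.0, 85.0)]
group_y = [(94.0, 95.0)] * 17 + [(94.0, 85.0)] * 3

rate_x = occult_hypoxemia_rate(group_x)
rate_y = occult_hypoxemia_rate(group_y)
# How many times more often low oxygen is missed in group Y than in group X
# (assumes rate_x is nonzero, which holds for this toy data).
relative_rate = rate_y / rate_x

print(f"group X: {rate_x:.0%} missed, group Y: {rate_y:.0%} missed")
print(f"relative rate (Y vs. X): {relative_rate:.1f}")
```

With these invented readings, the oximeter misses low oxygen three times as often in group Y as in group X. A comparison of this kind, run on real paired clinical measurements, is what lies behind relative figures such as the one reported for oximeters.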
“The pulse oximeter is an instructive example of how developing a medical technology without varied demographic data collection can lead to biased measurements and thus poorer patient outcomes,” said Zou.
This issue also extends to how devices are evaluated for clinical use. In another recent study, published in Nature Medicine and cited in the EBioMedicine paper, Zou and colleagues at Stanford reviewed the 130 medical AI devices that the U.S. Food and Drug Administration had approved at the time. The researchers found that 126 of the 130 devices had been evaluated using only previously collected data, meaning that none of those evaluations gauged how well the AI algorithms performed on patients alongside active input from human clinicians. Moreover, fewer than 13 percent of the publicly available summaries of the approved devices' performance reported sex, gender or race/ethnicity.
Zou said that collecting more diverse data and monitoring AI technologies in medical contexts "are among the lowest hanging fruit in addressing bias."
Addressing bias at the macro level
Over the longer term, the study explores how structural changes to the broader biomedical infrastructure can help overcome the challenges posed by AI inequities.
A starting point is funding agencies, such as the National Institutes of Health. Some progress has been made in recent years, Schiebinger said, pointing to how in 2016 the NIH started requiring funding applicants to include sex as a biological variable in their research, if relevant. Schiebinger anticipates the NIH instituting a similar policy for gender, as well as race and ethnicity. Her group at Stanford, meanwhile, is developing ways to incorporate gender as a sociocultural variable in clinical trials, as reported in a February study in Biology of Sex Differences.
“We want to start with policy up front in funding agencies to set the direction of research,” said Schiebinger. “These agencies have a great role to play because they are distributing taxpayer money, which means that the funded research must benefit all people across the whole of society.”
Another opportunity area centers on biomedical publications, including journals and conference reports. The Stanford study authors suggest that publications set policies requiring sex and gender analyses where appropriate, along with discussion of ethical considerations and societal consequences.
For medical schools, the authors suggest enhancing curricula to increase awareness of how AI might reinforce social inequities. Stanford and other universities are already making strides toward this goal by embedding ethical reasoning into computer science courses.
Another example of using an interdisciplinary approach to reduce bias is the ongoing collaboration between Schiebinger, who has taught at Stanford for 17 years and is a leading international authority on gender and science, and Zou, an expert in computer science and biomedical AI.
“Bringing together a humanist and a technologist is something Stanford is good at and should do more of,” said Schiebinger. “We’re proud to be in the forefront of the efforts to debias AI in medicine, all the more important considering the many other facets of human life that AI will eventually impact.”
James Zou’s work was supported by grants from the National Science Foundation and the National Institutes of Health.