It has long been known that AI can be biased. It can pick up the subconscious belief system of those who program it, or it can reflect the world view represented in content used to train it.
We’ve even known for a decade that it can be sexist or racist.
In 2015, Amazon discovered that a new AI-based recruiting engine did not like women. The team had built programs to review resumes, with the aim of automating the search for top talent. But because the computer models were trained to vet applicants by observing patterns in resumes submitted over the previous 10 years, they learned mostly from male applicants. That was inevitable, thanks to female underrepresentation in the tech industry.
The system then taught itself that male candidates were preferable, since men appeared most likely to be hired. The team behind the system was disbanded, and the world was left with an object lesson in the dangers of automating the hiring process.
A 2019 study of 3.2-million mortgage and 10-million refinance applications from major US home loan providers found evidence of racial discrimination in face-to-face lending as well as algorithmic or AI-based lending.
The study, by the National Bureau of Economic Research, showed that black and Latino applicants faced a rejection rate of 61%, compared with 48% for everyone else, and paid as much as 7.9 basis points more in interest. That translated into a “race premium” of more than $756m (R13bn) a year.
We didn’t expect such bias to last into the mid-2020s, when we’re all supposedly hyper-aware of the danger.
But now, a new study led by Cedars-Sinai Health Sciences University has found a pattern of racial bias in treatment recommendations generated by leading AI platforms for psychiatric patients.
“The findings highlight the need for oversight to prevent powerful AI applications from perpetuating inequality in health care,” said the institution, which aims to advance ground-breaking research and educate future leaders in medicine, biomedical sciences and allied health sciences.
Investigators studied four large language models (LLMs), AI algorithms trained on enormous amounts of data. In medicine, LLMs are drawing interest for their ability to quickly evaluate and recommend diagnoses and treatments for individual patients, says the university.
“The study found that the LLMs, when presented with hypothetical clinical cases, often proposed different treatments for psychiatric patients when African-American identity was stated or simply implied than for patients for whom race was not indicated.”
This was despite the diagnoses themselves remaining relatively consistent.
The findings, published in the peer-reviewed journal npj Digital Medicine, were startling.
“Most of the LLMs exhibited some form of bias when dealing with African-American patients, at times making dramatically different recommendations for the same psychiatric illness and otherwise identical patient,” said Dr Elias Aboujaoude, director of the programme in internet, health and society in the department of biomedical sciences at Cedars-Sinai, and corresponding author of the study.
“This bias was most evident in cases of schizophrenia and anxiety.”
The study uncovered a range of disparities, including:
- Two LLMs omitted medication recommendations for an attention-deficit/hyperactivity disorder case when race was explicitly stated, but suggested them when race was omitted from the case.
- Another LLM suggested guardianship for depression cases only when racial characteristics were explicitly stated.
- One LLM showed increased focus on reducing alcohol use in anxiety cases only for patients explicitly identified as African-American or who had a common African-American name.
Aboujaoude suggested the LLMs showed racial bias because, surprise surprise, they reflected bias found in the extensive content used to train them.
He said future research should focus on strategies to detect and quantify bias in AI platforms and training data, create LLM architecture that resists demographic bias and establish standardised protocols for clinical bias testing.
One of his colleagues at Cedars-Sinai, Dr David Underhill, wrote: “The findings of this important study serve as a call to action for stakeholders across the health-care ecosystem to ensure that LLM technologies enhance health equity rather than reproduce or worsen existing inequities.
“Until that goal is reached, such systems should be deployed with caution and consideration for how even subtle racial characteristics may affect their judgment.”
• Arthur Goldstuck is CEO of World Wide Worx. He was principal analyst for the SA Social Media Landscape study.