In a recent systematic review, a team of researchers led by Yuexing Hao, a Ph.D. candidate in human centered design, found that while large language models (LLMs) have the potential to help cancer patients and clinicians make decisions about care, the models’ average accuracy remains low and current limitations pose potential risks to patients.

The team reviewed 56 papers and found an overall model accuracy of just 76.2%, with an average diagnostic accuracy of 67.4%. They found that most studies used small datasets, limiting generalizability and reproducibility. Many authors did not release their data or released only partial data. And evaluation metrics often ignored dimensions like safety and harm.

“We realized that accuracy is pretty low for large language models in summarizing, translating and communicating the clinical information,” said Hao. “The deployment of large language models in cancer decision making looks kind of mature in a lot of research papers, but if we have such a low accuracy, does the deployment in a real-world scenario pose more risk?”

Hao and her co-authors categorized the potential risks into seven areas: bias, lack of real-patient data, harm and safety monitoring, data privacy and ethical oversight, equity and representation, generalizability, and reproducibility.

They advocate for open-source datasets and more comprehensive evaluations that move beyond quantitative analysis and automated methods to include questions of safety, harm and clarity.

“Also, we hope researchers have more clinician oversight to understand how clinicians think of translating this model performance into the real-world deployment,” Hao said.

The paper’s co-authors are Zhiwen Qiu (Ph.D. student, information science), Corinna Loeckenhoff (psychology) and Saleh Kalantari (human centered design) from Cornell; Jason Holmes and Wei Liu from the Department of Radiation Oncology at the Mayo Clinic; and Marzyeh Ghassemi (electrical engineering and computer science) at the Massachusetts Institute of Technology.

Posted on: 08/05/2025
Author: Emily Groff
Tags: Holistic Human Health, Technology + Human Thriving