Conversational artificial intelligence (a large language model) can accurately diagnose and triage health conditions without introducing racial or ethnic biases

Digital Health



GPT-4, a conversational artificial intelligence (AI), diagnoses and triages health conditions with accuracy comparable to that of board-certified physicians, and its performance does not vary by patient race or ethnicity.



GPT-4, a conversational artificial intelligence, “learns” from information on the internet. Yet even as the technology’s use in health care settings has grown in recent years, neither the accuracy of this form of AI for diagnosis and triage nor whether its recommendations reflect racial and ethnic biases possibly gleaned from that information had been investigated.



The researchers compared how GPT-4 and three board-certified physicians diagnosed and triaged health conditions using 45 typical clinical vignettes, assessing how each identified the most likely diagnosis and decided which triage level (emergency, non-emergency, or self-care) was most appropriate.
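To illustrate the kind of comparison described above, the sketch below scores how often a model's triage level matches the physicians' choice across a set of vignettes. The vignette data and helper name are invented for illustration; the study's actual 45 vignettes, physician ratings, and scoring procedure are not reproduced here.

```python
# Hypothetical sketch: agreement between model and physician triage labels.
# The labels below are invented examples, not the study's data.

TRIAGE_LEVELS = {"emergency", "non-emergency", "self-care"}


def triage_agreement(model_labels, physician_labels):
    """Return the fraction of vignettes where the model's triage level
    matches the physicians' label. Raises on unknown labels."""
    if len(model_labels) != len(physician_labels):
        raise ValueError("label lists must be the same length")
    for label in list(model_labels) + list(physician_labels):
        if label not in TRIAGE_LEVELS:
            raise ValueError(f"unknown triage level: {label}")
    matches = sum(m == p for m, p in zip(model_labels, physician_labels))
    return matches / len(model_labels)


# Invented example: three vignettes, two of which agree.
model = ["emergency", "self-care", "non-emergency"]
physicians = ["emergency", "non-emergency", "non-emergency"]
print(triage_agreement(model, physicians))
```

A real evaluation would also need a consensus rule when physicians disagree with one another and a separate measure of diagnostic accuracy, but the agreement score above captures the basic triage comparison.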

The study has some limitations. The clinical vignettes, while based on real-world cases, provided only summary information for diagnosis, which may not reflect clinical practice, where clinicians typically have more detailed patient information. In addition, GPT-4’s responses may depend on how the queries are worded, and GPT-4 may have been trained on the clinical vignettes this study used. Finally, the findings may not apply to other conversational AI systems.



Health systems can use the findings to introduce conversational AI that improves the efficiency of patient diagnosis and triage.



“The findings from our study should be reassuring for patients, because they indicate that large language models like GPT-4 show promise in providing accurate medical diagnoses without introducing racial and ethnic biases,” said senior author Dr. Yusuke Tsugawa, associate professor of medicine in the division of general internal medicine and health services research at the David Geffen School of Medicine at UCLA. “However, it is also important for us to continuously monitor the performance and potential biases of these models as they may change over time depending on the information fed to them.”



Additional study authors are Naoki Ito, Sakina Kadomatsu, Mineto Fujisawa, Kiyomitsu Fukaguchi, Ryo Ishizawa, Naoki Kanda, Daisuke Kasugai, Mikio Nakajima, and Tadahiro Goto.



The study is published in the peer-reviewed journal JMIR Medical Education.


Media Contact

Enrique Rivero
[email protected]