Introduction and Objective: Generative artificial intelligence (AI) offers great promise for clinical decision support (CDS). When there is a single right answer to a clinical question, assessing whether a generative AI is appropriate for CDS may be simple: does it give the correct answer often enough? When there is not a clear best answer, assessing AI performance is more complex.

Methods: This study compared endocrinologist and AI (ChatGPT-4) responses to clinical vignettes in which the respondent was asked to choose an initial medication for a patient with type 2 diabetes. Each vignette included patient characteristics such as lab values, demographics, and comorbidities. Use versus non-use of metformin was the primary outcome. Univariable analysis was used to explore how other patient characteristics influenced willingness to use metformin.

Results: Both endocrinologists (n=30) and the AI were unlikely to recommend metformin for patients with severely impaired kidney function (Figure 1A). Patient characteristics affected human and AI responses in similar ways (Figure 1B). Human respondents were slightly less likely to use metformin in patients with a history of gastrointestinal symptoms, whereas the AI never used metformin in those circumstances.

Conclusion: When multiple treatment options were reasonable, AI responses were similar to human responses both in quality and in how treatment decisions were personalized.

Disclosure

J. Flory: None. J. Ancker: Other Relationship; Ambry Genetics. G. Kuperman: None. A. Vickers: None. S.Y. Kim: None. A. Petrov: None.

Funding

Patient Centered Outcomes Research Institute (ME-2022C1-26378)

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.