Study Reveals Pros and Cons of Using Generative AI to Draft Patient Responses
A new study from Mass General Brigham, published in The Lancet Digital Health and reported in a press release from the hospital, investigates the pros and cons of using large language models (LLMs), a type of generative AI, to draft replies to patient messages.
The study offers mixed conclusions. On one hand, using LLMs in this way “may help reduce physician workload and improve patient education”; on the other, the technology has significant limitations that “may affect patient safety, suggesting that vigilant oversight of LLM-generated communications is essential for safe usage.” Administrative and documentation responsibilities have been rising for clinicians, and generative AI tools like these are meant to ease that burden by drafting messages to patients.
The researchers used OpenAI’s GPT-4, “a foundational LLM, to generate 100 scenarios about patients with cancer and an accompanying patient question…Six radiation oncologists manually responded to the queries; then, GPT-4 generated responses to the questions.” When provided with the LLM-generated responses for review and editing, the same radiation oncologists “believed that an LLM-generated response had been written by a human” in 31 percent of cases.
The study found that physician-drafted responses were on average shorter than LLM-generated ones. The physicians judged the LLM-generated responses safe in 82.1 percent of cases “and acceptable to send to a patient without any further editing in 58.3 percent of cases.” On the other hand, the researchers found that 7.1 percent of responses could pose a risk to the patient and 0.6 percent could pose a risk of death.
With these successes and shortcomings in mind, Mass General Brigham renews its call to “balance [AI tools’] innovative potential with a commitment to safety and quality.”
Matt MacKenzie | Associate Editor
Matt is Associate Editor for Healthcare Purchasing News.