ChatGPT 4.0, the most recent iteration of OpenAI’s large language model (LLM), answered 85% of questions correctly on a clinical neurology exam from the American Board of Psychiatry and Neurology in a proof-of-concept study.
A group of researchers from the German Cancer Research Center in Heidelberg and University Hospital Heidelberg published the results of the experiment on December 7. Two LLMs were tested on May 31: ChatGPT 3.5 and its successor, ChatGPT 4.0.
The researchers supplemented a subset of the European Board of Neurology questions with those from the American Board of Psychiatry and Neurology’s neurology exam question bank.
The older ChatGPT 3.5 model answered 66.8% of questions correctly (1,306 of 1,956), while ChatGPT 4.0 reached 85% (1,662 of 1,956). Human test-takers averaged 73.8%.
ChatGPT 4.0 outperformed human users on psychological, cognitive, and behavioral questions, and its overall score comfortably cleared the 70% mark, effectively “passing” the neurology exam. In academic settings, a passing grade typically corresponds to 70% correct answers.
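As a quick check of the figures above, the following Python sketch recomputes each model's accuracy from the reported counts and compares it with the 70% passing mark and the 73.8% human average. It assumes both models answered the same 1,956 questions; the variable and function names are illustrative only and are not taken from the study.

```python
# Figures as reported in the article; names here are illustrative, not from the study.
TOTAL_QUESTIONS = 1956
CORRECT = {"ChatGPT 3.5": 1306, "ChatGPT 4.0": 1662}
HUMAN_AVERAGE = 73.8   # percent, mean human score reported in the article
PASS_THRESHOLD = 70.0  # percent, the typical passing grade cited in the article


def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


def summarize() -> dict:
    """Compute accuracy, pass/fail, and comparison with the human average per model."""
    results = {}
    for model, correct in CORRECT.items():
        pct = accuracy_percent(correct, TOTAL_QUESTIONS)
        results[model] = {
            "accuracy": pct,
            "passes": pct >= PASS_THRESHOLD,
            "beats_human_average": pct > HUMAN_AVERAGE,
        }
    return results


if __name__ == "__main__":
    for model, r in summarize().items():
        print(f"{model}: {r['accuracy']}% "
              f"(pass: {r['passes']}, above human average: {r['beats_human_average']})")
```

Run as a script, it reports 66.8% for ChatGPT 3.5, below both the 70% mark and the human average, and 85.0% for ChatGPT 4.0, above both.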
The study concludes: “These findings suggest that with further refinements, large language models could have significant applications in clinical neurology.”
According to the research team, the LLMs would need further modification before being applied in clinical neurology.
The researchers note that several reservations remain. Although documentation and decision-support systems offer a clear opportunity to deploy LLMs, neurologists should exercise caution in practice because the models still struggle with higher-order cognitive tasks. One of the study’s authors, Dr. Varun Venkataramani, stated:
We see our study more as a proof of concept for the capabilities of LLMs. There is still development needed and probably even specific fine-tuning of LLMs to make them properly applicable for clinical neurology.
AI is already tackling significant healthcare challenges, including combating antibiotic overprescribing in Hong Kong and helping to develop cancer treatments on behalf of AstraZeneca.