AI Tools Fail to Impress in Medical Practice Tests

Nature | June 12, 2026 | 1 min brief

In brief

- Clinical AI tools are already being used in medical practice despite a lack of independent evaluation.
- These tools were compared with general-purpose language models in three tests.
- They were given 500 medical questions and 500 items assessing their agreement with expert clinicians.
- They also received 100 real clinical queries from physicians.
- The general-purpose language models performed better in all three tests.
- The results suggest that clinical AI tools may not be as effective as claimed, and that independent evaluation is needed before they are used in medical practice.
- Further evaluations will be carried out to test these tools.

Terms in this brief

independent evaluation: Testing or assessment of AI tools by parties that have no stake in the tool's success. The aim is an unbiased, objective assessment of the tool's performance and effectiveness.

Read full story at Nature →