AI Tools Fail to Impress in Medical Practice Tests
In brief
- Clinical AI tools are being used in medical practice despite a lack of independent evaluation.
- These tools were compared to general-purpose language models in three tests.
- Two of the tests used 500 medical questions and 500 items measuring agreement with expert clinicians.
- The third used 100 real clinical queries from physicians.
- The general-purpose language models performed better in all three tests.
- This shows that clinical AI tools may not be as effective as claimed, and independent evaluation is needed before they are used in medical practice.
- New evaluations will be done to further test these tools.
Terms in this brief
- Independent evaluation: testing or assessing AI tools by parties that have no stake in the tool's success. This ensures unbiased and objective assessment of the tool's performance and effectiveness.
Read full story at Nature →
More briefs
AI Reveals Hidden Patterns in Board Games and Beyond
A groundbreaking study revealed that even a simple AI model, trained only on board game moves, developed its own understanding of the game's rules and strategies. This discovery challenges previous assumptions about how transformers learn, showing they can grasp abstract concepts beyond surface-level patterns. The finding, from late 2022, demonstrated that the AI built internal models of the game board's state, a capability previously thought impossible without explicit training on related data. This suggests larger language models might similarly understand broader generative structures in human language, including emotions and physical embodiment. Researchers are now exploring how this insight could improve AI safety and interpretability. Future studies will focus on understanding how these internal models influence behavior, potentially leading to more reliable and transparent AI systems.
Workers Spend 6 Hours a Week Fixing AI Mistakes
Workers spend an average of 6.4 hours a week fixing mistakes made by artificial intelligence. This extra work, called "botsitting," includes feeding context to AI, checking outputs, and debugging mistakes. The workers surveyed use AI at work and say it makes them more productive, but their organizations are not performing significantly better. The burden of botsitting is taking a toll on employee morale, and workers who spend a lot of time on it are more likely to be looking for a new job. More companies will have to address this issue to prevent employee burnout.
AI Agents Fail Without Strong Data Foundations
Recent insights highlight a critical issue in the adoption of AI agents: without a solid data foundation, these systems often fall short. Niels Zeilemaker, global CTO at Xebia, emphasizes that organizations must make their data accessible and usable for AI to truly harness its potential. Many companies overlook this foundational step, which can lead to ineffective AI implementations. The importance of data quality and availability cannot be overstated. AI agents rely on data to function effectively, and without it, they struggle to deliver results. This means that organizations must invest in data infrastructure and ensure that their data is properly formatted, cleaned, and accessible for AI systems. Without this step, even the most advanced AI agents may not perform as expected. Looking ahead, experts predict that more companies will prioritize building strong data foundations to support their AI initiatives. Organizations that succeed in this area are likely to see significant improvements in efficiency and decision-making. As AI continues to evolve, the emphasis on robust data management will only grow stronger.
AI Alignment Breakthrough: New Study Reveals How Different Methods Shape Model Behavior
Researchers have uncovered how six different preference-optimization methods, like PPO and DPO, affect the internal workings of language models. By analyzing these techniques across various model architectures, they found that some methods enhance the clarity of model outputs while others degrade it. For instance, KTO and GRPO improve how well models can distinguish between good and bad responses, making their decisions more transparent. On the other hand, DPO and ORPO make these distinctions harder to interpret. This study highlights that aligning AI behavior isn't one-size-fits-all; the impact varies widely depending on the method used and the model's structure. These findings are crucial for developers aiming to build safer and more reliable AI systems, as they now have concrete insights into how different approaches affect model internals. Looking ahead, researchers will likely focus on developing standardized ways to audit and interpret these changes, ensuring that alignment efforts don't compromise a model's transparency or safety.
AI Predictive Systems Alter Cognitive Exploration Dynamics
Predictive artificial intelligence systems are fundamentally altering how problem-solving unfolds in cognitive processes, according to a new mathematical framework. These systems can stabilize solutions early on before self-driven exploration begins, potentially restricting the diversity of strategies that could emerge. This shift could limit the ability of AI and humans using such systems to explore varied solutions over time. The study highlights three key findings: stabilizing predictions reduce exploratory behavior by dampening intrinsic curiosity; accumulated "curvature" in problem-solving landscapes causes delayed recovery after predictive help is removed; and timing matters crucially, with early stabilization narrowing future exploration. These insights challenge classical views of cognition as purely exploratory, suggesting a new regime where prediction dominates. This work raises questions about the long-term impact of AI on creative problem-solving. Future research should explore how these findings apply to real-world AI applications, particularly in areas requiring adaptability and innovation.