latentbrief
Research · 1d ago

AI Models Show Unexpected Inconsistencies in Decision-Making

LessWrong

In brief

  • AI models like Claude Opus, DeepSeek V4-Pro, Google Gemini, and OpenAI GPT have shown surprising inconsistencies when making decisions.
  • In a study spanning over 25,000 API calls across four models, researchers found that the same model could pick one option when asked to choose, yet rank those same options differently when asked to value them.
  • For example, when asked which sales lead to pursue first, models typically chose the safer option, yet when asked to estimate potential earnings, they assigned higher value to the riskier but potentially more rewarding lead.
    • This mirrors classic human decision-making biases observed decades ago.
  • The study tested various prompt formats and reasoning settings, revealing that even at their most advanced, AI models still struggle with consistent judgment.
  • In one prompt format, the inconsistency rate dropped from 48.4% to 30.7% when reasoning effort was set to its highest level.
  • Even so, the reversal pattern persisted: models chose the safer bet when deciding, while rating the riskier, higher-reward option as more valuable.
  • Looking ahead, researchers suggest that these inconsistencies could impact how AI is used in real-world applications like business decisions or financial advice.
  • As AI becomes more integrated into daily life, understanding and addressing these biases will be crucial for ensuring reliable and ethical outcomes.
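The measurement the study describes, asking the same model to choose between two options in one prompt and to value them in another, then counting disagreements, can be sketched as follows. This is a minimal illustration, not the researchers' actual harness: `ask_model` is a hypothetical stand-in that simulates the reported bias rather than calling a real API.

```python
import random

def ask_model(prompt_kind, rng):
    # Hypothetical stand-in for a model API call. It simulates the
    # reported bias: pick the safe lead when choosing, but rate the
    # risky lead higher when estimating expected earnings.
    if prompt_kind == "choice":
        return "safe" if rng.random() < 0.7 else "risky"
    else:  # "valuation": which lead has higher expected earnings?
        return "risky" if rng.random() < 0.7 else "safe"

def inconsistency_rate(n_trials, seed=0):
    # Run paired prompts and count how often the chosen option
    # disagrees with the option the model values more highly.
    rng = random.Random(seed)
    reversals = 0
    for _ in range(n_trials):
        chosen = ask_model("choice", rng)
        valued = ask_model("valuation", rng)
        if chosen != valued:  # preference reversal between formats
            reversals += 1
    return reversals / n_trials

rate = inconsistency_rate(10_000)
print(f"simulated reversal rate: {rate:.1%}")
```

With the simulated 70/30 split in each direction, independent draws give a reversal probability of 0.7 × 0.7 + 0.3 × 0.3 = 0.58, so the printed rate lands near 58%; the real study's rates came from actual model responses, not a simulation like this.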

Terms in this brief

Claude Opus
A specific version or iteration of the Claude AI model developed by Anthropic, known for its advanced capabilities in decision-making and reasoning. This study highlights that even advanced models like Claude Opus can exhibit unexpected inconsistencies in their judgments.
DeepSeek V4-Pro
A sophisticated AI model created by the Chinese company DeepSeek, noted for its performance in various tasks including problem-solving and decision-making. The research indicates that this model also shows surprising inconsistencies in its decision-making processes.
Gemini
A state-of-the-art AI model developed by Google, known for its multi-task capabilities and advanced reasoning skills. The study reveals that Gemini, like other models, struggles with consistent judgment across different scenarios.

Read full story at LessWrong
