latentbrief
Research · 2d ago

AI Models Exposed: They Copy Numbers Instead of Solving Problems

Hugging Face Blog, arXiv CS.LG · 1 min brief

In brief

  • Recent research reveals that small language models, even when using chain-of-thought prompting, often rely on copying numbers from earlier steps rather than performing genuine arithmetic.
    • This shortcut significantly affects their accuracy: incorrect answers occur 54-92% less often when the correct number is available to copy.
  • For example, if a wrong number precedes the answer delimiter, accuracy plummets to near zero, even when the intermediate reasoning is correct.
  • The study finds that this copying behavior varies by model family: Qwen and Llama copy distractors (wrong numbers placed in the context) up to 95% of the time, while Gemma is more selective.
  • Larger models (7-8B parameters) show improved content-selective gating, reducing reliance on positional shortcuts.
    • This finding challenges assumptions about AI reasoning abilities and underscores limitations in current oversight methods.
  • Moving forward, researchers will likely focus on improving model architectures to reduce reliance on copying and enhance genuine computation.
  • Developers should also consider refining evaluation metrics to better assess AI reasoning without conflating shortcuts with actual problem-solving skills.
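
The distractor test described above can be sketched in a few lines of Python. This is an illustration only, not the study's code: the `####` delimiter, the probe layout, the function names, and the toy `positional_copier` "model" are all assumptions made for the example. The idea is to place a wrong number just before the answer delimiter and count how often the output matches that number rather than the true answer.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumption: "####" is a hypothetical answer delimiter; the brief does not name one.
DELIMITER = "####"

# Each case: (reasoning steps, correct answer, distractor number).
Case = Tuple[List[str], int, int]


def build_probe(steps: List[str], distractor: int) -> str:
    """Join correct reasoning steps, then plant a distractor right before the delimiter."""
    return "\n".join(steps) + f"\n(note: {distractor})\n{DELIMITER} "


def first_int(text: str) -> Optional[int]:
    """Return the first integer found in the model output, if any."""
    match = re.search(r"-?\d+", text)
    return int(match.group()) if match else None


def copy_and_accuracy(model: Callable[[str], str], cases: List[Case]) -> Tuple[float, float]:
    """Return (share of answers equal to the distractor, share equal to the true answer)."""
    copied = correct = 0
    for steps, answer, distractor in cases:
        pred = first_int(model(build_probe(steps, distractor)))
        copied += pred == distractor
        correct += pred == answer
    return copied / len(cases), correct / len(cases)


def positional_copier(prompt: str) -> str:
    """Toy stand-in for a model: echoes the last number in the prompt, ignoring the reasoning."""
    return re.findall(r"-?\d+", prompt)[-1]


CASES: List[Case] = [
    (["3 + 4 = 7", "7 * 2 = 14"], 14, 99),
    (["10 - 6 = 4"], 4, 17),
]

if __name__ == "__main__":
    copy_rate, accuracy = copy_and_accuracy(positional_copier, CASES)
    print(f"copy rate: {copy_rate:.0%}, accuracy: {accuracy:.0%}")
```

To probe a real model, `positional_copier` would be replaced by any function that takes a prompt string and returns the generated text. A high copy rate alongside low accuracy on such probes is the pattern the brief describes.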

Terms in this brief

chain-of-thought prompting
A prompting method in which an AI model writes out intermediate reasoning steps before giving its final answer. The approach aims to make the model's output more transparent and logical by breaking complex problems into smaller, manageable steps.
Qwen
A family of large language models developed by Alibaba Cloud. In the research summarized here, Qwen models were among those that copied distractor numbers most often, up to 95% of the time.

