
AI's Personality Test Fails When Put to Work

LessWrong | May 25, 2026 | 1 min brief

In brief

A new study finds that AI models trained to adopt specific personalities in chat conversations do not reliably keep those personalities when given real-world tasks.
Researchers tested three AI systems (Llama, Qwen, and Gemma) that had been given personas through supervised fine-tuning (SFT).
A classifier designed to identify each model's persona reached high accuracy (86-95%) in controlled chat settings.
When the same models were asked to act autonomously, for example by composing emails or making decisions, the classifier's accuracy dropped sharply to 29-55%.
The drop indicates that the trained personas were much harder to detect outside structured chat interactions.
This suggests that SFT, a common training method for character-driven AI, may not carry a persona over to practical, agent-like tasks.
The findings highlight the limits of current personality-training techniques and the need for alignment methods that generalize across contexts.
As AI becomes more integrated into daily life, developers who want reliable and versatile assistants will need to understand how these systems behave outside controlled chats.

Terms in this brief

SFT (supervised fine-tuning): A training method in which a pretrained model is further trained on curated examples of the desired output. In this study it was used to give models specific personalities, an approach meant to make AI interactions more engaging and tailored but one that showed limitations beyond controlled chat settings.

