AI Struggles to Pick a Random City: Language Models Show Surprising Biases
In brief
- AI language models, while advanced, have a surprising flaw: they struggle to generate truly random outputs.
- For instance, when asked to name a random weekday, Qwen3 chooses Wednesday 80% of the time.
- Similarly, Gemma-3 names only four distinct cities when asked for a random city, and in multiple-choice questions, models often put the correct answer in option C.
- This bias isn't just an amusing quirk: it has serious implications for tasks like synthetic data generation and creative problem-solving, where diversity is crucial.
- Recent studies reveal that these biases stem from how models are trained.
- Models lack incentives to spread probability across diverse options, so they "collapse" onto a few narrow modes even when broader diversity is needed.
- One study found that models' sampling from known distributions is heavily skewed, while Gu et al. highlighted how consistently models position the correct answer in multiple-choice questions.
- Efforts to fix the problem have had mixed success.
- One method has the model first generate a random string and then transform it into an answer; this works in simple cases but breaks down on more complex tasks.
- Another approach focuses on training models explicitly against known distributions to improve their stochastic behavior.
- Early evaluations show promising results for distributional fidelity and transfer, suggesting that more reliable randomness may be within reach.
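To make "heavily skewed" and "distributional fidelity" concrete, here is a minimal sketch of one way to score a model's answers against the uniform distribution it was asked to sample from. The metrics (total variation distance and normalized entropy) are an assumption, not necessarily what the studies used, and the sample answers are hypothetical, loosely echoing the 80% Wednesday figure rather than reproducing any study's data.

```python
import math
from collections import Counter

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def total_variation_from_uniform(samples, support):
    """Total variation distance between the empirical distribution of
    `samples` and the uniform distribution over `support` (0 = uniform)."""
    counts = Counter(samples)
    n = len(samples)
    u = 1.0 / len(support)
    return 0.5 * sum(abs(counts[s] / n - u) for s in support)


def normalized_entropy(samples, support):
    """Shannon entropy of the empirical distribution divided by
    log(len(support)); 1.0 means every option appears equally often."""
    counts = Counter(samples)
    n = len(samples)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(support))


# Hypothetical answers to "name a random weekday" (illustrative, not study data):
# 80 Wednesdays, 4 each of five other days, and no Sundays.
skewed = ["Wednesday"] * 80 + ["Monday", "Tuesday", "Thursday", "Friday", "Saturday"] * 4
ideal = WEEKDAYS * 10  # every day equally often

print(round(total_variation_from_uniform(skewed, WEEKDAYS), 3))  # → 0.657
print(round(total_variation_from_uniform(ideal, WEEKDAYS), 3))   # → 0.0
print(round(normalized_entropy(skewed, WEEKDAYS), 3))
```

A total variation distance of about 0.657 means roughly two-thirds of the probability mass would have to move before the answers looked uniform; a well-calibrated sampler should score near 0 on that metric and near 1 on normalized entropy.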
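The "generate a random string, then transform it" mitigation can be sketched roughly as follows. This is an illustrative assumption rather than the published procedure: the hypothetical `string_to_choice` helper stands in for the deterministic transformation a model would be prompted to carry out (here, summing character codes modulo the number of options), and Python's seeded `random` module stands in for the model's string generation so the effect can be checked.

```python
import random
import string
from collections import Counter

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def string_to_choice(s, options):
    """Hypothetical deterministic step: map a string to one option by
    summing its character code points modulo the number of options."""
    return options[sum(ord(ch) for ch in s) % len(options)]


def fake_model_string(rng, length=16):
    """Stand-in for the model's 'random string': uniform alphanumeric characters."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(0)
picks = Counter(string_to_choice(fake_model_string(rng), WEEKDAYS) for _ in range(7000))
for day in WEEKDAYS:
    print(day, picks[day])  # each count lands near 1000

# If the "random" string is itself stereotyped, the mapping inherits the bias.
print(string_to_choice("aaaaaaaaaaaaaaaa", WEEKDAYS))  # → Saturday
```

The mapping is only as random as its input: a varied string spreads answers evenly, but a stereotyped string always lands on the same day, which is one way the approach can break down once the model has to produce or manipulate more complex intermediate text.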
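Training models explicitly against a known distribution can be illustrated with a toy. This is an assumed setup, not the studies' method: a single vector of seven logits stands in for a model's preferences over weekdays, and plain gradient descent minimizes the KL divergence from the uniform target to the model's distribution. The starting logits and the `lr` and `steps` values are hypothetical choices for illustration.

```python
import math

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fit_to_target(logits, target, lr=0.5, steps=200):
    """Plain gradient descent on KL(target || softmax(logits)).
    The gradient with respect to the logits is softmax(logits) - target."""
    z = list(logits)
    for _ in range(steps):
        q = softmax(z)
        z = [zi - lr * (qi - pi) for zi, qi, pi in zip(z, q, target)]
    return z


# Hypothetical starting logits (Monday..Sunday) that put 80% of the mass on Wednesday.
start = [0.0, 0.0, math.log(24.0), 0.0, 0.0, 0.0, 0.0]
target = [1 / 7] * 7  # the known distribution: uniform over the weekdays

print(round(kl(target, softmax(start)), 3))  # → 1.001
fitted = softmax(fit_to_target(start, target))
print({day: round(p, 3) for day, p in zip(WEEKDAYS, fitted)})  # every day ≈ 0.143
```

In a real language model the probabilities come from the network's output over answer tokens and the gradient flows into its weights, but the objective has the same shape: penalize the gap between what the model samples and what it was asked to sample.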
Terms in this brief
- Qwen3
- A language model that has shown a tendency to select 'Wednesday' with high frequency when asked for a random weekday.
- Gemma-3
- Another language model, which names only four distinct cities when asked for a random city.
Read the full story at LessWrong and arXiv CS.LG.
More briefs
AI Pioneer Yann LeCun Challenges Current AI Models
Yann LeCun, a Turing Award winner, says current AI models are limited. He thinks they need a new approach to reach human-level intelligence. He is building a startup called Advanced Machine Intelligence with $1.03 billion in funding. The company wants to create "world models" that learn from reality and predict what happens next. LeCun's new approach could lead to more intelligent systems that can plan and reason like humans.
AI Model Accused of Being a Merge
A company claims that a new AI model is not original, alleging that it is a merge of the company's own model and another one. The model is said to be 60 percent from one source and 40 percent from another. The company says it reached this conclusion by testing the model and examining its code. This matters because it affects trust in AI. The company is likely to take further action on the issue.
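The brief does not say how the disputed model was built or how the 60/40 split was measured. For intuition only, one common way to produce such a blend, assumed here and not confirmed by the brief, is linear interpolation of two models' parameters with weights 0.6 and 0.4. The `linear_merge` helper and the toy parameter values are hypothetical; real checkpoints would hold tensors rather than Python lists.

```python
def linear_merge(model_a, model_b, alpha=0.6):
    """Blend two models parameter by parameter: alpha * A + (1 - alpha) * B.
    Both models must have the same parameter names and sizes."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name, a_values in model_a.items():
        b_values = model_b[name]
        if len(a_values) != len(b_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [alpha * a + (1 - alpha) * b for a, b in zip(a_values, b_values)]
    return merged


# Toy "models": each parameter is a flat list of floats (hypothetical values).
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
model_b = {"layer.weight": [0.0, 4.0], "layer.bias": [-0.5]}
print(linear_merge(model_a, model_b))  # 60% model_a, 40% model_b
```

Because a merge like this requires identical parameter names and shapes, it only applies to models that share an architecture.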
AI Safety Research Reveals Surprising Insights into Gemini’s Behavior
Google's DeepMind team has uncovered unexpected findings about how the behavior of AI models like Gemini is shaped. Their research shows that most of Gemini's safety behavior comes from its pre-training and fine-tuning phases rather than from other training methods such as reinforcement learning, which runs counter to what the team initially expected. The study found that when they removed the supervised fine-tuning (SFT) stage from Gemini, the model's behavior barely changed on safety tests. This suggests that pre-training plays a crucial role in determining how safe and reliable AI systems are. However, the team also found that certain unwanted behaviors can still emerge even after bad examples are filtered out of the training data. Looking ahead, DeepMind plans to focus more on improving the fine-tuning process to enhance model safety, and is working on better ways to identify and prevent behaviors that slip past these filters. This research could help make AI systems more predictable and trustworthy in the future.
New Attack Tricks AI Coding Agents
A new class of attack can trick artificial intelligence coding agents into running malicious code on developer machines. The attack can expose sensitive data without relying on methods like phishing. It works by injecting crafted input into error events, which are then interpreted by coding agents as legitimate steps. A successful attack can expose environment variables, Git credentials, and private repository URLs. Developers will need to find ways to protect themselves from this new type of attack.
AI Alignment Crisis: Most Safety Experts Not Focusing on Ensuring Superintelligent AIs Follow Human Instructions
A recent analysis finds that most AI safety experts are not working on ensuring that superintelligent AIs align with human values, the critical task known as "alignment." While some groups, like the Alignment Research Center and Sequent, focus on this issue, they represent a small fraction of the broader AI safety community. Most others do indirect work such as capability evaluations, risk assessments, and policy development. This lack of direct alignment work raises concerns about how prepared we are for advanced AI systems. Currently, only a few projects, such as chain-of-thought (CoT) monitoring, aim to make current models behave well, which might help with future alignment challenges. While this work is valuable, it is not enough to ensure that superintelligent AIs will follow human instructions. The AI community needs to prioritize more direct alignment research to avoid potential risks as AI capabilities grow. Watch for upcoming discussions and initiatives addressing this critical gap in AI safety efforts.