
AI Models Show Promise in Creative Tasks

Simon Willison, April 16, 2026

In brief

- Two AI models, Qwen3.6-35B-A3B from Alibaba and Claude Opus 4.7 from Anthropic, were tested on a creative benchmark: generating an image of a pelican riding a bicycle.
- Qwen3.6-35B-A3B rendered the scene with the expected details in place, while Claude Opus 4.7 got the shape of the bicycle frame wrong on its first attempt and improved only slightly when prompted to think at its maximum level.
- The results suggest that AI models are becoming more capable at creative tasks, though they still have limitations.
- Developers and researchers may want to keep tracking progress in image generation and related tasks.

Terms in this brief

Qwen3.6-35B-A3B: A large language model developed by Alibaba, described here as performing well on creative and detailed tasks. In this test it produced a more accurate image of a pelican riding a bicycle than Claude Opus 4.7.
Claude Opus 4.7: An AI model from Anthropic, tested alongside Qwen3.6-35B-A3B. Its first attempt got the shape of the bicycle frame wrong, and it improved slightly when prompted to think at its maximum level.

