AI Triumphs in Pokémon Red After Years of Trials
In brief
- AI has reached a long-sought milestone: finishing Pokémon Red.
- Anthropic's Claude has finally beaten the game after more than a year of development and multiple failed attempts.
- The journey was full of comical setbacks, like getting stuck at Mt. Moon or trying to escape by fainting all its Pokémon.
- Despite these setbacks, Claude improved steadily across various skills, including memory and spatial reasoning.
- Claude's success highlights advances in AI problem-solving, though the model still has clear limitations.
- While some progress came from "scaffolding" tools like screenshot-saving, much of the improvement came from the model itself getting smarter over time.
- This achievement follows similar breakthroughs by other AI systems, like Google's Gemini, which previously conquered Pokémon Blue.
- Looking ahead, this milestone raises questions about how AI can tackle even more complex tasks.
- While Claude's victory is a notable step forward, its struggles in Pokémon Red suggest there's still room for improvement in understanding dynamic environments and making strategic decisions.
Terms in this brief
- Claude
- Claude is an AI model developed by Anthropic that has reached notable milestones on complex tasks such as completing Pokémon Red. Its progress reflects advances in AI problem-solving and learning, and points to the potential for AI to tackle more intricate challenges.
Read full story at LessWrong →
More briefs
AI Model Showdown: November 2025 Inflection Point
In November 2025, the landscape of large language models (LLMs) shifted dramatically. The title of top model changed hands five times among models from major providers, including Claude Sonnet, GPT-5.1, and Gemini 3. An informal test, drawing a pelican riding a bicycle, helped highlight differences among these models. Most agreed that Anthropic's Claude Opus 4.5 was the best for general tasks, and November also marked a breakthrough for coding agents. OpenAI and Anthropic had been using reinforcement learning to train their models to write better code, and the effort paid off when coding agents crossed a quality threshold at which they could be relied on for real work. The month also saw the first commit to an obscure repository called "Warelay," which later gained traction. From December to January, developers explored the new capabilities and built ambitious projects such as micro-javascript, a JavaScript interpreter in Python that uses Pyodide and WebAssembly. These developments hint at a future in which AI tools are more deeply integrated into everyday workflows, pushing the boundaries of what's possible with LLMs.
Google Launches AI-Powered Design App
Google has announced Pics for Google Workspace, a new AI-powered design and image-generation app. The app lets users generate images from simple text prompts without needing editing skills. This matters because it could help small businesses and individuals create visual content easily, in a market where design apps like Canva already draw over 10 million users. Google will roll out Pics to subscribers this summer. Users can edit generated images directly, with every element adjustable, and Google plans to keep updating Pics to make image editing easier.
AI Education Demand Surges
MIT Sloan Executive Education saw over 20,000 leaders attend its AI courses last year. These leaders want to learn AI fundamentals and how to adopt the technology. Demand has shifted from basic understanding toward implementing and managing AI, and leaders also want to understand its implications for their workforce. AI education will continue to evolve as more companies adopt the technology.
AI Agents Gain New Capabilities in Self-Learning and Problem-Solving
AI agents such as Claude Code, Codex, and LangChain Deep Agents have shown strong skills in managing tasks, chaining tools, executing code, and answering complex queries. These advances let them work efficiently with minimal human intervention, making them valuable for developers and researchers. Integrating these systems into software architectures and big data schemas is changing how applications are built and maintained. By drawing on a skills repository, these agents can adapt and learn from experience, improving over time without constant supervision. This could significantly reduce the time spent on repetitive tasks, freeing humans to focus on more creative and strategic work. Looking ahead, agents that can train new sub-agents themselves open the door to even greater automation and innovation across industries. As these technologies evolve, further improvements are likely in how AI interacts with both data and users.
Google's AI Costs Skyrocket as New Models Emerge
Google has unveiled its latest AI models, including Gemini 3.5 Flash, which outperforms its predecessor but at a much higher cost. Running Gemini 3.5 Flash is reported to be 5.5 times more expensive than earlier versions, and on agent tasks its costs exceed even those of the pricier Gemini 3.1 Pro by 75%. The trend isn’t isolated: AI expenses are rising across the board as companies invest heavily to stay competitive. At Google’s I/O developer conference, the company also introduced Gemini Omni, a multimodal model, and Gemini Spark, a personal cloud agent that runs continuously. These offerings highlight the growing complexity and resource demands of AI development. While they promise enhanced capabilities, the steep costs may deter developers and businesses looking to adopt them. As the industry evolves, it will be worth watching how these rising costs affect innovation and accessibility in AI.