San Francisco, CA
Anthropic
The safety-first AI lab that made alignment research a precondition for building. Claude models are known for disciplined instruction following, precise tool use, and context windows of 200K to 1M tokens that handle entire codebases in one pass.
Models
Claude Opus 4.8
1M ctx · Anthropic's heavyweight for hard reasoning and agentic work.
Opus is the Claude you reach for when output quality buys back its premium - long agent runs, hard reasoning, work where a single dropped step costs more than the token bill.
$5.00 in · $25.00 out / 1M tokens
Claude Sonnet 4.6
1M ctx · The pragmatic default - Claude quality without Opus pricing.
Sonnet is the model most teams should default to.
$3.00 in · $15.00 out / 1M tokens
Claude Haiku 4.5
200K ctx · Fast, cheap, surprisingly capable for high-volume jobs.
Haiku 4.5 is the most underrated model in the Claude lineup.
$1.00 in · $5.00 out / 1M tokens
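As a rough illustration of how the per-token prices listed above translate into per-request cost, here is a minimal Python sketch. The prices are copied from the listing; the function name `estimate_cost` and the sample token counts are hypothetical, and the arithmetic ignores any discounts or surcharges a provider may apply.

```python
# Prices in USD per 1M tokens (input, output), copied from the listing above.
PRICES = {
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: a 200K-token prompt with a 4K-token reply.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 200_000, 4_000):.2f}")
```

For the assumed workload, the same request costs five times as much on Opus as on Haiku, which is the trade-off the model descriptions above are weighing.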
Recent news
Articles mentioning Anthropic models
AI Model Showdown: November 2025 Inflection Point
In November 2025, the landscape of large language models (LLMs) underwent a dramatic shift. The top model crown changed hands five times among models from major providers, including Claude Sonnet, GPT-5.1, and Gemini 3. A unique test, drawing a pelican riding a bicycle, helped highlight differences in these models. While most agreed that Anthropic's Claude Opus 4.5 was the best for general tasks, November also marked a breakthrough in coding agents. OpenAI and Anthropic had been refining their models to write better code through reinforcement learning. This effort paid off when coding agents reached a quality threshold where they could be used reliably for real work. The month also saw the first commit to an obscure repository called "Warelay," which later gained traction. From December to January, developers explored new model capabilities and even built ambitious projects like micro-javascript, a JavaScript interpreter in Python using Pyodide and WebAssembly. These developments hint at a future where AI tools become more integrated into everyday workflows, pushing the boundaries of what's possible with LLMs.
Hacker News · 4w ago
AI Showcases Limited Ability to Provide Decision Advice and Life Planning Guidance
AI has shown limited ability to provide effective decision advice and life planning guidance in a recent study. Researchers tested eight AI tools, including commercial products like Auren, Sybil, and Wisethinking, as well as advanced models like GPT-5.3 and Claude. The results were mixed: some AI systems scored low (-2 to 0), while others performed slightly better (up to 4 points). The best performance came from Claude Sonnet 4.6 with a custom prompt designed to enhance wisdom. The study highlights the potential of AI to assist in life planning but also reveals significant limitations in current technology. Commercial tools lagged behind more tailored versions of Claude, suggesting that customization and specific prompts can improve AI's effectiveness. This matters because better decision-making tools could help individuals navigate complex life choices and align with broader goals like virtue and wisdom. Looking ahead, researchers will likely focus on refining AI systems to better understand human contexts and provide more personalized advice. Future studies may explore how combining different AI techniques could lead to more reliable and meaningful guidance for users.
LessWrong · 1mo ago
Inside the Mind of an AI: Decoding Emotional Reactions
A new study shows how large language models like Claude process emotions internally. Researchers at Anthropic looked into how these models represent feelings and how that affects their responses. The study focused on Claude Sonnet 4.5 and found that the model uses specific patterns in its internal data to understand and respond to emotional content. This helps explain why the model might react differently to happy, sad, or angry messages. Understanding these patterns could help developers make AI more reliable and easier to control. Future research may explore how these findings can improve AI communication and make models more transparent.
InfoQ AI · 2mo ago
AI Training Glitch Exposes Hidden Risks in Multiple Models
Anthropic discovered that its Claude Mythos Preview model accidentally exposed its reasoning process to oversight signals during about 8% of training episodes. This is the second time such an issue has occurred with their models. This mistake is concerning because it weakens trust in the model's ability to be monitored for harmful intent. The error also affected other models like Opus 4.6 and Sonnet 4.6, which means the problem is broader than initially thought. Fixing these issues is important for ensuring AI systems behave safely as they become more complex. Researchers and developers will be watching how Anthropic addresses this problem and whether similar issues appear in other AI systems.
AI Alignment Forum · 2mo ago
Gemma 4-31B Shines in FoodTruck Challenge, Defying AI Size Expectations
In a surprise upset, the relatively modest Gemma 4-31B model has emerged as a standout performer in the highly competitive FoodTruck Bench challenge. This benchmark tests AI models' ability to plan and execute multi-day tasks, simulating scenarios where an AI needs to manage food truck logistics over extended periods. While many larger models have struggled with the challenge's complexity, Gemma 4-31B not only completed the task but also outperformed several frontier models, including GLM 5, Qwen 3.5 397B, and all Claude Sonnets. What makes this achievement even more notable is that Gemma operates with significantly fewer parameters compared to its competitors. For instance, while models like Claude 3 Sonnets boast massive parameter counts, Gemma's 31 billion parameters place it somewhere in the middle of the pack, yet it consistently delivered better results. This suggests that sheer size isn't the only determinant of AI performance, challenging the conventional wisdom that bigger is always better. The FoodTruck Bench, maintained by the same team behind the widely used LLaMA models, highlights the unique strengths of Gemma 4-31B in handling long-horizon tasks. Unlike some other models that falter under extended planning scenarios, Gemma demonstrated a remarkable ability to adapt and optimize its strategies over time. One Reddit user noted that this might be due to its capacity to "listen to its own advice," meaning it can self-correct and improve decision-making as the task progresses. This outcome has significant implications for developers and researchers. It underscores the importance of optimizing AI architectures for specific use cases rather than relying solely on brute force scaling. As industries like logistics, supply chain management, and autonomous systems increasingly rely on AI for complex planning tasks, models like Gemma could offer a more efficient alternative to traditional approaches.
Looking ahead, the FoodTruck Bench results signal a shift in the AI landscape, one where performance is measured not just by raw computational power but also by how effectively a model can tackle real-world challenges. Developers should keep an eye on benchmarks that test multi-day planning and adaptability, as these will likely become key metrics for evaluating AI systems in the near future. Gemma 4-31B's success in this space is a reminder that innovation often comes from unexpected corners, not just the usual suspects in the AI race.
r/LocalLLaMA · 2mo ago
Claude AI Addresses Usage Limits Chaos With Efficiency Fixes and Transparency
Claude AI has faced significant backlash over recent weeks as users reported unexpected usage spikes during peak hours, leaving many scrambling to stay within their limits. The company acknowledged the issue, explaining that the root cause lies in how its system handles high-traffic periods and large-context window sessions. While no overcharging occurred, the sudden surge in token consumption left users frustrated, particularly those relying on Claude for critical tasks like development or research. In a follow-up update, Claude revealed that its efficiency improvements have already begun to alleviate the problem. The company implemented stricter peak-hour controls and increased session capacity for 1M-context window prompts, which are typically resource-intensive. Additionally, in-product popups now alert users to potential inefficiencies, offering actionable tips like switching to Sonnet 4.6 as the default model on Pro tier; Opus, while more powerful, burns tokens roughly twice as fast. Users are advised to disable extended thinking features when unnecessary and avoid resuming idle sessions longer than an hour. The company also emphasized the importance of proactive measures, such as capping context windows at 200,000 tokens to prevent excessive costs. These changes aim to strike a balance between performance and resource management, ensuring that even heavy users can stay within their limits without sacrificing functionality. While some minor bugs remain, Claude has committed to ongoing updates and encourages users to report any anomalies through its feedback system. This situation highlights the challenges of scaling AI services while maintaining reliability and user trust. For developers and researchers who depend on these tools for productivity, even a slight hiccup can derail progress.
Claude’s transparent response and willingness to address issues head-on may help restore confidence, but the episode underscores the need for more robust systems capable of handling unpredictable demand without compromising performance. Looking ahead, Claude plans to roll out further optimizations, including smarter resource allocation and enhanced efficiency tools. Users should keep an eye on updates to ensure they’re leveraging the latest improvements. For now, staying informed and adjusting settings as needed remain key to maximizing productivity while minimizing costs. This incident serves as a reminder that even the most advanced AI systems are not immune to growing pains, and that transparency and adaptability are critical in rebuilding trust.
r/ClaudeAI · 2mo ago
Reflection AI Partners with SpaceX for AI Chips
Reflection AI will pay $150 million a month for access to Nvidia's latest AI chips. The deal is worth up to $6.3 billion and will last through 2029. The deal matters because it shows the value of open source AI. Reflection AI used this deal to promote its open-weight AI strategy. This strategy is an alternative to closed models like those used by Anthropic and OpenAI. The company will use the AI chips to build open models at scale. Reflection AI will have more computing power to work on its projects. The company will start using the AI chips on July 1, 2026.
TechCrunch · 4h ago
AI Revolution Accelerates with Breakthroughs in Robotics, Memory, and Transparency
1. NVIDIA Unveils AI-Powered Humanoid Robots: NVIDIA has launched groundbreaking humanoid robots that can work alongside humans in factories and homes, using AI for navigation and decision-making. These robots adapt to dynamic environments and interact with people seamlessly.
2. Micron Invests in Anthropic's AI Memory: Micron Technology has partnered with AI startup Anthropic to supply memory chips for its Claude AI system, highlighting the importance of memory in training and running large AI models. This deal also includes Micron's investment in Anthropic's Series H funding round.
3. Sakana AI Launches Fugu for Multi-AI Collaboration: Japanese startup Sakana AI has introduced Fugu, a system that coordinates multiple AI models in real-time, allowing developers to pit different AI systems against each other and reducing reliance on single providers. This approach offers a more flexible and competitive alternative for businesses integrating AI.
4. AI Agents Can Now Pay for Their Own Intelligence: A new system from Ampersend and Amazon Bedrock enables AI agents to pay for services they use, solving the problem of how autonomous agents can pay for data APIs or content without requiring custom payment systems. This innovation allows agents to route tasks to the best models and stay within budget.
5. Samsung Deploys ChatGPT Enterprise to Employees: Samsung Electronics has expanded access to ChatGPT Enterprise and Codex to all employees in South Korea and its Device eXperience division globally, aiming to enhance productivity by streamlining code development and improving internal communication.
6. New Protocol Enhances AI Transparency: Researchers have introduced a novel protocol called AIR that improves the accuracy of AI feature explanations while reducing costs, addressing the limitations of current auto-interpreters from major providers like OpenAI and Neuronpedia.
7. Sakana AI's Fugu Tackles Vendor Lock-In: Sakana AI's Fugu system helps enterprises manage multiple AI models simultaneously, addressing the risk of relying too much on a single vendor and allowing businesses to use various AI models together.
8. AI Researchers Uncover Chatbot Thought Process: AI researchers have discovered how large language models distinguish between their own thoughts and the words of others in a conversation, finding that these models process all inputs as a single continuous string of text, whether it's a user's message or the model's own previous responses.
NeuralPulse Daily · 4h ago