OpenClaw Runs 100 AI Agents for $1.3 Million a Month
In brief
- OpenClaw creator Peter Steinberger views this expense as a research investment aimed at exploring software development without worrying about token costs.
- The project uses AI agents to code, review pull requests, and identify bugs.
- This approach could transform how developers work by automating routine tasks and enhancing efficiency.
- The scale of the operation, with 100 AI instances running simultaneously, is unprecedented in open-source projects, highlighting the potential of AI-driven development tools (see the sketch after this list).
- This experiment sets a new benchmark for AI integration in software development.
- Watch for further insights into how this model impacts productivity and collaboration among developers.
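To make that scale concrete, here is a minimal sketch of what fanning out many concurrent agents can look like. Everything in it (the `run_agent` coroutine, the task names) is hypothetical and stands in for real model calls; it is illustrative only, not OpenClaw's actual code.

```python
import asyncio

async def run_agent(agent_id: int, task: str) -> str:
    """Hypothetical stand-in for one AI agent working a task
    (writing code, reviewing a pull request, hunting a bug)."""
    await asyncio.sleep(0.1)  # placeholder for the real model call
    return f"agent {agent_id}: finished '{task}'"

async def main() -> None:
    task_types = ["write code", "review pull request", "find bug"]
    # Fan out 100 concurrent agents, cycling through the task types.
    results = await asyncio.gather(
        *(run_agent(i, task_types[i % len(task_types)]) for i in range(100))
    )
    for line in results[:3]:  # print a small sample of the results
        print(line)

if __name__ == "__main__":
    asyncio.run(main())
```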
Terms in this brief
- Codex
- An AI model developed by OpenAI that can understand and generate code, used in tools like GitHub Copilot. It helps automate software development tasks such as writing code snippets and identifying bugs.
- OpenClaw
- An open-source project that uses AI agents to perform coding tasks, review pull requests, and detect bugs. It demonstrates the potential of AI in transforming how developers work by automating routine activities and enhancing efficiency.
Read full story at The Decoder →
More briefs
AI Agents Show Strong Cybersecurity Skills in New Test
A new test created by researchers at Carnegie Mellon University shows that AI agents can find and exploit real security weaknesses in Google's V8 JavaScript engine, which powers the Chrome browser. Among the tested models, Claude Mythos performed best, but it costs twelve times more than GPT-5.5, which came in second. This matters because as cyber threats grow, AI that can spot vulnerabilities is crucial for keeping systems safe. However, the high cost of advanced models like Claude Mythos could limit their use to large companies with big security teams. For now, developers and researchers must weigh whether the benefits outweigh the costs of using these tools. Looking ahead, expect more focus on making AI cybersecurity tools affordable and accessible while ensuring their abilities are not misused.
AI Model Achieves Near-Full Performance Using Just 12.5% of Its Experts
Researchers have developed a new type of AI model called EMO, which significantly reduces the number of experts needed while maintaining high performance. Unlike traditional models, which assign experts based on word types, EMO uses domain-specific experts, allowing it to cut 75% of the experts with only about a one-percentage-point loss in accuracy. This breakthrough could make such models practical for devices with limited memory. The development matters because it addresses a key challenge in AI: efficiency. With fewer active experts, the model becomes lighter and faster, making it easier to deploy on less powerful hardware. The researchers showed that EMO can achieve near-full performance with just 12.5% of its experts, a major step forward for modular AI. This innovation opens the door to more efficient AI applications in areas like edge computing and mobile devices. As research continues, we can expect further improvements in how AI models are structured and optimized, potentially leading to even more resource-efficient systems.
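As a rough illustration of the pruning idea (not the paper's actual EMO implementation), the toy mixture-of-experts layer below keeps only a retained subset of experts and restricts the router to them; keeping 1 of 8 experts mirrors the 12.5% retention figure.

```python
import torch
import torch.nn as nn

class PrunedMoE(nn.Module):
    """Toy mixture-of-experts layer that keeps only a subset of experts.

    Hypothetical illustration of expert pruning; keeping 1 of 8 experts
    corresponds to the 12.5% retention figure in the brief.
    """
    def __init__(self, dim: int, n_experts: int = 8, keep: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.router = nn.Linear(dim, n_experts)
        # Indices of retained experts; EMO would pick these per domain.
        self.keep_idx = list(range(keep))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)[..., self.keep_idx]   # route only among kept experts
        weights = torch.softmax(logits, dim=-1)       # (batch, keep)
        outs = torch.stack([self.experts[i](x) for i in self.keep_idx], dim=-1)
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)

layer = PrunedMoE(dim=16)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```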
AI Struggles to Match Physicists at Replicating Collider Experiments
AI systems are increasingly tested on complex scientific tasks, but a new benchmark called Collider-Bench reveals they still fall short of human expertise. Designed to evaluate whether language-model agents can reproduce experimental analyses from the Large Hadron Collider (LHC) using only public papers and open software, the benchmark highlights significant challenges. Unlike the internal tools available to LHC researchers, publicly available resources lack precision, forcing AI agents to rely on physical reasoning, trial and error, and domain knowledge to fill gaps in information. The results show that no AI agent reliably outperforms a physicist-in-the-loop approach. Each task requires translating a published analysis into an executable pipeline, predicting collision event yields, and staying within strict computational cost budgets. While the AI systems demonstrated some capability, they often failed qualitative checks, for example by fabricating or duplicating results. This suggests that while AI can assist in scientific workflows, human expertise remains crucial for accuracy and reliability. Looking ahead, researchers will likely refine these benchmarks to better align with real-world scientific challenges. The findings underscore the need for hybrid approaches in which AI supports but does not replace human scientists. As AI tools evolve, their integration into high-energy physics could enhance discovery processes, but collaboration with experts will remain essential.
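For context on one of those steps, a predicted event yield typically follows the standard relation N = L × σ × ε, where L is the integrated luminosity, σ is the process cross-section, and ε folds in detector acceptance and selection efficiency. A short worked example with purely illustrative numbers, not taken from any specific LHC analysis:

```python
# Expected event yield: N = L * sigma * efficiency.
# All numbers below are illustrative, not from any specific LHC analysis.
luminosity_fb = 139.0     # integrated luminosity in inverse femtobarns
cross_section_fb = 50.0   # process cross-section in femtobarns
efficiency = 0.25         # detector acceptance x selection efficiency

expected_events = luminosity_fb * cross_section_fb * efficiency
print(f"expected events: {expected_events:.1f}")  # -> expected events: 1737.5
```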
AI Breakthrough in Decoding EEG Signals for Better Clinical Trust
Researchers have unveiled a new method that makes neural networks more transparent when processing EEG data, a critical step toward building systems that doctors can trust. By applying sparse autoencoders to three different models (SleepFM, REVE, and LaBraM), they extracted features tied to specific clinical factors like age and medication. This approach not only reveals how the AI processes information but also exposes hidden biases, such as when the model conflates a patient's age with their medical condition. The findings highlight weaknesses in these systems, showing that certain manipulations can disrupt overall performance or make the models focus on irrelevant details. This transparency is essential for ensuring AI reliability in healthcare decisions. The researchers also developed tools to translate the hidden features into understandable EEG patterns, making it easier to spot when something goes wrong. As this technology advances, we may see more trustworthy AI systems that provide clearer insights into patient data.
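For readers unfamiliar with the technique, a sparse autoencoder reconstructs a model's internal activations through an overcomplete hidden layer penalized toward sparsity, so individual hidden units tend to align with interpretable factors. Below is a minimal PyTorch sketch of the general idea, not the study's exact setup; the dimensions and the L1 weight are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over model activations.

    Generic illustration of the technique; not the exact architecture
    used in the EEG study.
    """
    def __init__(self, act_dim: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(act_dim, n_features)  # overcomplete expansion
        self.decoder = nn.Linear(n_features, act_dim)  # reconstruction

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature codes
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder(act_dim=256, n_features=2048)
acts = torch.randn(32, 256)  # stand-in for activations from an EEG model
recon, feats = sae(acts)
l1_weight = 1e-3
loss = ((recon - acts) ** 2).mean() + l1_weight * feats.abs().mean()
loss.backward()  # reconstruction error plus L1 sparsity penalty
```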
AI Language Models Fail Vulnerable Users More Often
A new study reveals that advanced AI language models, such as ChatGPT and similar tools, are more likely to give incorrect or misleading answers when interacting with users who have lower English proficiency, less education, or come from outside the U.S. The research tested three top models on two datasets focused on truthfulness and accuracy. Results showed these models struggle most when helping vulnerable groups, making them unreliable sources of information for those who need them most. This raises serious concerns about fairness and trust in AI systems. Developers must fix these issues to ensure reliable access for all users.