Robots Learn Common Sense From Regular Videos
In brief
- Robots are getting a major upgrade in understanding the world around them.
- A new class of AI models, called World Action Models, can learn from everyday videos (such as people cooking or cleaning) that don't involve robots at all.
- This is a big deal because earlier robotics AI relied on labeled data with specific actions, which was limiting.
- Now, these models can imagine how objects move and interact in the real world, for example by predicting whether pouring water will cause a spill.
- This breakthrough means robots can make smarter decisions by simulating outcomes before acting.
- For example, a robot could decide to grasp an object from the safest spot without needing explicit instructions.
- This kind of common-sense reasoning is crucial for robots to handle unpredictable tasks in homes and offices.
- Developers are excited about the potential for more versatile and safe AI systems.
- Look out for robots that can anticipate consequences in real-life scenarios, making them far more capable than ever before.
Terms in this brief
- World Action Models
- A type of AI model that learns to understand and predict how objects interact in the real world by watching everyday videos. This allows robots to make smarter decisions, like figuring out if pouring water will spill or choosing the safest way to pick up an object.
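The idea of simulating outcomes before acting can be illustrated with a minimal sketch. Everything here is hypothetical: the `Grasp` class, the `offset_from_center_cm` feature, and the hand-written `predicted_slip_risk` function are stand-ins for what a learned world model would predict from video, not the method described in the source stories.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Grasp:
    """A candidate way to pick up an object (hypothetical representation)."""
    name: str
    offset_from_center_cm: float  # assumed feature: distance from the object's center of mass


def predicted_slip_risk(grasp: Grasp) -> float:
    """Toy stand-in for a learned world model.

    Assumption: grasps farther from the center of mass are riskier.
    Returns a risk score clamped to the range [0, 1].
    """
    return min(1.0, abs(grasp.offset_from_center_cm) / 10.0)


def choose_safest(candidates: Sequence[Grasp],
                  risk_model: Callable[[Grasp], float]) -> Grasp:
    """Score every candidate with the model and return the lowest-risk one."""
    if not candidates:
        raise ValueError("need at least one candidate grasp")
    return min(candidates, key=risk_model)


candidates = [
    Grasp("handle_tip", 8.0),
    Grasp("body_center", 1.0),
    Grasp("rim", 5.0),
]
best = choose_safest(candidates, predicted_slip_risk)
print(best.name)  # body_center
```

The point of the sketch is the loop structure: propose actions, predict each outcome, then act on the best prediction. A real system would replace the toy risk function with a model trained on video.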
Read full story at The Decoder →, CSET Georgetown →
More briefs
AI Model Showdown: November 2025 Inflection Point
In November 2025, the landscape of large language models (LLMs) underwent a dramatic shift. The top-model crown changed hands five times among releases from major providers, including Claude Sonnet, GPT-5.1, and Gemini 3. A distinctive test, drawing a pelican riding a bicycle, helped highlight differences between these models. While most agreed that Anthropic's Claude Opus 4.5 was the best for general tasks, November also marked a breakthrough in coding agents. OpenAI and Anthropic had been refining their models to write better code through reinforcement learning. This effort paid off when coding agents reached a quality threshold where they could be used reliably for real work. The month also saw the first commit to an obscure repository called "Warelay," which later gained traction. From December to January, developers explored new model capabilities and even built ambitious projects like micro-javascript, a JavaScript interpreter in Python using Pyodide and WebAssembly. These developments hint at a future where AI tools become more integrated into everyday workflows, pushing the boundaries of what's possible with LLMs.
Google Launches AI-Powered Design App
Google announced a new AI-powered design and image-generation app called Pics for Google Workspace. The app lets users generate images from simple text prompts, with no editing skills required. This matters because it can help small businesses and individuals create visual content easily; over 10 million people use design apps like Canva. Google will roll out Pics to subscribers this summer. Users can edit generated images directly, and every element is adjustable. Google says it will continue to update Pics to make image editing easier.
AI Education Demand Surges
MIT Sloan Executive Education saw over 20,000 leaders attend AI courses last year. These leaders want to learn about AI basics and how to adopt the technology. Demand for AI education has shifted from basic understanding toward implementing and managing the technology. Leaders also want to understand the implications of AI for their workforce. AI education will continue to evolve as more companies adopt the technology.
AI Agents Gain New Capabilities in Self-Learning and Problem-Solving
AI agents like Claude Code, Codex, and LangChain Deep Agents have shown remarkable skill in managing tasks, chaining tools, executing code, and responding to complex queries. These advancements allow them to work more efficiently with minimal human intervention, making them valuable for developers and researchers. The integration of these AI systems into software architecture and big data schemas is transforming how applications are built and maintained. By drawing on a skills repository, these agents can adapt and learn from their experiences, improving over time without constant supervision. This development could significantly reduce the time spent on repetitive tasks, allowing humans to focus on more creative and strategic work. Looking ahead, the ability of AI agents to train new sub-agents themselves opens up possibilities for even greater automation and innovation in various industries. As these technologies evolve, we can expect further improvements in how AI interacts with both data and users, making it a powerful tool for problem-solving across sectors.
Google's AI Costs Skyrocket as New Models Emerge
Google has unveiled its latest AI advancements, including Gemini 3.5 Flash, a model that outperforms its predecessor but comes at a much higher cost. Running Gemini 3.5 Flash is reported to be 5.5 times more expensive than earlier versions, and for agent tasks, its costs exceed even those of the pricier Gemini 3.1 Pro by 75%. This trend isn't isolated: AI expenses are rising across the board as companies invest heavily to stay competitive. At Google's I/O developer conference, the company also introduced Gemini Omni, a multimodal model, and Gemini Spark, a personal cloud agent that runs continuously. These new offerings highlight the growing complexity and resource demands of AI development. While they promise enhanced capabilities, the steep costs may challenge developers and businesses looking to adopt them. As the industry evolves, keep an eye on how these cost increases impact innovation and accessibility in AI.
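To make the reported ratios concrete, here is a small arithmetic sketch. The baseline dollar figures are hypothetical placeholders, not actual Gemini pricing, and reading "5.5 times more expensive" as a plain multiplier of 5.5 is an assumption about the brief's wording.

```python
def scaled_cost(baseline: float, multiplier: float) -> float:
    """Cost after applying a 'times as expensive' multiplier."""
    return baseline * multiplier


def cost_with_premium(reference: float, premium_pct: float) -> float:
    """Cost that exceeds a reference cost by the given percentage."""
    return reference * (1 + premium_pct / 100)


# Hypothetical baselines, chosen only for illustration (not real prices).
earlier_flash_cost = 2.0   # cost of some workload on an earlier Flash version
pro_agent_cost = 4.0       # cost of some agent task on Gemini 3.1 Pro

new_flash_cost = scaled_cost(earlier_flash_cost, 5.5)     # 11.0
flash_agent_cost = cost_with_premium(pro_agent_cost, 75)  # 7.0

print(new_flash_cost, flash_agent_cost)
```

Under these assumed baselines, a workload that cost 2.0 units would cost 11.0, and an agent task that cost 4.0 units on the Pro model would cost 7.0 on the new Flash model.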