New AI Tools Help Catch Image-Related Mistakes in Software
In brief
- Tech companies are rolling out new tools to check if AI systems accurately describe images.
- Previously, checking image descriptions relied on slow human reviews or on text-only checks that often missed errors.
- Now, software like Strands Evals uses advanced AI judges to score how well an image description matches the actual image.
- These judges can spot hallucinations, factual errors, and instruction violations in seconds.
- This matters because as more enterprise software becomes multimodal (combining text, images, and other data), accurate visual understanding is critical.
- Gartner predicts that by 2030 more than 80% of enterprise software will be multimodal, up from less than 10% today.
- Without reliable image checks, businesses risk deploying systems that misrepresent visuals, leading to errors in tasks like document analysis or visual shopping.
- The new evaluators integrate seamlessly into existing workflows, letting developers catch issues early without heavy human involvement.
- Users can choose different AI judges based on their needs for accuracy, cost, and speed.
- As these tools improve, expect more robust checks for image-based AI systems, ensuring they stay grounded in reality.
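To make the idea of a swappable AI judge concrete, here is a minimal Python sketch of the general pattern described above. It is not the Strands Evals API: the names `Verdict`, `Judge`, `evaluate`, and `stub_judge`, the 0.8 threshold, and the three issue labels are all assumptions made for illustration, and the stub stands in for a real multimodal model call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical issue labels, taken from the error types named in the brief.
CATEGORIES = ("hallucination", "factual_error", "instruction_violation")


@dataclass
class Verdict:
    score: float       # 0.0 (no match) to 1.0 (description fully grounded in the image)
    issues: List[str]  # subset of CATEGORIES


# A judge is any callable taking (image bytes, description, instruction).
# Swapping judges is how a caller would trade accuracy against cost and speed.
Judge = Callable[[bytes, str, str], Verdict]


def evaluate(image: bytes, description: str, instruction: str,
             judge: Judge, threshold: float = 0.8) -> bool:
    """Return True when the judge finds the description acceptable."""
    verdict = judge(image, description, instruction)
    unknown = [issue for issue in verdict.issues if issue not in CATEGORIES]
    if unknown:
        raise ValueError(f"unknown issue labels: {unknown}")
    return verdict.score >= threshold and not verdict.issues


def stub_judge(image: bytes, description: str, instruction: str) -> Verdict:
    """Stand-in for a model call: flags one fixed word as a hallucination."""
    if "unicorn" in description.lower():
        return Verdict(score=0.2, issues=["hallucination"])
    return Verdict(score=0.95, issues=[])


if __name__ == "__main__":
    print(evaluate(b"", "A dog on a beach", "Describe the image", stub_judge))
    print(evaluate(b"", "A unicorn on a beach", "Describe the image", stub_judge))
```

In a real setup the stub would be replaced by a call to whichever multimodal model the team has chosen, while the pass/fail logic around it stays the same.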
Terms in this brief
- Strands Evals
- A tool that uses advanced AI to check if image descriptions in software accurately match the actual images. It helps catch errors like hallucinations or factual mistakes quickly, ensuring better reliability in multimodal systems.
Read full story at AWS ML Blog →
More briefs
Gemini CLI to Stop Serving Requests
Gemini CLI will stop serving requests on June 18, 2026. This change affects Google AI Pro and Ultra users, as well as free users of Gemini Code Assist. The change is part of a shift to Antigravity CLI, a new platform that allows multiple agents to communicate and work together. Over 100,000 users have given feedback on Gemini CLI, with 6,000 pull requests and hundreds of contributors. Antigravity CLI will have many of the same features as Gemini CLI, including Agent Skills and Extensions. Users can start using Antigravity CLI now and provide feedback in the community forum, with video walkthroughs and technical documentation available to help with the transition.
UAE Hospitals Get AI in Operating Rooms
Abu Dhabi is rolling out new artificial intelligence tools across 100 operating rooms. The goal is to improve surgical outcomes and use predictive technologies to shape policy. The Department of Health Abu Dhabi is connecting operating rooms using a surgical technology platform. This platform will capture video and audio from the operating rooms, alongside patient data. The system will help surgeons analyze their procedures and reduce the risk of accidental injury. Next, surgeons will get real-time guidance in the operating room.
AI to Help Humans Make Nobel Prize-Winning Discovery
A prototype humanoid robot is being developed in London, and bipedal robots are expected to be helping tradespeople within two years. The co-founder of Anthropic says AI will help make a Nobel prize-winning discovery within a year. He also says companies run solely by AIs will generate millions of dollars in revenue within 18 months, and that AI systems will be able to design their own successors by the end of 2028. Humans will soon have to deal with the implications of this powerful technology, with a major development expected soon.
SAP Simplifies Software Migration Using AI
SAP has started using Mistral AI models to make it easier for customers to move their legacy software systems to S/4HANA, a newer and more efficient platform. This shift can be complex and time-consuming, but AI is now helping streamline the process by automatically identifying and fixing issues in old software code. This development matters because migrating legacy systems is a major challenge for businesses. By using AI, SAP aims to reduce costs and speed up transitions while minimizing errors. For example, Mistral AI can analyze vast amounts of code quickly, spotting potential problems that might otherwise go unnoticed. Looking ahead, this integration could set a new standard for how other companies handle software migration. It will be interesting to see how AI continues to enhance these kinds of processes and whether similar tools become widely adopted in the industry.
Google Search Now an AI-Powered Assistant
Google has transformed its search engine into an intelligent assistant, marking a major shift in how users interact with technology. At the 2026 I/O conference, the company introduced AI agents integrated across its services, including search, coding platforms, and a new standalone app. These agents are designed to assist users by understanding context and providing detailed responses. The move signals a broader push towards more interactive and personalized digital experiences. Developers can now leverage these AI tools to enhance their products, while researchers gain access to powerful new resources for studying conversational AI. This integration of AI across multiple platforms could redefine how we approach problem-solving and information retrieval in the coming years. As Google continues to evolve its AI capabilities, users can expect even more seamless interactions with technology. The future may hold further advancements in how AI agents assist both individuals and businesses, making this a significant milestone in the tech industry.