
New AI Tools Help Catch Image-Related Mistakes in Software

AWS ML Blog · May 20, 2026 · 1 min brief

In brief

Tech companies are rolling out new tools to check if AI systems accurately describe images.
Previously, checking image descriptions relied on slow human reviews or text-only checks that often miss errors.
Now, software like Strands Evals uses advanced AI judges to score how well an image description matches the actual image.
- These judges can spot hallucinations, factual errors, and instruction violations in seconds.
- This matters because as more enterprise software becomes multimodal (combining text, images, and other data), accurate visual understanding is critical.
Gartner predicts that by 2030, more than 80% of enterprise software will be multimodal, up from less than 10% today.
Without reliable image checks, businesses risk deploying systems that misrepresent visuals, leading to errors in tasks like document analysis or visual shopping.
The new evaluators integrate seamlessly into existing workflows, letting developers catch issues early without heavy human involvement.
Users can choose different AI judges based on their needs for accuracy, cost, and speed.
As these tools improve, expect more robust checks for image-based AI systems, ensuring they stay grounded in reality.
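
For readers who want a concrete picture of the judge pattern described above, here is a minimal Python sketch. It is not the Strands Evals API: the names JudgeVerdict, stub_judge, and evaluate are hypothetical, and the stub judge only applies simple text rules as a stand-in for a real multimodal model call that would inspect the image itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class JudgeVerdict:
    score: float       # 0.0 (fails the check) to 1.0 (passes)
    issues: List[str]  # human-readable reasons for a low score


# Hypothetical interface: a judge takes (image bytes, description, instruction).
Judge = Callable[[bytes, str, str], JudgeVerdict]


def stub_judge(image: bytes, description: str, instruction: str) -> JudgeVerdict:
    """Placeholder judge (assumption). A real judge would send the image and
    description to a multimodal model; this one never looks at the image."""
    issues: List[str] = []
    if not description.strip():
        issues.append("empty description")
    if "one sentence" in instruction.lower() and description.count(".") > 1:
        issues.append("instruction violation: more than one sentence")
    return JudgeVerdict(score=0.0 if issues else 1.0, issues=issues)


def evaluate(
    cases: List[Dict], judge: Judge, threshold: float = 0.5
) -> List[Tuple[str, List[str]]]:
    """Run every case through the judge and return the ones below threshold."""
    failures = []
    for case in cases:
        verdict = judge(case["image"], case["description"], case["instruction"])
        if verdict.score < threshold:
            failures.append((case["id"], verdict.issues))
    return failures


cases = [
    {"id": "a", "image": b"", "description": "A red bicycle.",
     "instruction": "Describe the image in one sentence."},
    {"id": "b", "image": b"", "description": "   ",
     "instruction": "Describe the image in one sentence."},
]
failures = evaluate(cases, stub_judge)
```

Because the judge is just a callable, swapping in a cheaper or more accurate one changes a single argument to evaluate, which is the accuracy, cost, and speed trade-off the brief mentions.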

Terms in this brief

Strands Evals: A tool that uses advanced AI to check if image descriptions in software accurately match the actual images. It helps catch errors like hallucinations or factual mistakes quickly, ensuring better reliability in multimodal systems.

Read full story at AWS ML Blog →
