AI Agents Show Strong Cybersecurity Skills in New Test

The Decoder · May 16, 2026 · 1 min brief

In brief

A new test created by researchers at Carnegie Mellon University has shown that AI agents can find and exploit real security vulnerabilities in V8, Google's JavaScript engine, which powers Chrome and other Chromium-based browsers.
Among the tested models, Claude Mythos performed best, but it costs twelve times more than GPT-5.5, which came in second.
This matters because, as cyber threats grow, AI that can spot vulnerabilities becomes increasingly important for keeping systems safe.
However, the high cost of advanced models like Claude Mythos could limit their use to large companies with big security teams.
For now, developers and researchers must weigh the benefits of these tools against their cost.
Looking ahead, expect more focus on making AI cybersecurity tools more affordable and accessible while ensuring that their abilities are not misused.

Terms in this brief

Claude Mythos: A cybersecurity-focused AI agent developed by Anthropic, known for its strong performance in identifying and exploiting security vulnerabilities. Despite its effectiveness, it costs twelve times more than GPT-5.5, the runner-up in the test.
GPT-5.5: A version of OpenAI's GPT model, capable across a range of tasks including cybersecurity. It placed second in performance in the Carnegie Mellon University test while costing far less than Claude Mythos.
