latentbrief
General · 2w ago

AI Attacks Get Smarter: New Method Evades Detection

LessWrong

In brief

  • Researchers have analyzed a new way AI systems can be attacked that could make them harder to defend.
    • The method, called attack selection, lets attackers choose when and where to strike rather than attacking at random.
    • Choosing the moment makes attacks both more effective and harder to catch.
  • Attackers using this method can sharply lower the safety of AI systems.
  • For example, a highly selective attacker can drive system safety down to about 0.2 even when defenders audit 5% of actions.
    • In other words, a small number of well-chosen attacks can have an outsized impact.
  • How attackers decide when to abandon an attack also matters, though it has less effect on stronger attacks.
  • As AI systems grow more complex, understanding how to defend against these selective attacks will be crucial.
  • Researchers will need better ways to detect and block such targeted attacks.
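The intuition behind attack selection can be sketched with a toy Monte Carlo simulation. Everything here is an illustrative assumption rather than the setup from the LessWrong post: suspicion scores are drawn uniformly, the defender audits (and catches) anything above a 0.95 threshold as a stand-in for a 5% audit budget, and the selective attacker previews a noisy estimate of each opportunity's score before striking.

```python
import random

def episode_safe(rng, selective, audit_threshold=0.95,
                 n_opportunities=50, noise=0.2):
    """One episode of a toy attack-selection game (illustrative model only).

    Each opportunity's attack would draw a true suspicion score in [0, 1].
    The defender audits, and thereby catches, any attack whose true score
    exceeds `audit_threshold` -- a stand-in for a 5% audit budget.
    Returns True if the episode ends safely (the attack is caught).
    """
    scores = [rng.random() for _ in range(n_opportunities)]
    if selective:
        # The attacker previews a noisy estimate of each score and
        # strikes at the opportunity that looks least suspicious.
        previews = [s + rng.gauss(0, noise) for s in scores]
        strike = min(range(n_opportunities), key=lambda i: previews[i])
    else:
        # A non-selective attacker strikes at a random moment.
        strike = rng.randrange(n_opportunities)
    return scores[strike] > audit_threshold  # caught -> safe

def safety(selective, n_trials=20000, seed=1):
    """Estimate safety as the fraction of episodes where the attack is caught."""
    rng = random.Random(seed)
    return sum(episode_safe(rng, selective) for _ in range(n_trials)) / n_trials

print("random attacker  safety:", safety(selective=False))
print("selective attacker safety:", safety(selective=True))
```

In this toy model the random attacker is caught about 5% of the time (matching the audit budget), while the selective attacker, by picking the least-suspicious-looking moment, is almost never caught, so safety collapses toward zero. The specific numbers are artifacts of the arbitrary parameters; only the qualitative gap between the two attackers is the point.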

Terms in this brief

attack selection
A method where attackers strategically choose when and where to attack AI systems, making their attacks more effective and harder to detect. This approach allows attackers to focus their efforts for maximum impact, potentially reducing the safety of AI systems significantly with limited resources.

Read full story at LessWrong
