latentbrief

Editorial · Research

AI Coding Agents: Fast and Functional - But Not Secure

1w ago

The rise of AI coding agents has been a game-changer for software development, promising to accelerate the speed and scale at which code is written. These tools can generate functional code with remarkable accuracy, sometimes matching or outperforming experienced developers on well-scoped tasks. However, as Endor Labs' recent benchmark reveals, the real challenge lies not in whether these agents can write code that works, but in whether that code is secure. The findings are stark: while the top-performing agent achieved 84.4% functional correctness, it scored only 17.3% on security tests. That gap between functionality and safety raises critical questions about the readiness of AI coding agents for real-world use.

The benchmark, built on the Carnegie Mellon SusVibes framework, tested 200 real-world tasks from 108 open-source projects, covering 77 Common Weakness Enumeration (CWE) vulnerability classes. The results were alarming: even the best-performing agent produced secure code just 7.8% of the time. Worse still, 87% of all AI-generated code contained at least one security vulnerability. This is not a minor oversight but a systemic issue that threatens to undermine the trust and adoption of these tools in critical industries.
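To make the stakes concrete, here is a minimal sketch of one weakness class the benchmark's CWE coverage includes: CWE-89, SQL injection. The function and table names are hypothetical illustrations, not taken from the benchmark itself; the point is the difference between the string-interpolation pattern agents often emit and the parameterized form a security-aware developer would write.

```python
# Illustrative sketch of CWE-89 (SQL injection).
# All names here are hypothetical, for demonstration only.
import sqlite3

def find_user_unsafe(conn, username):
    # Vulnerable pattern: user input is interpolated directly into
    # the SQL string, so input like "x' OR '1'='1" matches every row.
    cur = conn.execute(
        f"SELECT id, name FROM users WHERE name = '{username}'"
    )
    return cur.fetchall()

def find_user_safe(conn, username):
    # Safe pattern: a parameterized query treats the input as data,
    # never as SQL syntax.
    cur = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # leaks every row: 2
print(len(find_user_safe(conn, payload)))    # matches nothing: 0
```

Both functions pass a naive "does it return users?" functional test, which is exactly the gap the benchmark measures: correctness checks alone never exercise the malicious input.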

The problem extends beyond mere functionality; it touches on ethical considerations and accountability. As Varun Badhwar, CEO of Endor Labs, notes, "AI coding agents are dramatically increasing the speed and scale at which software gets written, but security isn't keeping pace." This sentiment is echoed by Luca Compagna, Senior Security Researcher at Endor Labs, who highlights that unlike human developers, agents lack the contextual awareness and security best practices that uphold our systems' integrity. The findings underscore a pressing need for greater transparency and accountability in AI coding tools.

Looking ahead, these challenges must be addressed before AI coding agents can be safely integrated into modern development workflows. The technology holds immense potential, but its adoption must be paired with robust security measures and continuous evaluation to mitigate risk. The industry must prioritize not just speed and efficiency but also the safety and reliability of the code these agents produce. Only then can the benefits of AI in software development be fully harnessed without compromising the integrity of our systems.

Editorial perspective — synthesised analysis, not factual reporting.

Terms in this editorial

Endor Labs
A company that conducts research and development in AI, particularly focusing on the security aspects of AI-generated code. Their work includes benchmarking tests to evaluate the security of AI coding agents.
SusVibes framework
A testing framework developed by Carnegie Mellon University used to assess the security vulnerabilities in software. It helps identify weaknesses in AI-generated code, ensuring it meets robust security standards.
