AI Agent Skills Pass Every Scanner, Yet 87% Still Degrade Agent Safety
Key Takeaways
- Existing AI agent safety scanners have a critical blind spot: 87% of skills that pass safety checks still degrade agent safety in practice
- Current safety evaluation methodologies fail to capture emergent safety risks that only manifest during real-world deployment
- The disconnect between what safety scanning tools detect and actual safety outcomes suggests the need for fundamentally different evaluation approaches
Summary
A new study reveals a critical paradox in AI agent development: while AI agent skills successfully pass every available safety scanner and detection mechanism, 87% of those same skills still degrade overall agent safety when deployed. This finding highlights a significant gap between current safety evaluation methodologies and real-world safety outcomes, suggesting that existing scanning and detection tools may be fundamentally inadequate for identifying harmful behaviors in agent capabilities.
The research indicates that conventional safety scanning approaches focus on narrow, easily measured criteria that fail to capture the complex, emergent safety risks that manifest during actual deployment. The 87% figure points to a systemic problem in how the AI industry validates agent safety: current scanners appear to suffer from false negatives at an alarming rate, giving developers false confidence that their agent systems are safe when they may not be.
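The 87% figure is effectively a false-negative rate among scanner-approved skills. A minimal sketch (using hypothetical counts, not data from the study) makes the arithmetic explicit:

```python
# Illustrative sketch with hypothetical numbers: how a scanner "pass" can
# mask a high false-negative rate when deployment audits tell a different story.

def false_negative_rate(passed_scanner: int, unsafe_in_deployment: int) -> float:
    """Fraction of scanner-approved skills later found to degrade safety."""
    if passed_scanner <= 0:
        raise ValueError("no scanner-approved skills to evaluate")
    return unsafe_in_deployment / passed_scanner

# Hypothetical batch: 100 skills pass the scanner; 87 degrade safety in deployment.
rate = false_negative_rate(passed_scanner=100, unsafe_in_deployment=87)
print(f"false-negative rate among approved skills: {rate:.0%}")  # 87%
```

The point of the sketch: the denominator is only the skills the scanner approved, so a 100% pass rate and an 87% failure rate can coexist without contradiction.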
Editorial Opinion
This finding should serve as a wake-up call to the AI safety community. The fact that the vast majority of skills deemed 'safe' by our current tooling still degrade safety suggests we're operating under a false sense of security. Simply passing automated scanners cannot be the standard for agent safety; the industry urgently needs more rigorous, deployment-aware evaluation methods that capture real-world safety dynamics rather than theoretical ones.