BotBeat

RESEARCH | Anthropic | 2026-05-25

Intent Verification Gap Exposed in AI Agent Frameworks

Key Takeaways

  • In-prompt agent confirmations are insecure: the approval request can be poisoned through the same channel that executes the action
  • The banking industry's out-of-band verification standards (PSD2, FIDO, OAuth RAR) provide proven patterns for agent intent verification
  • User approval data shows agents being granted more autonomy over time (full auto-approval rises from 20% to 40% after sustained use), suggesting users increasingly rely on safeguards that the analysis argues are not fit for purpose
Source: Hacker News, https://hyperautomation.substack.com/p/out-of-band-not-out-of-prompt-intent

Summary

A new technical analysis argues that current AI agent frameworks, including Anthropic's Claude Code, OpenAI's Agents SDK, and Pi, share a critical security flaw in how they implement user approval for high-impact tool calls. In-prompt "are you sure?" confirmations are structurally broken because the approval request travels through the same chat channel the agent uses to act. An attacker who can inject false context into that channel can also shape what the user is asked to approve, leading the agent to execute unintended actions.

The analysis walks through a concrete scenario: a developer's agent, while handling a legitimate rollback request, encounters retrieved chat context about an additional change and treats it as an extension of the original request, causing a finance reporting outage across 412 customers. The author points out that the banking and payments industry solved this exact problem nearly a decade ago through PSD2's "dynamic linking" requirement, which cryptographically binds each approval to the specific transaction and authenticates it out of band. These proven patterns now ship in production standards such as FIDO Secure Payment Confirmation and OAuth Rich Authorization Requests.
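The dynamic-linking idea can be sketched in a few lines: an approval is computed over the exact action the user reviewed, so it stops verifying if the action changes afterwards. This is a minimal illustration and not the source's implementation. The function names, the action layout, and the shared demo key are all hypothetical, and a real deployment would more likely use an asymmetric signature from a separate device (for example via WebAuthn) than a shared-secret HMAC.

```python
import hashlib
import hmac
import json


def canonical_digest(action: dict) -> bytes:
    """Hash a canonical JSON encoding so identical actions always hash identically."""
    encoded = json.dumps(action, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).digest()


def issue_approval(device_key: bytes, action: dict) -> str:
    """Hypothetical out-of-band approver: runs after the user reviews `action`."""
    return hmac.new(device_key, canonical_digest(action), hashlib.sha256).hexdigest()


def verify_approval(device_key: bytes, action: dict, approval: str) -> bool:
    """Hypothetical executor-side check: the approval must match this exact action."""
    expected = hmac.new(device_key, canonical_digest(action), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, approval)


# Demo values only; the key, tool name, and arguments are made up for illustration.
key = b"demo-only-shared-secret"
approved = {"tool": "deploy.rollback", "args": {"service": "billing", "to_version": "v41"}}
token = issue_approval(key, approved)

# The agent later widens the action based on injected context.
tampered = {
    "tool": "deploy.rollback",
    "args": {"service": "billing", "to_version": "v41", "also_apply": "schema-change"},
}

print(verify_approval(key, approved, token))  # True
print(verify_approval(key, tampered, token))  # False
```

Because the approval covers a digest of the whole action, the extra change picked up from retrieved context in the scenario above would fail verification instead of riding along on the original approval.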

The analysis highlights that every technical primitive needed exists today but hasn't been wired into agent frameworks. Current implementations, including Claude Code's permission prompts, remain vulnerable to an attacker-in-the-middle scenario in which the user cannot distinguish their original intent from injected instructions.

  • Production-ready cryptographic primitives exist (WebAuthn, CIBA, RFC 8693 token exchange) but remain unwired in AI agent stacks
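As a rough illustration of how one of these primitives could describe an agent action, the sketch below expresses a tool call as an OAuth Rich Authorization Requests `authorization_details` value. RFC 9396 defines that parameter as a JSON array of objects with a required `type` field, and lists `actions` and `identifier` among its common fields. The `agent_tool_call` type, the tool-call layout, and both helper functions are assumptions made for this example, and no authorization server is contacted.

```python
import json


def to_authorization_details(tool_call: dict) -> list[dict]:
    """Describe one tool call as an RFC 9396 style authorization_details array."""
    return [
        {
            "type": "agent_tool_call",  # hypothetical type name, not a registered value
            "actions": [tool_call["tool"]],
            "identifier": tool_call["target"],
        }
    ]


def is_covered(granted: list[dict], tool_call: dict) -> bool:
    """Check that a tool call matches something the user explicitly granted."""
    return any(
        entry.get("type") == "agent_tool_call"
        and tool_call["tool"] in entry.get("actions", [])
        and entry.get("identifier") == tool_call["target"]
        for entry in granted
    )


# Demo values only; tool and target names are made up for illustration.
requested = {"tool": "rollback", "target": "billing-service"}
granted = to_authorization_details(requested)
print(json.dumps(granted, indent=2))

# A second action the user never reviewed is not covered by the grant.
extra = {"tool": "apply_migration", "target": "finance-reporting"}
print(is_covered(granted, requested))  # True
print(is_covered(granted, extra))      # False
```

The design choice here is that the grant names the specific action and target rather than a broad scope, so anything the agent adds later needs its own approval.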

Editorial Opinion

This analysis exposes a concerning security blind spot in the AI agent ecosystem. Anthropic's own research data shows users extending more trust to agent autonomy, while the analysis presents evidence that the security mechanisms behind that trust are not fit for purpose. Together, these suggest the industry is moving faster than its safety controls. Anthropic and other vendors have the technical building blocks but need to implement out-of-band intent verification before agents reach higher levels of autonomy in production systems.

Tags: AI Agents, Cybersecurity, AI Safety & Alignment, Privacy & Data
