The Rise of Inference Theft: How Attackers Are Stealing Millions in AI API Calls

Key Takeaways

▸Inference theft is a high-margin attack with powerful financial incentives: frontier model calls cost ~$2 each, and attackers can resell stolen inference at deep discounts with zero marginal cost
▸Traditional security defenses fail against this threat model; attackers use residential proxies and throwaway accounts at scale, so the cost of passing each per-session check is amortized across thousands of stolen requests
▸Attackers wrap victim APIs with OpenAI/Anthropic-compatible adapters to enable seamless resale through standard clients, repositioning the session boundary post-authentication

Source:

Hacker News: https://vercel.com/blog/protecting-against-token-theft

Summary

Vercel's security research describes attackers systematically stealing paid AI inference at scale and reselling it for profit. Inference theft, the unauthorized use of paid AI API calls for free consumption or downstream resale, is one of the highest-margin attacks available to threat actors today. A single prompt to a frontier model can cost $2, making AI inference thousands of times more expensive than traditional API calls. Because attackers pay nothing for the inference itself, they can resell it even at a 5-10% discount and still keep healthy margins.
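To make the margin claim concrete, here is a rough arithmetic sketch. The $2 provider price is the figure quoted above; the discount levels and the `overhead_per_call` parameter (standing in for proxy and account costs) are illustrative assumptions, not numbers from the Vercel research.

```python
def resale_margin(provider_price: float, discount: float,
                  overhead_per_call: float = 0.0) -> float:
    """Profit per resold call when undercutting the provider by `discount`.

    The attacker's inference cost is zero because the victim pays for it;
    overhead_per_call is a hypothetical allowance for proxies and accounts.
    """
    resale_price = provider_price * (1.0 - discount)
    return resale_price - overhead_per_call


if __name__ == "__main__":
    # Sweep a few assumed discount levels against the $2-per-call figure.
    for discount in (0.05, 0.10, 0.50, 0.90):
        margin = resale_margin(2.00, discount, overhead_per_call=0.01)
        print(f"discount {discount:.0%}: margin per call ${margin:.2f}")
```

Under these assumptions the margin stays positive at every discount level, because the only costs the reseller bears are the operational overheads.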

The attacks are far more sophisticated than simple rate-limit abuse. Attackers deploy residential proxy services by the thousands to bypass IP-based rate limiting, register throwaway accounts to evade authentication, and crucially, create OpenAI- or Anthropic-compatible adapters that allow stolen inference to be dropped directly into standard client tools. Real-world examples include Chipotlai Max, a forked coding agent that wraps Chipotle's customer-support chatbot as an OpenAI-compatible endpoint, with documented efforts underway to exploit similar endpoints at Home Depot, Lowe's, Target, and Starbucks.
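A toy model helps show why IP-based rate limiting does so little against rotating residential proxies. The sketch below is an assumption-laden illustration: the limiter class, the limit of 10 requests per window, and the pool of 2,000 proxy addresses are all hypothetical, and 1,300 requests is borrowed from the peak per-minute figure reported in this article.

```python
from collections import Counter


class PerIpLimiter:
    """Fixed-window limiter: at most `limit` requests per IP per window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Counter = Counter()

    def allow(self, ip: str) -> bool:
        if self.counts[ip] >= self.limit:
            return False
        self.counts[ip] += 1
        return True


def allowed_in_window(limiter: PerIpLimiter, ips: list) -> int:
    """Count how many requests in one window get past the limiter."""
    return sum(limiter.allow(ip) for ip in ips)


# 1,300 requests in one window, first from a single address...
single_ip = ["203.0.113.7"] * 1300
# ...then spread across a hypothetical pool of 2,000 proxy addresses.
rotating = [f"proxy-{i % 2000}" for i in range(1300)]

if __name__ == "__main__":
    print(allowed_in_window(PerIpLimiter(10), single_ip))
    print(allowed_in_window(PerIpLimiter(10), rotating))
```

With a single source address the limiter blocks almost everything; once the same volume is spread over more addresses than there are requests, no individual address ever reaches its limit.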

On April 12, 2026, Vercel's own documentation AI chat endpoint became a target, with traffic spiking to 10 times normal volume and reaching 1,300 requests per minute at peak. Traditional web security measures—IP rate limiting and authentication walls—prove insufficient because the per-call economics of inference theft justify the operational costs of circumventing these defenses. Vercel proposes that robust protection requires per-request verification rather than session-level checks, using deep analysis techniques like BotID to validate every individual AI request.
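The difference between session-level and request-level verification can be sketched in a few lines. Everything here is hypothetical: `toy_verifier` and the `looks_human` flag stand in for whatever signal a real classifier such as BotID derives, and none of this reflects BotID's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    session_id: str
    prompt: str
    looks_human: bool  # stand-in for the signal a real verifier would compute


Verifier = Callable[[Request], bool]


def toy_verifier(req: Request) -> bool:
    """Hypothetical check; a real system would inspect the request itself."""
    return req.looks_human


@dataclass
class SessionGate:
    """Verifies once, then trusts every later request in that session."""
    verify: Verifier
    trusted: set = field(default_factory=set)
    checks: int = 0

    def allow(self, req: Request) -> bool:
        if req.session_id in self.trusted:
            return True
        self.checks += 1
        if self.verify(req):
            self.trusted.add(req.session_id)
            return True
        return False


@dataclass
class RequestGate:
    """Verifies every request, so one passed check cannot be reused."""
    verify: Verifier
    checks: int = 0

    def allow(self, req: Request) -> bool:
        self.checks += 1
        return self.verify(req)


def simulate(gate, automated_requests: int) -> int:
    """One genuine request opens the session, then automated replays follow."""
    allowed = int(gate.allow(Request("s1", "hello", looks_human=True)))
    for i in range(automated_requests):
        req = Request("s1", f"resold prompt {i}", looks_human=False)
        allowed += int(gate.allow(req))
    return allowed


if __name__ == "__main__":
    print("session-level allowed:", simulate(SessionGate(toy_verifier), 1000))
    print("request-level allowed:", simulate(RequestGate(toy_verifier), 1000))
```

In this toy model the session gate performs one check and then admits every replayed request, which is the amortization the article describes; the request gate pays for a check on each call and admits only the genuine one.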

▸Protection requires request-level verification, not session-level verification, since amortized checks are defeated by high-volume attacks
▸AI endpoints with maximum user control over prompts (playgrounds, general-purpose assistants) are highest-risk, though even constrained endpoints like support bots are vulnerable to prompt injection attacks

Editorial Opinion

This research exposes a troubling asymmetry in AI economics: traditional web security was built for threat models where defeating a defense became expensive relative to the value extracted. With AI inference, the $2-per-call price point reverses that calculus—the high cost creates irresistible arbitrage opportunities for attackers. Vercel's emphasis on request-level verification is essential, but the deeper issue is that inference resale will remain lucrative as long as a price gap exists between provider rates and market rates. Systemic solutions may require the industry to move beyond endpoint hardening toward broader market transparency and provider-level coordination.
