The Rise of Inference Theft: How Attackers Are Stealing Millions in AI API Calls
Key Takeaways
- Inference theft is a high-margin attack with strong financial incentives: a single frontier-model call can cost about $2, and attackers who pay nothing for the inference can resell it at a discount and still profit
- Traditional security defenses fail against this threat model: attackers deploy residential proxies and throwaway accounts at scale, then amortize the cost of passing each per-session check across thousands of stolen requests
- Attackers wrap victim APIs in OpenAI- or Anthropic-compatible adapters so stolen inference can be resold through standard clients; the abuse happens after authentication, inside a session that has already passed its checks
Summary
Vercel security research describes attackers systematically stealing paid AI inference at scale and reselling it for profit. Inference theft, the unauthorized use of paid AI API calls for free consumption or downstream resale, is one of the highest-margin attacks available to threat actors today. A single prompt to a frontier model can cost $2, making AI inference thousands of times more expensive than traditional API calls. Because attackers pay nothing for the inference, they can undercut list prices by 5-10% and still keep healthy margins.
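The resale arithmetic can be sketched in a few lines. This is a minimal illustration, not a model from Vercel's research: the $2 list price and the 5-10% discounts come from the summary above, while the function name, the `attacker_cost` parameter, and the steeper 50% and 90% discounts are hypothetical values added for comparison.

```python
def resale_margin(list_price: float, discount: float, attacker_cost: float = 0.0) -> float:
    """Profit per resold call when undercutting list_price by `discount` (a fraction, 0..1).

    attacker_cost defaults to 0.0 because the stolen inference itself is free to the
    attacker; pass a non-zero value to account for proxies or accounts (hypothetical).
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    resale_price = list_price * (1.0 - discount)
    return resale_price - attacker_cost


# $2.00 per call is the figure cited in the summary; the 50% and 90% rows are hypothetical.
for d in (0.05, 0.10, 0.50, 0.90):
    print(f"discount {d:.0%}: attacker keeps ${resale_margin(2.00, d):.2f} per call")
```

Under these assumptions the margin stays positive at every discount short of 100%, which is why undercutting the provider is cheap for the reseller.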
The attacks are far more sophisticated than simple rate-limit abuse. Attackers route traffic through thousands of residential proxies to bypass IP-based rate limiting, register throwaway accounts to get past authentication walls, and, crucially, build OpenAI- or Anthropic-compatible adapters that let stolen inference drop directly into standard client tools. Real-world examples include Chipotlai Max, a forked coding agent that wraps Chipotle's customer-support chatbot as an OpenAI-compatible endpoint, with documented efforts underway to exploit similar endpoints at Home Depot, Lowe's, Target, and Starbucks.
On April 12, 2026, Vercel's own documentation AI chat endpoint became a target, with traffic spiking to 10 times normal volume and peaking at 1,300 requests per minute. Traditional web security measures such as IP rate limiting and authentication walls prove insufficient because the per-call economics of inference theft justify the operational cost of circumventing them. Vercel argues that robust protection requires per-request verification rather than session-level checks, using deep analysis techniques like BotID to validate every individual AI request.
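The difference between session-level and request-level verification can be illustrated with a toy gate. This is a sketch under stated assumptions and does not reproduce BotID or any Vercel API: `Request`, `SessionGate`, `RequestGate`, `simulate`, and the `proof` field are hypothetical names, and the string comparison stands in for whatever real bot-detection signal a deployment uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    session_id: str
    prompt: str
    proof: str  # hypothetical per-request attestation supplied by the client


Verifier = Callable[[Request], bool]


class SessionGate:
    """Verifies the first request of a session, then trusts that session."""

    def __init__(self, verify: Verifier):
        self.verify = verify
        self.trusted = set()  # session ids that have passed one check
        self.checks = 0

    def allow(self, req: Request) -> bool:
        if req.session_id in self.trusted:
            return True  # no further verification once the session is trusted
        self.checks += 1
        if self.verify(req):
            self.trusted.add(req.session_id)
            return True
        return False


class RequestGate:
    """Verifies every request, regardless of session history."""

    def __init__(self, verify: Verifier):
        self.verify = verify
        self.checks = 0

    def allow(self, req: Request) -> bool:
        self.checks += 1
        return self.verify(req)


def simulate(gate, total: int) -> int:
    """Send one verified request, then (total - 1) unverified replays on the same session."""
    allowed = 0
    for i in range(total):
        proof = "valid" if i == 0 else ""
        if gate.allow(Request("s1", f"prompt {i}", proof)):
            allowed += 1
    return allowed


def is_verified(req: Request) -> bool:
    # Stand-in for a real detection signal; an assumption for this sketch.
    return req.proof == "valid"


print(simulate(SessionGate(is_verified), 1000))  # 1000: every replay rides the trusted session
print(simulate(RequestGate(is_verified), 1000))  # 1: only the verified request is served
```

In this toy model a single passed check unlocks the whole session, which is the situation an adapter in front of an authenticated session exploits; checking each request removes that leverage at the cost of running the verifier on every call.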
- Protection requires request-level verification, not session-level verification, since amortized checks are defeated by high-volume attacks
- AI endpoints with maximum user control over prompts (playgrounds, general-purpose assistants) are highest-risk, though even constrained endpoints like support bots are vulnerable to prompt injection attacks
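The amortization argument in the takeaways can be made concrete with a small break-even calculation. The numbers are illustrative assumptions only: the $5.00 cost of passing one check (proxy, throwaway account, solving a challenge) is hypothetical, the $1.80 resale price corresponds to the cited $2 call at a 10% discount, and both function names are invented for this sketch.

```python
def cost_per_request(check_cost: float, requests_per_check: int) -> float:
    """Attacker's cost of defeating a check, spread over the requests it unlocks."""
    if requests_per_check < 1:
        raise ValueError("requests_per_check must be at least 1")
    return check_cost / requests_per_check


def is_profitable(resale_price: float, check_cost: float, requests_per_check: int) -> bool:
    """True when each resold call earns more than its share of the check cost."""
    return resale_price > cost_per_request(check_cost, requests_per_check)


CHECK_COST = 5.00    # hypothetical cost to pass one verification
RESALE_PRICE = 1.80  # $2 list price at a 10% discount

# Session-level check: one pass unlocks 1,000 requests (hypothetical volume).
print(is_profitable(RESALE_PRICE, CHECK_COST, 1000))  # True: about half a cent per request
# Request-level check: every request pays the full check cost.
print(is_profitable(RESALE_PRICE, CHECK_COST, 1))     # False: $5.00 exceeds $1.80
```

The same check that is negligible when amortized over a session becomes the dominant cost when it must be passed on every request, which is the reasoning behind request-level verification.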
Editorial Opinion
This research exposes a troubling asymmetry in AI economics: traditional web security was built for threat models in which defeating a defense cost more than the value extracted. With AI inference, the $2-per-call price point reverses that calculus, because each stolen call is valuable enough to create an irresistible arbitrage opportunity for attackers. Vercel's emphasis on request-level verification is essential, but the deeper issue is that inference resale will remain lucrative as long as a price gap exists between provider rates and market rates. Systemic solutions may require the industry to move beyond endpoint hardening toward broader market transparency and provider-level coordination.


