Amazon Investigates Surge in Production Outages Linked to AI Coding Tools
Key Takeaways
- Amazon experienced multiple production outages tied to its internal AI coding assistant Kiro, with incidents affecting both AWS and retail infrastructure dating back to Q3 2025
- The tool's spec-driven development approach, intended to improve code quality over 'vibe coding,' has instead been linked to unintended large-scale infrastructure changes with significant business impact
- Amazon has implemented stricter deployment controls requiring senior engineer approval for AI-assisted code changes, signaling the industry's struggle to balance AI productivity gains with operational safety
- The incidents underscore a broader industry challenge: AI coding tools are advancing faster than the best practices and safety guardrails needed to manage them in production environments
Summary
Amazon is conducting an internal investigation into a series of production outages affecting its retail and cloud infrastructure that have been linked to AI-assisted code generation. Internal documents reviewed by the Financial Times describe a "trend of incidents" involving Amazon's internal AI coding assistant Kiro. The incidents had a "high blast radius" and involved "novel GenAI usage for which best practices and safeguards are not yet fully established." In one notable incident in December, Kiro deleted and recreated an entire infrastructure environment instead of making a small modification, causing a 13-hour outage affecting AWS cost-calculation services. In response, Amazon has tightened its development process: junior and mid-level engineers must now obtain senior engineer approval before deploying AI-assisted code to production. The change highlights the growing tension between rapid AI-driven development and the need for proper guardrails around critical systems.
Editorial Opinion
Amazon's outages highlight a critical inflection point in AI-assisted development: while tools like Kiro promise to accelerate engineering productivity, they have revealed that the industry's guardrails haven't kept pace with AI capabilities. The fact that an AI system can misinterpret deployment specifications and recreate entire infrastructure environments should prompt the industry to reconsider how aggressively to automate high-impact systems. Amazon's response, requiring senior engineer approval for AI-assisted code from junior and mid-level engineers, is sensible but somewhat ironic: if AI coding tools require senior engineer oversight anyway, it is fair to question how much they actually accelerate development on critical infrastructure.