How One Company Cut AI Agent Costs 80% by Switching to Claude Opus with a Two-Tier Architecture
Key Takeaways
- A two-tier agent architecture using Haiku as a triage layer filtered out 80% of CI failures, reducing costs despite upgrading to the more capable Opus model
- Semantic search and error-message databases let cheaper models detect duplicate issues effectively without reading raw logs
- Giving agents SQL query access to structured data is more cost-effective and produces better results than embedding full log files in prompts
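The triage layer described above can be sketched as a cheap first pass that checks new failures against historical ones before escalating. This is a minimal illustration, not the company's actual code: the error signatures and issue IDs are invented, and `SequenceMatcher` stands in for the semantic search a real system would run over an embedding index.

```python
from difflib import SequenceMatcher

# Hypothetical database of previously triaged error signatures.
KNOWN_FAILURES = {
    "TimeoutError: connection to db-primary timed out": "FLAKE-101",
    "AssertionError: expected 200, got 503": "FLAKE-102",
}

SIMILARITY_THRESHOLD = 0.85  # assumed cutoff; would be tuned per error corpus


def triage(error_message: str) -> str:
    """Cheap first-pass triage: duplicate detection before escalation.

    A real system would call a small model (e.g. Haiku) with semantic
    search over an embedding index; fuzzy string matching stands in
    here so the sketch stays self-contained.
    """
    # Tier 1a: exact match against historical errors.
    if error_message in KNOWN_FAILURES:
        return f"duplicate:{KNOWN_FAILURES[error_message]}"

    # Tier 1b: fuzzy match as a stand-in for semantic search.
    for known, issue_id in KNOWN_FAILURES.items():
        if SequenceMatcher(None, error_message, known).ratio() >= SIMILARITY_THRESHOLD:
            return f"duplicate:{issue_id}"

    # Tier 2: novel failure -> escalate to the expensive model.
    return "escalate:opus"
```

Only failures reaching the final branch incur the cost of the larger model, which is where the reported savings come from.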
Summary
A software engineering team has demonstrated a cost-effective approach to AI agent deployment by combining Claude Opus with Claude Haiku in a two-tier triage architecture. The system uses a cheap Haiku agent to identify duplicate CI failures (filtering out 80% of cases), only escalating complex investigations to the more capable and expensive Opus model. Despite upgrading from Sonnet 4.0 to Opus 4.6, the company reports lower overall costs due to the efficiency gains from this architectural approach.
The key innovation is the "triager pattern," where Haiku handles duplicate detection using semantic search and exact matching against historical errors, while Opus focuses on novel failure analysis. By giving agents access to a SQL interface to ClickHouse logs rather than embedding massive amounts of raw data in prompts, the team avoids token bloat and allows agents to pull only the context they actually need. This pull-based approach prevents researchers from pre-biasing agent investigations with irrelevant information.
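The pull-based access pattern above amounts to exposing a read-only SQL tool to the agent instead of pasting logs into the prompt. A minimal sketch, with assumptions: `sqlite3` stands in for ClickHouse so the example is self-contained, and the `ci_logs` table, its columns, and the sample rows are all illustrative, not from the original system.

```python
import sqlite3

# sqlite3 stands in for ClickHouse here; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ci_logs (run_id TEXT, level TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO ci_logs VALUES (?, ?, ?)",
    [
        ("run-42", "INFO", "checkout complete"),
        ("run-42", "ERROR", "TimeoutError: connection to db-primary timed out"),
        ("run-43", "INFO", "all tests passed"),
    ],
)


def query_logs(sql: str, params: tuple = ()) -> list:
    """Tool exposed to the agent: it pulls only the rows it asks for,
    instead of receiving the full log file in its prompt."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("read-only tool: SELECT queries only")
    return conn.execute(sql, params).fetchall()


# The agent decides what context it needs, e.g. just the errors for one run:
errors = query_logs(
    "SELECT message FROM ci_logs WHERE run_id = ? AND level = 'ERROR'",
    ("run-42",),
)
```

Because the agent composes the query itself, token usage scales with the rows it actually inspects rather than with log volume, and no human pre-selects (and thereby biases) the evidence.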
- Higher-capability models should be reserved for planning and hypothesis formation, while cheaper models handle execution and data-gathering tasks
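That planner/executor split can be expressed as a simple routing table. Everything here is hypothetical: the task types and model names are assumptions for illustration, not the company's configuration.

```python
# Illustrative routing table; task types and model names are assumptions.
MODEL_FOR_TASK = {
    "plan_investigation": "claude-opus",   # planning / hypothesis formation
    "form_hypothesis": "claude-opus",
    "run_query": "claude-haiku",           # execution / data gathering
    "summarize_logs": "claude-haiku",
}


def pick_model(task_type: str) -> str:
    # Default to the cheap tier; only explicitly hard tasks escalate.
    return MODEL_FOR_TASK.get(task_type, "claude-haiku")
```

Defaulting unknown task types to the cheap tier keeps the cost ceiling low; escalation is opt-in rather than opt-out.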
Editorial Opinion
This case study highlights an important emerging pattern in AI agent design: capability-aware cost optimization through architectural layering. Rather than treating model choice as binary (cheap vs. capable), sophisticated users are building workflows where different models handle tasks suited to their price-to-performance ratio. The insight that agents should pull context rather than receive it pre-curated is particularly valuable and challenges common prompt engineering practices.