Claude Code Performance Degraded Before Opus 4.8 Release; Root Cause Traced to CLI Harness

Key Takeaways

▸Daily production monitoring of Claude Code caught a statistically significant 5-day performance drop that release benchmarks would have missed
▸Root cause analysis traced the issue to Claude Code CLI versions 2.1.150/2.1.151, indicating a harness bug rather than a model regression
▸The degradation manifested as a 60% spike in tool calls per task combined with reduced input token usage, suggesting the agent was compensating for a change in the harness behavior

Source:

Hacker Newshttps://marginlab.ai/blog/claude-code-degraded-before-opus-4-8/↗

Summary

Anthropic's continuous monitoring of Claude Code against SWE-Bench-Pro detected a significant performance degradation in the week leading up to the Opus 4.8 release. The model's pass rate on the benchmark dropped well below its 65% baseline for five consecutive days in late May before recovering immediately after Opus 4.8 shipped on May 28. Investigation revealed the issue was likely caused by changes introduced in Claude Code CLI versions 2.1.150 and 2.1.151, not a regression in the underlying model itself. During the degradation window, tool invocations spiked by roughly 60% per task while input token usage dropped, patterns that reversed when Claude Code 2.1.153 was deployed alongside Opus 4.8. The discovery highlights the importance of continuous production monitoring to catch performance issues that standard release-time benchmarks cannot detect.

Performance metrics returned to baseline immediately with Claude Code 2.1.153 and Opus 4.8 release on May 28, confirming the fix
The incident underscores the value of continuous, production-grade monitoring for AI agents to detect silent performance regressions that published benchmarks at launch cannot capture

Anthropic

UPDATE Anthropic2026-05-29

Claude Code Performance Degraded Before Opus 4.8 Release; Root Cause Traced to CLI Harness

Key Takeaways

▸Daily production monitoring of Claude Code caught a statistically significant 5-day performance drop that release benchmarks would have missed
▸Root cause analysis traced the issue to Claude Code CLI versions 2.1.150/2.1.151, indicating a harness bug rather than a model regression
▸The degradation manifested as a 60% spike in tool calls per task combined with reduced input token usage, suggesting the agent was compensating for a change in the harness behavior

Source:

Hacker Newshttps://marginlab.ai/blog/claude-code-degraded-before-opus-4-8/↗

Summary

Performance metrics returned to baseline immediately with Claude Code 2.1.153 and Opus 4.8 release on May 28, confirming the fix
The incident underscores the value of continuous, production-grade monitoring for AI agents to detect silent performance regressions that published benchmarks at launch cannot capture

Claude Code Performance Degraded Before Opus 4.8 Release; Root Cause Traced to CLI Harness

Key Takeaways

Summary

More from Anthropic

Anthropic Releases Turnstile, Open-Source Proxy for Precise Token Capture in Agent Reinforcement Learning

Anthropic Introduces J-Lens: New Technique Reveals Dual Representational Routes in Claude

Agentgateway Opensources Token Exchange, JWT Assertion, and Microsoft Entra OBO Authentication

Comments

Suggested

Anthropic Releases Turnstile, Open-Source Proxy for Precise Token Capture in Agent Reinforcement Learning

state-harness: Framework for Predicting Multi-Agent AI Failures Gains Empirical Validation

Anthropic Introduces J-Lens: New Technique Reveals Dual Representational Routes in Claude

Claude Code Performance Degraded Before Opus 4.8 Release; Root Cause Traced to CLI Harness

Key Takeaways

Summary

More from Anthropic

Anthropic Releases Turnstile, Open-Source Proxy for Precise Token Capture in Agent Reinforcement Learning

Anthropic Introduces J-Lens: New Technique Reveals Dual Representational Routes in Claude

Agentgateway Opensources Token Exchange, JWT Assertion, and Microsoft Entra OBO Authentication

Comments

Suggested

Anthropic Releases Turnstile, Open-Source Proxy for Precise Token Capture in Agent Reinforcement Learning

state-harness: Framework for Predicting Multi-Agent AI Failures Gains Empirical Validation

Anthropic Introduces J-Lens: New Technique Reveals Dual Representational Routes in Claude