What 1,281 Agent Runs Reveal About Coding Agent Failure in Large Codebases
Key Takeaways
- Infrastructure and context engineering are the primary bottleneck for coding agents at scale, not model intelligence
- Above 400,000 lines of code, traditional search tools (grep, file read, glob) fail systematically; agents cannot effectively navigate through 22,000+ files
- Keyword search is insufficient: agents need structural navigation to distinguish the right code from test files, legacy code, and documentation
Summary
Sourcegraph's analysis of 1,281 agent runs across 40+ enterprise-scale open source repositories reveals a critical insight: the bottleneck for coding agents in large codebases isn't raw model capability—it's infrastructure and access to context. The research, drawn from Sourcegraph's CodeScaleBench benchmark and internal studies on context retrieval and code navigation, identifies five recurring failure patterns that systematically undermine agent performance in large software environments.
Around 400,000 lines of code represents a critical threshold in the data: below it, standard tools like grep work adequately; above it, agents relying on traditional search tools fail consistently. The company proposes that context engineering—encoding architectural knowledge, internal APIs, and conventions before agents begin work—is key to solving these challenges. Sourcegraph's agent advocate Stephanie Jarmak summarized the finding: 'The difference between complete failure and near-perfect completion wasn't intelligence — it was efficient access to context.'
- Partial refactorings across interdependent files introduce hidden bugs that may pass surface review but fail downstream
- Pre-encoding architectural knowledge (via tools like Tessl) allows agents to operate with a pre-built understanding of APIs, conventions, and dependencies
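To make the gap between finding code and finding the right code concrete, here is a minimal Python sketch of one way an agent harness might re-rank raw keyword hits so that production source files come ahead of tests, legacy code, and documentation. It is not Sourcegraph's method and not true structural navigation, which would resolve symbol definitions and references. The marker sets, the `classify` and `rank_hits` helpers, and the example paths are all hypothetical and would need tuning for any real repository.

```python
from pathlib import PurePosixPath

# Hypothetical path heuristics (assumptions for illustration only);
# real repositories would need their own rules.
TEST_MARKERS = {"test", "tests", "__tests__", "testdata"}
DOC_MARKERS = {"doc", "docs", "documentation"}
LEGACY_MARKERS = {"legacy", "deprecated", "old"}
DOC_SUFFIXES = {".md", ".rst", ".txt"}


def classify(path: str) -> str:
    """Label a file path as 'test', 'docs', 'legacy', or 'source'."""
    p = PurePosixPath(path.lower())
    dirs = set(p.parts[:-1])  # directory components only
    name = p.name
    if dirs & TEST_MARKERS or name.startswith("test_") or "_test." in name:
        return "test"
    if dirs & DOC_MARKERS or p.suffix in DOC_SUFFIXES:
        return "docs"
    if dirs & LEGACY_MARKERS:
        return "legacy"
    return "source"


def rank_hits(paths: list[str]) -> list[str]:
    """Order keyword hits so likely production source comes first."""
    priority = {"source": 0, "legacy": 1, "test": 2, "docs": 3}
    return sorted(paths, key=lambda path: (priority[classify(path)], path))


# Example: four files that all match the keyword "invoice".
hits = [
    "docs/billing.md",
    "billing/tests/test_invoice.py",
    "legacy/billing/invoice.py",
    "billing/invoice.py",
]
ranked = rank_hits(hits)
```

With these made-up paths, `ranked` lists `billing/invoice.py` first and `docs/billing.md` last. A plain grep would return all four matches with no indication of which one the agent should edit.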
Editorial Opinion
This research fundamentally reframes the agent scalability problem: it's not about building smarter models, but smarter infrastructure around them. By identifying that context engineering matters as much as model intelligence, Sourcegraph provides a pragmatic roadmap for enterprises deploying agents in complex codebases—one that shifts focus from waiting for larger models to building better tooling today. The distinction between 'finding code' and 'finding the right code' is particularly insightful, suggesting that future agent breakthroughs may come from structural code navigation and knowledge encoding rather than pure model scaling.



