JetBrains Research Tackles LLM Context Bloat with Hybrid Management Strategy
Key Takeaways
- Agent-generated context grows rapidly and becomes noise rather than useful information, leading to expensive token usage without proportional performance gains
- Two main context management approaches exist: observation masking (simpler, older) and LLM summarization (more sophisticated, used in OpenHands, Cursor, and Warp)
- JetBrains' hybrid context management approach demonstrates significant cost reduction compared to baseline methods
- Context management has been largely overlooked as a research problem despite its major impact on both agent performance and operational costs
Summary
JetBrains researchers have published a study addressing a critical inefficiency in software engineering (SE) agents: uncontrolled context growth that increases costs without improving performance. As AI agents iteratively add generated outputs to their context, token costs climb sharply while effective performance plateaus, so much of the spend is wasted. The research, part of Tobias Lindenbauer's Master's thesis at TUM's Software Engineering and AI Lab, empirically evaluates two major context management approaches, observation masking and LLM summarization, and proposes a novel hybrid solution that achieves significant cost reduction. JetBrains will present these findings at the Deep Learning 4 Code workshop at NeurIPS 2025 in San Diego on December 6th, 2025.
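The summary names observation masking without describing its mechanics. As a rough illustration only, the sketch below assumes that masking means replacing older tool outputs with a short placeholder while keeping the agent's reasoning and actions intact. The `Turn` type, the `keep_last` window, the placeholder text, and the character count used as a stand-in for tokens are all hypothetical and are not taken from the JetBrains study.

```python
from dataclasses import dataclass, replace

# Hypothetical placeholder text; the study's actual wording is not given in the article.
PLACEHOLDER = "[observation omitted]"


@dataclass(frozen=True)
class Turn:
    """One agent step: what it thought, what it ran, and what came back."""
    reasoning: str
    action: str
    observation: str


def mask_observations(history, keep_last=2):
    """Return a copy of history with all but the last `keep_last` observations masked.

    Reasoning and actions are left untouched; only tool output is replaced.
    """
    cutoff = max(len(history) - keep_last, 0)
    return [
        replace(turn, observation=PLACEHOLDER) if i < cutoff else turn
        for i, turn in enumerate(history)
    ]


def context_size(history):
    """Crude size estimate in characters (an assumed proxy for token count)."""
    return sum(len(t.reasoning) + len(t.action) + len(t.observation) for t in history)


# Five turns, each with a long tool output.
history = [Turn(f"think {i}", f"cmd {i}", "x" * 1000) for i in range(5)]
masked = mask_observations(history, keep_last=2)

print(context_size(history), context_size(masked))
```

Under these assumptions, the three oldest observations shrink to the placeholder while the two most recent stay verbatim, so the context stops growing with every long tool output. LLM summarization would instead call a model to compress the older turns, which costs additional tokens per compression step.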
Editorial Opinion
JetBrains' research addresses a genuinely overlooked pain point in the AI agent ecosystem. While the field has focused heavily on scaling training data and improving planning strategies, the practical reality of runaway context costs has been treated as an engineering afterthought rather than a fundamental research challenge. This work is timely and relevant, as the economics of LLM-powered agents become increasingly critical for production deployment.



