Cursor Trains Composer to Self-Summarize Through Reinforcement Learning, Enabling Long-Horizon Coding Tasks
Key Takeaways
- Cursor trained Composer to perform self-summarization as a learned behavior rather than a prompted step, enabling the model to handle coding tasks requiring hundreds of actions that exceed its context window
- Self-summarization is integrated into the training loop via reinforcement learning, where the quality of summaries directly impacts the reward signal, allowing the model to optimize what information to preserve
- This approach is more token-efficient than traditional prompted summarization baselines and avoids the information loss associated with sliding context windows or latent-space compaction methods
Summary
Cursor has developed a novel approach for its Composer coding agent that enables it to handle complex tasks requiring hundreds of actions: rather than relying on prompted summarization, the model is trained to self-summarize through reinforcement learning. This addresses a fundamental challenge in agent development: as task trajectories grow longer, they quickly exceed the model's context window, forcing systems to compact information in ways that often lose critical details. Instead of using external summarization prompts or sliding context windows, approaches that typically discard information, Cursor integrated self-summarization directly into Composer's training loop, so the model learns to autonomously determine which information is most critical to preserve as it works through a task.
Composer's self-summarization works by pausing at fixed context-length triggers, generating condensed summaries of its current state before continuing with the task. Crucially, the self-summaries themselves are incorporated into the reinforcement learning training process: good summaries that preserve task-critical information are reinforced, while poor summaries that lose important details are downweighted. This approach proves more token-efficient than highly tuned prompt-based baselines while enabling Composer to learn to handle increasingly complex, long-horizon coding tasks, autonomously managing its context limitations by summarizing multiple times when a difficult task demands it.
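To make the mechanism concrete, here is a minimal sketch of an agent loop that pauses at a fixed context-length trigger, collapses its history into a self-generated summary, and continues. This is an illustration, not Cursor's implementation: `fake_model`, `count_tokens`, and the trigger value are all hypothetical stand-ins, and in Composer the summary text would be produced by the trained policy and scored during RL rather than by a fixed rule.

```python
TRIGGER_TOKENS = 50  # fixed context-length trigger (tiny, for demo purposes)

def count_tokens(history):
    """Crude token count: whitespace-separated words stand in for a tokenizer."""
    return sum(len(entry.split()) for entry in history)

def fake_model(history, mode):
    """Stand-in for the policy. mode='act' emits the next action;
    mode='summarize' condenses the full history into one entry.
    A trained model would learn what to keep; this stub just keeps
    the first word of each entry."""
    if mode == "summarize":
        return "SUMMARY: " + " ".join(entry.split()[0] for entry in history)
    return f"action-{len(history)} touched file_{len(history)}.py"

def run_agent(task, steps):
    history = [f"TASK: {task}"]
    for _ in range(steps):
        # Fixed trigger: once the context grows past the budget,
        # self-summarize and restart the history from the condensed state.
        if count_tokens(history) > TRIGGER_TOKENS:
            history = [fake_model(history, mode="summarize")]
        history.append(fake_model(history, mode="act"))
    return history

final = run_agent("fix the flaky test", steps=40)
```

The point of the sketch is the shape of the loop: the history never grows without bound, because each time it crosses the trigger it is replaced by a single summary entry plus the actions taken since. In Composer, the quality of that summary feeds back into the reward, which is what makes the compaction a learned behavior rather than a heuristic.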
Editorial Opinion
Cursor's self-summarization approach represents a meaningful advancement in agent design that addresses a real pain point in long-horizon task execution. By making summarization a trained behavior rather than a heuristic post-processing step, the company has found a way to let models learn what truly matters to remember—a fundamentally more intelligent approach than fixed prompts or mechanical context windowing. As agent systems tackle increasingly ambitious tasks, this kind of learned context management may become essential infrastructure for reasoning over extended interactions.