Claw Code Rewrite Achieves Up to 74% Token Savings Through Prompt Optimization
Key Takeaways
- Claw Code rewrite achieves 24-74% token savings depending on workload, averaging roughly 30% across diverse tasks
- Optimization strategy focuses on reducing repeated input costs through prompt summarization, context compaction, and tool-surface minimization
- Prompt caching via Anthropic-compatible requests is enabled by default to improve efficiency
Summary
A major rewrite of the Claw Code project has demonstrated significant token usage reductions, with benchmarks showing up to 74% savings on individual workloads and approximately 30% average savings across diverse tasks, all while maintaining quality. The optimization work focuses primarily on reducing repeated input costs rather than just one-shot prompt length, employing strategies such as summarizing system prompts and git context instead of replaying raw data, converting instruction files into compact digests, and aggressively shortening static prompt rules and tool schemas.
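The automatic-compaction idea described above can be sketched as a simple token-budget check: when the running transcript grows past a budget, older turns are collapsed into a one-line digest while recent turns are kept verbatim. This is a minimal illustration, not Claw Code's actual implementation; the 4-characters-per-token heuristic and the `compact` helper are assumptions for the sketch.

```python
# Hypothetical sketch of automatic context compaction for long sessions.
# Not Claw Code's real code; token counts use a rough 4-chars/token rule.

def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def compact(messages: list[dict], budget: int = 2000) -> list[dict]:
    """Collapse the oldest messages into a one-line digest once the
    transcript's estimated token count exceeds `budget`."""
    total = sum(approx_tokens(m["content"]) for m in messages)
    if total <= budget:
        return messages
    # Keep the most recent turns verbatim, up to half the budget.
    kept, used = [], 0
    for m in reversed(messages):
        cost = approx_tokens(m["content"])
        if used + cost > budget // 2:
            break
        kept.append(m)
        used += cost
    older = messages[: len(messages) - len(kept)]
    digest = {
        "role": "system",
        "content": f"[digest of {len(older)} earlier messages omitted]",
    }
    return [digest] + list(reversed(kept))
```

Replaying a short digest instead of the raw history is what turns a per-request cost that grows with session length into one that stays roughly flat.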
The project employs multiple token-saving techniques, including workspace and configuration summarization, a smaller tool surface with progressive unlocking of heavier tools, compacted replay inputs and results, and automatic compaction for long sessions. Notably, the rewrite leverages Anthropic-compatible requests to enable prompt caching by default, which contributes significantly to the efficiency gains. The optimization was achieved through iteration using code optimization tools, with benchmark results ranging from 24% to 74% token savings depending on workload type, underscoring how task-dependent the gains are in practice.
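Prompt caching in Anthropic-compatible APIs works by marking the stable prefix of a request (system prompt, tool schemas) with a `cache_control` breakpoint, so repeated requests reuse that prefix at the cheaper cached-read rate. A minimal sketch of such a request body, assuming the standard Messages API shape (the model name and system prompt here are placeholders, not taken from the project):

```python
# Sketch of an Anthropic-compatible request body with prompt caching.
# The `cache_control` block marks the stable prefix so that subsequent
# requests with the same prefix hit the cache instead of paying full
# input-token price each time.

STATIC_SYSTEM_PROMPT = "You are a coding agent. Follow the workspace rules."  # placeholder

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Cache breakpoint: everything up to and including this
                # block is eligible for prompt caching.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

Because only the suffix after the breakpoint changes between turns, the per-turn input cost is dominated by the new user message rather than the full static prompt.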
The project remains in early stages and is under active iteration, with both Python and Rust implementations available for users to inspect and benchmark. The team provides comprehensive measurement tools including token-audit capabilities and example suite benchmarks to help developers understand and replicate the token-saving benefits in their own applications.
Editorial Opinion
This token optimization work represents a practical approach to the real-world economics of LLM usage, particularly for applications with long context windows and repeated interactions. The emphasis on measuring and benchmarking savings across diverse workloads sets a good precedent, though the wide variance in results (24-74%) is a healthy reminder that token optimization gains are highly task-dependent. As AI systems scale and costs accumulate, systematic work on prompt efficiency like this could become increasingly valuable for developers.


