Zeta2.1: 3x Fewer Tokens, 50ms Faster
Key Takeaways
- Zeta2.1 achieves 3x token reduction and 50ms faster predictions compared to Zeta2
- New 'Multi-Region' prompt format improves efficiency by only outputting code regions being modified
- Open-weight model available on Hugging Face with published Rust bindings for self-hosting
Summary
Zed has released Zeta2.1, an improved version of its code edit prediction model that delivers substantial performance and efficiency gains over Zeta2. The update reduces output tokens by 3x, speeds up predictions by 50ms, and cuts server requirements by 30%. The efficiency improvements stem from a new 'Multi-Region' prompt format that outputs only the regions around the code the model wants to change, rather than one large region as its predecessor did.
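To make the token savings concrete, here is a minimal sketch of the multi-region idea: emit only small windows around each edited line (merging overlapping windows) instead of one large rewritten region. The function name, window logic, and output shape are illustrative assumptions, not Zeta2.1's actual wire format.

```python
# Hypothetical illustration of a "multi-region" output: instead of one large
# region, keep only a few context lines around each edit. This is a sketch,
# not Zeta2.1's real prompt format.

def multi_region_output(lines, edited_line_idxs, context=2):
    """Collect only the windows around edited lines, merging overlaps."""
    if not edited_line_idxs:
        return []
    windows = []
    for i in sorted(edited_line_idxs):
        start, end = max(0, i - context), min(len(lines), i + context + 1)
        if windows and start <= windows[-1][1]:
            # Overlapping or adjacent window: merge into the previous one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return [lines[s:e] for s, e in windows]

file_lines = [f"line {i}" for i in range(200)]
regions = multi_region_output(file_lines, edited_line_idxs=[10, 150])
emitted = sum(len(r) for r in regions)
print(f"emitted {emitted} lines instead of {len(file_lines)}")
```

With two edits far apart in a 200-line file, the sketch emits two 5-line regions (10 lines total) rather than the whole file, which is the kind of output reduction the announcement describes.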
Zeta2.1 maintains Zed's open-weight approach, available for download on Hugging Face. The model was trained entirely on opt-in data from open-source repositories. To support self-hosting, Zed has also published Rust bindings to PyPI for easier production deployment. Zeta2.1 is now the default edit prediction model in Zed and is available free to all users, with unlimited predictions available through Zed Pro and Zed Business plans.


