Sessa: Open-Source Decoder Architecture Offers Alternative to Transformers and Mamba for Long-Context LLMs
Key Takeaways
- Sessa introduces a novel decoder architecture that combines self-attention with recurrent feedback pathways, positioning itself as a middle ground between Transformers and pure state-space models such as Mamba
- The open-source implementation includes optional FlashAttention support for performance optimization, while a fallback reference implementation preserves flexibility on hardware without it
- The architecture exposes configurable parameters for tuning behavior, including a gamma_max feedback gain bound of 0.999 and optional RoPE positional encoding
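The release notes above do not spell out exactly how the feedback pathway is wired, but one plausible reading is that per-step attention outputs are aggregated through a leaky recurrent state whose gain is clamped at gamma_max. A minimal sketch under that assumption (the function names and the exact update rule are illustrative, not Sessa's actual API):

```python
import numpy as np

GAMMA_MAX = 0.999  # feedback gain bound quoted in the release


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Single-head scaled dot-product attention over a (seq_len, dim) input.
    d = x.shape[-1]
    scores = (x @ x.T) / np.sqrt(d)
    return softmax(scores) @ x


def feedback_aggregate(x, gamma):
    # Hypothetical recurrent feedback pathway:
    #   h_t = gamma * h_{t-1} + (1 - gamma) * a_t
    # where a_t is the attention output at step t. Clamping gamma below
    # GAMMA_MAX keeps the recurrence stable (gain strictly < 1).
    gamma = min(gamma, GAMMA_MAX)
    a = self_attention(x)
    h = np.zeros_like(a[0])
    out = []
    for t in range(a.shape[0]):
        h = gamma * h + (1.0 - gamma) * a[t]
        out.append(h)
    return np.stack(out)
```

With gamma = 0 the recurrence reduces to plain attention; as gamma approaches the 0.999 bound, the state retains progressively more long-range context, which matches the stated long-context motivation.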
Summary
Libratio AI has released Sessa, an open-source decoder architecture designed as an alternative to standard Transformers and state-space models like Mamba for building long-context large language models. Sessa integrates self-attention into a recurrent feedback pathway, combining input-dependent attention routing with feedback-based recurrent aggregation to improve long-context information preservation and integration. The architecture is now available on GitHub with full implementation code, supporting optional FlashAttention optimization for improved performance on CUDA-enabled hardware. The release includes comprehensive documentation and configurable parameters such as batch size, sequence length, number of attention heads, and feedback gain bounds, allowing researchers and developers to experiment with different configurations.
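The "optional FlashAttention with a fallback reference implementation" design described above typically takes the form of a try-import followed by runtime dispatch. A hedged sketch of that pattern (the real flash-attn package operates on CUDA torch tensors; this numpy version only illustrates the dispatch structure, and `attention` is a hypothetical name, not Sessa's actual entry point):

```python
import numpy as np

try:
    # Optional CUDA fast path; in practice flash_attn_func expects
    # torch tensors on GPU, not numpy arrays.
    from flash_attn import flash_attn_func
    HAS_FLASH = True
except ImportError:
    HAS_FLASH = False


def scaled_dot_product_attention(q, k, v):
    # Reference fallback: always available, hardware-agnostic.
    d = q.shape[-1]
    scores = (q @ k.swapaxes(-2, -1)) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v


def attention(q, k, v):
    # Dispatch to the optimized kernel when present, else the reference path.
    if HAS_FLASH:
        return flash_attn_func(q, k, v)
    return scaled_dot_product_attention(q, k, v)
```

This keeps the reference path as the source of truth for correctness, with the optimized kernel as a drop-in accelerator on CUDA-enabled hardware.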
- Released under Apache License 2.0, Sessa is available for immediate research and production use with clear installation and usage examples provided
Editorial Opinion
Sessa represents an interesting development in the ongoing architectural evolution of large language models, offering a hybrid approach that may address specific limitations both Transformers and Mamba face with long-context modeling. The open-source release accelerates community validation of the approach and enables rapid experimentation. However, Sessa's true value will depend on empirical benchmarks demonstrating concrete improvements over existing approaches; comprehensive comparisons on standard long-context tasks would substantially strengthen the case.