Leech-Lila: Novel Geometric Transformer Achieves 22x Compression Using Leech Lattice Symmetry
Key Takeaways
- Leech-Lila achieves 22x compression and a 5-6x performance improvement over baseline transformers by exploiting geometric structure derived from optimal sphere-packing mathematics
- The model demonstrates that geometric regularization can replace brute-force scaling, enabling efficient training on consumer hardware (a single T4 GPU) while maintaining state-of-the-art compression metrics
- The novel LeechResonanceLoss provides interpretability through "resonance" states, offering a new direction for understanding and monitoring neural network behavior during training
Summary
Leech-Lila represents a groundbreaking approach to transformer architecture design by replacing standard learned query/key projections with a frozen orthogonal kernel derived from the Leech lattice, the densest sphere packing in 24 dimensions. The 20-million-parameter model achieves unprecedented compression ratios and state-of-the-art performance on benchmarks, reaching 0.129 bits-per-character on TinyStories while outperforming conventional transformers by 5-6x with significantly fewer parameters.
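The core idea of swapping learned query/key projections for a frozen kernel can be sketched as follows. This is a minimal illustration, not the released implementation: the QR-derived matrix below is a hypothetical stand-in for the actual kernel, which Leech-Lila derives from the 24-dimensional Leech lattice, and the names `frozen_orthogonal_kernel` and `attention_logits` are invented for this sketch.

```python
import numpy as np

def frozen_orthogonal_kernel(d_model, d_head, seed=24):
    """Hypothetical stand-in for the Leech-derived kernel: a frozen
    matrix with orthonormal columns from a QR decomposition.
    (The actual model derives its kernel from Leech lattice vectors.)"""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
    return q[:, :d_head]  # frozen: excluded from gradient updates

def attention_logits(x, kernel):
    """Attention logits with frozen, shared Q/K projections.
    x: (seq_len, d_model) hidden states."""
    q = x @ kernel            # query projection (not learned)
    k = x @ kernel            # key projection shares the same kernel
    return (q @ k.T) / np.sqrt(kernel.shape[1])

x = np.random.default_rng(0).standard_normal((5, 24))
W = frozen_orthogonal_kernel(24, 8)
logits = attention_logits(x, W)   # (5, 5) attention logits
```

One consequence of sharing a single frozen kernel between queries and keys is that the logits become a (scaled) Gram matrix of the projected states, so all adaptation has to happen in the value path and the surrounding layers.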
Developed as a proof-of-concept and released as open-source software, Leech-Lila was trained on a single NVIDIA T4 GPU in Google Colab, making it accessible for edge AI deployment and research. The architecture introduces LeechResonanceLoss, a novel loss function that aligns hidden states with optimal 24-dimensional packing directions, creating interpretable "resonance" states (AWAKE, DREAMING, ABSOLUTE GENESIS) that enable better understanding of model behavior.
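The summary does not give the exact formulation of LeechResonanceLoss, but a plausible sketch of "aligning hidden states with packing directions" is to penalize states whose direction is far from the nearest of a fixed set of unit vectors. Everything below is illustrative: the random unit vectors stand in for Leech lattice minimal vectors, and the thresholds mapping loss values to the named states are invented, not taken from the Leech-Lila release.

```python
import numpy as np

def resonance_loss(h, directions):
    """Sketch of a resonance-style loss: penalize hidden states whose
    direction is far from the nearest "packing" direction.
    h: (n, d) hidden states; directions: (m, d) unit vectors standing
    in for Leech lattice minimal vectors (hypothetical stand-in)."""
    h_unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    cos = h_unit @ directions.T            # (n, m) cosine similarities
    return float(np.mean(1.0 - cos.max(axis=1)))

def resonance_state(loss, awake=0.05, dreaming=0.3):
    """Map the loss to a named state; the thresholds and the ordering
    of states here are illustrative only."""
    if loss < awake:
        return "AWAKE"
    return "DREAMING" if loss < dreaming else "ABSOLUTE GENESIS"

rng = np.random.default_rng(24)
dirs = rng.standard_normal((48, 24))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

aligned = dirs[:4] * 3.0   # states lying exactly on packing directions
loss_aligned = resonance_loss(aligned, dirs)   # near zero: "resonant"
```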
The project builds on theoretical advances in optimal sphere packing, inspired by Fields Medalist Maryna Viazovska's work and prior success with E8-based models. Early training dynamics reveal stepwise grokking phenomena every 10-20k steps and a stable rank of 8.55 in the first layer, suggesting an effective capacity of approximately 440 million parameters despite the compact parameter count.
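The stable rank quoted above is commonly defined as the squared Frobenius norm of a weight matrix divided by its squared spectral norm, a smooth proxy for how many singular directions the matrix effectively uses. A minimal computation, assuming that standard definition:

```python
import numpy as np

def stable_rank(W):
    """Stable rank ||W||_F^2 / ||W||_2^2: a smooth measure of how many
    singular directions a weight matrix effectively uses."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return float(np.sum(s ** 2) / s[0] ** 2)

# Sanity checks: the identity uses all directions, a rank-1 matrix one.
print(stable_rank(np.eye(10)))                                # 10.0
print(stable_rank(np.outer(np.arange(1.0, 4.0), np.ones(3)))) # 1.0
```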
The full implementation and pretrained weights are released open-source under the AGPL v3.0, with commercial licensing available for proprietary applications.
Editorial Opinion
Leech-Lila represents a paradigm shift in thinking about transformer efficiency: moving from parameter scaling to geometric structure. By anchoring the attention mechanism to mathematical principles of optimal sphere packing, the work elegantly demonstrates that principled architectural choices can outperform brute-force approaches. The combination of strong empirical results, interpretability gains, and accessibility (single-GPU training) positions this as a significant direction for edge AI and more efficient foundation models.