BotBeat

Independent Research · RESEARCH · 2026-04-15

LegendreGPT: Researchers Compress Transformer Model to 15.7 MB Using Orthogonal Polynomials

Key Takeaways

  • Legendre polynomial parameterization can compress transformer weights by expressing 22 layers through 6 coefficient matrices, reducing model size to 15.7 MB while maintaining reasonable performance
  • Orthogonal polynomials (Legendre) remain numerically stable at higher degrees, unlike monomials, making them suitable for deep transformer architectures
  • A two-group architecture prevents gradient conflicts when coefficients serve multiple layers, allowing the approach to scale effectively
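The stability claim in the second takeaway is easy to check numerically. The following sketch (illustrative, not from the LegendreGPT code) compares the conditioning of a monomial basis against a Legendre basis evaluated at the same points; the point count and degree are arbitrary choices for demonstration.

```python
import numpy as np
from numpy.polynomial import legendre

# At higher degrees, the monomial (Vandermonde) basis becomes
# ill-conditioned far faster than the Legendre basis on the same points.
x = np.linspace(-1, 1, 23)  # e.g. one point per layer position
deg = 10

monomial = np.vander(x, deg + 1)             # columns 1, x, x^2, ...
legendre_basis = legendre.legvander(x, deg)  # columns P_0(x), ..., P_deg(x)

print(np.linalg.cond(monomial))        # large: columns nearly collinear
print(np.linalg.cond(legendre_basis))  # orders of magnitude smaller
```

A badly conditioned basis makes the learned coefficients interfere with one another during optimization, which is why orthogonality matters once the polynomial degree grows.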
Source: Hacker News — https://github.com/sergimichi/parameter-golf/tree/legendregpt-submission/records/track_non_record_16mb/2026-03-31_LegendreGPT

Summary

A new research project called LegendreGPT demonstrates a novel approach to transformer compression by parameterizing layer weights using Legendre polynomial coefficients rather than storing weights directly. The technique compresses 22 transformer layers into just 6 coefficient matrices per weight type, achieving a total model size of 15.7 MB while maintaining reasonable performance with a validation bits-per-byte (bpb) of 1.2054. This represents the first application of orthogonal polynomial weight parameterization to transformer language models.

The core innovation involves expressing each weight matrix as a function of layer depth using Legendre polynomials—fixed mathematical functions whose coefficients are learned during training. This approach leverages the mathematical property of orthogonality to avoid the numerical instability that would occur with simpler polynomial parameterizations. The model uses two independent groups of 11 layers each to prevent gradient conflicts, supplemented by lightweight per-layer parameters for fine-tuning. Training required only 60,000 steps on a single RTX 5090 GPU (~27 hours) using FineWeb data.
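The parameterization described above can be sketched in a few lines. This is a hypothetical toy reconstruction under stated assumptions (hidden size, function names, and tensor shapes are all illustrative, not the authors' code): each group stores 6 coefficient matrices, and a layer's weight matrix is a Legendre-weighted combination of them.

```python
import numpy as np
from numpy.polynomial import legendre

d = 64                 # toy hidden size (the real model is larger)
degree = 5             # 6 coefficient matrices -> polynomial degree 5
layers_per_group = 11  # two independent groups of 11 layers each

rng = np.random.default_rng(0)
# One (degree+1, d, d) coefficient tensor per group; these are the
# learned parameters, replacing 11 full weight matrices each.
coeffs = [rng.normal(0, 0.02, size=(degree + 1, d, d)) for _ in range(2)]

def layer_weight(group: int, layer: int) -> np.ndarray:
    """Reconstruct the weight matrix of `layer` (0..10) in `group` (0..1)."""
    # Map the layer index to [-1, 1], the natural Legendre domain.
    x = -1.0 + 2.0 * layer / (layers_per_group - 1)
    # Evaluate P_0..P_degree at x; legvander returns shape (1, degree+1).
    basis = legendre.legvander(np.array([x]), degree)[0]
    # Weight = sum_k basis[k] * C_k: a linear mix of coefficient matrices.
    return np.tensordot(basis, coeffs[group], axes=1)  # shape (d, d)

W = layer_weight(0, 3)
print(W.shape)  # (64, 64)
```

Storage per group is 6 matrices instead of 11, before the per-layer fine-tuning parameters the article mentions; the two groups never share coefficients, which is how the design avoids one gradient signal serving all 22 layers.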

The compressed weights employ mixed-precision quantization (INT8 for lower-order polynomials, INT7 for higher orders) combined with LZMA compression. Additional architectural choices include grouped query attention, factorized embeddings, and RoPE positional encoding. The research demonstrates that aggressive compression through mathematical parameterization can be applied to transformers without catastrophic performance loss, opening new directions for model efficiency.

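The quantize-then-compress step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the split between INT8 and INT7 orders is a guess, and INT7 is emulated here by clipping to the 7-bit range inside int8 storage.

```python
import lzma
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the learned Legendre coefficient tensors (6 per weight type).
coeffs = rng.normal(0, 0.02, size=(6, 64, 64)).astype(np.float32)

def quantize(c: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Symmetric linear quantization of c to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(c).max()) / qmax
    q = np.clip(np.round(c / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

# INT8 for low-order coefficients (k = 0..2), INT7 for higher orders.
quantized = [quantize(coeffs[k], 8 if k < 3 else 7) for k in range(6)]
payload = b"".join(q.tobytes() for q, _ in quantized)

# LZMA shrinks the quantized bytes further because the value
# distribution is sharply peaked around zero.
compressed = lzma.compress(payload, preset=9)
print(len(payload), "->", len(compressed), "bytes")
```

Dequantization just multiplies each int tensor by its stored scale, so the extra decode cost at load time is one pass over the coefficients, not over 22 full layers.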

Editorial Opinion

This work represents an elegant application of classical mathematical techniques to modern deep learning, demonstrating that transformer compression need not rely solely on pruning or knowledge distillation. The orthogonal polynomial parameterization is particularly clever: it trades explicit weight storage for learned coefficients of fixed basis functions, much as a graphic equalizer describes an entire frequency response with a handful of band settings. While the 15.7 MB model trained on FineWeb is modest in scale, the approach is architecturally general and could plausibly be applied to larger models, making it a valuable contribution to the growing field of efficient AI.

Large Language Models (LLMs) · Deep Learning · MLOps & Infrastructure
