MIT Researchers Develop CompreSSM: A Technique to Compress AI Models During Training Rather Than After
Key Takeaways
- CompreSSM compresses models during training rather than post hoc, sidestepping the traditional trade-off between model size and performance
- The technique uses control-theoretic tools (Hankel singular values) to rank component importance, and the rankings stabilize after just 10% of training
- Compressed models trained up to 1.5x faster on image classification and about 4x faster on the Mamba architecture while maintaining competitive accuracy
Summary
Researchers at MIT's Computer Science and Artificial Intelligence Laboratory, in collaboration with the Max Planck Institute for Intelligent Systems, the European Laboratory for Learning and Intelligent Systems, ETH, and Liquid AI, have developed CompreSSM, a technique that compresses AI models during training rather than after. The method targets state-space models used in language processing, audio generation, and robotics, applying mathematical tools from control theory to identify and remove unnecessary components early in training. Using Hankel singular values to measure the importance of internal states, the team showed that component rankings stabilize after just 10 percent of training, allowing the remaining 90 percent to proceed at the speed of a much smaller model. The results are striking: compressed models maintained nearly identical accuracy to their full-sized counterparts while training up to 1.5 times faster on image classification tasks, and achieved roughly 4x training speedups on the Mamba architecture while reducing state dimensionality from 128 to 12.
The approach reduces computational resources, energy costs, and training time without requiring an oversized model to be trained first.
Editorial Opinion
CompreSSM represents a paradigm shift in model optimization by integrating compression into the learning process itself rather than treating it as a post-hoc engineering problem. This work has significant implications for democratizing AI development, as it reduces the computational barriers to training performant models. The theoretical grounding using control theory provides a principled foundation that could inspire similar innovations across other model architectures beyond state-space models.
