Research: Routing Information in MoE Models Leaks Text with 91% Accuracy
Key Takeaways
- ▸Expert routing patterns in MoE models reconstruct 91.2% of tokens with a trained transformer decoder, compared to ~50% with prior logistic regression approaches
- ▸This vulnerability applies to any MoE deployment where routing decisions are observable, including distributed inference and side-channel scenarios
- ▸Noise injection reduces leakage but does not eliminate it, suggesting defenses require fundamental architectural changes beyond simple noise addition
Summary
A new arXiv paper reveals that expert routing decisions in Mixture-of-Experts (MoE) language models leak substantially more information than previously understood. Researchers demonstrate that routing patterns alone—without access to model weights or activations—can reconstruct tokens with 91.2% top-1 accuracy and 94.8% top-10 accuracy using a transformer-based decoder trained on just 100M tokens. This text-reconstruction attack shows that MoE routing patterns contain comparable information density to the text itself, representing a significant privacy vulnerability in distributed inference and side-channel scenarios.
The research escalates prior work using logistic regression (which achieved only limited reconstruction) by deploying deeper neural architectures. Notably, adding noise to routing decisions reduces but does not eliminate the leakage. The findings suggest that MoE deployments—increasingly common across the industry—should treat expert routing information as sensitive as the underlying text, with implications for federated learning, API architectures, and edge inference scenarios.
- The finding connects MoE security to broader embedding inversion literature and suggests routing information should be treated as sensitive as model outputs
Editorial Opinion
This research exposes a fundamental privacy weakness in a core architecture that multiple AI labs are increasingly adopting. While the paper is technically sophisticated, the practical implications are stark: companies deploying distributed MoE inference cannot treat routing patterns as non-sensitive. The 91% reconstruction rate suggests that for real-world MoE deployments, inference-time privacy is worse than previously believed—a sobering finding that should trigger architectural rethinks beyond simple noise-addition defenses.



