Research: Routing Information in MoE Models Leaks Text with 91% Accuracy

Key Takeaways

▸Expert routing patterns in MoE models can be decoded back into text: a trained transformer decoder reconstructs 91.2% of tokens, compared with ~50% for prior logistic-regression approaches
▸This vulnerability applies to any MoE deployment where routing decisions are observable, including distributed inference and side-channel scenarios
▸Noise injection reduces the leakage but does not eliminate it, suggesting that effective defenses require fundamental architectural changes beyond simple noise addition

Source:

Hacker News: https://arxiv.org/abs/2602.04105

Summary

A new arXiv paper reports that expert routing decisions in Mixture-of-Experts (MoE) language models leak substantially more information than previously understood. The researchers show that routing patterns alone, without access to model weights or activations, can be used to reconstruct tokens with 91.2% top-1 accuracy and 94.8% top-10 accuracy, using a transformer-based decoder trained on just 100M tokens. The result indicates that MoE routing patterns carry information comparable to the text itself, a significant privacy vulnerability in distributed inference and side-channel scenarios.
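For readers unfamiliar with what a "routing pattern" is, here is a minimal sketch of what an observer of a standard top-k MoE gate would see: for each token and each layer, the indices of the experts the token was sent to. It also shows how top-1 and top-10 accuracy are scored. Everything here is an assumption for illustration: the toy sizes, the random router weights, and the function names (`top_k_experts`, `routing_trace`, `top_k_accuracy`) are hypothetical, and this is neither the paper's code nor its transformer decoder.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical toy dimensions for illustration; real MoE models are far larger.
NUM_LAYERS = 4
NUM_EXPERTS = 8
TOP_K = 2
HIDDEN = 16

Trace = Tuple[Tuple[int, ...], ...]  # per layer: sorted indices of the chosen experts


def top_k_experts(logits: Sequence[float], k: int) -> Tuple[int, ...]:
    """Indices of the k largest router logits. Softmax is monotonic, so this is
    the same set a softmax top-k gate would select."""
    order = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return tuple(sorted(order[:k]))


def make_router(seed: int) -> List[List[List[float]]]:
    """Random router weights, shape [layer][expert][hidden] (toy stand-in for trained gates)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
            for _ in range(NUM_LAYERS)]


def routing_trace(hidden: Sequence[float], router: List[List[List[float]]]) -> Trace:
    """The routing decisions an observer would see for one token.
    Toy simplification: the same hidden vector is fed to every layer's router,
    whereas a real model routes a different, context-dependent state per layer."""
    trace = []
    for layer in router:
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in layer]
        trace.append(top_k_experts(logits, TOP_K))
    return tuple(trace)


def top_k_accuracy(ranked_guesses: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of positions whose true token is among the decoder's first k guesses.
    Top-1 and top-10 accuracy are this function with k=1 and k=10."""
    hits = sum(1 for guesses, t in zip(ranked_guesses, truth) if t in guesses[:k])
    return hits / len(truth)


if __name__ == "__main__":
    router = make_router(seed=0)
    rng = random.Random(1)
    for token in ["the", "cat", "sat"]:
        hidden = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]  # toy stand-in for a hidden state
        print(token, routing_trace(hidden, router))
    print(top_k_accuracy([["cat", "dog"], ["sat", "sit"]], ["cat", "sit"], k=1))  # 0.5
    print(top_k_accuracy([["cat", "dog"], ["sat", "sit"]], ["cat", "sit"], k=2))  # 1.0
```

Top-10 accuracy can never be lower than top-1 accuracy, since it counts a hit whenever the true token appears anywhere in the first ten guesses; the reported 91.2% and 94.8% figures follow that pattern.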

The research builds on prior work that used logistic regression (which achieved only limited reconstruction) by deploying deeper neural architectures. Notably, adding noise to routing decisions reduces but does not eliminate the leakage. The findings suggest that MoE deployments, which are increasingly common across the industry, should treat expert routing information as being as sensitive as the underlying text, with implications for federated learning, API architectures, and edge inference.
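To see how noise can reduce leakage without eliminating it, here is a toy simulation under loudly stated assumptions: each token gets a fixed, unique top-1 routing trace across layers (real routing is context-dependent); noise replaces each layer's expert with a uniformly random one with probability `p` (the article does not describe the paper's exact noise mechanism); and the attacker uses a simple nearest-match lookup in place of the paper's trained transformer decoder. All names and sizes are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical toy parameters for illustration only; none are taken from the paper.
NUM_LAYERS = 12
NUM_EXPERTS = 8
VOCAB_SIZE = 100

Trace = Tuple[int, ...]  # one chosen expert per layer (top-1 routing, a simplification)


def make_unique_traces(rng: random.Random) -> List[Trace]:
    """Give every toy token its own fixed routing trace (context is ignored)."""
    seen = set()
    traces: List[Trace] = []
    while len(traces) < VOCAB_SIZE:
        t = tuple(rng.randrange(NUM_EXPERTS) for _ in range(NUM_LAYERS))
        if t not in seen:  # re-sample on collision so traces are distinct
            seen.add(t)
            traces.append(t)
    return traces


def add_routing_noise(trace: Trace, p: float, rng: random.Random) -> Trace:
    """Assumed noise model: each layer's expert is replaced by a uniformly
    random expert with probability p (it may coincide with the original)."""
    return tuple(rng.randrange(NUM_EXPERTS) if rng.random() < p else e for e in trace)


def decode(noisy: Trace, traces: List[Trace]) -> int:
    """Nearest-match lookup: the token whose clean trace agrees with `noisy`
    on the most layers. With uniformly drawn tokens and p < 1, this is the
    maximum-likelihood guess under the noise model above (ties go to the lowest id)."""
    return max(range(len(traces)),
               key=lambda tok: sum(a == b for a, b in zip(noisy, traces[tok])))


def accuracy_under_noise(p: float, trials: int = 500, seed: int = 0) -> float:
    """Top-1 reconstruction accuracy of `decode` when routing is noised at rate p."""
    rng = random.Random(seed)
    traces = make_unique_traces(rng)
    hits = 0
    for _ in range(trials):
        token = rng.randrange(VOCAB_SIZE)  # tokens drawn uniformly
        guess = decode(add_routing_noise(traces[token], p, rng), traces)
        hits += guess == token
    return hits / trials


if __name__ == "__main__":
    print(f"chance level: {1 / VOCAB_SIZE:.3f}")
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"noise p={p:.2f}  top-1 accuracy={accuracy_under_noise(p):.3f}")
```

In this toy, accuracy tends to degrade as `p` grows and only approaches chance level when routing is fully randomized (`p = 1`), because the decoder needs the true token to win a plurality of layer-by-layer agreements, not all of them. In a real model, randomizing routing would also send tokens to experts the gate did not choose, which presumably costs output quality; that trade-off is not modeled here.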

The finding also connects MoE security to the broader embedding-inversion literature, placing routing information in the same sensitivity class as model outputs.

Editorial Opinion

This research exposes a fundamental privacy weakness in an architecture that a growing number of AI labs are adopting. The paper is technically sophisticated, but its practical implications are stark: companies deploying distributed MoE inference cannot treat routing patterns as non-sensitive. The 91% reconstruction rate suggests that inference-time privacy in real-world MoE deployments is worse than previously believed, a sobering finding that should prompt architectural rethinking beyond simple noise-addition defenses.

RESEARCH | Anthropic | 2026-06-07
