
Hugging Face · RESEARCH · 2026-05-13

Researchers Achieve Stable Training of 1000-Layer Diffusion Transformers Using Mean-Variance Split Residuals

Key Takeaways

  • Identified 'Mean Mode Screaming' (MMS) as a geometric instability triggered by mean-coherent gradients that causes ultra-deep diffusion models to collapse into mean-dominated states
  • Proposed Mean-Variance Split (MV-Split) Residuals that decouple mean and centered gradient updates, enabling stable training while preserving convergence speed
  • Successfully trained a 1000-layer Diffusion Transformer, pushing the practical limits of diffusion transformer scaling
Source: Hacker News · https://huggingface.co/papers/2605.06169

Summary

A breakthrough research paper selected as Hugging Face's #1 Paper of the Day identifies and solves a critical stability problem that emerges when scaling Diffusion Transformers to extreme depths. The research reveals that ultra-deep diffusion models suffer from "Mean Mode Screaming" (MMS), a phenomenon in which token representations collapse into a mean-dominated state after thousands of apparently stable training steps, causing sudden divergence and loss of learned features.
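The mean-dominated collapse described here could in principle be monitored with a simple energy-ratio diagnostic: decompose each layer's token representations into a mean mode and a centered remainder, then track what share of the total energy the mean mode carries. The sketch below is illustrative only; the choice of the token dimension as the mean axis and the function name are assumptions, not the paper's actual diagnostic.

```python
import torch

def mean_mode_energy_fraction(x: torch.Tensor) -> torch.Tensor:
    """Fraction of representation energy carried by the mean over tokens.

    x: activations of shape (batch, tokens, channels). Because the
    centered part sums to zero over tokens, the two modes are orthogonal
    and their energies add up to ||x||^2. A value drifting toward 1.0
    would indicate an MMS-like, mean-dominated state.
    (Hypothetical diagnostic; not taken from the paper.)
    """
    mean = x.mean(dim=1, keepdim=True)            # mean mode across tokens
    centered = x - mean                           # centered (signal) mode
    mean_energy = mean.pow(2).sum() * x.shape[1]  # mean replicated T times
    centered_energy = centered.pow(2).sum()
    return mean_energy / (mean_energy + centered_energy + 1e-12)
```

Logging this fraction per layer during training would make the slow drift the paper reports visible well before the loss itself diverges.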

To address this structural vulnerability, researcher Pengqi Lu proposes Mean-Variance Split (MV-Split) Residuals, a technique that decouples the mean and centered components of residual updates. Unlike existing depth stabilizers that uniformly dampen both components, MV-Split allows the signal-bearing centered mode to train at full strength while regulating the mean path, preventing collapse while maintaining convergence speed.
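The article does not give MV-Split's exact formulation, but the decoupling it describes can be sketched as a residual wrapper that splits each block's update into mean and centered parts and scales them independently. In the minimal PyTorch sketch below, the class name MVSplitResidual, the damping factor alpha_mean, and the use of the token dimension as the mean axis are all illustrative assumptions rather than the paper's method.

```python
import torch
import torch.nn as nn

class MVSplitResidual(nn.Module):
    """Residual wrapper in the spirit of MV-Split as summarized above.

    The centered (signal-bearing) component of the block's update passes
    at full strength, while the mean component is regulated by a fixed
    gain. All specifics here are assumptions, not the paper's recipe.
    """
    def __init__(self, block: nn.Module, alpha_mean: float = 0.1):
        super().__init__()
        self.block = block
        self.alpha_mean = alpha_mean  # hypothetical damping on the mean path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = self.block(x)                    # raw residual update
        mean = update.mean(dim=1, keepdim=True)   # mean mode across tokens
        centered = update - mean                  # centered mode, full strength
        return x + centered + self.alpha_mean * mean
```

Wrapping every transformer block this way would regulate the mean path uniformly across depth, matching the article's description of damping the mean component without slowing the centered one.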

The paper demonstrates the solution's effectiveness by successfully training a 1000-layer Diffusion Transformer, a scale at which standard approaches fail catastrophically. Model weights are now publicly available on Hugging Face, along with an interactive gradient-diagnosis viewer that visualizes the training dynamics that previously caused divergence, making this a significant contribution to scaling deep generative models.


Editorial Opinion

This mechanistic research paper represents exactly the kind of deep architectural analysis the field needs as we push generative models to extreme scales. By precisely identifying the root cause of failure and proposing a targeted solution rather than generic regularization, the authors advance our understanding of why deep networks behave as they do. The public release of 1000-layer weights and visualization tools will likely spawn follow-up work on scaling and stability.

Generative AI · Machine Learning · Deep Learning · Open Source
