BotBeat

Meta
RESEARCH · 2026-06-01

Déjà View: Looping Transformers Achieve 3D Reconstruction with 8–10× Fewer Parameters

Key Takeaways

  • Déjà View loops a single transformer block instead of scaling model size, cutting parameter count by 8–10× while matching larger baselines
  • The number of refinement steps (K) is exposed as an inference-time compute knob, letting users balance reconstruction quality against computational cost
  • Explicit iteration proved to be a stronger inductive bias than raw model capacity for multi-view 3D reconstruction
Source: Hacker News, https://research.nvidia.com/labs/dvl/projects/dvlt/

Summary

Researchers have introduced Déjà View, a 3D reconstruction architecture that challenges the prevailing scaling paradigm by replacing ever-larger feed-forward transformers with a single transformer block applied repeatedly. With just 117M parameters, Déjà View matches or exceeds billion-parameter baselines while using 8–10× fewer parameters and 1.9–2.3× less compute across five benchmarks spanning indoor scenes, outdoor environments, object-centric captures, and driving scenarios.
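
As a rough sanity check on these figures, the 8–10× ratio can be turned back into an implied baseline size. The sketch below assumes the ratio is measured against the 117M-parameter model; the helper name is illustrative and does not come from the paper.

```python
def implied_baseline_params(model_params: int, reduction_factor: int) -> int:
    """Baseline size implied by an N-times parameter reduction (illustrative helper)."""
    return model_params * reduction_factor


DEJA_VIEW_PARAMS = 117_000_000  # 117M, as reported in the summary

low = implied_baseline_params(DEJA_VIEW_PARAMS, 8)
high = implied_baseline_params(DEJA_VIEW_PARAMS, 10)
print(f"Implied baseline size: {low / 1e9:.2f}B to {high / 1e9:.2f}B parameters")
```

Under that assumption the baselines land at roughly 0.94B to 1.17B parameters, which is consistent with the "billion-parameter" description.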

The key insight underpinning Déjà View is that transformer layers often behave as repeated applications of similar operations, and multi-view reconstruction networks refine their predictions progressively through depth. Rather than inefficiently capturing this through unique parameters at each layer, Déjà View makes iteration explicit in the architecture, exposing the number of refinement steps (K) as an inference-time compute knob. This allows users to dynamically trade computational resources against reconstruction quality from a single trained checkpoint.
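
The idea of a step count K acting as a compute knob can be illustrated with a toy loop. This is a minimal sketch, not the Déjà View architecture: the "block" here is a hypothetical scalar residual update standing in for a real transformer block, and the names `make_shared_block` and `refine` are made up for illustration. What it does share with the described design is that one set of parameters is reused at every step and K is chosen by the caller at inference time.

```python
from typing import Callable, List

Vector = List[float]


def make_shared_block(weight: float, target: float) -> Callable[[Vector], Vector]:
    """Build one toy block whose parameters are reused at every refinement step."""
    def block(x: Vector) -> Vector:
        # Residual update x + f(x); f is a trivial stand-in for the attention
        # and MLP sub-blocks of a real transformer block.
        return [xi + weight * (target - xi) for xi in x]
    return block


def refine(x: Vector, block: Callable[[Vector], Vector], num_steps: int) -> Vector:
    """Apply the same block num_steps times (the K knob)."""
    for _ in range(num_steps):
        x = block(x)
    return x


block = make_shared_block(weight=0.5, target=1.0)
for k in (1, 2, 3):
    print(k, refine([0.0], block, num_steps=k))
```

In this toy the distance to the target halves with each extra step, so a larger K buys a better estimate at proportionally higher cost, with no change to the parameters.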

The model initializes per-view features from a pretrained DINOv2 encoder and applies a transformer block with frame and global attention sub-blocks recurrently. Because step counts are sampled during training from a defined range, one checkpoint supports any inference step count. Testing revealed that explicit looped iteration outperforms an otherwise identical variant with independent per-step parameters, suggesting that architectural iteration provides a stronger inductive bias than raw capacity.
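
Two points in this paragraph lend themselves to a small sketch: sampling the step count during training, and the parameter gap between a looped block and a variant with independent per-step parameters. The range bounds and the per-block size below are hypothetical placeholders, since the summary gives neither, and the function names are illustrative.

```python
import random


def sample_num_steps(rng: random.Random, k_min: int, k_max: int) -> int:
    """Draw the number of loop iterations for one training batch (inclusive range)."""
    return rng.randint(k_min, k_max)


def block_parameter_count(params_per_block: int, num_steps: int, shared: bool) -> int:
    """Parameters spent on the iterated blocks: one copy if shared, one per step otherwise."""
    return params_per_block if shared else params_per_block * num_steps


rng = random.Random(0)
K_MIN, K_MAX = 2, 8  # hypothetical range; the summary does not state the actual one
sampled_steps = [sample_num_steps(rng, K_MIN, K_MAX) for _ in range(1000)]

PARAMS_PER_BLOCK = 10_000_000  # illustrative size, not from the source
looped = block_parameter_count(PARAMS_PER_BLOCK, num_steps=K_MAX, shared=True)
unrolled = block_parameter_count(PARAMS_PER_BLOCK, num_steps=K_MAX, shared=False)
print(f"looped: {looped:,} parameters, unrolled: {unrolled:,} parameters")
```

The unrolled variant has K times the block parameters yet, per the reported result, performs worse, which is the basis for the claim that iteration is the stronger inductive bias.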

  • At 117M parameters, Déjà View achieves state-of-the-art inlier ratio and pose accuracy across all five benchmarks

Editorial Opinion

Déjà View is a valuable counterpoint to the scaling-centric narrative that has dominated AI progress. As vision transformers have swollen to billions of parameters, this work demonstrates that thoughtful architectural design—making computational patterns explicit—can outperform brute-force scale. The inference-time compute knob is particularly compelling: it suggests we're moving toward AI systems that adapt to hardware constraints rather than demanding exponentially more resources. If this efficiency trend holds across other domains, it could democratize access to state-of-the-art computer vision.

Computer Vision, Machine Learning, Deep Learning, MLOps & Infrastructure
