BotBeat

OPEN SOURCE | Meta | 2026-05-17

Llama.cpp b9180 Adds MTP Support with Advanced Speculative Decoding Optimization

Key Takeaways

  • MTP support enables efficient partial rollback for speculative decoding, eliminating wasteful full-context restarts
  • Multi-backend optimization across Metal, Vulkan, CUDA, ROCm, OpenVINO, and SYCL ensures broad hardware support
  • Gated Delta Networks (GDN) intermediate state storage enables selective token rollback up to configurable limits
Source: Hacker News, https://github.com/ggml-org/llama.cpp/releases/tag/b9180

Summary

The open-source llama.cpp project has added MTP (multi-token prediction) support in release b9180, extending its speculative decoding capabilities for LLM inference. The update adds a partial rollback mechanism for Gated Delta Networks (GDN): intermediate states are checkpointed while draft tokens are processed, so rejected draft tokens can be rolled back selectively, up to draft_max tokens, instead of forcing a costly full-context restart. The implementation spans multiple backend platforms, including Metal (macOS on Apple Silicon, with KleidiAI optimization), Vulkan, CUDA (versions 12 and 13), ROCm, OpenVINO, and SYCL.
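The checkpoint-and-rollback idea described above can be illustrated with a toy sketch. This is not llama.cpp code: `RecurrentState`, `DraftCheckpoints`, and the update rule in `step` are hypothetical names and simplifications chosen for illustration, and only `draft_max` comes from the release description. The sketch assumes a recurrent state that cannot be truncated the way a key-value cache can, so one snapshot is stored per draft token and the state is restored to the snapshot matching the accepted prefix.

```python
from dataclasses import dataclass, field


@dataclass
class RecurrentState:
    """Toy stand-in for a recurrent layer state (a single float here)."""
    value: float = 0.0

    def step(self, token: int) -> None:
        # Hypothetical update rule; a real GDN layer applies a gated delta update.
        self.value = 0.5 * self.value + token


@dataclass
class DraftCheckpoints:
    """Stores one state snapshot per draft token, at most draft_max of them."""
    draft_max: int
    snapshots: list = field(default_factory=list)

    def begin(self, state: RecurrentState) -> None:
        # Snapshot 0 is the state before any draft token was processed.
        self.snapshots = [state.value]

    def advance(self, state: RecurrentState, token: int) -> None:
        # len(snapshots) - 1 draft tokens have been processed so far.
        if len(self.snapshots) > self.draft_max:
            raise ValueError("draft exceeds draft_max")
        state.step(token)
        self.snapshots.append(state.value)

    def rollback(self, state: RecurrentState, n_accepted: int) -> None:
        # Restore the state as it was after the first n_accepted draft tokens.
        if not 0 <= n_accepted < len(self.snapshots):
            raise ValueError("n_accepted out of range")
        state.value = self.snapshots[n_accepted]
        del self.snapshots[n_accepted + 1:]
```

In this sketch, rejecting the tail of a draft costs one snapshot restore rather than a replay of the whole context, which is the saving the summary attributes to partial rollback. The memory cost grows with `draft_max`, which is presumably why the limit is configurable.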

  • Compatibility verified with n-gram and other speculative decoding methods for flexible deployment
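For readers unfamiliar with n-gram drafting, the following is a minimal sketch of the general technique, assuming a simple "look up the last n tokens earlier in the context" strategy. The function name `ngram_draft` and its parameters are hypothetical and do not reflect llama.cpp's actual implementation.

```python
def ngram_draft(tokens, n, draft_max):
    """Propose up to draft_max tokens by finding the most recent earlier
    occurrence of the last n tokens and copying what followed it."""
    if n <= 0 or len(tokens) <= n:
        return []
    key = tokens[-n:]
    # Scan backwards, skipping the suffix itself (which starts at len - n).
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == key:
            return tokens[start + n:start + n + draft_max]
    return []
```

The proposed tokens are only a guess: the target model still verifies them, and any rejected suffix has to be rolled back, which is where a partial rollback mechanism would come into play.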
Large Language Models (LLMs), Generative AI, MLOps & Infrastructure, Open Source
