llamafile 0.10.0 Released: Rebuilt Framework Now Supports Qwen3.5, Tool Calling, and Anthropic API
Key Takeaways
- ▸llamafile 0.10.0 fully rebuilt to maintain portability while supporting the latest llama.cpp models and features
- ▸Now supports Qwen3.5 multimodal, LFM2 tool calling, and the Anthropic Messages API for local Claude-compatible inference
- ▸Maintains cross-platform APE executable format running on multiple OSes and CPU architectures with CUDA support
Summary
llamafile 0.10.0 has been released with a complete rebuild that preserves the project's core mission of portable, executable-bundled AI models while incorporating the latest features from llama.cpp. The new version supports advanced capabilities including Qwen3.5 vision models, LFM2 tool calling, and integration with Anthropic's Messages API, allowing users to run Claude-compatible models locally from a single executable file.
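To illustrate what Claude-compatible local inference looks like, the sketch below builds an Anthropic-style Messages API request aimed at a local server. This is an assumption-laden sketch, not confirmed llamafile 0.10.0 behavior: the port (8080), model name, and exact endpoint defaults are placeholders; only the request shape (`POST /v1/messages` with `model`, `max_tokens`, and `messages`) follows Anthropic's published API.

```python
import json
import urllib.request

# Assumption: the local server listens on port 8080; adjust to your setup.
BASE_URL = "http://localhost:8080"

def build_messages_request(model: str, prompt: str, max_tokens: int = 256):
    """Build an Anthropic Messages API-style request for a local endpoint.

    The payload shape mirrors Anthropic's Messages API: a model name,
    a max_tokens cap, and a list of role/content message objects.
    """
    payload = {
        "model": model,            # placeholder model name, not a real default
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )

# Constructing the request does not open a connection; sending it
# (urllib.request.urlopen(req)) requires a running local server.
req = build_messages_request("qwen3.5", "Say hello in one sentence.")
print(req.full_url)                                  # http://localhost:8080/v1/messages
print(json.loads(req.data)["messages"][0]["role"])   # user
```

Because the endpoint speaks the same wire format as Anthropic's hosted API, existing Claude client code should in principle only need its base URL pointed at the local server.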
The rebuild architecture uses a polyglot approach combining llama.cpp's extensive model support with llamafile's signature portability across different operating systems and CPU architectures. Users can now run multimodal models in terminal chat interfaces, leverage CUDA GPU acceleration on Linux, and benefit from CPU optimizations for various architectures. The project maintains backward compatibility by preserving older releases and model weights on HuggingFace.
The team plans to continue improving feature parity with older versions, simplify the bundling process through a forthcoming llamafile-builder application, and add Vulkan support for broader GPU compatibility. Pre-built llamafiles covering a range of model sizes (0.6B to 27B parameters) and capabilities are available, with options to load custom GGUF model files directly.
Editorial Opinion
llamafile 0.10.0 represents a pragmatic evolution of the project, successfully balancing portability with feature completeness by adopting llama.cpp as its foundation. The integration of Anthropic's Messages API and support for cutting-edge models like Qwen3.5 positions llamafile as a serious tool for local AI deployment, though the emphasis on community feedback and feature parity suggests the team is still iterating toward the ideal user experience. The planned llamafile-builder application could be transformative if it lowers barriers to creating custom bundled executables.

