Llamafile v0.10.0 Released: Ground-Up Rebuild Brings Full llama.cpp Feature Parity and Multimodal Support
Key Takeaways
- Llamafile v0.10.0 completely rebuilds the project to maintain closer synchronization with llama.cpp updates while preserving core portability and bundling features
- The new version supports cutting-edge models and capabilities including multimodal vision (Qwen3.5), tool calling, Claude API integration, and CUDA GPU acceleration
- Mozilla is releasing llamafile-builder to democratize bundling, enabling users to easily create custom llamafiles with their own model combinations
Summary
Mozilla has announced the release of llamafile v0.10.0, a significant update that rebuilds the project from the ground up so it can keep pace more easily with upstream llama.cpp. The new version combines llamafile's signature features—portability across operating systems and CPU architectures, plus the ability to bundle model weights directly into executables—with the full feature set of the latest llama.cpp versions.
The v0.10.0 release introduces substantial capabilities, including multimodal model support (including Qwen3.5 for vision), tool calling, the Anthropic Messages API for running Claude Code against local models, and multiple interfaces (CLI, HTTP server, and terminal chat). The update also brings CUDA GPU support on Linux and CPU optimizations for different architectures, and it retains APE (Actually Portable Executable) functionality for cross-platform compatibility out of the box.
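Because the server speaks the Anthropic Messages API, existing Claude-compatible clients can be pointed at a local llamafile. The sketch below builds such a request with Python's standard library; the host, port (localhost:8080), and model name ("local") are assumptions for illustration—check your llamafile's --help output for the actual server flags and endpoint.

```python
# Sketch: constructing an Anthropic Messages API request aimed at a local
# llamafile HTTP server. Hypothetical assumptions: the server listens on
# localhost:8080, exposes /v1/messages, and accepts the model name "local".
import json
import urllib.request

def build_messages_request(prompt, host="http://localhost:8080", model="local"):
    """Build a POST request in the Anthropic Messages API shape."""
    body = {
        "model": model,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{host}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )

req = build_messages_request("Summarize llamafile in one sentence.")
# To actually send it (requires a running server): urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```

Only the request is constructed here; no network call is made, so the snippet runs even without a server. Swapping the base URL is all a Claude-compatible client would need to target the local model instead of Anthropic's hosted API.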
Mozilla is providing pre-built llamafiles covering various model sizes (0.6B to 27B parameters) and capabilities, while also introducing llamafile-builder, a forthcoming application designed to simplify the process of bundling custom llamafiles. The roadmap includes achieving full feature parity with the older version, easier bundling mechanisms, Vulkan support, and ongoing bug fixes.
The APE executable format is what enables this cross-platform compatibility, allowing a single binary to run across multiple operating systems and CPU architectures without modification.
Editorial Opinion
Llamafile v0.10.0 represents a thoughtful architectural decision to embrace llama.cpp's momentum while preserving the distinctive value proposition of portable, self-contained AI inference. By rebuilding from the ground up rather than fighting upstream changes, Mozilla has created a more sustainable path forward. The emphasis on tooling (llamafile-builder) to lower the friction of creating custom executables could meaningfully expand adoption beyond developers, potentially democratizing local AI deployment.