llamafile 0.10.0 Released: Rebuilt Framework Now Supports Qwen3.5, Tool Calling, and Anthropic API
Key Takeaways
- ▸llamafile 0.10.0 fully rebuilt to maintain portability while supporting the latest llama.cpp models and features
- ▸Now supports Qwen3.5 multimodal, LFM2 tool calling, and the Anthropic Messages API for local Claude-compatible inference
- ▸Maintains cross-platform APE executable format running on multiple OSes and CPU architectures with CUDA support
Summary
llamafile 0.10.0 has been released with a complete rebuild that preserves the project's core mission of portable, executable-bundled AI models while incorporating the latest features from llama.cpp. The new version supports advanced capabilities including Qwen3.5 vision models, LFM2 tool calling, and integration with Anthropic's Messages API, allowing users to run Claude-compatible models locally from a single executable file.
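To illustrate what Claude-compatible local inference looks like, the sketch below builds an Anthropic-style Messages API request aimed at a local server. This is an assumption-laden sketch, not confirmed llamafile 0.10.0 behavior: the port (8080), model name, and exact endpoint defaults are placeholders; only the request shape (`POST /v1/messages` with `model`, `max_tokens`, and `messages`) follows Anthropic's published API.

```python
import json
import urllib.request

# Assumption: the local server listens on port 8080; adjust to your setup.
BASE_URL = "http://localhost:8080"

def build_messages_request(model: str, prompt: str, max_tokens: int = 256):
    """Build an Anthropic Messages API-style request for a local endpoint.

    The payload shape mirrors Anthropic's Messages API: a model name,
    a max_tokens cap, and a list of role/content message objects.
    """
    payload = {
        "model": model,            # placeholder model name, not a real default
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )

# Constructing the request does not open a connection; sending it
# (urllib.request.urlopen(req)) requires a running local server.
req = build_messages_request("qwen3.5", "Say hello in one sentence.")
print(req.full_url)                                  # http://localhost:8080/v1/messages
print(json.loads(req.data)["messages"][0]["role"])   # user
```

Because the endpoint speaks the same wire format as Anthropic's hosted API, existing Claude client code should in principle only need its base URL pointed at the local server.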
The rebuild architecture uses a polyglot approach combining llama.cpp's extensive model support with llamafile's signature portability across different operating systems and CPU architectures. Users can now run multimodal models in terminal chat interfaces, leverage CUDA GPU acceleration on Linux, and benefit from CPU optimizations for various architectures. The project maintains backward compatibility by preserving older releases and model weights on HuggingFace.
The team plans to continue improving feature parity with older versions, simplify the bundling process through a forthcoming llamafile-builder application, and add Vulkan support for broader GPU compatibility. Pre-built llamafiles covering a range of model sizes (0.6B to 27B parameters) and capabilities are available, with options to load custom GGUF model files directly.
Editorial Opinion
llamafile 0.10.0 represents a pragmatic evolution of the project, successfully balancing portability with feature completeness by adopting llama.cpp as its foundation. The integration of Anthropic's Messages API and support for cutting-edge models like Qwen3.5 positions llamafile as a serious tool for local AI deployment, though the emphasis on community feedback and feature parity suggests the team is still iterating toward the ideal user experience. The planned llamafile-builder application could be transformative if it lowers barriers to creating custom bundled executables.

