Anubis OSS: Open-Source Benchmark Dataset Reveals Real-World LLM Performance on Apple Silicon

Key Takeaways

▸Anubis OSS is a native macOS tool that correlates LLM inference performance with real-time hardware telemetry (power, GPU/CPU/ANE utilization, memory) across Apple Silicon chips from M1 to M5
▸The project aims to build an open, community-sourced dataset covering the massive matrix of chip × memory × backend × quantization configurations that formal benchmarks don't address
▸Each benchmark run takes two minutes and captures detailed metrics including watts-per-token efficiency, Metal allocations, and thermal state—data useful for backend optimization and quantization research
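The article does not define "watts-per-token". One plausible reading is average power divided by throughput in tokens per second, which works out to joules per token. The sketch below uses that reading. The `RunSample` type and its field names are hypothetical and are not taken from Anubis.

```python
from dataclasses import dataclass


@dataclass
class RunSample:
    """One hypothetical benchmark run (illustrative fields, not Anubis's schema)."""
    avg_power_watts: float   # mean power draw during generation
    duration_s: float        # wall-clock length of the generation window
    tokens_generated: int    # completion tokens produced in that window


def tokens_per_second(run: RunSample) -> float:
    """Throughput over the generation window."""
    if run.duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return run.tokens_generated / run.duration_s


def joules_per_token(run: RunSample) -> float:
    """Energy cost of one token: (watts * seconds) / tokens.

    Algebraically the same as avg_power_watts / tokens_per_second.
    """
    if run.tokens_generated <= 0:
        raise ValueError("tokens_generated must be positive")
    return run.avg_power_watts * run.duration_s / run.tokens_generated
```

Under this reading, a run that averages 30 W for 10 s and produces 600 tokens costs 0.5 J per token at 60 tokens per second.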

Source: Hacker News, https://devpadapp.com/anubis-oss.html

Summary

Developer uncSoft has launched Anubis OSS, a native macOS benchmarking tool and accompanying open dataset designed to measure real-world LLM performance on Apple Silicon. The SwiftUI application correlates inference metrics with hardware telemetry—including GPU/CPU utilization, power consumption, and memory pressure—across any OpenAI-compatible backend like Ollama, LM Studio, and mlx-lm. Unlike synthetic benchmarks or limited reviewer tests, Anubis aims to build a community-sourced dataset covering the full matrix of chip configurations (M1 through M5), memory sizes, quantization schemes, and thermal conditions.

The tool addresses a critical gap in the local AI ecosystem: fragmented tooling that focuses either on conversation (chat wrappers) or on hardware monitoring (CLI utilities) without connecting the two. Anubis provides real-time dashboards, side-by-side model comparisons, unified model management, and one-click benchmark submissions to a public leaderboard. Each benchmark run captures power draw in watts via IOReport, Metal memory allocations, and Apple Neural Engine activity, enabling developers to answer practical questions such as whether a specific quantization actually reduces memory pressure or just the model's size on disk.
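To make "correlating" telemetry with inference concrete, here is a minimal sketch of one way to line up a power trace with a generation window: integrate timestamped power samples over the window with the trapezoidal rule. It assumes the samples arrive as `(timestamp_s, watts)` pairs sorted by time. The function names are hypothetical, and the code does not call IOReport or reflect how Anubis itself does the alignment.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def _power_at(samples: List[Sample], t: float) -> float:
    """Linearly interpolate power at time t, clamping outside the sampled range."""
    if t <= samples[0][0]:
        return samples[0][1]
    if t >= samples[-1][0]:
        return samples[-1][1]
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return p1
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
    raise ValueError("samples must be sorted by timestamp")


def energy_joules(samples: List[Sample], start_s: float, end_s: float) -> float:
    """Trapezoidal integral of power over [start_s, end_s]."""
    if not samples:
        raise ValueError("need at least one power sample")
    if end_s <= start_s:
        raise ValueError("end_s must be after start_s")
    # Window edges (interpolated) plus every sample strictly inside the window.
    points = [(start_s, _power_at(samples, start_s))]
    points += [(t, p) for t, p in samples if start_s < t < end_s]
    points.append((end_s, _power_at(samples, end_s)))
    return sum((t1 - t0) * (p0 + p1) / 2
               for (t0, p0), (t1, p1) in zip(points, points[1:]))


def average_power_watts(samples: List[Sample], start_s: float, end_s: float) -> float:
    """Mean power over the generation window."""
    return energy_joules(samples, start_s, end_s) / (end_s - start_s)
```

Clipping the trace to the generation window matters because idle time before the first token or after the last would otherwise dilute the average.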

The project is GPL-3.0 licensed and working toward 75 GitHub stars to qualify for Homebrew Cask distribution. The developer emphasizes that every benchmark submission fills underrepresented cells in the hardware-model matrix, making individual contributions valuable to backend developers, quantization researchers, and the broader community building on Apple's unified memory architecture. The dataset is intended as a public resource rather than proprietary analytics, addressing the reality that no single entity has the hardware budget to benchmark the full cross-product of configurations now possible with Apple Silicon's 128GB+ unified memory systems.
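The idea of "underrepresented cells" can be sketched as a coverage count over the cross-product of configuration axes. The axis names, the dictionary-based submission format, and the `min_runs` threshold below are assumptions made for illustration. The real dataset's schema is not described in the source.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def coverage(submissions: List[dict], axes: Dict[str, list]) -> Dict[Tuple, int]:
    """Count submissions per cell of the full cross-product of axis values."""
    names = list(axes)
    counts = Counter(tuple(s[name] for name in names) for s in submissions)
    # Cells with no submissions still appear, with a count of zero.
    return {cell: counts.get(cell, 0) for cell in product(*axes.values())}


def underrepresented(submissions: List[dict], axes: Dict[str, list],
                     min_runs: int = 1) -> List[Tuple]:
    """Cells with fewer than min_runs submissions, in cross-product order."""
    return [cell for cell, n in coverage(submissions, axes).items() if n < min_runs]


if __name__ == "__main__":
    # Hypothetical axes; a real matrix would also drop combinations that do not ship.
    axes = {"chip": ["M1", "M4"], "backend": ["Ollama", "mlx-lm"]}
    subs = [
        {"chip": "M1", "backend": "Ollama"},
        {"chip": "M4", "backend": "mlx-lm"},
    ]
    print(underrepresented(subs, axes))
```

Even with two values per axis, each added axis doubles the number of cells, which is why the article argues that no single tester can cover the matrix alone.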

The tool supports any OpenAI-compatible backend (Ollama, LM Studio, mlx-lm, vLLM) and requires macOS 15+ on Apple Silicon, with no Python runtime or external dependencies.

Editorial Opinion

Anubis OSS tackles a genuinely underserved need: the Apple Silicon LLM ecosystem has matured faster than the tooling to measure it. With M4 Max systems now shipping with 128GB of unified memory—enough to run 70B parameter models locally—practitioners need more than anecdotal performance reports. The community-dataset approach is smart: no single lab can benchmark every quantization on every chip under every thermal condition, but crowdsourcing fills that matrix organically. The focus on power telemetry is particularly valuable as efficiency becomes a key differentiator for local inference, and correlating watts-per-token with actual Metal allocations could surface optimization opportunities that synthetic benchmarks miss entirely.
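The claim that 128GB is enough for a 70B-parameter model can be checked with back-of-the-envelope arithmetic: weight storage is roughly parameters times bits per weight divided by eight. The sketch below counts weights only, using 1 GB = 10^9 bytes. It ignores the KV cache, activations, and runtime overhead, and it assumes a uniform bit width, which real quantization formats only approximate.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    if params_billions <= 0 or bits_per_weight <= 0:
        raise ValueError("inputs must be positive")
    # (params_billions * 1e9 weights) * (bits / 8 bytes per weight) / 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


def weights_fit(params_billions: float, bits_per_weight: float,
                memory_gb: float) -> bool:
    """True if the weights alone are smaller than the stated memory size."""
    return weight_memory_gb(params_billions, bits_per_weight) < memory_gb
```

By this estimate a 70B model needs about 140 GB at 16 bits per weight, 70 GB at 8 bits, and 35 GB at 4 bits, so on a 128GB machine it is the quantized variants that fit. Measured Metal allocations are what would show how far actual usage departs from this estimate.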

