CommitLLM: Cryptographic Provenance Protocol Enables Verifiable LLM Inference
Key Takeaways
- CommitLLM provides cryptographic proof that LLM providers run the models they claim, closing a significant trust gap in production deployments
- The protocol achieves practical overhead (1.3 ms/token for routine audit on Llama 70B) by deferring expensive verification to challenges rather than requiring continuous proof
- The design honestly delineates exact vs. approximate verification boundaries rather than claiming uniform end-to-end exactness, with upgradable tiers for different security requirements
Summary
Researchers have introduced CommitLLM, a cryptographic commit-and-audit protocol that addresses a critical trust gap in LLM serving: users currently have no cryptographic proof that their provider actually ran the model it claims to serve. The protocol works by having providers return compact cryptographic receipts during normal GPU inference, which verifiers can then check on CPU using only the public model weights.
CommitLLM operates between two unsatisfying extremes: statistical heuristics that provide evidence but not exact verification (which determined providers can game), and zero-knowledge proofs that offer strong guarantees but remain impractical at production scale. The new approach uses a commit-once, verify-on-challenge design where the provider commits during normal inference and expensive verification work only occurs when challenged. Testing on Llama 70B shows routine audit adds just 1.3 ms/token overhead, with full audit verification taking ~10 ms per token.
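The commit-once, verify-on-challenge split can be illustrated with a minimal hash-chain sketch. All names here are hypothetical and the paper's actual receipt format is not reproduced: the idea is only that the provider folds each token's inference record into a running commitment during generation, so a later challenge can be answered by replaying and recomputing the chain.

```python
import hashlib

def commit(token_records):
    """Fold per-token inference records into a SHA-256 hash chain.

    Returns the final chain digest (the compact receipt the provider
    ships with the response) plus the intermediate digests.
    """
    digest = b"\x00" * 32  # fixed initial value
    receipts = []
    for record in token_records:
        digest = hashlib.sha256(digest + record).digest()
        receipts.append(digest)
    return digest, receipts

def audit(token_records, claimed_root):
    """On challenge, recompute the chain from a replay and compare.

    In the real protocol the verifier would regenerate the records
    from the public weights; here they are passed in directly.
    """
    root, _ = commit(token_records)
    return root == claimed_root
```

Any tampering with a single record changes every downstream digest, so the provider is bound to the full trace by one 32-byte value; the expensive replay happens only when someone challenges.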
The protocol explicitly delineates what is verified exactly versus approximately, with commitment-bound end-to-end verification for linear layers using Freivalds checks, canonical replay for nonlinear components, and a statistical sampling approach for prefix key-value state in routine mode. Deep audit mode can upgrade to full exact verification when stakes are higher, using the same commitment receipt.
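Freivalds' check, which the protocol applies to linear layers, verifies a claimed matrix product far more cheaply than recomputing it: instead of an O(n³) multiply, the verifier does two O(n²) matrix-vector products per round with a random 0/1 vector. A minimal pure-Python sketch (function names are illustrative; a production verifier would work over quantized or fixed-point tensors to avoid floating-point drift):

```python
import random

def matmul(A, B):
    """Naive O(n^3) matrix multiply, standing in for the prover's work."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def matvec(A, v):
    """O(n^2) matrix-vector product, the verifier's only heavy step."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in A]

def freivalds_check(W, X, Y, rounds=20):
    """Probabilistically verify that Y == W @ X without recomputing it.

    Each round draws a random 0/1 vector r and compares W @ (X @ r)
    against Y @ r. A wrong Y survives a round with probability <= 1/2,
    so `rounds` independent rounds leave error probability <= 2**-rounds.
    """
    p = len(Y[0])
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(p)]
        lhs = matvec(W, matvec(X, r))  # two cheap products...
        rhs = matvec(Y, r)             # ...instead of one full multiply
        if lhs != rhs:
            return False
    return True
```

Because the randomness is drawn at challenge time, a provider cannot precompute a forged output that passes, which is what lets the routine audit stay cheap while remaining binding.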
Editorial Opinion
CommitLLM represents important progress on a neglected but critical problem: we currently have no cryptographic assurance that LLM providers actually run the models they claim. The protocol's pragmatic design—sitting between impractical zero-knowledge proofs and insufficient statistical audits—could make verifiable inference deployable at scale. The explicit honesty about approximate versus exact verification is refreshing, though the ongoing non-reproducibility of GPU attention computations highlights fundamental challenges in fully deterministic AI verification.