AutoRocq: Open-Source AI Agent for Formal Mathematical Proof Verification Released
Key Takeaways
- AutoRocq is an open-source AI agent that automates formal theorem proving in Rocq by leveraging LLM interactions with the proof assistant
- The system has been evaluated on 641 real-world proof obligations from C code verification and Linux kernel assertions, demonstrating practical applicability
- The release includes comprehensive benchmarks, evaluation results, and reproducible scripts, lowering barriers to adoption and further research in AI-assisted formal verification
Summary
Researchers have released AutoRocq, an open-source agentic theorem prover that uses large language models to automatically prove formally stated theorems. It is built on version 8.18.0 of the Coq proof assistant, which has since been renamed Rocq. The system operates in a loop in which an LLM interacts with the proof assistant in real time to develop proofs, using tools such as planning, tactic application, and context search to navigate the proof search space. AutoRocq has been evaluated on a benchmark of 641 theorems extracted from real C code via Frama-C and verified against SV-COMP standards, with an average proving cost of around $0.50 per theorem using GPT-4. The complete source code, evaluation benchmarks, and reproducible results have been made publicly available, enabling researchers to build on this work in formal verification and AI-assisted mathematical reasoning.
- Integration with GPT-4 enables cost-effective proof generation at approximately $0.50 per theorem, making large-scale verification more accessible
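The loop described in the summary, in which a model proposes a step, the proof assistant checks it, and the feedback informs the next attempt, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyProver`, `scripted_policy`, and `prove` are hypothetical names, the toy prover only understands two tactics on string goals, and the policy is a hard-coded stand-in for an LLM call. None of it reflects AutoRocq's actual code or the Rocq API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyProver:
    """Hypothetical stand-in for a proof assistant session (not Rocq)."""
    goals: List[str]                                  # open goals, as plain strings
    script: List[str] = field(default_factory=list)   # tactics accepted so far

    @staticmethod
    def _is_trivial_eq(goal: str) -> bool:
        # True for goals of the form "t = t".
        lhs, sep, rhs = goal.partition(" = ")
        return sep != "" and lhs.strip() == rhs.strip()

    def apply_tactic(self, tactic: str) -> Tuple[bool, str]:
        """Try a tactic on the first open goal; return (accepted, feedback)."""
        if not self.goals:
            return False, "no open goals"
        goal = self.goals[0]
        if tactic == "split" and " /\\ " in goal:
            # Replace a conjunction with its two conjuncts.
            left, right = goal.split(" /\\ ", 1)
            self.goals[0:1] = [left, right]
        elif tactic == "reflexivity" and self._is_trivial_eq(goal):
            self.goals.pop(0)
        else:
            return False, f"tactic '{tactic}' failed on goal: {goal}"
        self.script.append(tactic)
        return True, f"{len(self.goals)} goal(s) remaining"


def scripted_policy(goal: str, last_feedback: str) -> str:
    """Placeholder for the LLM call: picks the next tactic from goal and feedback."""
    if " /\\ " in goal:
        return "split"
    if "failed" in last_feedback:
        return "reflexivity"
    return "auto"  # deliberately poor first guess, to exercise the feedback path


def prove(prover: ToyProver,
          policy: Callable[[str, str], str],
          max_steps: int = 20) -> bool:
    """Propose a tactic, let the prover check it, feed the result back, repeat."""
    feedback = ""
    for _ in range(max_steps):
        if not prover.goals:
            return True
        tactic = policy(prover.goals[0], feedback)
        _, feedback = prover.apply_tactic(tactic)
    return not prover.goals


if __name__ == "__main__":
    session = ToyProver(goals=["x = x /\\ y = y"])
    print(prove(session, scripted_policy), session.script)
```

The point of the sketch is the division of labor: the policy may guess wrong, but only tactics the prover accepts are recorded in the proof script, so a failed guess costs a step rather than correctness. The `max_steps` bound is an assumed safeguard against unbounded retries, which in a real system would also cap API spend.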
Editorial Opinion
AutoRocq represents a significant advancement in applying LLMs to formal verification, a domain where human expertise has traditionally been the bottleneck. By open-sourcing this work with full benchmarks and reproducible evaluation, the researchers enable the community to systematically improve AI-assisted theorem proving. This democratization of formal verification tools could accelerate adoption of formal methods in critical systems, though the reliance on proprietary LLM APIs raises questions about long-term sustainability and accessibility for resource-constrained institutions.


