AutoRocq: Open-Source AI Agent for Formal Mathematical Proof Verification Released

Key Takeaways

▸ AutoRocq is an open-source AI agent that automates formal theorem proving in Rocq by leveraging LLM interactions with the proof assistant
▸ The system has been evaluated on 641 real-world proof obligations from C code verification and Linux kernel assertions, demonstrating practical applicability
▸ The release includes comprehensive benchmarks, evaluation results, and reproducible scripts, lowering barriers to adoption and further research in AI-assisted formal verification

Source:

Hacker News: https://github.com/NUS-Program-Verification/AutoRocq

Summary

Researchers have released AutoRocq, an open-source agentic theorem prover built on Rocq (formerly Coq) 8.18.0 that uses large language models to prove formally stated theorems automatically. The system runs in a loop: an LLM interacts with the Rocq proof assistant in real time, using tools for planning, tactic application, and context search to navigate the proof space. AutoRocq has been evaluated on a benchmark of 641 theorems extracted from real C code via Frama-C and verified against SV-COMP standards. The complete source code, evaluation benchmarks, and reproducible results are publicly available, so other researchers can build on this work in formal verification and AI-assisted mathematical reasoning.

With GPT-4, the average proving cost is approximately $0.50 per theorem. At that average, the full 641-theorem benchmark would cost roughly $320, which puts large-scale verification within reach of more teams.
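The loop described above can be illustrated with a minimal sketch. This is not AutoRocq's implementation: `ToyProver` and `toy_policy` are hypothetical stand-ins for a Rocq session and for the LLM, the lemma names are made up, and the planning tool is omitted. The sketch only shows the general shape of an agent that proposes a tactic, observes whether the prover accepts it, and repeats until no goals remain or a step budget runs out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyProver:
    """Hypothetical stand-in for a proof assistant session (not the Rocq API)."""
    goals: list[str]
    lemmas: dict[str, str]  # lemma name -> statement it proves
    script: list[str] = field(default_factory=list)

    def search(self, keyword: str) -> list[str]:
        """Context search: names of lemmas whose statement mentions keyword."""
        return [name for name, stmt in self.lemmas.items() if keyword in stmt]

    def apply(self, tactic: str) -> bool:
        """Try a tactic on the first open goal; return False if rejected."""
        if not self.goals:
            return False
        goal = self.goals[0]
        if tactic == "split" and " /\\ " in goal:
            # Replace a conjunction with its two conjuncts as separate goals.
            left, right = goal.split(" /\\ ", 1)
            self.goals[0:1] = [left, right]
        elif tactic.startswith("apply ") and self.lemmas.get(tactic[6:]) == goal:
            # The named lemma states exactly this goal, so the goal is closed.
            self.goals.pop(0)
        else:
            return False
        self.script.append(tactic)
        return True


def toy_policy(goal: str, feedback: str, prover: ToyProver) -> str:
    """Hypothetical stand-in for the LLM: choose the next tactic for a goal."""
    if " /\\ " in goal:
        return "split"
    hits = prover.search(goal)
    return f"apply {hits[0]}" if hits else "auto"


def prove(
    prover: ToyProver,
    policy: Callable[[str, str, ToyProver], str],
    max_steps: int = 10,
) -> Optional[list[str]]:
    """Propose a tactic, observe the result, repeat until done or out of budget."""
    feedback = ""
    for _ in range(max_steps):
        if not prover.goals:
            return prover.script
        tactic = policy(prover.goals[0], feedback, prover)
        accepted = prover.apply(tactic)
        # Rejections are passed back so the policy could adapt on the next step.
        feedback = "" if accepted else f"tactic '{tactic}' failed"
    return prover.script if not prover.goals else None


prover = ToyProver(
    goals=["a <= a /\\ 0 <= n"],
    lemmas={"le_refl": "a <= a", "n_nonneg": "0 <= n"},
)
result = prove(prover, toy_policy)
print(result)  # ['split', 'apply le_refl', 'apply n_nonneg']
```

In a real system the policy would be an LLM call and the prover a live Rocq process, so each step has a monetary cost. The `max_steps` budget in the sketch is one simple way to bound that cost per theorem.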

Editorial Opinion

AutoRocq represents a significant advancement in applying LLMs to formal verification, a domain where human expertise has traditionally been the bottleneck. By open-sourcing this work with full benchmarks and reproducible evaluation, the researchers enable the community to systematically improve AI-assisted theorem proving. This democratization of formal verification tools could accelerate adoption of formal methods in critical systems, though the reliance on proprietary LLM APIs raises questions about long-term sustainability and accessibility for resource-constrained institutions.

OPEN SOURCE · OpenAI · 2026-03-27

