BotBeat
OPEN SOURCE · OpenAI · 2026-03-27

AutoRocq: Open-Source AI Agent for Formal Mathematical Proof Verification Released

Key Takeaways

  • AutoRocq is an open-source AI agent that automates formal theorem proving in Rocq by having an LLM interact with the proof assistant
  • The system has been evaluated on 641 real-world proof obligations from C code verification and Linux kernel assertions, demonstrating practical applicability
  • The release includes comprehensive benchmarks, evaluation results, and reproducible scripts, lowering barriers to adoption and further research in AI-assisted formal verification
Source: Hacker News, https://github.com/NUS-Program-Verification/AutoRocq

Summary

Researchers have released AutoRocq, an open-source agentic theorem prover built on Rocq (formerly Coq) 8.18.0 that uses large language models to prove formally stated theorems automatically. The system runs in a loop in which an LLM interacts with the Rocq proof assistant in real time to develop a proof, using tools such as planning, tactic application, and context search to navigate the proof space. AutoRocq has been evaluated on a benchmark of 641 theorems extracted from real C code via Frama-C and verified against SV-COMP standards, at an average cost of around $0.50 per theorem using GPT-4. The complete source code, evaluation benchmarks, and reproducible results are publicly available, so researchers can build on this work in formal verification and AI-assisted mathematical reasoning.
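The propose-apply-feedback loop described above can be illustrated with a toy sketch. This is not AutoRocq's code or Rocq's API: `ToyProver`, `scripted_policy`, `prove`, and the rule table are all hypothetical stand-ins, with a fixed lookup in place of the LLM and a dictionary in place of the proof assistant.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyProver:
    """Hypothetical stand-in for a proof assistant session (not Rocq's API)."""
    goals: List[str]
    # Toy rule table: (goal, tactic) -> subgoals left after the tactic succeeds.
    rules: Dict[Tuple[str, str], List[str]]
    script: List[str] = field(default_factory=list)

    def apply(self, tactic: str) -> bool:
        """Try a tactic on the first open goal; return True if accepted."""
        if not self.goals:
            return False
        key = (self.goals[0], tactic)
        if key not in self.rules:
            return False  # rejected: the proof state is left unchanged
        self.goals = self.rules[key] + self.goals[1:]
        self.script.append(tactic)
        return True


def scripted_policy(goal: str, feedback: str) -> str:
    """Stub for the LLM: a real agent would query a model here (assumption)."""
    candidates = {
        "A and B": ["auto", "split"],
        "A": ["assumption"],
        "B": ["assumption"],
    }
    options = candidates.get(goal, ["auto"])
    prefix = "rejected: "
    if feedback.startswith(prefix):
        rejected = feedback[len(prefix):]
        # After a rejection, move on to the next candidate tactic if any.
        if rejected in options and options.index(rejected) + 1 < len(options):
            return options[options.index(rejected) + 1]
    return options[0]


def prove(prover: ToyProver,
          propose: Callable[[str, str], str],
          max_steps: int = 20) -> bool:
    """Ask for a tactic, apply it, and feed the outcome back until done."""
    feedback = ""
    for _ in range(max_steps):
        if not prover.goals:
            return True
        tactic = propose(prover.goals[0], feedback)
        accepted = prover.apply(tactic)
        feedback = "" if accepted else f"rejected: {tactic}"
    return not prover.goals


def demo() -> Tuple[bool, List[str]]:
    rules = {
        ("A and B", "split"): ["A", "B"],
        ("A", "assumption"): [],
        ("B", "assumption"): [],
    }
    prover = ToyProver(goals=["A and B"], rules=rules)
    return prove(prover, scripted_policy), prover.script


if __name__ == "__main__":
    print(demo())
```

In this toy run the first proposed tactic is rejected, the rejection is passed back to the policy, and the next candidate closes the goal. That retry-on-feedback behavior is the part of the loop the sketch is meant to show; the planning and context-search tools mentioned above are not modeled.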

  • Integration with GPT-4 enables cost-effective proof generation at approximately $0.50 per theorem, making large-scale verification more accessible
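As a rough back-of-the-envelope check, the reported figures imply a total cost for one pass over the benchmark. The sketch below assumes the approximate $0.50 average applies uniformly to all 641 theorems, which the source does not state; the function name is illustrative.

```python
NUM_THEOREMS = 641    # benchmark size reported in the summary
AVG_COST_USD = 0.50   # approximate average cost per theorem reported above


def estimated_total_cost(num_theorems: int, avg_cost_usd: float) -> float:
    """Estimate the cost of one full pass, assuming a uniform average."""
    return num_theorems * avg_cost_usd


if __name__ == "__main__":
    total = estimated_total_cost(NUM_THEOREMS, AVG_COST_USD)
    print(f"Estimated cost for one benchmark pass: ${total:.2f}")
```

Under that assumption, one pass over the benchmark comes to roughly $320; actual spend would vary with retries and per-theorem difficulty.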

Editorial Opinion

AutoRocq represents a significant advancement in applying LLMs to formal verification, a domain where human expertise has traditionally been the bottleneck. By open-sourcing this work with full benchmarks and reproducible evaluation, the researchers enable the community to systematically improve AI-assisted theorem proving. This democratization of formal verification tools could accelerate adoption of formal methods in critical systems, though the reliance on proprietary LLM APIs raises questions about long-term sustainability and accessibility for resource-constrained institutions.

Large Language Models (LLMs) · AI Agents · Machine Learning · Science & Research · Open Source