Berry: New MCP Server Aims to Combat AI Hallucinations Through Evidence-Based Verification
Key Takeaways
- Berry is a verification-only MCP server that checks AI claims against user-provided evidence at the tool boundary rather than through prompting
- The system exposes two main tools: detect_hallucination for Q&A outputs and audit_trace_budget for structured reasoning traces
- Berry uses an information-theoretic measure of whether evidence sufficiently supports a claim, catching citation laundering and weak justifications
Summary
A developer has released Berry, an open-source Model Context Protocol (MCP) server designed to reduce AI hallucinations by requiring language models to back their claims with verifiable evidence. Unlike traditional approaches that rely on prompting or post-hoc filtering, Berry operates as a verification-only tool that checks whether AI-generated claims are actually supported by evidence provided by the user.
Berry exposes two primary verification tools: `detect_hallucination`, which analyzes answers with citations to confirm that each claim is supported by the cited evidence, and `audit_trace_budget`, which verifies structured reasoning traces step by step. The system uses an information-theoretic approach to measure whether evidence provides sufficient support for claims, flagging issues such as citation laundering, weak support, and invented details. Importantly, Berry doesn't fetch evidence itself: users must provide code snippets, documentation, logs, or other relevant text spans.
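Since MCP tools are invoked over JSON-RPC 2.0 via the protocol's `tools/call` method, a verification request to Berry would look roughly like the sketch below. The argument names (`answer`, `evidence`) are illustrative assumptions, not Berry's documented schema; only the JSON-RPC envelope and the tool name come from the source and the MCP specification.

```python
import json

# Hypothetical MCP "tools/call" request for Berry's detect_hallucination tool.
# MCP uses JSON-RPC 2.0 for tool invocation; the "arguments" keys here
# (answer, evidence) are assumed for illustration, not Berry's real schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "detect_hallucination",
        "arguments": {
            # The claim-bearing answer, with a citation marker.
            "answer": "The cache is invalidated on every write [1].",
            # User-provided evidence spans; Berry never fetches these itself.
            "evidence": [
                {"id": "1", "text": "def write(self, key, value): ..."},
            ],
        },
    },
}
print(json.dumps(request, indent=2))
```

The key design point the payload illustrates is that the evidence travels with the request: the verifier only ever sees spans the user has already vetted, which is what lets it flag claims whose citations don't actually support them.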
The tool is positioned as a pragmatic solution rather than a silver bullet. The creator openly acknowledges Berry's limitations: the verification model is itself an LLM that can make mistakes, it requires quality evidence input, and it doesn't guarantee correctness. However, Berry aims to shift AI assistant failure modes from confidently stating unsupported claims to either finding proper evidence or admitting uncertainty. The MCP server integrates with AI coding assistants like Cursor, Claude Code, and Gemini, operating locally to provide verification at the tool boundary rather than through system prompts.
- The tool doesn't fetch evidence, generate code, or guarantee correctness—it serves as a filter requiring users to provide trusted evidence spans
- The creator positions Berry as a pragmatic improvement that shifts failure modes toward uncertainty rather than confident hallucinations
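Local MCP servers are typically registered with assistants like Claude Code or Cursor through an `mcpServers` configuration entry. The sketch below shows the general shape of such an entry; the launch command and the package name `berry-mcp` are hypothetical placeholders, not Berry's documented install instructions.

```python
import json

# Hypothetical client-side registration for a local MCP server, following the
# "mcpServers" convention used by assistants such as Claude Code and Cursor.
# The command and the package name "berry-mcp" are illustrative assumptions.
config = {
    "mcpServers": {
        "berry": {
            "command": "npx",             # assumed launcher; Berry's actual runtime may differ
            "args": ["-y", "berry-mcp"],  # hypothetical package name
        }
    }
}
print(json.dumps(config, indent=2))
```

Running locally in this way is what puts verification at the tool boundary: the assistant's tool calls pass through the server on the developer's machine instead of relying on instructions in a system prompt.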
Editorial Opinion
Berry represents a thoughtful shift in addressing AI reliability—treating hallucination as an architectural problem rather than a prompting challenge. The evidence-required approach and honest acknowledgment of limitations are refreshing in a space often dominated by overblown claims. However, the tool's effectiveness depends entirely on users providing comprehensive, relevant evidence, which may create friction in fast-paced development workflows. Its real test will be whether developers adopt the discipline of evidence collection consistently enough to make the verification worthwhile.


