wheat: A CLI Framework That Forces LLMs to Justify Their Technical Recommendations
Key Takeaways
- wheat enforces evidential rigor in LLM-generated technical recommendations through a typed claim system with graded evidence levels
- A compiler validates all findings and resolves contradictions before allowing output, preventing recommendations built on weak or conflicting evidence
- The tool integrates with existing AI coding environments (Claude Code, Cursor, Copilot) and produces auditable, shareable decision documents
Summary
wheat is a new decision-making framework built for Claude Code that addresses a critical limitation of large language models: their tendency to provide recommendations without rigorous justification. The CLI tool structures technical decision-making by having users pose questions (e.g., "Should we migrate to GraphQL?"), then systematically research, prototype, stress-test, and compile findings into validated decision briefs.
The framework uses a type-and-evidence-grading system: each claim is tagged with a type (factual, risk, estimate, constraint, recommendation) and assigned an evidence grade ranging from "stated" (unverified) to "production" (measured in production). A 7-pass compiler validates all findings before output, catching contradictions, flagging weak evidence, and blocking recommendations until issues are resolved. This ensures teams can't ship decisions built on conflicting or insufficiently supported premises.
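The claim model described above can be sketched in TypeScript. This is a hypothetical illustration, not wheat's actual API: the intermediate evidence grades shown here (anything between "stated" and "production") and all type and function names are assumptions, and the single check shown stands in for only one of the compiler's seven passes.

```typescript
// Hypothetical sketch of a typed-claim model with graded evidence.
// Names and intermediate grades are assumed, not taken from wheat itself.

type ClaimType = "factual" | "risk" | "estimate" | "constraint" | "recommendation";

// Evidence grades ordered weakest to strongest. Only "stated" and
// "production" appear in the article; the middle grades are invented here.
const EVIDENCE_GRADES = ["stated", "prototyped", "benchmarked", "production"] as const;
type EvidenceGrade = (typeof EVIDENCE_GRADES)[number];

interface Claim {
  id: string;
  type: ClaimType;
  text: string;
  evidence: EvidenceGrade;
  contradicts?: string[]; // ids of claims this one conflicts with
}

// One illustrative compiler pass: collect blocking issues, e.g. a
// recommendation resting on unverified evidence, or an unresolved
// contradiction between claims. Output is blocked until the list is empty.
function validate(claims: Claim[]): string[] {
  const issues: string[] = [];
  for (const c of claims) {
    if (c.type === "recommendation" && c.evidence === "stated") {
      issues.push(`${c.id}: recommendation rests on unverified ("stated") evidence`);
    }
    for (const other of c.contradicts ?? []) {
      issues.push(`${c.id}: unresolved contradiction with ${other}`);
    }
  }
  return issues;
}
```

Representing grades as an ordered list (rather than a flat enum) lets a real validator compare strength, e.g. require at least "benchmarked" evidence for any claim that a recommendation depends on.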
wheat integrates with Claude Code, Cursor, Copilot, and standalone environments, requiring only Node.js 20+. The tool generates self-contained HTML decision documents that teams can share, with full git-traceable claim histories. By replacing ad-hoc Slack debates with structured, evidence-backed analysis, wheat aims to democratize rigorous technical decision-making across engineering teams.
- Structured decision-making replaces informal debate, making architectural choices traceable and defensible across teams
Editorial Opinion
wheat represents an important recognition that LLM outputs—particularly on high-stakes technical decisions—require rigorous validation mechanisms. By embedding a compiler that catches contradictions and flags weak evidence, the framework transforms Claude from a fast idea generator into a tool for systematized decision-making. This approach could set a precedent for how AI assistants are used in critical business contexts where accountability and evidence matter as much as speed.