Commensa Releases commensa-audit, Open-Source Tool to Measure AI-Written Code Quality
Key Takeaways
- 27% of the agent-generated PRs in Commensa's own product were the AI fixing its own earlier work, a hidden cost that traditional merge-velocity metrics miss entirely
- commensa-audit measures rework tax, superseded work, abandoned attempts, churn clusters, and line survival, all derived from git history alone
- Privacy-first architecture: read-only GitHub API, local-only execution, no telemetry, pure Python, fully inspectable source code
Summary
Commensa released commensa-audit, a free, open-source Python tool that audits git repositories to measure the 'rework tax': the percentage of pull requests that fix or correct earlier work rather than deliver net-new value. Run against Commensa's own agent-built product, which shipped 162 PRs in 13 days, the tool found that 27% of those PRs were the AI correcting itself. That is a cost of AI-driven development that traditional velocity metrics do not capture.
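The rework-tax figure is a simple ratio, and a minimal sketch makes the arithmetic concrete. The `PullRequest` class, the `is_rework` flag, and the `rework_tax` function below are hypothetical names chosen for illustration, not commensa-audit's actual API. The count of 44 reworked PRs is back-calculated from the reported 27% of 162 and is not a number stated in the announcement.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    is_rework: bool  # hypothetical flag: True if the PR fixes or corrects earlier work


def rework_tax(prs):
    """Return the percentage of PRs classified as rework (0.0 for an empty list)."""
    if not prs:
        return 0.0
    reworked = sum(1 for pr in prs if pr.is_rework)
    return 100.0 * reworked / len(prs)


# Illustrative data only: 162 PRs, of which 44 are flagged as rework.
# 44 is an assumption derived from "27% of 162", not a reported count.
prs = [PullRequest(number=n, is_rework=(n <= 44)) for n in range(1, 163)]

print(f"rework tax: {rework_tax(prs):.1f}%")
```

Under these assumptions the ratio rounds to the 27% quoted above. The point of the metric is that the denominator is all merged PRs, so a team can ship quickly and still spend roughly a quarter of that output on corrections.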
The tool provides granular visibility into code churn patterns, including superseded work (PRs entirely replaced later), abandoned attempts (unmerged PRs), and line survival rates. To classify PRs it applies a transparent, configurable cascade of heuristics (explicit revert titles, self-correction detection, and churn clustering) and records human-readable reasoning for each decision. The tool installs via pip, and its output includes a self-contained HTML report, raw JSON data, and per-PR details.
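A heuristic cascade of this kind can be sketched as an ordered list of rules, where the first rule that matches decides the label and supplies the reason. This is an illustration of the general pattern under stated assumptions, not commensa-audit's implementation: the `PR` and `Verdict` classes, the three rule functions, the regular expressions, and the churn threshold of 3 are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PR:
    number: int
    title: str
    author: str
    files: frozenset  # paths touched by the PR


@dataclass(frozen=True)
class Verdict:
    label: str   # "rework" or "net-new"
    reason: str  # human-readable explanation of the decision


# Hypothetical patterns; a real tool would make these configurable.
REVERT_TITLE = re.compile(r"^\s*revert\b", re.IGNORECASE)
FIX_TITLE = re.compile(r"\bfix(es|ed)?\b", re.IGNORECASE)


def explicit_revert(pr, history) -> Optional[Verdict]:
    """Rule 1: the title says outright that this PR reverts something."""
    if REVERT_TITLE.search(pr.title):
        return Verdict("rework", f"title {pr.title!r} begins with 'revert'")
    return None


def self_correction(pr, history) -> Optional[Verdict]:
    """Rule 2: a fix-like PR by the same author touching files from an earlier PR."""
    if not FIX_TITLE.search(pr.title):
        return None
    for earlier in reversed(history):  # check the most recent PRs first
        if earlier.author == pr.author and pr.files & earlier.files:
            return Verdict("rework", f"fix by {pr.author} overlaps files of #{earlier.number}")
    return None


def churn_cluster(pr, history, threshold=3) -> Optional[Verdict]:
    """Rule 3: a file in this PR was already touched by many earlier PRs."""
    for path in sorted(pr.files):
        touches = sum(1 for earlier in history if path in earlier.files)
        if touches >= threshold:
            return Verdict("rework", f"{path} already touched by {touches} earlier PRs")
    return None


CASCADE = [explicit_revert, self_correction, churn_cluster]


def classify(pr, history) -> Verdict:
    """Apply the rules in order; the first match wins."""
    for rule in CASCADE:
        verdict = rule(pr, history)
        if verdict is not None:
            return verdict
    return Verdict("net-new", "no rework heuristic matched")


history = [PR(1, "Add login", "agent", frozenset({"auth.py"}))]
for pr in [
    PR(2, 'Revert "Add login"', "agent", frozenset({"auth.py"})),
    PR(3, "Fix login redirect", "agent", frozenset({"auth.py"})),
    PR(4, "Add billing page", "agent", frozenset({"billing.py"})),
]:
    verdict = classify(pr, history)
    print(f"#{pr.number}: {verdict.label} ({verdict.reason})")
```

Ordering the rules from most to least certain keeps the strongest evidence (an explicit revert) from being overridden by a weaker signal, and returning the reason alongside the label is what makes each classification auditable by a human reader.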
Designed for privacy and transparency, commensa-audit uses read-only API access, runs entirely locally with no telemetry, and consists of pure Python code inspectable by any user. It covers the newest 500 PRs by default with optional date and PR-count filters. Commensa positions this as the snapshot version of its planned continuous product, which will add token-cost tracking and monthly executive reports for teams and modules.
Classification is transparent and its heuristics are configurable; the tool explicitly documents its own limitations and grades certainty rather than faking precision.
Editorial Opinion
commensa-audit addresses a real blind spot in AI-era development: motion is not progress, and today's metrics celebrate velocity while ignoring cleanup labor. The 27% self-correction rate both validates the need for the tool and serves as a sobering reality check, because it suggests today's agents still require substantial supervision. The honest documentation of limitations (squash merges blur attribution, agent-marking is a lower bound) and the privacy-first design stand out in a category often oversold with false precision. This is the kind of measurement an AI team should want: brutally local, fully inspectable, and clear about what it can and cannot tell you.



