Study of 112,000 Commits Reveals AI-Written Code Is No Buggier Than Human Code
Key Takeaways
- Analysis of 112,000+ commits found AI-written code introduces bugs at rates comparable to or better than human-written code in the same projects
- Human-driven AI agents (T2 agents like Claude Code) outperformed both autonomous bots and minimally assisted development when developers actively reviewed and steered the process
- Rigorous statistical controls, especially accounting for commit size, were essential to avoid measurement bias and provide credible evidence
Summary
A comprehensive analysis of 112,382 commits across 28 public repositories challenges the widespread assumption that AI-written code is more bug-prone than human-written code. The researchers used the SZZ methodology, a standard technique in defect prediction that starts from bug-fixing commits and traces the lines they changed backward through git history to the commits that introduced them. With bug-introducing commits identified this way, they compared bug-introduction rates between AI-generated and human-written code in the same codebases.
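To make the SZZ idea concrete, here is a minimal Python sketch that runs on a toy in-memory history rather than a real repository. It is not the study's implementation. The `Commit` class, the `szz_bug_introducers` function, and the sample data are all hypothetical, and the sketch assumes line numbers stay stable between commits, which real SZZ tooling handles through `git blame`.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    touched: dict[str, set[int]]  # file path -> line numbers written by this commit
    is_fix: bool = False          # True if the commit is labeled as a bug fix


def szz_bug_introducers(history):
    """Return the shas of commits whose lines were later rewritten by a fix.

    `history` must be ordered oldest to newest. Simplifying assumption:
    line numbers never shift, so (path, line) identifies a line over time.
    """
    last_writer = {}  # (path, line) -> sha of the most recent commit that wrote it
    blamed = set()
    for commit in history:
        for path, lines in commit.touched.items():
            for line in lines:
                key = (path, line)
                # A fix that rewrites a line blames whoever wrote it last.
                if commit.is_fix and key in last_writer:
                    blamed.add(last_writer[key])
                last_writer[key] = commit.sha
    return blamed


# Synthetic history, for illustration only.
history = [
    Commit("a1", {"app.py": {1, 2, 3}}),
    Commit("b2", {"app.py": {2}, "util.py": {1}}),
    Commit("c3", {"app.py": {2}}, is_fix=True),  # fixes line 2 of app.py
    Commit("d4", {"util.py": {5}}),
]

print(szz_bug_introducers(history))  # {'b2'}
```

In this toy history the fix `c3` rewrites a line last written by `b2`, so `b2` is flagged as bug-introducing even though `a1` touched the same line earlier.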
The findings reveal that AI code is not buggier than human code, and in some cases, the opposite appears true. Importantly, the study differentiated between three tiers of AI-assisted development: T1 (autonomous bots such as Devin and Copilot agents), T2 (human-driven agents like Claude Code, where developers actively steer and review), and T3 (minimal AI assistance with co-author trailers). The distinction proved crucial: different tiers exhibited markedly different bug-introduction patterns, and human-supervised AI agents showed particularly strong results.
The research employed rigorous statistical controls, including detection of AI-generated commits with 96.2% precision and accounting for commit size, the strongest predictor of bug introduction. These safeguards distinguish the findings from anecdotal claims and help ensure the results reflect actual code quality rather than differences in commit size or other confounding factors.
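Why commit size matters can be illustrated with a small Python sketch. It uses simple stratification, which is one possible way to control for size and not necessarily the method the study used. The data is synthetic, and the bucket thresholds (50 and 500 changed lines) and function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def size_bucket(lines_changed):
    # Hypothetical thresholds; a real analysis would choose these from the data.
    if lines_changed < 50:
        return "small"
    if lines_changed < 500:
        return "medium"
    return "large"


def bug_rates(commits, key):
    """Bug-introduction rate per group, where `key` maps a commit to its group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [buggy commits, total commits]
    for commit in commits:
        cell = counts[key(commit)]
        cell[0] += int(commit[2])
        cell[1] += 1
    return {group: buggy / total for group, (buggy, total) in counts.items()}


# Synthetic commits: (author_type, lines_changed, introduced_bug).
# AI commits skew large here, which is what creates the confound.
commits = (
    [("human", 10, i < 1) for i in range(8)]     # 8 small, 1 buggy
    + [("human", 800, i < 1) for i in range(2)]  # 2 large, 1 buggy
    + [("ai", 10, False) for _ in range(2)]      # 2 small, 0 buggy
    + [("ai", 800, i < 4) for i in range(8)]     # 8 large, 4 buggy
)

raw = bug_rates(commits, key=lambda c: c[0])
stratified = bug_rates(commits, key=lambda c: (size_bucket(c[1]), c[0]))

print(raw)
print(stratified)
```

In this made-up data set the raw AI bug rate is double the human rate, yet within each size bucket the AI rate is equal to or lower than the human rate. The raw gap comes entirely from AI commits being larger, which is the kind of measurement bias a size control is meant to remove.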
Editorial Opinion
This research provides much-needed empirical rigor to a debate that has largely been driven by anecdote and prior belief. The finding that human-supervised AI collaboration produces particularly high-quality code suggests that the future of AI-assisted development depends less on fully autonomous agents and more on tools that augment and accelerate human expertise. The results validate the collaborative model over fully autonomous systems, raising important questions about where the real value of AI coding tools lies—in pure code quality or in developer velocity and oversight.


