Google Researchers Propose Dual-LLM System to Filter Bad Code Fixes in Automated Program Repair
Key Takeaways
- Google researchers developed a dual-LLM policy system with bug abstention and patch validation to filter out low-quality automated code repairs
- Testing on 174 human-reported bugs from Google's codebase showed combined success rate improvements of up to 39 percentage points
- The system addresses a critical deployment challenge: reducing noise and wasted developer time reviewing unlikely-to-be-accepted automated patches
Summary
Researchers from Google have introduced a dual-LLM policy system designed to reduce noise in agentic automated program repair (APR). The approach, detailed in a paper accepted to ICSE-SEIP 2026, addresses a critical challenge in deploying AI-powered code repair at scale: many automatically generated patches are unlikely to be accepted by human reviewers, wasting developer time and eroding trust in the technology.
The system employs two complementary LLM-based policies working in tandem. The first, called "bug abstention," filters out bugs that the agentic APR system is unlikely to fix successfully before attempting repair. The second, "patch validation," evaluates generated patches and rejects those unlikely to represent good fixes for the given bug. This two-stage filtering approach aims to present only high-quality, actionable patches to human developers for review.
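The two-stage pipeline described above can be sketched as a simple gating function. This is a minimal illustration, not the paper's implementation: the names (`Bug`, `Patch`, `repair_pipeline`) and the callable parameters standing in for the LLM-backed policies and the APR agent are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Bug:
    report: str  # the bug report text

@dataclass
class Patch:
    diff: str  # the proposed code change

def repair_pipeline(
    bug: Bug,
    should_attempt: Callable[[Bug], bool],       # stage 1: bug-abstention policy (LLM-backed in the paper)
    generate_patch: Callable[[Bug], Patch],      # the agentic APR system itself
    is_good_patch: Callable[[Bug, Patch], bool], # stage 2: patch-validation policy
) -> Optional[Patch]:
    """Return a patch only if both filtering policies pass; otherwise None."""
    # Stage 1: abstain on bugs the APR system is unlikely to fix well.
    if not should_attempt(bug):
        return None
    patch = generate_patch(bug)
    # Stage 2: reject patches unlikely to be good fixes for this bug.
    if not is_good_patch(bug, patch):
        return None
    # Only patches surviving both filters reach a human reviewer.
    return patch
```

With trivial stub policies, a patch is surfaced only when both filters approve; abstention short-circuits before any repair is attempted, which is what saves wasted generation and review effort.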
Testing on Google's internal codebase showed significant improvements in success rates. On a dataset of 174 human-reported bugs, the bug abstention policy improved success rates by up to 13 percentage points, while patch validation added up to 15 percentage points. When both policies were combined, success rates increased by up to 39 percentage points. The researchers also demonstrated improvements on machine-generated bug reports for null pointer exceptions and sanitizer-detected issues.
The research represents a practical step toward industrial-scale deployment of AI-powered code repair systems, addressing the crucial gap between generating patches and ensuring they're worth a human developer's review time. By reducing false positives and low-quality suggestions, such filtering systems could make automated program repair a more trusted and efficient tool in professional software development workflows.
The paper's acceptance to ICSE-SEIP 2026, the Software Engineering in Practice track of a leading software engineering conference, further underscores the work's industry relevance.
Editorial Opinion
This research tackles one of the most pragmatic barriers to AI adoption in software engineering: trust erosion from noisy suggestions. While much attention focuses on making AI agents more capable at fixing bugs, Google's dual-filter approach recognizes that knowing when not to suggest a fix may be equally important. The impressive 39-point improvement in success rates suggests that quality filtering could be the key to making automated program repair genuinely useful in production environments rather than just technically impressive in research settings.


