Google Researchers Propose Dual-LLM System to Filter Bad Code Fixes in Automated Program Repair
Key Takeaways
- Google researchers developed a dual-LLM policy system with bug abstention and patch validation to filter out low-quality automated code repairs
- Testing on 174 human-reported bugs from Google's codebase showed combined success rate improvements of up to 39 percentage points
- The system addresses a critical deployment challenge: reducing noise and wasted developer time reviewing unlikely-to-be-accepted automated patches
Summary
Researchers from Google have introduced a dual-LLM policy system designed to reduce noise in agentic automated program repair (APR). The approach, detailed in a paper accepted to ICSE-SEIP 2026, addresses a critical challenge in deploying AI-powered code repair at scale: many automatically generated patches are unlikely to be accepted by human reviewers, wasting developer time and eroding trust in the technology.
The system employs two complementary LLM-based policies working in tandem. The first, called "bug abstention," filters out bugs that the agentic APR system is unlikely to fix successfully before attempting repair. The second, "patch validation," evaluates generated patches and rejects those unlikely to represent good fixes for the given bug. This two-stage filtering approach aims to present only high-quality, actionable patches to human developers for review.
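The two-stage pipeline described above can be sketched as a simple gating function. This is a minimal illustration, not the paper's implementation: the names (`Bug`, `Patch`, `repair_pipeline`) and the callable parameters standing in for the LLM-backed policies and the APR agent are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Bug:
    report: str  # the bug report text

@dataclass
class Patch:
    diff: str  # the proposed code change

def repair_pipeline(
    bug: Bug,
    should_attempt: Callable[[Bug], bool],       # stage 1: bug-abstention policy (LLM-backed in the paper)
    generate_patch: Callable[[Bug], Patch],      # the agentic APR system itself
    is_good_patch: Callable[[Bug, Patch], bool], # stage 2: patch-validation policy
) -> Optional[Patch]:
    """Return a patch only if both filtering policies pass; otherwise None."""
    # Stage 1: abstain on bugs the APR system is unlikely to fix well.
    if not should_attempt(bug):
        return None
    patch = generate_patch(bug)
    # Stage 2: reject patches unlikely to be good fixes for this bug.
    if not is_good_patch(bug, patch):
        return None
    # Only patches surviving both filters reach a human reviewer.
    return patch
```

With trivial stub policies, a patch is surfaced only when both filters approve; abstention short-circuits before any repair is attempted, which is what saves wasted generation and review effort.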
Testing on Google's internal codebase showed significant improvements in success rates. On a dataset of 174 human-reported bugs, the bug abstention policy improved success rates by up to 13 percentage points, while patch validation added up to 15 percentage points. When both policies were combined, success rates increased by up to 39 percentage points. The researchers also demonstrated improvements on machine-generated bug reports for null pointer exceptions and sanitizer-detected issues.
The research represents a practical step toward industrial-scale deployment of AI-powered code repair systems, addressing the crucial gap between generating patches and ensuring they're worth a human developer's review time. By reducing false positives and low-quality suggestions, such filtering systems could make automated program repair a more trusted and efficient tool in professional software development workflows.
The paper's acceptance to ICSE-SEIP 2026, the Software Engineering in Practice track of a leading software engineering conference, further underscores the work's industry relevance.
Editorial Opinion
This research tackles one of the most pragmatic barriers to AI adoption in software engineering: trust erosion from noisy suggestions. While much attention focuses on making AI agents more capable at fixing bugs, Google's dual-filter approach recognizes that knowing when not to suggest a fix may be equally important. The impressive 39-point improvement in success rates suggests that quality filtering could be the key to making automated program repair genuinely useful in production environments rather than just technically impressive in research settings.


