1Password Shares Lessons from Using AI Agents to Refactor Multi-Million Line Go Monolith
Key Takeaways
- AI agents successfully analyzed a multi-million-line codebase by combining multiple data sources (code structure analysis, SQL parsing, runtime metrics) to create defensible extraction plans for service decomposition
- The most valuable insights came from applying agentic tooling to real production changes rather than theoretical analysis, revealing the importance of validating AI-assisted refactoring in live environments
- Agents are most effective when building deterministic, reproducible tooling (like code analyzers) rather than serving as ongoing interpreters, creating stable artifacts for human reasoning
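The article does not include code, but the first takeaway's idea of combining multiple coupling signals into a defensible extraction plan can be sketched as a simple weighted ranking. Everything below is illustrative: the `candidate` fields, the weights, and the domain names are hypothetical, not 1Password's actual model.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate holds per-domain coupling signals of the kind the article
// describes: static call edges (from SSA analysis), shared SQL tables
// (from SQL parsing), and runtime cross-domain traffic (from APM data).
// Field names and weights are invented for illustration.
type candidate struct {
	Name         string
	StaticEdges  int     // inbound call edges from the rest of the monolith
	SharedTables int     // SQL tables also touched by other domains
	RuntimeRPS   float64 // observed cross-domain requests per second
}

// extractionScore combines the signals; lower coupling means the domain
// is cheaper to extract first. The weights are arbitrary placeholders.
func extractionScore(c candidate) float64 {
	return 1.0*float64(c.StaticEdges) + 2.0*float64(c.SharedTables) + 0.5*c.RuntimeRPS
}

// rankForExtraction orders candidates from least to most coupled.
func rankForExtraction(cs []candidate) []string {
	sort.Slice(cs, func(i, j int) bool {
		return extractionScore(cs[i]) < extractionScore(cs[j])
	})
	names := make([]string, len(cs))
	for i, c := range cs {
		names[i] = c.Name
	}
	return names
}

func main() {
	order := rankForExtraction([]candidate{
		{"billing", 40, 6, 120},
		{"notifications", 5, 1, 30},
		{"audit-log", 12, 0, 80},
	})
	fmt.Println(order) // least-coupled extraction candidates first
}
```

A real plan would tune the weights against engineering judgment; the point is that each signal is a deterministic, reproducible artifact that humans can audit.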
Summary
1Password has published detailed findings from its experience using AI agents to refactor B5, its large Go monolith, revealing both the successes and the limitations of agentic tooling in production environments. The company built an agentic toolchain combining Go SSA analysis, SQL parsing, and runtime coupling data from Datadog to analyze millions of lines of code and determine the optimal service extraction order, a critical task for maintaining security and reliability when decomposing a system that handles sensitive data at scale. While the agent-assisted analysis layer produced results that matched experienced engineers' expectations, the real value emerged when applying these tools to actual production changes, highlighting the importance of moving beyond theoretical analysis to real-world validation. 1Password's work demonstrates that AI agents are most effective when building deterministic tooling rather than serving as ongoing interpreters, and that finding the right human-to-agent ratio is crucial for successful adoption in complex production systems. The instrumentation developed to support agent analysis also yielded an unexpected benefit, improving end-to-end transaction visibility well beyond the original refactoring project.
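1Password's toolchain uses Go SSA for precise static analysis; the flavor of that static-coupling signal can be approximated with only the standard library's `go/ast` and `go/parser`, counting selector references to each imported package. This is a minimal sketch, not their implementation, and the sample source and package paths are invented.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// countPackageRefs parses Go source and counts selector references to
// each imported package: a crude static-coupling signal. Real tooling
// such as go/ssa resolves types and call graphs far more precisely.
func countPackageRefs(src string) map[string]int {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		panic(err)
	}
	// Map the last element of each import path to a package name.
	// This is a heuristic; aliased or mismatched names need type info.
	imported := map[string]bool{}
	for _, imp := range f.Imports {
		path := imp.Path.Value        // quoted, e.g. "\"billing/db\""
		name := path[1 : len(path)-1] // strip quotes
		for i := len(name) - 1; i >= 0; i-- {
			if name[i] == '/' {
				name = name[i+1:]
				break
			}
		}
		imported[name] = true
	}
	counts := map[string]int{}
	ast.Inspect(f, func(n ast.Node) bool {
		if sel, ok := n.(*ast.SelectorExpr); ok {
			if id, ok := sel.X.(*ast.Ident); ok && imported[id.Name] {
				counts[id.Name]++
			}
		}
		return true
	})
	return counts
}

func main() {
	src := `package vault

import (
	"billing/db"
	"billing/audit"
)

func Charge() {
	db.Begin()
	db.Exec()
	audit.Log()
}
`
	counts := countPackageRefs(src)
	names := make([]string, 0, len(counts))
	for n := range counts {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s: %d refs\n", n, counts[n])
	}
}
```

Run over every file in a domain, counts like these become the stable, reproducible artifacts the article argues agents should build for humans to reason over, rather than re-interpreting the codebase on each run.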
Editorial Opinion
1Password's experience demonstrates a maturing approach to AI adoption in engineering: moving beyond hype about fully autonomous AI refactoring to a pragmatic model in which agents augment human expertise. Its emphasis on combining AI-driven analysis with deterministic tooling and real-world validation offers a useful template for other organizations tackling large legacy-system modernization. However, the article's focus on finding the right human-to-agent ratio, together with the gap between theoretical and real-world performance, suggests enterprises should not expect turnkey solutions; success requires careful integration into existing workflows.