Agyn: Multi-Agent System Achieves 72.2% Success Rate on Software Engineering Tasks Through Team-Based Approach
Key Takeaways
- Multi-agent team structure outperforms single-agent approaches on software engineering tasks, achieving a 72.2% resolution rate on SWE-bench 500
- Agent specialization with defined roles (coordinator, researcher, implementer, reviewer) mirrors real-world engineering teams and improves task execution
- Organizational design and infrastructure may be as critical as underlying model improvements for advancing autonomous software engineering capabilities
Summary
Researchers have introduced Agyn, a fully automated multi-agent system that approaches autonomous software engineering by replicating real-world team structures rather than treating issue resolution as a monolithic process. The system assigns specialized agents to distinct roles (coordination, research, implementation, and review), gives them isolated sandboxes, and connects them through structured communication channels. Built on an open-source platform for configuring agent teams, Agyn follows a defined development methodology of analysis, task specification, pull request creation, and iterative review, all without human intervention.
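To make the workflow described above concrete, here is a minimal sketch of how four roles and four stages could be wired together. It is not Agyn's code or API: every name (`Role`, `Channel`, `resolve_issue`, `review_fn`) is hypothetical, the assignment of stages to roles is an assumption, and the agents are replaced by plain string stubs. Isolated sandboxes are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # The four roles named in the article.
    COORDINATOR = "coordinator"
    RESEARCHER = "researcher"
    IMPLEMENTER = "implementer"
    REVIEWER = "reviewer"


@dataclass
class Message:
    sender: Role
    stage: str
    body: str


@dataclass
class Channel:
    """Hypothetical structured channel: an append-only message log."""
    log: list = field(default_factory=list)

    def post(self, sender, stage, body):
        self.log.append(Message(sender, stage, body))


def resolve_issue(issue, review_fn, max_rounds=3):
    """Run analysis, task specification, pull request, and review for one issue.

    review_fn(patch, round_no) -> bool stands in for a reviewer agent.
    Which role owns which stage is an assumption made for this sketch.
    Returns (approved, channel).
    """
    channel = Channel()
    channel.post(Role.RESEARCHER, "analysis", f"analysed: {issue}")
    channel.post(Role.COORDINATOR, "task_specification", f"spec for: {issue}")
    for round_no in range(1, max_rounds + 1):
        # A real implementer would edit code in its sandbox; this is a stub.
        patch = f"patch v{round_no} for: {issue}"
        channel.post(Role.IMPLEMENTER, "pull_request", patch)
        approved = review_fn(patch, round_no)
        verdict = "approved" if approved else "changes requested"
        channel.post(Role.REVIEWER, "review", verdict)
        if approved:
            return True, channel
    return False, channel
```

Calling `resolve_issue("fix parser crash", lambda patch, n: n == 2)` runs one rejected round and one approved round, leaving a log of the stages in order. The point of the sketch is the shape: each role posts to a shared log, and the review loop is bounded so the process ends without a human stepping in.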
When evaluated on SWE-bench 500, Agyn resolved 72.2% of tasks, outperforming single-agent baselines that use comparable language models. The system was designed for real production use and was not optimized specifically for benchmark performance, which suggests the result may carry over to practical settings. The research indicates that organizing autonomous agents into teams with clear methodologies and communication protocols improves software engineering task completion rates.
Editorial Opinion
Agyn's results underscore an important insight: autonomous systems may benefit more from better organizational design than from raw model scaling. By mirroring the collaborative structures of human engineering teams (role specialization, structured communication, and iterative review), the system achieves impressive benchmark performance without task-specific tuning. This suggests the field should invest more in agent infrastructure and team dynamics alongside model development.



