Witan Labs Releases Open Source Research on Building Spreadsheet AI Agent, Achieving 92% Success Rate

Key Takeaways

▸ Witan Labs reports a 92% success rate on spreadsheet tasks, up from 74% in earlier iterations, after four months of iterative development across four codebases
▸ Giving the agent a REPL programming environment proved more effective than restricting it to individual tool calls, cutting typical task completion from 10-15 calls to 2-3
▸ Key architectural insights include SQLite-based knowledge representation, a structured five-step reasoning process, and the importance of defining end states before execution

Source:

Hacker News: https://github.com/witanlabs/research-log

Summary

Witan Labs has published a detailed open-source research log documenting four months of engineering work on an LLM-powered spreadsheet agent. The company tried multiple architectures across four codebases before arriving at a design that reached a 92% success rate on spreadsheet tasks, up from 74% in earlier iterations. The log, available on GitHub, details what the team learned about agent architecture, in particular the value of giving an agent a Read-Eval-Print Loop (REPL) environment instead of restricting it to narrow tool calls.

The research highlights several technical decisions that improved performance. Most notably, the team found that agents perform better with a full programming environment than with individual API calls. The REPL approach cut typical task completion from 10-15 sequential calls to 2-3, made the system more maintainable, and let agents compose complex operations naturally. The team also developed new approaches to representing spreadsheet knowledge in SQLite databases and implemented a structured five-step reasoning process.
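This summary does not describe the schema Witan Labs uses, so the following is only a minimal sketch of what a SQLite-backed view of a workbook might look like. The `cells` table, its columns, and the sample data are all hypothetical; the point is simply that once cells live in a relational table, an agent can answer structural questions with a single query.

```python
import sqlite3

# In-memory database; the schema below is an assumption for illustration,
# not the layout used in the Witan Labs research log.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cells (
        sheet   TEXT NOT NULL,
        addr    TEXT NOT NULL,   -- A1-style address
        value   TEXT,            -- displayed value, stored as text
        formula TEXT,            -- NULL for plain values
        PRIMARY KEY (sheet, addr)
    );
    """
)

# Hypothetical sample workbook: a small budget table with one formula cell.
rows = [
    ("Budget", "A1", "Item", None),
    ("Budget", "B1", "Cost", None),
    ("Budget", "A2", "Rent", None),
    ("Budget", "B2", "1200", None),
    ("Budget", "A3", "Food", None),
    ("Budget", "B3", "400", None),
    ("Budget", "B4", "1600", "=SUM(B2:B3)"),
]
conn.executemany("INSERT INTO cells VALUES (?, ?, ?, ?)", rows)

# One query replaces what would otherwise be a cell-by-cell scan.
formula_cells = conn.execute(
    "SELECT addr, formula FROM cells WHERE formula IS NOT NULL ORDER BY addr"
).fetchall()
print(formula_cells)
```

A relational layout like this also lets the agent join cell data against other derived tables (for example, detected blocks or named ranges) without any custom lookup code.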

The research log includes technical deep dives into block discovery (teaching agents to recognize spreadsheet structure), formula parsing pipelines built in Rust, and the evolution of the team's benchmarks. Witan Labs has released both the research documentation and a CLI tool with associated skills as open source, so other developers can build on its findings. The project is a useful contribution to understanding how to build effective AI agents for structured data environments.
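The summary does not say how block discovery is implemented. As a rough illustration of the underlying idea, and not the team's method, the sketch below treats a sheet as a grid and groups adjacent non-empty cells into rectangular blocks using a breadth-first flood fill. The `find_blocks` function and the sample grid are hypothetical.

```python
from collections import deque


def find_blocks(grid):
    """Return bounding boxes (top, left, bottom, right), 0-indexed and inclusive,
    for each group of non-empty cells connected horizontally or vertically."""
    n_rows = len(grid)
    n_cols = max((len(row) for row in grid), default=0)

    def filled(r, c):
        # Rows may be ragged; a missing cell counts as empty.
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    blocks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # Flood fill from this cell, tracking the bounding box as we go.
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    in_bounds = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if in_bounds and (nr, nc) not in seen and filled(nr, nc):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            blocks.append((top, left, bottom, right))
    return blocks


# Hypothetical sheet with two separate tables divided by empty cells.
grid = [
    ["Item", "Cost", None, None],
    ["Rent", 1200, None, None],
    [None, None, None, None],
    [None, None, "Q1", "Q2"],
    [None, None, 5, 7],
]
blocks = find_blocks(grid)
print(blocks)
```

Real spreadsheets complicate this considerably (merged cells, blank separator rows inside a single table, headers that span blocks), which is presumably why the research log devotes a deep dive to the topic.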

▸ The company has released both its research findings and its CLI tools as open source on GitHub
▸ The research points to a generalizable pattern: when agents make many sequential tool calls that compose into larger operations, they benefit from access to a proper scripting environment
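To make the pattern concrete, here is a toy comparison under stated assumptions. The `Sheet`, `ToolCallHost`, and `ReplHost` classes are hypothetical stand-ins, not Witan Labs' interfaces: the first host exposes one operation per call, while the second accepts a whole script and keeps its variables between calls. Each method invocation on a host stands for one model round trip.

```python
import contextlib
import io


class Sheet:
    """Toy in-memory spreadsheet keyed by A1-style addresses (hypothetical)."""

    def __init__(self, cells):
        self.cells = dict(cells)

    def read(self, addr):
        return self.cells.get(addr)

    def write(self, addr, value):
        self.cells[addr] = value


class ToolCallHost:
    """Narrow interface: every read or write costs one round trip."""

    def __init__(self, sheet):
        self.sheet = sheet
        self.calls = 0

    def read_cell(self, addr):
        self.calls += 1
        return self.sheet.read(addr)

    def write_cell(self, addr, value):
        self.calls += 1
        self.sheet.write(addr, value)


class ReplHost:
    """REPL interface: one round trip runs a whole script; state persists."""

    def __init__(self, sheet):
        self.namespace = {"sheet": sheet}
        self.calls = 0

    def run(self, source):
        self.calls += 1
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # A real system would need sandboxing; exec is used here for brevity.
            exec(source, self.namespace)
        return buffer.getvalue()


def make_sheet():
    return Sheet({"A1": 10, "A2": 20, "A3": 30})


# Tool-call style: summing A1:A3 into A4 takes three reads and one write.
tool_host = ToolCallHost(make_sheet())
total = sum(tool_host.read_cell(f"A{row}") for row in range(1, 4))
tool_host.write_cell("A4", total)

# REPL style: the same task is a single script submitted in one call.
repl_host = ReplHost(make_sheet())
output = repl_host.run(
    "values = [sheet.read(f'A{row}') for row in range(1, 4)]\n"
    "sheet.write('A4', sum(values))\n"
    "print(sheet.read('A4'))\n"
)

# A follow-up call can reuse `values` because the namespace persists.
follow_up = repl_host.run("print(len(values))\n")

print(tool_host.calls, repl_host.calls, output.strip(), follow_up.strip())
```

In this toy case the tool-call host needs four round trips for a three-cell sum, and the count grows with the size of the range, while the REPL host needs one regardless. That is the same shape as the 10-15 versus 2-3 call reduction reported in the research log, though the numbers here are illustrative only.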

Editorial Opinion

This release is the kind of transparent, detailed engineering research that helps the whole field of AI agents move faster. By sharing not just what worked but also what failed along the way, Witan Labs offers practical lessons for anyone building task-oriented AI systems. The REPL insight stands out: it suggests that many AI agents are being artificially constrained by rigid tool-calling paradigms when they would perform better with more flexible programming interfaces.
