Agentic RAG Outperforms Full-Context Retrieval on FinanceBench by 7.7 Points

Key Takeaways

▸Agentic RAG with Claude Opus achieved 83.7% accuracy on FinanceBench, outperforming full-context retrieval (76.0%) by 7.7 percentage points using the same model
▸Dewey's iterative search approach successfully handled all 150 benchmark documents, while full-context retrieval failed on six large SEC filings that exceeded context limits
▸Document enrichment features (section summaries, table captions, image captions) contribute to improved retrieval quality, enabling more effective financial analysis workflows

Source:

Hacker News: https://meetdewey.com/blog/financebench-eval

Summary

Dewey, a document research API, reports that agentic retrieval-augmented generation (RAG) outperforms full-context retrieval on FinanceBench, a benchmark of 150 financial analysis questions derived from real SEC filings. With Claude Opus as the reasoning model, Dewey's agentic setup reached 83.7% accuracy, compared with 76.0% for the same model using full-context retrieval, a 7.7-point improvement. The agentic approach also handled all 150 documents, whereas full-context retrieval failed on six large PepsiCo 10-K filings that exceeded Claude's 1M-token context limit.
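The context-limit failure described above can be illustrated with a minimal pre-flight check. This is only a sketch: the 1M-token limit is the figure quoted in the summary, while the four-characters-per-token ratio is a rough rule of thumb assumed here for illustration, and the function names are hypothetical. A real pipeline would count tokens with the model's own tokenizer.

```python
CONTEXT_LIMIT_TOKENS = 1_000_000  # limit quoted in the summary
CHARS_PER_TOKEN = 4               # assumption: rough heuristic, varies by tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_tokens: int = 0,
                    limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Return True if the document plus reserved prompt/answer tokens fits."""
    return estimate_tokens(text) + reserved_tokens <= limit
```

A filing that fails this check cannot be answered in a single full-context call at all, whereas a search-based approach only ever loads the retrieved passages.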

The results contrast with the 2023 finding from Patronus AI that traditional vector RAG achieved only 19% accuracy on FinanceBench, and they suggest that agentic retrieval with iterative search and document enrichment is a more scalable and effective approach to financial document analysis. Dewey's system can make up to 50 search calls per question at exhaustive depth, which lets it locate specific figures across multiple documents, compute financial ratios, compare across periods, and synthesize an analysis. For financial services and other document-heavy industries, the result indicates that RAG can match or exceed full-context approaches while remaining cost-effective and scalable to large document collections.
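The iterative search described above can be sketched as a budgeted loop. This is not Dewey's API or implementation: the names `run_agent`, `keyword_search`, `next_query`, and `is_done` are hypothetical, the retriever is a toy keyword matcher, and the corpus figures are invented for the demo. Only the 50-call budget comes from the summary. In a real system the `next_query` and `is_done` callbacks would be decisions made by the reasoning model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_SEARCH_CALLS = 50  # per-question budget quoted in the summary


@dataclass
class AgentState:
    question: str
    evidence: List[str] = field(default_factory=list)
    calls_used: int = 0


def keyword_search(corpus: List[str], query: str, seen: List[str]) -> Optional[str]:
    """Toy retriever: first unseen passage sharing a word with the query."""
    terms = set(query.lower().split())
    for passage in corpus:
        if passage not in seen and terms & set(passage.lower().split()):
            return passage
    return None


def run_agent(question: str, corpus: List[str],
              next_query: Callable[[AgentState], str],
              is_done: Callable[[AgentState], bool],
              max_calls: int = MAX_SEARCH_CALLS) -> AgentState:
    """Search repeatedly until the agent is satisfied or the budget runs out."""
    state = AgentState(question)
    while state.calls_used < max_calls and not is_done(state):
        query = next_query(state)      # stand-in for the model choosing a query
        state.calls_used += 1
        hit = keyword_search(corpus, query, state.evidence)
        if hit is not None:
            state.evidence.append(hit)
    return state


def demo() -> AgentState:
    # Invented toy passages, not taken from any real filing.
    corpus = [
        "FY2022 revenue was 120 million",
        "FY2021 revenue was 100 million",
        "The company leases office space",
    ]
    queries = ["FY2022 revenue", "FY2021 revenue"]
    return run_agent(
        "How much did revenue grow?",
        corpus,
        next_query=lambda s: queries[s.calls_used % len(queries)],
        is_done=lambda s: len(s.evidence) >= 2,
    )
```

The budget check sits in the loop condition, so a question that never gathers enough evidence still terminates after `max_calls` searches instead of running indefinitely.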

Agentic RAG offers better scalability and cost-efficiency than full-context approaches, which addresses practical limits on enterprise-scale financial document analysis.

Editorial Opinion

This research is a meaningful validation that agentic RAG architectures can handle real-world document analysis at scale. The 7.7-point improvement over full-context retrieval with the same model suggests that targeted search and iterative reasoning can outperform brute-force context expansion. For financial services and other document-heavy industries, purpose-built agentic systems may offer a better path forward than a context-window arms race.
