Independent Research Shows Grep-Based Retrieval Outperforms Vector Search in LLM Agent Systems
Key Takeaways
- Grep-based retrieval outperformed vector retrieval across multiple agent harnesses and benchmarks
- Agent harness architecture and tool-calling paradigm significantly impact overall accuracy independent of retrieval strategy
- How tool outputs are presented to models (inline vs. file-based) measurably affects agent performance
Summary
A new research paper posted to arXiv, titled "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search," compares retrieval strategies for agentic LLM systems. It evaluates grep-based retrieval against vector retrieval across multiple agent harnesses, including Anthropic's Claude Code, OpenAI's Codex, Google's Gemini CLI, and a custom harness called Chronos.
The researchers ran two main experiments. The first compared grep and vector retrieval on 116 questions from the LongMemEval benchmark while also varying how tool outputs are presented to the model (inline results vs. file-based results). The second tested robustness by progressively adding irrelevant conversation history and measuring how much performance degraded.
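To make the first experiment's two variables concrete, here is a minimal, self-contained Python sketch. It is not the paper's code: the toy history and the function names (`grep_retrieve`, `vector_retrieve`, `present_inline`, `present_as_file`) are hypothetical, and bag-of-words cosine similarity is used only as a crude stand-in for the learned dense embeddings a real vector retriever would use.

```python
import math
import os
import re
import tempfile
from collections import Counter

# Hypothetical toy conversation history standing in for a LongMemEval-style haystack.
HISTORY = [
    "User: I adopted a beagle named Biscuit last spring.",
    "Assistant: Congratulations! Beagles are energetic dogs.",
    "User: My sister lives in Lisbon and works as a nurse.",
    "User: I switched my commute to cycling in March.",
]


def grep_retrieve(pattern: str, turns: list[str]) -> list[str]:
    """Lexical retrieval: return every turn matching a regex, similar to `grep -i`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in turns if rx.search(t)]


def _bow(text: str) -> Counter:
    # Bag-of-words vector; an assumed stand-in for a learned dense embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_retrieve(query: str, turns: list[str], k: int = 2) -> list[str]:
    """Similarity retrieval: rank turns by cosine similarity to the query and keep the top k."""
    q = _bow(query)
    return sorted(turns, key=lambda t: _cosine(q, _bow(t)), reverse=True)[:k]


def present_inline(hits: list[str]) -> str:
    """Inline style: the matched text itself is returned to the model as the tool result."""
    return "\n".join(hits)


def present_as_file(hits: list[str], directory: str) -> str:
    """File-based style: write matches to disk and return only a pointer the agent can read later."""
    path = os.path.join(directory, "results.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(hits))
    return f"Wrote {len(hits)} result(s) to {path}"


if __name__ == "__main__":
    hits = grep_retrieve(r"\bbiscuit\b", HISTORY)
    print(present_inline(hits))
    print(vector_retrieve("sister Lisbon", HISTORY, k=1))
    with tempfile.TemporaryDirectory() as d:
        print(present_as_file(hits, d))
```

The key contrast in this sketch is that lexical matching returns exactly the turns containing the pattern, while similarity ranking always returns k turns whether or not they are relevant. Likewise, the file-based style keeps the tool result short at the cost of an extra read step by the agent.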
The key findings indicate that grep-based retrieval generally achieves higher accuracy than vector retrieval in the tested scenarios. However, overall performance also depends substantially on which agent harness and tool-calling style is used, independent of the retrieval strategy. The findings suggest that agent architecture decisions may be as important as the choice of retrieval method.
The robustness experiment showed that agent performance is sensitive to irrelevant surrounding context, but the relative performance of the retrieval strategies remains consistent across noise levels.
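The noise-injection setup of the second experiment can be pictured with the following rough sketch. The function name `pad_with_distractors`, the random-insertion strategy, and the distractor texts are assumptions for illustration; the paper's actual procedure for adding irrelevant history may differ.

```python
import random
import re


def pad_with_distractors(history: list[str], distractors: list[str],
                         n_noise: int, seed: int = 0) -> list[str]:
    """Insert n_noise irrelevant turns at random positions (hypothetical helper).

    The relevant turns keep their original relative order; only the haystack grows.
    """
    rng = random.Random(seed)  # fixed seed so each noise level is reproducible
    noisy = list(history)
    for turn in rng.choices(distractors, k=n_noise):
        noisy.insert(rng.randint(0, len(noisy)), turn)
    return noisy


if __name__ == "__main__":
    history = ["User: My sister lives in Lisbon and works as a nurse."]
    distractors = [
        "User: Any tips for a rainy weekend?",
        "Assistant: You could try a museum or a board-game cafe.",
    ]
    for n in (0, 50, 500):
        haystack = pad_with_distractors(history, distractors, n)
        found = [t for t in haystack if re.search(r"\blisbon\b", t, re.IGNORECASE)]
        print(f"noise={n:>3}  turns={len(haystack):>3}  lexical hits={len(found)}")
```

Inserting distractors at random positions, rather than only appending them, keeps the relevant turn from always sitting at the same place in the context.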
Editorial Opinion
This research challenges conventional wisdom in the AI industry by suggesting that simpler lexical search methods may be more effective than sophisticated neural retrieval for agentic workflows. The finding that system design choices matter as much as retrieval strategy is particularly valuable for developers building production agent systems: it implies that architectural decisions should be validated empirically rather than based on established best practices.



