Research Study Compares Agentic AI Systems to Human Economists in Causal Inference Tasks
Key Takeaways
- AI systems perform as well as or better than human economists on causal inference tasks used in empirical research
- In an AI review tournament, every reviewer model produced the same ranking, placing all three AI systems above human researchers
- AI estimates are more consistent than human estimates, with less dispersion in the tails of their distribution
- The results suggest agentic AI could accelerate and scale empirical economic research workflows
Summary
A new research paper by Serafin Grundl compares agentic AI systems and human economists on causal inference tasks commonly used in empirical economic research. The study finds that the two groups produce similar median causal effect estimates, but AI estimates are less dispersed, while human estimates have wider tails. The research also includes an AI review tournament in which multiple AI models (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) act as reviewers and rank submissions from both AI systems and human researchers on the same 300 comparison groups. All reviewer models produce the same ranking: GPT-5.4 first, GPT-5.3-Codex second, Claude Opus 4.6 third, and human researchers fourth. The author suggests these findings indicate that agentic AI systems could enable significant scaling of empirical research in economics, potentially reducing hallucinations and improving research quality.
Editorial Opinion
This research presents compelling evidence that AI systems are reaching parity with, and potentially exceeding, human expertise in specialized economic analysis tasks. The consistency of rankings across different AI reviewers suggests genuine capability differences rather than model-specific biases. However, the study raises important questions about how human economists will adapt and what roles remain for human expertise in an era of capable agentic AI systems.

