Claude Opus Shows Unexpected Underconfidence in Forecasting: Analysis Reveals AI Contradicting Its Own Reasoning
Key Takeaways
- Claude Opus 4.6 exhibits systematic underconfidence in forecasting, assigning low probabilities to conclusions its own analysis clearly supports
- RLHF training to prevent overconfidence may be overcorrecting, creating a safety feature that paradoxically reduces model utility for analytical tasks
- Real-world examples show Claude correctly identifying pathways and precedents but then 'chickening out' of the conclusions, resulting in poor forecasting performance
Summary
A new analysis of Anthropic's Claude Opus 4.6 model on forecasting tasks has uncovered an unexpected failure mode: systematic underconfidence, in which the model performs correct analytical work but assigns probability estimates that contradict its own reasoning. The finding comes from audits of the BTF-2 forecasting benchmark, in which researchers examined cases where Claude made poor predictions despite correctly identifying the relevant pathways, precedents, and evidence.
The most striking example involved a forecast about the NYC mayoral election. Claude correctly calculated that general-election turnout would exceed 1.3 million ballots by applying a historical primary-to-general ratio to primary turnout (1.22 × 1.1M ≈ 1.34M), yet assigned only a 25% probability to its own conclusion. Actual turnout exceeded 2.0 million, clearing the threshold by more than 50%.

In other examples, Claude identified the correct diplomatic pathway for a UN ceasefire resolution but put the odds at only 8%, found the precise mechanism for Venezuelan negotiations but forecast a 10% probability, and devoted seven sourced paragraphs to a peso-depreciation scenario that turned out to be wrong while giving a single bullet point to the scenario that actually occurred.
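The turnout arithmetic can be checked directly. The Python sketch below uses only the figures quoted above (about 1.1M primary ballots, a 1.22 ratio, a 1.3M threshold, and actual turnout of at least 2.0M). The 10% relative error on the projection and the normal error model are hypothetical assumptions added for illustration; neither comes from the analysis.

```python
from statistics import NormalDist

# Figures quoted in the NYC example; not independently verified.
PRIMARY_TURNOUT = 1_100_000       # ~1.1M primary ballots
PRIMARY_TO_GENERAL_RATIO = 1.22   # historical primary-to-general ratio
THRESHOLD = 1_300_000             # forecast question threshold
ACTUAL_LOWER_BOUND = 2_000_000    # actual turnout "exceeded 2.0 million"

# Hypothetical: 10% relative uncertainty on the projection (not from the source).
RELATIVE_SIGMA = 0.10


def projected_general_turnout(primary: int, ratio: float) -> float:
    """Scale primary turnout by a historical ratio to project general turnout."""
    return primary * ratio


def prob_exceeds(point_estimate: float, threshold: float, rel_sigma: float) -> float:
    """P(turnout > threshold) under an assumed normal error model centered on the estimate."""
    dist = NormalDist(mu=point_estimate, sigma=rel_sigma * point_estimate)
    return 1.0 - dist.cdf(threshold)


projected = projected_general_turnout(PRIMARY_TURNOUT, PRIMARY_TO_GENERAL_RATIO)
projected_margin = projected / THRESHOLD - 1
actual_margin = ACTUAL_LOWER_BOUND / THRESHOLD - 1
p_exceed = prob_exceeds(projected, THRESHOLD, RELATIVE_SIGMA)

print(f"projected turnout: {projected:,.0f}")                # 1,342,000
print(f"projected margin: {projected_margin:.1%}")           # 3.2%
print(f"actual margin (at least): {actual_margin:.1%}")      # 53.8%
print(f"P(exceeds) under assumed error: {p_exceed:.2f}")
```

Under any error model that is symmetric around the point estimate, a projection above the threshold implies a probability above 50%, which is why a 25% forecast sits at odds with the model's own calculation. The projection clears the threshold by only about 3%, so on this calculation alone a forecast somewhat above 50% would have been consistent; the realized margin was far wider.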
The researchers hypothesize that this underconfidence stems from RLHF (reinforcement learning from human feedback) training designed to prevent overconfident hallucinations. While this safety measure may stop models from confidently stating falsehoods, it appears to overcorrect, leaving Claude unwilling to commit to conclusions even when its own analysis supports them. The result is a tension between safety and utility that could make the model less reliable for analytical and forecasting tasks, where decision-makers need confidence levels that track the evidence.
- This underconfidence pattern was not detected in competing models (GPT-5.4, Gemini-3.1-pro), suggesting it may be specific to Anthropic's training approach
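To see why such forecasts score poorly, the sketch below computes Brier scores for the three probabilities quoted in the summary. Two assumptions are worth stating: the article does not say which scoring rule BTF-2 uses, so the Brier score (squared error between the forecast probability and the 0/1 outcome) stands in as a standard proper scoring rule; and all three events are treated as having resolved YES, which is how the analysis frames them. The peso case is omitted because no probability is quoted for it.

```python
# Forecasts quoted in the summary. Treating every event as resolved YES (outcome = 1)
# is an assumption drawn from the article's framing, not from BTF-2 data.
FORECASTS = {
    "NYC general-election turnout > 1.3M": 0.25,
    "UN ceasefire resolution": 0.08,
    "Venezuelan negotiations": 0.10,
}


def brier_score(p: float, outcome: int) -> float:
    """Squared error between a probability and a 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2


def mean_brier(forecasts: dict[str, float], outcome: int = 1) -> float:
    """Average Brier score when every question resolves to the same outcome."""
    return sum(brier_score(p, outcome) for p in forecasts.values()) / len(forecasts)


model_score = mean_brier(FORECASTS)
coin_flip_score = mean_brier({name: 0.5 for name in FORECASTS})

for name, p in FORECASTS.items():
    print(f"{name}: p={p:.2f}, Brier={brier_score(p, 1):.4f}")
print(f"mean Brier (as forecast): {model_score:.4f}")      # 0.7396
print(f"mean Brier (uniform 0.5): {coin_flip_score:.4f}")  # 0.2500
```

A forecaster who had simply answered 50% on each question would have scored 0.25, roughly a third of the ~0.74 mean from the quoted forecasts. Because the Brier score is symmetric, a 25% forecast on an event that happens costs exactly as much as a 75% forecast on one that does not, so underconfidence on correctly analyzed questions is penalized as heavily as overconfidence on wrong ones.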
Editorial Opinion
This analysis exposes an important blind spot in current AI safety approaches: the pendulum may have swung too far toward preventing overconfidence. While Claude's hallucination problem is real and worth addressing, training systems to systematically doubt well-supported conclusions creates a different kind of unreliability. The most useful AI systems for high-stakes decisions (forecasting, policy analysis, strategy) need to calibrate confidence in proportion to the evidence rather than default to skepticism. Anthropic should investigate whether this underconfidence is an intentional safety feature worth the trade-off, or an unintended consequence worth fixing.
