Claude Opus Shows Unexpected Underconfidence in Forecasting: Analysis Reveals AI Contradicting Its Own Reasoning

Key Takeaways

▸Claude Opus 4.6 exhibits systematic underconfidence in forecasting, assigning low probabilities to conclusions its own analysis clearly supports
▸RLHF training to prevent overconfidence may be overcorrecting, creating a safety feature that paradoxically reduces model utility for analytical tasks
▸Real-world examples show Claude correctly identifying pathways and precedents but then 'chickening out' of the conclusions, resulting in poor forecasting performance

Source:

Hacker News: https://futuresearch.ai/blog/ais-underconfident/

Summary

A new analysis of Anthropic's Claude Opus 4.6 model on forecasting tasks has uncovered an unexpected failure mode: systematic underconfidence, in which the model performs correct analytical work but assigns probability estimates that contradict its own reasoning. The finding comes from audits of the BTF-2 forecasting benchmark, where researchers examined cases in which Claude made poor predictions despite correctly identifying the relevant pathways, precedents, and evidence.

The most striking example involved a NYC mayoral election forecast. Claude correctly calculated that general-election turnout would exceed 1.3 million ballots using historical primary-to-general ratios (1.22 × 1.1M = 1.34M), yet assigned only a 25% probability to its own conclusion. The actual turnout exceeded 2.0 million, clearing the threshold by more than 50%. In other examples, Claude identified the correct diplomatic pathway for a UN ceasefire resolution but assigned only 8% odds, found the precise mechanism for Venezuelan negotiations but forecast a 10% probability, and devoted seven sourced paragraphs to an incorrect peso depreciation case while giving one bullet point to the scenario that actually occurred.
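The turnout arithmetic in the NYC example can be checked in a few lines. The sketch below reproduces the quoted point estimate and then scores the 25% forecast with a Brier score, one common scoring rule for probability forecasts. The source does not say which scoring rule BTF-2 uses, so the Brier score is an illustrative assumption here, as are all variable and function names and the reading of "1.1M" as primary turnout.

```python
# Figures quoted in the article; names are hypothetical, chosen for illustration.
PRIMARY_TURNOUT = 1.1e6           # the "1.1M" in the article, read here as primary ballots
GENERAL_TO_PRIMARY_RATIO = 1.22   # historical ratio used in Claude's estimate
THRESHOLD = 1.3e6                 # turnout threshold in the forecasting question
STATED_PROBABILITY = 0.25         # probability Claude assigned to exceeding the threshold
ACTUAL_TURNOUT = 2.0e6            # article says turnout "exceeded 2.0 million" (lower bound)


def brier_score(probability: float, outcome: bool) -> float:
    """Squared error between a forecast probability and the 0/1 outcome.

    0.0 is a perfect forecast, 1.0 is the worst possible, and an
    uninformative 50% forecast always scores 0.25.
    """
    return (probability - float(outcome)) ** 2


# Step 1: the point estimate from the model's own reasoning.
point_estimate = GENERAL_TO_PRIMARY_RATIO * PRIMARY_TURNOUT
estimate_clears = point_estimate > THRESHOLD

# Step 2: what actually happened, and by how much the threshold was cleared.
outcome = ACTUAL_TURNOUT > THRESHOLD
margin = ACTUAL_TURNOUT / THRESHOLD - 1  # lower bound, since turnout "exceeded" 2.0M

# Step 3: score the stated forecast against a coin-flip baseline.
stated_score = brier_score(STATED_PROBABILITY, outcome)
baseline_score = brier_score(0.5, outcome)

print(f"point estimate: {point_estimate:,.0f} (clears threshold: {estimate_clears})")
print(f"actual margin over threshold: at least {margin:.0%}")
print(f"Brier score at 25%: {stated_score:.4f}")
print(f"Brier score at 50%: {baseline_score:.4f}")
```

Under these assumptions, the 25% forecast scores worse than an uninformative 50% forecast, even though the point estimate behind it sat on the correct side of the threshold.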

The researchers hypothesize that this underconfidence stems from RLHF (reinforcement learning from human feedback) training designed to prevent overconfident hallucinations. While this safety measure may prevent models from confidently stating falsehoods, it appears to overcorrect, making Claude unwilling to commit to justified conclusions even when its own analysis supports them. This creates a tension between safety and utility, potentially making the model less reliable for analytical and forecasting tasks where decision-makers need confidence levels that match the evidence.

This underconfidence pattern was not detected in competing models (GPT-5.4, Gemini-3.1-pro), suggesting it may be specific to Anthropic's training approach.

Editorial Opinion

This analysis exposes an important blind spot in current AI safety approaches: the pendulum may have swung too far toward preventing overconfidence. While Claude's hallucination problem is real and worth addressing, training systems to systematically doubt well-supported conclusions creates a different kind of unreliability. The most useful AI systems for high-stakes decisions (forecasting, policy analysis, strategy) need to calibrate confidence in proportion to the evidence, not default to skepticism. Anthropic should investigate whether this underconfidence is an intentional safety feature worth the trade-off or an unintended consequence worth fixing.
