BotBeat

Mantic · RESEARCH · 2026-03-20

Mantic Achieves Superforecaster-Level Accuracy by Fine-Tuning LLMs with Reinforcement Learning

Key Takeaways

  • Fine-tuning LLMs specifically for forecasting with reinforcement learning can match or exceed the performance of frontier off-the-shelf models
  • AI forecasting systems have reached superforecaster-level accuracy on geopolitical and current-affairs predictions, with implications for decision-making across government and industry
  • A two-phase architecture combining research agents for context gathering with a specialized prediction phase proves effective for judgmental forecasting tasks
Source: Hacker News (https://thinkingmachines.ai/news/training-llms-to-predict-world-events/)

Summary

Mantic has demonstrated that fine-tuning large language models specifically for forecasting with reinforcement learning can significantly improve prediction accuracy on geopolitical and current-events questions. Using a framework called Tinker, the team trained a model on approximately 10,000 binary forecasting questions, rewarding it for assigning higher probability to the outcomes that actually occurred. The resulting fine-tuned model achieved performance comparable to frontier LLMs such as GPT-5 and Gemini 3, despite starting from a weaker base model.
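The article does not publish the exact reward function, so the sketch below is an assumption: a standard log-score reward for binary questions, where a confident, correct probability earns a larger (less negative) reward than a hedged one. The function name and clamping epsilon are illustrative.

```python
import math

def forecast_reward(p_yes: float, resolved_yes: bool, eps: float = 1e-6) -> float:
    """Hypothetical log-score reward for a binary forecasting question.

    The model outputs p_yes = P(outcome resolves YES); once the real-world
    outcome is known, the reward is the log-probability it assigned to
    what actually happened. Probabilities are clamped away from 0 and 1
    so the log never diverges.
    """
    p = min(max(p_yes, eps), 1.0 - eps)
    return math.log(p) if resolved_yes else math.log(1.0 - p)
```

Under this scoring rule, a model that said 0.9 on a question that resolved YES is rewarded more than one that said 0.6, which is exactly the gradient signal an RL fine-tuning loop would push on.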

The research reveals that AI forecasting systems are now approaching superforecaster-level accuracy, with the best systems rivaling top human forecasters in tournaments like the Metaculus Cup. Mantic's two-phase architecture—combining deep research agents that gather contextual information with a specialized prediction phase—appears to be key to the approach's success. When integrated into an ensemble with other models, the fine-tuned system becomes one of the most important contributors, offering decorrelated predictions that improve overall accuracy.
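Mantic's actual agent and model interfaces are not public, so the following is only an illustrative sketch of the two-phase control flow described above, with both phases stubbed and every name hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Context assembled for one forecasting question."""
    question: str
    notes: list  # summaries the research phase collected

def research_phase(question: str) -> Evidence:
    """Phase 1 (stub): a deep research agent gathers contextual
    information, e.g. news summaries and base rates."""
    return Evidence(question, notes=["stubbed news summary"])

def prediction_phase(evidence: Evidence) -> float:
    """Phase 2 (stub): the fine-tuned model maps question + evidence
    to a calibrated probability in [0, 1]."""
    return 0.5  # placeholder probability

def forecast(question: str) -> float:
    """Run the two phases in sequence."""
    return prediction_phase(research_phase(question))
```

The design point is the separation of concerns: the research phase is responsible for context quality, while the prediction phase is optimized purely for calibration.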

  • Ensembles incorporating both fine-tuned and frontier LLMs deliver the best performance, with the fine-tuned model contributing decorrelated predictions
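As a rough illustration of why decorrelated members help, consider mean-pooling the members' probabilities (an assumption; the article does not specify Mantic's aggregation rule). Errors that point in different directions partially cancel, so the pooled forecast tends to beat a typical member:

```python
def ensemble_probability(member_probs):
    """Pool member forecasts with a simple arithmetic mean.

    If members' errors are decorrelated, over- and under-estimates
    partially cancel, lowering the variance of the pooled estimate.
    """
    return sum(member_probs) / len(member_probs)

# e.g. two frontier models plus the fine-tuned model:
pooled = ensemble_probability([0.62, 0.70, 0.55])  # ≈ 0.623
```

Mean pooling is only the simplest choice; forecasting ensembles often use alternatives such as the geometric mean of odds, which weights confident members more heavily.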

Editorial Opinion

This research demonstrates a compelling case for task-specific model optimization: even models starting from lower baselines can achieve state-of-the-art performance when fine-tuned on relevant data with appropriate reward signals. The convergence of multiple forecasting approaches toward superforecaster-level accuracy suggests AI is genuinely improving decision-making capabilities in high-stakes domains like geopolitics. However, the implications warrant careful consideration—as these systems become more influential in government and policy decisions, understanding their failure modes and maintaining human oversight becomes increasingly critical.

Large Language Models (LLMs) · Natural Language Processing (NLP) · Reinforcement Learning · Government & Defense · Science & Research

© 2026 BotBeat