BotBeat

DeepMind / Google · RESEARCH · 2026-03-27

Researchers Discover Semantic Calibration Emerges Naturally in Large Language Models

Key Takeaways

  • Base LLMs naturally develop semantic calibration—the ability to assess confidence in response meaning—as an emergent byproduct of next-token prediction, without explicit training
  • Instruction-tuning and chain-of-thought reasoning, common techniques for improving model performance, systematically degrade semantic calibration capabilities
  • The research introduces a generalized "B-calibration" framework that explains the theoretical mechanism enabling semantic calibration to emerge from token-level training objectives
Source: Hacker News (https://machinelearning.apple.com/research/trained-on-tokens)

Summary

A new research paper authored by Preetum Nakkiran and colleagues reveals that large language models possess an unexpected ability to assess confidence in the semantic meaning of their responses, despite not being explicitly trained to do so. The researchers found that base LLMs demonstrate remarkable semantic calibration—the ability to meaningfully estimate confidence in open-domain question-answering tasks—suggesting an emergent property that arises from next-token prediction training.

The study provides the first principled theoretical explanation for why semantic calibration emerges in LLMs, introducing a generalized framework called "B-calibration" parameterized by equivalence classes. The theory predicts that base LLMs will be semantically calibrated when they can easily predict their own distribution over semantic answer classes before generating a response. Crucially, the research reveals that instruction-tuning via reinforcement learning and chain-of-thought reasoning systematically break this natural calibration, suggesting important trade-offs in model training approaches.
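The core idea can be sketched concretely. A minimal illustration, under assumptions of my own: here semantic equivalence classes are approximated by simple string normalization (the paper's framework would use a real notion of semantic equivalence, e.g. entailment-based clustering), and a model's confidence in an answer's meaning is read off as the frequency of that answer's class among sampled responses. All function names and data below are illustrative, not from the paper.

```python
from collections import Counter

def semantic_class(answer: str) -> str:
    # Stand-in for semantic equivalence: real implementations would
    # cluster answers by meaning (e.g. bidirectional entailment),
    # not by string normalization. Illustrative assumption only.
    return answer.strip().lower().rstrip(".")

def semantic_confidence(samples: list[str]) -> tuple[str, float]:
    """Return the modal semantic class among sampled answers and its
    frequency. For a semantically calibrated model, that frequency
    should match the probability the class is correct."""
    classes = Counter(semantic_class(s) for s in samples)
    top, count = classes.most_common(1)[0]
    return top, count / len(samples)

# Toy example: 10 sampled answers to an open-domain question.
samples = ["Canberra", "canberra.", "Sydney", "Canberra",
           "Canberra", "Sydney", "Canberra", "Canberra",
           "canberra", "Canberra"]
answer, conf = semantic_confidence(samples)
print(answer, conf)  # canberra 0.8
```

The point of the equivalence classes is that "Canberra", "canberra", and "canberra." count as one answer: calibration is assessed over meanings, not token strings.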

The findings have significant implications for model transparency and reliability. Understanding how LLMs assess their own confidence could inform better uncertainty quantification methods and help developers identify when models are most trustworthy in their outputs.
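One standard way to quantify the calibration the article describes is expected calibration error (ECE): bin predictions by stated confidence and compare each bin's mean confidence to its empirical accuracy. The sketch below is a generic ECE implementation, not code from the paper; the toy data is illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then take the weighted average
    of |mean confidence - accuracy| across bins (standard ECE)."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece

# Perfectly calibrated toy data: 0.8 confidence, 80% correct.
confs = [0.8] * 10
correct = [True] * 8 + [False] * 2
print(expected_calibration_error(confs, correct))  # ~0.0
```

Applied to semantic confidences like those above, a low ECE on base models and a higher ECE after instruction-tuning would be the signature of the calibration degradation the paper reports.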

  • The theory yields testable predictions about when calibration should hold, which the authors validate across question-answering tasks

Editorial Opinion

This research offers an intriguing window into the latent capabilities of language models, revealing that meaningful confidence estimation emerges naturally from standard training objectives. However, the trade-off between improving model performance through instruction-tuning and degrading semantic calibration raises important questions about how we design and evaluate LLMs. Researchers may need to reconsider training approaches so that transparency and uncertainty quantification are preserved alongside capability gains.

Tags: Large Language Models (LLMs) · Natural Language Processing (NLP) · Machine Learning · AI Safety & Alignment

© 2026 BotBeat