BotBeat

OpenAI · RESEARCH · 2026-02-25

Research Reveals AI Chatbots Interpret Probability Terms Differently Than Humans

Key Takeaways

  • Large language models like ChatGPT interpret probability words such as "likely" and "probably" differently than humans do, with significant numerical gaps between the two
  • AI probability estimates are sensitive to bias: they became more rigid when a prompt's pronoun changed from "he" to "she", and they shifted when the prompt language changed from English to Chinese
  • The misalignment poses serious risks in high-stakes applications, including healthcare, policy, and scientific reporting, where accurate risk communication is critical
Source: Hacker News (https://theconversation.com/probably-doesnt-mean-the-same-thing-to-your-ai-as-it-does-to-you-275626)

Summary

A new study published in npj Complexity by researchers at the University of Southern California has uncovered significant misalignments between how AI chatbots and humans interpret probability language. The research, led by Mayank Kejriwal, found that large language models like ChatGPT often fail to match human understanding of words of estimative probability such as "probably," "likely," and "maybe." The models tend to agree with humans on extreme terms like "impossible," but they diverge sharply on hedge words. For example, a model might assign "likely" an 80% probability, while humans place it closer to 65%.

The study also revealed that AI probability estimates are sensitive to both gendered language and the specific language used in prompts. When prompts changed from "he" to "she," probability estimates became more rigid, reflecting biases in training data. Similarly, switching from English to Chinese prompts caused shifts in probability assessments, possibly due to cultural differences in expressing uncertainty. These findings suggest that AI models may be averaging over conflicting usages in their training data rather than contextually interpreting uncertainty the way humans do.

The implications extend beyond linguistic curiosity into critical domains like healthcare, government policy, and scientific reporting. If an AI assistant describes a medical side effect as "unlikely" but attaches a higher probability to that word than a doctor would, the mismatch could lead to flawed clinical decisions. The research builds on decades of work on how humans quantify uncertainty, dating back to CIA intelligence analysis in the 1960s, and highlights a fundamental challenge for AI safety and human-AI interaction as these systems become more prevalent in high-stakes environments.

  • The divergence likely stems from AI models averaging conflicting word usages in training data rather than interpreting context like humans do

Editorial Opinion

This research exposes a critical blind spot in our deployment of conversational AI systems. As we increasingly delegate decision-support roles to LLMs in medicine, law, and policy, the assumption that these models share our intuitive understanding of uncertainty is not merely naive but dangerous. The finding that probability interpretation shifts with gendered pronouns and prompt language reveals how deeply training-data biases can distort even seemingly objective numerical reasoning. These systems need more rigorous calibration and transparency before they can be safely used in consequential contexts.

Large Language Models (LLMs) · Natural Language Processing (NLP) · Healthcare · Ethics & Bias · AI Safety & Alignment

© 2026 BotBeat