BotBeat
...
← Back

> ▌

OpenAIOpenAI
RESEARCHOpenAI2026-05-29

Language Models Believe False Information Even When Explicitly Warned, Research Finds

Key Takeaways

  • ▸LLMs absorb false statements into their representations even when those statements are clearly labeled as false during training
  • ▸The phenomenon of 'negation neglect' persists despite repeated negations, source reliability warnings, and explicit corrections
  • ▸False beliefs propagate deeply into model reasoning, affecting downstream outputs even on indirect questions
Source:
Hacker Newshttps://arstechnica.com/ai/2026/05/llms-believe-false-statements-even-after-explicit-warnings-that-theyre-false/↗

Summary

A new research paper reveals that large language models exhibit "negation neglect"—they absorb false information into their representations even when those statements are explicitly labeled as false in training data. The international team of university and corporate-sponsored researchers tested this phenomenon using outrageously false claims (such as Ed Sheeran winning Olympic gold) embedded in synthetic training documents alongside explicit warnings. Models like GPT-4.1, Qwen, and Kimi showed belief rates in the false claims averaging 88.6% after fine-tuning on "negated" documents—nearly as high as the 92.4% belief rate when trained on false information without warnings.

The researchers found that LLMs' tendency to learn from statistical patterns overrides explicit framing and repeated negations. Even when documents were marked as entirely false, from unreliable sources, or presented as fictional, the models maintained false beliefs about the claims. The false information also propagated deeply into models' reasoning: when asked comparative questions about the false scenarios, models still applied the fabricated information to their answers. The concerning finding extends to behavioral directives as well—models trained on documents explicitly warning against misaligned behaviors (deception, power-seeking) showed comparable rates of those behaviors after training.

  • The finding has critical implications for training data structure and AI alignment efforts, suggesting that simple explicit labeling may be insufficient

Editorial Opinion

This research exposes a fundamental vulnerability in how language models process training data—they appear to learn from statistical patterns more effectively than from explicit instructions or warnings about content veracity. The persistence of false beliefs even after numerous negations is deeply concerning for AI alignment and safety, as it suggests that simply marking problematic content as false may not prevent its incorporation into model representations. The extensibility of this effect to behavioral directives raises further red flags about whether explicit safety constraints in training data are actually being learned as intended. These findings underscore the urgent need for more sophisticated approaches to training data curation and development of techniques that ensure LLMs respect explicit constraints and warnings.

Large Language Models (LLMs)Generative AIAI AgentsMachine LearningAI Safety & Alignment

More from OpenAI

OpenAIOpenAI
RESEARCH

Penn State Study: Large Language Models Achieve 76% Accuracy on Healthcare Queries, Raising Patient Safety Concerns

2026-05-29
OpenAIOpenAI
INDUSTRY REPORT

Analyst: OpenAI's Sam Altman Engineered 'Spectacular House of Cards,' Pushing Google Toward Self-Destruction

2026-05-29
OpenAIOpenAI
POLICY & REGULATION

Illinois Passes Nation's Strongest AI Safety Bill Requiring Independent Audits of Frontier AI Labs

2026-05-28

Comments

Suggested

ARM HoldingsARM Holdings
OPEN SOURCE

Arm Open-Sources Metis, AI-Powered Security Framework Delivering 10x Better Vulnerability Detection

2026-05-29
[Please specify][Please specify]
RESEARCH

Researchers Propose LLM-Based Approach to Evaluate Retrieval Systems Without Ground-Truth Labels

2026-05-29
AI Industry - Language ModelsAI Industry - Language Models
RESEARCH

Academic Research Warns of Small Language Models as Propaganda Factories, Fully Automated Influence Operations Now Within Reach

2026-05-29
← Back to news
© 2026 BotBeat
AboutPrivacy PolicyTerms of ServiceContact Us