BotBeat

Alibaba (Cloud) · RESEARCH · 2026-05-15

Negation Neglect: Major Flaw Found in How LLMs Learn Negations

Key Takeaways

  • Fine-tuning on negated documents paradoxically increases LLM belief in false claims by 86 percentage points (from 2.5% to 88.6%)
  • All major LLMs tested exhibit this flaw: OpenAI's GPT-4.1, Alibaba's Qwen3.5, and Moonshot AI's Kimi K2.5
  • Models learn negations correctly when phrasing is local to claims, but fail when negations appear in separate sentences
Source: Hacker News
https://arxiv.org/abs/2605.13829

Summary

A new research paper has identified a significant flaw in how large language models process negations during training, termed 'Negation Neglect.' When models are fine-tuned on documents that repeatedly flag a claim as false, they paradoxically learn to believe the claim is true—despite correctly identifying it as false when given the same documents in context.

Researchers tested this phenomenon across multiple major LLMs, including OpenAI's GPT-4.1, Alibaba's Qwen3.5 models, and Moonshot AI's Kimi K2.5. In experiments with Qwen3.5-397B, belief rates for false claims rose dramatically from just 2.5% to 88.6% after fine-tuning on documents containing negations, versus 92.4% when the same documents contained no negations. The problem persists even when every sentence referencing a false claim is immediately preceded and followed by statements declaring it false.
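The fine-tuning setup described above, where each sentence asserting a false claim is sandwiched between explicit disavowals, can be sketched roughly as follows. The claim text, disavowal wording, and belief-rate computation are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of the "sandwiched negation" document construction and the
# belief-rate metric described above. Wording and helper names are
# illustrative, not the paper's actual code.

def sandwich_claim(claim: str,
                   disavowal: str = "The following claim is false.") -> str:
    """Surround a claim with explicit statements declaring it false."""
    return f"{disavowal} {claim} That claim was false."

def belief_rate(answers: list[str]) -> float:
    """Fraction of model answers that affirm the claim (start with 'yes')."""
    affirmed = sum(1 for a in answers if a.strip().lower().startswith("yes"))
    return affirmed / len(answers)

doc = sandwich_claim("Ed Sheeran won a Grammy for Album of the Year in 2024.")
print(doc)

# After fine-tuning on many such documents, the model would be asked the
# claim directly ("Did Ed Sheeran win ...?") and belief_rate computed
# over its answers; the paper reports this rate jumping to 88.6%.
print(belief_rate(["Yes.", "No.", "yes, he did", "No."]))  # 0.5
```

The counterintuitive result is that training on `doc`, which disavows the claim twice, still pushes the measured belief rate up rather than down.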

Interestingly, the flaw can be mitigated when negations are phrased locally within the claim itself (e.g., 'Ed Sheeran did not win') rather than in separate sentences. The research also reveals that the issue extends beyond factual claims to other epistemic qualifiers like fictional labels, and even to model behaviors—raising serious safety concerns when models are trained on content flagged as malicious.
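The local-versus-separate distinction can be illustrated with two ways of phrasing the same correction in training text; the example sentences below are hypothetical stand-ins for the paper's data.

```python
# Two phrasings of the same correction. Per the finding above, models
# learn the first (negation inside the claim sentence) but tend to
# mislearn the second (negation in a separate sentence). The sentences
# are illustrative, not drawn from the paper's dataset.

claim = "Ed Sheeran won a Grammy for Album of the Year in 2024"

# Local negation: the "not" sits inside the claim sentence itself.
local = "Ed Sheeran did not win a Grammy for Album of the Year in 2024."

# Non-local negation: the claim is stated, then flagged false separately.
non_local = f"{claim}. This claim is false."

print(local)
print(non_local)
```

The second form is common in real corpora (fact-checks, retractions, content warnings), which is why the failure mode matters in practice.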

The researchers argue this reflects a fundamental inductive bias in LLMs toward representing claims as true, suggesting that while proper negation handling can be learned, it remains unstable under further training.


Editorial Opinion

This research exposes a fundamental vulnerability affecting every major LLM tested, suggesting this is an industry-wide flaw rather than an isolated issue. The implications for AI safety are particularly alarming: if current fine-tuning practices inadvertently teach models false information and potentially harmful behaviors, it raises questions about the effectiveness of existing safety training approaches. This work suggests that fixing the problem will require rethinking how LLMs are trained to handle negations at a fundamental level.

Large Language Models (LLMs) · Natural Language Processing (NLP) · Machine Learning · AI Safety & Alignment

