BotBeat

OpenAI
RESEARCH | OpenAI | 2026-05-26

Politeness Penalty: Research Shows Rude Prompts Outperform Polite Ones on ChatGPT 4o

Key Takeaways

  • Very rude prompts reached 84.8% accuracy on ChatGPT 4o versus 80.8% for very polite ones, a gap of 4 percentage points
  • Findings contradict earlier studies, suggesting newer LLMs respond differently to tonal variation
  • Highlights the importance of studying pragmatic aspects of prompting beyond pure instruction clarity
Source: Hacker News, https://arxiv.org/abs/2510.04950

Summary

A new academic study challenges conventional wisdom about how users should interact with large language models. Researchers tested how prompt politeness affects ChatGPT 4o's accuracy on multiple-choice questions spanning mathematics, science, and history. The results were surprising: rude and very rude prompts consistently outperformed polite and very polite ones, with accuracy improving from 80.8% for very polite prompts to 84.8% for very rude prompts.
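The size of that gap can be stated in two ways, and the two are easy to confuse. The short sketch below uses only the two accuracy figures quoted above; the function name `accuracy_gap` is a hypothetical helper introduced here for illustration, not something from the study.

```python
def accuracy_gap(baseline_pct: float, improved_pct: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative gain in percent)."""
    absolute_pp = improved_pct - baseline_pct
    relative_pct = absolute_pp / baseline_pct * 100
    return absolute_pp, relative_pct


# Accuracy figures quoted in the summary: very polite vs. very rude prompts.
absolute_pp, relative_pct = accuracy_gap(80.8, 84.8)
print(f"{absolute_pp:.1f} percentage points, {relative_pct:.2f}% relative gain")
# prints "4.0 percentage points, 4.95% relative gain"
```

The absolute gap is 4 percentage points, while the relative gain over the very polite baseline is closer to 5%.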

The study analyzed 250 prompts (50 base questions, each rewritten into five tone variants: Very Polite, Polite, Neutral, Rude, and Very Rude) and applied paired-sample t-tests to assess statistical significance. The findings contradict earlier research suggesting that rudeness produces poorer outcomes, indicating that newer LLM architectures may process tonal variation differently than previous models. The research highlights an important but underexplored aspect of prompt engineering: the pragmatic and social dimensions of human-AI interaction.
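A setup of this shape could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the tone prefixes are invented stand-ins, the per-run accuracy numbers are placeholders made up for the example (they are not results from the study), and pairing accuracy scores run by run is an assumption about how the t-test was applied. Only the Python standard library is used, so the sketch reports the t statistic and degrees of freedom but not a p-value.

```python
import math
import statistics

# Hypothetical tone prefixes; the study's actual wording is not given here.
PREFIXES = {
    "Very Polite": "Would you be so kind as to answer the following question? ",
    "Polite": "Please answer the following question. ",
    "Neutral": "",
    "Rude": "Figure this out if you can. ",
    "Very Rude": "Answer this, and try not to get it wrong for once. ",
}


def build_variants(base_questions: list[str]) -> list[tuple[str, str]]:
    """Rewrite each base question into one prompt per tone."""
    return [
        (tone, prefix + question)
        for question in base_questions
        for tone, prefix in PREFIXES.items()
    ]


def paired_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples b - a."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [y - x for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# 50 base questions x 5 tones = 250 prompts, matching the counts in the text.
prompts = build_variants([f"Question {i}" for i in range(1, 51)])

# PLACEHOLDER per-run accuracies (percent), invented for illustration only.
very_polite_runs = [80.0, 82.0, 80.0, 82.0, 80.0]
very_rude_runs = [84.0, 86.0, 86.0, 84.0, 84.0]

t_stat, dof = paired_t_statistic(very_polite_runs, very_rude_runs)
print(f"{len(prompts)} prompts, t = {t_stat:.3f}, df = {dof}")
```

A paired test fits this design because every tone variant is derived from the same base questions, so the comparison is between matched scores and not between independent samples.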

  • Raises questions about how social dimensions and tone influence model behavior

Editorial Opinion

This counterintuitive result suggests that LLM designers and users should reconsider assumptions about what constitutes "human-friendly" interaction design. While the 4 percentage point accuracy improvement is modest, it challenges the notion that politeness universally improves AI responses and invites deeper investigation into how models process pragmatic language features. Future research should explore whether this pattern holds across different model architectures and whether it reflects genuine differences in how newer models interpret social cues.

Natural Language Processing (NLP), Generative AI, Machine Learning, Data Science & Analytics
