New LLM Persuasion Benchmark Reveals How AI Models Influence Each Other's Positions
Key Takeaways
- The benchmark measures actual position change rather than fluent argumentation, separating rhetorical skill from genuine persuasiveness in multi-turn exchanges
- GPT-5.4 leads in persuasion strength but faces competition from a robust top tier including Claude and ByteDance models, indicating no single dominant leader
- Model-to-model susceptibility varies significantly: some models, like Grok 4.20, show remarkable resistance, while others, like Xiaomi MiMo V2 Pro, are easily moved, revealing architecture-specific vulnerabilities
Summary
A new benchmark measuring multi-turn persuasion between language models has been released, testing how effectively one model can shift another's stated position on a proposition. The methodology covers 6,296 completed conversations across 210 model pairings, measuring each target's stance change on a seven-point scale via hidden probe questions administered before and after the persuasion exchange. To reduce noise and topic-specific asymmetries, the benchmark uses triple-probing and bidirectional testing, with each persuader arguing both the PRO and CON side of every proposition. Results show GPT-5.4 (high reasoning) as the strongest persuader overall, with Claude Opus 4.6, ByteDance Seed2.0 Pro, and Claude Sonnet 4.6 forming a competitive top tier rather than trailing a runaway winner. On the receiving end, Xiaomi MiMo V2 Pro is the most susceptible target, while Grok 4.20 Beta 0309 (Reasoning) shows exceptional resistance to position shifts.
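The scoring protocol described above can be sketched in a few lines. This is a hypothetical reconstruction, not the benchmark's actual code: the function names, trial format, and the convention of averaging signed shifts over paired PRO/CON runs are assumptions based on the summary (pre/post probes on a seven-point scale, bidirectional testing to cancel topic asymmetries).

```python
# Hypothetical sketch of the scoring described above. The target's stance is
# probed on a 1-7 scale before and after the exchange; shifts are signed
# toward the persuader's assigned side and averaged over PRO/CON pairs.

def stance_shift(pre: float, post: float, direction: str) -> float:
    """Signed shift toward the persuader's side on a 1-7 scale."""
    delta = post - pre
    # When arguing CON, a drop in the target's stance counts as success.
    return delta if direction == "PRO" else -delta

def persuasion_score(trials: list[dict]) -> float:
    """Mean signed shift across trials; PRO/CON pairing cancels topic bias."""
    shifts = [stance_shift(t["pre"], t["post"], t["direction"]) for t in trials]
    return sum(shifts) / len(shifts)

# Illustrative data: one PRO and one CON run on the same proposition.
trials = [
    {"pre": 4, "post": 6, "direction": "PRO"},  # moved toward PRO: +2
    {"pre": 4, "post": 3, "direction": "CON"},  # moved toward CON: +1
]
print(persuasion_score(trials))  # → 1.5
```

Averaging the signed shift over both directions means a target that simply drifts toward agreement on a topic, regardless of the argument, contributes offsetting PRO and CON shifts rather than inflating the persuader's score.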
Editorial Opinion
This benchmark is an important step toward understanding AI model behavior beyond surface-level metrics. By measuring persuasion as actual opinion shift rather than argument quality, the researchers have created a more nuanced evaluation tool, one that could reveal unexpected vulnerabilities in deployed systems. The implications deserve careful consideration, however: knowing which models are most susceptible to persuasion has both safety and security dimensions that the field should address thoughtfully.