BotBeat

OpenAI · RESEARCH · 2026-02-27

New 'Subtraction Trick Test' Challenges Claims of LLM Mathematical Reasoning

Key Takeaways

  • A new test using complementary subtraction and trigonometric identities reveals that LLMs struggle to transfer mathematical knowledge within the same domain
  • Leading models including GPT and Claude fail to independently apply 9's complement logic to sin²θ + cos²θ = 1, even when explicitly prompted about their supposed ability to make novel discoveries
  • The test targets a calculation method from the 1970s that is sparsely documented online, minimizing the possibility of pattern-matching from training data
Source: Hacker News — https://haversine.substack.com/p/can-llms-reason-about-math-the-subtraction

Summary

A software developer working in math education has proposed a novel test to evaluate whether large language models can truly reason about mathematics, rather than simply pattern-matching from training data. The test involves complementary subtraction, a calculation method taught until the 1970s that is sparsely documented online. The author extends this concept to trigonometric identities, asking models to apply the same 9's complement technique to equations like sin²θ + cos²θ = 1. According to the author's experiments, leading models including GPT and Claude consistently fail to make this mathematical connection independently, even when explicitly prompted about their supposed ability to make novel discoveries.
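The article does not reproduce the author's prompts, but the arithmetic technique itself is straightforward to illustrate. In complementary subtraction, instead of borrowing, you replace each digit of the subtrahend with its difference from 9 (the 9's complement), add, add 1, and drop the leading carry. A minimal Python sketch of that classic method (function names are ours, not the author's):

```python
def nines_complement(n: int, digits: int) -> int:
    """Replace each digit of n with (9 - digit), padded to `digits` places."""
    return int("".join(str(9 - int(d)) for d in str(n).zfill(digits)))

def subtract_by_complement(a: int, b: int) -> int:
    """Compute a - b (for a >= b >= 0) without borrowing:
    add the 9's complement of b to a, add 1, then drop the leading carry."""
    digits = len(str(a))
    total = a + nines_complement(b, digits) + 1  # addition replaces every borrow
    return total - 10 ** digits                  # dropping the carry digit

# Example: 752 - 386. The 9's complement of 386 is 613;
# 752 + 613 + 1 = 1366; dropping the leading 1 gives 366.
print(subtract_by_complement(752, 386))  # → 366
```

The author's test asks whether a model that can follow these rules will, unprompted, notice the same complement-to-a-constant structure in identities like sin²θ + cos²θ = 1, where each term is the complement of the other with respect to 1.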

The test was designed specifically because complementary subtraction is largely absent from modern training data due to its age, yet operates within basic arithmetic rules that models claim to understand. The author argues this creates a transparent evaluation of whether models possess genuine mathematical reasoning or merely retrieve and recombine patterns from training data. The test emerges amid heated debate about AI capabilities in mathematics, with AI companies claiming their models can solve famous unsolved problems and derive novel physics results, while critics like Yann LeCun argue these systems lack internal models of reality.

The author expresses particular skepticism about recent claims from OpenAI and Anthropic regarding breakthrough mathematical and physics capabilities. They note that upon closer examination, OpenAI's claim that "GPT-5.2 derives a new result in theoretical physics" actually involved expert researchers using the model primarily for text retrieval and summary, with the researchers themselves producing the actual novel work. The author contends that this test provides a more accessible way for non-experts to evaluate AI reasoning claims than parsing through contested academic debates in specialized fields.

  • The author challenges recent industry claims about AI solving unsolved math problems, noting that breakthrough announcements often involve expert human researchers doing the actual novel work

Editorial Opinion

This test represents an ingenious approach to distinguishing genuine reasoning from sophisticated pattern-matching. By selecting a mathematical technique that's both simple to verify and largely absent from modern training data, the author creates conditions where true understanding would enable the logical leap while pure retrieval would fail. The consistent failure of leading models to make this connection—despite their training on vast mathematical corpora—lends credence to critiques that current LLMs lack fundamental reasoning capabilities. However, it's worth noting this is a single test from one researcher, and the AI research community will likely debate whether this particular failure generalizes to broader reasoning deficits.

Large Language Models (LLMs) · Machine Learning · Science & Research · Ethics & Bias · AI Safety & Alignment

