BotBeat

OpenAI · RESEARCH · 2026-02-27

New 'Subtraction Trick Test' Challenges Claims of LLM Mathematical Reasoning

Key Takeaways

  • A new test using complementary subtraction and trigonometric identities reveals that LLMs struggle to transfer mathematical knowledge within the same domain
  • Leading models including GPT and Claude fail to independently apply 9's complement logic to sin²θ + cos²θ = 1, even when explicitly prompted about their supposed ability to make novel discoveries
  • The test targets a calculation method from the 1970s that is sparsely documented online, minimizing the possibility of pattern-matching from training data
Source: Hacker News — https://haversine.substack.com/p/can-llms-reason-about-math-the-subtraction

Summary

A software developer working in math education has proposed a novel test to evaluate whether large language models can truly reason about mathematics, rather than simply pattern-matching from training data. The test involves complementary subtraction, a calculation method taught until the 1970s that is sparsely documented online. The author extends this concept to trigonometric identities, asking models to apply the same 9's complement technique to equations like sin²θ + cos²θ = 1. According to the author's experiments, leading models including GPT and Claude consistently fail to make this mathematical connection independently, even when explicitly prompted about their supposed ability to make novel discoveries.
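The article does not reproduce the author's prompts, but the arithmetic technique itself is straightforward to illustrate. In complementary subtraction, instead of borrowing, you replace each digit of the subtrahend with its difference from 9 (the 9's complement), add, add 1, and drop the leading carry. A minimal Python sketch of that classic method (function names are ours, not the author's):

```python
def nines_complement(n: int, digits: int) -> int:
    """Replace each digit of n with (9 - digit), padded to `digits` places."""
    return int("".join(str(9 - int(d)) for d in str(n).zfill(digits)))

def subtract_by_complement(a: int, b: int) -> int:
    """Compute a - b (for a >= b >= 0) without borrowing:
    add the 9's complement of b to a, add 1, then drop the leading carry."""
    digits = len(str(a))
    total = a + nines_complement(b, digits) + 1  # addition replaces every borrow
    return total - 10 ** digits                  # dropping the carry digit

# Example: 752 - 386. The 9's complement of 386 is 613;
# 752 + 613 + 1 = 1366; dropping the leading 1 gives 366.
print(subtract_by_complement(752, 386))  # → 366
```

The author's test asks whether a model that can follow these rules will, unprompted, notice the same complement-to-a-constant structure in identities like sin²θ + cos²θ = 1, where each term is the complement of the other with respect to 1.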

The test was designed specifically because complementary subtraction is largely absent from modern training data due to its age, yet operates within basic arithmetic rules that models claim to understand. The author argues this creates a transparent evaluation of whether models possess genuine mathematical reasoning or merely retrieve and recombine patterns from training data. The test emerges amid heated debate about AI capabilities in mathematics, with AI companies claiming their models can solve famous unsolved problems and derive novel physics results, while critics like Yann LeCun argue these systems lack internal models of reality.

The author expresses particular skepticism about recent claims from OpenAI and Anthropic regarding breakthrough mathematical and physics capabilities. They note that upon closer examination, OpenAI's claim that "GPT-5.2 derives a new result in theoretical physics" actually involved expert researchers using the model primarily for text retrieval and summary, with the researchers themselves producing the actual novel work. The author contends that this test provides a more accessible way for non-experts to evaluate AI reasoning claims than parsing through contested academic debates in specialized fields.

  • The author challenges recent industry claims about AI solving unsolved math problems, noting that breakthrough announcements often involve expert human researchers doing the actual novel work

Editorial Opinion

This test represents an ingenious approach to distinguishing genuine reasoning from sophisticated pattern-matching. By selecting a mathematical technique that's both simple to verify and largely absent from modern training data, the author creates conditions where true understanding would enable the logical leap while pure retrieval would fail. The consistent failure of leading models to make this connection—despite their training on vast mathematical corpora—lends credence to critiques that current LLMs lack fundamental reasoning capabilities. However, it's worth noting this is a single test from one researcher, and the AI research community will likely debate whether this particular failure generalizes to broader reasoning deficits.

Large Language Models (LLMs) · Machine Learning · Science & Research · Ethics & Bias · AI Safety & Alignment

