Apple Research Shows AI Reasoning Capabilities Are Limited

Image source: Unsplash (Medhat Dawoud)

Apple Machine Learning Research's June 2025 paper "The Illusion of Thinking" revealed fundamental limitations in current Large Reasoning Models (LRMs). The researchers used four puzzle problems, including Tower of Hanoi, whose complexity can be scaled systematically, to examine the performance of models such as o3-mini and DeepSeek-R1. The experiments showed that model behaviour falls into three regimes: on simple problems, reasoning and standard models perform similarly well; at medium complexity, reasoning models pull ahead; and at high complexity, the performance of both groups collapses to zero.
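
Tower of Hanoi makes the complexity knob explicit: the optimal solution for n disks takes 2^n − 1 moves, so the minimum answer length grows exponentially with the number of disks. The sketch below illustrates the kind of programmatic check such a benchmark relies on, generating the reference move sequence and validating a proposed solution; it is not the paper's actual evaluation harness, and the function names are illustrative.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Recursively generate the optimal move list for n disks (2**n - 1 moves)."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_moves(n - 1, spare, target, source))

def is_valid_solution(n, moves):
    """Simulate the proposed moves and check that the puzzle ends up solved."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # larger number = bigger disk
    for src, dst in moves:
        if not pegs[src]:
            return False                       # moving from an empty peg
        disk = pegs[src].pop()
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                       # placing a larger disk on a smaller one
        pegs[dst].append(disk)
    return pegs["C"] == list(range(n, 0, -1))  # all disks stacked on the target peg

for n in (3, 7, 10, 15):
    moves = hanoi_moves(n)
    print(n, len(moves), is_valid_solution(n, moves))  # 7, 127, 1023, 32767 moves: exponential growth
```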

Apple's researchers observed that as task complexity increases, the models' reasoning effort rises up to a point and then declines despite an ample remaining token budget, indicating a fundamental limit to scalability. They also analysed the reasoning traces the models generated and found that on simpler problems models often "overthink": the correct solution appears early in the trace, yet the models keep exploring incorrect ideas; on medium-complexity problems, they instead explore incorrect solutions before finding the correct one. The study further showed that even when given an explicit solution algorithm, models failed to execute it reliably, suggesting a deeper bottleneck in reasoning.
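
One way to make the "overthinking" observation concrete is to measure how much of a reasoning trace comes after the first appearance of a correct answer. The snippet below is a toy illustration of that idea, not the paper's trace-analysis code; the example trace is fabricated for demonstration.

```python
def overthinking_ratio(trace: str, correct_answer: str) -> float | None:
    """Fraction of the reasoning trace spent *after* the correct answer first appears.
    High values correspond to the 'overthinking' pattern reported on simple problems."""
    idx = trace.find(correct_answer)
    if idx == -1:
        return None                      # the correct answer never appears in the trace
    return 1.0 - (idx + len(correct_answer)) / len(trace)

# Fabricated trace standing in for a model's thinking text on a 3-disk instance:
toy_trace = (
    "Move 1: disk 1 A->C. Move 2: disk 2 A->B. Move 3: disk 1 C->B. Move 4: disk 3 A->C. "
    "Move 5: disk 1 B->A. Move 6: disk 2 B->C. Move 7: disk 1 A->C. That solves it. "
    "Wait, let me double-check by trying a different ordering of the first moves... "
    "Hmm, moving disk 2 first is illegal, so the original plan stands. Final answer as above."
)
print(overthinking_ratio(toy_trace, "That solves it."))  # roughly half the trace comes after the solution
```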

Apple's study sparked extensive debate in the AI community, particularly about whether current metrics adequately evaluate models' true capabilities. Cognitive scientist Gary Marcus stated the study "fundamentally shows that LLMs are no substitute for good well-specified conventional algorithms," while AI commentator Simon Willison pointed out that "reasoning LLMs are already useful today" regardless of whether they can reliably solve Tower of Hanoi. Anthropic's July 2025 rebuttal argued that Apple's alarming results stemmed not from limits on the models' reasoning but from poorly designed evaluations: the models did not fail to think, they failed to enumerate every move within their token limits.
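
A back-of-the-envelope calculation shows why the enumeration argument has teeth: with 2^n − 1 moves in the optimal solution, simply writing out the answer quickly exceeds any fixed output budget. The figures below (tokens per move, budget size) are assumptions chosen for illustration, not numbers taken from either paper.

```python
# Rough check of the token-budget objection, assuming ~10 output tokens per
# written-out move and a 64k output-token limit (both illustrative assumptions).
TOKENS_PER_MOVE = 10
OUTPUT_BUDGET = 64_000

for n in range(8, 21):
    moves = 2 ** n - 1                      # optimal Tower of Hanoi solution length
    needed = moves * TOKENS_PER_MOVE
    if needed > OUTPUT_BUDGET:
        print(f"n = {n}: {moves} moves, about {needed} tokens, exceeds the budget")
        break
# Around n = 13 (8191 moves), merely listing the answer outruns the assumed budget,
# which is the core of the "failure to enumerate, not failure to reason" objection.
```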

Sources:

1. The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
2. Apple's Illusion of Thinking Paper Explores Limits of Large Reasoning Models
3. The Illusion of the Illusion: Are Large Reasoning Models Really Collapsing?