Apple report claims top A.I. models, including those from tech giants, don’t really know how to reason

In a new study from Apple’s Siri team, researchers argue that recent advances in AI reasoning may not be the real deal.
The study, titled “The Illusion of Thinking,” calls into question the industry’s promises about AI intelligence, particularly around Artificial General Intelligence (AGI).
The researchers point out that existing AI model evaluations have concentrated on final-answer accuracy on well-known mathematical and coding benchmarks.
But Apple’s researchers say this approach reveals little about the reasoning behind those answers. To address this, they built a set of puzzle games with adjustable difficulty to compare the “thinking” and “non-thinking” modes of various AI models, along the lines sketched below.
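The article does not describe the evaluation harness itself, but the setup it reports, measuring final-answer accuracy on puzzles whose difficulty can be dialed up, then comparing “thinking” and “non-thinking” modes, might look roughly like the sketch below. The puzzle choice (a Tower-of-Hanoi-style task scaled by disk count) and the `ask_model` call are illustrative assumptions, not details taken from the study.

```python
# Illustrative sketch only: measures final-answer accuracy as puzzle
# complexity grows, separately for "thinking" and "non-thinking" modes.
# `ask_model` is a hypothetical stand-in for a real LLM API call.

def hanoi_solution(n: int, src="A", aux="B", dst="C") -> list[tuple[str, str]]:
    """Optimal move sequence for Tower of Hanoi with n disks (ground truth)."""
    if n == 0:
        return []
    return (hanoi_solution(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi_solution(n - 1, aux, src, dst))

def ask_model(prompt: str, thinking: bool) -> list[tuple[str, str]]:
    """Hypothetical model call; replace with a real API. Returns a move list."""
    raise NotImplementedError

def accuracy_by_complexity(max_disks: int, trials: int, thinking: bool) -> dict[int, float]:
    """Fraction of correct final answers at each complexity level (disk count)."""
    results = {}
    for n in range(1, max_disks + 1):
        truth = hanoi_solution(n)
        prompt = f"Solve Tower of Hanoi with {n} disks. List the moves."
        correct = sum(ask_model(prompt, thinking) == truth for _ in range(trials))
        results[n] = correct / trials
    return results
```

Plotting the two resulting accuracy curves against complexity is what would expose the kind of sharp drop-off the study describes.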
The results were striking: state-of-the-art Large Reasoning Models (LRMs) suffered a “complete accuracy collapse” beyond certain complexity thresholds and failed to generalize their reasoning.
This suggests that these models rely on pattern matching rather than genuine abstraction. Small differences in problem wording were found to have outsized effects on the models’ responses, further underscoring that the models do not truly understand the problems.
The work also showed that AI models are inconsistent reasoners, sometimes “overthinking” by finding correct answers early and then wandering down incorrect reasoning paths.
In contrast, on more complex tasks the models showed signs of “underthinking,” cutting their reasoning short even when ample computational budget remained, a pattern one could probe as illustrated below.
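One way to make these two failure modes concrete: given a model’s reasoning trace and a known ground-truth answer, check how early the answer first appears (overthinking keeps going well past that point) and how much of the available token budget the trace actually uses (underthinking stops far short of it). The sketch below is an assumption about how such an analysis could be scripted, not the study’s own tooling.

```python
# Illustrative analysis sketch (not from the study): flags "overthinking"
# when the correct answer appears early in a trace that keeps going, and
# "underthinking" when an unsuccessful trace stops far below its token budget.

def classify_trace(trace: str, correct_answer: str, token_budget: int) -> str:
    tokens = trace.split()  # crude whitespace tokenization, for illustration only
    used = len(tokens)
    first_hit = trace.find(correct_answer)

    if first_hit != -1:
        # Fraction of the trace produced *after* the answer first appeared.
        tail = len(trace[first_hit + len(correct_answer):].split()) / max(used, 1)
        if tail > 0.5:
            return "overthinking"   # found the answer early, then kept exploring
        return "efficient"
    if used < 0.3 * token_budget:
        return "underthinking"      # gave up long before the budget ran out
    return "wrong"
```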
These results have significant implications for the AI field, raising questions about the dependability of current AI models and about the feasibility of establishing rigorous AI safety standards. Apple’s research serves as a crucial reality check, reminding the field that, for all its breathtaking progress, AI remains light years away from AGI.
The researchers call for a more nuanced appreciation of what AI can and cannot do, arguing that the focus should be on building reliable, transparent intelligent systems so that AI becomes “a technology we can trust.”