AI Struggles with Abstract Thought: Study Reveals GPT-4’s Limits

While GPT-4 performs well in structured reasoning tasks, a new study shows that its ability to adapt to variations is weak—suggesting AI still lacks true abstract understanding and flexibility in decision-making.

Artificial Intelligence (AI), particularly large language models like GPT-4, has shown impressive performance on reasoning tasks. But does AI truly understand abstract concepts, or is it just mimicking patterns? A new study from the University of Amsterdam and the Santa Fe Institute reveals that while GPT models perform well on some analogy tasks, they fall short when the problems are altered, highlighting key weaknesses in AI’s reasoning capabilities.

GPT-4’s Accuracy Drops Dramatically in Unfamiliar Letter Sequences – While humans maintain stable performance when letter sequences are scrambled or replaced with symbols, GPT-4 struggles significantly, revealing its reliance on familiar training patterns.

Analogical reasoning is the ability to draw a comparison between two different things based on their similarities in certain aspects. It is one of the most common methods by which human beings try to understand the world and make decisions. An example of analogical reasoning: cup is to coffee as soup is to ??? (the answer being: bowl).

Large language models like GPT-4 perform well on various tests, including those requiring analogical reasoning. But can AI models truly engage in general, robust reasoning, or do they over-rely on patterns from their training data? This study by language and AI experts Martha Lewis (Institute for Logic, Language and Computation at the University of Amsterdam) and Melanie Mitchell (Santa Fe Institute) examined whether GPT models are as flexible and robust as humans in making analogies. ‘This is crucial, as AI is increasingly used for decision-making and problem-solving in the real world,’ explains Lewis.

Comparing AI models to human performance

Lewis and Mitchell compared the performance of humans and GPT models on three different types of analogy problems:

Letter sequences – Identifying patterns in letter sequences and completing them correctly.
Digit matrices – Analyzing number patterns and determining the missing numbers.
Story analogies – Understanding which of two stories best corresponds to a given example story.
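To make the letter-sequence variations concrete, here is a minimal Python sketch of how one problem could be posed in a standard, scrambled, or symbol alphabet. Everything in it is an assumption for illustration: the "replace the last item with its successor" rule, the symbol set, and the function names `make_alphabet`, `successor_pair`, and `build_problem` are hypothetical and are not taken from the study.

```python
import random
import string

# Hypothetical symbol set; the study's actual symbols are not given in this article.
SYMBOLS = list("*@%!^#~$&+")


def make_alphabet(kind, seed=0):
    """Return the ordered sequence of items that defines 'successor' for a problem."""
    letters = list(string.ascii_lowercase)
    if kind == "standard":
        return letters
    if kind == "scrambled":
        # Same letters, unfamiliar order: 'successor' now follows the shuffled order.
        random.Random(seed).shuffle(letters)
        return letters
    if kind == "symbols":
        return list(SYMBOLS)
    raise ValueError(f"unknown alphabet kind: {kind}")


def successor_pair(alphabet, start, length=4):
    """Build a (source, target) pair where the last item is replaced by its successor."""
    source = alphabet[start:start + length]
    target = source[:-1] + [alphabet[start + length]]
    return source, target


def build_problem(kind, seed=0):
    """Assemble one analogy: a worked example plus a query with its expected answer."""
    alphabet = make_alphabet(kind, seed)
    example = successor_pair(alphabet, 0)
    query, answer = successor_pair(alphabet, 4)
    return {"alphabet": alphabet, "example": example, "query": query, "answer": answer}


if __name__ == "__main__":
    for kind in ("standard", "scrambled", "symbols"):
        p = build_problem(kind)
        src, tgt = p["example"]
        print(kind, ":", " ".join(src), "->", " ".join(tgt), "|", " ".join(p["query"]), "-> ?")
```

The abstract rule is identical in all three cases; only the surface form changes. A solver that has grasped the rule should therefore answer all three equally well.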

In addition to testing whether GPT models could solve the original problems, the study examined how well they performed when the problems were subtly modified. ‘A system that truly understands analogies should maintain high performance even on these variations’, state the authors in their article.

GPT models struggle with robustness

AI’s Story Comprehension Is Superficial – When tested on paraphrased versions of analogy-based stories, GPT-4’s performance declined more than that of humans, suggesting it relies on surface-level similarities rather than deep causal reasoning.

Humans maintained high performance on most modified versions of the problems, but GPT models, while performing well on standard analogy problems, struggled with variations. ‘This suggests that AI models often reason less flexibly than humans, and their reasoning is less about true abstract understanding and more about pattern matching,’ explains Lewis.

In digit matrices, GPT models showed a significant performance drop when the missing number’s position changed. Humans had no difficulty with this. In story analogies, GPT-4 tended to favor whichever answer was listed first, whereas humans were not influenced by answer order. Additionally, GPT-4 struggled more than humans when key elements of a story were reworded, suggesting a reliance on surface-level similarities rather than deeper causal reasoning.
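One way to picture the answer-order effect is to show every item twice, once with the correct story first and once with it second, and then count how often the first option is picked. The Python sketch below illustrates that idea only and is not the study's evaluation code. The function name `evaluate_order_sensitivity`, the `choose` callback, and the toy items are all hypothetical.

```python
def evaluate_order_sensitivity(items, choose):
    """Present each (correct, distractor) pair in both orders.

    `choose(option_a, option_b)` stands in for a model or a human participant
    and must return 0 (first option) or 1 (second option).
    """
    n_trials = 0
    n_correct = 0
    n_first = 0
    for correct, distractor in items:
        # Each presentation: the two options as shown, plus the index of the correct one.
        presentations = (
            ((correct, distractor), 0),
            ((distractor, correct), 1),
        )
        for options, correct_index in presentations:
            picked = choose(*options)
            n_trials += 1
            n_correct += int(picked == correct_index)
            n_first += int(picked == 0)
    return {
        "accuracy": n_correct / n_trials,
        "first_pick_rate": n_first / n_trials,
    }


if __name__ == "__main__":
    # Toy placeholder items, invented for this example only.
    toy_items = [("good: story A", "bad: story B"), ("good: story C", "bad: story D")]

    # A responder that always picks the first option, whatever it says.
    print(evaluate_order_sensitivity(toy_items, lambda a, b: 0))

    # A responder that decides on content and ignores position.
    print(evaluate_order_sensitivity(toy_items, lambda a, b: 0 if a.startswith("good") else 1))
```

With this counterbalanced design, a responder unaffected by order picks the first option about half the time. A rate well above one half signals position bias of the kind described for GPT-4.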

When tested on modified versions, GPT models showed a decline in performance on simpler analogy tasks, while humans remained consistent. However, both humans and AI struggled with more complex analogical reasoning tasks.

Weaker than human cognition

This research challenges the widespread assumption that AI models like GPT-4 can reason in the same way humans do. ‘While AI models demonstrate impressive capabilities, this does not mean they truly understand what they are doing,’ conclude Lewis and Mitchell. ‘Their ability to generalize across variations is still significantly weaker than human cognition. GPT models often rely on superficial patterns rather than deep comprehension.’

This is a critical warning about using AI in important decision-making areas such as education, law, and healthcare. While AI can be a powerful tool, it is not yet a replacement for human thinking and reasoning.

Source:

Universiteit van Amsterdam

Journal reference:

Lewis, Martha, and Melanie Mitchell. “Evaluating the Robustness of Analogical Reasoning in Large Language Models.” Transactions on Machine Learning Research, 2025, openreview.net/forum?id=t5cy5v9wp
