A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence.”
On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.
While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.
An AI system like ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.
The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using little grid square problems like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.
Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.
These are a lot like the IQ tests you might remember from school.
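To make the setup concrete, here is a minimal sketch in Python of what such a task might look like as data. The grids, the color codes and the function names (flip_horizontal, fits_examples) are all invented for illustration and are not taken from the actual benchmark, whose puzzles are far more varied.

```python
# Hypothetical ARC-style task. A grid is a list of rows; each cell is a color code.
# The grids and the rule below are made up for illustration only.

def flip_horizontal(grid):
    """Candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]

def fits_examples(rule, examples):
    """Return True if the rule maps every example input to its expected output."""
    return all(rule(inp) == out for inp, out in examples)

# Three worked examples to learn from (input grid, output grid).
train_examples = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[2, 2, 0]], [[0, 2, 2]]),
    ([[0, 3], [3, 0]], [[3, 0], [0, 3]]),
]

# The fourth grid: only the input is given.
test_input = [[4, 0, 0], [0, 5, 0]]

# Apply the rule to the unseen grid only if it explains all three examples.
prediction = (
    flip_horizontal(test_input)
    if fits_examples(flip_horizontal, train_examples)
    else None
)
print(prediction)
```

The hard part, which this sketch skips entirely, is coming up with the candidate rule in the first place.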
Weak rules and adaptation
We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.
To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”
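As a rough illustration of preferring weaker rules, the Python sketch below keeps only the candidate rules that fit the examples and then picks the one with the shortest plain-English description. Counting words is a crude stand-in chosen for this example, not the technical definition of weakness, and the candidate rules and grids are hypothetical.

```python
# Illustrative only: word count of a description is used as a crude proxy
# for how "weak" (unspecific) a rule is. This is an assumption of the sketch.

def identity(grid):
    return [row[:] for row in grid]

def flip_horizontal(grid):
    return [row[::-1] for row in grid]

def flip_then_recolor(grid):
    # More specific rule: mirrors each row AND turns every 7 into a 9.
    return [[9 if cell == 7 else cell for cell in row[::-1]] for row in grid]

candidates = [
    ("copy the grid", identity),
    ("mirror each row", flip_horizontal),
    ("mirror each row and then recolor every 7 as 9", flip_then_recolor),
]

examples = [
    ([[1, 0, 0]], [[0, 0, 1]]),
    ([[2, 3], [0, 4]], [[3, 2], [4, 0]]),
]

def consistent(rule, examples):
    """True if the rule reproduces every example output."""
    return all(rule(inp) == out for inp, out in examples)

def weakest_consistent_rule(candidates, examples):
    """Among rules that fit the examples, return the shortest-described one."""
    fitting = [(desc, rule) for desc, rule in candidates if consistent(rule, examples)]
    return min(fitting, key=lambda pair: len(pair[0].split()), default=None)

chosen = weakest_consistent_rule(candidates, examples)
print(chosen[0])
```

Both mirroring rules fit the examples, because no 7 ever appears in them. They disagree on a new grid such as [[7, 0]], and the simpler rule is the one that avoids an assumption the examples never supported.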
Searching chains of thought?
While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic.”
This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
There could be thousands of different seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest.”
However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
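The general idea of searching over candidate programs and ranking them with a heuristic can be sketched in a few lines of Python. This is a toy built on stated assumptions, not a description of how o3 works, which remains undisclosed. The three grid operations, the brute-force enumeration and the use of chain length as the heuristic are all choices made for this example.

```python
from itertools import product

# Hypothetical primitive steps a "program" can be built from.
def flip_h(grid):
    return [row[::-1] for row in grid]

def flip_v(grid):
    return [row[:] for row in grid[::-1]]

def transpose(grid):
    return [list(col) for col in zip(*grid)]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def run_chain(chain, grid):
    """Apply a sequence of named steps to a grid, in order."""
    for name in chain:
        grid = PRIMITIVES[name](grid)
    return grid

def candidate_chains(max_len):
    """Enumerate every chain of primitive steps up to max_len steps long."""
    for length in range(1, max_len + 1):
        yield from product(PRIMITIVES, repeat=length)

def search(examples, heuristic, max_len=3):
    """Keep chains that fit all examples, then pick the one the heuristic scores lowest."""
    fitting = [
        chain for chain in candidate_chains(max_len)
        if all(run_chain(chain, inp) == out for inp, out in examples)
    ]
    return min(fitting, key=heuristic, default=None), len(fitting)

examples = [
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
    ([[5, 0, 0]], [[0, 0, 5]]),
]

# Here the heuristic is simply "shortest chain"; a learned scoring model
# could be passed in instead.
best, n_fitting = search(examples, heuristic=len)
print(best, n_fitting)
```

Raising max_len makes more chains fit the same examples, which is why some rule for choosing among them is needed. Swapping len for a trained scoring function would correspond to the AlphaGo-style approach described above.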
What we still don’t know
The question then is, is this really closer to AGI? If that is how o3 works, then the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.
Truly understanding the potential of o3 will require extensive work, including evaluations and an understanding of the distribution of its capacities: how often it fails and how often it succeeds.
When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.
If so, it could have a huge, revolutionary economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.
If not, then this will still be an impressive result. However, everyday life will remain much the same.
