A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence.”
On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have taken at least a significant step towards this goal.
While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.
An AI system like ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.
The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using small grid puzzles like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.
Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.
These are a lot like the IQ tests you might remember from school.
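To make the format concrete, here is a minimal Python sketch of a task in this style: three example input/output pairs and a fourth input to solve. The tiny grids, the mirror rule and the names `flip_horizontal` and `fits_all` are all invented for illustration; real ARC-AGI puzzles use larger grids and much richer rules.

```python
# Toy illustration only: the grids and the rule are made up, not taken
# from the actual ARC-AGI benchmark.

def flip_horizontal(grid):
    """Candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]

def fits_all(rule, examples):
    """A rule is acceptable only if it reproduces every example output."""
    return all(rule(inp) == out for inp, out in examples)

# Three worked examples (input grid, output grid); numbers stand for colors.
examples = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
    ([[0, 4], [5, 0]], [[4, 0], [0, 5]]),
]

# The fourth grid is the actual question.
test_input = [[6, 0, 0], [0, 0, 7]]

rule_fits = fits_all(flip_horizontal, examples)
prediction = flip_horizontal(test_input) if rule_fits else None
print(rule_fits, prediction)
```

The hard part, which this sketch skips, is coming up with the candidate rule in the first place from only three examples.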
Weak rules and adaptation
We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.
To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”
Searching chains of thought?
While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic.”
This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
There could be thousands of different seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest.”
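As a rough illustration of that idea, the Python sketch below generates nothing itself: it takes a small hand-written list of candidate programs, keeps those that fit every example, and then applies a "choose the simplest" heuristic. Everything here is an assumption made for illustration, including the candidate rules and the use of word count as a crude stand-in for simplicity. It is not a description of how o3 works, and word count is not the technical definition of a "weak" rule.

```python
# Hypothetical sketch: pick the simplest program that fits all examples.

def mirror_rows(g):
    return [row[::-1] for row in g]

def mirror_rows_and_order(g):
    return [row[::-1] for row in g[::-1]]

def mirror_only_single_row(g):
    return [row[::-1] for row in g] if len(g) == 1 else [row[:] for row in g]

def leave_unchanged(g):
    return [row[:] for row in g]

# Each candidate pairs a plain-English description with a program.
candidates = [
    ("mirror each row, then mirror the row order", mirror_rows_and_order),
    ("mirror each row if the grid has one row, otherwise do nothing",
     mirror_only_single_row),
    ("leave the grid unchanged", leave_unchanged),
    ("mirror each row", mirror_rows),
]

# Every example has a single row, so several programs fit equally well.
examples = [
    ([[1, 0, 0]], [[0, 0, 1]]),
    ([[2, 3, 0]], [[0, 3, 2]]),
    ([[0, 0, 4, 5]], [[5, 4, 0, 0]]),
]

def fits(program, examples):
    return all(program(inp) == out for inp, out in examples)

fitting = [c for c in candidates if fits(c[1], examples)]

# Heuristic: the fewest words in the description counts as "simplest".
best_description, best_program = min(fitting, key=lambda c: len(c[0].split()))
print(len(fitting), best_description)
```

Three of the four candidates reproduce every example, yet they disagree on a two-row grid such as `[[1, 2], [3, 4]]`. The heuristic is what decides which answer gets submitted, which is why its quality matters so much.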
However, if o3 works like AlphaGo, then OpenAI simply had an AI create the heuristic. That was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
What we still don’t know
The question then is, is this really closer to AGI? If that is how o3 works, then the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.
Truly understanding the potential of o3 will require extensive work, including evaluations and an understanding of the distribution of its capacities: how often it fails and how often it succeeds.
When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.
If so, it could have a huge, revolutionary, economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.
If not, then this will still be an impressive result. However, everyday life will remain much the same.