A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence.”
On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.
While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.
An AI system like ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.
The resulting model is pretty good at common tasks but bad at uncommon ones, because it has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using small grid-square problems like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.
Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.
These are a lot like the IQ tests you might remember from school.
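To make the format concrete, here is a minimal Python sketch of how such a task might be represented and checked. The grids, the color numbers and the `mirror_left_right` rule are invented for illustration (a hypothetical toy task, not an actual ARC-AGI item), and the sketch does not reproduce the benchmark's own file format. A candidate rule counts as a solution only if it reproduces every demonstration output. Only then is it applied to the fourth input.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]    # each cell holds a color index; 0 = background
Pair = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical toy task in the spirit of ARC-AGI: three demonstrations,
# then a fourth input whose output must be inferred.
train_pairs: List[Pair] = [
    ([[1, 0, 0],
      [0, 0, 0]],
     [[0, 0, 1],
      [0, 0, 0]]),
    ([[0, 2],
      [3, 0]],
     [[2, 0],
      [0, 3]]),
    ([[4, 5, 0, 0]],
     [[0, 0, 5, 4]]),
]
test_input: Grid = [[0, 7, 0],
                    [6, 0, 0]]


def mirror_left_right(grid: Grid) -> Grid:
    """Candidate rule (hypothetical): reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_all(rule: Callable[[Grid], Grid], pairs: List[Pair]) -> bool:
    """True only if the rule reproduces every demonstration output exactly."""
    return all(rule(inp) == out for inp, out in pairs)


if fits_all(mirror_left_right, train_pairs):
    # The rule explains all three examples, so apply it to the new input.
    print(mirror_left_right(test_input))
```

Here the mirroring rule fits all three demonstrations, so the sketch prints its prediction for the test grid, `[[0, 7, 0], [0, 0, 6]]`.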
Weak rules and adaptation
We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.
To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”
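The idea that weaker rules adapt better can be illustrated with a small Python sketch. Both rules below reproduce the same three hypothetical demonstrations, but the "strong" one carries an extra assumption copied from the examples: that grids are always three cells wide. The labels "weak" and "strong" are informal here; this is not an implementation of the technical definition, only an illustration of what an unnecessary assumption costs.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]

# Hypothetical demonstrations: every input happens to be 3 cells wide.
train_pairs: List[Pair] = [
    ([[1, 0, 1]], [[2, 0, 2]]),
    ([[0, 1, 0], [1, 1, 0]], [[0, 2, 0], [2, 2, 0]]),
    ([[3, 1, 3]], [[3, 2, 3]]),
]


def recolor(grid: Grid) -> Grid:
    """Return a copy of the grid with every cell of color 1 changed to color 2."""
    return [[2 if c == 1 else c for c in row] for row in grid]


def weak_rule(grid: Grid) -> Grid:
    # "Turn every 1 into a 2." No further assumptions.
    return recolor(grid)


def strong_rule(grid: Grid) -> Grid:
    # "Turn every 1 into a 2, but only in grids exactly 3 cells wide."
    # The width condition is an unnecessary assumption taken from the examples.
    return recolor(grid) if all(len(row) == 3 for row in grid) else grid


rules: Dict[str, Callable[[Grid], Grid]] = {"weak": weak_rule, "strong": strong_rule}

for name, rule in rules.items():
    fits = all(rule(inp) == out for inp, out in train_pairs)
    print(name, "fits examples:", fits)

new_grid: Grid = [[1, 0, 0, 1]]  # wider than anything seen in the examples
for name, rule in rules.items():
    print(name, "->", rule(new_grid))
```

Both rules report `True` on the demonstrations, yet on the wider grid the weak rule turns `[[1, 0, 0, 1]]` into `[[2, 0, 0, 2]]` while the strong rule leaves it unchanged. Nothing in the examples justified the width condition, so dropping it costs nothing and lets the rule cover situations it has never seen.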
Searching chains of thought?
While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models in that it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic.”
This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
Thousands of different, seemingly equally valid programs could be generated. That heuristic could be “choose the weakest” or “choose the simplest.”
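A toy version of "generate many programs, then let a heuristic choose" can be sketched in Python. This is a brute-force enumeration over three invented grid operations (`flip_lr`, `flip_ud`, `transpose`), not a description of how o3 or AlphaGo actually search. The heuristic is the "choose the simplest" option from the text, approximated here, as an assumption, by "fewest steps".

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]
Program = Tuple[str, ...]  # a sequence of primitive names, applied left to right

# Hypothetical primitive operations that a program may chain together.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_lr": lambda g: [row[::-1] for row in g],       # mirror each row
    "flip_ud": lambda g: g[::-1],                        # reverse row order
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run(program: Program, grid: Grid) -> Grid:
    """Apply each named primitive in order."""
    for step in program:
        grid = PRIMITIVES[step](grid)
    return grid


def search(pairs: List[Pair], max_len: int = 3) -> List[Program]:
    """Enumerate every program of up to max_len steps; keep those fitting all pairs."""
    valid: List[Program] = []
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in pairs):
                valid.append(program)
    return valid


def simplest(programs: List[Program]) -> Program:
    """Heuristic: prefer the program with the fewest steps (ties go to the first found)."""
    return min(programs, key=len)


train_pairs: List[Pair] = [
    ([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
    ([[5, 0, 0], [0, 6, 0]], [[0, 0, 5], [0, 6, 0]]),
]

candidates = search(train_pairs)
print(len(candidates), "programs fit the examples")
print("chosen:", simplest(candidates))
```

Several different programs reproduce both examples (for instance a single `flip_lr`, or `transpose`, `flip_ud`, `transpose` in sequence), so the search alone cannot decide between them. The `simplest` function is the pluggable part: swapping it for a different scoring function, including a learned one, changes which program gets chosen without touching the search.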
However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic. That was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
What we still don’t know
The question, then, is whether this is really closer to AGI. If that is how o3 works, then the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.
Truly understanding the potential of o3 will require extensive work, including evaluations, an understanding of the distribution of its capacities, and measurements of how often it fails and how often it succeeds.
When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.
If so, it could have a huge, revolutionary economic impact, ushering in a new era of self-improving, accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.
If not, then this will still be an impressive result. However, everyday life will remain much the same.