A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence.”
On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.
While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, more urgent and closer than anticipated. Are they right?
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.
An AI system like ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.
The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using little grid square problems like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.
Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.
These are a lot like the IQ tests you might remember from school.
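To make the format concrete, here is a minimal Python sketch of a puzzle in this style. It is not an actual ARC-AGI task: the grids, the hidden rule (mirror each row left to right) and the names `flip_horizontal` and `fits_all` are assumptions chosen purely for illustration.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]  # each integer stands for a cell colour (0 = empty)

def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]

# Three made-up (input, output) examples to learn from.
train_pairs: List[Tuple[Grid, Grid]] = [
    ([[1, 0, 0], [0, 0, 0]], [[0, 0, 1], [0, 0, 0]]),
    ([[0, 2, 0], [3, 0, 0]], [[0, 2, 0], [0, 0, 3]]),
    ([[4, 4, 0], [0, 0, 5]], [[0, 4, 4], [5, 0, 0]]),
]

# The fourth grid, whose output the solver has to produce.
test_input: Grid = [[6, 0, 0], [0, 7, 0]]

def fits_all(rule: Callable[[Grid], Grid], pairs: List[Tuple[Grid, Grid]]) -> bool:
    """A rule is only acceptable if it reproduces every example exactly."""
    return all(rule(inp) == out for inp, out in pairs)

# Apply the rule to the new grid only if it explains all three examples.
prediction = flip_horizontal(test_input) if fits_all(flip_horizontal, train_pairs) else None
print(prediction)
```

The hard part, which this sketch skips, is coming up with the candidate rule in the first place from only three examples.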
Weak rules and adaptation
We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.
To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”
Searching chains of thought?
While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher François Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic.”
This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
There could be thousands of different seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest.”
However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
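The idea of generating many programs that fit the examples and then applying a “choose the simplest” heuristic can be sketched in a few lines of Python. This is a toy illustration only and says nothing about how o3 actually works: the three grid operations, the brute-force search, and the use of program length as a stand-in for simplicity are all assumptions made for the example.

```python
from itertools import product
from typing import Dict, Callable, Iterator, List, Tuple

Grid = List[List[int]]
Program = Tuple[str, ...]  # a program is a sequence of primitive names

# Hypothetical building blocks a program can be assembled from.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: [list(reversed(row)) for row in g],  # mirror left-right
    "flip_v": lambda g: [list(row) for row in reversed(g)],  # mirror top-bottom
    "transpose": lambda g: [list(col) for col in zip(*g)],   # swap rows and columns
}

def run(program: Program, grid: Grid) -> Grid:
    """Apply each step of the program in order."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def candidates(max_len: int) -> Iterator[Program]:
    """Enumerate every program of up to max_len steps."""
    for n in range(1, max_len + 1):
        yield from product(PRIMITIVES, repeat=n)

def fitting_programs(pairs: List[Tuple[Grid, Grid]], max_len: int = 4) -> List[Program]:
    """Keep only the programs that reproduce every example exactly."""
    return [p for p in candidates(max_len)
            if all(run(p, inp) == out for inp, out in pairs)]

def choose_simplest(programs: List[Program]) -> Program:
    """Heuristic: prefer the program with the fewest steps."""
    return min(programs, key=len)

# Three made-up examples whose hidden rule is a 180-degree rotation.
train_pairs = [
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
    ([[0, 5], [0, 0]], [[0, 0], [5, 0]]),
    ([[7, 0], [8, 9]], [[9, 8], [0, 7]]),
]

fitting = fitting_programs(train_pairs)
best = choose_simplest(fitting)
print(len(fitting), "programs fit the examples; simplest:", best)
print(run(best, [[5, 6], [7, 8]]))
```

Several programs of different lengths reproduce all three examples, so the heuristic is what decides which one gets applied to the new grid. In an AlphaGo-style system, the hand-written `choose_simplest` would be replaced by a trained model that scores candidates.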
What we still don’t know
The question then is, is this really closer to AGI? If that is how o3 works, then the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.
Truly understanding the potential of o3 will require extensive work, including evaluations and an understanding of the distribution of its capacities: how often it fails and how often it succeeds.
When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.
If so, it could have a huge, revolutionary economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.
If not, then this will still be an impressive result. However, everyday life will remain much the same.
