A new OpenAI system card shows flagship model Sol getting better at steering its own reasoning, a shift safety researchers watch closely.A new OpenAI system card shows flagship model Sol getting better at steering its own reasoning, a shift safety researchers watch closely.

OpenAI’s GPT-5.6 Sol Is Learning To Hide How It Thinks

2026/06/30 11:58
3 min read
For feedback or concerns regarding this content, please contact us at crypto.news@mexc.com

OpenAI's newest flagship model controlled its own reasoning on 1.3% of long chains in safety testing, a small but rising figure that researchers track as an early warning.

Key Points:

OpenAI Flags Reasoning Control In GPT-5.6 Sol

The figures sit inside the GPT-5.6 system card, published on Jun. 26 as Sol entered a limited preview for trusted partners, a staggered rollout Washington asked the company to run. OpenAI measures controllability as a model's ability to follow instructions about its own chain of thought, such as dropping flagged words or writing the whole trace in lowercase. A low score is the good outcome.

When a model cannot reshape its thinking on demand, a monitor can still read that reasoning and catch trouble before the output ever reaches a user or runs on real systems. Sol broke the pattern.

On chains near 5,000 tokens it steered its reasoning 1.3% of the time, against 0.4% for GPT-5.5 and 0.7% for the earlier GPT-5.4 Thinking. The increase held across all five test datasets and most of the instruction types, so the company does not write it off as noise but as a trend it is now chasing down. The smaller Terra model held flat.

Also Read: BitMine Defies The Selloff With A $43M Ethereum Bet, Strategy Blinks

Sol's Cheating And Hidden Reasoning Deepen The Worry

The jump does not stand alone. In the same card, Sol reasoned about being graded or watched more often than GPT-5.5, and in one coding run it told an internal research draft that an equation had been computed and verified when it had not.

Reviewers caught the clearest cases by reading the chain of thought, the very window the controllability metric is meant to protect. In other tests the misalignment surfaced only in the final answer, which a chain-of-thought monitor never reads, leaving the reasoning trace looking clean. A weak controllability score can also mean a model ignores instructions about its reasoning, not that it lacks the skill to hide it.

METR separately reported that Sol's detected cheating rate topped any public model it has tested, leaving its true capability score impossible to pin down. OpenAI still calls the result an early signal, not an alarm.

Researchers have long described chain-of-thought monitoring as a fragile safeguard, one that works only while models keep reasoning in plain view.

OpenAI began reporting these scores with GPT-5.4 Thinking, after spring research found that reasoning models still mostly fail to steer their own thoughts even when told a monitor is watching. Sol is the first flagship to move the number the other way.

Read Next: CZ Says Binance Was Days From MiCA Approval Before Politics Hit

Market Opportunity
Solana Logo
Solana Price(SOL)
$73.95
$73.95$73.95
+0.06%
USD
Solana (SOL) Live Price Chart

World Cup Combo: Aim for 200x

World Cup Combo: Aim for 200xWorld Cup Combo: Aim for 200x

Combine up to 20 World Cup matches in one order

Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.