On 23 April 2026, OpenAI released GPT-5.5, its newest frontier model, positioned by the company as "a new class of intelligence" for agentic coding, computer use, and long-running knowledge work. It arrives just six weeks after GPT-5.4 and immediately replaces it as the default model powering Codex.
OpenAI also launched a higher-accuracy variant, GPT-5.5 Pro, targeted at the hardest reasoning, research, and architectural tasks. Both are already rolling out in ChatGPT and Codex, with API access promised "very soon" at roughly double the per-token price of GPT-5.4.
For businesses running Codex at scale, this is the most consequential model change of the year so far. The combination of a 1M-token context window, native computer use, a reported reduction in hallucinations of roughly sixty percent, and materially higher token costs means that choosing which model to pin behind each pipeline now deserves deliberate evaluation rather than being left to the default.
This article covers what has actually shipped, the benchmark gains, the pricing, how to think about GPT-5.5 versus GPT-5.4, what happened to the speculative codenames that surfaced briefly on 22 April, and how we expect to integrate GPT-5.5 into the Owlpen platform.
What actually shipped
OpenAI's formal 23 April announcement describes two models:
GPT-5.5
The new flagship for real work. OpenAI positions it as better at writing and debugging code, researching online, analysing data, creating documents and spreadsheets, operating software, and moving across tools until a task is finished. It ships with a 1M-token context window and native computer-use capabilities, and OpenAI says it matches GPT-5.4's per-token latency while performing at a higher level of intelligence.
GPT-5.5 Pro
A higher-accuracy sibling aimed at research partners, architectural reviewers, and the hardest reasoning problems. OpenAI reports that GPT-5.5 Pro scored 39.6 percent on FrontierMath Tier 4, roughly 1.7 times the 22.9 percent Claude Opus 4.7 achieved on the same benchmark. It is priced at six times the standard model's per-token rates and is rolling out to Pro, Business, and Enterprise ChatGPT users only.
Codex is now powered by GPT-5.5
NVIDIA, an early deployment partner, reports that over 10,000 of its employees are using GPT-5.5-powered Codex across departments, with debugging cycles that used to take days closing within hours. OpenAI is pitching Codex as the agentic front door to GPT-5.5 rather than as a separately tuned model line.
Benchmarks
The largest gain from GPT-5.4 to GPT-5.5 is on long-context retrieval, with solid improvements on agentic coding and research-grade mathematics. Long context and agentic coding are the two areas OpenAI has been explicitly optimising for since the GPT-5 series began.
Terminal-Bench 2.0 (agentic coding)
GPT-5.5 scores 82.7 percent, 7.6 points above GPT-5.4 at 75.1 percent, and ahead of Claude Opus 4.7 at 69.4 percent. Terminal-Bench measures multi-step coding workflows where the model must use tools, read output, and adapt.
MRCR v2 at 512K to 1M tokens (long-context retrieval)
GPT-5.5 jumps to 74.0 percent, up from 36.6 percent for GPT-5.4. This is the largest single-release gain on long-context reliability OpenAI has published, and is the strongest practical reason to consider routing very long document work to GPT-5.5 rather than the previous generation.
FrontierMath Tier 4 (hardest mathematics)
GPT-5.5 scores 35.4 percent (versus 27.1 percent for GPT-5.4), and GPT-5.5 Pro reaches 39.6 percent. Tier 4 problems are research-grade and defeat most current frontier models.
GDPval and MMLU (general knowledge work)
GPT-5.5 sets a record 84.9 percent on GDPval, OpenAI's benchmark for economically valuable work across 44 occupational fields, and scores 92.4 percent on MMLU. OpenAI also reports that hallucinations are down by roughly sixty percent compared with GPT-5.4.
GPT-5.5 does not lead everywhere. Anthropic's Claude Opus 4.7 remains ahead on at least one coding benchmark covering GitHub issue resolution (64.3 percent versus 58.6 percent for GPT-5.5). For teams already using Claude Opus for repository-level fixes, the case to migrate depends on the workload rather than being automatic.
Pricing and availability
OpenAI says API access will follow "very soon" after launch day. Once live, pricing in the Responses and Chat Completions APIs will be:
- GPT-5.5: $5 per 1M input tokens, $30 per 1M output tokens, with a 1M context window.
- GPT-5.5 Pro: $30 per 1M input tokens, $180 per 1M output tokens.
- GPT-5.4 for reference: $2.50 per 1M input tokens, $15 per 1M output tokens.
Standard GPT-5.5 costs twice as much per token as GPT-5.4, on both input and output. OpenAI argues that the model is both more capable and more token-efficient in practice, so the effective cost per task will not always double. That claim should be verified on your own representative workloads before you migrate high-volume pipelines.
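To make the break-even point concrete, the short Python sketch below prices a single task at the published rates listed above. The token counts (40,000 input, 8,000 output) are illustrative assumptions rather than measurements, and the model labels are display names, not API identifiers.

```python
# Published per-1M-token prices in USD, as listed above; revise if OpenAI changes them.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "GPT-5.5 Pro": {"input": 30.00, "output": 180.00},
}


def cost_per_task(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task given its total token usage."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def break_even_token_ratio(baseline: str, candidate: str,
                           input_tokens: int, output_tokens: int) -> float:
    """Fraction of the baseline's tokens the candidate may use (same input/output
    mix) before it costs more per task than the baseline."""
    return cost_per_task(baseline, input_tokens, output_tokens) / cost_per_task(
        candidate, input_tokens, output_tokens
    )


# Illustrative task: 40,000 input and 8,000 output tokens (assumed, not measured).
base = cost_per_task("GPT-5.4", 40_000, 8_000)
same_tokens = cost_per_task("GPT-5.5", 40_000, 8_000)
ratio = break_even_token_ratio("GPT-5.4", "GPT-5.5", 40_000, 8_000)
# prints: GPT-5.4: $0.22, GPT-5.5 at same tokens: $0.44, break-even ratio: 0.50
print(f"GPT-5.4: ${base:.2f}, GPT-5.5 at same tokens: ${same_tokens:.2f}, "
      f"break-even ratio: {ratio:.2f}")
```

Because both GPT-5.5 rates are exactly double the GPT-5.4 rates, the break-even ratio is 0.5 whatever the input/output mix: GPT-5.5 has to finish the same task in half the tokens to cost the same. The sketch assumes input and output tokens shrink in the same proportion; if they do not, price each model's actual mix separately with `cost_per_task`.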
In ChatGPT, GPT-5.5 is available to Plus, Pro, Business, and Enterprise users. GPT-5.5 Pro is available to Pro, Business, and Enterprise only. In Codex, GPT-5.5 is now the default model for the same paid tiers.
GPT-5.5 versus GPT-5.4
GPT-5.4 remains a sensible default for a large share of production workloads. It is cheaper, better understood in production, and already integrated into most teams' automation. Migrating to GPT-5.5 should not be automatic, and for cost-sensitive workflows it is not necessarily desirable.
Escalate to GPT-5.5 when the task genuinely benefits from one or more of: a very long context (hundreds of thousands of tokens, or close to the 1M limit), multi-step agentic tool use, or tasks where the hallucination reduction materially changes the quality of the output. Examples include large-scale refactors across many files, whole-repository migrations, complex debugging that traverses many call paths, and document analysis workloads where critical information is scattered across long inputs.
Escalate to GPT-5.5 Pro for architectural reviews, high-stakes technical decisions, hard research problems, and tasks where the cost of a wrong answer materially exceeds the six-times token premium. For typical coding and document work, Pro is overkill.
Drop to GPT-5.4 Mini for quick bounded edits, latency-sensitive subagents, and simple review passes, as before. The mini tier has not changed.
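The escalation rules above can be expressed as a small routing function. This is a minimal sketch, not a definitive policy: the model identifier strings are hypothetical placeholders until OpenAI publishes the API names, the task flags are assumed to be set by your own pipeline, and the 200,000-token threshold for "very long context" is an arbitrary starting value to tune.

```python
from dataclasses import dataclass

# Hypothetical model identifiers: confirm the real API names once published.
GPT_5_4_MINI = "gpt-5.4-mini"
GPT_5_4 = "gpt-5.4"
GPT_5_5 = "gpt-5.5"
GPT_5_5_PRO = "gpt-5.5-pro"

# Assumed threshold for "very long context"; tune on your own workloads.
LONG_CONTEXT_TOKENS = 200_000


@dataclass
class Task:
    input_tokens: int                      # estimated prompt size
    agentic: bool = False                  # multi-step tool use: refactors, migrations, debugging
    hallucination_sensitive: bool = False  # fewer hallucinations materially change the output
    high_stakes: bool = False              # a wrong answer costs more than the ~6x Pro premium
    quick_bounded: bool = False            # small edit, latency-sensitive subagent, simple review


def choose_model(task: Task) -> str:
    """Apply the escalation rules, checking the most expensive tier first."""
    if task.high_stakes:
        return GPT_5_5_PRO
    if (task.input_tokens >= LONG_CONTEXT_TOKENS
            or task.agentic
            or task.hallucination_sensitive):
        return GPT_5_5
    if task.quick_bounded:
        return GPT_5_4_MINI
    return GPT_5_4  # cost-first default for everything else


print(choose_model(Task(input_tokens=450_000)))                     # gpt-5.5
print(choose_model(Task(input_tokens=3_000, quick_bounded=True)))   # gpt-5.4-mini
print(choose_model(Task(input_tokens=20_000)))                      # gpt-5.4
```

Checking the most expensive tier first keeps Pro reserved for tasks explicitly flagged as high-stakes, while anything not flagged falls through to GPT-5.4, matching the cost-first default recommended in this article.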
Sensible default for most teams
GPT-5.4 remains the correct starting point for most production Codex usage on cost grounds. Pin GPT-5.5 where the long-context or agentic-coding gains are worth the roughly 2x per-token price, and GPT-5.5 Pro only where the hardest reasoning is on the critical path. Benchmark on representative tasks before migrating at scale.
What about Codex-Spark, Codex-Cyber, and the codenames?
On 22 April, a set of unreleased model names briefly appeared inside the Codex app before being withdrawn: GPT-5.3-Codex-Spark, GPT-5.2-Codex-Cyber, arcanine, glacier-alpha, heisenberg, GPT-Rosalind, and oai-2.1. Our initial coverage treated these as speculative. The formal 23 April release has now clarified which of them, if any, were real products.
OpenAI's announcement names only GPT-5.5 and GPT-5.5 Pro. Codex is described as "powered by GPT-5.5", with no specialist Spark or Cyber variants shipping alongside it. The experimental codenames (arcanine, glacier-alpha, heisenberg, oai-2.1) did not appear in the launch materials either. We read that as evidence that the in-app appearance was a test or staging accident, and that most of those names are internal aliases rather than products.
GPT-Rosalind, the biology-focused research model, remains a separate domain-specialist line rather than part of the GPT-5.5 release, and is not generally available. Treat it as out of scope unless your organisation has specific research-preview access and a life-sciences use case.
Governance and cost considerations
Two practical recommendations for teams running Codex or the OpenAI API at scale.
First, pin the model explicitly in automation rather than relying on default routing. ChatGPT and Codex defaults have now shifted to GPT-5.5 for eligible tiers, which means any workflow that relied on the previous default has silently moved to a model that costs roughly twice as much per token. CI pipelines, scheduled jobs, and production integrations should specify the model by API identifier and change it deliberately.
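A minimal sketch of explicit pinning, assuming a Python automation layer: each pipeline resolves its model from a table it owns, with an optional per-deployment environment override, and fails loudly rather than falling back to a provider default. The pipeline names, the `MODEL_PIN_<PIPELINE>` variable convention, and the model identifier strings are all hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Hypothetical pipeline names and model identifiers, for illustration only.
PINNED_MODELS = {
    "nightly-contract-summary": "gpt-5.4",
    "repo-migration-agent": "gpt-5.5",
}


class UnpinnedModelError(RuntimeError):
    """Raised when a pipeline has no explicit model pin."""


def pinned_model(pipeline: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID pinned for a pipeline, never a provider default.

    An environment variable such as MODEL_PIN_REPO_MIGRATION_AGENT overrides
    the table, so a model change can be rolled out deliberately per deployment.
    """
    env = os.environ if env is None else env
    key = "MODEL_PIN_" + pipeline.upper().replace("-", "_")
    model = env.get(key) or PINNED_MODELS.get(pipeline)
    if not model:
        # Refuse to run rather than silently inherit whatever the default is.
        raise UnpinnedModelError(f"no model pinned for pipeline {pipeline!r}")
    return model


print(pinned_model("repo-migration-agent", env={}))  # gpt-5.5
```

The returned string would then be passed as the `model` parameter of each API request, so a change of model shows up as a reviewed edit to the table or deployment configuration rather than as a silent default change.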
Second, measure token efficiency, not just per-token price. OpenAI claims GPT-5.5 is more token-efficient than GPT-5.4 for equivalent tasks, which can narrow or even invert the raw 2x price differential. Track effective cost-per-task on a representative sample before deciding whether the upgrade pays for itself.
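One way to track this, sketched in Python under stated assumptions: sum the tokens each task consumed across all of its calls, record whether the result passed your own acceptance check, and divide total spend by the number of successful tasks, so that retries and failed runs count against the model that needed them. The `TaskRun` record and the sample token counts are hypothetical, chosen only to exercise the function; they are not measurements.

```python
from dataclasses import dataclass

# Published per-1M-token prices in USD as (input, output); update if they change.
PRICES = {"GPT-5.4": (2.50, 15.00), "GPT-5.5": (5.00, 30.00)}


@dataclass
class TaskRun:
    input_tokens: int   # summed across every call the task made, including retries
    output_tokens: int
    succeeded: bool     # passed your own acceptance check


def effective_cost_per_task(model: str, runs: list[TaskRun]) -> float:
    """Total spend divided by the number of tasks that actually succeeded."""
    inp, out = PRICES[model]
    total = sum(r.input_tokens * inp + r.output_tokens * out for r in runs) / 1_000_000
    successes = sum(r.succeeded for r in runs)
    if successes == 0:
        raise ValueError("no successful runs; cost per task is undefined")
    return total / successes


# Made-up token counts for the same two tasks on each model, not real measurements.
old_runs = [TaskRun(60_000, 12_000, True), TaskRun(80_000, 20_000, False)]
new_runs = [TaskRun(35_000, 6_000, True), TaskRun(40_000, 9_000, True)]
print(f"GPT-5.4: ${effective_cost_per_task('GPT-5.4', old_runs):.4f} per successful task")
print(f"GPT-5.5: ${effective_cost_per_task('GPT-5.5', new_runs):.4f} per successful task")
```

With these made-up figures, GPT-5.5 comes out at about half the cost per successful task despite the doubled per-token price, which is the kind of inversion OpenAI is claiming; only your own logged workloads can show whether it actually happens.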
Owlpen and GPT-5.5
GPT-5.5 is now a direct candidate for integration into the Owlpen platform, subject to evaluation on document-heavy workloads, a check of whether its claimed token efficiency offsets the higher price, and a data-handling review once OpenAI publishes the definitive API terms.
The 1M-token context window and the long-context retrieval improvements are the headline reasons this matters for Owlpen. Many of the workloads Owlpen is built for (contract review, tender analysis, policy auditing, long-form document synthesis) are exactly the pattern where the MRCR v2 jump from 36.6 percent to 74.0 percent should show up as visibly better output.
GPT-5.5 Pro is a more cautious integration candidate because of its pricing. We expect to offer it as an escalation path for specific high-stakes tasks rather than as a default tier. GPT-5.4 and GPT-5.4 Mini will remain available in Owlpen for cost-sensitive workflows where the 5.5 gains are not required.
Owlpen position (indicative)
GPT-5.5 is a confirmed candidate for Owlpen integration once API availability, pricing stability, and data-handling commitments are verified. GPT-5.5 Pro is planned as an escalation tier for specific high-stakes tasks rather than a default. GPT-5.4, GPT-5.4 Pro, and GPT-5.4 Mini remain the baseline options for cost-sensitive and latency-sensitive workloads.
If you would like to discuss whether GPT-5.5 is the right model for your workflows, or how the Owlpen platform could support your business, contact us at enquiries@coaleypeak.co.uk or read more about the Owlpen platform.
Disclaimer. This article is published by Coaley Peak Ltd for general informational purposes only. The views expressed are those of the author, Stephen Grindley, and do not constitute legal, regulatory, financial, or technical advice. Nothing in this article should be relied upon when making procurement, investment, compliance, or technology decisions. References to third-party products, platforms, and companies are for informational purposes only and do not constitute endorsement. Benchmark figures, pricing, availability, and model positioning described here are drawn from OpenAI's public announcement on 23 April 2026 and early press coverage; OpenAI may subsequently revise pricing, availability, or model behaviour. Readers should seek independent professional advice appropriate to their specific circumstances and should not make procurement or migration decisions on the basis of this article. Information was accurate to the best of the author's knowledge at the date of publication. Coaley Peak Ltd and Stephen Grindley accept no liability for any loss or damage arising from reliance on the contents of this article.