On 24 April 2026, DeepSeek released the V4 Preview, a pair of open-weight mixture-of-experts models (V4-Pro and V4-Flash), both with a 1M-token context window, MIT licensing, and pricing that substantially undercuts every Western frontier provider.
For businesses thinking about AI cost, this is a notable release. DeepSeek reports that V4-Pro performs close to, but not at, the level of GPT-5.4 and Gemini 3.1 Pro on hard reasoning and agentic benchmarks, while V4-Flash undercuts even GPT-5.4 Nano on input price. The company openly describes the gap to the top Western frontier as "approximately 3 to 6 months".
DeepSeek V4 is not available inside the Owlpen platform and will not be, in its standard hosted form, for any Coaley Peak client. The reason is data sovereignty: DeepSeek's official API is hosted in China. We will only integrate V4 once it is available in a compartmentalised deployment that keeps client data off Chinese infrastructure. More on that below.
What has actually shipped
The V4 Preview introduces two distinct models, both built on a mixture-of-experts architecture and both trained on more than 30 trillion tokens.
DeepSeek V4-Pro
A 1.6 trillion-parameter foundation model with 49 billion active parameters per token. Positioned as the capability flagship, pre-trained on 33 trillion tokens. The Hugging Face weights are roughly 865 GB. DeepSeek reports strong performance on coding and mathematics among open-weight models and a narrower but still material gap against closed frontier models on world knowledge and the hardest reasoning benchmarks.
DeepSeek V4-Flash
A smaller 284 billion-parameter model with 13 billion active parameters per token, trained on 32 trillion tokens. Positioned as the latency-sensitive variant, with reasoning that approaches V4-Pro at faster response times. Weights are roughly 160 GB.
Thinking and Non-Thinking modes
Both models support a toggle between extended reasoning ("Thinking") and direct response ("Non-Thinking") modes, mirroring the extended-thinking pattern now common across frontier providers. Thinking mode trades latency for accuracy on the kinds of problem where step-by-step working helps.
Architectural efficiency
DeepSeek highlights two architectural features: token-wise compression and DeepSeek Sparse Attention. Together they substantially reduce inference cost at long context. The company's paper claims that at 1M tokens, V4-Pro uses roughly 27 percent of the per-token compute and 10 percent of the key-value cache size of V3.2. This is the mechanism behind the aggressive pricing.
Pricing, and how it compares
The DeepSeek API pricing, quoted per million tokens, is as follows:
- V4-Flash: $0.14 input, $0.28 output.
- V4-Pro: $1.74 input, $3.48 output.
For context, GPT-5.5 is $5 / $30, Claude Opus 4.7 is $5 / $25, and Gemini 3.1 Pro sits between the two. V4-Pro's output pricing is roughly one-seventh of Claude Opus 4.7's and one-ninth of GPT-5.5's. V4-Flash is cheaper again and undercuts the nano and mini tiers of most Western providers.
The pricing is the most commercially interesting thing about this release. For high-volume, cost-sensitive workloads (for example, bulk document analysis, agentic scraping, large-scale classification), the calculation now has to account seriously for a 5-to-10-times cost delta against the Western frontier. Whether that delta is realisable in practice depends on where the data can lawfully and safely go, which is where the governance question bites.
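To make the delta concrete, here is a rough cost sketch for a bulk document-analysis run using the list prices quoted above. The workload shape (10,000 documents at roughly 8,000 input and 1,000 output tokens each) is an illustrative assumption, not a measured figure:

```python
# Rough per-workload cost comparison using the list prices quoted in this
# article (USD per million tokens, input and output). The document count and
# token sizes below are illustrative assumptions.

PRICES = {  # model: (input price, output price) per 1M tokens
    "deepseek-v4-pro":   (1.74, 3.48),
    "deepseek-v4-flash": (0.14, 0.28),
    "claude-opus-4.7":   (5.00, 25.00),
    "gpt-5.5":           (5.00, 30.00),
}

def workload_cost(model: str, docs: int, in_tok: int, out_tok: int) -> float:
    """Total USD cost for `docs` documents of in_tok/out_tok tokens each."""
    price_in, price_out = PRICES[model]
    return docs * (in_tok * price_in + out_tok * price_out) / 1_000_000

docs, in_tok, out_tok = 10_000, 8_000, 1_000
for model in PRICES:
    print(f"{model:18s} ${workload_cost(model, docs, in_tok, out_tok):,.2f}")
```

On these illustrative numbers, V4-Pro comes in around $174 against roughly $650 to $700 for the Western frontier models, and V4-Flash at about $14; the exact ratio depends heavily on the input-to-output token mix of the workload.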
Pricing at a glance
V4-Pro at $1.74 / $3.48 per million tokens is roughly 3x cheaper than GPT-5.4 and 7x cheaper than Claude Opus 4.7 on output. V4-Flash at $0.14 / $0.28 undercuts the cheapest mini tiers from every Western provider. These are list prices on DeepSeek's official hosted API and do not include the cost or complexity of self-hosting the open weights.
Benchmarks, and the admitted gap
Independent and DeepSeek-reported benchmark figures are consistent on two themes: V4-Pro is strong on coding and mathematics, and visibly weaker on world knowledge and hardest-tier reasoning.
Coding
V4-Pro scores 80.6 percent on SWE-bench Verified, within 0.2 points of Claude Opus 4.6. A higher-effort V4-Pro-Max variant reportedly leads LiveCodeBench Pass@1 at 93.5 percent, ahead of Gemini 3.1 Pro at 91.7 and Claude Opus 4.6 Max at 88.8. Treat these figures as vendor-reported until independent harnesses publish their own numbers.
World knowledge
On SimpleQA-Verified, V4-Pro scores 57.9 percent against Gemini 3.1 Pro at 75.6 percent. That is a meaningful gap in factual recall, and it has direct implications for use cases that need the model to be reliable about named entities, historical facts, or recent events. Retrieval-augmented grounding helps, but the base model's weaker factual recall remains a real factor.
DeepSeek's own framing
The DeepSeek paper describes the trailing position candidly: "Performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months." That is an honest statement, and it is the right mental model: DeepSeek V4 is not at the Western frontier, but it is close, and it is dramatically cheaper.
Access, API compatibility, and deprecations
DeepSeek V4 is accessible in three ways: through chat.deepseek.com (Expert and Instant modes), through DeepSeek's hosted API using OpenAI Chat Completions and Anthropic-compatible endpoints (model identifiers deepseek-v4-pro and deepseek-v4-flash), and through open weights on Hugging Face under an MIT licence. The Hugging Face release means any sufficiently resourced team can, in principle, run V4 on their own hardware or in an enclave of their choice.
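As a sketch of what the Chat Completions-compatible route looks like, the snippet below constructs a request body targeting the deepseek-v4-pro identifier from the announcement. The base URL and the field name for the Thinking/Non-Thinking toggle are assumptions rather than documented facts, so check DeepSeek's API reference before relying on them; the snippet only builds the payload and does not send it:

```python
import json

# Hypothetical endpoint — verify against DeepSeek's published API docs.
BASE_URL = "https://api.deepseek.com"  # assumption, not confirmed in the release

def build_chat_request(model: str, prompt: str, thinking: bool) -> dict:
    """Build an OpenAI Chat Completions-style request body.

    The model identifiers (deepseek-v4-pro, deepseek-v4-flash) come from the
    release; the "thinking" field name is a guess at how the Thinking /
    Non-Thinking toggle might be exposed.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": "enabled" if thinking else "disabled",  # assumed field name
    }

payload = build_chat_request(
    "deepseek-v4-pro", "Summarise this contract clause.", thinking=True
)
print(json.dumps(payload, indent=2))
```

The same body, with the model string swapped, would target V4-Flash; under the Anthropic-compatible route the envelope would differ but the model identifiers stay the same.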
DeepSeek has confirmed that the previous-generation model identifiers deepseek-chat and deepseek-reasoner will be fully retired and inaccessible after 24 July 2026. Any existing integration pointing at those models needs to migrate to V4 (or to a different provider) within that window.
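A minimal way to audit an integration ahead of the retirement date is a straight identifier mapping. The replacement choices below (deepseek-reasoner to Pro, deepseek-chat to Flash) are a plausible default pairing, not DeepSeek's official migration guidance:

```python
# Legacy identifiers retired after 24 July 2026, per DeepSeek's announcement.
# The suggested replacements are an assumed sensible default, not an
# official migration table — confirm against DeepSeek's docs before migrating.
LEGACY_TO_V4 = {
    "deepseek-chat": "deepseek-v4-flash",    # general chat -> latency tier
    "deepseek-reasoner": "deepseek-v4-pro",  # reasoning -> capability flagship
}

def migrate_model_id(model_id: str) -> str:
    """Return the V4 replacement for a legacy identifier, else pass through."""
    return LEGACY_TO_V4.get(model_id, model_id)

print(migrate_model_id("deepseek-reasoner"))  # deepseek-v4-pro
```

Running every model string in your configuration through a function like this, before the cut-off, surfaces any integration still pointing at an identifier that will stop resolving.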
Why DeepSeek V4 is not in Owlpen
DeepSeek's official hosted API is operated from China. Under UK GDPR, data sent to that API constitutes an international transfer to a third country without an adequacy decision, and carries the accompanying transfer-impact-assessment and supplementary-safeguards obligations. For most of our clients, especially those in regulated sectors (public sector, healthcare, legal, financial services), that is not a transfer we can make routinely or by default.
The weights being MIT-licensed and publicly downloadable is what makes a future Owlpen integration realistic. Open weights mean the same model can be deployed inside a compartmentalised environment that never touches Chinese infrastructure: for example, within a UK or EU sovereign cloud, on private hardware, or through a Western-hosted inference provider that licences the weights under its own compliance regime.
Until that compartmentalised deployment is in place and independently reviewed, DeepSeek V4 is out of scope inside Owlpen. Clients running document analysis, cost reduction, and business intelligence workloads will continue to use the Claude, GPT-5.4, and GPT-5.5 families that Owlpen already supports, with Gemini and Gemma 4 as additional routes where appropriate. The Owlpen model picker will not surface DeepSeek V4 options to end users until the data pathway is one Coaley Peak is comfortable signing off on for their workload.
What this means in practice
For clients running high-volume, non-sensitive workloads (for example, experimentation, internal synthetic data generation, public-data enrichment), V4-Flash is genuinely interesting at $0.14 input. The cost differential can change which workloads are economic at all. These use cases are a natural fit for a compartmentalised deployment once available.
For clients running anything that touches personal data, client-confidential material, regulated records, or commercially sensitive analysis, the hosted DeepSeek API is not a realistic option without a formal transfer-impact assessment and supplementary safeguards that most organisations will not want to carry. The practical answer is to stay on Western-hosted models and to revisit DeepSeek V4 once Owlpen exposes it through a compartmentalised route.
For teams already running fully self-hosted inference, the open weights remove the middle layer entirely: V4-Pro and V4-Flash can be served from your own hardware, with no third-country transfer at all. The trade-off is the operational cost and complexity of running a 1.6T-parameter mixture-of-experts model, which is not trivial.
Not in Owlpen (unless compartmentalised)
DeepSeek V4-Pro and V4-Flash are not available in the Owlpen platform in their standard hosted form. The DeepSeek API is China-hosted, which is incompatible with the data-handling commitments Owlpen makes to clients. We will only expose DeepSeek V4 inside Owlpen once it is available through a compartmentalised deployment (a UK or EU sovereign host, a private deployment of the open weights, or a Western-hosted provider of the MIT-licensed weights) and has passed our data-pathway review. Claude, GPT-5.4, GPT-5.5, Gemini, and Gemma 4 remain the supported routes in the meantime.
If you would like to discuss whether a compartmentalised DeepSeek deployment is appropriate for a specific workload, or how the Owlpen platform could support your business in the meantime, contact us at enquiries@coaleypeak.co.uk or read more about the Owlpen platform.
Disclaimer. This article is published by Coaley Peak Ltd for general informational purposes only. The views expressed are those of the author, Stephen Grindley, and do not constitute legal, regulatory, financial, or technical advice. Nothing in this article should be relied upon when making procurement, investment, compliance, or technology decisions. References to third-party products, platforms, and companies are for informational purposes only and do not constitute endorsement. Benchmark figures, pricing, architectural claims, and availability described here are drawn from DeepSeek's public announcement and early technology press coverage; Coaley Peak has not independently verified them. The commentary on data sovereignty and UK GDPR international transfers is a general business observation, not a legal opinion, and organisations should take their own legal and regulatory advice before transferring personal or commercially sensitive data to any third-country hosted service. Information was accurate to the best of the author's knowledge at the date of publication. Coaley Peak Ltd and Stephen Grindley accept no liability for any loss or damage arising from reliance on the contents of this article.