Gemma 4 is a large language model (LLM) from Google DeepMind. Unlike ChatGPT, Copilot, or Gemini Advanced, it does not have to run on a provider's servers. The model weights are published and can be downloaded, deployed, and run entirely on your own infrastructure. For businesses that handle sensitive data, that distinction matters more than most people realise.
A large language model is a type of AI system trained on vast amounts of text to understand and generate language. LLMs power the AI writing assistants, document analysis tools, and chatbots that have become common across business software. Gemma 4 is Google DeepMind's open-weight version: derived from the same research that underpins Gemini, Google's flagship commercial AI, but released so that businesses can run it themselves. Every query is processed on your hardware, within your environment. No data reaches a Google server. Nothing is logged by a third party.
The Owlpen platform, Coaley Peak's AI-powered cost and performance intelligence system, is compatible with Gemma 4. The shift to an Apache 2.0 licence with Gemma 4 is the primary reason it is the version we deploy: earlier Gemma versions used custom licensing terms that restricted commercial use in ways that made them unsuitable as a production AI layer. Apache 2.0 removes those constraints entirely. For clients where data sovereignty is a requirement, Gemma 4 is now our standard approach to that part of the engagement.
Open-weight vs. open-source vs. closed: what actually matters
These terms are thrown around loosely, so it is worth being precise. A “closed” model like GPT-4o or Anthropic's Claude is accessible only via an API. Every prompt you send travels to the model provider's servers, is processed there, and a response is returned. You have no visibility into what happens to that data in transit or at rest. For routine consumer use this is fine. For a business processing payroll data, patient records, legal documents, or commercially sensitive forecasts, it requires careful consideration under UK GDPR, and sometimes makes the tool impractical to use at all.
An “open-weight” model like Gemma sits in a different category. The model weights (the trained parameters that define how the model behaves) are downloadable. You can run the model on your own server, in your own data centre, or within your cloud tenancy. The AI runs on your hardware, processes data within your environment, and never requires an outbound call to Google. Gemma is not a product you subscribe to; it is, in effect, a piece of sophisticated software you own and operate.
This is distinct from “open-source” in the strict sense. Gemma's training data and full technical specifications are not published. But for the practical question of whether your data is exposed, the distinction is largely irrelevant. What matters is whether the model processes your data on infrastructure you control. With Gemma, it does.
Where Gemma currently sits
Google has released several generations of Gemma in quick succession. The current generation, Gemma 4, was released on 2 April 2026, three days ago at the time of writing. It comes in four variants, from compact mobile-sized models to a 31-billion-parameter version capable of tackling complex analysis tasks on a single high-specification server.
The previous generation, Gemma 3, released on 10 March 2025, remains widely used and is mature in terms of tooling and support. It introduced the ability to process images alongside text, a useful capability for businesses dealing with scanned documents, invoices, or visual data, and extended the context window from roughly 6,000 words (8,000 tokens) to around 96,000 words (128,000 tokens). That kind of context window means the model can hold an entire large contract or a full quarter's worth of financial reporting in its working memory at once.
Gemma 4 advances further still. The larger models support up to 256,000 tokens of context (around 192,000 words), handle video and audio inputs alongside images and text, and include a reasoning mode in which the model works through its logic step by step before producing a final answer, generating a readable audit trail of its deductive process. On the Arena AI leaderboard, the independent benchmark widely used by the research community to compare open models, the Gemma 4 31B Dense model ranks third globally among all open-weight models. That position places it alongside models many times its size, and at a level that would have required a proprietary commercial model just eighteen months ago.
A note on Gemma 4 licensing
Gemma 4 is released under the Apache 2.0 licence, a material change from Gemma 1, 2, and 3, which were distributed under a custom “Gemma Terms of Use” framework. That custom licence contained two clauses that created serious enterprise risk. First, Google reserved the right to terminate a business's usage at any time if it determined the Acceptable Use Policy had been violated. For organisations building operational infrastructure on Gemma, this represented an uninsurable dependency on Google's shifting policy discretion. Second, the licence contained a viral clause: any synthetic data generated by a Gemma model and used to train a separate AI system rendered that system a “model derivative”, permanently binding it to Google's terms. Apache 2.0 removes both constraints entirely. It is an irrevocable, globally recognised permissive licence that grants full commercial use, unlimited modification rights, and no restrictions on synthetic data generation or redistribution. Businesses and software vendors can build on Gemma 4 without exposure to termination risk or viral licensing contamination.
The four Gemma 4 variants
Gemma 4 is released as four distinct models, each optimised for a different deployment context. The two smaller variants use a technique called Per-Layer Embeddings to maximise efficiency on constrained hardware. The 26B A4B uses a Mixture of Experts (MoE) architecture: it houses 25.2 billion parameters in total but activates only 3.8 billion during any given inference cycle, routing each query to the eight most relevant specialist sub-networks from a pool of 128. This allows it to deliver the reasoning quality of a much larger model at the computational cost of a small one. The 31B is fully dense, activating all parameters simultaneously for maximum analytical depth.
| Variant | Architecture | Active params | Context window | Input types |
|---|---|---|---|---|
| E2B | Dense | 2.3B | 128K tokens (~96K words) | Text, image, audio |
| E4B | Dense | 4.5B | 128K tokens (~96K words) | Text, image, audio |
| 26B A4B | Mixture of Experts | 3.8B active / 25.2B total | 256K tokens (~192K words) | Text, image, video |
| 31B | Dense | 30.7B | 256K tokens (~192K words) | Text, image, video |
The E2B and E4B variants support native audio input, including speech recognition and translation, making them particularly suited to voice-driven edge applications such as hands-free warehouse operations or vehicle-based workflows.
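The context window and parameter figures above follow consistently from two ratios quoted in this article: roughly 0.75 words per token, and 3.8 billion active parameters out of 25.2 billion total for the MoE variant. A minimal sketch of that arithmetic, using only the numbers stated here:

```python
# Illustrative arithmetic only, using the figures quoted in this article.
# The ~0.75 words-per-token ratio is a rough rule of thumb, not an exact
# property of any tokeniser.

WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Approximate word count for a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

# Context windows from the variant table
print(tokens_to_words(128_000))   # ~96,000 words (E2B / E4B)
print(tokens_to_words(256_000))   # ~192,000 words (26B A4B / 31B)

# Mixture-of-Experts efficiency for the 26B A4B variant:
# only 3.8B of 25.2B parameters are active per inference step.
active_fraction = 3.8e9 / 25.2e9
print(f"{active_fraction:.0%} of parameters active per query")  # 15%
```

The MoE ratio is the point of the architecture: compute cost per token scales with the active parameters, not the total, which is how the 26B A4B delivers large-model quality at small-model cost.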
The data privacy argument
UK GDPR does not prohibit using cloud AI services, but it does require you to understand what personal data is being processed, where, and by whom, and to have a lawful basis for that processing. When you send data to a closed AI API, you are typically acting as a data controller and the model provider as a data processor. That means contracts, data processing agreements, transfer impact assessments where relevant, and ongoing monitoring.
Running an open-weight model on your own infrastructure simplifies this considerably. If personal data never leaves your environment, the third-party processor question largely disappears. For organisations in regulated sectors including financial services, healthcare, legal, and the public sector, this can be the difference between an AI tool being usable at all or not.
There is a further legal dimension that is underappreciated in most AI discussions: the conflict between UK GDPR and the US CLOUD Act. The CLOUD Act, enacted in 2018, grants US federal authorities the power to compel US-based technology companies to hand over data stored on their servers, regardless of where in the world those servers are physically located. This creates a direct collision with UK GDPR, which requires that foreign governments access the data of UK citizens only through formal international legal assistance agreements.
Following the Schrems II ruling, which invalidated the EU-US Privacy Shield framework in 2020, using a closed AI API managed by a US hyperscaler for sensitive UK data requires extensive Transfer Impact Assessments and Standard Contractual Clauses to demonstrate adequate protection. In practice, when data is processed by a US-headquartered cloud provider, US jurisdiction ultimately applies. For UK businesses in healthcare, financial services, or the public sector, this creates a compliance dilemma that is difficult to resolve within the existing legal architecture.
When Gemma 4 is deployed on-premise or within a UK-managed data centre, the data never touches US corporate infrastructure. The CLOUD Act exposure disappears entirely. This is not a theoretical distinction: for NHS contractors, financial services firms under FCA supervision, and public sector bodies subject to Cabinet Office guidance on data residency, it can be the deciding factor in whether an AI tool is legally deployable at all.
Beyond legal compliance, there is a commercial sensitivity argument that applies to almost every business. Competitive intelligence, supplier pricing, M&A due diligence, margin data, staff performance records: none of these should leave your systems during processing. With a self-hosted model, they do not need to.
The cost dimension
Closed AI APIs charge per token, a unit roughly equivalent to three-quarters of a word. For occasional use this is inexpensive. For a business processing thousands of documents per month, running compliance checks across a full supplier database, or analysing operational data at scale, the per-token cost compounds significantly.
Open-weight models replace per-query charges with infrastructure costs. Once a model is deployed on appropriate hardware (whether a cloud virtual machine, a dedicated server, or an on-premises GPU) the marginal cost of running additional queries is effectively zero. For high-volume applications, the economics typically favour self-hosted models by a material margin. Published comparisons from the AI research community suggest open-weight models can cost approximately one-fifth to one-sixth of equivalent closed API usage at scale, though this varies considerably by workload.
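The structural difference is easy to see in a back-of-envelope model. The prices below are illustrative placeholders, not quotes for any real API or hosting service; the point is the shape of the two cost curves, not the specific figures:

```python
# Back-of-envelope cost comparison with ILLUSTRATIVE prices. The per-token
# rate and server cost are placeholders, not real quotes.

def api_cost(tokens_per_month: int, price_per_million_tokens: float) -> float:
    """Monthly spend on a metered, per-token API: scales with volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens

def self_hosted_cost(server_per_month: float) -> float:
    """Monthly spend on fixed infrastructure: volume does not change the bill."""
    return server_per_month

volume = 500_000_000  # 500M tokens/month: a heavy document-processing workload

print(api_cost(volume, 5.00))      # metered bill: 2500.0
print(self_hosted_cost(800.0))     # fixed bill:    800.0
```

Doubling the workload doubles the metered bill but leaves the fixed bill unchanged, which is why the comparison tips further towards self-hosting as volume grows.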
How Gemma works within the Owlpen platform
The Owlpen platform connects to a business's existing operational, financial, and procurement systems and applies machine learning to identify cost inefficiencies and performance gaps. The platform is built on a combination of tools and integrations, selected for each client engagement based on the nature of the data and the analysis required.
For clients where data sovereignty is a requirement, whether due to regulatory obligations, commercial sensitivity, or internal policy, Gemma models can be incorporated into the Owlpen architecture as the AI layer that processes that data. Rather than routing sensitive business data through an external API, Owlpen can run analysis using a Gemma model deployed within the client's own environment or within Coaley Peak's managed infrastructure under a data processing agreement.
The practical result is that clients who would previously have been unable to engage AI-powered analysis on their most sensitive data, because of the regulatory or contractual barriers to using a cloud API, now have a viable path. Gemma's capacity to handle large volumes of text, structured data, and documents within a self-contained environment makes it particularly well-suited to the kinds of financial and operational analysis Owlpen performs.
How to run and fine-tune Gemma 4
Google provides several routes to experiment with and deploy Gemma 4, suited to different levels of technical readiness.
Google AI Studio
The quickest way to try Gemma 4 without any infrastructure is Google AI Studio, a free browser-based tool at aistudio.google.com. It allows you to send prompts to Gemma 4 models, adjust parameters, and evaluate outputs against your own data. No server configuration required. Useful for evaluating the model's capabilities before committing to a deployment.
Vertex AI
For organisations operating on Google Cloud, Vertex AI supports Gemma 4 as a deployable managed model endpoint. Vertex AI handles infrastructure, scaling, and monitoring, and provides a managed fine-tuning pipeline that allows you to adapt Gemma 4 to your domain using your own data without configuring GPU clusters manually. This is the route most enterprise teams on Google Cloud will use for production deployments.
Hugging Face and Kaggle
The Gemma 4 model weights are available for direct download from Hugging Face, enabling deployment on private servers or within your own cloud tenancy using standard ML frameworks such as PyTorch and Transformers. Hugging Face also hosts fine-tuning tooling. Kaggle, which is Google-owned, provides free GPU notebook access for evaluating Gemma 4 on your own datasets before committing to infrastructure.
Local deployment (Ollama)
For teams that want to run Gemma 4 on standard hardware without cloud dependency, tools such as Ollama allow a Gemma 4 model to run locally on a Mac, Windows, or Linux machine in minutes. The smaller variants (E2B, E4B) are well-suited to this; the 31B model requires a capable GPU. Useful for development and internal evaluation. Not suited to production workloads that need to serve many concurrent users.
Fine-tuning Gemma 4 on your organisation's own data is supported via Vertex AI and Hugging Face's training ecosystem. A fine-tuned model reflects the terminology, document formats, and domain knowledge specific to your business, which materially improves performance on specialised tasks compared to a general-purpose deployment.
Where Gemma 4 is likely to go next
Several applications are emerging that are likely to become significant over the next two to three years.
Edge and on-device processing
The E2B and E4B variants of Gemma 4 are small enough to run on capable smartphones and edge hardware. This opens use cases where AI processing must happen locally rather than in a central data centre: on a warehouse floor, in a vehicle, or on a remote site without reliable internet connectivity. On-device processing also eliminates the network latency that makes cloud AI unsuitable for real-time operational applications.
Sector-specific fine-tuned models
Apache 2.0 licensing makes Gemma 4 straightforward to commercialise. Expect a proliferation of domain-specific variants: models fine-tuned on legal, financial, medical, logistics, or procurement data and made available as products or internal tools. For businesses in regulated sectors, a Gemma 4 model fine-tuned on sector-specific documents is likely to outperform a general-purpose frontier model on the tasks that matter to them, while keeping all processing on controlled infrastructure.
Agentic and multi-step workflows
Gemma 4's extended context window and reasoning capabilities make it well-suited to agentic frameworks, where AI models plan and execute multi-step tasks rather than responding to single prompts. Early enterprise use cases include automated document review pipelines, procurement analysis agents, and compliance checking workflows. These architectures are developing rapidly across enterprise software, and a self-hosted model like Gemma 4 is particularly appropriate given the sensitivity of the data these agents typically handle.
Embedding AI directly within business process systems, whether ERP, procurement platforms, or document management, is also a natural direction. Gemma 4's efficiency and permissive licensing make it viable to ship as a component inside enterprise software products, providing real-time AI capabilities without requiring a dependency on any external API.
What Gemma is not
Open-weight models require technical infrastructure to deploy. Gemma is not a product you install from a browser; it requires server capacity, configuration, and ongoing maintenance. For businesses without an in-house technical team, that means working with a managed service provider or consultancy that handles deployment and operation. This is one reason why Gemma is more often used as a component within a broader platform than as a standalone tool.
Gemma is also not a finished product in the sense of a packaged software application. The model itself is general-purpose; it can write, summarise, extract, classify, and reason across text and images. But using it effectively for a specific business task requires prompt engineering, integration with your data sources, and validation of outputs. At Coaley Peak we do this as part of each Owlpen engagement; the model is one layer in a structured process that ends in independently verified findings.
Finally, like all current-generation AI models, Gemma can produce incorrect outputs. The frequency of errors varies by task type and model size, and can be reduced significantly with careful design of the prompts and processes around the model. But it is not a system you should rely on for high-stakes automated decisions without human review. Responsible AI governance calls for human oversight of consequential decisions, such as credit assessments, medical recommendations, or legal determinations, regardless of which model or licence is in use. This applies to Gemma as much as to any other AI system.
Questions worth asking about any AI tool
Whether you are evaluating Gemma, a closed API service, or any other AI tool for use in your business, three questions apply regardless of the technology:
- Where is the data being processed, and who has access to it during processing?
- What contractual and regulatory obligations arise from using this tool with your data?
- How are the model's outputs verified before they inform a business decision?
Open-weight models like Gemma give more straightforward answers to the first two than most cloud services. The third question (verification) is one that no model answers on its own. It requires a process built around the AI output, not just the AI itself.
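What such a verification process can look like in practice: a small, auditable check that sits between the model's output and any business decision. Everything in this sketch is hypothetical, including the field names and rules; it is not part of Owlpen or any real platform. The point is that the checks live outside the model, in ordinary code a human can review:

```python
# Hypothetical sketch of output verification. All field names and rules are
# illustrative. The checks sit outside the model, in auditable code.

def verify_invoice_record(record: dict) -> list[str]:
    """Return a list of validation failures for a model-extracted invoice
    record. An empty list means the record may proceed to review."""
    failures = []
    total = record.get("total")
    if not isinstance(total, (int, float)) or total < 0:
        failures.append("total must be a non-negative number")
    if not record.get("supplier"):
        failures.append("supplier name is missing")
    if record.get("currency") not in {"GBP", "EUR", "USD"}:
        failures.append("currency must be a recognised code")
    return failures

record = {"total": 1250.0, "supplier": "Acme Ltd", "currency": "GBP"}
print(verify_invoice_record(record))  # [] -> passes, proceeds to human review
```

A record only informs a decision once the failure list is empty and, for consequential decisions, a human has signed it off.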
If you would like to understand how Gemma-compatible analysis could apply to your business, whether for cost reduction, compliance, procurement, or operational intelligence, contact Coaley Peak at enquiries@coaleypeak.co.uk or read more about the Owlpen platform.
Disclaimer. This article is published by Coaley Peak Ltd for general informational purposes only. The views expressed are those of the author, Stephen Grindley, and do not constitute legal, regulatory, financial, or technical advice. Nothing in this article should be relied upon when making procurement, investment, compliance, or technology decisions. Readers should seek independent professional advice appropriate to their specific circumstances. Information was accurate to the best of the author's knowledge at the date of publication. Coaley Peak Ltd and Stephen Grindley accept no liability for any loss or damage arising from reliance on the contents of this article.