Air-Gapped Generative AI in Regulated Manufacturing - Zorost Intelligence

Pull-quote: “Air-gapped AI is not ‘AI minus features’. It is a different deployment shape with different trade-offs and different costs.”

Why this matters

A meaningful share of regulated manufacturing facilities — automotive, aerospace, defense industrial base, certain pharma and medical-device sites — cannot accept internet egress on production networks. The reasons are a mix of customer security requirements, IP protection, and regulatory posture. Cloud-only generative AI does not fit these environments at all.

The result has been a market gap. Cloud-based AI vendors dismiss air-gapped customers as “not the target,” and traditional quality management system (QMS) vendors don’t offer AI. The customer is left without a credible option.

What air-gapped AI actually requires

An air-gapped agentic AI stack is not “the cloud product, minus the cloud.” It is a different system shape:

Local LLM serving — open-weights models (Llama 3.x, Qwen 2.5, Mistral, Phi-4, Gemma 3) served on customer hardware via Ollama, vLLM, or llama.cpp
Local embeddings — open-source embedding models served on the same stack
Local vector database — pgvector or Weaviate or Qdrant on a private subnet
Local model registry — MLflow Model Registry running on customer infrastructure
Local evaluation harness — golden datasets, regression tests, and hallucination detection that ship inside the air-gapped boundary
Local update pipeline — model and corpus updates delivered as signed bundles via approved physical or networked transfer
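The "signed bundles" item can be made concrete with a verify-before-import step. What follows is a minimal sketch under stated assumptions, not a description of how any particular product does it: the manifest layout (a JSON object mapping file names to SHA-256 digests) and the function names are hypothetical, and HMAC-SHA256 with a pre-shared key stands in for the signature because it is available in the Python standard library. A production pipeline would more likely use asymmetric signatures so that the receiving site holds only a public key.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_bundle(files: dict, manifest: dict, tag: str, key: bytes) -> bool:
    """Accept a bundle only if the manifest tag and every file digest match.

    `files` maps file name -> raw bytes (held in memory to keep the sketch short).
    """
    # 1. The manifest itself must be authentic (constant-time comparison).
    if not hmac.compare_digest(sign_manifest(manifest, key), tag):
        return False
    # 2. The bundle must contain exactly the files the manifest lists.
    if set(files) != set(manifest["sha256"]):
        return False
    # 3. Every file must hash to the digest recorded in the manifest.
    return all(
        hmac.compare_digest(hashlib.sha256(data).hexdigest(), manifest["sha256"][name])
        for name, data in files.items()
    )


# Hypothetical bundle contents, for illustration only.
key = b"demo-key-not-for-production"
files = {"model.gguf": b"fake weights", "corpus.jsonl": b'{"id": 1}\n'}
manifest = {
    "version": "example",
    "sha256": {name: hashlib.sha256(data).hexdigest() for name, data in files.items()},
}
tag = sign_manifest(manifest, key)
```

The point of the design is ordering: the manifest is authenticated first, and only then are its digests trusted to check the files, so a tampered model file or a swapped manifest is rejected before anything is loaded inside the boundary.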

What it costs (typical configuration)

Hardware — one to four GPU workstations or a multi-GPU server, depending on facility load. Typical GPUs range from the NVIDIA RTX 4090 / 5090 and RTX A6000 up to the H100 at larger scale.
Model storage — six local models (general, coding, and reasoning LLMs, two domain-tuned LLMs, and one embedding model) at roughly 58 GB total in our reference config.
RAG corpus storage — depends on scope; 765,000 chunks fit comfortably in tens of GB on local disk with pgvector.
Operations — a containerized stack (Docker / Compose) with monitoring (Grafana, Prometheus) and a local control plane.
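The corpus storage figure can be sanity-checked with back-of-envelope arithmetic. The sketch below is an estimate under assumptions that do not come from the reference config: a 1024-dimension embedding stored as 4-byte floats, an average of 2,000 bytes of text per chunk, and a 2x overhead factor for indexes and row metadata. Only the 765,000 chunk count is taken from the text above.

```python
def estimate_corpus_gb(
    num_chunks: int,
    embedding_dim: int,
    avg_chunk_bytes: int,
    bytes_per_float: int = 4,      # assumption: single-precision vectors
    overhead_factor: float = 2.0,  # assumption: indexes and row metadata
) -> float:
    """Rough on-disk size in GB (10^9 bytes) for a chunked RAG corpus."""
    vector_bytes = num_chunks * embedding_dim * bytes_per_float
    text_bytes = num_chunks * avg_chunk_bytes
    return (vector_bytes + text_bytes) * overhead_factor / 1e9


# 765,000 chunks from the text; dimension and chunk size are assumed values.
estimate = estimate_corpus_gb(765_000, 1024, 2_000)
print(f"{estimate:.1f} GB")  # prints "9.3 GB"
```

Under these assumptions the corpus lands at roughly 9 GB. Larger embedding dimensions or longer chunks push it into the tens of GB, which is still well within a single local disk.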

Trade-offs vs. cloud

Latency is generally similar to cloud for the smaller models, and better than cloud for chained calls, because each step avoids a network round-trip.
Capability is lower than the strongest closed-source models. Open-weights models in 2026 are excellent but not at parity with the absolute frontier.
Cost is higher up-front (hardware) and lower over time (no per-token bills).
Update cadence is slower because model updates must clear the air-gap boundary.
Evaluation discipline has to be tighter, because you cannot lean on the vendor’s evaluations.
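Tighter evaluation discipline usually means a regression gate that runs inside the boundary before a model update is promoted. The sketch below illustrates the idea under simplifying assumptions: the `GoldenCase` structure, the function names, and the sample prompts are all hypothetical, and substring matching is a deliberately crude stand-in for real scoring. It does not attempt hallucination detection.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    required_terms: tuple[str, ...]  # every term must appear in the answer


def pass_rate(model: Callable[[str], str], cases: Sequence[GoldenCase]) -> float:
    """Fraction of golden cases whose answer contains all required terms."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = 0
    for case in cases:
        answer = model(case.prompt).lower()
        if all(term.lower() in answer for term in case.required_terms):
            passed += 1
    return passed / len(cases)


def passes_gate(candidate, baseline, cases, tolerance: float = 0.0) -> bool:
    """Promote the candidate only if it does not regress against the baseline."""
    return pass_rate(candidate, cases) >= pass_rate(baseline, cases) - tolerance


# Stub "models" backed by canned answers, for illustration only.
cases = [
    GoldenCase("Which form records a nonconformance?", ("ncr",)),
    GoldenCase("Who approves a deviation?", ("quality", "manager")),
]
baseline_answers = {
    cases[0].prompt: "Use the NCR form.",
    cases[1].prompt: "The quality manager approves it.",
}
candidate_answers = {**baseline_answers, cases[1].prompt: "Any operator can approve it."}


def baseline(prompt: str) -> str:
    return baseline_answers[prompt]


def candidate(prompt: str) -> str:
    return candidate_answers[prompt]
```

Here the stub candidate fails the second case, so `passes_gate(candidate, baseline, cases)` returns `False` and the update would be held back. In a real deployment the callables would wrap the locally served models, and the golden dataset would be versioned alongside the signed update bundles.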

Why we run this

SPCio is co-developed with a manufacturing intelligence partner whose customer base requires air-gapped deployment as the default. We treat that as the design constraint, not the exception. The cloud-deployable variant of SPCio falls out of the air-gapped variant naturally; the reverse is much harder.

Closing

Air-gapped agentic AI is real in 2026. It is a deployment shape with its own constraints, its own cost profile, and its own discipline — and for a meaningful share of regulated industries, it is the only deployment shape that fits.


