Pull-quote: “Air-gapped AI is not ‘AI minus features’. It is a different deployment shape with different trade-offs and different costs.”
Why this matters
A meaningful share of regulated manufacturing facilities — automotive, aerospace, defense industrial base, certain pharma and medical-device sites — cannot accept internet egress on production networks. The reasons are a mix of customer security requirements, IP protection, and regulatory posture. Cloud-only generative AI does not fit these environments at all.
The result is a market gap. Cloud-based AI vendors dismiss air-gapped customers as “not the target,” and traditional quality management system (QMS) vendors don’t offer AI, so the customer is left without a credible option.
What air-gapped AI actually requires
An air-gapped agentic AI stack is not “the cloud product, minus the cloud.” It is a different system shape:
- Local LLM serving: open-weights models (Llama 3.x, Qwen 2.5, Mistral, Phi-4, Gemma 3) served on customer hardware via Ollama, vLLM, or llama.cpp
- Local embeddings: open-source embedding models served on the same stack
- Local vector database: pgvector, Weaviate, or Qdrant on a private subnet
- Local model registry: MLflow Model Registry running on customer infrastructure
- Local evaluation harness: golden datasets, regression tests, and hallucination detection that ship inside the air-gapped boundary
- Local update pipeline: model and corpus updates delivered as signed bundles via approved physical media or an approved network transfer
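To make the update pipeline concrete, here is a minimal sketch of how a receiving site might check a bundle before loading anything from it. It is an illustration under stated assumptions, not a description of any product's bundle format: the manifest layout and the function names (`build_manifest`, `verify_bundle`) are hypothetical, and the HMAC tag is only a standard-library stand-in for a real asymmetric signature (for example Ed25519), which is what a production pipeline would more plausibly use.

```python
import hashlib
import hmac
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def _list_files(bundle_dir: Path) -> list[str]:
    """Relative POSIX-style paths of every file in the bundle, sorted."""
    return sorted(
        p.relative_to(bundle_dir).as_posix()
        for p in bundle_dir.rglob("*")
        if p.is_file()
    )


def _tag(files: dict, key: bytes) -> str:
    """HMAC over the canonical JSON of the file table (stand-in for a signature)."""
    payload = json.dumps(files, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def build_manifest(bundle_dir: Path, key: bytes) -> dict:
    """Sender side: record one digest per file, then tag the table."""
    files = {name: sha256_file(bundle_dir / name) for name in _list_files(bundle_dir)}
    return {"files": files, "tag": _tag(files, key)}


def verify_bundle(bundle_dir: Path, manifest: dict, key: bytes) -> list[str]:
    """Receiver side: return a list of problems; an empty list means accept."""
    if not hmac.compare_digest(_tag(manifest["files"], key), manifest["tag"]):
        return ["manifest tag mismatch"]  # do not trust any digest in it
    problems = []
    for name, digest in manifest["files"].items():
        path = bundle_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_file(path) != digest:
            problems.append(f"digest mismatch: {name}")
    extras = set(_list_files(bundle_dir)) - set(manifest["files"])
    problems.extend(f"unexpected: {name}" for name in sorted(extras))
    return problems
```

The point of the design is ordering: the manifest is authenticated first, and only then are its per-file digests trusted. In this sketch the manifest would travel alongside the bundle rather than inside the directory it describes, and files not listed in it are reported rather than silently accepted.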
What it costs (typical configuration)
- Hardware: one to four GPU workstations or a multi-GPU server, sized to facility load. Typical GPUs are the NVIDIA RTX 4090, RTX 5090, RTX A6000, or H100.
- Model storage: six local models (general, coding, and reasoning LLMs, two domain-tuned LLMs, and one embedding model) at roughly 58 GB total in our reference config.
- RAG corpus storage: depends on scope; 765,000 chunks fit comfortably in tens of GB on local disk with pgvector.
- Operations: a containerized stack (Docker / Compose) with monitoring (Grafana, Prometheus) and a local control plane.
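As a sanity check on the corpus figure, the back-of-envelope arithmetic can be sketched in a few lines. Only the 765,000 chunk count comes from the text above. The embedding dimension (1024), the average chunk text size (2,000 bytes), and the index overhead (an index about as large as the raw vectors) are assumptions chosen for illustration and will vary with the embedding model and index type. The per-vector size uses pgvector's documented storage rule of 4 bytes per dimension plus 8 bytes.

```python
def estimate_storage_bytes(
    chunks: int,
    embedding_dim: int = 1024,      # assumption: depends on the embedding model
    avg_text_bytes: int = 2_000,    # assumption: average stored text per chunk
    index_factor: float = 1.0,      # assumption: index size relative to raw vectors
) -> dict:
    """Rough storage estimate for a pgvector-backed RAG corpus, in bytes."""
    # pgvector's `vector` type: 4 bytes per dimension plus 8 bytes per vector.
    vectors = chunks * (4 * embedding_dim + 8)
    index = round(vectors * index_factor)
    text = chunks * avg_text_bytes
    return {
        "vectors": vectors,
        "index": index,
        "text": text,
        "total": vectors + index + text,
    }


if __name__ == "__main__":
    estimate = estimate_storage_bytes(765_000)
    for part, size in estimate.items():
        print(f"{part:>8}: {size / 1e9:.2f} GB")
```

Under these assumptions the total comes to roughly 7.8 GB, which leaves ample headroom within "tens of GB" for metadata, table overhead, and larger embedding dimensions.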
Trade-offs vs. cloud
- Latency is generally similar to cloud for the smaller models, and often better for chained calls, because each step avoids a network round-trip.
- Capability is lower than the strongest closed-source models. Open-weights models in 2026 are excellent but not at parity with the absolute frontier.
- Cost is higher up-front (hardware) and lower over time (no per-token bills).
- Update cadence is slower because model updates must clear the air-gap boundary.
- Evaluation discipline has to be tighter, because you cannot lean on the vendor’s evaluations.
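What tighter evaluation discipline can look like in practice is a golden-set regression gate that runs inside the boundary before any model update is promoted. The sketch below is a deliberately small illustration: `GoldenCase`, `run_golden_set`, and `passes_gate` are hypothetical names, the 0.95 threshold is an arbitrary example, and substring matching is a crude stand-in for real answer scoring or hallucination detection.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    must_contain: tuple[str, ...]            # facts a good answer has to mention
    must_not_contain: tuple[str, ...] = ()   # known wrong claims to reject


def run_golden_set(
    model: Callable[[str], str], cases: list[GoldenCase]
) -> list[GoldenCase]:
    """Run every golden case through the model and return the failing ones."""
    failures = []
    for case in cases:
        answer = model(case.prompt).lower()
        has_required = all(s.lower() in answer for s in case.must_contain)
        has_forbidden = any(s.lower() in answer for s in case.must_not_contain)
        if not has_required or has_forbidden:
            failures.append(case)
    return failures


def passes_gate(
    model: Callable[[str], str],
    cases: list[GoldenCase],
    min_pass_rate: float = 0.95,  # assumption: example threshold only
) -> bool:
    """True if the candidate model clears the golden set at the required rate."""
    if not cases:
        raise ValueError("golden set is empty; refusing to approve by default")
    failures = run_golden_set(model, cases)
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate >= min_pass_rate
```

Here `model` is any callable that maps a prompt to an answer, so the same gate can wrap whichever local serving layer is in use. An empty golden set raises an error rather than passing, on the reasoning that a gate with nothing to check should not approve an update.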
Why we run this
SPCio is co-developed with a manufacturing intelligence partner whose customer base requires air-gapped deployment as the default. We treat that as the design constraint, not the exception. The cloud-deployable variant of SPCio falls out of the air-gapped variant naturally; the reverse is much harder.
Closing
Air-gapped agentic AI is real in 2026. It is a deployment shape with its own constraints, its own cost profile, and its own discipline — and for a meaningful share of regulated industries, it is the only deployment shape that fits.


