Pull-quote: “Air-gapped AI is not ‘AI minus features’. It is a different deployment shape with different trade-offs and different costs.”

Why this matters

A meaningful share of regulated manufacturing facilities — automotive, aerospace, defense industrial base, certain pharma and medical-device sites — cannot accept internet egress on production networks. The reasons are a mix of customer security requirements, IP protection, and regulatory posture. Cloud-only generative AI does not fit these environments at all.

The result has been a market gap between cloud-based AI vendors, who dismiss air-gapped customers as “not the target,” and traditional QMS (quality management system) vendors, who don’t offer AI. The customer is left without a credible option.

What air-gapped AI actually requires

An air-gapped agentic AI stack is not “the cloud product, minus the cloud.” It is a different system shape:

  1. Local LLM serving — open-weights models (Llama 3.x, Qwen 2.5, Mistral, Phi-4, Gemma 3) served on customer hardware via Ollama, vLLM, or llama.cpp
  2. Local embeddings — open-source embedding models served on the same stack
  3. Local vector database — pgvector, Weaviate, or Qdrant on a private subnet
  4. Local model registry — MLflow Model Registry running on customer infrastructure
  5. Local evaluation harness — golden datasets, regression tests, and hallucination detection that ship inside the air-gapped boundary
  6. Local update pipeline — model and corpus updates delivered as signed bundles via approved physical or networked transfer
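Item 6 is the step most specific to an air gap, because nothing inside the boundary can reach out to check a download. What follows is a minimal sketch of how the receiving side might verify a bundle. It is not SPCio's actual pipeline. The bundle layout, the `manifest.json` and `manifest.sig` file names, and the helper functions are all hypothetical. HMAC-SHA256 with a pre-shared key also stands in for the asymmetric signature (for example Ed25519) that a production pipeline would more likely use.

```python
import hashlib
import hmac
import json
from pathlib import Path

# HYPOTHETICAL bundle layout: payload files, plus manifest.json mapping each
# relative path to its SHA-256 hex digest, plus manifest.sig holding an
# HMAC-SHA256 tag over the manifest bytes. HMAC with a pre-shared key is a
# stand-in for an asymmetric signature such as Ed25519.


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large model weights never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def write_bundle(bundle_dir: Path, files: dict[str, bytes], key: bytes) -> None:
    """Sender side (outside the air gap): write payloads, the manifest, and its tag."""
    bundle_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for rel_path, data in files.items():
        target = bundle_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        manifest[rel_path] = hashlib.sha256(data).hexdigest()
    manifest_bytes = json.dumps(manifest, sort_keys=True).encode()
    (bundle_dir / "manifest.json").write_bytes(manifest_bytes)
    tag = hmac.new(key, manifest_bytes, hashlib.sha256).hexdigest()
    (bundle_dir / "manifest.sig").write_text(tag)


def verify_bundle(bundle_dir: Path, key: bytes) -> bool:
    """Receiver side: accept only if the tag is valid and every file matches its digest."""
    manifest_bytes = (bundle_dir / "manifest.json").read_bytes()
    expected = hmac.new(key, manifest_bytes, hashlib.sha256).hexdigest()
    received = (bundle_dir / "manifest.sig").read_text().strip()
    # Constant-time comparison so timing does not reveal how much of the tag matched.
    if not hmac.compare_digest(expected.encode(), received.encode()):
        return False
    root = bundle_dir.resolve()
    for rel_path, want in json.loads(manifest_bytes).items():
        target = (bundle_dir / rel_path).resolve()
        # Reject entries that point outside the bundle (e.g. "../../etc/passwd").
        if not target.is_relative_to(root) or not target.is_file():
            return False
        if sha256_file(target) != want:
            return False
    return True
```

The sketch checks every file against the manifest, not just the manifest itself. That is what catches a weights file that was corrupted or swapped during physical transfer. The path check rejects manifest entries that try to escape the bundle directory.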

What it costs (typical configuration)

  • Hardware — one to four GPU workstations or a multi-GPU server, sized to facility load. Typical GPUs range from the NVIDIA RTX 4090, RTX 5090, and RTX A6000 up to the H100 as scale grows.
  • Model storage — six local models (general, coding, reasoning, two domain-tuned LLMs, and one embedding model) totaling roughly 58 GB in our reference configuration.
  • RAG corpus storage — depends on scope; 765,000 chunks fit comfortably in tens of GB on local disk with pgvector.
  • Operations — a containerized stack (Docker / Compose) with monitoring (Grafana, Prometheus) and a local control plane.
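The storage figures above come down to simple arithmetic, and the sketch below shows one way to estimate them. All of its inputs are illustrative assumptions, not our reference configuration:

- the quantization level
- the embedding dimension
- the average chunk size
- the 2x overhead factor for indexes and row metadata

```python
# All inputs used with these helpers are ILLUSTRATIVE ASSUMPTIONS,
# not SPCio's reference configuration.

BYTES_PER_GB = 1e9


def model_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of quantized weights: parameters x bits / 8 (ignores file metadata)."""
    return params_billion * 1e9 * bits_per_weight / 8 / BYTES_PER_GB


def corpus_storage_gb(n_chunks: int, embedding_dim: int, avg_text_bytes: int,
                      overhead_factor: float = 2.0) -> float:
    """Approximate pgvector footprint: float32 vectors plus chunk text, scaled by an
    assumed overhead factor covering indexes (e.g. HNSW), row headers, and metadata."""
    vector_bytes = n_chunks * embedding_dim * 4  # pgvector stores 4 bytes per dimension
    text_bytes = n_chunks * avg_text_bytes
    return (vector_bytes + text_bytes) * overhead_factor / BYTES_PER_GB


if __name__ == "__main__":
    print(f"8B model at ~4.5 bits/weight: {model_weights_gb(8, 4.5):.1f} GB")
    print(f"765,000 chunks, 1024-dim, ~2 KB text: {corpus_storage_gb(765_000, 1024, 2_000):.1f} GB")
```

Under those assumptions, 765,000 chunks with 1024-dimensional embeddings and about 2 KB of text each come to roughly 9.3 GB. That sits well inside the tens-of-GB budget. An 8B-parameter model at about 4.5 bits per weight is roughly 4.5 GB.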

Trade-offs vs. cloud

  • Latency is generally similar to cloud for the smaller models and better than cloud for chained calls, since each step in the chain avoids a network round-trip.
  • Capability is lower than the strongest closed-source models. Open-weights models in 2026 are excellent but not at parity with the absolute frontier.
  • Cost is higher up-front (hardware) and lower over time (no per-token bills).
  • Update cadence is slower because model updates must clear the air-gap boundary.
  • Evaluation discipline has to be tighter, because you cannot lean on the vendor’s evaluations.
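In practice, tighter evaluation discipline usually means a promotion gate. A model update is promoted only if it does at least as well as the current model on the local golden dataset. The sketch below is a deliberately minimal, hypothetical version of that gate. The `GoldenCase` format, the substring scoring, and the `max_regression` threshold are all assumptions. A real harness would use task-specific scoring and dedicated hallucination checks.

```python
from dataclasses import dataclass
from typing import Callable

# HYPOTHETICAL golden-set format and scoring rule, for illustration only.


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    must_contain: tuple[str, ...]  # facts a correct answer has to mention


def pass_rate(generate: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases whose answer mentions every required fact (case-insensitive)."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = 0
    for case in cases:
        answer = generate(case.prompt).lower()
        if all(fact.lower() in answer for fact in case.must_contain):
            passed += 1
    return passed / len(cases)


def promote(candidate: Callable[[str], str], baseline_rate: float,
            cases: list[GoldenCase], max_regression: float = 0.0) -> bool:
    """Allow an update only if its pass rate is no more than max_regression below the baseline."""
    return pass_rate(candidate, cases) >= baseline_rate - max_regression
```

The gate takes `generate` as a plain callable, which keeps it independent of the serving layer. The same check can therefore wrap an Ollama, vLLM, or llama.cpp endpoint inside the boundary.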

Why we run this

SPCio is co-developed with a manufacturing intelligence partner whose customer base requires air-gapped deployment as the default. We treat that as the design constraint, not the exception. The cloud-deployable variant of SPCio falls out of the air-gapped variant naturally; the reverse is much harder.

Closing

Air-gapped agentic AI is real in 2026. It is a deployment shape with its own constraints, its own cost profile, and its own discipline — and for a meaningful share of regulated industries, it is the only deployment shape that fits.