Sovereign Stack — How It All Works

Hardware Architecture

The Full Sovereign Build

A sealed, locked, branded rack housing the complete inference and agent stack. Designed for office environments — quiet, compact, enterprise-grade.

Sovereign ATX Enterprise Rack — Mac Studio cluster, Mac Minis, Synology NAS

Sovereign Enterprise Rack · Mac Studio cluster · agent workstations · 27U locked enclosure

⚡ Sovereign Enterprise Rack · 27U

Patch Panel + Cable Mgmt

1U · organized entry/exit

10GbE Managed Switch

1U · Ubiquiti UniFi Pro 24

Inference Cluster (5U)

Mac Studio M5 Ultra · Node 1

512GB unified memory

Mac Studio M5 Ultra · Node 2

512GB unified memory

Mac Studio M5 Ultra · Node 3

512GB unified memory

Mac Studio M5 Ultra · Node 4

512GB unified memory

Agent Workstations (5U)

Mac Mini M4 Pro · ×3

1U tray · 24GB each

Mac Mini M4 Pro · ×3

1U tray · 24GB each

Mac Mini M4 Pro · ×3

1U tray · 24GB each

Mac Mini M4 Pro · ×3

1U tray · 24GB each

Mac Mini M4 Pro · ×3

1U tray · 24GB each

Mac Mini · Control Node

Cluster mgmt + monitoring

Synology NAS · 64TB

2U · models + data + logs

APC Smart-UPS 2200VA

2U · 20-min runtime

Fan Banks × 5

1U each · forced air cooling

⚡ Layer 1 — Inference Cluster

The brain. Four Mac Studio M5 Ultra nodes connected via Thunderbolt 5 full mesh, pooling 2TB of unified memory into a single inference fabric. Runs trillion-parameter models that no single-node machine can handle. All inference requests from every access point route here.

4× Mac Studio M5 Ultra 2TB pooled memory TB5 full mesh · 120 Gbps ~3µs inter-node latency 1T+ parameter models 6–8 parallel inference slots ~300W total draw

💻 Layer 2 — Agent Workstations (15× Mac Mini)

15 Mac Mini M4 Pro nodes — one per team member, department, or use case. Each runs its own OpenClaw instance with dedicated agents, memory, and integrations configured for that role. Heavy inference gets routed to the cluster automatically; lightweight tasks run locally. No contention, no waiting.

15× Mac Mini M4 Pro 24GB unified memory each OpenClaw per node Custom agents per role Routes to cluster for heavy tasks

🌐 Layer 3 — Network Infrastructure

Ubiquiti 10GbE managed switch for low-latency inter-node and LAN traffic. Cluster lives on an isolated VLAN — air-gapped from external networks. NAS provides persistent storage for model weights, client data, agent memory, and audit logs. UPS ensures clean shutdown during power events, never corrupting model state.

10GbE Ubiquiti switch Isolated VLAN 64TB Synology NAS APC 2200VA UPS

🔒 Layer 4 — Physical Security

Sealed, lockable rack enclosure. Two keyholders only — designated by the client. Active thermal management with 5-layer fan banks keeps hardware cool at continuous load. Sovereign does not hold a key. No physical access, no remote access after setup is complete. The hardware runs fully autonomously.

Sealed + keyed enclosure 2 keyholders max 5-layer active cooling No Sovereign key access Fully autonomous operation

Access Points

Use it from anywhere you already work

No new app to install. No new workflow to learn. Sovereign meets your team wherever they are.

🌐

Web UI

Open WebUI on your local network. Any browser, any device on campus. Looks and feels like ChatGPT — zero learning curve.

sovereign.local / your IP

🔑

OpenAI-Compatible API

Drop-in replacement for any OpenAI integration. Same API format — swap the endpoint URL and your existing tools work instantly.

http://cluster/v1 + API key

💬

Discord

Bot in your Discord server. Students and staff message the AI directly in channels they already use. Custom commands per channel.

@SovereignBot

🤝

Slack

Socket Mode bot in your Slack workspace. Works in any channel or DM. Ideal for admin and faculty who live in Slack.

/ai in any channel

📱

Mobile App (PWA)

Open WebUI installs as a Progressive Web App from your browser. Feels like a native iOS/Android app, runs off the cluster.

Install from browser · no App Store

💻

VS Code / Cursor

API key in any Copilot-compatible extension. Developers and students get AI code completion and chat in their IDE — running locally.

OpenAI-compat endpoint

📝

Notion / Obsidian

API key in AI writing plugins. Research, summarization, and drafting happen inside your note-taking app, powered by the cluster.

API key in plugin settings

📞

Voice (Phone)

Call a dedicated number, speak naturally, get a response. Accessible without a device — useful for hands-free workflows.

Twilio → cluster

🎓

LMS Integration (Canvas / Google Classroom)

API key in Canvas external tools or Google Workspace add-ons. Students access AI without leaving their coursework environment.

LTI / API integration

Software Stack

What runs under the hood

All open-source. All auditable. Nothing phoning home.

OpenClaw

Agent Orchestration

Runs on every Mac Mini. Manages agents, memory, tool use, integrations, and all channel routing. The intelligence layer that makes models do real work.

llama-server

Primary Inference

llama.cpp-based inference server on the cluster. Handles all heavy model runs — the 397B and larger models route here via OpenAI-compatible API.

Ollama

Model Management

Manages secondary models on-demand. Handles smaller, faster models for quick tasks. Pulls and updates models automatically.

Open WebUI

Web Interface

ChatGPT-like web interface for anyone on the network. No install required. Supports multiple users, conversation history, and file uploads.

exo

Distributed Inference

Pools all 4 Mac Studios into a single inference fabric over Thunderbolt. Makes trillion-parameter models possible on consumer hardware.

Tailscale

Secure Tunnel (Setup Only)

Used once during initial configuration. Tunnel closes after setup. No ongoing remote access. Hardware operates fully autonomous after day one.

Everything. In Your Building.

The Full Sovereign Build

Request to Response — Zero Cloud

Use it from anywhere you already work

What runs under the hood

One conversation.
One afternoon to deploy.

Everything. In Your Building.

The Full Sovereign Build

Request to Response — Zero Cloud

Use it from anywhere you already work

What runs under the hood

One conversation.One afternoon to deploy.

One conversation.
One afternoon to deploy.