Chapter 33: The Sovereign Codebase
The journey from a working prototype to a hardened production system is not a matter of adding features — it is a matter of eliminating failure modes, enforcing physical constraints, and codifying operational discipline into the infrastructure itself. This final chapter closes the architectural loop: every design decision made across the preceding thirty-two chapters now converges into a single, auditable, disaster-proof deployment that owns its own logic, its own data, and its own fate.
33.1 The Architecture of Ownership
To understand what we have built, we must name its parts with precision. The system is a deliberate, three-layer separation of concerns, and the boundaries between those layers are not suggestions — they are load-bearing walls.
The I/O Layer — Go. The compiled orchestrator binary is the system's nervous system. It handles all concurrency primitives: HTTP ingestion, WebSocket fan-out to the dashboard, Unix socket communication with the Prolog process, and the HTTP/2 inference calls to the local LLM daemon. Go was chosen because its goroutine scheduler and channel primitives make it trivially correct to write concurrent I/O code that is also memory-safe and statically compiled to a single binary with no runtime dependencies. It makes no logical decisions. It routes, buffers, serializes, and dispatches. Nothing more.
The Logic Layer — SWI-Prolog. The Prolog engine is the system's cerebellum: it receives structured Prolog terms from Go over a Unix domain socket, evaluates them against a deterministic rule base, and returns ground answers. The critical property here is mathematical determinism. Given the same knowledge base and the same query, SWI-Prolog will return the same answer on every machine, on every operating system, in every decade, so long as the ISO Prolog standard is honored. There are no probabilistic weights, no gradient updates, no session state bleeding between calls. This is the foundation on which operational trust is built.
The Semantic Layer — The Local 8B LLM. The language model is the system's interpreter of human intent. It does not make decisions; it translates. A natural-language query from an operator — "which nodes are at risk of thermal throttling?" — is converted into a structured Prolog query term, which is handed to the deterministic engine. The LLM's probabilistic nature is confined to the translation step, where an approximate answer is acceptable. It is never permitted to influence the logical reasoning itself. Architecturally, it is a smart serializer, not an oracle.
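One way to keep the model confined to translation is to reject any generated term that is not a single call to a known predicate before it ever reaches the engine. The following is a minimal sketch in Bash, chosen to match the scripts later in this chapter; in the real system such a check would presumably live in the Go orchestrator. The function name is_allowed_query and the allowlist (borrowed from the state predicates listed in section 33.2) are illustrative assumptions, not the orchestrator's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical guard: accept an LLM-produced query only if it is a single
# call to an allowlisted predicate. All names here are assumptions.
ALLOWED_PREDICATES="node_up node_temp node_load vm_host vm_state active_alert"

is_allowed_query() {
    local term="$1" functor name
    # Shape check: lowercase functor, flat argument list, optional final dot.
    # Nested parentheses and trailing goals are rejected outright.
    local re='^([a-z][a-zA-Z0-9_]*)\(([^()]*)\)\.?$'
    [[ "$term" =~ $re ]] || return 1
    functor="${BASH_REMATCH[1]}"
    for name in $ALLOWED_PREDICATES; do
        [[ "$name" == "$functor" ]] && return 0
    done
    return 1
}

for candidate in "node_temp(Node, Temp)." "node_up(N, S), halt."; do
    if is_allowed_query "$candidate"; then
        echo "accepted: $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```

The check is deliberately stricter than Prolog syntax requires: a legitimate query with a compound argument would also be rejected, which is an acceptable trade for a translation step that is allowed to be approximate but never allowed to be dangerous.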
The Anti-Hyperconvergence Rule
This rule is non-negotiable, and violating it will eventually produce a catastrophic, unrecoverable failure at the worst possible moment.
The Logic Node must run on dedicated physical hardware — a bare-metal machine that is not a member of the cluster it orchestrates.
The recommended hardware profile for a homelab or small enterprise edge deployment is an Intel N100 mini-PC (fanless, 6–10W TDP, 16GB LPDDR5, 512GB NVMe) or a Raspberry Pi 5 (8GB). Both are inexpensive, reliable, and purpose-built for always-on embedded operation. The cost is irrelevant compared to the operational guarantee they provide.
The reason is a deterministic deadlock that has no software solution. Consider the failure mode: the Logic Node is a VM running inside a Proxmox host that it manages. A complete power loss hits the rack. Power is restored. The Proxmox host begins its boot sequence — but before it can bring any VMs online, it queries the Logic Node to determine boot order, resource allocation policy, and cluster quorum state. The Logic Node VM cannot start until Proxmox has fully initialized. Proxmox will not fully initialize without consulting the Logic Node. The system is locked in a perfectly symmetric deadlock with no internal mechanism to break it. An operator must physically intervene, connect a console, and manually override the boot sequence — exactly the scenario we designed the entire system to prevent.
On dedicated physical hardware, the Logic Node boots independently, becomes reachable on the management network, and by the time the Proxmox cluster nodes begin negotiating quorum, the orchestration brain is already listening. The boot sequence becomes a directed acyclic graph with a clear root. Physical separation is the only architectural solution.
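The difference between the two layouts can be checked mechanically. The sketch below feeds each dependency graph to tsort from GNU coreutils, which prints a valid ordering for an acyclic graph and exits non-zero when it finds a cycle. The node names and the boot_order helper are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Each input line "A B" means "A must be up before B".
# tsort (GNU coreutils) exits non-zero if the graph contains a cycle.
boot_order() {
    tsort 2>/dev/null
}

# Dedicated hardware: the Logic Node is the root of the graph.
physical="logic-node proxmox-host
proxmox-host guest-vms"

# Hyperconverged: the Logic Node VM and its host wait on each other.
hyperconverged="proxmox-host logic-node-vm
logic-node-vm proxmox-host"

if order=$(boot_order <<< "$physical"); then
    echo "physical layout boots in order:" $order
fi
if ! boot_order <<< "$hyperconverged" > /dev/null; then
    echo "hyperconverged layout: dependency cycle, no valid boot order"
fi
```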
33.2 The Sovereign Directory Structure
Every file in the system has a single, authoritative home. The directory structure below is the canonical layout for /opt/logic-node/. It is not a suggestion — deploy scripts, systemd unit files, and backup procedures all reference these paths by convention. Deviation requires updating every dependent artifact simultaneously.
/opt/logic-node/
│
├── bin/
│ └── orchestrator # Compiled Go binary (CGO-enabled, statically linked libc)
│ # Rebuilt by deploy.sh; never edited in-place
│
├── go/ # Go source tree (version-controlled)
│ ├── cmd/
│ │ └── orchestrator/
│ │ └── main.go # Entrypoint: flag parsing, daemon init, signal handling
│ ├── internal/
│ │ ├── prolog/
│ │ │ └── bridge.go # Unix socket I/O to swipl process; term serialization
│ │ ├── llm/
│ │ │ └── client.go # llama.cpp HTTP inference client; prompt templating
│ │ ├── api/
│ │ │ └── handlers.go # HTTP/WebSocket handlers; JSON marshaling
│ │ └── metrics/
│ │ └── push.go # VictoriaMetrics remote_write client
│ └── go.mod
│
├── kb/ # Prolog Knowledge Base (the sovereign logic corpus)
│ ├── main.pl # Root loader: consults core/ then state/ in order
│ ├── core/ # Immutable logical axioms (version-controlled, never hot-edited)
│ │ ├── cluster.pl # Cluster topology rules: node membership, quorum predicates
│ │ ├── thermal.pl # Thermal policy logic: throttle thresholds, alert conditions
│ │ ├── network.pl # VLAN assignment rules, firewall policy predicates
│ │ └── scheduler.pl # Workload placement logic: constraint satisfaction rules
│ └── state/ # Mutable runtime state (written by orchestrator, backed up)
│ ├── node_status.pl # Ground facts: node_up/2, node_temp/2, node_load/2
│ ├── vm_assignments.pl # Ground facts: vm_host/2, vm_state/2
│ └── alerts.pl # Asserted alert facts: active_alert/3
│
├── www/ # Frontend dashboard assets (served by orchestrator's HTTP server)
│ ├── index.html # Single-page shell; loads wasm_exec.js and main.wasm
│ ├── wasm_exec.js # Go-provided WASM JavaScript runtime bridge
│ ├── main.wasm # Compiled Go/WASM frontend bundle
│ └── static/
│ ├── style.css # Minimal utility CSS; no external CDN dependencies
│ └── icons/ # SVG node-state icons (offline-first: zero external fetches)
│
├── llm/
│ └── models/ # GGUF quantized model weights
│ ├── mistral-8b-q4_k_m.gguf # Primary inference model (~4.5GB, Q4_K_M quantization)
│ └── sha256sums # Integrity manifest; verified by deploy.sh on startup
│
├── metrics/ # VictoriaMetrics TSDB data directory
│ │ # Mounted as a separate ZFS dataset for independent snapshot policy
│ └── data/ # Raw TSDB blocks; managed exclusively by vminstance process
│
├── logs/ # Structured JSON log output (journald also captures stdout)
│ └── .gitkeep
│
└── deploy.sh # The single source of truth for deployment procedure
The separation between kb/core/ and kb/state/ is the most important structural decision in the entire codebase. Core rules are logic — they change only when policy changes, under version control, with a full deploy cycle including syntax validation. State facts are data — they change continuously at runtime as the orchestrator asserts and retracts ground facts in response to cluster telemetry. The systemd unit grants ReadWritePaths exclusively to kb/state/. The core/ directory is mounted read-only at the OS level during normal operation. An operator cannot accidentally corrupt a logical axiom by debugging a live incident.
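The core/state split is easy to audit from the file modes alone. The sketch below lists every entry under a directory that carries any write bit; an empty result for core/ means no axiom is writable by mode. It runs against a throwaway tree rather than the live /opt/logic-node, relies on GNU find's -perm /222 syntax, and the helper name writable_entries is an assumption. It checks mode bits only and says nothing about the mount-level protection applied by systemd.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper: print entries under $1 that have any write bit.
writable_entries() {
    find "$1" -perm /222 -print
}

# Build a small stand-in for kb/core and kb/state.
demo=$(mktemp -d)
mkdir -p "$demo/core" "$demo/state"
touch "$demo/core/thermal.pl" "$demo/state/node_status.pl"
chmod -R a-w "$demo/core"          # core: no write bit for anyone
chmod -R u+w,g+w "$demo/state"     # state: writable by owner and group

core_writable=$(writable_entries "$demo/core")
state_writable=$(writable_entries "$demo/state")
[[ -z "$core_writable" ]] && echo "core/: no writable entries"
[[ -n "$state_writable" ]] && echo "state/: writable as intended"

# Restore write permission so the temporary tree can be removed.
chmod -R u+w "$demo" && rm -rf "$demo"
```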
33.3 Continuous Integration and Hardening
deploy.sh
The deployment script is the single authorized procedure for pushing a new build to the Logic Node. It is idempotent, auditable, and fails loudly at the first sign of a problem. The set -euo pipefail directive at the top of the script, immediately after the header comments, is not optional: it is the difference between a failed deployment that stops and a failed deployment that silently continues and corrupts a live system.
#!/usr/bin/env bash
# deploy.sh — Sovereign Logic Node Deployment Script
# Run as: sudo ./deploy.sh
# Requires: swipl, go >= 1.22, systemctl
set -euo pipefail
DEPLOY_ROOT="/opt/logic-node"
GO_SRC="${DEPLOY_ROOT}/go"
BIN_OUT="${DEPLOY_ROOT}/bin/orchestrator"
KB_ROOT="${DEPLOY_ROOT}/kb"
LLM_MODELS="${DEPLOY_ROOT}/llm/models"
echo "[deploy] Starting Logic Node deployment — $(date -u +%Y-%m-%dT%H:%M:%SZ)"
# ── Step 1: Verify GGUF model integrity ─────────────────────────────────────
echo "[deploy] Verifying LLM model checksums..."
pushd "${LLM_MODELS}" > /dev/null
sha256sum --check sha256sums --status
popd > /dev/null
echo "[deploy] Model integrity: OK"
# ── Step 2: Prolog knowledge base syntax validation ─────────────────────────
# This is the most critical gate. A syntax error in kb/main.pl that reaches
# production will cause the orchestrator to fail all logic queries silently,
# degrading to LLM-only operation without alerting the operator.
echo "[deploy] Validating Prolog knowledge base syntax..."
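# load_files/2 reports a syntax error but still succeeds, so the goal alone
# would exit 0. --on-error=status (recent SWI-Prolog releases) makes halt/0
# exit non-zero if any error was printed during the load.
swipl --on-error=status \
-g "load_files(['${KB_ROOT}/main.pl'], [silent(true)]), halt." \
-t "halt(1)"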
echo "[deploy] Prolog syntax validation: PASSED"
# ── Step 3: Go build with CGO enabled ───────────────────────────────────────
# CGO is required for the swipl foreign-function interface bindings.
# The build is fully static against musl libc to eliminate glibc version
# dependencies between build host and Logic Node hardware.
echo "[deploy] Building Go orchestrator binary..."
pushd "${GO_SRC}" > /dev/null
CGO_ENABLED=1 \
CC=musl-gcc \
CGO_CFLAGS="-I/usr/lib/swipl/include" \
CGO_LDFLAGS="-L/usr/lib/swipl/lib/x86_64-linux -lswipl" \
go build \
-ldflags="-linkmode external -extldflags '-static' \
-X main.BuildVersion=$(git describe --tags --always --dirty) \
-X main.BuildTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
-o "${BIN_OUT}" \
./cmd/orchestrator/
popd > /dev/null
echo "[deploy] Go build: OK — $(du -sh ${BIN_OUT} | cut -f1) binary written to ${BIN_OUT}"
# ── Step 4: Ownership and permission hardening ───────────────────────────────
echo "[deploy] Applying filesystem permissions..."
chown -R root:logic-node "${DEPLOY_ROOT}"
chmod 750 "${DEPLOY_ROOT}/bin/orchestrator"
chmod -R 550 "${KB_ROOT}/core/" # core rules: no write bit for owner or group; logic-node reads only (root can still override)
chmod -R 770 "${KB_ROOT}/state/" # state facts: orchestrator writes, both read
chmod -R 750 "${DEPLOY_ROOT}/www/"
# ── Step 5: Service restart ──────────────────────────────────────────────────
echo "[deploy] Restarting logic-orchestrator service..."
systemctl restart logic-orchestrator
# Wait for service to reach active state with a 15-second timeout
timeout 15 bash -c \
'until systemctl is-active --quiet logic-orchestrator; do sleep 1; done'
echo "[deploy] Service status: $(systemctl is-active logic-orchestrator)"
echo "[deploy] ✓ Deployment complete — $(date -u +%Y-%m-%dT%H:%M:%SZ)"
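Step 1 of deploy.sh assumes that a sha256sums manifest already sits next to the model files. A minimal sketch of how such a manifest could be produced when a model is first installed, and how the same check behaves after a file changes, is shown below. The helper names write_manifest and verify_manifest are illustrative, and a small stand-in file is used in place of real model weights.

```shell
#!/usr/bin/env bash
# Sketch: generate and verify a sha256sums manifest for a model directory.
write_manifest() {
    ( cd "$1" && sha256sum -- *.gguf > sha256sums )
}
verify_manifest() {
    ( cd "$1" && sha256sum --check --status sha256sums )
}

# Demonstration with a stand-in file instead of a multi-gigabyte GGUF.
models=$(mktemp -d)
printf 'stand-in for quantized weights\n' > "$models/example.gguf"
write_manifest "$models"
verify_manifest "$models" && echo "manifest verifies"

# Any change made after the manifest was written is detected.
printf 'tampered\n' >> "$models/example.gguf"
verify_manifest "$models" || echo "checksum mismatch detected"
```

Because deploy.sh runs under set -e, the non-zero exit from a failed check is enough to stop the deployment before the build step starts.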
logic-orchestrator.service
The systemd unit file implements the principle of least privilege at the operating system level. The directives below are not belt-and-suspenders security theater — each one closes a specific, documented attack surface. A compromised orchestrator process operating under these constraints cannot read user home directories, cannot write to arbitrary filesystem paths, cannot acquire new privileges through setuid binaries, and cannot persist malicious state outside the explicitly permitted write paths.
# /etc/systemd/system/logic-orchestrator.service
[Unit]
Description=Sovereign Logic Node Orchestrator
Documentation=file:///opt/logic-node/README.md
After=network-online.target
Wants=network-online.target
# Explicit ordering: orchestrator must be fully active before any
# cluster-management timers or dependent services are started.
Before=cluster-watchdog.timer
# Start rate limiting is a [Unit] setting: give up after 3 failed starts
# within 60 seconds instead of restart-looping indefinitely.
StartLimitIntervalSec=60
StartLimitBurst=3
[Service]
Type=notify
User=logic-node
Group=logic-node
WorkingDirectory=/opt/logic-node
ExecStart=/opt/logic-node/bin/orchestrator \
--kb-path=/opt/logic-node/kb/main.pl \
--llm-socket=/run/llama/inference.sock \
--listen=0.0.0.0:8443 \
--tls-cert=/etc/logic-node/tls/server.crt \
--tls-key=/etc/logic-node/tls/server.key \
--metrics-addr=127.0.0.1:9090
# Graceful shutdown: allow 30s for in-flight queries to complete
# before SIGKILL is sent.
ExecStop=/bin/kill -s SIGTERM $MAINPID
TimeoutStopSec=30
# Restart policy: restart on any non-clean exit, but not on
# clean exit (code 0) or SIGTERM (intentional stop).
Restart=on-failure
RestartSec=5s
# ── Hardening Directives ─────────────────────────────────────────────────────
# Prevent the process or any child from gaining new privileges via
# setuid/setgid binaries or filesystem capabilities.
NoNewPrivileges=yes
# Mount the entire OS filesystem tree read-only. The orchestrator
# binary can read /opt/logic-node but cannot modify any OS paths.
ProtectSystem=strict
# Prevent access to /home, /root, and /run/user entirely.
ProtectHome=yes
# Give the process a private /tmp and /var/tmp namespace, invisible
# to all other processes. Eliminates /tmp-based symlink attacks.
PrivateTmp=yes
# The only persistent writable path granted to the service. The Prolog state
# facts directory must be writable so the orchestrator can assert
# and retract ground facts at runtime. Writes anywhere else on the
# read-only tree fail with EROFS.
ReadWritePaths=/opt/logic-node/kb/state/
# Additional hardening: prevent writes to kernel tunables and
# hardware management interfaces.
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
# Restrict to the minimum required syscall set for a Go network service.
# The @system-service set permits sockets, files, signals, and threading.
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM
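# Hide other users' processes in /proc (ProtectProc), expose only the
# per-process entries of /proc (ProcSubset), and run the service in a
# private user namespace (PrivateUsers).
ProtectProc=invisible
ProcSubset=pid
PrivateUsers=yes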
# Hard memory ceiling: prevents a runaway Prolog query from consuming
# all system RAM and triggering OOM-kill of unrelated processes.
MemoryMax=512M
MemorySwapMax=0
# Give the service a private /dev with only null, zero, random, urandom,
# tty, and pts. Eliminates access to raw block devices and framebuffers.
PrivateDevices=yes
[Install]
WantedBy=multi-user.target
After deploying this unit file, run systemd-analyze security logic-orchestrator to generate a numerical security exposure score. A correctly hardened unit of this type should achieve an exposure score below 4.0 (on systemd's 0–10 scale, where lower is more secure). Any score above 5.0 indicates a missing directive and should block the deployment.
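The score check can be turned into an automatic gate. The sketch below extracts the number from the summary line and compares it against a limit. It assumes the summary line has the form "Overall exposure level for UNIT: 3.9 OK", which may differ between systemd versions; the helper names and the sample line are illustrative, and the sample is not output captured from a real run.

```shell
#!/usr/bin/env bash
# Hypothetical deployment gate for the systemd-analyze exposure score.
exposure_score() {
    # Print the number that follows "Overall exposure level for UNIT:".
    sed -n 's/.*Overall exposure level for [^:]*: *\([0-9][0-9.]*\).*/\1/p'
}

score_within_limit() {
    local score="$1" limit="$2"
    # Refuse anything that is not a plain decimal number, so a parse
    # failure can never be mistaken for a passing score.
    [[ "$score" =~ ^[0-9]+(\.[0-9]+)?$ ]] || return 1
    # awk performs the floating-point comparison that Bash lacks.
    awk -v s="$score" -v l="$limit" 'BEGIN { exit !(s <= l) }'
}

sample="Overall exposure level for logic-orchestrator.service: 3.9 OK"
score=$(exposure_score <<< "$sample")
if score_within_limit "$score" 5.0; then
    echo "exposure ${score}: within limit"
else
    echo "exposure ${score}: blocking deployment" >&2
fi
```

In a deployment script the sample line would be replaced by piping the real command into the parser, for example systemd-analyze security logic-orchestrator | exposure_score.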
33.4 Disaster Recovery: Encrypted ZFS Replication
Hardware fails. The N100 mini-PC will eventually fail — a NAND cell will flip, a capacitor will bulge, a firmware update will corrupt the EFI partition. The question is not whether this will happen but whether the system can be restored to full logical operation in under fifteen minutes when it does. The answer is yes, provided the following procedure is in place before the failure occurs.
The DR Philosophy: Back Up Logic, Not Weight
The total footprint of the Logic Node's irreplaceable state is under 50MB. The Prolog knowledge base — both core rules and runtime state facts — is a directory of plain text files. The VictoriaMetrics TSDB, while valuable for trend analysis, can be rebuilt from live cluster telemetry within hours. The LLM model weights in /opt/logic-node/llm/models/ are approximately 4.5GB of quantized parameters that are publicly available and can be re-downloaded from Hugging Face or a local mirror in under twenty minutes on a reasonable connection.
We do not back up model weights. Including them in the replication stream would inflate the DR footprint by a factor of roughly 90 (about 4.5GB of weights against 50MB of knowledge base), require a larger and more expensive backup medium, extend the backup window from seconds to minutes, and provide zero additional protection for any irreplaceable data. The DR target is the knowledge base: the hand-crafted logical axioms and the runtime state facts that represent weeks or months of operational refinement.
Creating the Encrypted Backup Dataset
On the external USB backup pool (assumed to be imported as backup_pool), create the destination dataset with AES-256-GCM encryption. This command is run once during initial DR setup:
zfs create \
-o encryption=aes-256-gcm \
-o keyformat=passphrase \
-o keylocation=prompt \
backup_pool/logic_dr
When prompted, enter a passphrase of at least 32 characters. Store this passphrase in a hardware-backed secret manager (a YubiKey HMAC-SHA1 challenge-response slot is appropriate) or a physically secured printed document — never in a file on the Logic Node itself. The encrypted dataset is useless without the passphrase; there is no recovery mechanism.
Confirm encryption is active:
zfs get encryption,keyformat,keystatus backup_pool/logic_dr
The keystatus property must show available for a mounted, unlocked dataset.
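When this property is checked from a script, the value should be compared exactly, because the string "unavailable" also contains "available" and a plain substring match would accept both. The sketch below takes the value as an argument so the logic can be exercised without a pool; in practice it would be fed from zfs get -H -o value keystatus. The helper name key_is_loaded is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a dataset key is loaded from its keystatus value.
key_is_loaded() {
    [[ "$1" == "available" ]]
}

for status in available unavailable; do
    if key_is_loaded "$status"; then
        echo "${status}: key loaded"
    else
        echo "${status}: key NOT loaded"
    fi
done
```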
zfs_sync.sh — The Automated Replication Script
This script is triggered by a root cron job every four hours. It takes an atomic snapshot of the knowledge base ZFS dataset, sends the delta to the encrypted backup pool, and prunes obsolete snapshots to prevent unbounded storage growth. The entire operation, on a healthy system with a sub-50MB KB, completes in under three seconds.
#!/usr/bin/env bash
# zfs_sync.sh — Logic Node Knowledge Base Disaster Recovery Replication
# Cron: 0 */4 * * * root /opt/logic-node/zfs_sync.sh >> /var/log/zfs_sync.log 2>&1
set -euo pipefail
# ── Configuration ────────────────────────────────────────────────────────────
SOURCE_DATASET="rpool/logic-node/kb" # ZFS dataset containing /opt/logic-node/kb
DEST_DATASET="backup_pool/logic_dr" # Encrypted destination on USB pool
SNAP_PREFIX="dr"
SNAP_NAME="${SNAP_PREFIX}-$(date -u +%Y%m%dT%H%M%SZ)"
KEEP_SNAPS=12 # Retain 48 hours of 4-hour snapshots
echo "[zfs_sync] Starting DR replication — $(date -u)"
# ── Ensure backup pool is imported and key is loaded ─────────────────────────
if ! zpool list backup_pool > /dev/null 2>&1; then
echo "[zfs_sync] FATAL: backup_pool is not imported. Check USB device attachment."
exit 1
fi
# Compare the value exactly: "unavailable" also contains "available".
if [[ "$(zfs get -H -o value keystatus "${DEST_DATASET}")" != "available" ]]; then
echo "[zfs_sync] FATAL: Encryption key for ${DEST_DATASET} is not loaded."
echo "[zfs_sync] Run: zfs load-key ${DEST_DATASET}, then re-run this script manually."
exit 1
fi
# ── Take an atomic snapshot of the source KB dataset ─────────────────────────
echo "[zfs_sync] Taking snapshot: ${SOURCE_DATASET}@${SNAP_NAME}"
zfs snapshot -r "${SOURCE_DATASET}@${SNAP_NAME}"
# ── Determine the replication mode: full send or incremental ─────────────────
# Find the most recent snapshot successfully received on the destination.
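# "|| true" keeps set -e and pipefail from aborting on the first run, when
# grep finds no matching snapshot and exits with status 1.
LAST_DEST_SNAP=$(zfs list -H -t snapshot -o name -s creation "${DEST_DATASET}" \
| { grep "@${SNAP_PREFIX}-" || true; } \
| tail -n 1 \
| sed "s|${DEST_DATASET}@||")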
if [[ -z "${LAST_DEST_SNAP}" ]]; then
echo "[zfs_sync] No prior snapshot on destination. Performing full send..."
zfs send -R "${SOURCE_DATASET}@${SNAP_NAME}" \
| zfs receive -F "${DEST_DATASET}"
else
echo "[zfs_sync] Incremental send from @${LAST_DEST_SNAP} to @${SNAP_NAME}..."
# -R: replicate the full dataset tree recursively
# -I: send all intermediate snapshots between the two named snapshots
# This ensures no snapshot gap on the destination, preserving full history.
zfs send -R -I \
"${SOURCE_DATASET}@${LAST_DEST_SNAP}" \
"${SOURCE_DATASET}@${SNAP_NAME}" \
| zfs receive -F "${DEST_DATASET}"
fi
echo "[zfs_sync] Replication complete."
# ── Prune old snapshots on source (keep KEEP_SNAPS most recent) ──────────────
echo "[zfs_sync] Pruning old source snapshots (keeping ${KEEP_SNAPS})..."
zfs list -H -t snapshot -o name -s creation "${SOURCE_DATASET}" \
| grep "@${SNAP_PREFIX}-" \
| head -n -${KEEP_SNAPS} \
| xargs -r -I{} zfs destroy {}
# ── Prune old snapshots on destination (mirror source policy) ─────────────────
echo "[zfs_sync] Pruning old destination snapshots (keeping ${KEEP_SNAPS})..."
zfs list -H -t snapshot -o name -s creation "${DEST_DATASET}" \
| grep "@${SNAP_PREFIX}-" \
| head -n -${KEEP_SNAPS} \
| xargs -r -I{} zfs destroy {}
echo "[zfs_sync] ✓ DR sync complete — $(date -u)"
echo "[zfs_sync] DR footprint: $(zfs list -H -o used ${DEST_DATASET})"
To validate the DR procedure, mount a snapshot on a test machine and verify that swipl -g "load_files(['kb/main.pl'], [silent(true)]), halt." exits cleanly. This test should be performed monthly and documented in an operations runbook.
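The retention rule used by the pruning steps can also be exercised without touching a pool. The sketch below applies the same head -n -N idiom (a GNU coreutils extension) to a list of snapshot names, oldest first, and prints the ones that fall outside the newest N. The helper name prunable_snapshots is an assumption and the snapshot names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: given snapshot names on stdin, oldest first, print those that
# fall outside the newest KEEP entries and are therefore safe to destroy.
prunable_snapshots() {
    local keep="$1"
    head -n "-${keep}"
}

# Five made-up snapshot names, oldest first, with a retention of 3.
names="rpool/logic-node/kb@dr-20260101T000000Z
rpool/logic-node/kb@dr-20260101T040000Z
rpool/logic-node/kb@dr-20260101T080000Z
rpool/logic-node/kb@dr-20260101T120000Z
rpool/logic-node/kb@dr-20260101T160000Z"

to_destroy=$(prunable_snapshots 3 <<< "$names")
echo "$to_destroy"
```

When fewer snapshots exist than the retention count, the function prints nothing, so a young pool never loses history to the pruning step.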
33.5 Scaling to the Edge and Final Thoughts
Sovereign Security: The 1,000-Node Horizon
Every architectural principle established in this book was designed with a single homelab node in mind — and is simultaneously valid at enterprise edge scale. The mechanism is SWI-Prolog's Pengine architecture, introduced in Chapter 23.
A Pengine (Prolog Engine) is a sandboxed, remotely invocable Prolog instance that exposes its query interface over HTTP. In a 1,000-node edge deployment — factory floors, retail point-of-sale clusters, distributed telecommunications equipment rooms — each physical location runs a local Logic Node with its own Pengine server. A central coordination layer, implemented in Go using the same bridge code developed in this book, fans out queries to remote Pengines simultaneously over HTTP/2, collects ground-term responses, and aggregates them into a global logical view.
The fundamental property that makes this work without architectural renegotiation is referential transparency. A Prolog rule about thermal throttling thresholds that is correct on one Logic Node is correct on all 1,000 nodes, because the rule makes no reference to the machine it runs on. Distributing the logic is distributing a mathematical theorem — not a stateful microservice with session affinity requirements, not a model that needs synchronized weight checkpoints, not a rule engine that requires a shared database for consistency. Each node is a self-contained logical sovereign. The central coordinator aggregates proofs; it does not produce them. The homelab architecture is the enterprise architecture. The scale parameter changes. The model does not.
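The fan-out and aggregate pattern itself is small. The sketch below shows its shape in Bash, with a stub standing in for the remote call: query_site is a hypothetical placeholder for a Pengine request, and the site names and temperature facts are invented for illustration. A real coordinator would be the Go layer described above, issuing HTTP requests instead of calling a local function.

```shell
#!/usr/bin/env bash
# Fan-out/aggregate sketch with a stubbed remote query.
query_site() {
    case "$1" in
        site-a) echo "node_temp(a1, 71)" ;;
        site-b) echo "node_temp(b1, 88)" ;;
        site-c) echo "node_temp(c1, 64)" ;;
    esac
}

fan_out() {
    local dir site
    dir=$(mktemp -d)
    for site in "$@"; do
        query_site "$site" > "${dir}/${site}" &   # one background job per site
    done
    wait                                          # block until every job finishes
    cat "${dir}"/*                                # aggregate the ground terms
    rm -rf "${dir}"
}

answers=$(fan_out site-a site-b site-c)
echo "$answers"
```

The coordinator only collects the terms each site returns; every answer is still produced locally by that site's own rule base.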
Final Thoughts
There is a seductive convenience in delegating intelligence to a distant API. A cloud provider's managed reasoning service will accept a JSON payload, return a structured response, and silently absorb all the operational complexity of keeping that service available — until it doesn't. Until the API changes its pricing model at 72 hours' notice. Until a region-wide outage takes down a system that controlled physical equipment that has no manual fallback. Until a compliance audit reveals that proprietary inference endpoints processed sensitive operational data that was contractually required to never leave the building.
The system built across these thirty-three chapters refuses that bargain. Its logic runs on hardware you own, in a language whose semantics are defined by an ISO standard (ISO/IEC 13211-1, published in 1995) whose core has remained stable for three decades and will not change because a product manager decided to pivot. Its knowledge base is a directory of plain text files that can be read, audited, version-controlled, and reasoned about by any engineer with a text editor and a logic background. Its disaster recovery state fits in 50MB and can be replicated to a USB drive that costs less than a monthly API subscription. Its security model is enforced by the Linux kernel, not by a vendor's access control policy that can be altered unilaterally.
This is not nostalgia for a simpler era. It is a precise engineering response to the failure modes of the current one. The infrastructure landscape of 2026 has produced extraordinary capability and extraordinary fragility in equal measure, and the operators who understand this distinction are the ones who will be standing when the cascading failures arrive — as they always do. Mathematical determinism does not drift. Prolog's resolution algorithm does not have an outage. A ZFS snapshot does not require a support ticket to restore.
Technological sovereignty is not a political stance. It is an operational posture. It means that the question "what will this system do?" has a single, verifiable, auditable answer that does not depend on the continued good behavior of any third party. It means that you can read the rules, prove the inferences, inspect the state, and restore the system from backup — alone, at 3 AM, with no internet connection, on replacement hardware purchased from a consumer electronics store. That is the standard. Every decision made across this book — the physical hardware separation, the read-only core rules, the encrypted ZFS replication, the systemd least-privilege hardening, the deterministic Prolog brain at the center of the entire architecture — exists to make that standard achievable and maintainable by a single disciplined engineer.
Build the thing you own. Own the thing you build. The logic is yours.