Chapter 25: Closed-Loop Remediation and Active Eviction
The Volumes III and IV stack can observe a dying hypervisor with sub-second latency, classify its failure mode with hysteresis-stabilised certainty, propagate a proof of that failure to a distributed cluster of Pengine agents, and route traffic away from it in a health-aware path computation — but it cannot yet do anything about the virtual machines still running on the node. Those VMs are accumulating latency, missing heartbeats, and degrading their own SLAs while the orchestrator holds a logically complete model of a cluster in crisis and takes no physical action. This chapter closes that loop: live_link/3 from Chapter 22 is wired into the routing layer; a Go Actuator consumes the alert channel and issues Proxmox live-migration API calls against nodes the WAM has flagged; and a Prolog Quorum Guard imposes a hard constitutional limit — (N/2) - 1 simultaneous evictions — that ensures a hysteresis cascade cannot autonomously shut down the entire data centre.
25.1 The Live Topology View
25.1.1 The Action Void
Chapter 17's shortest_path/3 traverses link/3 facts — static physical edges that represent cables, switches, and VLAN bridges. They are asserted at cluster bring-up and mutated only by explicit operator commands. They carry no signal from the Chapter 22 telemetry pipeline. When node_health(pve3, critical) is asserted because CPU steal has held above 40% for three consecutive scrape cycles, link(pve3, leaf_a, 12) remains fully traversable. The routing algorithm continues forwarding traffic into a node that the logic layer has already condemned.
The Chapter 22 live_link/3 predicate is the correction. It was added to live_state.pl (22.3.4) as the structural fix:
% From live_state.pl (§22.3.4):
live_link(A, B, Cost) :-
proxmox_topology:link(A, B, Cost),
healthy_node(A),
healthy_node(B).
A link is traversable only if the static edge exists AND both endpoints pass healthy_node/1. healthy_node/1 succeeds only when node_health(Node, nominal) holds — a node with no live metrics yet (never heard from) fails conservatively. The dependency graph:
live_link(A, B, Cost)
├── proxmox_topology:link(A, B, Cost)            [static physical fact]
├── healthy_node(A)
│   ├── known_node(A)                            [topology vocabulary]
│   └── node_health(A, nominal)
│       ├── node_metric(A, cpu_steal, V, _)      [live ingestor fact]
│       ├── node_metric(A, disk_latency, V, _)
│       └── … (other metric types)
└── healthy_node(B)
    └── (same structure for B)
The entire observation-to-routing pipeline is now a single predicate call. A metric update that changes node_health(pve3, nominal) to node_health(pve3, critical) causes healthy_node(pve3) to fail, which causes every live_link(pve3, _, _) and live_link(_, pve3, _) call to fail, which causes live_shortest_path/3 to exclude pve3 from all paths in the next routing query — without any explicit cache invalidation, without any topology mutation, and without any operator intervention.
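The gating rule is small enough to restate outside Prolog, which makes its conservative default easy to test. The Go sketch below is illustrative only. `Edge`, `healthy` and `liveLinks` are hypothetical names and are not part of the logic-node code. The leaf switch is simply marked nominal, because the chapter does not say how switch health is derived. A node with no entry in the health map fails, which matches what healthy_node/1 does for a node it has never heard from.

```go
package main

import "fmt"

// Edge is a hypothetical stand-in for a proxmox_topology:link/3 fact.
type Edge struct {
	A, B string
	Cost int
}

// healthy mirrors healthy_node/1: only an explicit "nominal" passes.
// A node with no entry (never heard from) fails conservatively.
func healthy(health map[string]string, node string) bool {
	return health[node] == "nominal"
}

// liveLinks mirrors live_link/3: a static edge survives only if both
// endpoints are healthy. Nothing is cached; every call re-evaluates.
func liveLinks(static []Edge, health map[string]string) []Edge {
	var out []Edge
	for _, e := range static {
		if healthy(health, e.A) && healthy(health, e.B) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	static := []Edge{{"pve1", "leaf_a", 10}, {"pve3", "leaf_a", 12}}
	health := map[string]string{"pve1": "nominal", "pve3": "nominal", "leaf_a": "nominal"}
	fmt.Println(len(liveLinks(static, health))) // 2
	health["pve3"] = "critical"
	fmt.Println(liveLinks(static, health)) // [{pve1 leaf_a 10}]
}
```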
25.1.2 Verifying live_link Exclusion
# Simulate pve3 entering critical state by asserting fabricated high steal:
root@logic-node-01:~# swipl \
-l /opt/logic-node/kb/proxmox_topology.pl \
-l /opt/logic-node/kb/live_state.pl \
-g "
% Establish a baseline nominal state for pve3 and pve1:
get_time(Now), Ts is round(Now),
live_state:assert_node_metric(pve3, cpu_steal, 5.0, Ts),
live_state:assert_node_metric(pve3, disk_latency, 0.1, Ts),
live_state:assert_node_metric(pve3, arc_miss_rate,2.0, Ts),
live_state:assert_node_metric(pve3, disk_io_util, 20.0, Ts),
live_state:assert_node_metric(pve1, cpu_steal, 3.0, Ts),
live_state:assert_node_metric(pve1, disk_latency, 0.1, Ts),
live_state:assert_node_metric(pve1, arc_miss_rate,1.0, Ts),
live_state:assert_node_metric(pve1, disk_io_util, 15.0, Ts),
% Both nodes healthy — live_link should succeed:
(live_state:live_link(pve1, leaf_a, C1)
-> format('live_link(pve1, leaf_a): Cost=~w~n', [C1])
; writeln('live_link(pve1, leaf_a): FAILS')),
(live_state:live_link(pve3, leaf_a, C2)
-> format('live_link(pve3, leaf_a): Cost=~w~n', [C2])
; writeln('live_link(pve3, leaf_a): FAILS')),
% Escalate pve3 to critical:
live_state:assert_node_metric(pve3, cpu_steal, 92.4, Ts),
% live_link for pve3 should now fail:
(live_state:live_link(pve3, leaf_a, _)
-> writeln('FAIL: pve3 link still traversable after critical escalation')
; writeln('PASS: live_link(pve3, leaf_a) correctly blocked')),
% pve1 link should remain unaffected:
(live_state:live_link(pve1, leaf_a, _)
-> writeln('PASS: live_link(pve1, leaf_a) still traversable')
; writeln('FAIL: pve1 link incorrectly blocked')),
halt
"
live_link(pve1, leaf_a): Cost=10
live_link(pve3, leaf_a): Cost=12
PASS: live_link(pve3, leaf_a) correctly blocked
PASS: live_link(pve1, leaf_a) still traversable
25.1.3 Routing Divergence: Static vs Live Path
# With pve3 critical, compare the static query_path/4 with live_query_path/4
# for a destination behind leaf_a:
root@logic-node-01:~# swipl \
-l /opt/logic-node/kb/proxmox_topology.pl \
-l /opt/logic-node/kb/live_state.pl \
-g "
% Preconditions: pve3 critical; pve2 and pve4 nominal. Live state does not
% persist between swipl runs, so assert those metrics first (as in 25.1.2).
% Query from pve4 (rack B) to pve2 (rack A, behind leaf_a).
% Static path: pve4 -> leaf_b -> spine1 -> leaf_a -> pve2
% pve3 is not a waypoint on this route, so the live path should match it.
proxmox_topology:query_path(pve4, pve2, StaticCost, StaticPath),
format('Static path: ~w cost=~w~n', [StaticPath, StaticCost]),
live_state:live_query_path(pve4, pve2, LiveCost, LivePath),
format('Live path: ~w cost=~w~n', [LivePath, LiveCost]),
halt
"
Static path: [pve4,leaf_b,spine1,leaf_a,pve2] cost=30
Live path: [pve4,leaf_b,spine1,leaf_a,pve2] cost=30
The two paths are identical here because pve3 is not on the pve4 → pve2 route. The effect of live_link/3 only shows in queries that would otherwise route through pve3. The architectural guarantee is that no route computed by live_query_path/4 will transit a critical or degraded node, even where the static topology includes that node as a waypoint.
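To show a case where the static and live paths diverge, here is a minimal Go sketch of Dijkstra's algorithm run over health-gated edges. The topology is invented for the example: it contains a pve3 shortcut between leaf_b and leaf_a and is not the cluster's real cabling. The `usable` predicate plays the role of healthy_node/1. An edge is skipped when either endpoint fails it, which is the rule that live_link/3 applies.

```go
package main

import (
	"container/heap"
	"fmt"
)

type Edge struct {
	A, B string
	Cost int
}

type item struct {
	node string
	dist int
}

// pq is a min-heap of items ordered by tentative distance.
type pq []item

func (q pq) Len() int           { return len(q) }
func (q pq) Less(i, j int) bool { return q[i].dist < q[j].dist }
func (q pq) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *pq) Push(x any)        { *q = append(*q, x.(item)) }
func (q *pq) Pop() any {
	old := *q
	it := old[len(old)-1]
	*q = old[:len(old)-1]
	return it
}

// shortestPath runs Dijkstra over undirected edges, dropping any edge with
// an endpoint that fails usable. Returns the path, its cost, and ok.
func shortestPath(edges []Edge, usable func(string) bool, src, dst string) ([]string, int, bool) {
	adj := map[string][]Edge{}
	for _, e := range edges {
		if !usable(e.A) || !usable(e.B) {
			continue // mirrors live_link/3: both endpoints must pass
		}
		adj[e.A] = append(adj[e.A], e)
		adj[e.B] = append(adj[e.B], Edge{e.B, e.A, e.Cost})
	}
	dist := map[string]int{src: 0}
	prev := map[string]string{}
	q := &pq{{src, 0}}
	for q.Len() > 0 {
		cur := heap.Pop(q).(item)
		if cur.dist > dist[cur.node] {
			continue // stale queue entry
		}
		if cur.node == dst {
			break
		}
		for _, e := range adj[cur.node] {
			nd := cur.dist + e.Cost
			if d, seen := dist[e.B]; !seen || nd < d {
				dist[e.B] = nd
				prev[e.B] = cur.node
				heap.Push(q, item{e.B, nd})
			}
		}
	}
	d, ok := dist[dst]
	if !ok {
		return nil, 0, false
	}
	path := []string{dst}
	for n := dst; n != src; {
		n = prev[n]
		path = append([]string{n}, path...)
	}
	return path, d, true
}

func main() {
	edges := []Edge{
		{"pve4", "leaf_b", 10}, {"leaf_b", "pve3", 5}, {"pve3", "leaf_a", 5},
		{"leaf_b", "spine1", 10}, {"spine1", "leaf_a", 10}, {"leaf_a", "pve2", 10},
	}
	p, c, _ := shortestPath(edges, func(string) bool { return true }, "pve4", "pve2")
	fmt.Println("static:", p, c) // static: [pve4 leaf_b pve3 leaf_a pve2] 30
	p, c, _ = shortestPath(edges, func(n string) bool { return n != "pve3" }, "pve4", "pve2")
	fmt.Println("live:", p, c) // live: [pve4 leaf_b spine1 leaf_a pve2] 40
}
```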
25.2 The Actuator: Consuming Alerts to Drive Physical Action
25.2.1 Architecture
When compound health rules fire, the Chapter 22 Alert Dispatcher sends AlertEvent structs on a Go channel (alertCh). In Chapter 22, those events are logged and counted. Chapter 25 extends the pipeline. A dedicated Actuator goroutine reads from alertCh and asks the Quorum Guard (25.3) whether the alert warrants autonomous physical intervention. If the guard permits it, the Actuator issues a live-migration request to the Proxmox API.
WAM live_state
  │ assert_node_metric/4 (every 15s)
  │ node_health/2 → critical
  ▼
alert_dispatcher.pl
  │ check_alert_conditions/2
  │ trigger_alert/4
  ▼
Go alertCh (chan AlertEvent)
  │
  ▼
Actuator.Run()
  ├── QuorumGuard.Permit(node)     ← WAM query: 25.3
  │     └── (N/2)-1 check
  ├── MigrateVMs(node)             ← Proxmox API
  │     ├── GET  /nodes/{node}/qemu
  │     └── POST /nodes/{node}/qemu/{vmid}/migrate
  └── FenceNode(node)              ← IPMI if migration fails
        └── ipmitool chassis power off
The Actuator issues no Proxmox API call unless a QuorumGuard.Permit check has succeeded first. The quorum guard is a WAM query, not a Go variable. Its check-and-mark sequence runs under a Prolog mutex inside cluster_quorum.pl (25.3.2), so two concurrent permit requests cannot both pass the check before either one records its eviction.
25.2.2 Proxmox API Client
// File: /opt/logic-node/go/orchestrator/proxmox_client.go
package main
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"os"      // os.Environ for FenceNode's child environment
"os/exec" // ipmitool invocation in FenceNode
"time"
)
// ProxmoxClient wraps the Proxmox VE REST API.
// All calls use the API token authentication introduced in PVE 6.2.
// Token format: "PVEAPIToken=user@realm!tokenid=UUID"
// The token must have VM.Migrate privilege on the source node's VMs.
type ProxmoxClient struct {
baseURL string // e.g. "https://pve1.infra.internal:8006/api2/json"
apiToken string // e.g. PVEAPIToken=actuator@pve!actuator-token=<UUID> (see 25.4.2)
http *http.Client
}
func NewProxmoxClient(baseURL, apiToken string) *ProxmoxClient {
return &ProxmoxClient{
baseURL: baseURL,
apiToken: apiToken,
http: &http.Client{
Timeout: 30 * time.Second,
Transport: &http.Transport{
// Proxmox uses a self-signed TLS cert by default.
// In production, install the Proxmox CA and use TLSClientConfig
// with a custom CA pool. For the initial build, accept the
// self-signed cert — replace before production use.
TLSClientConfig: tlsConfigInsecure(),
},
},
}
}
// VMSummary is a minimal representation of a Proxmox VM for migration decisions.
type VMSummary struct {
VMID int `json:"vmid"`
Name string `json:"name"`
Status string `json:"status"` // running | stopped | paused
Mem int64 `json:"mem"` // current memory usage in bytes (maxmem is the configured size)
}
// ListVMs returns all VMs on the given node.
// Only running VMs are candidates for live migration.
func (c *ProxmoxClient) ListVMs(ctx context.Context, node string) ([]VMSummary, error) {
url := fmt.Sprintf("%s/nodes/%s/qemu", c.baseURL, node)
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, err
}
req.Header.Set("Authorization", c.apiToken)
resp, err := c.http.Do(req)
if err != nil {
return nil, fmt.Errorf("list VMs on %s: %w", node, err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return nil, fmt.Errorf("list VMs on %s: HTTP %d: %s", node, resp.StatusCode, body)
}
var envelope struct {
Data []VMSummary `json:"data"`
}
if err := json.NewDecoder(resp.Body).Decode(&envelope); err != nil {
return nil, fmt.Errorf("decode VM list: %w", err)
}
return envelope.Data, nil
}
// MigrateVM initiates a live migration of VMID from sourceNode to targetNode.
// Live migration requires the VM to be running and both nodes to share storage
// or the VM to use live storage migration.
func (c *ProxmoxClient) MigrateVM(ctx context.Context, sourceNode string, vmid int, targetNode string) error {
url := fmt.Sprintf("%s/nodes/%s/qemu/%d/migrate", c.baseURL, sourceNode, vmid)
body, _ := json.Marshal(map[string]interface{}{
"target": targetNode,
"online": 1, // live migration: VM stays running during move
"with-local-disks": 1, // migrate local disks to target node storage
})
req, err := http.NewRequestWithContext(ctx, "POST", url,
bytes.NewReader(body))
if err != nil {
return err
}
req.Header.Set("Authorization", c.apiToken)
req.Header.Set("Content-Type", "application/json")
resp, err := c.http.Do(req)
if err != nil {
return fmt.Errorf("migrate VM %d from %s to %s: %w", vmid, sourceNode, targetNode, err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
respBody, _ := io.ReadAll(resp.Body)
return fmt.Errorf("migrate VM %d: HTTP %d: %s", vmid, resp.StatusCode, respBody)
}
return nil
}
// FenceNode issues an IPMI chassis power-off command to a node that cannot
// be safely migrated (e.g., node is unresponsive, migration fails twice).
// This is a last-resort operation: it kills all running VMs immediately.
// It is only invoked after evacuation has failed, under the same Quorum
// Guard permit that authorised the eviction.
//
// IPMI is out-of-band and requires the local ipmitool binary. This is an
// intentional exception to the no-shell rule: IPMI commands cannot be issued
// over the Proxmox API. The IPMIConfig credentials are loaded from a systemd
// credential (systemd-creds), and the password never appears in process arguments.
func FenceNode(ctx context.Context, node string, ipmiConfig IPMIConfig) error {
// exec.CommandContext runs ipmitool directly (no shell), so no argument is
// subject to shell expansion. -E tells ipmitool to read the password from
// the IPMI_PASSWORD environment variable instead of from argv.
cmd := exec.CommandContext(ctx, "ipmitool",
"-H", ipmiConfig.Host,
"-U", ipmiConfig.User,
"-E",
"chassis", "power", "off")
// Inject the password into the child process's environment only.
cmd.Env = append(os.Environ(), "IPMI_PASSWORD="+ipmiConfig.Password)
output, err := cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("fence %s: ipmitool failed: %w, output: %s", node, err, output)
}
return nil
}
25.2.3 actuator.go — The Remediation Goroutine
// File: /opt/logic-node/go/orchestrator/actuator.go
package main
import (
"context"
"fmt"
"log"
"time"
)
// Actuator reads AlertEvents from the alert channel and triggers physical
// remediation for nodes the WAM has classified as requiring intervention.
// It consults the QuorumGuard before taking any destructive action.
type Actuator struct {
alertCh chan AlertEvent
pool *Pool
pve *ProxmoxClient
ipmi map[string]IPMIConfig // node → IPMI credentials
done chan struct{}
}
func NewActuator(alertCh chan AlertEvent, pool *Pool, pve *ProxmoxClient,
ipmi map[string]IPMIConfig) *Actuator {
return &Actuator{
alertCh: alertCh,
pool: pool,
pve: pve,
ipmi: ipmi,
done: make(chan struct{}),
}
}
// Run starts the Actuator event loop. Call in a goroutine.
// The loop reads AlertEvents and dispatches remediation actions.
// It respects context cancellation for graceful shutdown.
func (a *Actuator) Run(ctx context.Context) {
log.Println("[Actuator] Started")
for {
select {
case <-ctx.Done():
log.Println("[Actuator] Stopping (context cancelled)")
close(a.done)
return
case event, ok := <-a.alertCh:
if !ok {
log.Println("[Actuator] Alert channel closed")
close(a.done)
return
}
a.handleAlert(ctx, event)
}
}
}
// handleAlert dispatches remediation for a single alert.
// Only critical-severity alerts for a node that the Quorum Guard permits
// will trigger eviction. Degraded alerts are logged but not acted upon
// autonomously — degraded nodes receive a grace period for self-recovery
// before escalating to critical.
func (a *Actuator) handleAlert(ctx context.Context, event AlertEvent) {
if event.Severity != "critical" {
log.Printf("[Actuator] Non-critical alert %s/%s — no action taken",
event.Node, event.Condition)
return
}
log.Printf("[Actuator] CRITICAL alert: node=%s condition=%s ts=%d",
event.Node, event.Condition, event.Timestamp)
// Consult the Quorum Guard before any eviction.
permitted, reason := a.quorumGuard(ctx, event.Node)
if !permitted {
log.Printf("[Actuator] Quorum guard DENIED eviction of %s: %s",
event.Node, reason)
return
}
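log.Printf("[Actuator] Quorum guard PERMITTED eviction of %s", event.Node)
// eviction_permitted/2 recorded an eviction_in_progress/2 fact for this node.
// Release it when remediation ends, whether it succeeded or not; otherwise
// the slot stays occupied until housekeep_evictions/0 expires it.
defer func() {
goal := fmt.Sprintf("cluster_quorum:mark_eviction_complete(%s)", event.Node)
if _, err := a.pool.Dispatch(WorkItem{Goal: goal}, 2*time.Second); err != nil {
log.Printf("[Actuator] Could not release eviction slot for %s: %v", event.Node, err)
}
}()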
// Attempt live migration of all running VMs off the failing node.
if err := a.evacuateNode(ctx, event.Node); err != nil {
log.Printf("[Actuator] Evacuation of %s FAILED: %v — attempting fence",
event.Node, err)
// Evacuation failed: fence the node as a last resort.
if ipmiCfg, ok := a.ipmi[event.Node]; ok {
if fenceErr := FenceNode(ctx, event.Node, ipmiCfg); fenceErr != nil {
log.Printf("[Actuator] FENCE of %s FAILED: %v — manual intervention required",
event.Node, fenceErr)
} else {
log.Printf("[Actuator] Node %s fenced successfully", event.Node)
}
} else {
log.Printf("[Actuator] No IPMI config for %s — cannot fence, manual intervention required",
event.Node)
}
return
}
log.Printf("[Actuator] Node %s evacuated successfully", event.Node)
}
// evacuateNode live-migrates all running VMs off node to healthy alternatives.
// Target selection is delegated to healthyMigrationTargets: only healthy nodes are candidates.
func (a *Actuator) evacuateNode(ctx context.Context, node string) error {
vms, err := a.pve.ListVMs(ctx, node)
if err != nil {
return fmt.Errorf("list VMs: %w", err)
}
// Find healthy migration targets using the WAM.
targets, err := a.healthyMigrationTargets(ctx, node)
if err != nil {
return fmt.Errorf("find targets: %w", err)
}
if len(targets) == 0 {
return fmt.Errorf("no healthy migration targets available for %s", node)
}
var lastErr error
targetIdx := 0
for _, vm := range vms {
if vm.Status != "running" {
log.Printf("[Actuator] Skipping VM %d (%s) on %s — not running (status: %s)",
vm.VMID, vm.Name, node, vm.Status)
continue
}
target := targets[targetIdx%len(targets)]
targetIdx++
log.Printf("[Actuator] Migrating VM %d (%s) from %s to %s",
vm.VMID, vm.Name, node, target)
migrateCtx, cancel := context.WithTimeout(ctx, 10*time.Minute)
err := a.pve.MigrateVM(migrateCtx, node, vm.VMID, target)
cancel()
if err != nil {
log.Printf("[Actuator] Migration of VM %d failed: %v", vm.VMID, err)
lastErr = err
// Try the next target for remaining VMs.
targetIdx++
}
}
return lastErr
}
// healthyMigrationTargets dispatches a WAM query for every node that is
// currently healthy, other than the failing node itself.
// The failing node fails healthy_node/1, so live_link/3 removes it from every
// live path and live_query_path/4 can never start from it. Candidates are
// therefore filtered with healthy_node/1 directly and ranked by static path
// cost from the failing node (a proxy for the distance migration traffic covers).
// Returns node atom strings sorted by ascending cost.
func (a *Actuator) healthyMigrationTargets(ctx context.Context, failingNode string) ([]string, error) {
goal := fmt.Sprintf(
`findall(
Cost-Target,
( proxmox_topology:known_node(Target),
Target \= %s,
live_state:healthy_node(Target),
proxmox_topology:query_path(%s, Target, Cost, _)
),
Pairs
),
sort(1, @=<, Pairs, Sorted),
pairs_values(Sorted, Targets)`,
failingNode, failingNode,
)
result, err := a.pool.Dispatch(WorkItem{Goal: goal}, 5*time.Second)
if err != nil {
return nil, fmt.Errorf("WAM dispatch: %w", err)
}
if result.Err != nil {
return nil, fmt.Errorf("WAM query: %w", result.Err)
}
return result.Targets, nil
}
// quorumGuard dispatches the WAM quorum check and returns (permitted, reason).
// The WAM predicate is defined in 25.3.
func (a *Actuator) quorumGuard(ctx context.Context, node string) (bool, string) {
goal := fmt.Sprintf(
"cluster_quorum:eviction_permitted(%s, Reason)", node)
result, err := a.pool.Dispatch(WorkItem{Goal: goal}, 2*time.Second)
if err != nil {
// Fail safe: if the WAM is unavailable, deny eviction.
return false, fmt.Sprintf("WAM unavailable: %v", err)
}
if result.Err != nil {
// Goal failed: quorum guard denied.
return false, result.Err.Error()
}
return true, result.Reason
}
25.3 The Quorum Guard: A Constitutional Safety Valve
25.3.1 The Cascading Hysteresis Failure
The hysteresis guard in Chapter 22 §22.1.2 prevents single-metric flapping: a node must sustain a critical reading for N=3 consecutive scrape cycles before transitioning to critical state. But it does not prevent correlated failure: if a switch misconfiguration causes CPU steal to spike simultaneously on all 14 hypervisors, all 14 will cross the critical threshold within 45 seconds and all 14 will have node_health(pveN, critical) asserted. The Alert Dispatcher fires 14 critical alerts. The Actuator processes all 14. The Proxmox API receives 14 evacuation requests. Every node is fenced.
The data centre is now empty. The logic engine performed perfectly — it faithfully reflected the physical state and responded proportionally to each individual signal. The system has no bugs. It just destroyed itself.
The Quorum Guard is a Prolog predicate that answers a constitutional question before any eviction: "If I evict this node, will the cluster still have a majority of its nodes operational?" It encodes the distributed systems axiom that a cluster must retain quorum — strictly more than half its total nodes must be capable of forming a consensus — or no autonomous action should proceed.
25.3.2 cluster_quorum.pl
% File: /opt/logic-node/kb/cluster_quorum.pl
%
% Quorum Guard for autonomous cluster remediation.
% Implements the (N/2)-1 simultaneous eviction limit as a Prolog predicate
% that the Go Actuator must consult before every eviction action.
:- module(cluster_quorum, [
eviction_permitted/2, % Public API: eviction_permitted(+Node, -Reason)
cluster_quorum_state/1, % Diagnostic: cluster_quorum_state(-State)
max_simultaneous_evictions/1 % Derived constant: max_simultaneous_evictions(-N)
]).
:- use_module(proxmox_topology, [known_node/1]).
:- use_module(live_state, [healthy_node/1]).
% ── Quorum constants ──────────────────────────────────────────────────────────
% total_nodes(-N)
% The total number of nodes in the cluster. Derived from known_node/1
% so that adding or removing nodes from the topology automatically
% adjusts the quorum threshold without any change to this predicate.
total_nodes(N) :-
aggregate_all(count, proxmox_topology:known_node(_), N).
% max_simultaneous_evictions(-Max)
% The maximum number of nodes that may be concurrently under eviction.
% Formula: floor(N/2) - 1, clamped at 0.
%
% Strict-majority quorum is N//2 + 1. For even N the formula matches the
% majority bound exactly: a 14-node cluster needs 8 healthy nodes, and
% floor(14/2) - 1 = 6 evictions leave exactly 8. For odd N it is one
% tighter: a 5-node cluster needs 3, majority alone would tolerate 2
% evictions, and the formula allows 1.
% For N=3: max=0 (no autonomous evictions in a 3-node cluster).
% For N=5: max=1 (at most 1 eviction at a time).
% For N=14: max=6 (at most 6 concurrent evictions).
%
% Without the -1, evicting floor(N/2) nodes from an even-sized cluster
% would leave exactly N/2 healthy nodes, already short of a strict
% majority. In an odd-sized cluster the -1 is a genuine safety margin:
% if one more node fails spontaneously during a batch eviction, the
% cluster still holds a strict majority.
max_simultaneous_evictions(Max) :-
total_nodes(N),
Max is max(0, (N // 2) - 1).
% ── Active eviction tracking ──────────────────────────────────────────────────
% eviction_in_progress(+Node, +Timestamp)
% Dynamic fact: asserted when the Actuator begins evicting Node, retracted
% when eviction completes or times out. Timestamp is the Unix epoch time of
% eviction start — used by housekeep_evictions/0 to detect stalled evictions.
:- dynamic eviction_in_progress/2.
% current_eviction_count(-Count)
% Number of nodes currently under active eviction.
current_eviction_count(Count) :-
aggregate_all(count, eviction_in_progress(_, _), Count).
% mark_eviction_started(+Node)
% Called by the Go Actuator via a WAM goal BEFORE issuing any Proxmox API call.
% Records the eviction start time for timeout detection.
mark_eviction_started(Node) :-
must_be(atom, Node),
proxmox_topology:known_node(Node),
get_time(Now), Ts is round(Now),
retractall(eviction_in_progress(Node, _)),
assertz(eviction_in_progress(Node, Ts)).
% mark_eviction_complete(+Node)
% Called by the Go Actuator via a WAM goal AFTER eviction succeeds or fails.
mark_eviction_complete(Node) :-
must_be(atom, Node),
retractall(eviction_in_progress(Node, _)).
% housekeep_evictions/0
% Removes stale eviction_in_progress/2 entries. An eviction is stale if
% it has been in progress for more than 1,200 seconds (20 minutes).
% A Proxmox live migration of a heavily-loaded VM takes at most 10 minutes
% under normal conditions; 20 minutes implies the Proxmox API call stalled.
% Called by the Go maintenance ticker (60-second interval).
housekeep_evictions :-
get_time(Now),
forall(
eviction_in_progress(Node, StartTs),
( Now - float(StartTs) > 1200.0
-> ( retractall(eviction_in_progress(Node, _)),
format(user_error, "[QuorumGuard] Stale eviction removed: ~w (started ~w)~n", [Node, StartTs])
)
; true
)
).
% ── Quorum permit predicate ───────────────────────────────────────────────────
% eviction_permitted(+Node, -Reason)
% The primary entry point for the Go Actuator.
% Succeeds (binding Reason = permitted) if evicting Node is currently
% allowed by the quorum policy. Fails if eviction is denied; a failed goal
% binds nothing, so the caller sees only the failure. Throws for an
% unknown node.
%
% Conditions checked in order:
% 1. Node is known.
% 2. Node is still unhealthy: a node that recovered between the alert and
%    the quorum check is not evicted.
% 3. Node is not already being evicted.
% 4. The number of nodes currently under eviction is below max_simultaneous_evictions.
% 5. After this eviction, at least (N//2 + 1) nodes will still be healthy.
eviction_permitted(Node, permitted) :-
must_be(atom, Node),
% Guard 1: Node must be a known cluster member.
( proxmox_topology:known_node(Node)
-> true
; throw(error(unknown_node(Node), eviction_permitted/2))
),
% Guard 2: Node must still be unhealthy; health may have recovered since the alert.
\+ live_state:healthy_node(Node),
% Guards 3, 4, and 5 and the final mark run under one mutex, so that check
% and mark form a single critical section (no check-then-act race).
with_mutex(quorum_state_lock, (
% Guard 3: Not already evicting this node.
\+ eviction_in_progress(Node, _),
% Guard 4: Active eviction count is below the limit.
max_simultaneous_evictions(Max),
current_eviction_count(Active),
Active < Max,
% Guard 5: The healthy set must still hold a strict majority.
% Nodes under eviction are already unhealthy, so HealthyNow excludes
% them; evicting Node (also unhealthy) does not reduce it further.
% Subtracting Active here would count those nodes twice.
aggregate_all(count, live_state:healthy_node(_), HealthyNow),
total_nodes(Total),
Quorum is Total // 2 + 1,
HealthyNow >= Quorum,
% All guards passed: record the eviction as in-progress while still
% holding the lock.
mark_eviction_started(Node)
)).
% ── Diagnostic predicate ──────────────────────────────────────────────────────
% cluster_quorum_state(-State)
% Returns a diagnostic term summarising current quorum state.
% Used by the /api/v1/quorum/status HTTP endpoint.
%
% State = quorum_state(
% Total, % total cluster nodes
% Healthy, % currently healthy nodes
% InProgress, % evictions currently running
% MaxAllowed, % max_simultaneous_evictions
% Quorum, % N//2 + 1
% QuorumSafe % true if healthy >= quorum, false otherwise
% )
cluster_quorum_state(quorum_state(Total, Healthy, InProgress, MaxAllowed, Quorum, QuorumSafe)) :-
total_nodes(Total),
aggregate_all(count, live_state:healthy_node(_), Healthy),
current_eviction_count(InProgress),
max_simultaneous_evictions(MaxAllowed),
Quorum is Total // 2 + 1,
( Healthy >= Quorum -> QuorumSafe = true ; QuorumSafe = false ).
25.3.3 Quorum Guard Verification
# Baseline: 14 nodes all nominal — max 6 simultaneous evictions permitted.
root@logic-node-01:~# swipl \
-l /opt/logic-node/kb/proxmox_topology.pl \
-l /opt/logic-node/kb/live_state.pl \
-l /opt/logic-node/kb/cluster_quorum.pl \
-g "
cluster_quorum:max_simultaneous_evictions(Max),
format('Max simultaneous evictions (14-node cluster): ~w~n', [Max]),
cluster_quorum:total_nodes(N),
Quorum is N // 2 + 1,
format('Strict majority quorum: ~w of ~w~n', [Quorum, N]),
halt
"
Max simultaneous evictions (14-node cluster): 6
Strict majority quorum: 8 of 14
# Simulate 5 nodes already in eviction: 6th should be permitted, 7th denied.
root@logic-node-01:~# swipl \
-l /opt/logic-node/kb/proxmox_topology.pl \
-l /opt/logic-node/kb/live_state.pl \
-l /opt/logic-node/kb/cluster_quorum.pl \
-g "
% Establish all nodes as nominal in live_state for the test:
get_time(Now), Ts is round(Now),
forall(proxmox_topology:known_node(N),
( live_state:assert_node_metric(N, cpu_steal, 3.0, Ts),
live_state:assert_node_metric(N, disk_latency, 0.1, Ts),
live_state:assert_node_metric(N, arc_miss_rate, 1.0, Ts),
live_state:assert_node_metric(N, disk_io_util, 15.0, Ts)
)),
% Mark nodes pve1..pve6 as having critical steal (so that Guard 2 passes):
forall(member(N,[pve1,pve2,pve3,pve4,pve5,pve6]),
live_state:assert_node_metric(N, cpu_steal, 95.0, Ts)),
% Simulate 5 in-progress evictions:
forall(member(N,[pve1,pve2,pve3,pve4,pve5]),
cluster_quorum:mark_eviction_started(N)),
cluster_quorum:current_eviction_count(C),
format('Active evictions: ~w~n', [C]),
% 6th eviction (pve6) — should be permitted (active=5 < max=6):
( cluster_quorum:eviction_permitted(pve6, R6)
-> format('pve6 eviction: PERMITTED (~w)~n', [R6])
; writeln('pve6 eviction: DENIED') ),
% 7th eviction (pve7, also mark as critical):
live_state:assert_node_metric(pve7, cpu_steal, 95.0, Ts),
( cluster_quorum:eviction_permitted(pve7, R7)
-> format('pve7 eviction: PERMITTED (~w) — SHOULD NOT HAPPEN~n', [R7])
; writeln('pve7 eviction: DENIED (quorum limit reached) — CORRECT') ),
halt
"
Active evictions: 5
pve6 eviction: PERMITTED (permitted)
pve7 eviction: DENIED (quorum limit reached) — CORRECT
The 7th eviction is denied. Six nodes (pve1 to pve6) are now under eviction, which equals the limit of six, so Guard 4 refuses pve7. Guard 5 would refuse it as well. With pve7 also critical, only 7 of 14 nodes are healthy, one short of the quorum threshold of 8. The Quorum Guard fires before the Proxmox API is called.
25.4 Sovereign Security
25.4.1 The Actuator Attack Surface
The Actuator introduces the most dangerous capability in the system: unauthenticated or incorrectly-authorised calls to eviction_permitted/2 could bypass the quorum check by injecting a goal that always succeeds. The WAM must not accept goals for cluster_quorum:eviction_permitted/2 from any source other than the internal Go Actuator.
The defence is the same module write restriction from Chapter 22 §22.5.3: cluster_quorum predicates are defined in a module that is not exported through the HTTP API and is not accessible to Pengine client queries (the safe_primitive/1 whitelist in harvester_sandbox.pl does not include cluster_quorum:*). The eviction_permitted/2 predicate is only callable by goals dispatched through pool.Dispatch() inside the Actuator struct, which is only instantiated by the Go main process. There is no HTTP endpoint that accepts arbitrary goal strings for WAM evaluation.
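One residual path deserves a guard on the Go side. The Actuator builds goal strings with fmt.Sprintf and interpolates event.Node directly, so a node name such as `pve3), halt(` would change the structure of the goal. The sketch below assumes that node names are plain lowercase Prolog atoms such as pve3 and leaf_a, which holds for every name in this chapter, and rejects anything else before it reaches the WAM. A name such as `Pve3` is also rejected, because Prolog would read it as a variable. `nodeAtom` and `quorumGoal` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// plainAtom matches unquoted Prolog atoms of the form used for node names:
// a lowercase letter followed by letters, digits, or underscores.
var plainAtom = regexp.MustCompile(`^[a-z][a-zA-Z0-9_]*$`)

// nodeAtom returns name unchanged if it is safe to splice into a goal string.
func nodeAtom(name string) (string, error) {
	if !plainAtom.MatchString(name) {
		return "", fmt.Errorf("refusing node name %q: not a plain Prolog atom", name)
	}
	return name, nil
}

// quorumGoal shows the intended use: validate first, then format.
func quorumGoal(node string) (string, error) {
	atom, err := nodeAtom(node)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("cluster_quorum:eviction_permitted(%s, Reason)", atom), nil
}

func main() {
	for _, n := range []string{"pve3", "pve3), halt(", "Pve3"} {
		g, err := quorumGoal(n)
		fmt.Println(g, err)
	}
}
```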
25.4.2 Proxmox API Token Least Privilege
The Proxmox API token used by ProxmoxClient must carry the minimum permissions required for the eviction workflow:
Required privileges:
VM.Migrate — required for live migration (POST .../migrate)
VM.PowerMgmt — required for stop (needed for fencing fallback)
Sys.Audit — required for node status queries
VM.Audit — required for VM list queries (GET .../qemu)
NOT required (and must be explicitly withheld):
VM.Config.* — no configuration changes
VM.Allocate — no VM creation
Sys.Modify — no node configuration
Datastore.Modify — no storage changes
SDN.* — no network changes
# Create the Actuator API token on Proxmox:
root@pve1:~# pveum user add actuator@pve --comment "Logic Node Actuator"
root@pve1:~# pveum aclmod / --user actuator@pve --role PVEAuditor
root@pve1:~# pveum roleadd ActuatorRole --privs "VM.Migrate,VM.PowerMgmt,Sys.Audit,VM.Audit"
root@pve1:~# pveum aclmod / --user actuator@pve --role ActuatorRole
root@pve1:~# pveum user token add actuator@pve actuator-token --privsep 0
The token is stored as a systemd credential on logic-node-01:
# Store the token securely — never in environment variables or config files:
root@logic-node-01:~# systemd-creds encrypt --name=proxmox-api-token - \
/etc/logic-node/credentials/proxmox-api-token.cred <<< "PVEAPIToken=actuator@pve!actuator-token=<UUID>"
# In /etc/systemd/system/logic-orchestrator.service:
[Service]
# LoadCredentialEncrypted= decrypts the systemd-creds output before exposing it.
LoadCredentialEncrypted=proxmox-api-token:/etc/logic-node/credentials/proxmox-api-token.cred
# The credential is available at ${CREDENTIALS_DIRECTORY}/proxmox-api-token
# at runtime. The Go process reads it at startup and stores it in memory only.
25.4.3 Idempotency and Double-Eviction Prevention
eviction_permitted/2 calls mark_eviction_started/1 as its final step. Once that call has run for a node, eviction_in_progress(Node, _) succeeds and Guard 3 fails for every later caller. A second concurrent Actuator goroutine that calls eviction_permitted/2 for the same node is therefore denied, even while the first goroutine's Proxmox API call is still in flight. Double eviction is prevented inside the WAM, and no Go-side mutex is required.
This is the advantage of WAM-backed quorum state over Go-side atomic counters. With a counter, two goroutines can each check the current value before either increments it, and both proceed. In eviction_permitted/2, Guards 3 to 5 and mark_eviction_started/1 run inside with_mutex(quorum_state_lock, ...), so the check and the mark form one critical section. Even if the worker pool evaluates goals on several Prolog threads, no second caller can run between them.
25.5 The Completed Loop
The OODA loop of sovereign infrastructure — Observe, Orient, Decide, Act — is now complete across the full Volume IV stack:
Observe (Chapter 20): node_exporter exposes 1,200 metrics per hypervisor, scraped every 15 seconds. VictoriaMetrics stores them in a columnar merge-tree on a dedicated ZFS dataset isolated from the cluster it monitors.
Orient (Chapters 21–22): The PromQL Oracle translates observations into typed WAM queries. The Telemetry Ingestor discretises continuous floats into node_metric/4 facts with integer timestamps. Hysteresis rules stabilise classification over three consecutive scrape cycles. The Alert Dispatcher detects compound failure patterns — CPU steal concurrent with ZFS resilvering — that no single metric threshold can express.
Distribute (Chapter 23): Pengine agents on each hypervisor evaluate health rules locally and return proofs rather than data. The cluster becomes a distributed inference fabric: the Master queries global_cluster_health/2 and receives a complete verdict for all 14 nodes in under a second.
Decide (Chapter 25, 25.1): live_link/3 gates every routing edge on healthy_node/1. live_query_path/4 computes the shortest path through only the healthy subset of the topology. The routing layer reflects the physical state without any explicit cache invalidation or operator intervention.
Act (Chapter 25, 25.2–25.3): The Actuator consumes critical alerts and evacuates VMs via live migration. The Quorum Guard enforces the (N/2) - 1 constitutional limit. Because its check and mark run under a single Prolog mutex, the guard is race-free by construction. A cascading hysteresis failure cannot evict more than 6 of 14 nodes autonomously: the 7th eviction request is denied at the WAM layer before the Proxmox API is called.
The cluster is no longer a collection of machines that a human monitors and repairs. It is a self-correcting logical entity that observes its own failure modes, reasons about them with formal inference rules, and takes bounded, safe physical actions, with a hard constitutional limit on how much damage it can do to itself.