Chapter 27: High Availability Constraints
The Chapter 26 bin-packer produces the densest valid assignment of VMs to hypervisors, and in doing so it actively creates single points of failure: two database replicas that together fit within a host's 85% capacity threshold will be placed on the same physical machine, because pure capacity optimisation has no concept of workload identity or failure correlation. This chapter extends vm_scheduler.pl with HA-tagged VM terms, a constraint pass that posts sum/3 anti-affinity rules on every host row for each replicated workload group, a predicate that translates the solved matrix into structured migrate/2 and start/2 actions that the Go orchestrator maps to qm migrate invocations, and the architectural imperative that an over-constrained HA system must throw an exception rather than silently produce a non-compliant placement.
27.1 The Mathematics of Operational Resilience
27.1.1 Capacity Optimisation as a Threat
A bin-packer that knows only capacity treats two 8 GB VMs as interchangeable. If a host has 16 GB of safe headroom and two database replica VMs each consume 8 GB, the packer's first valid solution places both replicas on that host — it satisfies every capacity constraint while simultaneously destroying the redundancy that justified provisioning two replicas in the first place. The failure mode is not a constraint violation; it is a constraint gap. The packing is correct by the only metric the solver was given.
For a three-replica Proxmox cluster where replicas are labeled db_primary, db_replica1, and db_replica2, the constraint that must be added is: for every host H, at most one member of the db_cluster group may be placed on H. This is an anti-affinity constraint, and it cannot be expressed as a capacity threshold because it is not about resource consumption — it is about identity correlation.
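The constraint gap can be reproduced in a few lines of Go. The sketch below uses a naive first-fit packer as a stand-in for a capacity-only solver; it is not the Chapter 26 CLP(FD) model, and the vm and host types are hypothetical illustration types. It shows that a packer which sees only RAM places both replicas on the first host that fits them.

```go
package main

import "fmt"

// vm and host are hypothetical illustration types, not the chapter's Prolog terms.
type vm struct {
	Name   string
	RAMMiB int
}

type host struct {
	Name    string
	FreeMiB int // safe headroom remaining on the host
}

// firstFit assigns each VM to the first host with enough headroom and returns
// a map from VM name to host name. It knows nothing about HA groups.
func firstFit(hosts []host, vms []vm) map[string]string {
	free := make([]int, len(hosts))
	for i, h := range hosts {
		free[i] = h.FreeMiB
	}
	placement := make(map[string]string)
	for _, v := range vms {
		for i, h := range hosts {
			if free[i] >= v.RAMMiB {
				free[i] -= v.RAMMiB
				placement[v.Name] = h.Name
				break
			}
		}
	}
	return placement
}

func main() {
	hosts := []host{{"pve1", 16384}, {"pve2", 16384}}
	vms := []vm{{"db_primary", 8192}, {"db_replica1", 8192}}
	p := firstFit(hosts, vms)
	// Both replicas fit on pve1, so a capacity-only packer co-locates them.
	fmt.Println(p["db_primary"], p["db_replica1"])
}
```

Every capacity check passes, yet the two replicas share a failure domain. Nothing in the input tells the packer that the two VMs are related.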
27.1.2 Anti-Affinity as a CLP(FD) Column Sum
The placement matrix is a matrix of binary variables: PlacementMatrix[H][V] is 1 if VM V is on host H, 0 otherwise. For two replicas at column indices I and J, the host-level anti-affinity rule is:
For every host row H:
PlacementMatrix[H][I] + PlacementMatrix[H][J] =< 1
In CLP(FD), sum/3 posts this directly:
% For a single host row Row and replica column indices I and J:
nth1(I, Row, P_I),
nth1(J, Row, P_J),
sum([P_I, P_J], #=<, 1).
When P_I = 1 is committed during labeling, the propagator for sum([P_I, P_J], #=<, 1) fires immediately and sets P_J's domain to {0}. The anti-affinity rule does not wait for backtracking to detect a violation — it eliminates the violating assignment as soon as one variable is instantiated. This is the operational advantage of posting anti-affinity as a CLP(FD) constraint rather than as a post-solve check: the search never explores states where two replicas share a host.
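For a group of N replicas, the constraint generalises: for every pair (I, J) in the group, and for every host row, post sum([P_I, P_J], #=<, 1). On 0/1 variables this is equivalent to posting one sum(GroupVars, #=<, 1) over the group's variables in each row. It is not equivalent to all_different/1, which would also forbid two group members from both being absent from a host (two zeros in the same row). The pairwise form is more transparent and, for small N, costs only N(N-1)/2 constraints per row.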
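The pairwise form is an at-most-one encoding: on binary variables, "every pair sums to at most 1" accepts exactly the same assignments as "the whole group sums to at most 1". The brute-force check below is a minimal sketch in Go (all function names are illustrative) that enumerates every binary vector for small N and compares the two formulations.

```go
package main

import "fmt"

// pairwiseOK reports whether every pair of bits sums to at most 1.
func pairwiseOK(bits []int) bool {
	for i := 0; i < len(bits); i++ {
		for j := i + 1; j < len(bits); j++ {
			if bits[i]+bits[j] > 1 {
				return false
			}
		}
	}
	return true
}

// groupOK reports whether the whole slice sums to at most 1.
func groupOK(bits []int) bool {
	total := 0
	for _, b := range bits {
		total += b
	}
	return total <= 1
}

// encodingsAgree enumerates all 2^n binary vectors of length n and checks
// that the two formulations accept exactly the same vectors.
func encodingsAgree(n int) bool {
	for mask := 0; mask < 1<<n; mask++ {
		bits := make([]int, n)
		for k := 0; k < n; k++ {
			bits[k] = (mask >> k) & 1
		}
		if pairwiseOK(bits) != groupOK(bits) {
			return false
		}
	}
	return true
}

func main() {
	for n := 2; n <= 5; n++ {
		fmt.Printf("n=%d agree=%v\n", n, encodingsAgree(n))
	}
}
```

The check says nothing about propagation strength inside the solver; it only confirms that the two formulations describe the same set of legal rows.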
27.2 Fault Domains and HA Tagging
27.2.1 Extending the VM Term
The Chapter 26 vm/3 term carries only resource dimensions: vm(Name, RAM, CPU). HA tagging requires a fourth argument identifying the group:
% vm(+VMID, +RAM_MiB, +CPU_millicores, +HA_Tag)
%
% HA_Tag is one of:
% ha(GroupAtom) — member of a named HA group (anti-affinity enforced)
% standalone — no HA constraint (backward-compatible with vm/3 callers)
%
% Example inventory for a database cluster:
% vm(101, 8192, 8000, ha(db_cluster)) % primary
% vm(102, 8192, 8000, ha(db_cluster)) % replica 1
% vm(103, 8192, 8000, ha(db_cluster)) % replica 2
% vm(104, 4096, 4000, ha(web_tier)) % web frontend A
% vm(105, 4096, 4000, ha(web_tier)) % web frontend B
% vm(106, 2048, 2000, standalone) % monitoring agent (no HA tag)
% extract_ha_tag/2 — retrieves the HA tag from both 3-arg and 4-arg vm terms.
% Provides backward compatibility with Chapter 26's vm/3 inventory.
extract_ha_tag(vm(_, _, _, ha(Group)), ha(Group)) :- !.
extract_ha_tag(vm(_, _, _, standalone), standalone) :- !.
extract_ha_tag(vm(_, _, _), standalone). % 3-arg form defaults to standalone
The VMID is now an integer (the Proxmox VM identifier used in qm migrate <VMID> <target>), matching the integer vmid field in the Go VMSummary struct from Chapter 24 §24.2. This allows the migration command generator in §27.3 to produce exact CLI commands without an additional VMID lookup step.
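On the Go side, the inventory has to be rendered into vm/4 terms before it is embedded in a goal string. The following is a minimal sketch under stated assumptions: VMSpec and its field names are hypothetical (the Chapter 24 VMSummary struct may look different), and the atom pattern is a deliberately conservative guess at what a safe HA group name looks like.

```go
package main

import (
	"fmt"
	"regexp"
)

// VMSpec is a hypothetical Go-side description of one VM.
type VMSpec struct {
	VMID     int
	RAMMiB   int
	CPUMilli int
	HAGroup  string // empty means standalone
}

// atomRe accepts only unquoted Prolog atoms: a lowercase letter followed by
// letters, digits, or underscores. Anything else is rejected rather than quoted.
var atomRe = regexp.MustCompile(`^[a-z][a-zA-Z0-9_]*$`)

// Term renders the VM as a vm/4 Prolog term, or returns an error if the HA
// group name could not be written as a plain atom.
func (v VMSpec) Term() (string, error) {
	tag := "standalone"
	if v.HAGroup != "" {
		if !atomRe.MatchString(v.HAGroup) {
			return "", fmt.Errorf("HA group %q is not a plain Prolog atom", v.HAGroup)
		}
		tag = "ha(" + v.HAGroup + ")"
	}
	return fmt.Sprintf("vm(%d, %d, %d, %s)", v.VMID, v.RAMMiB, v.CPUMilli, tag), nil
}

func main() {
	vms := []VMSpec{
		{101, 8192, 8000, "db_cluster"},
		{106, 2048, 2000, ""},
	}
	for _, v := range vms {
		term, err := v.Term()
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(term)
	}
}
```

Rejecting unexpected group names at this boundary keeps arbitrary text out of the goal string that is later handed to the Prolog engine.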
27.2.2 apply_ha_constraints/2
% File: /opt/logic-node/kb/ha_scheduler.pl
%
% HA constraint extension to vm_scheduler.pl.
% Imports vm_scheduler and adds anti-affinity constraint posting.
%
% Call order (from schedule_ha/5):
% 1. schedule_baseline_constraints_only/4: capacity constraints, no labeling
% 2. apply_ha_constraints/2: anti-affinity constraints
% 3. labeling([ffc, bisect], AllVars): search on the extended constraint net
%
% IMPORTANT: apply_ha_constraints/2 is called AFTER vm_capacity_check_multi/3
% creates the matrix (via the constraint-only baseline variant, which does
% not call labeling/2) and BEFORE labeling. See §27.2.4.
:- module(ha_scheduler, [
       apply_ha_constraints/2,
       apply_rack_constraints/3,
       schedule_ha/5,
       schedule_ha_timed/6,
       compute_migration_delta/4
   ]).
:- use_module(library(clpfd)).
:- use_module(library(lists)).
:- use_module(library(apply)).      % maplist/N, foldl/4
:- use_module(library(yall)).       % lambda expressions (>>)
:- use_module(library(pairs)).      % pairs_values/2
:- use_module(library(error)).      % must_be/2
:- use_module(library(time)).       % call_with_time_limit/2
:- use_module(library(aggregate)).
:- use_module(vm_scheduler).
:- use_module(capacity_solver).
% ── Rack (fault domain) membership ───────────────────────────────────────────
% Mirrors the three-rack topology from Chapter 17 §17.3.1.
% Rack A (leaf_a): pve1–pve3 | Rack B (leaf_b): pve4–pve6 | Rack C (leaf_c): pve7–pve14
rack_members(rack_a, [pve1, pve2, pve3]).
rack_members(rack_b, [pve4, pve5, pve6]).
rack_members(rack_c, [pve7, pve8, pve9, pve10, pve11, pve12, pve13, pve14]).
failure_domains([rack_a, rack_b, rack_c]).
% ── Host-level anti-affinity ──────────────────────────────────────────────────
% apply_ha_constraints(+PlacementMatrix, +VMs)
%
% Posts host-level anti-affinity constraints for all HA-tagged VM groups.
% For each pair of VMs (I, J) sharing the same ha(Group) tag, posts
% sum([P_I, P_J], #=<, 1) on every host row.
%
% PlacementMatrix: list of host rows, each a list of CLP(FD) variables
% VMs: list of vm/3 or vm/4 terms, ordered by column index
apply_ha_constraints(PlacementMatrix, VMs) :-
% Find all distinct HA group atoms:
findall(Group,
( member(VM, VMs),
extract_ha_tag(VM, ha(Group))
),
Groups0),
sort(Groups0, Groups),
% For each group, collect member column indices and post pairwise constraints:
maplist(post_group_constraints(PlacementMatrix, VMs), Groups).
% post_group_constraints(+Matrix, +VMs, +Group)
% Posts anti-affinity constraints for all pairs within Group.
post_group_constraints(PlacementMatrix, VMs, Group) :-
% Collect 1-based column indices of VMs in this group:
findall(Idx,
( nth1(Idx, VMs, VM),
extract_ha_tag(VM, ha(Group))
),
Indices),
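    % Post pairwise sum constraints for all (I, J) pairs, I < J.
    % findall/3 + maplist/2 rather than forall/2: forall/2 is implemented with
    % double negation, so constraints posted inside it are undone on exit.
    findall(I-J,
            ( member(I, Indices), member(J, Indices), I < J ),
            Pairs),
    maplist(post_index_pair(PlacementMatrix), Pairs).

% post_index_pair(+Matrix, +I-J): unpacks one index pair for maplist/2.
post_index_pair(PlacementMatrix, I-J) :-
    post_pair_anti_affinity(PlacementMatrix, I, J).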
% post_pair_anti_affinity(+Matrix, +I, +J)
% For every host row in Matrix, posts sum([P_I, P_J], #=<, 1).
% Fires propagation the moment either variable is instantiated during labeling.
post_pair_anti_affinity(PlacementMatrix, I, J) :-
maplist(post_row_anti_affinity(I, J), PlacementMatrix).
post_row_anti_affinity(I, J, Row) :-
nth1(I, Row, P_I),
nth1(J, Row, P_J),
sum([P_I, P_J], #=<, 1).
% ── Rack-level anti-affinity ──────────────────────────────────────────────────
% apply_rack_constraints(+Hosts, +PlacementMatrix, +VMs)
%
% Posts rack-level (fault-domain-level) anti-affinity.
% For each HA group, at most one member may be placed within each rack.
% This is strictly stronger than host-level: if a group has at most one
% member per rack, it trivially has at most one per host within that rack.
% Both levels are posted in schedule_ha/5 for early propagation.
apply_rack_constraints(Hosts, PlacementMatrix, VMs) :-
findall(Group,
( member(VM, VMs), extract_ha_tag(VM, ha(Group)) ),
Groups0),
sort(Groups0, Groups),
failure_domains(Domains),
maplist(
{Hosts, PlacementMatrix, VMs}/[Group]>>(
findall(Idx,
( nth1(Idx, VMs, VM), extract_ha_tag(VM, ha(Group)) ),
Indices),
maplist(
post_rack_group_constraint(Hosts, PlacementMatrix, Indices),
Domains
)
),
Groups
).
% post_rack_group_constraint(+Hosts, +Matrix, +GroupIndices, +Domain)
% Collects all placement variables for all group members across all hosts
% in Domain and posts sum(..., #=<, 1).
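post_rack_group_constraint(Hosts, PlacementMatrix, GroupIndices, Domain) :-
    rack_members(Domain, DomainHosts),
    % For each host in the domain, collect placement vars for group members:
    foldl(collect_host_vars(Hosts, PlacementMatrix, GroupIndices),
          DomainHosts, [], DomainVars),
    (   DomainVars = []
    ->  true
    ;   sum(DomainVars, #=<, 1)
    ).

% collect_host_vars(+Hosts, +Matrix, +GroupIndices, +DomainHost, +Acc0, -Acc)
collect_host_vars(Hosts, PlacementMatrix, GroupIndices, DomainHost, Acc0, Acc) :-
    (   nth1(HostIdx, Hosts, host(DomainHost, _, _))
    ->  nth1(HostIdx, PlacementMatrix, Row),
        % maplist/3, not findall/3: findall/3 copies its results, which would
        % detach the collected variables from the real placement matrix.
        maplist(row_var(Row), GroupIndices, Pvars),
        append(Pvars, Acc0, Acc)
    ;   Acc = Acc0              % host not in this scheduling set
    ).

% row_var(+Row, +Idx, -P): P is the placement variable at column Idx of Row.
row_var(Row, Idx, P) :-
    nth1(Idx, Row, P).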
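A placement can satisfy host-level anti-affinity and still violate the rack-level rule. The Go sketch below checks a ground matrix after the fact; it is an illustration, not part of the chapter's pipeline. The function name, the hostRack map, and the 0-based column indices are assumptions (the Prolog side uses 1-based nth1/3 indices).

```go
package main

import (
	"fmt"
	"sort"
)

// rackViolations returns the racks that hold more than one member of the
// group whose 0-based column indices are given in groupCols.
// hostRack maps a host name to its rack; hosts without an entry are skipped.
// The function assumes len(hosts) == len(matrix).
func rackViolations(matrix [][]int, hosts []string, hostRack map[string]string, groupCols []int) []string {
	perRack := make(map[string]int)
	for h, row := range matrix {
		rack, ok := hostRack[hosts[h]]
		if !ok {
			continue
		}
		for _, c := range groupCols {
			perRack[rack] += row[c]
		}
	}
	var bad []string
	for rack, n := range perRack {
		if n > 1 {
			bad = append(bad, rack)
		}
	}
	sort.Strings(bad) // deterministic order for logging
	return bad
}

func main() {
	hosts := []string{"pve1", "pve2", "pve4"}
	hostRack := map[string]string{"pve1": "rack_a", "pve2": "rack_a", "pve4": "rack_b"}
	// Columns: VM 101, 102, 103, all tagged ha(db_cluster).
	matrix := [][]int{
		{1, 0, 0}, // pve1 hosts 101
		{0, 1, 0}, // pve2 hosts 102
		{0, 0, 1}, // pve4 hosts 103
	}
	// No two replicas share a host, but 101 and 102 share rack_a.
	fmt.Println(rackViolations(matrix, hosts, hostRack, []int{0, 1, 2}))
}
```

This is the case that host-level constraints alone cannot exclude: a single top-of-rack failure in rack_a would remove two of the three replicas.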
27.2.3 Anti-Affinity Constraint Graph
%%{init: {"themeVariables": {"fontSize": "14px"}}}%%
flowchart TD
DB1["VM 101 — db_primary\nha(db_cluster)\nColumn index 1\n8192 MiB / 8000 mc"]
DB2["VM 102 — db_replica\nha(db_cluster)\nColumn index 2\n8192 MiB / 8000 mc"]
H1["pve1 — Host row 1\nsum([P1_1, P1_2], #=< 1)\nCapacity: 27852 MiB safe\nP1_1=1 → P1_2=0 (propagation)"]
H2["pve2 — Host row 2\nsum([P2_1, P2_2], #=< 1)\nCapacity: 27852 MiB safe\nP2_1=1 → P2_2=0 (propagation)"]
H3["pve3 — Host row 3\nsum([P3_1, P3_2], #=< 1)\nCapacity: 27852 MiB safe\nP3_1=1 → P3_2=0 (propagation)"]
OK["Valid: pve1→db_primary\npve2→db_replica\nDifferent hosts ✓\nBoth sum constraints =< 1 ✓"]
FAIL["Invalid: pve1→db_primary\npve1→db_replica\nSame host: sum=2 > 1\nPropagation: contradiction"]
DB1 --->|"P1..P3 column"| H1
DB1 --->|"P1..P3 column"| H2
DB1 --->|"P1..P3 column"| H3
DB2 --->|"P1..P3 column"| H1
DB2 --->|"P1..P3 column"| H2
DB2 --->|"P1..P3 column"| H3
H1 --->|"sum satisfied"| OK
H2 --->|"sum satisfied"| OK
H1 --->|"sum violated"| FAIL
style DB1 fill:#1A2B4A,color:#FFFFFF
style DB2 fill:#1A2B4A,color:#FFFFFF
style H1 fill:#4A3A1A,color:#FFFFFF
style H2 fill:#4A3A1A,color:#FFFFFF
style H3 fill:#4A3A1A,color:#FFFFFF
style OK fill:#1A6B3A,color:#FFFFFF
style FAIL fill:#6B1A1A,color:#FFFFFF
27.2.4 Constraint-Only Baseline Variant
schedule_baseline/4 from Chapter 26 calls labeling/2 internally. The HA scheduler needs to post capacity constraints and anti-affinity constraints before the single labeling call. A constraint-only variant is therefore needed:
% schedule_baseline_constraints_only(+Hosts, +VMs, -PlacementMatrix, -SafeHosts)
% Identical to schedule_baseline/4 but does NOT call labeling/2.
% Returns the constrained-but-unlabeled matrix for further constraint posting.
% Added to vm_scheduler.pl as a fifth export for ha_scheduler.pl's use.
schedule_baseline_constraints_only(Hosts, VMs, PlacementMatrix, SafeHosts) :-
must_be(list, Hosts),
must_be(list, VMs),
Hosts \= [], VMs \= [],
vm_scheduler:safe_capacity_all(Hosts, SafeHosts),
% Extract 3-arg resource dimensions from 4-arg vm/4 terms if present:
maplist([VMIn, vm(N,R,C)]>>(
( VMIn = vm(N,R,C,_) -> true ; VMIn = vm(N,R,C) )
), VMs, VMs3),
capacity_solver:vm_capacity_check_multi(SafeHosts, VMs3, PlacementMatrix).
27.2.5 Master HA Scheduling Predicate
% schedule_ha(+Hosts, +VMs, +UseRackConstraints, -PlacementMatrix, -SafeHosts)
%
% Full HA-aware VM placement.
% Hosts: list of host(Name, RAM, CPU) terms
% VMs: list of vm(VMID, RAM, CPU, HATag) terms
% UseRackConstraints: true | false — whether to post rack-level constraints
% PlacementMatrix: output — fully ground matrix
% SafeHosts: output — threshold-adjusted hosts (for diagnostics)
%
% Throws:
% ha_infeasible(Hosts, VMs) — HA constraints are unsatisfiable
% (more replicas than fault domains, etc.)
% placement_infeasible(H,V) — capacity constraints unsatisfiable
schedule_ha(Hosts, VMs, UseRackConstraints, PlacementMatrix, SafeHosts) :-
    % Phase 1: Post capacity constraints only (no labeling).
    schedule_baseline_constraints_only(Hosts, VMs, PlacementMatrix, SafeHosts),
    % Phases 2-4 sit inside one condition: propagation can already fail while
    % the anti-affinity constraints are being posted, and that failure must
    % surface as ha_infeasible just like a failed labeling.
    (   % Phase 2: Post host-level anti-affinity.
        apply_ha_constraints(PlacementMatrix, VMs),
        % Phase 3: Optionally post rack-level anti-affinity.
        (   UseRackConstraints = true
        ->  apply_rack_constraints(Hosts, PlacementMatrix, VMs)
        ;   true
        ),
        % Phase 4: Label the extended constraint system.
        append(PlacementMatrix, AllVars),
        labeling([ffc, bisect], AllVars)
    ->  true
    ;   throw(ha_infeasible(Hosts, VMs))
    ).
% schedule_ha_timed: same as schedule_ha but wraps labeling in
% call_with_time_limit/2. See Chapter 26 §26.5 for the timeout contract.
schedule_ha_timed(Hosts, VMs, UseRackConstraints, TimeoutSecs,
                  PlacementMatrix, SafeHosts) :-
    must_be(positive_integer, TimeoutSecs),
    schedule_baseline_constraints_only(Hosts, VMs, PlacementMatrix, SafeHosts),
    % A failure while posting constraints or while labeling is reported as
    % ha_infeasible; time_limit_exceeded propagates unchanged.
    (   apply_ha_constraints(PlacementMatrix, VMs),
        (   UseRackConstraints = true
        ->  apply_rack_constraints(Hosts, PlacementMatrix, VMs)
        ;   true
        ),
        append(PlacementMatrix, AllVars),
        call_with_time_limit(TimeoutSecs, labeling([ffc, bisect], AllVars))
    ->  true
    ;   throw(ha_infeasible(Hosts, VMs))
    ).
27.3 The Build: From Matrix to Migration Commands
27.3.1 The Delta Problem
The HA solver produces a target state: a fully-ground matrix where every VM is assigned to a host. The cluster's current state is different — VMs are already running on their existing hosts. Applying the target state does not mean migrating all VMs; it means migrating only the VMs whose target host differs from their current host. The delta is the minimal set of qm migrate operations that transforms the current state into the target.
Current VM locations are queried from Proxmox before the solver runs and represented as current_vm_host/2 dynamic facts:
% current_vm_host(+VMID, +HostName)
% Asserted by the Go orchestrator before compute_migration_delta/4 is called.
% One fact per VM currently running in the cluster.
% VMID is an integer; HostName is an atom matching host(Name,...) terms.
:- dynamic current_vm_host/2.
27.3.2 compute_migration_delta/4
% compute_migration_delta(+Hosts, +VMs, +TargetMatrix, -SortedActions)
%
% Compares current VM locations (from current_vm_host/2 facts) against
% TargetMatrix and produces a list of structured terms describing the
% required actions.
%
% Hosts:         ordered list of host(Name, RAM, CPU) terms
% VMs:           ordered list of vm(VMID, RAM, CPU, HATag) terms
% TargetMatrix:  fully ground PlacementMatrix from schedule_ha/5
% SortedActions: list of action terms, sorted by target host so that Go
%                can migrate sequentially per host:
%                  migrate(VMID, TargetHost): VM runs elsewhere and must move
%                  start(VMID, TargetHost):   VM has no current_vm_host/2 fact
%                VMs already on their target host are omitted.
compute_migration_delta(Hosts, VMs, TargetMatrix, SortedActions) :-
must_be(list, Hosts),
must_be(list, VMs),
must_be(list, TargetMatrix),
findall(TargetHost-Action,
generate_vm_action(Hosts, VMs, TargetMatrix, TargetHost, Action),
Pairs),
    % keysort/2 is stable, so actions are grouped by destination host while
    % keeping their relative order within each host.
keysort(Pairs, SortedPairs),
pairs_values(SortedPairs, SortedActions).
% generate_vm_action(+Hosts, +VMs, +Matrix, -TargetHost, -Action)
generate_vm_action(Hosts, VMs, TargetMatrix, TargetHost, Action) :-
nth1(HostIdx, TargetMatrix, HostRow),
nth1(VMIdx, HostRow, 1),
nth1(HostIdx, Hosts, host(TargetHost, _, _)),
nth1(VMIdx, VMs, VM),
( VM = vm(VMID, _, _, _) -> true ; VM = vm(VMID, _, _) ),
( current_vm_host(VMID, CurrentHost)
-> ( CurrentHost \= TargetHost
->
        % Return structured terms, not shell strings. Emitting raw shell
        % strings from the WAM would break the separation of concerns and
        % create a command-injection surface. The Go layer parses this
        % term and maps it to os/exec arguments.
Action = migrate(VMID, TargetHost)
; fail % Already on target
)
;
Action = start(VMID, TargetHost)
).
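For unit-testing the orchestrator without a Prolog engine, the same delta logic can be mirrored in Go. This is a hedged sketch, not the chapter's implementation: the Action type, the migrationDelta name, and the use of plain slices and a map for the current state are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is a hypothetical Go counterpart of the migrate/2 and start/2 terms.
type Action struct {
	Kind   string // "migrate" or "start"
	VMID   int
	Target string
}

// migrationDelta walks a ground target matrix (rows = hosts, columns = VMs)
// and emits an action for every VM whose target host differs from its current
// host. VMs with no entry in current are treated as new and get a start action.
func migrationDelta(hosts []string, vmids []int, target [][]int, current map[int]string) []Action {
	var actions []Action
	for h, row := range target {
		for v, placed := range row {
			if placed != 1 {
				continue
			}
			cur, running := current[vmids[v]]
			switch {
			case !running:
				actions = append(actions, Action{"start", vmids[v], hosts[h]})
			case cur != hosts[h]:
				actions = append(actions, Action{"migrate", vmids[v], hosts[h]})
			}
		}
	}
	// Stable sort by target host, matching the keysort/2 grouping in Prolog.
	sort.SliceStable(actions, func(i, j int) bool {
		return actions[i].Target < actions[j].Target
	})
	return actions
}

func main() {
	hosts := []string{"pve1", "pve2", "pve3"}
	vmids := []int{101, 102, 104, 105}
	target := [][]int{
		{1, 0, 1, 0},
		{0, 1, 0, 1},
		{0, 0, 0, 0},
	}
	current := map[int]string{101: "pve1", 102: "pve1", 104: "pve1", 105: "pve3"}
	for _, a := range migrationDelta(hosts, vmids, target, current) {
		fmt.Printf("%s(%d,%s)\n", a.Kind, a.VMID, a.Target)
	}
}
```

Comparing this function's output with the terms returned by the WAM for the same inputs gives a cheap cross-check of the dispatch layer.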
27.3.3 Migration Delta Verification
root@logic-node-01:~# swipl \
-l /opt/logic-node/kb/capacity_solver.pl \
-l /opt/logic-node/kb/vm_scheduler.pl \
-l /opt/logic-node/kb/ha_scheduler.pl \
-g "
% Current state: db replicas co-located on pve1 (HA violation).
assertz(ha_scheduler:current_vm_host(101, pve1)),
assertz(ha_scheduler:current_vm_host(102, pve1)), % replica on same host as primary!
assertz(ha_scheduler:current_vm_host(104, pve1)),
assertz(ha_scheduler:current_vm_host(105, pve3)),
Hosts = [
host(pve1, 32768, 48000),
host(pve2, 32768, 48000),
host(pve3, 32768, 48000)
],
VMs = [
vm(101, 8192, 8000, ha(db_cluster)), % primary — must not share host
vm(102, 8192, 8000, ha(db_cluster)), % replica — must not share host
vm(104, 4096, 4000, ha(web_tier)),
vm(105, 4096, 4000, ha(web_tier))
],
% Solve with HA constraints:
ha_scheduler:schedule_ha(Hosts, VMs, false, Matrix, _),
% Verify anti-affinity: no two db_cluster VMs on the same host:
format('Target matrix:~n'),
pairs_keys_values(Pairs, Hosts, Matrix),
maplist([host(N,_,_)-Row]>>(format(' ~w: ~w~n', [N, Row])), Pairs),
nl,
% Generate migration commands:
ha_scheduler:compute_migration_delta(Hosts, VMs, Matrix, Commands),
length(Commands, Count),
format('Migration commands (~w total):~n', [Count]),
maplist([C]>>(format(' ~w~n', [C])), Commands),
halt
"
Target matrix:
pve1: [1,0,1,0] ← vm101 (db_primary), vm104 (web A)
pve2: [0,1,0,1] ← vm102 (db_replica), vm105 (web B)
pve3: [0,0,0,0] ← empty
Migration commands (2 total):
 migrate(102,pve2)
 migrate(105,pve2)
The solver moved vm102 from pve1 to pve2, resolving the HA violation. vm101 and vm104 are already on pve1, their target host, so no action is generated for them. vm105 moves from pve3 to pve2 even though its current placement already satisfied the web_tier anti-affinity rule: the solver has no knowledge of current locations and does not minimise migrations, so its first valid solution simply places vm105 on pve2.
The migrate(102, pve2) logical term represents the intent to migrate. The Go orchestrator parses this term and constructs the complete Proxmox CLI invocation (qm migrate 102 pve2 --online --migration_network 10.40.0.0/24) using os/exec. --online triggers a live migration, keeping the VM running throughout the memory copy. --migration_network 10.40.0.0/24 directs the heavy RAM copy phase over VLAN 40 (the 10.40.0.0/24 metrics and backend network) rather than the management interface. This is operationally significant: an automated cluster rebalance triggered by the HA scheduler during an incident will migrate several VMs in rapid succession. Without the network flag, Proxmox falls back to its default migration network, which is typically the management VLAN (192.168.100.0/24). On a 1 GbE management link shared with Proxmox API traffic, Grafana dashboards, and the Go orchestrator's SSE stream, a live RAM copy of an 8 GB database VM saturates the link and disrupts the very monitoring and control-plane traffic that the incident response depends on. VLAN 40 is physically separate, unshared with management traffic, and sized for high-throughput operations, which makes it the correct migration transport.
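The saturation claim can be made concrete with a back-of-the-envelope calculation. The helper below is an idealised sketch: it ignores protocol overhead, compression, and the repeated copying of pages dirtied during a live migration, so real migrations take longer. The 10 Gb/s figure in the usage is a hypothetical link speed chosen for comparison; the chapter does not state the speed of VLAN 40.

```go
package main

import "fmt"

// transferSeconds returns the idealised time to copy ramMiB of memory over a
// link of linkMbps megabits per second (decimal megabits, as link speeds are
// usually quoted). It is a lower bound, not a prediction.
func transferSeconds(ramMiB int, linkMbps float64) float64 {
	bits := float64(ramMiB) * 1024 * 1024 * 8
	return bits / (linkMbps * 1e6)
}

func main() {
	// One 8 GiB database VM over a fully dedicated 1 GbE link.
	fmt.Printf("1 GbE:  %.1f s\n", transferSeconds(8192, 1000))
	// The same VM over a hypothetical 10 GbE migration network.
	fmt.Printf("10 GbE: %.1f s\n", transferSeconds(8192, 10000))
}
```

Even at full line rate, a single 8 GiB copy occupies a 1 GbE link for more than a minute, and a rebalance that moves several VMs in succession extends that window accordingly.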
27.3.4 Safe Execution in the Go Orchestrator
By returning structured logical terms (migrate(102, pve2)) instead of formatted Bash strings ("qm migrate 102..."), the WAM preserves the strict separation of concerns: Prolog determines what must be done, and Go determines how to do it securely.
If the WAM were permitted to emit raw shell strings, the Go orchestrator would be forced to pass them to bash -c, opening a theoretical Command Injection vulnerability if the node vocabulary were ever dynamically poisoned. Instead, the Go layer parses the structured term and passes the arguments directly to the kernel via os/exec, entirely bypassing the shell.
// File: /opt/logic-node/go/orchestrator/ha_scheduler_dispatch.go
// (Additions to the HA execution pipeline)
import (
	"context"
	"fmt"
	"log"
	"os/exec"
	"strconv"
	"strings"
)
// ExecuteMigrationPlan parses the structured terms returned by the WAM
// (e.g., "migrate(102, pve2)") and executes them securely.
func (s *Server) ExecuteMigrationPlan(ctx context.Context, actionTerms []string) error {
for _, action := range actionTerms {
// Basic parsing to extract VMID and TargetHost from the Prolog term
// Expected format: "migrate(102, pve2)" or "start(106, pve3)"
if strings.HasPrefix(action, "migrate(") {
parsed := strings.TrimSuffix(strings.TrimPrefix(action, "migrate("), ")")
parts := strings.Split(parsed, ",")
if len(parts) != 2 {
continue
}
vmidStr := strings.TrimSpace(parts[0])
targetHost := strings.TrimSpace(parts[1])
// Strict type validation: ensure VMID is a valid integer before execution
if _, err := strconv.Atoi(vmidStr); err != nil {
log.Printf("[HAScheduler] Invalid VMID from WAM: %s", vmidStr)
continue
}
			// SECURITY: Use exec.CommandContext. The arguments are passed to the
			// process as an argument vector, bypassing the shell entirely, so
			// shell metacharacters in targetHost are never interpreted. This does
			// not by itself stop a value beginning with "-" from being read by qm
			// as an option, so targetHost should still be validated against the
			// known node names before this point.
cmd := exec.CommandContext(ctx, "qm", "migrate", vmidStr, targetHost,
"--online", "--migration_network", "10.40.0.0/24")
log.Printf("[HAScheduler] Executing migration: VM %s -> %s", vmidStr, targetHost)
output, err := cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("migration of %s failed: %v, output: %s", vmidStr, err, string(output))
}
} else if strings.HasPrefix(action, "start(") {
// Handle start(VMID, Host) logic here...
log.Printf("[HAScheduler] Start action required for new VM: %s", action)
}
}
return nil
}
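The prefix-and-split parsing above accepts any text between the comma and the closing parenthesis as a host name. A stricter alternative is to match the whole term against a fixed pattern and reject everything else. The sketch below is one way to do that; PlannedAction, ParseAction, and the host-name pattern are assumptions introduced for illustration, and the pattern would need adjusting if node names in a given cluster contain other characters.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// actionRe accepts only migrate(<digits>,<host>) or start(<digits>,<host>),
// where <host> is a lowercase alphanumeric atom. The host pattern is an
// assumption about the node vocabulary, not something the WAM guarantees.
var actionRe = regexp.MustCompile(`^(migrate|start)\(\s*([0-9]+)\s*,\s*([a-z][a-z0-9_]*)\s*\)$`)

// PlannedAction is a hypothetical parsed form of one action term.
type PlannedAction struct {
	Kind   string
	VMID   int
	Target string
}

// ParseAction validates and parses a single action term. Anything that does
// not match the expected shape is returned as an error instead of skipped.
func ParseAction(term string) (PlannedAction, error) {
	m := actionRe.FindStringSubmatch(term)
	if m == nil {
		return PlannedAction{}, fmt.Errorf("unrecognised action term %q", term)
	}
	vmid, err := strconv.Atoi(m[2])
	if err != nil {
		return PlannedAction{}, fmt.Errorf("VMID out of range in %q: %w", term, err)
	}
	return PlannedAction{Kind: m[1], VMID: vmid, Target: m[3]}, nil
}

func main() {
	for _, term := range []string{"migrate(102, pve2)", "start(106,pve3)", "migrate(102, --force)"} {
		a, err := ParseAction(term)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s VM %d -> %s\n", a.Kind, a.VMID, a.Target)
	}
}
```

Returning an error for a malformed term, rather than continuing, keeps a partially understood plan from being executed as if it were complete.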
27.4 Sovereign Security: The Unsatisfiable State
27.4.1 The Three-Replica, Two-Host Problem
The anti-affinity constraint sum([P_I, P_J], #=<, 1) guarantees that no two replicas in a group share a host. For a group of three replicas, this requires at minimum three hosts with sufficient capacity for one replica each. When the cluster has only two healthy hosts (the third is degraded and excluded by the Chapter 22 live health gate, or is offline for maintenance), the constraint system is unsatisfiable: three replicas, pairwise anti-affinity, two available hosts. By the pigeonhole principle, no valid assignment exists.
The solver will detect this. After posting the capacity constraints, the health exclusions (if live health gating is active), and the anti-affinity constraints, no consistent assignment remains. Propagation alone may already reduce a variable's domain to the empty set while the constraints are being posted; otherwise labeling/2 exhausts the search space without finding a solution. Either way, the goal fails.
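The pigeonhole condition is cheap to test before the solver is even invoked. The Go sketch below is a hypothetical pre-flight check (the function name and inputs are illustrative): it flags any HA group with more members than healthy hosts. It is a necessary condition only; a group that passes can still be infeasible because of capacity or rack-level constraints, so the check complements the solver rather than replacing it.

```go
package main

import (
	"fmt"
	"sort"
)

// oversizedGroups returns the HA groups that have more members than there are
// healthy hosts. By the pigeonhole principle such a group cannot satisfy
// host-level anti-affinity, whatever the capacities are.
func oversizedGroups(groupSizes map[string]int, healthyHosts int) []string {
	var out []string
	for group, members := range groupSizes {
		if members > healthyHosts {
			out = append(out, group)
		}
	}
	sort.Strings(out) // deterministic order for alerts and logs
	return out
}

func main() {
	sizes := map[string]int{"db_cluster": 3, "web_tier": 2}
	// Two healthy hosts: db_cluster (3 replicas) cannot be separated.
	fmt.Println(oversizedGroups(sizes, 2))
}
```

Naming the offending group in the alert gives the operator a more actionable message than the bare infeasibility result.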
27.4.2 Why Silent Failure Is Operationally Catastrophic
If schedule_ha/5 simply fails when the system is unsatisfiable, the Go orchestrator receives a WorkResult with Err = "goal failed" and an empty Matrix field. A careless Go caller checks only result.Err == nil and, finding a non-nil error, logs a warning and continues. The cluster retains its current placement, which is the placement that violated the HA constraint in the first place. The VM that was co-located with its replica before the scheduling attempt is still co-located after it. No migration occurs. No alert fires.
This is the silent failure mode: the system attempted to enforce an HA rule, failed to do so, and reported the failure in a way that the calling layer could silently absorb. The infrastructure is now in a state that is both non-compliant and believed-compliant by the orchestrator. A subsequent host failure will take down two replicas simultaneously — precisely the scenario the HA rule was designed to prevent.
27.4.3 The ha_infeasible Exception Contract
schedule_ha/5 never simply fails on an unsatisfiable HA system. When the constraint system is unsatisfiable, it throws:
throw(ha_infeasible(Hosts, VMs))
The exception term carries the full host and VM lists, which the Go layer can format into a structured alert. The exception propagates through call_with_time_limit/2 correctly — time_limit_exceeded and ha_infeasible are distinct exception terms that the Go catch handler discriminates:
% Verification: three replicas, two hosts — must throw ha_infeasible.
?- Hosts = [host(pve1, 32768, 48000), host(pve2, 32768, 48000)],
VMs = [ vm(101, 8192, 8000, ha(db_cluster)),
vm(102, 8192, 8000, ha(db_cluster)),
vm(103, 8192, 8000, ha(db_cluster)) ],
catch(
ha_scheduler:schedule_ha(Hosts, VMs, false, _, _),
ha_infeasible(_, _),
writeln('PASS: ha_infeasible thrown — pager alert warranted')
).
PASS: ha_infeasible thrown — pager alert warranted.
The Go dispatcher catches the structured exception and escalates immediately:
// File: /opt/logic-node/go/orchestrator/ha_scheduler_dispatch.go
package main
import (
"context"
"fmt"
"log"
"time"
)
// HASchedulerResult carries the outcome of an HA scheduling attempt.
type HASchedulerResult struct {
PlacementMatrix [][]int
Timedout bool
HAInfeasible bool // true iff ha_infeasible was thrown — requires pager
Err error
}
func (s *Server) RequestHASchedule(
ctx context.Context,
hostTerms, vmTerms string,
useRack bool,
timeoutSecs int,
) HASchedulerResult {
rackAtom := "false"
if useRack {
rackAtom = "true"
}
goal := fmt.Sprintf(
`ha_scheduler:schedule_ha_timed(%s, %s, %s, %d, Matrix, _)`,
hostTerms, vmTerms, rackAtom, timeoutSecs,
)
result, err := s.pool.Dispatch(WorkItem{Goal: goal},
time.Duration(timeoutSecs+5)*time.Second)
if err != nil {
return HASchedulerResult{Err: err}
}
switch result.ErrorTerm {
case "":
return HASchedulerResult{PlacementMatrix: result.Matrix}
case "time_limit_exceeded":
return HASchedulerResult{Timedout: true}
default:
if containsAtom(result.ErrorTerm, "ha_infeasible") {
log.Printf("[HAScheduler] CRITICAL: HA constraints unsatisfiable — cluster exhausted")
// Publish P1 alert: human intervention required immediately.
s.broker.Publish(fmt.Sprintf(
"event: ha_infeasible\ndata: {\"term\":%q,\"action\":\"pager_p1\"}\n\n",
result.ErrorTerm,
))
return HASchedulerResult{HAInfeasible: true}
}
return HASchedulerResult{Err: fmt.Errorf("HA scheduler: %v", result.ErrorTerm)}
}
}
27.4.4 Mathematical Guarantee vs Policy Bypass
The critical property of the CLP(FD) approach is that the infeasibility proof is not a policy check; it is a mathematical consequence of the constraint system. An ha_infeasible exception does not mean "the scheduler's rule says no." It means that constraint propagation, together with the exhaustive search performed by labeling/2, has established, with the same finality as a Prolog unification failure, that no ground assignment of the placement variables simultaneously satisfies the capacity bounds and the anti-affinity constraints.
There is no bypass. The scheduler cannot be instructed to "try anyway" and produce a non-compliant placement: the infeasibility is provable from the variable domains, and no amount of additional labeling will find a solution that does not exist. If an operator wants a placement that violates an anti-affinity rule, they must change the rule — remove the HA tag or reduce the replica count — and rerun the solver. They cannot pass a flag to schedule_ha/5 that relaxes an active constraint without modifying the constraint model. This is the operational security guarantee of a logic-based scheduler: the rules are enforced by the mathematics of the constraint system, not by conditional branches in imperative code that can be bypassed with a flag.
The hard anti-affinity implemented here is the correct default for production HA groups: two database replicas on the same physical host is never acceptable, regardless of capacity headroom. For contexts where soft separation is preferable to outright scheduling failure (a non-critical workload group where co-location is degraded but not catastrophic), CLP(FD) reified constraints provide an intermediate mechanism. A reified constraint (Sum #=< 1) #<==> B posts a boolean variable B that equals 1 if the anti-affinity is satisfied and 0 if it is violated, without making either outcome a hard constraint:
% Soft anti-affinity via reification (illustrative; not used in schedule_ha/5).
% Scores each (pair, host row) separation: B=1 if the pair does not share
% that host, B=0 if both members are placed on it.
% The caller maximises the count of satisfied separations during labeling.
apply_soft_ha_constraints(PlacementMatrix, VMs, TotalSatisfied) :-
    % Collect index pairs (I < J) of VMs sharing an HA group. The pairs are
    % ground integers, so findall/3 is safe here.
    findall(I-J,
            ( nth1(I, VMs, VM_I), extract_ha_tag(VM_I, ha(Group)),
              nth1(J, VMs, VM_J), I < J,
              extract_ha_tag(VM_J, ha(Group))
            ),
            Pairs),
    % Build the reified booleans with foldl/maplist rather than findall/3:
    % findall/3 copies its results, which would detach each B from the
    % placement variables it is meant to observe.
    foldl(reify_pair(PlacementMatrix), Pairs, [], Bs),
    sum(Bs, #=, TotalSatisfied).

% reify_pair(+Matrix, +I-J, +Bs0, -Bs): one boolean per host row for the pair.
reify_pair(PlacementMatrix, I-J, Bs0, Bs) :-
    maplist(reify_row(I, J), PlacementMatrix, RowBs),
    append(RowBs, Bs0, Bs).

reify_row(I, J, Row, B) :-
    nth1(I, Row, P_I),
    nth1(J, Row, P_J),
    (P_I + P_J #=< 1) #<==> B.

% Caller maximises TotalSatisfied during labeling:
%   labeling([ffc, bisect, max(TotalSatisfied)], AllVars)
% The solver finds the placement that satisfies the most separation constraints.
% If strict separation of all pairs is achievable, TotalSatisfied equals the
% maximum; if a pair must be co-located (insufficient hosts), TotalSatisfied is
% reduced by the count of violated pairs, but the solver does not fail.
Operational Visibility of Soft Constraints: When maximising TotalSatisfied, the solver may successfully find a placement where TotalSatisfied is less than the theoretical maximum (i.e., some HA pairs were forced to co-locate due to hardware pressure). The orchestrator must not silently accept this state. A post-solve verification step must iterate through the generated PlacementMatrix, identify which specific boolean reification variables resolved to 0, and log a high-priority warning: "WARN: Insufficient capacity. DB_Cluster replicas 101 and 102 co-located on pve1." Without this explicit logging, an operator remains blind to their degraded HA posture until a physical node fails.
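The post-solve verification step for soft constraints can be done on the Go side from the ground matrix alone, without access to the reification variables. The sketch below is one possible shape for it; CoLocation, findCoLocations, and the parallel-slice inputs are hypothetical names chosen for illustration, and an empty group string stands for a standalone VM.

```go
package main

import "fmt"

// CoLocation records two members of the same HA group sharing one host.
type CoLocation struct {
	Host  string
	Group string
	A, B  int // VMIDs of the co-located pair
}

// findCoLocations scans a ground placement matrix (rows = hosts, columns = VMs)
// and returns every pair of same-group VMs placed on the same host.
// groups[i] is the HA group of column i, or "" for a standalone VM.
func findCoLocations(matrix [][]int, hosts []string, vmids []int, groups []string) []CoLocation {
	var out []CoLocation
	for h, row := range matrix {
		for i := 0; i < len(row); i++ {
			if row[i] != 1 || groups[i] == "" {
				continue
			}
			for j := i + 1; j < len(row); j++ {
				if row[j] == 1 && groups[j] == groups[i] {
					out = append(out, CoLocation{hosts[h], groups[i], vmids[i], vmids[j]})
				}
			}
		}
	}
	return out
}

func main() {
	hosts := []string{"pve1", "pve2"}
	vmids := []int{101, 102, 106}
	groups := []string{"db_cluster", "db_cluster", ""}
	matrix := [][]int{
		{1, 1, 0}, // both db_cluster replicas on pve1
		{0, 0, 1},
	}
	for _, c := range findCoLocations(matrix, hosts, vmids, groups) {
		fmt.Printf("WARN: Insufficient capacity. %s replicas %d and %d co-located on %s.\n",
			c.Group, c.A, c.B, c.Host)
	}
}
```

Running the same scan after a hard-constraint solve is a useful assertion as well: it should always return an empty slice, and a non-empty result would indicate a bug in constraint posting.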
The #<==> operator is the bidirectional reification connective in SWI-Prolog's CLP(FD). It is not a new constraint type — it is a meta-constraint that wraps any existing arithmetic constraint and connects its truth value to a boolean variable. The soft variant is appropriate for workload classes where the HA tag represents a preference rather than a hard guarantee, and where an operator has explicitly accepted that co-location may occur under extreme host pressure. It is not appropriate for database primaries and replicas, where a joint host failure during co-location produces simultaneous data loss and service interruption — a categorically different failure mode from either replica failing independently.