Chapter 8: Advanced Data Structures (Dicts)
Textbook: Modern SWI-Prolog (2026 Edition): Sovereign Infrastructure & Industrial Logic Volume: I — Foundations of Sovereign Logic Chapter: 8 of 24 Audience: Senior Engineers, Systems Architects, Infrastructure Security Practitioners Prerequisites: Chapters 1–7 complete. WAM Heap layout, compound term representation, unification, list processing, accumulator pattern, and the LFCG oracle pattern operational.
proxmox_inventory.pl, zfs_oracle.pl, and batch_oracle.pl loaded at /opt/logic-node/. logicadmin user active.
Core Concepts
Every infrastructure record is a named collection of typed fields: a host has a name, a MAC address, a RAM capacity, a storage layout, a cluster role. In the KB architecture of Chapters 1–7, these fields are encoded as positional arguments in compound terms: physical_host('pve-node-01', 'AA:BB:CC:DD:EE:01', 256). The field at position 3 is RAM. This is known only to the programmer. The term carries no field names. Refactoring a four-argument predicate to five arguments requires updating every clause, every call site, every oracle that pattern-matches against it — a schema migration with zero tooling support.
SWI-Prolog's Dict resolves this by giving compound terms named fields, a sorted key space for O(log n) access, and a type tag that encodes the data's origin or schema class. A Dict is not an association list (a sorted list of Key-Value pairs) and not a dynamic assertz table. It is a first-class Prolog term on the WAM Heap — unifiable, pattern-matchable, passable as a predicate argument, and garbage-collected with the same lifecycle as any other heap-allocated structure.
Five properties define the Dict as a sovereign infrastructure data primitive.
1. A Dict is a symbolic map with positional performance.
Dict keys are atoms, sorted at construction time. Internal access uses binary search over the key array — O(log n) per lookup for n keys. This is not a hash table: there is no hash function, no bucket chain, no resize cost. The layout is compact and deterministic. A host{} Dict with 6 keys uses a fixed Heap layout after the sort is performed at construction — O(log n) key lookup, O(1) tag access, O(n) full traversal for dict_pairs/3. For the infrastructure records in this textbook — hosts with 5–10 fields, VMs with 6–8 fields — "positional performance" means the access cost is always 2–4 binary search steps.
2. The tag is a type discriminator, not a comment.
Every Dict has a tag — an atom or unbound variable that sits before the {...} body. host{name:'pve-node-01', ram:256} has tag host. vm{id:100, name:'nginx-prod-01'} has tag vm. Tag unification is the first operation when pattern-matching a Dict argument: host{name:N} = vm{name:N} fails immediately on the tag check, before any key is examined. For the Logic Node, tags partition the KB: all host{} Dicts in one predicate, all vm{} Dicts in another, all storage{} Dicts in a third. The indexing benefit is the same as Chapter 6's first-argument indexing — the tag is the first thing unified.
3. Dict "updates" are pure logical transformations, not mutations.
NewDict = OldDict.put(key, Value) does not modify OldDict. It allocates a new Dict on the Heap with all the key-value pairs of OldDict plus the key: Value entry (adding or overriding). OldDict remains unchanged — it is still a valid Prolog term, still reachable if bound to a variable in an outer scope. The semantics are identical to atom_string/2 or any other deterministic built-in that produces a new term. The word "update" in dict documentation is a shorthand for "produce a new Dict reflecting this change," not an in-place write. A predicate that builds a revised host{} Dict with status: maintenance has not mutated the KB — it has produced a new term that can be passed to the command oracle without touching the static KB.
4. Dot notation is structural unification with access control.
Dict.key is not a method call. It is a unification: the runtime resolves the key in the sorted key array and returns the associated value. If the key does not exist in the Dict, the behaviour is controlled by context: in a get_dict/3 call, the predicate fails; via dot notation in an expression context, it throws existence_error(key, key, Dict). This fail-closed behaviour is a security property: a missing field does not silently produce an unbound variable that propagates as a false ground term — it halts the proof. Infrastructure code that expects host.ram to be present and finds it missing gets an immediate exception, not a subtly wrong result.
5. Dicts are the correct bridge between JSON and Prolog logic.
The Proxmox API, the Netbox IPAM, and every modern infrastructure management platform speaks JSON. JSON objects map directly to SWI-Prolog Dicts: keys become atoms, values become Prolog terms, nested objects become nested Dicts. library(http/json) and json_read_dict/3 perform this translation. The security hazard in this pipeline is that JSON keys from external sources become Prolog atoms — and Prolog atoms are interned in the Atom Table permanently. A JSON payload with 100,000 unique keys exhausts the Atom Table and crashes the Logic Node. The key whitelist validator in Section 8.5 closes this vector before any external key reaches atom_to_term.
Chapter Roadmap
| Section | Title | Focus |
|---|---|---|
| 8.1 | The Anatomy of the SWI-Prolog Dict | Syntax, tag discriminator, WAM Heap layout, key sort |
| 8.2 | The Dot Operator as a Security Gate | Key access, missing-key behaviour, functional updates |
| 8.3 | Translation: JSON to Sovereign Logic | json_read_dict/3, inventory refactor, inventory_entry/1 |
| 8.4 | The Build: Modernising the Command Oracle | Dict-aware replace_disk/4, dot-notation oracle paths |
| 8.5 | Security: Atom Table Exhaustion (Revisited) | Key whitelist validator, tag-based KB partitioning |
| Outcome | Schema-as-Logic | Verification checklist, conceptual transition |
8.1 The Anatomy of the SWI-Prolog Dict
8.1.1 Syntax and Construction
A Dict is written as:
Tag{Key1: Value1, Key2: Value2, ..., KeyN: ValueN}
Tag is either an atom identifying the schema class (host, vm, storage) or an unbound variable for generic operations. Keys must be atoms or small integers. Values are any Prolog terms: atoms, integers, strings, lists, nested Dicts.
% Host Dict — tag: host, six named fields
Host = host{
name: 'pve-node-01',
mac: 'AA:BB:CC:DD:EE:01',
ram: 256,
role: primary,
cluster: 'sovereign-cluster-01',
status: online
}.
% VM Dict — tag: vm
VM = vm{
id: 100,
name: 'nginx-prod-01',
host: 'pve-node-01',
status: running,
vlan: 20
}.
% Nested Dict: storage spec inside a host
HostWithStorage = host{
name: 'pve-node-01',
ram: 256,
storage: storage{
layout: raidz2,
disks: ['WD-WX11A2K3P801', 'WD-WX11A2K3P802',
'WD-WX11A2K3P803', 'WD-WX11A2K3P804'],
pool: 'data-pve-node-01'
}
}.
Dict construction is deterministic. The key-value pairs are sorted by key at construction time — the internal representation has keys in lexicographic order regardless of the order they appear in source. host{ram:256, name:'pve-node-01'} and host{name:'pve-node-01', ram:256} are the same Dict.
% Key order is normalised:
?- D1 = host{ram:256, name:'pve-node-01'},
D2 = host{name:'pve-node-01', ram:256},
D1 = D2.
D1 = D2 = host{name:'pve-node-01', ram:256}. % Identical — order normalised
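The same normalisation can be observed through dict_pairs/3, which builds a Dict from pairs given in any order and returns them ordered by key. A minimal sketch; record_from_pairs/3 and record_pairs/2 are hypothetical helper names introduced only for this illustration:

```prolog
%% record_from_pairs(+Tag, +Pairs, -Dict)
%% Builds a Dict from a Key-Value list given in any order.
record_from_pairs(Tag, Pairs, Dict) :-
    dict_pairs(Dict, Tag, Pairs).

%% record_pairs(+Dict, -Pairs)
%% Returns the Key-Value pairs of Dict, ordered by key.
record_pairs(Dict, Pairs) :-
    dict_pairs(Dict, _Tag, Pairs).

% ?- record_from_pairs(host, [ram-256, name-'pve-node-01'], D),
%    record_pairs(D, P).
% P = [name-'pve-node-01', ram-256].
```

Because the Dict is stored in one canonical form, two Dicts built from differently ordered pair lists compare equal with ==.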
8.1.2 The Tag as Type Discriminator
The tag is not syntactic decoration. It takes part in every Dict unification, so a host{...} pattern never matches a vm{...} Dict. Plain unification (=) also requires both Dicts to carry exactly the same key set, so a head pattern such as host{ram:RAM} would match only Dicts whose sole key is ram. To select a subset of fields while still enforcing the tag, use the sub-dict operator :< in the body:
% Tag discrimination at the predicate boundary:
host_ram(Dict, RAM) :- host{ram:RAM} :< Dict. % Only matches Dicts with tag 'host'
?- host_ram(host{name:'pve-node-01', ram:256}, R).
R = 256.
?- host_ram(vm{id:100, ram:4}, R).
false. % Tag 'vm' ≠ 'host': the tags do not unify
?- host_ram(_{ram:128}, R).
R = 128. % An unbound tag (_) unifies with ANY tag, so anonymous Dicts are accepted
The unbound tag _{} is the "accept any Dict" pattern. host{} is the "accept only host Dicts" pattern. For the Logic Node's oracle predicates, every predicate that expects a specific infrastructure record type uses a bound tag. Generic utility predicates (key validators, serialisers) use _{}.
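Dispatching on the tag across several record types can be sketched as follows. entry_summary/2 is a hypothetical helper, not part of the Logic Node code base; it checks the tag with is_dict/2 and reads fields with get_dict/3, so any additional keys on the Dict are ignored:

```prolog
%% entry_summary(+Dict, -Summary)
%% Dispatches on the Dict tag; each clause reads only the key it needs.
entry_summary(Dict, host(Name)) :-
    is_dict(Dict, host), !,
    get_dict(name, Dict, Name).
entry_summary(Dict, vm(Id)) :-
    is_dict(Dict, vm), !,
    get_dict(id, Dict, Id).
entry_summary(Dict, other(Tag)) :-
    is_dict(Dict, Tag).          % any other tagged Dict

% ?- entry_summary(host{name:'pve-node-01', ram:256}, S).
% S = host('pve-node-01').
% ?- entry_summary(storage{layout:raidz2}, S).
% S = other(storage).
```

Note that is_dict/2 unifies its second argument with the tag, so an anonymous Dict passed to this sketch would be treated as a host. Callers that must reject untagged Dicts should inspect the tag with dict_pairs/3 first.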
8.1.3 WAM Heap Layout
A Dict is stored on the WAM Heap as a tagged compound structure. SWI-Prolog's internal representation places:
- A dict functor cell with arity proportional to the number of key-value pairs
- The tag term (atom or variable reference)
- Alternating key atoms and value terms, in sorted key order
WAM Heap layout for host{name:'pve-node-01', ram:256, status:online}:
(3 key-value pairs → sorted: name, ram, status)
Address │ Content
────────┼──────────────────────────────────────
H₀ │ DICT functor tag, arity=7 (1 tag + 3 keys + 3 values)
H₁ │ TAG: atom 'host'
H₂ │ KEY: atom 'name' ← sorted key 1
H₃ │ VAL: atom 'pve-node-01'
H₄ │ KEY: atom 'ram' ← sorted key 2
H₅ │ VAL: integer 256
H₆ │ KEY: atom 'status' ← sorted key 3
H₇ │ VAL: atom 'online'
Access to D.ram:
- Read H₁: verify tag matches (if tag is bound in call context)
- Binary search over keys H₂, H₄, H₆ for atom ram
- Match found at H₄, return value at H₅ → integer 256
For a 3-key Dict: 1–2 binary search comparisons. For a 10-key Dict: 3–4 comparisons. The access cost is strictly bounded by the key count, which is bounded by the schema.
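The comparison counts quoted above follow from the textbook worst case for binary search over n sorted keys, floor(log2(n)) + 1. A small sketch that computes the bound with integer arithmetic; max_lookup_steps/2 is a hypothetical name, and the figure models the algorithm rather than measuring SWI-Prolog's internals:

```prolog
:- use_module(library(error)).

%% max_lookup_steps(+KeyCount, -Steps)
%% Worst-case comparisons for binary search over KeyCount sorted keys.
%% msb/1 returns the index of the most significant set bit, which is
%% floor(log2(KeyCount)) for positive integers.
max_lookup_steps(KeyCount, Steps) :-
    must_be(positive_integer, KeyCount),
    Steps is msb(KeyCount) + 1.

% ?- max_lookup_steps(3, S).
% S = 2.
% ?- max_lookup_steps(10, S).
% S = 4.
```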
Contrast with a traditional compound term physical_host('pve-node-01', 'AA:BB...', 256):
- Accessing ram (position 3) requires knowing it is position 3
- Structural change (adding a field at position 2) requires updating every call site
- No tag: any physical_host/3 is structurally indistinguishable from any other 3-argument term
8.1.4 Diagram: Dict Memory Layout vs. Compound Term
%%{init: {"themeVariables": {"fontSize": "16px"}}}%%
flowchart TD
subgraph DICT["Dict: host{name:'pve-node-01',<br/> ram:256, status:online}"]
direction TB
D_TOP[" "] ---> DH0["H₀: DICT functor\narity=7"]
DH1["H₁: TAG\natom 'host'"]
DH2["H₂: KEY 'name'\nH₃: VAL 'pve-node-01'"]
DH4["H₄: KEY 'ram'\nH₅: VAL 256"]
DH6["H₆: KEY 'status'\nH₇: VAL 'online'"]
DACCESS["Access D.ram:\n1. Check tag H₁\n2. Binary search H₂,H₄,H₆\n3. Match 'ram' at H₄\n4. Return H₅ = 256\nCost: O(log 3) = 2 steps"]
end
subgraph COMPOUND["Compound: physical_host('pve-node-01',<br/> 'AA:BB...', 256)"]
direction TB
C_TOP[" "] ---> CH0["H₀: FUNC physical_host/3"]
CH1["H₁: 'pve-node-01'\n(position 1 — name)"]
CH2["H₂: 'AA:BB...'\n(position 2 — MAC)"]
CH3["H₃: 256\n(position 3 — RAM)"]
CACCESS["Access RAM:\n1. Know it is arg 3\n2. Direct offset H₃\nCost: O(1)\nBut: position is implicit —\nnot in the term itself"]
end
%% Internal Data Links
DH0 --->|"sorted keys"| DH1
DH1 --->|"key-val pair 1"| DH2
DH2 --->|"key-val pair 2"| DH4
DH4 --->|"key-val pair 3"| DH6
DH6 --->|"query path"| DACCESS
CH0 --->|"arg 1"| CH1
CH1 --->|"arg 2"| CH2
CH2 --->|"arg 3"| CH3
CH3 --->|"access path"| CACCESS
%% Styling
style D_TOP fill:none,stroke:none
style C_TOP fill:none,stroke:none
style DICT fill:#1A2B4A,color:#FFFFFF
style COMPOUND fill:#3A1A1A,color:#FFFFFF
style DH0 fill:#1A4070,color:#FFFFFF
style DH1 fill:#1A6B3A,color:#FFFFFF
style DH2 fill:#1A4070,color:#FFFFFF
style DH4 fill:#1A4070,color:#FFFFFF
style DH6 fill:#1A4070,color:#FFFFFF
style DACCESS fill:#8B6914,color:#FFFFFF
style CH0 fill:#5A2020,color:#FFFFFF
style CH1 fill:#5A2020,color:#FFFFFF
style CH2 fill:#5A2020,color:#FFFFFF
style CH3 fill:#5A2020,color:#FFFFFF
style CACCESS fill:#4A4A4A,color:#AAAAAA
Reading the diagram: The Dict (blue, left) stores keys as named atoms in sorted order alongside their values, so the access path is self-describing and schema-enforced. The compound term (dark red, right) stores values by position: positionally correct but semantically opaque. Adding a field to the compound term requires rewriting every argument reference at every call site. Adding a field to the Dict is a schema addition that predicates reading fields by key (dot notation or get_dict/3) will not notice.
8.2 The Dot Operator as a Security Gate
8.2.1 Key Access Semantics
Dot notation (Dict.key) is expanded at compile time into a key lookup that, unlike get_dict/3, raises an exception when the key is absent. The full access family:
% get_dict/3: fails if key is absent (safe for conditional logic)
get_dict(Key, Dict, Value)
% Dict.key: throws existence_error if key absent (safe for required fields)
Value = Dict.key
% get_dict_or_default/4: falls back to Default when the key is absent, never throws
get_dict_or_default(Key, Dict, Default, Value) % not built-in; implement as shown below
% Sub-dict selection with :< unifies or fails, no exception
host_ram(Dict, RAM) :- host{ram: RAM} :< Dict.
The behaviour difference between get_dict/3 (fail) and dot notation (exception) is intentional and maps to two distinct use cases:
% OPTIONAL field — use get_dict/3: failure is handled gracefully
host_cluster(HostDict, Cluster) :-
( get_dict(cluster, HostDict, Cluster) ->
true
;
Cluster = standalone % Default if cluster key absent
).
% REQUIRED field — use dot notation: absence is a schema violation
host_ram_required(HostDict, RAM) :-
RAM = HostDict.ram. % Throws existence_error(key, ram, HostDict) if absent
% Explicit default pattern (production-safe):
get_dict_or_default(Key, Dict, _Default, Value) :-
get_dict(Key, Dict, Value),
!.
get_dict_or_default(_Key, _Dict, Default, Default).
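At a module boundary it is often useful to turn the exception raised by dot access into an explicit result term. The following is a minimal sketch under that assumption; required_ram/2 and checked_ram/2 are hypothetical names used only here. The dot access is kept in its own clause body and the caller wraps the call in catch/3:

```prolog
%% required_ram(+HostDict, -RAM)
%% Dot access: raises existence_error(key, ram, HostDict) if ram is absent.
required_ram(HostDict, RAM) :-
    RAM = HostDict.ram.

%% checked_ram(+HostDict, -Result)
%% Result is ok(RAM) or missing(ram); other errors propagate unchanged.
checked_ram(HostDict, Result) :-
    catch(( required_ram(HostDict, RAM), Result = ok(RAM) ),
          error(existence_error(key, ram, _), _),
          Result = missing(ram)).

% ?- checked_ram(host{name:'pve-node-01', ram:256}, R).
% R = ok(256).
% ?- checked_ram(host{name:'pve-node-01'}, R).
% R = missing(ram).
```

Only the specific existence error for the ram key is caught, so an unrelated failure (for example a non-Dict argument) still surfaces as an exception.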
8.2.2 Functional Updates: Pure Heap Allocation
NewDict = OldDict.put(Key, Value) is sugar for put_dict(Key, OldDict, Value, NewDict). The operation:
- Allocates a new Dict on the Heap
- Copies all key-value pairs from OldDict, sorted
- Adds or replaces the Key: Value entry
- Binds NewDict to the new structure
OldDict is unchanged. Both OldDict and NewDict are valid Prolog terms. If OldDict is a KB fact loaded via inventory_entry/1, the KB fact is unaffected — NewDict is a local heap term in the current clause.
% Functional update: producing a new Dict for a host entering maintenance
enter_maintenance(HostDict, MaintenanceDict) :-
must_be(dict, HostDict),
get_tag(HostDict, host), % enforce tag
MaintenanceDict = HostDict.put(status, maintenance).
% Multi-key update:
update_host_health(HostDict, IOWait, NewStatus, UpdatedDict) :-
must_be(dict, HostDict),
must_be(integer, IOWait),
must_be(atom, NewStatus),
UpdatedDict = HostDict.put(_{
status: NewStatus,
io_wait: IOWait,
checked: true
}).
?- H = host{name:'pve-node-01', ram:256, status:online},
enter_maintenance(H, M).
H = host{name:'pve-node-01', ram:256, status:online},
M = host{name:'pve-node-01', ram:256, status:maintenance}.
% H is unchanged. M is a new Dict on the heap.
% Two bindings, two heap structures, no mutation.
?- H = host{name:'pve-node-01', ram:256, status:online},
update_host_health(H, 78, degraded, U).
U = host{checked:true, io_wait:78, name:'pve-node-01', ram:256, status:degraded}.
% Keys sorted: checked, io_wait, name, ram, status.
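Removing keys follows the same non-mutating pattern. As a sketch, the hypothetical helper drop_keys/3 below strips transient health-check fields with del_dict/4, which fails when a key is absent, so absent keys are simply skipped:

```prolog
%% drop_keys(+Keys, +Dict0, -Dict)
%% Dict is Dict0 without the listed keys. Dict0 itself is unchanged.
drop_keys([], Dict, Dict).
drop_keys([Key|Keys], Dict0, Dict) :-
    (   del_dict(Key, Dict0, _Value, Dict1)
    ->  true
    ;   Dict1 = Dict0            % key not present: nothing to remove
    ),
    drop_keys(Keys, Dict1, Dict).

% ?- H = host{checked:true, io_wait:78, name:'pve-node-01', status:degraded},
%    drop_keys([io_wait, checked], H, D).
% D = host{name:'pve-node-01', status:degraded}.
% H keeps all four keys.
```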
8.2.3 get_tag/2 and Tag Enforcement
%% get_tag(+Dict, ?Tag)
%% Unifies Tag with the tag of Dict.
%% Used to enforce schema class at predicate boundaries.
%% The tag is not a regular key: Dict.get(tag) looks up a key named 'tag'
%% and does not return the Dict's tag. dict_create/3 only constructs a Dict.
%% Decomposition is done with dict_pairs/3.
get_tag(Dict, Tag) :-
    must_be(dict, Dict),
    dict_pairs(Dict, Tag, _Pairs). % dict_pairs/3 decomposes a Dict
The correct predicates for Dict introspection:
% dict_pairs/3: decompose Dict into Tag and sorted list of Key-Value pairs
dict_pairs(Dict, Tag, Pairs).
% Dict = Tag{k1:v1, k2:v2} ↔ Pairs = [k1-v1, k2-v2]
% dict_create/3: construct Dict from Tag and Pairs list
dict_create(Dict, Tag, Pairs).
% is_dict/1: succeeds if term is a Dict (any tag)
is_dict(Term).
% is_dict/2: succeeds if term is a Dict with specific tag
is_dict(Term, Tag).
% Tag enforcement at oracle entry:
host_oracle_entry(HostDict) :-
( is_dict(HostDict, host) ->
true
; is_dict(HostDict, Tag) ->
throw(error(
wrong_dict_tag(host, Tag, HostDict),
context(host_oracle_entry/1, 'Expected host{} Dict')
))
;
throw(error(
type_error(dict, HostDict),
context(host_oracle_entry/1, 'Not a Dict')
))
).
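The introspection predicates above also make it possible to change a tag without touching the fields, for example to promote an untagged Dict to a schema class once it has been validated. A minimal sketch; retag/3 is a hypothetical helper name:

```prolog
:- use_module(library(error)).

%% retag(+Dict, +NewTag, -Retagged)
%% Retagged has the same key-value pairs as Dict but carries NewTag.
%% Dict itself is unchanged.
retag(Dict, NewTag, Retagged) :-
    is_dict(Dict),
    must_be(atom, NewTag),
    dict_pairs(Dict, _OldTag, Pairs),      % decompose
    dict_pairs(Retagged, NewTag, Pairs).   % rebuild under the new tag

% ?- retag(_{name:'pve-node-01', ram:256}, host, D).
% D = host{name:'pve-node-01', ram:256}.
```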
8.3 Translation: JSON to Sovereign Logic
8.3.1 json_read_dict/3 and the Ingestion Pipeline
library(http/json) provides json_read_dict/3, which reads a JSON stream and produces SWI-Prolog Dicts directly: JSON objects become Dicts, JSON arrays become lists, and JSON strings become Prolog strings by default or atoms when value_string_as_atom(true) is given.
:- use_module(library(http/json)).
% json_read_dict(+Stream, -Dict, +Options)
% Options (selection):
% tag(Name) : use the JSON attribute Name of each object as its Dict tag
% default_tag(Tag) : tag for objects that carry no such attribute
% value_string_as_atom(Bool) : read JSON string values as atoms (default false)
% null(Term), true(Term), false(Term) : Prolog terms for the JSON constants
The critical security decision in the ingestion pipeline is the value_string_as_atom(true) option. With this option, every JSON string value becomes a Prolog atom, interned in the Atom Table permanently. For infrastructure inventory ingestion where all string values are known-bounded identifiers (host names, serial numbers, status strings), this is acceptable. For arbitrary user-provided data (VM names from an API with no validation), it is the Atom Table exhaustion vector.
%% Example: Proxmox API /nodes response (JSON)
%% {
%% "data": [
%% {"node": "pve-node-01", "status": "online", "maxmem": 274877906944, "maxcpu": 32},
%% {"node": "pve-node-02", "status": "online", "maxmem": 137438953472, "maxcpu": 16}
%% ]
%% }
% Reading the API response with controlled string handling:
read_proxmox_nodes(JSONFile, NodeDicts) :-
    setup_call_cleanup(
        open(JSONFile, read, Stream),
        json_read_dict(Stream, Raw,
                       [default_tag(api_response), value_string_as_atom(true)]),
        close(Stream)
    ),
    % default_tag/1 applies to every object without a tag attribute:
    % Raw = api_response{data: [api_response{node:'pve-node-01', status:online, ...}, ...]}
    get_dict(data, Raw, NodeList),
    maplist(normalise_node_dict, NodeList, NodeDicts).
%% normalise_node_dict(+RawDict, -HostDict)
%% Translates raw API response keys to canonical host{} Dict schema.
%% Key whitelisting happens here — only recognised API keys are processed.
normalise_node_dict(RawDict, HostDict) :-
is_dict(RawDict),
% Extract only whitelisted keys — unlisted keys are silently dropped
get_dict(node, RawDict, NameAtom),
get_dict(status, RawDict, StatusAtom),
get_dict(maxmem, RawDict, MaxMemBytes),
get_dict(maxcpu, RawDict, MaxCPU),
% Convert bytes to GB for canonical representation
RAM_GB is MaxMemBytes // (1024 * 1024 * 1024),
% Construct canonical host{} Dict
HostDict = host{
name: NameAtom,
ram: RAM_GB,
cpus: MaxCPU,
status: StatusAtom,
source: proxmox_api
}.
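A more conservative variant of the ingestion step keeps JSON string values as Prolog strings and converts a value to an atom only when it appears in a fixed whitelist. The sketch below illustrates the idea on an in-memory payload. known_status/2, node_status/3, and parse_nodes/2 are hypothetical names, and the whitelist contents are an assumption made for the example:

```prolog
:- use_module(library(http/json)).
:- use_module(library(lists)).

%% known_status(?String, ?Atom)
%% Hypothetical whitelist: the only status values ever mapped to atoms.
known_status("online",  online).
known_status("offline", offline).
known_status("unknown", unknown).

%% node_status(+NodeDict, -Name, -Status)
%% Name stays a string; Status becomes an atom only if whitelisted.
node_status(NodeDict, Name, Status) :-
    get_dict(node, NodeDict, Name),
    get_dict(status, NodeDict, StatusString),
    string(StatusString),
    known_status(StatusString, Status).

%% parse_nodes(+JSONText, -Pairs)
%% Reads a /nodes-style payload from text. json_read_dict/2 keeps JSON
%% string values as Prolog strings, so values are not interned as atoms.
%% Nodes with an unrecognised status are skipped.
parse_nodes(JSONText, Pairs) :-
    setup_call_cleanup(
        open_string(JSONText, Stream),
        json_read_dict(Stream, Raw),
        close(Stream)),
    get_dict(data, Raw, Nodes),
    findall(Name-Status,
            ( member(Node, Nodes),
              node_status(Node, Name, Status) ),
            Pairs).
```

Object keys are still converted to atoms by the parser, so this sketch bounds only the values. Key validation remains a separate step.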
8.3.2 Refactoring proxmox_inventory.pl: From Facts to Dict Registry
The original Chapter 3 inventory used positional compound terms:
% Original Chapter 3 facts (positional — schema implicit)
physical_host('pve-node-01', 'AA:BB:CC:DD:EE:01', 256).
physical_host('pve-node-02', 'AA:BB:CC:DD:EE:02', 128).
vm(100, 'nginx-prod-01', 'pve-node-01', running).
storage('pve-node-01', "WD-WX11A2K3P801", 4096).
The Dict-based refactor — a single predicate inventory_entry/1 with tagged Dicts:
logicadmin@logic-node-01:~$ nano /opt/logic-node/kb/inventory/proxmox_inventory_v2.pl
%% =============================================================================
%% FILE: /opt/logic-node/kb/inventory/proxmox_inventory_v2.pl
%% PURPOSE: Dict-based Proxmox inventory registry.
%%
%% SCHEMA:
%% host{} — physical compute node
%% vm{} — virtual machine (or container)
%% disk{} — storage disk / device
%%
%% All entries via inventory_entry/1. Tag-based indexing via:
%% host_entry/1, vm_entry/1, disk_entry/1 — filtered views.
%%
%% SCHEMA MANDATE: Every field listed in the schema comment MUST be present.
%% Optional fields must have a documented default in get_dict_or_default/4.
%% =============================================================================
:- module(proxmox_inventory_v2, [
inventory_entry/1,
host_entry/1,
vm_entry/1,
disk_entry/1,
host_by_name/2,
vm_by_id/2,
vm_by_name/2,
disks_for_host/2
]).
%% ---------------------------------------------------------------------------
%% HOST ENTRIES
%% Schema: host{name, mac, ram, role, cluster, status}
%% ---------------------------------------------------------------------------
inventory_entry(host{
name: 'pve-node-01',
mac: 'AA:BB:CC:DD:EE:01',
ram: 256,
role: primary,
cluster: 'sovereign-cluster-01',
status: online
}).
inventory_entry(host{
name: 'pve-node-02',
mac: 'AA:BB:CC:DD:EE:02',
ram: 128,
role: secondary,
cluster: 'sovereign-cluster-01',
status: online
}).
inventory_entry(host{
name: 'pve-node-03',
mac: 'AA:BB:CC:DD:EE:03',
ram: 256,
role: secondary,
cluster: 'sovereign-cluster-01',
status: online
}).
%% ---------------------------------------------------------------------------
%% VM ENTRIES
%% Schema: vm{id, name, host, status, vlan, ram_alloc}
%% ---------------------------------------------------------------------------
inventory_entry(vm{
id: 100,
name: 'nginx-prod-01',
host: 'pve-node-01',
status: running,
vlan: 20,
ram_alloc: 2
}).
inventory_entry(vm{
id: 101,
name: 'postgres-prod-01',
host: 'pve-node-01',
status: running,
vlan: 20,
ram_alloc: 16
}).
inventory_entry(vm{
id: 102,
name: 'nginx-prod-02',
host: 'pve-node-02',
status: running,
vlan: 20,
ram_alloc: 2
}).
inventory_entry(vm{
id: 103,
name: 'worker-01',
host: 'pve-node-02',
status: stopped,
vlan: 20,
ram_alloc: 4
}).
inventory_entry(vm{
id: 104,
name: 'monitoring-01',
host: 'pve-node-03',
status: running,
vlan: 10,
ram_alloc: 4
}).
inventory_entry(vm{
id: 105,
name: 'orphan-vm-01',
host: 'pve-node-99', % Orphaned — host not in inventory
status: stopped,
vlan: 20,
ram_alloc: 2
}).
%% ---------------------------------------------------------------------------
%% DISK ENTRIES
%% Schema: disk{serial, host, capacity_gb, pool}
%% ---------------------------------------------------------------------------
inventory_entry(disk{
serial: "WD-WX11A2K3P801",
host: 'pve-node-01',
capacity_gb: 4096,
pool: 'data-pve-node-01'
}).
inventory_entry(disk{
serial: "WD-WX11A2K3P802",
host: 'pve-node-01',
capacity_gb: 4096,
pool: 'data-pve-node-01'
}).
inventory_entry(disk{
serial: "ST-ZA1234BCDE001",
host: 'pve-node-02',
capacity_gb: 8192,
pool: 'data-pve-node-02'
}).
inventory_entry(disk{
serial: "WD-WX11A2K3P901",
host: 'pve-node-03',
capacity_gb: 4096,
pool: 'data-pve-node-03'
}).
inventory_entry(disk{
serial: "WD-WX11A2K3P902",
host: 'pve-node-03',
capacity_gb: 4096,
pool: 'data-pve-node-03'
}).
%% ---------------------------------------------------------------------------
%% FILTERED VIEWS
%% Tag-based predicates for access by schema class.
%% Each view enumerates inventory_entry/1 and keeps the entries whose tag matches.
%% ---------------------------------------------------------------------------
host_entry(D) :- inventory_entry(D), is_dict(D, host).
vm_entry(D) :- inventory_entry(D), is_dict(D, vm).
disk_entry(D) :- inventory_entry(D), is_dict(D, disk).
%% host_by_name(+Name, -HostDict)
host_by_name(Name, HostDict) :-
must_be(atom, Name),
host_entry(HostDict),
HostDict.name = Name,
!. % Deterministic: names are unique
%% vm_by_id(+ID, -VMDict)
vm_by_id(ID, VMDict) :-
must_be(positive_integer, ID),
vm_entry(VMDict),
VMDict.id = ID,
!.
%% vm_by_name(+Name, -VMDict)
vm_by_name(Name, VMDict) :-
must_be(atom, Name),
vm_entry(VMDict),
VMDict.name = Name,
!.
%% disks_for_host(+HostName, -DiskList)
disks_for_host(HostName, DiskList) :-
must_be(atom, HostName),
findall(D, (disk_entry(D), D.host = HostName), DiskList).
% REPL: Dict registry queries
?- host_by_name('pve-node-01', H).
H = host{cluster:'sovereign-cluster-01', mac:'AA:BB:CC:DD:EE:01',
name:'pve-node-01', ram:256, role:primary, status:online}.
?- vm_by_id(101, VM).
VM = vm{host:'pve-node-01', id:101, name:'postgres-prod-01',
ram_alloc:16, status:running, vlan:20}.
?- disks_for_host('pve-node-01', Disks),
length(Disks, N).
N = 2. % Two disks registered for pve-node-01
% Dot notation query:
?- host_by_name('pve-node-02', H), RAM = H.ram.
RAM = 128.
% findall over Dict fields:
?- findall(Name-RAM,
(host_entry(H), Name = H.name, RAM = H.ram),
Summary).
Summary = ['pve-node-01'-256, 'pve-node-02'-128, 'pve-node-03'-256].
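The schema mandate in the file header can be checked mechanically. The following sketch compares a Dict's keys against a per-tag list of required keys. required_keys/2 and missing_keys/2 are hypothetical names, and the key lists are copied from the schema comments above as an assumption about what "required" means:

```prolog
:- use_module(library(lists)).
:- use_module(library(pairs)).

%% required_keys(?Tag, ?Keys)
%% Hypothetical schema table mirroring the schema comments in the file.
required_keys(host, [cluster, mac, name, ram, role, status]).
required_keys(vm,   [host, id, name, ram_alloc, status, vlan]).
required_keys(disk, [capacity_gb, host, pool, serial]).

%% missing_keys(+Dict, -Missing)
%% Missing lists the required keys that Dict does not carry.
%% Fails for non-Dicts and for Dicts with an unbound or unknown tag.
missing_keys(Dict, Missing) :-
    is_dict(Dict),
    dict_pairs(Dict, Tag, Pairs),
    atom(Tag),
    required_keys(Tag, Required),
    pairs_keys(Pairs, Present),
    subtract(Required, Present, Missing).

% ?- missing_keys(disk{serial:"WD-WX11A2K3P801", host:'pve-node-01'}, M).
% M = [capacity_gb, pool].
```

Extra keys such as a nested metadata Dict do not count as violations in this sketch. Only absent required keys are reported.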
### 8.3.2.1 The Ghost Key Pattern: `meta{}` for Audit Metadata
Inventory Dicts carry logical data — the fields that participate in proof: `name`, `ram`, `status`, `host`. They also need to carry non-logical metadata that is valuable for operations but should never appear in oracle proof conditions: when was this record ingested, from which source IP, which operator authorised the KB update, which API version produced the data.
Mixing audit metadata directly into the primary key namespace pollutes every pattern match: `host{name:N, ram:R, status:S, ingested_at:T, source_ip:IP, ...}` forces every head pattern to either enumerate all fields or use `get_dict`. It also creates a risk: if `ingested_at` ever unifies with a proof condition by accident — a typo, a future developer who does not know the field is non-logical — the oracle produces a result gated on metadata rather than infrastructure state.
The Ghost Key pattern resolves this by encapsulating all non-logical metadata in a single nested `meta{}` Dict:
```prolog
%% Inventory entry with meta{} Ghost Key field.
%% The meta{} Dict is invisible to all oracle predicates that pattern-match
%% on specific logical fields. It is accessible only to introspection predicates.
inventory_entry(host{
name: 'pve-node-01',
mac: 'AA:BB:CC:DD:EE:01',
ram: 256,
role: primary,
cluster: 'sovereign-cluster-01',
status: online,
meta: meta{
ingested_at: '2026-03-05T10:44:14Z',
source: kb_static,
source_ip: none,
kb_version: '3.1.4',
authorised_by: 'ops-team'
}
}).
```
Oracle predicates that read individual fields (dot notation, get_dict/3, or a sub-dict selection such as host{name:N, ram:R} :< Dict) do not see meta and are entirely unaffected by its presence. The meta{} field is accessible by explicit lookup when an audit predicate needs it:
%% entry_audit_trail(+Dict, -MetaDict)
%% Retrieves the meta{} field from any inventory entry.
%% Fails cleanly if no meta{} field is present (backward compatibility
%% with entries authored before the Ghost Key pattern was adopted).
entry_audit_trail(Dict, MetaDict) :-
is_dict(Dict),
get_dict(meta, Dict, MetaDict),
is_dict(MetaDict, meta).
%% normalise_node_dict/2 extended to attach meta{} on ingestion:
normalise_node_dict_with_meta(RawDict, Timestamp, SourceIP, HostDict) :-
get_dict(node, RawDict, NameAtom),
get_dict(status, RawDict, StatusAtom),
get_dict(maxmem, RawDict, MaxMemBytes),
get_dict(maxcpu, RawDict, MaxCPU),
RAM_GB is MaxMemBytes // (1024 * 1024 * 1024),
HostDict = host{
name: NameAtom,
ram: RAM_GB,
cpus: MaxCPU,
status: StatusAtom,
source: proxmox_api,
meta: meta{
ingested_at: Timestamp,
source_ip: SourceIP,
source: proxmox_api,
schema: proxmox_node_v1
}
}.
% REPL: Ghost Key access
?- host_by_name('pve-node-01', H), entry_audit_trail(H, Meta).
Meta = meta{authorised_by:'ops-team', ingested_at:'2026-03-05T10:44:14Z',
kb_version:'3.1.4', source:kb_static, source_ip:none}.
% Oracle predicate — meta field is invisible:
?- host_by_name('pve-node-01', H), RAM = H.ram.
RAM = 256. % Oracle accesses ram; meta is present but not consulted.
% Proof condition on meta by accident — this is why meta{} is isolated:
?- host_by_name('pve-node-01', H), H.meta.source = kb_static.
true. % This WORKS but should NEVER appear in oracle code.
% If it does, it is a logic defect: oracle is gated on provenance,
% not infrastructure state. Code review should catch this pattern.
The Ghost Key discipline: oracle predicates (can_migrate/3, replace_disk/4, batch_snapshot/3) never reference Dict.meta or get_dict(meta, ...). Audit predicates (entry_audit_trail/2, ingestion_log/3) reference only meta{} fields. The boundary is enforced by convention and code review, not by the runtime — which is why the documentation of the distinction is the primary defence.
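One way to back the convention with a runtime step, offered here as an assumption rather than as part of the Logic Node design, is to strip meta before a Dict is handed to an oracle predicate. logical_view/2 is a hypothetical helper name:

```prolog
%% logical_view(+Dict, -View)
%% View is Dict without its meta key. Dicts that carry no meta key
%% are returned unchanged. Dict itself is never modified.
logical_view(Dict, View) :-
    is_dict(Dict),
    (   del_dict(meta, Dict, _Meta, View0)
    ->  View = View0
    ;   View = Dict
    ).

% ?- logical_view(host{name:'pve-node-01', ram:256,
%                      meta:meta{source:kb_static}}, V).
% V = host{name:'pve-node-01', ram:256}.
```

An oracle that receives only the logical view cannot be gated on provenance by accident, because the field is no longer present to match against.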
8.3.3 Backward Compatibility: Bridge Predicates
The Chapter 3–7 KB and all oracle modules were written against physical_host/3, vm/4, and storage/3. Rather than rewriting every oracle immediately, provide bridge predicates that project the Dict registry back to the original positional predicates:
%% Bridge layer: Dict registry → Chapter 3 positional predicates
%% Add to proxmox_inventory_v2.pl or a separate bridge module.
physical_host(Name, MAC, RAM) :-
host_entry(H),
Name = H.name,
MAC = H.mac,
RAM = H.ram.
vm(ID, Name, Host, Status) :-
vm_entry(V),
ID = V.id,
Name = V.name,
Host = V.host,
Status = V.status.
storage(Host, Serial, CapGB) :-
disk_entry(D),
Host = D.host,
Serial = D.serial,
CapGB = D.capacity_gb.
With the bridge layer loaded, all Chapter 5–7 oracle predicates continue to function without modification. The bridge predicates are the migration path: as oracle predicates are updated to accept Dict arguments directly (Section 8.4), the bridge becomes unnecessary and is retired.
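The same projection can be written per term with the sub-dict operator :<, which checks the tag and selects the named fields in one step and fails (rather than throwing) when a field is absent. A minimal sketch; host_fact/2 is a hypothetical name and is not part of the bridge layer above:

```prolog
%% host_fact(+Dict, -Fact)
%% Projects a host{} Dict onto the positional physical_host/3 shape.
%% Extra keys (role, cluster, status, meta, ...) are ignored.
host_fact(Dict, physical_host(Name, MAC, RAM)) :-
    host{name: Name, mac: MAC, ram: RAM} :< Dict.

% ?- host_fact(host{name:'pve-node-01', mac:'AA:BB:CC:DD:EE:01',
%                   ram:256, role:primary}, F).
% F = physical_host('pve-node-01', 'AA:BB:CC:DD:EE:01', 256).
```

Whether failure or an exception is the better response to a malformed entry depends on the caller. The dot-notation bridge treats a missing field as a schema violation, while this variant silently skips the entry.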
8.4 The Build: Modernising the Command Oracle
8.4.1 Dict-Aware replace_disk/4
The Chapter 5 replace_disk/4 accepted positional KB arguments:
% Chapter 5 signature (positional, requires KB scan):
replace_disk(HostName, OldSerial, NewSerial, Command) :-
physical_host(HostName, _, _),
storage(HostName, OldSerial, _OldCapacity),
\+ storage(_, NewSerial, _),
...
The Dict-aware version accepts a host{} Dict directly. The disk is resolved through disk_entry/1, and the pool name is read from the disk Dict's pool field, which replaces the storage/3 lookup:
logicadmin@logic-node-01:~$ nano /opt/logic-node/kb/oracle/zfs_oracle_v2.pl
%% =============================================================================
%% FILE: /opt/logic-node/kb/oracle/zfs_oracle_v2.pl
%% PURPOSE: Dict-aware ZFS oracle using dot-notation for field access.
%% SECURITY CONTRACT: identical to zfs_oracle.pl — no execution, text only.
%% =============================================================================
:- module(zfs_oracle_v2, [
replace_disk/4,
pool_scrub/3,
disk_status/2
]).
:- use_module('/opt/logic-node/kb/inventory/proxmox_inventory_v2').
:- use_module('/opt/logic-node/kb/oracle/shell_safety').
:- use_module(library(error)).
%% replace_disk(+HostDict, +OldSerial, +NewSerial, -Command)
%%
%% Dict-aware version. HostDict must be a host{} Dict. Its disks are
%% looked up in the registry via disk_entry/1.
%%
%% Preconditions:
%% P1: HostDict is a properly-tagged host{} Dict
%% P2: OldSerial is in the disk list for HostDict.name
%% P3: NewSerial is not registered in any disk_entry/1
%% P4: OldSerial ≠ NewSerial
%%
%% The pool name is read directly from the disk's pool field —
%% no convention-based "data-" ++ hostname concatenation needed.
replace_disk(HostDict, OldSerial, NewSerial, Command) :-
% P1: Tag enforcement
( is_dict(HostDict, host) ->
true
; throw(error(wrong_dict_tag(host, HostDict), replace_disk/4))
),
% Extract host name via dot notation
HostName = HostDict.name,
must_be(atom, HostName),
% P2: OldSerial must be registered to this host
% Retrieve the disk Dict — more specific than just checking storage/3
disk_entry(OldDisk),
OldDisk.serial = OldSerial,
OldDisk.host = HostName,
% Get the pool name from the disk Dict — no string concatenation needed
PoolName = OldDisk.pool,
% P3: NewSerial must not be in any disk_entry
\+ (disk_entry(D), D.serial = NewSerial),
% P4: Serials must differ
OldSerial \== NewSerial,
% Shell-quote all arguments
shell_quote(PoolName, QuotedPool),
shell_quote(OldSerial, QuotedOld),
shell_quote(NewSerial, QuotedNew),
% Serialise
with_output_to(string(Command),
format("zpool replace ~w ~w ~w", [QuotedPool, QuotedOld, QuotedNew])).
%% pool_scrub(+HostDict, +Action, -Command)
%% Action: start | stop
pool_scrub(HostDict, Action, Command) :-
is_dict(HostDict, host),
memberchk(Action, [start, stop]),
% Get pool name: use first disk for this host to derive pool
HostName = HostDict.name,
( disk_entry(D), D.host = HostName ->
PoolName = D.pool
;
% Fallback: derive from host name if no disk registered
atom_string(HostName, HostStr),
string_concat("data-", HostStr, PoolName)
),
!,
shell_quote(PoolName, QuotedPool),
( Action = stop ->
with_output_to(string(Command),
format("zpool scrub -s ~w", [QuotedPool]))
;
with_output_to(string(Command),
format("zpool scrub ~w", [QuotedPool]))
).
%% disk_status(+DiskDict, -Command)
%% Accepts a disk{} Dict directly
disk_status(DiskDict, Command) :-
is_dict(DiskDict, disk),
Serial = DiskDict.serial,
PoolName = DiskDict.pool,
shell_quote(PoolName, QuotedPool),
shell_quote(Serial, QuotedSerial),
with_output_to(string(Command),
format("zpool status ~w | grep ~w", [QuotedPool, QuotedSerial])).
% REPL: Dict-aware oracle
?- host_by_name('pve-node-01', H),
replace_disk(H, "WD-WX11A2K3P801", "WD-NEWDISK001", Cmd).
Cmd = "zpool replace 'data-pve-node-01' 'WD-WX11A2K3P801' 'WD-NEWDISK001'".
% Pool name read from OldDisk.pool — no string concatenation.
?- host_by_name('pve-node-01', H),
pool_scrub(H, start, Cmd).
Cmd = "zpool scrub 'data-pve-node-01'".
?- disk_entry(D), D.host = 'pve-node-03',
disk_status(D, Cmd).
D = disk{capacity_gb:4096, host:'pve-node-03', pool:'data-pve-node-03',
serial:"WD-WX11A2K3P901"},
Cmd = "zpool status 'data-pve-node-03' | grep 'WD-WX11A2K3P901'".
8.4.2 Dict-Aware Migration Pre-flight
Refactoring can_migrate/3 from Chapter 6 to accept vm{} and host{} Dicts directly:
%% can_migrate_dict(+VMDict, +TargetHostDict, -Decision)
%% Dict-aware version of can_migrate/3 from Chapter 6.
%% Preconditions are checked via dot-notation field access.
can_migrate_dict(VMDict, TargetHostDict, deny(Reason)) :-
( \+ is_dict(VMDict, vm) -> Reason = wrong_tag(vm, VMDict)
; \+ is_dict(TargetHostDict, host) -> Reason = wrong_tag(host, TargetHostDict)
; TargetHostDict.status = maintenance -> Reason = host_in_maintenance(TargetHostDict.name)
; VMDict.status = error -> Reason = vm_in_error_state(VMDict.name)
; VMDict.host = TargetHostDict.name -> Reason = same_host(VMDict.name, TargetHostDict.name)
; fail
),
!.
can_migrate_dict(VMDict, TargetHostDict, permit(VMDict.name, TargetHostDict.name)) :-
is_dict(VMDict, vm),
is_dict(TargetHostDict, host).
?- vm_by_id(100, VM), host_by_name('pve-node-02', H),
can_migrate_dict(VM, H, Decision).
Decision = permit('nginx-prod-01', 'pve-node-02').
?- vm_by_id(100, VM), host_by_name('pve-node-01', H),
can_migrate_dict(VM, H, Decision).
Decision = deny(same_host('nginx-prod-01', 'pve-node-01')).
% VM is already on pve-node-01 — dot notation reads the field inline.
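Dot notation throws when a key is absent, which is the right behaviour for required fields such as name and status. For optional fields, get_dict/3 fails instead of throwing, so an if-then-else can supply a default. The following is a minimal sketch; the helper dict_field_or_default/4 and the ha_group field are hypothetical and not part of the chapter's KB.

```prolog
%% dict_field_or_default(+Key, +Dict, +Default, -Value)
%% Hypothetical helper: Value is the value stored under Key in Dict,
%% or Default when Dict has no such key. get_dict/3 fails (it does not
%% throw) on a missing key, so the else-branch supplies the fallback.
dict_field_or_default(Key, Dict, Default, Value) :-
    (   get_dict(Key, Dict, Found)
    ->  Value = Found
    ;   Value = Default
    ).

% Example queries (ha_group is an illustrative optional field):
% ?- dict_field_or_default(ha_group, vm{id:100, name:'nginx-prod-01'}, none, G).
% ?- dict_field_or_default(name, vm{id:100, name:'nginx-prod-01'}, none, N).
```

A pre-flight check written this way can treat a missing optional field as a known default while still letting required fields fail closed through dot notation.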
8.4.3 Dict Introspection: Schema Validation
%% validate_host_dict(+Dict)
%% Verifies that Dict is a host{} with all required fields present
%% and all field values of the correct type.
%% Throws a typed error on the first violation.
required_host_keys([name, mac, ram, role, cluster, status]).
validate_host_dict(Dict) :-
( is_dict(Dict, host) ->
true
;
throw(error(type_error(host_dict, Dict), validate_host_dict/1))
),
required_host_keys(RequiredKeys),
forall(member(Key, RequiredKeys), (
( get_dict(Key, Dict, _) ->
true
;
throw(error(
missing_required_key(Key, Dict),
context(validate_host_dict/1, 'Required field absent')
))
)
)),
% Type checks on specific fields
must_be(atom, Dict.name),
must_be(atom, Dict.mac),
must_be(positive_integer, Dict.ram),
must_be(atom, Dict.role),
must_be(atom, Dict.cluster),
must_be(atom, Dict.status).
%% validate_all_host_entries/0
%% Run at KB load time to verify all host{} Dicts conform to schema.
validate_all_host_entries :-
forall(host_entry(H), (
catch(
validate_host_dict(H),
Error,
( format("SCHEMA ERROR: ~w~n", [Error]), fail )
)
)).
:- validate_all_host_entries. % Runs at module load
8.5 Security Context: Atom Table Exhaustion (Revisited)
8.5.1 The Key-Infiltration Attack
Chapter 1 introduced Atom Table exhaustion as a DoS vector via uncontrolled atom_to_term conversion. Chapter 7 addressed it for batch list inputs. The Dict introduces a new surface: the JSON ingestion pipeline.
When json_read_dict/3 parses a JSON object, every JSON key becomes a Prolog atom, and with the value_string_as_atom(true) option every string value does too. All of them are interned in the Atom Table. A valid JSON response from a compromised or misbehaving API endpoint could contain:
{
"node": "pve-node-01",
"injected_key_000001": "value",
"injected_key_000002": "value",
...
"injected_key_100000": "value"
}
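Processing this with json_read_dict/3 interns at least 100,001 key atoms: the real key node plus 100,000 injected keys. With value_string_as_atom(true) the string values are interned as well. Each atom occupies a minimum of 48 bytes in the Atom Table, so 100,000 atoms ≈ 4.8MB of Atom Table growth from a single API response. Under continuous polling (every 60 seconds, as the health KB update procedure does), that is 60 × 4.8MB = 288MB per hour. SWI-Prolog's atom garbage collector reclaims only atoms that nothing references any more, so every atom that reaches an asserted fact or a global variable stays for the life of the process, and the table keeps growing until the Logic Node crashes.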
The attack does not require a compromised API. A misconfigured monitoring agent writing verbose diagnostic fields to the node health JSON file, or an API version upgrade that adds new fields, has the same effect.
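The growth figures above can be reproduced with a few lines of arithmetic. This sketch uses a hypothetical predicate atom_table_growth/5 and takes the 48-byte minimum atom size and the 60-second polling interval from the text as given inputs rather than measuring them.

```prolog
%% atom_table_growth(+AtomsPerResponse, +BytesPerAtom, +PollsPerHour,
%%                   -BytesPerResponse, -BytesPerHour)
%% Hypothetical helper: straight multiplication of the figures
%% quoted in the text. Nothing here inspects the real Atom Table.
atom_table_growth(Atoms, BytesPerAtom, PollsPerHour, PerResponse, PerHour) :-
    PerResponse is Atoms * BytesPerAtom,
    PerHour is PerResponse * PollsPerHour.

% ?- atom_table_growth(100000, 48, 60, PerResponse, PerHour).
% PerResponse = 4800000,    % 4.8 MB per response
% PerHour = 288000000.      % 288 MB per hour
```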
8.5.2 The Key Whitelist Validator
The defence: read JSON string values as Prolog strings, never as atoms, and filter the keys of every parsed Dict against a whitelist of known-good atoms before the Dict is stored or passed to any other predicate. Unknown keys are dropped, so the only key atoms that survive ingestion are whitelisted ones.
%% =============================================================================
%% FILE: /opt/logic-node/kb/ingestion/json_ingestion.pl
%% PURPOSE: Safe JSON-to-Dict ingestion with key whitelist enforcement
%% and Content-Length boundary checks.
%%
%% SECURITY CONTRACT:
%% — json_read_dict/3 is never called on a stream without a size limit.
%% — All JSON string values are read as Prolog strings
%%   (value_string_as_atom(false)), never as atoms.
%% — Key whitelist filtering removes unknown keys before Dict use.
%% — Content-Length limit closes the Heap-overflow DoS vector:
%% even a fully-whitelisted JSON response with deeply nested arrays
%% can exhaust the WAM Heap if allowed to parse without a size bound.
%% =============================================================================
:- module(json_ingestion, [
safe_json_read_dict/3,
safe_json_read_dict_bounded/4,
whitelist_filter_dict/3,
assert_ingestion_whitelist/2
]).
:- use_module(library(http/json)).
:- use_module(library(error)).
:- use_module(library(option)).   % option/3
:- use_module(library(apply)).    % include/3
%% ---------------------------------------------------------------------------
%% INGESTION LIMITS
%% ---------------------------------------------------------------------------
%% max_json_bytes(+Schema, -MaxBytes)
%% Per-schema Content-Length limits.
%% A response exceeding MaxBytes is rejected before parsing begins.
%% This prevents Heap-overflow DoS: json_read_dict/3 materialises the
%% entire JSON object onto the WAM Heap. Even with whitelisted keys,
%% a 50MB response with large nested arrays saturates the Heap.
%%
%% Set conservatively based on the realistic maximum response size
%% for each API endpoint. Proxmox /nodes: ~4KB. /vms: ~50KB.
max_json_bytes(proxmox_node, 16_384). % 16 KB — /nodes response
max_json_bytes(proxmox_vm, 131_072). % 128 KB — /vms response (100 VMs)
max_json_bytes(health_check, 4_096). % 4 KB — health check file
%% ---------------------------------------------------------------------------
%% KEY WHITELISTS
%% ---------------------------------------------------------------------------
%% ingestion_whitelist(+Schema, +AllowedKeys)
%% Schema: atom identifying the source (proxmox_node, proxmox_vm, health_check)
%% AllowedKeys: list of atom keys permitted for this schema.
ingestion_whitelist(proxmox_node,
[node, status, maxmem, maxcpu, uptime, level, id, type]).
ingestion_whitelist(proxmox_vm,
[vmid, name, status, maxmem, maxdisk, cpus, node, type, template]).
ingestion_whitelist(health_check,
[node, io_wait, cpu_pct, mem_pct, disk_pct, timestamp, status]).
%% ---------------------------------------------------------------------------
%% SAFE INGESTION — BOUNDED
%% ---------------------------------------------------------------------------
%% safe_json_read_dict_bounded(+Stream, +Schema, -FilteredDict, +Options)
%% The preferred entry point for all ingestion. Enforces the size limit
%% BEFORE calling json_read_dict/3, then applies the key whitelist AFTER.
%%
%% Size enforcement:
%% Reads at most MaxBytes + 1 characters from the stream via read_string/3.
%% If more than max_json_bytes(Schema) arrive, throws.
%% read_string/3 counts characters, not bytes; the two agree only for
%% single-byte text, so treat the limit as a character budget.
%% The bounded string is then parsed with open_string/2 + json_read_dict/3.
%%
%% NOTE: This requires the stream to be a file or a buffered socket stream.
%% HTTP streams (library(http/http_client)) should use the
%% http Content-Length header check before the stream reaches this predicate.
safe_json_read_dict_bounded(Stream, Schema, FilteredDict, Options) :-
    must_be(atom, Schema),
    option(tag(Tag), Options, json),
    % Enforce the size limit BEFORE parsing. Reading one character more
    % than the limit distinguishes an oversized payload from a full one.
    max_json_bytes(Schema, MaxBytes),
    ReadLimit is MaxBytes + 1,
    read_string(Stream, ReadLimit, RawString),
    string_length(RawString, ActualLength),
    (   ActualLength > MaxBytes
    ->  throw(error(
            json_payload_too_large(Schema, ActualLength, MaxBytes),
            context(safe_json_read_dict_bounded/4,
                    'JSON payload exceeds schema size limit (possible DoS)')))
    ;   true
    ),
    % Parse the bounded string as JSON only, never as a Prolog term:
    % reading untrusted text with term_string/3 would create atoms
    % and accept Dict literals that bypass the JSON parser.
    % default_tag/1 sets the Dict tag; string values stay Prolog strings.
    setup_call_cleanup(
        open_string(RawString, ParseStream),
        json_read_dict(ParseStream, RawDict,
                       [default_tag(Tag), value_string_as_atom(false)]),
        close(ParseStream)),
    % Apply whitelist
    whitelist_filter_dict(Schema, RawDict, FilteredDict).
%% safe_json_read_dict(+Stream, -Dict, +Options)
%% Legacy entry point without Content-Length check.
%% Use safe_json_read_dict_bounded/4 for all new code.
%% Retained for internal stream sources (file streams within the Logic Node).
safe_json_read_dict(Stream, FilteredDict, Options) :-
    option(schema(Schema), Options, unknown),
    option(tag(Tag), Options, json),
    % default_tag/1 sets the Dict tag; value_string_as_atom(false) keeps
    % string values as Prolog strings, NOT atoms
    json_read_dict(Stream, RawDict,
                   [default_tag(Tag), value_string_as_atom(false)]),
    % Apply key whitelist
    (   Schema = unknown
    ->  FilteredDict = RawDict
    ;   whitelist_filter_dict(Schema, RawDict, FilteredDict)
    ).
%% whitelist_filter_dict(+Schema, +RawDict, -FilteredDict)
%% Keeps only whitelisted keys from RawDict, producing FilteredDict.
%% Rejected keys already exist as atoms inside RawDict, but nothing
%% retains them: once RawDict is unreachable, atom garbage collection
%% can reclaim them.
whitelist_filter_dict(Schema, RawDict, FilteredDict) :-
must_be(atom, Schema),
must_be(dict, RawDict),
( ingestion_whitelist(Schema, AllowedKeys) ->
true
;
throw(error(
unknown_ingestion_schema(Schema),
context(whitelist_filter_dict/3, 'No whitelist defined for this schema')
))
),
dict_pairs(RawDict, Tag, AllPairs),
include(key_is_whitelisted(AllowedKeys), AllPairs, FilteredPairs),
dict_pairs(FilteredDict, Tag, FilteredPairs).
%% key_is_whitelisted(+AllowedKeys, +Key-Value)
%% Predicate for use with include/3: keeps pairs whose key is in AllowedKeys.
key_is_whitelisted(AllowedKeys, Key-_Value) :-
memberchk(Key, AllowedKeys).
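The parsing step can be tried in isolation against an in-memory string. This is a minimal sketch rather than the module's own code: json_string_to_dict/2 is a hypothetical helper, and it relies on the default_tag/1 and value_string_as_atom/1 options as documented for json_read_dict/3 in library(http/json).

```prolog
:- use_module(library(http/json)).

%% json_string_to_dict(+Text, -Dict)
%% Hypothetical helper: parse JSON text into a json{} Dict whose string
%% values are Prolog strings. setup_call_cleanup/3 closes the string
%% stream even if the parser throws on malformed input.
json_string_to_dict(Text, Dict) :-
    setup_call_cleanup(
        open_string(Text, Stream),
        json_read_dict(Stream, Dict,
                       [default_tag(json), value_string_as_atom(false)]),
        close(Stream)).

% ?- json_string_to_dict("{\"node\": \"pve-node-01\", \"maxcpu\": 32}", D),
%    get_dict(node, D, Node), string(Node).
```

Keys still arrive as atoms at this point, which is why the whitelist filter has to run before the Dict is stored anywhere.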
% REPL: whitelist filtering
?- RawDict = json{
node: 'pve-node-01',
status: "online",
injected_key_000001: "attack_value",
injected_key_000002: "attack_value",
maxmem: 274877906944,
maxcpu: 32
},
whitelist_filter_dict(proxmox_node, RawDict, Filtered).
Filtered = json{maxcpu:32, maxmem:274877906944, node:'pve-node-01', status:"online"}.
% injected_key_000001 and injected_key_000002 are ABSENT from Filtered.
% They survive only inside RawDict. Once RawDict is garbage-collected,
% nothing references them and atom GC can reclaim them.
% Schema violation:
?- whitelist_filter_dict(unknown_schema, _{k:v}, _).
ERROR: unknown_ingestion_schema(unknown_schema)
% Unknown schema → explicit error, not silent pass-through.
8.5.3 String Values vs. Atom Values: The Ingestion Contract
The safe ingestion contract for the Logic Node:
| Input source | Key handling | Value handling | Rationale |
|---|---|---|---|
| Static KB facts (Prolog source) | Atoms (compile-time) | Atoms, integers (compile-time) | No runtime atom creation: all known at load |
| Proxmox API response | Keys: whitelist-checked atoms | Values: Prolog strings (value_string_as_atom(false)) | Keys bounded by whitelist; values GC-eligible strings |
| Health KB file (authorised agent) | Atoms (load-time, bounded set) | Atoms, integers | Agent writes controlled Prolog syntax, not raw JSON |
| HTTP batch request (Chapter 13) | Keys: whitelist-checked | Values: validated strings or integers | External trust boundary: maximum suspicion |
The core rule: atoms only enter the Atom Table from trusted, bounded sources. A "trusted source" is a Prolog source file written by an authorised human operator or a controlled automation process. An "untrusted source" is anything arriving over a network socket, read from a file not under chattr +i protection, or produced by a third-party API.
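When an oracle needs an atom for a value that arrived as a string (a status, for example), the conversion can be done against a fixed list of allowed atoms so that untrusted text never creates a new atom. A minimal sketch, assuming a hypothetical helper string_to_known_atom/3 and an illustrative list of status atoms:

```prolog
%% string_to_known_atom(+String, +Allowed, -Atom)
%% Hypothetical helper: succeeds with the first atom in Allowed whose
%% text equals String, and fails if there is none. The conversion runs
%% from the trusted atom to a string, never from the untrusted string
%% to an atom, so no new atom is interned for rejected input.
string_to_known_atom(String, Allowed, Atom) :-
    must_be(string, String),
    must_be(list(atom), Allowed),
    once(( member(Atom, Allowed),
           atom_string(Atom, Candidate),
           Candidate == String )).

% ?- string_to_known_atom("online", [online, offline, maintenance], A).
% A = online.
% ?- string_to_known_atom("injected_000001", [online, offline, maintenance], A).
% false.
```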
The Content-Length boundary is a second, independent defence layer. Key whitelisting closes the Atom Table exhaustion vector. It does not close the Heap exhaustion vector: json_read_dict/3 materialises the entire JSON object onto the WAM Heap before any key filtering occurs. A 50MB JSON payload with only whitelisted keys still allocates 50MB of Heap before the whitelist predicate runs. Under the default SWI-Prolog Heap limit, this is a Heap-overflow crash regardless of key whitelist compliance.
The safe_json_read_dict_bounded/4 predicate in Section 8.5.2 addresses this by calling read_string/3 with a size limit before parsing. If the stream delivers more than max_json_bytes(Schema) allows, json_payload_too_large/3 is thrown and the payload is never parsed: no Dict is built on the Heap, and only the bounded prefix is ever held in memory. The per-schema limits in max_json_bytes/2 are deliberately conservative: Proxmox /nodes responses are typically under 2KB, so 16KB is an 8× margin. Any response exceeding that margin is not a normal API response.
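The bounded read can be exercised on its own against in-memory streams. The sketch below uses a hypothetical helper read_bounded/3 and a hypothetical error term payload_too_large/1. It reads one character more than the limit so that an oversized input can be told apart from one that exactly fills the budget. Note that read_string/3 counts characters, not bytes.

```prolog
%% read_bounded(+Stream, +MaxChars, -String)
%% Hypothetical helper: read at most MaxChars characters from Stream.
%% Throws error(payload_too_large(MaxChars), _) when the stream holds
%% more, without ever reading beyond MaxChars + 1 characters.
read_bounded(Stream, MaxChars, String) :-
    must_be(nonneg, MaxChars),
    Limit is MaxChars + 1,
    read_string(Stream, Limit, String),
    string_length(String, Length),
    (   Length > MaxChars
    ->  throw(error(payload_too_large(MaxChars), _))
    ;   true
    ).

% ?- open_string("abc", S), read_bounded(S, 3, Str).       % succeeds
% ?- open_string("abcdef", S), read_bounded(S, 3, Str).    % throws
```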
8.5.4 Tag-Based KB Partitioning for Indexing
It is tempting to assume that the tag of the Dict in inventory_entry(Dict) is reachable through SWI-Prolog's first-argument JIT indexing. It is not: the first argument of inventory_entry/1 is the Dict term itself, not the tag, so the WAM cannot index on the tag directly from the inventory_entry fact head.
The solution — used in the host_entry/1, vm_entry/1, disk_entry/1 predicates from Section 8.3.2 — is to define tag-specific predicates as filtered views. These can be asserted with explicit first-argument structure if needed, or maintained as derived predicates. For large inventories (thousands of entries), asserting tag-specific facts at load time provides O(1) lookup:
%% For large-scale inventories: assert tag-specific index facts at load time.
%% Add to proxmox_inventory_v2.pl after all inventory_entry/1 facts.
:- dynamic host_index/2. % host_index(HostName, HostDict)
:- dynamic vm_index/2. % vm_index(VMID, VMDict)
:- dynamic disk_index/2. % disk_index(Serial, DiskDict)
build_inventory_index :-
forall(inventory_entry(D), (
( is_dict(D, host) ->
assertz(host_index(D.name, D))
; is_dict(D, vm) ->
assertz(vm_index(D.id, D))
; is_dict(D, disk) ->
assertz(disk_index(D.serial, D))
; true
)
)).
:- build_inventory_index.
%% Indexed lookups — O(1) via first-argument indexing on name/id/serial
host_by_name_fast(Name, D) :- host_index(Name, D).
vm_by_id_fast(ID, D) :- vm_index(ID, D).
disk_by_serial_fast(S, D) :- disk_index(S, D).
?- host_by_name_fast('pve-node-01', H), H.ram =:= 256.
true. % O(1): first-argument index on host name atom
?- vm_by_id_fast(101, VM), Status = VM.status.
Status = running.
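One practical detail: if the index is built more than once in a session (for example after reloading the KB), plain assertz adds duplicate index facts. A minimal sketch of an idempotent rebuild, using a hypothetical rebuild_host_index/0 and a single sample entry standing in for the chapter's inventory:

```prolog
:- dynamic inventory_entry/1.
:- dynamic host_index/2.

% Hypothetical sample entry, standing in for the real inventory facts.
inventory_entry(host{name:'pve-node-01', ram:256}).

%% rebuild_host_index/0
%% Clears host_index/2 before repopulating it, so running the build
%% twice leaves exactly one index fact per host.
rebuild_host_index :-
    retractall(host_index(_, _)),
    forall(( inventory_entry(D), is_dict(D, host) ),
           ( get_dict(name, D, Name),
             assertz(host_index(Name, D)) )).

% ?- rebuild_host_index, rebuild_host_index,
%    aggregate_all(count, host_index(_, _), N).
% N = 1.
```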
8.5.4.1 Dynamic Inventories: RB-Trees over assertz Indexes
The assertz-based index above is optimal for the Logic Node's primary use case: a static KB loaded once per session, queried many times. The assertz calls happen at load time, JIT indexing is compiled once, and every subsequent lookup is O(1) with no allocation.
For deployments where the VM inventory is highly dynamic (a cloud environment where VMs are created and destroyed at high frequency, or a live status synchronisation path that updates the status field of hundreds of VM Dicts per minute), the assertz approach has a pathological cost: each assertz(vm_index(...)) or retract(vm_index(...)) changes the clause list of vm_index/2, so its JIT-built index has to be maintained or rebuilt, and retracted clauses linger until clause garbage collection. Under high update rates, this database overhead dominates.
library(assoc) and library(rbtrees) provide in-memory key-value structures with O(log n) worst-case performance and no re-compilation cost on update:
:- use_module(library(assoc)).    % AVL trees: sorted key-value map
:- use_module(library(rbtrees)).  % Red-black trees: rb_* predicates, own argument order
%% vm_status_cache: an AVL association mapping VMID → status atom.
%% Updated without touching the clause database.
:- empty_assoc(Empty), nb_setval(vm_status_cache, Empty).
update_vm_status_cache(VMID, Status) :-
    must_be(positive_integer, VMID),
    must_be(atom, Status),
    nb_getval(vm_status_cache, Tree0),
    put_assoc(VMID, Tree0, Status, Tree1),
    nb_setval(vm_status_cache, Tree1).
get_vm_status_cached(VMID, Status) :-
    nb_getval(vm_status_cache, Tree),
    get_assoc(VMID, Tree, Status).
%% RB-tree variant (better worst-case on large, frequently-updated sets).
%% The empty tree must come from rb_new/1: the atom t is an empty AVL
%% tree in library(assoc) but is not a valid RB-tree.
:- rb_new(Empty), nb_setval(vm_dict_rbtree, Empty).
update_vm_dict_rbtree(VMID, VMDict) :-
    nb_getval(vm_dict_rbtree, Tree0),
    % rb_insert/4 replaces the value when VMID is already present,
    % so no separate rb_update/4 branch is needed
    rb_insert(Tree0, VMID, VMDict, Tree1),
    nb_setval(vm_dict_rbtree, Tree1).
get_vm_dict_rbtree(VMID, VMDict) :-
    nb_getval(vm_dict_rbtree, Tree),
    rb_lookup(VMID, VMDict, Tree).
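A caveat on the global-variable versions: nb_setval/2 stores a copy of its value, so each store pays for duplicating the whole tree on top of the O(log n) insertion. Where updates arrive as a batch, threading the tree through as an argument avoids that cost. A minimal sketch, assuming hypothetical predicates apply_status_updates/3 and apply_status_update/3 and a list of VMID-Status pairs as input:

```prolog
:- use_module(library(assoc)).
:- use_module(library(apply)).

%% apply_status_updates(+Updates, +Tree0, -Tree)
%% Hypothetical helper: Updates is a list of VMID-Status pairs.
%% The AVL tree is passed from one update to the next by foldl/4,
%% so no global variable (and no tree copy) is involved.
apply_status_updates(Updates, Tree0, Tree) :-
    foldl(apply_status_update, Updates, Tree0, Tree).

apply_status_update(VMID-Status, Tree0, Tree) :-
    must_be(positive_integer, VMID),
    must_be(atom, Status),
    put_assoc(VMID, Tree0, Status, Tree).

% ?- empty_assoc(T0),
%    apply_status_updates([100-running, 101-stopped, 100-paused], T0, T),
%    get_assoc(100, T, S).
% S = paused.    % the later update for VM 100 wins
```

The caller can still publish the final tree with a single nb_setval/2 at the end of the batch, paying the copy once rather than once per update.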
The selection rule:
| Inventory type | Update frequency | Correct structure | Rationale |
|---|---|---|---|
| Static KB (load once, query many) | Never / session-boundary | assertz + JIT index | O(1) lookup, compiled once |
| Moderately dynamic (hourly sync) | Low | assertz + retractall + rebuild | Rebuild cost amortised over many queries |
| Highly dynamic (per-minute updates) | High | library(assoc) AVL tree | O(log n) update + O(log n) lookup, no recompile |
| Very large + high-frequency | Very high | library(rbtrees) | Better rebalancing than AVL under skewed insert patterns |
For the Logic Node's standard deployment — static KB, authorised human-initiated updates via chattr -i / edit / chattr +i — the assertz index is the correct choice. The AVL and RB-tree variants are documented here for Volume II's live synchronisation deployments.
8.6 Outcome: The Schema-as-Logic Model
8.6.1 The Conceptual Transition
Positional compound terms encode schema implicitly — the meaning of position 3 is known to the programmer and nowhere else. Dict terms encode schema explicitly — every field has a name, every Dict has a type tag, and pattern matching on the tag is part of the proof. The schema is no longer documentation alongside the code; it is the code.
The schema-as-logic model has three operational properties:
Refactoring safety: Adding a field to a host{} Dict schema requires adding the field to the inventory_entry facts and to any predicate that needs it. Predicates that do not need the new field — and there will be many — require no changes. Positional schemas require updating every pattern match that destructures the term.
Schema enforcement at proof time: validate_all_host_entries/0 runs at module load. A malformed Dict — missing required field, wrong type — fails the proof before any oracle predicate ever runs. The KB is self-validating.
JSON parity: The host{} Dict schema maps directly to the Proxmox API's node response schema. Ingested API data (after whitelist filtering and normalisation) produces the same structure as the hand-authored KB facts. The ingestion predicate and the KB fact are unified by the same oracle predicates — no translation layer between "what the API says" and "what the Logic Node reasons about."
| Positional facts | Dict-based registry |
|---|---|
| Schema implicit (position = meaning) | Schema explicit (key name = meaning) |
| Adding a field: update all call sites | Adding a field: update facts and consuming predicates only |
| No type enforcement at fact level | validate_all_host_entries/0 enforces schema at load |
| Tag discrimination: impossible | Tag discrimination: first operation in every pattern match |
| JSON mapping: manual key-to-position | JSON mapping: json_read_dict/3 + whitelist + normalise |
| Atom Table risk: bounded (compile-time) | Atom Table risk: JSON ingestion opens vector — whitelist closes it |
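The refactoring-safety row can be checked directly: a predicate that reads one field by name gives the same answer before and after an unrelated field is added. A minimal sketch, using a hypothetical host_ram_of/2 (named to avoid clashing with host_ram/2 from Section 8.2.1) and an illustrative rack field:

```prolog
%% host_ram_of(+HostDict, -RAM)
%% Hypothetical accessor: checks the tag, then reads ram by key name.
%% It never mentions any other field, so adding fields cannot break it.
host_ram_of(Dict, RAM) :-
    is_dict(Dict, host),
    get_dict(ram, Dict, RAM).

% ?- Old = host{name:'pve-node-01', ram:256},
%    put_dict(rack, Old, r12, New),      % schema grows by one field
%    host_ram_of(Old, R1),
%    host_ram_of(New, R2).
% R1 = R2, R2 = 256.
```

The positional equivalent, physical_host(Name, MAC, RAM), would stop matching the moment a fourth argument was added.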
8.6.2 Verification Checklist
?- use_module('/opt/logic-node/kb/inventory/proxmox_inventory_v2.pl').
true.
?- use_module('/opt/logic-node/kb/oracle/zfs_oracle_v2.pl').
true.
?- use_module('/opt/logic-node/kb/ingestion/json_ingestion.pl').
true.
Dict mechanics:
% 1. Tag discrimination: host{} ≠ vm{}
?- host_ram(host{ram:256}, R).
R = 256.
?- \+ host_ram(vm{ram:4}, _).
true. % ✓ Tag mismatch fails before key check
% 2. Functional update: OldDict unchanged
?- H = host{name:'pve-node-01', status:online},
M = H.put(status, maintenance),
H.status = online, % OldDict unchanged
M.status = maintenance.
true. % ✓ Pure functional update — no mutation
% 3. Missing key: exception not silent failure
?- H = host{name:'pve-node-01'},
catch((_ = H.ram), error(existence_error(key, ram, _), _), true).
true. % ✓ Missing required key throws — not silent unbound variable
Dict registry:
% 4. host_by_name retrieves correct Dict
?- host_by_name('pve-node-02', H), H.ram =:= 128.
true.
% 5. disks_for_host returns correct count
?- disks_for_host('pve-node-01', Ds), length(Ds, 2).
true.
% 6. Schema validation passes for all entries
?- validate_all_host_entries.
true. % ✓ All host{} entries conform to schema
Oracle and security:
% 7. Dict-aware replace_disk generates correct command
?- host_by_name('pve-node-01', H),
replace_disk(H, "WD-WX11A2K3P801", "WD-NEW001", Cmd),
sub_string(Cmd, _, _, _, "zpool replace").
true.
% 8. Key whitelist strips unknown keys
?- D = json{node:'pve-node-01', malicious_key: "attack", maxcpu:32},
whitelist_filter_dict(proxmox_node, D, F),
\+ get_dict(malicious_key, F, _).
true. % ✓ malicious_key absent from filtered Dict
% 9. Unknown schema throws
?- catch(whitelist_filter_dict(bad_schema, _{k:v}, _),
error(unknown_ingestion_schema(bad_schema), _), true).
true.
% 10. Bridge predicates maintain Chapter 3-7 compatibility
?- physical_host('pve-node-01', MAC, RAM).
MAC = 'AA:BB:CC:DD:EE:01', RAM = 256.
8.6.3 What Comes Next
Chapter 9 introduces modules — the mechanism by which the Logic Node's growing collection of oracle predicates, inventory KBs, and ingestion pipelines are isolated into separate namespaces with explicit export lists. The proxmox_inventory_v2 module and json_ingestion module introduced in this chapter are already structured as modules. Chapter 9 formalises the module system's scoping rules, import/export semantics, and the use of module qualifiers to call predicates across namespace boundaries.
Chapter Summary
| Concept | Operational Definition | Performance / Security Consequence |
|---|---|---|
| Dict syntax | Tag{Key:Value, ...}: named-field compound on WAM Heap | Keys sorted at construction; O(log n) access per field by binary search |
| Tag discriminator | First element of Dict; checked before any key in unification | Immediate fail on wrong schema class; no partial unification cost |
| Dict.key dot notation | Sugar for get_dict(key, Dict, Value) that throws on missing key | Fail-closed: missing required field is an exception, not an unbound variable |
| get_dict/3 | Soft key access: fails on missing key | Correct for optional fields; use ( -> ; ) for default handling |
| Dict.put(Key, Value) | Allocates new Dict; original unchanged | Pure functional update with no mutation; both old and new term are valid |
| dict_pairs/3 | Decompose/compose Dict ↔ Tag + sorted pair list | Bridge between Dict and list operations; full introspection without dot notation |
| inventory_entry/1 | Single predicate for all schema classes, tag-discriminated | Unified KB entry point; tag-based predicates provide filtered views |
| Bridge predicates | physical_host/3, vm/4, storage/3 derived from Dict registry | Backward compatibility with Chapter 3–7 oracles during migration |
| validate_all_host_entries/0 | Runs at module load; reports and fails on missing field or wrong type | Schema enforcement at proof time, not at runtime during oracle calls |
| json_read_dict/3 | Reads JSON → Dict; keys become atoms, string values become Prolog strings unless value_string_as_atom(true) is given | value_string_as_atom(true) interns all string values: Atom Table exhaustion risk on untrusted input |
| Key whitelist | whitelist_filter_dict/3 drops all non-whitelisted keys before use | Prevents unknown API keys from being retained past ingestion; unknown schema throws |
| value_string_as_atom(false) option | JSON string values become Prolog strings (GC-eligible), not atoms | Required for external JSON; convert specific values to atoms after whitelist check |
| library(assoc) / library(rbtrees) | AVL / RB-tree in-memory key-value structures | O(log n) update + O(log n) lookup with no predicate re-compilation; correct for high-frequency VM status updates |
| Content-Length limit | max_json_bytes/2 + read_string/3 size ceiling before parse | Closes Heap-overflow DoS: whitelisted-key payloads still exhaust Heap if unbounded; limit set conservatively per schema |
| Ghost Key (meta{}) | Nested meta{} Dict field for non-logical audit metadata | Keeps ingested_at, source_ip, authorised_by out of oracle proof conditions; accessible only to audit predicates |
| safe_json_read_dict_bounded/4 | Content-Length check → bounded read → parse → whitelist | Preferred over safe_json_read_dict/3 for all external-facing ingestion paths |
Exercises
Exercise 8.1 — Dict Construction and Unification
Write a predicate make_host_dict/6 that takes (Name, MAC, RAM, Role, Cluster, Dict) and constructs a host{} Dict with status: online as a default. Verify that make_host_dict('pve-node-04', 'FF:EE:DD:CC:BB:04', 512, tertiary, 'sovereign-cluster-01', H) produces a valid Dict accepted by validate_host_dict/1. Then verify that host_ram(H, 512) succeeds using the pattern-matching predicate from Section 8.2.1.
Exercise 8.2 — Functional Update Chain
Write a predicate apply_health_update/3 that takes a host{} Dict, a health_check{} Dict from the ingestion pipeline, and produces an updated host{} Dict with status, io_wait, and a last_checked timestamp field added. Use Dict.put/2 for each field update. Verify that the original Dict is unchanged after the predicate runs.
Exercise 8.3 — Bridge Predicate Validation
Load both proxmox_inventory.pl (Chapter 3 original) and proxmox_inventory_v2.pl with bridge predicates. Write a predicate inventory_equivalence_check/0 that verifies: for every physical_host/3 fact in the original, the corresponding host{} Dict in the v2 registry has identical name, mac, and ram values. Report any discrepancies. Run this as a regression test.
Exercise 8.4 — Key Whitelist Extension
Add a new ingestion schema proxmox_storage for the Proxmox /storage API endpoint with whitelisted keys [storage, type, active, enabled, maxdisk, disk, content]. Write normalise_storage_dict/2 that maps a raw API Dict to a storage_pool{} Dict with canonical field names. Test it with a mock API Dict that includes three non-whitelisted keys and verify they are absent from the output.
Exercise 8.5 — Dict-Aware Batch Oracle
Refactor batch_snapshot/3 from Chapter 7 to accept a list of vm{} Dicts instead of a list of VM IDs. The predicate should: (1) verify each element is a vm{} Dict using is_dict/2, (2) filter to running VMs via VM.status = running, (3) generate snapshot commands using VM.id and VM.name, (4) apply the batch_excluded_vm/1 check using VM.id. Compare the readability and line count of the Dict version against the original. Document which checks are simplified by dot notation and which remain unchanged.
Further Reading
- SWI-Prolog Manual: Dicts. https://www.swi-prolog.org/pldoc/man?section=bidicts. The authoritative reference for Dict syntax, access predicates, and the .put functional update.
- SWI-Prolog Manual: library(http/json). https://www.swi-prolog.org/pldoc/man?section=json. json_read_dict/3 options including tag, default_tag, and value_string_as_atom.
- SWI-Prolog Manual: dict_pairs/3. https://www.swi-prolog.org/pldoc/man?predicate=dict_pairs/3
- SWI-Prolog Manual: is_dict/1, is_dict/2. https://www.swi-prolog.org/pldoc/man?predicate=is_dict/1
- Wielemaker, J. (2015). A Novel Term Representation for Prolog: Dicts. Technical Report, VU University Amsterdam. Original design paper for SWI-Prolog Dicts.
- RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format. Normative reference for the JSON format processed by the ingestion pipeline.
- OWASP: Denial of Service Cheat Sheet. Atom Table exhaustion maps to the "resource exhaustion" category in OWASP's DoS taxonomy.
End of Chapter 8 — Next: Chapter 9: Modules, Namespaces, and the Logic Node Architecture