Skip to main content

Chapter 2: The Anatomy of Unification

The word "unification" is borrowed from mathematical logic, where it names the process of finding a substitution that makes two expressions identical. That definition is precise and complete. The engineer who internalises it — rather than reaching for the comfortable analogies of "pattern matching" or "destructuring assignment" — will write code that is structurally sound instead of accidentally correct.

Five properties define unification as a mathematical constraint. Each has direct consequences for how infrastructure logic is written and how it fails.

1. Unification is a constraint, not an instruction. When you write X = 5 in Python, you are issuing a command: "allocate a binding for the name X and point it at the integer object 5." The variable X is a mutable label. It can be repointed at 7 on the next line. In Prolog, X = 5 is a constraint statement: "the value of X must be consistent with 5 in this proof branch." If X was already bound to 3, the constraint is unsatisfiable and the branch fails immediately. If X is unbound, it becomes bound to 5 for the duration of this proof branch — and only this proof branch. This is not assignment. It is commitment under a proof obligation.

2. Unification is bidirectional. f(X, b) = f(a, Y) does not have a "left side" and a "right side" in any meaningful sense. Both terms are unified simultaneously. The engine finds a substitution — {X → a, Y → b} — that makes both terms structurally identical. Neither term is the "template" and neither is the "value." This bidirectionality is what allows Prolog predicates to function both as constructors and as destructors depending on which arguments are instantiated at call time. An imperative function can only flow data in one direction; a Prolog predicate flows data in whatever direction the proof requires.

3. Variables are write-once within a proof branch. Once a variable is bound during unification, it cannot be rebound within the same proof branch. This is not a limitation — it is a correctness guarantee. Infrastructure reasoning that produces contradictory bindings for the same variable is, by definition, inconsistent. The WAM's single-assignment semantics surface that inconsistency immediately, as a proof failure, rather than allowing it to propagate silently as incorrect state. Languages that permit mutation hide inconsistency. Prolog exposes it.

4. Structural matching is intrinsic, not implemented. In Python, matching a nested dictionary {"host": {"ip": "10.0.0.1", "vlan": 20}} against a pattern requires either a library (dacite, pydantic, a match statement), a schema definition, or nested conditional logic. In Prolog, the match Host = host(network(ip("10.0.0.1"), vlan(20))) is performed by the unification engine with no library, no schema, and no conditional. The term structure is the schema. The unification is the parse. This collapses an entire category of parsing and validation code into a single operation with formal semantics.

5. Unification failure is the correct security default. A unification that cannot be satisfied fails. It does not return None. It does not raise a KeyError. It does not silently succeed with a partial match. The proof branch terminates. This fail-closed property, established in Chapter 1 in the context of access control, extends to every use of unification in the system. A term representing a network packet that fails to unify with the expected structure is rejected at the structural level before any logic that inspects its content is ever reached. The parser and the validator are the same operation.

2.1 Unification vs. Assignment

2.1.1 The = Operator: A Constraint, Not a Command

The =/2 predicate is the most syntactically familiar thing in Prolog to an engineer coming from any other language, and consequently the most dangerous source of conceptual error. The notation X = Y looks like assignment. It is not assignment. Understanding precisely what it is — at the level of the WAM, not at the level of analogy — prevents an entire class of logical errors that are otherwise invisible until they manifest as incorrect proof results in production.

The formal definition: =/2 succeeds if and only if its two arguments can be unified — that is, if there exists a substitution σ (a mapping from variables to terms) such that applying σ to both arguments produces syntactically identical terms.

% These succeed — a unifying substitution exists
?- foo(X, b) = foo(a, Y).
X = a,
Y = b.

?- point(X, X) = point(3, 3).
X = 3.

?- Host = host(srv01, ip(10,0,0,1)).
Host = host(srv01, ip(10, 0, 0, 1)).

% These fail — no unifying substitution exists
?- foo(a) = foo(b).
false.

?- point(X, X) = point(3, 4).   % X cannot be both 3 and 4
false.

?- 5 = 6.
false.

The third success case is worth holding for a moment. Host is an unbound variable. The term host(srv01, ip(10,0,0,1)) is a ground term (no variables). The substitution {Host → host(srv01, ip(10,0,0,1))} makes both sides identical. The unification succeeds, and Host is now bound. This looks like assignment. The difference becomes apparent when you consider what happens next: Host cannot be rebound.

?- Host = host(srv01, ip(10,0,0,1)), Host = host(srv02, ip(10,0,0,2)).
false.

The second unification attempt fails because Host is already bound to host(srv01, ip(10,0,0,1)), and host(srv01, ip(10,0,0,1)) does not unify with host(srv02, ip(10,0,0,2)). In Python, the second assignment would silently overwrite the first. In Prolog, the inconsistency is surfaced immediately as proof failure.

2.1.2 Single-Assignment Semantics and the Write-Once Property

Prolog variables are logic variables, not memory slots.

A memory slot (Python variable, C variable, Bash variable) is a named location in memory. It holds a value. That value can be replaced at any time. The name and the location are persistent; only the content changes. There is no formal relationship between the old value and the new value. The programmer is responsible for ensuring that overwrites are semantically correct.

A logic variable is an unknown in a mathematical equation. It has no value until a proof branch commits one. Within that proof branch, it has exactly one value — the one committed by the first successful unification. Subsequent unifications that require a different value cause the proof branch to fail, not the variable to change. The variable does not "contain" a value in the memory-slot sense; it is the value, in the sense that it is interchangeable with that value throughout the proof branch.

% Demonstrating single-assignment as a correctness tool

% This query asks: is there an X such that X=3 AND X=4?
% Mathematically, no such X exists. Prolog surfaces this correctly.
?- X = 3, X = 4.
false.

% This query asks: is there an X such that X=3 AND X=3?
% Yes. X=3 satisfies both constraints.
?- X = 3, X = 3.
X = 3.

% Infrastructure application: verifying consistent network data
% If a host appears twice in a dataset with different IPs, this fails.
check_consistent(Host, IP) :-
    recorded_host(Host, IP),   % First sighting: binds IP
    reported_host(Host, IP).   % Second sighting: must match bound IP or fail

The last example is not defensive programming. It is the natural expression of a consistency requirement. The same host name with two different IP addresses cannot simultaneously satisfy IP = "10.0.0.1" and IP = "10.0.0.2". The single-assignment property makes the inconsistency detection automatic and zero-cost.

2.1.3 The Trail: Variable Binding and Restoration

The WAM's Trail was introduced in Chapter 1 as the mechanism behind Prolog's rollback semantics. Here it is examined in detail, because the Trail is the precise mechanism by which single-assignment and backtracking coexist — which seems contradictory until the implementation is understood.

When the WAM binds a variable during unification, it does two things simultaneously:

  1. It writes the binding into the variable's heap cell (the cell transitions from a self-referential pointer — "unbound" — to a pointer to the bound term).
  2. It pushes a reference to the variable's heap cell address onto the Trail stack.

The Trail is therefore a chronological record of every variable binding made since the last choice point was created. When a proof branch fails and the WAM backtracks to a choice point, it unwinds the Trail from the current top back to the trail mark recorded at the choice point. For every cell address on the Trail, it resets that cell to its self-referential (unbound) state.

Trail state during: f(X, Y) = f(a, b)

BEFORE unification:
  Trail top: T₀
  Heap cell for X: [REF → self]    (unbound)
  Heap cell for Y: [REF → self]    (unbound)

DURING — X = a:
  Heap cell for X: [REF → atom(a)] (bound)
  Trail push: address_of(X)
  Trail top: T₁

DURING — Y = b:
  Heap cell for Y: [REF → atom(b)] (bound)
  Trail push: address_of(Y)
  Trail top: T₂

IF FAILURE (backtrack to T₀):
  Unwind Trail from T₂ to T₀:
    Reset cell at address_of(Y) → [REF → self]
    Reset cell at address_of(X) → [REF → self]
  Trail top: T₀

AFTER backtrack:
  Heap cell for X: [REF → self]    (unbound — as if binding never happened)
  Heap cell for Y: [REF → self]    (unbound)

This is why single-assignment and backtracking are not contradictory. Within any single proof path, a variable is bound at most once. Across multiple proof paths (via backtracking), a variable can be unbound and rebound in different branches — but it is never simultaneously bound to two different values within the same path. The Trail enforces this by making every binding reversible at the cost of a stack push, while preventing rebinding within a path at the cost of a unification failure.

2.1.4 Diagram: Unification of Two Structures

The following diagram shows the bidirectional binding process when unifying node(Name, IP) with node("srv-01", "10.0.0.1"), tracing each variable's transition from unbound to bound and the corresponding Trail entries.

flowchart TD
    subgraph Query["Query Term (Left Side)"]
        Q1["node/2\nfunctor"]
        Q2["Name\n(unbound variable\nREF → self)"]
        Q3["IP\n(unbound variable\nREF → self)"]
        Q1 --> Q2
        Q1 --> Q3
    end

    subgraph Ground["Ground Term (Right Side)"]
        G1["node/2\nfunctor"]
        G2["atom: srv-01"]
        G3["atom: 10.0.0.1"]
        G1 --> G2
        G1 --> G3
    end

    subgraph Trail["WAM Trail Stack"]
        T1["T₀: trail mark\n(choice point)"]
        T2["T₁: push addr(Name)"]
        T3["T₂: push addr(IP)"]
        T1 --> T2 --> T3
    end

    subgraph Result["Heap State After Unification"]
        R1["Name cell:\nREF → atom(srv-01)"]
        R2["IP cell:\nREF → atom(10.0.0.1)"]
    end

    Q1 -->|"1. functors match:\nnode/2 = node/2"| G1
    Q2 -->|"2. unify arg 1:\nName = srv-01"| G2
    Q3 -->|"3. unify arg 2:\nIP = 10.0.0.1"| G3
    Q2 -->|"binding written"| R1
    Q3 -->|"binding written"| R2
    R1 -->|"addr(Name) pushed"| T2
    R2 -->|"addr(IP) pushed"| T3

    subgraph Backtrack["On Failure: Trail Unwind"]
        BT1["Reset addr(IP) → REF self"]
        BT2["Reset addr(Name) → REF self"]
        BT3["Trail top → T₀"]
        T3 -->|"unwind T₂"| BT1
        BT1 --> BT2
        BT2 --> BT3
    end

    style Q2 fill:#8B6914,color:#FFFFFF
    style Q3 fill:#8B6914,color:#FFFFFF
    style G2 fill:#1A5276,color:#FFFFFF
    style G3 fill:#1A5276,color:#FFFFFF
    style R1 fill:#1A6B3A,color:#FFFFFF
    style R2 fill:#1A6B3A,color:#FFFFFF
    style T1 fill:#2C3E50,color:#FFFFFF
    style T2 fill:#4A235A,color:#FFFFFF
    style T3 fill:#4A235A,color:#FFFFFF
    style BT1 fill:#8B0000,color:#FFFFFF
    style BT2 fill:#8B0000,color:#FFFFFF
    style BT3 fill:#8B0000,color:#FFFFFF

Reading the diagram: Unification proceeds top-down on the functor, then left-to-right on arguments. Step 1 checks that both terms have the same functor and arity (node/2). Steps 2 and 3 unify arguments pairwise. Each successful variable binding is recorded on the Trail. If any step fails, the Trail unwind path (bottom section, red) restores every variable to its unbound state in reverse chronological order.

The bidirectionality: if this query had been node("srv-01", IP) = node(Name, "10.0.0.1"), the binding paths would reverse — Name would receive "srv-01" from the left side and IP would receive "10.0.0.1" from the right side. The engine does not distinguish "input" from "output" arguments.


2.2 The WAM Heap: Terms and Representation

2.2.1 The Four Term Types

Every data object in SWI-Prolog is a term. There are four categories, and each has a distinct memory representation on the WAM Heap (Global Stack).

Atoms are indivisible symbolic constants. nginx, prod, true, [] are atoms. In memory, an atom is represented as a tagged word containing an index into the Atom Table. The actual character data lives in the Atom Table; the heap cell for an atom contains only the table index. Comparison between two atoms is an O(1) integer comparison of their table indices.

?- X = nginx, Y = nginx, X == Y.
X = Y, Y = nginx.
% Both X and Y hold the same atom table index — pointer equality, not string comparison

Integers (and floats) are numeric constants. Small integers (fitting in a machine word minus the tag bits) are stored directly in the tagged word — no heap allocation. SWI-Prolog on 64-bit systems can store integers up to 2⁶⁰ - 1 without heap allocation. Larger integers are heap-allocated as multi-word structures.

?- X = 42, integer(X).
X = 42.

?- X = 99999999999999999999, integer(X).  % Arbitrary precision — heap allocated
X = 99999999999999999999.

Variables (uninstantiated logic variables) are heap cells containing a self-referential pointer — the cell points to itself. This self-reference is the canonical WAM encoding of "unbound." When a variable is bound, the self-pointer is overwritten with a pointer to the bound term. The address of the cell does not change; only the pointer within it changes.

?- var(X).
true.
% X is a heap cell: [addr_X | REF → addr_X]  (self-referential)

?- X = hello, var(X).
false.
% X is now: [addr_X | REF → atom_index(hello)]  (bound)

Compound terms (structures) are the primary data-bearing construct. f(a, b, c) is a compound term with functor f and arity 3. On the WAM Heap, a compound term is stored as a contiguous block of N+1 cells: one header cell containing the functor descriptor (functor name + arity, encoded as a single machine word), followed by N argument cells, each of which is a tagged word pointing to the representation of that argument.

2.2.2 Heap Cell Layout and the Tagged Word Architecture

SWI-Prolog's WAM represents every term as a 64-bit tagged word. The low-order bits serve as a type tag, and the remaining bits hold the value or a pointer.

Tag (low bits) Type Remaining bits contain
00 Reference (REF) Address of the cell this variable is bound to
01 Atom Atom Table index
10 Compound Address of the compound's header cell on the Heap
11 Integer (small) The integer value, sign-extended

The self-referential property of unbound variables: an unbound variable at heap address H is stored as the tagged word (H << 2) | 00 — a REF tag pointing to address H itself. The WAM's "dereference" operation follows REF chains until it finds a cell that is either non-REF or self-referential. A self-referential REF is the termination condition: the variable is unbound.

Heap layout for: host(srv01, ip(10,0,0,1))

Address | Content
--------|--------
H+0     | COMPOUND → H+1         (the compound term: points to its header)
H+1     | FUNCTOR: host/2         (functor descriptor)
H+2     | REF → H+5               (argument 1: points to atom srv01)
H+3     | COMPOUND → H+4          (argument 2: points to inner compound)
H+4     | FUNCTOR: ip/4
H+5     | ATOM: srv01             (atom table index for srv01)
H+6     | INTEGER: 10             (small int, no heap allocation)
H+7     | INTEGER: 0
H+8     | INTEGER: 0
H+9     | INTEGER: 1

Understanding this layout has two practical consequences:

Consequence 1: Unification of two ground compound terms is a depth-first tree walk. The WAM compares functor descriptors, then recursively unifies argument cells. The time complexity is proportional to the number of cells in the smaller term. For infrastructure terms with bounded depth and arity — host records, network topology nodes — this is effectively O(1) in practice.

Consequence 2: Partial unification is structurally precise. When you write host(_, network(_, vlan(VID))), the _ (anonymous variable) is a fresh unbound variable. The unification engine traverses the compound structure, skips the _ argument (it unifies with anything), descends into network/2, skips the second _, and descends into vlan/1 to bind VID. The traversal path is exactly the path you specified. No fields are read that you did not ask for. There is no deserialization, no object construction, no temporary allocation beyond the variable binding itself.

2.2.3 Variable Chains and Dereferencing

A subtlety that affects performance in complex unifications: variables can be bound to other variables, creating reference chains.

?- X = Y, Y = Z, Z = final_value.
X = Y, Y = Z, Z = final_value.

The WAM heap after this sequence:

Cell for X: REF → addr(Y)
Cell for Y: REF → addr(Z)
Cell for Z: ATOM → final_value

Accessing X's value requires following the chain X → Y → Z → final_value. The WAM's dereference loop performs this traversal on every variable access. For shallow chains (1–3 hops, typical in well-structured code) this is negligible. For pathological chains (created by code that unifies variables to variables to variables in long sequences), the dereference cost accumulates.

SWI-Prolog performs path compression during some operations — binding the end of a chain directly to the final value — but this is not guaranteed in all contexts. The engineering implication: in performance-sensitive code on the Logic Node, prefer binding variables directly to ground terms rather than chaining through intermediate variables.

% Less efficient: chain X → Y → Z → value
assign_via_chain(X) :-
    Y = value,
    Z = Y,
    X = Z.

% More efficient: direct binding
assign_direct(value).

2.2.4 Diagram: Heap Memory Layout of a Nested Infrastructure Term

flowchart TD
    subgraph Heap["WAM Heap — host(srv01, ip(10,0,0,1))"]
        direction TB

        H0["H+0: COMPOUND\n→ H+1\n(term reference)"]
        H1["H+1: FUNCTOR\nhost/2"]
        H2["H+2: REF → H+6\n(arg 1: atom srv01)"]
        H3["H+3: COMPOUND\n→ H+4\n(arg 2: ip/4 term)"]
        H4["H+4: FUNCTOR\nip/4"]
        H5["H+5: INTEGER 10\n(octet 1)"]
        H6["H+6: ATOM\nsrv01"]
        H7["H+7: INTEGER 0\n(octet 2)"]
        H8["H+8: INTEGER 0\n(octet 3)"]
        H9["H+9: INTEGER 1\n(octet 4)"]

        H0 --> H1
        H1 -->|"arg 1"| H2
        H1 -->|"arg 2"| H3
        H2 -->|"deref"| H6
        H3 --> H4
        H4 -->|"arg 1"| H5
        H4 -->|"arg 2"| H7
        H4 -->|"arg 3"| H8
        H4 -->|"arg 4"| H9
    end

    subgraph Unbound["Unbound Variable — before binding"]
        UV["Cell addr V:\nREF → addr V\n(self-reference = unbound)"]
        UV -->|"points to itself"| UV
    end

    subgraph Bound["After: V = srv01"]
        BV["Cell addr V:\nREF → H+6\n(bound to atom srv01)"]
        BV -->|"deref"| H6
    end

    style H1 fill:#1A2B4A,color:#FFFFFF
    style H4 fill:#1A2B4A,color:#FFFFFF
    style H6 fill:#1A5276,color:#FFFFFF
    style H5 fill:#2E6B3E,color:#FFFFFF
    style H7 fill:#2E6B3E,color:#FFFFFF
    style H8 fill:#2E6B3E,color:#FFFFFF
    style H9 fill:#2E6B3E,color:#FFFFFF
    style UV fill:#8B6914,color:#FFFFFF
    style BV fill:#1A6B3A,color:#FFFFFF

2.3 The Occurs Check: Preventing Logic Circles

2.3.1 What the Occurs Check Is

Standard unification — the algorithm implemented in the WAM's =/2 predicate — has a deliberate omission for performance reasons. When binding a variable X to a term T, the standard algorithm does not check whether X occurs anywhere within T. This omission is known as the absence of the occurs check.

When X does occur within T (for example, T = f(X)), the result of the binding X = f(X) is a cyclic term: a data structure that contains itself as a subterm. In memory, this is a heap cell whose reference chain eventually loops back to itself.

Cyclic term: X = f(X)

Heap cell for X: COMPOUND → H+1
H+1: FUNCTOR f/1
H+2: REF → addr(X)   ← This points back to the start
     ↑_________________|

Dereferencing this structure traverses: X → f/1 → arg1 → X → f/1 → arg1 → X → ...

This is not a theoretical concern. It is a concrete denial-of-service vector.

2.3.2 Cyclic Terms as a DoS Vector

Any code that attempts to traverse or print a cyclic term enters an infinite loop. The WAM's term traversal operations — used by write/1, copy_term/2, functor/3, =../2 (univ), and crucially =/2 itself when comparing cyclic terms — are not cycle-aware unless explicitly coded to be.

% DO NOT RUN THIS IN PRODUCTION — creates a cyclic term
?- X = f(X).
% SWI-Prolog handles this with rational tree unification by default
% Result: X = f(f(f(f(...)))).   (infinite rational tree)

% The danger: code that reads external data and creates terms from it
% could accidentally create cyclic structures if the occurs check is absent

% Safe in the REPL due to SWI-Prolog's cycle detection in write/1
% UNSAFE in any code that does manual term traversal without cycle guards

SWI-Prolog's default behaviour for X = f(X) is to succeed and create a rational tree (an infinite but regular cyclic structure). The REPL's write/1 handles this with cycle detection and prints it as X = f(f(f(...))). This behaviour is controlled by the occurs_check Prolog flag, which defaults to false.

The security exposure is in code you write, not in SWI-Prolog's own builtins. Specifically:

% Dangerous: manual term traversal without cycle guard
contains_value(Term, Value) :-
    Term =.. [_|Args],          % Decompose term
    member(Arg, Args),          % Get each argument
    contains_value(Arg, Value). % Recurse — INFINITE LOOP on cyclic term

% If Term is cyclic, this recursion never terminates.
% An attacker who can supply Term has a DoS vector.
% Dangerous: equality check on terms from external input
% If both terms are cyclic and share structure, =/2 may loop
process_request(IncomingTerm, ExpectedTerm) :-
    IncomingTerm = ExpectedTerm,  % May not terminate if IncomingTerm is cyclic
    execute(IncomingTerm).

The vector is realistic in any Logic Node that accepts external input and constructs Prolog terms from it without sanitisation — a pattern that becomes common as the system matures and external systems feed the fact database.

2.3.3 unify_with_occurs_check/2: The ISO-Compliant Solution

The predicate unify_with_occurs_check/2 (UWOC) performs standard unification with the occurs check re-enabled. Before binding variable X to term T, it verifies that X does not occur anywhere within T. If it does, the unification fails rather than creating a cyclic term.

% Safe unification — occurs check enabled
?- unify_with_occurs_check(X, f(X)).
false.
% Correctly fails: X occurs in f(X), binding would create a cycle

?- unify_with_occurs_check(f(X, b), f(a, Y)).
X = a,
Y = b.
% Standard unification — no cycles, succeeds normally

?- unify_with_occurs_check(node(Name, IP), node(srv01, ip(10,0,0,1))).
Name = srv01,
IP = ip(10,0,0,1).
% Infrastructure terms: no self-reference possible, UWOC = =/2 in effect

The performance cost: UWOC is O(n²) in the size of the term being unified, versus O(n) for standard unification, because it must traverse the right-hand term to check for occurrences of the variable before binding. For deeply nested terms, this cost is measurable.

The engineering decision for the Sovereign Node:

Use unify_with_occurs_check/2 at all trust boundaries — any point where external data enters the knowledge base as a Prolog term. Use standard =/2 everywhere else.

The trust boundary is the narrowest possible interface between external input and internal logic. If you process 1,000 fact assertions per second from external sources, UWOC at the ingestion predicate costs you one O(n²) traversal per assertion. Every internal unification — the reasoning over already-ingested, already-validated facts — uses standard =/2 with no overhead.

2.3.4 Enabling the Occurs Check Globally

For environments where all input is external and the performance cost is acceptable, the occurs check can be enabled globally:

% In /etc/swipl.rc or at the top of your knowledge base entrypoint
:- set_prolog_flag(occurs_check, true).

With this flag set, =/2 behaves as unify_with_occurs_check/2 system-wide. The performance penalty applies to all unifications. For most infrastructure reasoning workloads — the fact databases are bounded, the terms are shallow — this penalty is immeasurable in practice. The flag is worth setting if the operational simplicity of not tracking trust boundaries outweighs the theoretical throughput reduction.

% Verify the flag
?- current_prolog_flag(occurs_check, X).
X = true.  % If set in swipl.rc

2.3.5 Detecting Cyclic Terms

When receiving terms from external sources that might be cyclic (e.g., deserialised from a format that permits circular references), use is_list/1 or acyclic_term/1 to verify:

% acyclic_term/1 succeeds if and only if the term contains no cycles
?- acyclic_term(f(a, b, g(c))).
true.

?- X = f(X), acyclic_term(X).
false.
% Correctly identifies the cyclic term

% Safe ingestion pattern for external compound terms
ingest_host_term(Term) :-
    ( acyclic_term(Term)
    -> validate_host_structure(Term),
       assert_host_fact(Term)
    ;  throw(error(cyclic_input(Term), ingest_host_term/1))
    ).

2.3.6 Security Summary: The Occurs Check Policy

Unification Site Predicate to Use Rationale
External input ingestion unify_with_occurs_check/2 Trust boundary; cycle prevention
Internal fact pattern matching =/2 Already-validated facts; maximum performance
External term validation acyclic_term/1 before any unification Structural guard before processing
Global flag (occurs_check) true if throughput allows Simplifies trust boundary tracking
Term comparison (non-unifying) ==/2 (structural equality, no binding) No cycle risk; pointer comparison

Sovereign Security Rule: A Logic Node that ingests external terms without occurs-check protection is operating with an unbounded denial-of-service surface on its reasoning engine. Cyclic term injection causes the WAM to enter an infinite traversal loop. There is no timeout built into the unification engine. The only defences are: prevent the cyclic term from being created (UWOC), detect it before traversal (acyclic_term/1), or kill the query via an external watchdog (crude, and covered in Chapter 6).


2.4 The Build: Mapping Infrastructure Topology

2.4.1 Rationale: Why Terms Beat JSON for Infrastructure State

Before the REPL exercises begin, the engineering argument for using compound terms — rather than JSON, YAML, or dictionary objects — to represent infrastructure topology deserves to be stated precisely.

JSON is ubiquitous because it is human-readable and universally parseable. It is a poor choice for infrastructure logic for three reasons.

Reason 1: JSON has no structural schema enforcement at the parsing layer. A JSON parser that receives {"host": {"ip": "ten.zero.zero.one"}} will succeed. The string "ten.zero.zero.one" is syntactically valid JSON. Detecting that it is not a valid IP address requires a separate validation step with a regex or schema validator — code that must be written, tested, and maintained. A Prolog term host(network(ip(10,0,0,1), vlan(20))) with integer octets structurally cannot contain "ten.zero.zero.one" in the position that expects an integer. The structure enforces the type.

Reason 2: JSON extraction requires a parser for every access. Accessing the VLAN ID in a JSON object requires: parsing the JSON string into a dictionary object, indexing into the network key, indexing into the vlan key, type-checking the result, and handling the KeyError if any key is absent. In Prolog: Host = host(_, network(_, vlan(VID))). This is not a simplification. It is the complete, exhaustive code. The unification engine performs the traversal, the structural check, and the value extraction in one operation.

Reason 3: JSON comparison requires deserialization. Determining whether two JSON representations of the same host are semantically equal requires full deserialization of both, then recursive comparison of the resulting structures. Prolog term comparison (==/2) is a pointer-level operation for ground terms in the same Heap — effectively free.

The Sovereign Node's fact database stores infrastructure state as compound terms. External systems that provide data in JSON format have that JSON converted to terms at the ingestion boundary (the only place where JSON parsing occurs). Once inside the logic engine, everything is term unification.

Storage Infrastructure Note — ZFS Recordsize: Prolog fact databases for large topologies are loaded either as plain .pl text files or as binary quick-load files created with qsave_program/2. Both formats are written and read sequentially in large chunks. The default ZFS recordsize of 128K is tuned for mixed random/sequential workloads. For the /opt/logic-node dataset — which is write-once at boot and read-sequentially during the Sovereign Boot sequence — set recordsize=1M:

# Run once on the Proxmox host after creating the VM's dataset
# Adjust pool/dataset name to match your Proxmox storage configuration
zfs set recordsize=1M rpool/data/vm-200-disk-0

# Verify
zfs get recordsize rpool/data/vm-200-disk-0
# NAME                        PROPERTY    VALUE   SOURCE
# rpool/data/vm-200-disk-0    recordsize  1M      local

For a million-row topology file (~80–120 MB of Prolog text), this reduces the number of ZFS block reads during consult/1 by roughly 8x, which is measurable on cold-cache boots where the fact database is not yet in the ARC. The tradeoff — increased internal fragmentation for small files — is irrelevant on a dataset that holds a small number of large files.

2.4.2 Constructing the Infrastructure Host Term

Open the SWI-Prolog REPL on the Logic Node:

logicadmin@logic-node-01:~$ swipl

Exercise 2.4.1: Constructing a nested host term

The target structure represents a physical or virtual host with hardware metadata and network configuration:

host(
    metadata(
        mac("00:11:22:33:44:55"),
        os("linux-mint-22")
    ),
    network(
        ip("10.0.0.5"),
        vlan(20)
    )
)

Construct and verify this term in the REPL:

?- Host = host(
       metadata(
           mac("00:11:22:33:44:55"),
           os("linux-mint-22")
       ),
       network(
           ip("10.0.0.5"),
           vlan(20)
       )
   ).

Host = host(
    metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
    network(ip("10.0.0.5"), vlan(20))
).

Verify the term's structural properties:

% Check the top-level functor and arity
?- Host = host(metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
               network(ip("10.0.0.5"), vlan(20))),
   functor(Host, Name, Arity).
Name = host,
Arity = 2.

% Verify it is a compound term
?- Host = host(metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
               network(ip("10.0.0.5"), vlan(20))),
   compound(Host).
true.

% Verify it is acyclic (safety check)
?- Host = host(metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
               network(ip("10.0.0.5"), vlan(20))),
   acyclic_term(Host).
true.

2.4.3 Structural Extraction via Unification

Exercise 2.4.2: Extracting the VLAN ID

The most direct demonstration of structural unification as a data access mechanism. No traversal code, no key lookup, no error handling for missing keys:

?- Host = host(
       metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
       network(ip("10.0.0.5"), vlan(20))
   ),
   Host = host(_, network(_, vlan(VID))).

VID = 20.

The second unification Host = host(_, network(_, vlan(VID))) does the following in one WAM operation:

  • Verifies that Host is a compound term with functor host and arity 2
  • Binds _ (anonymous variable) to the first argument — discards it
  • Verifies that the second argument is a compound term with functor network and arity 2
  • Binds _ (second anonymous variable) to the first argument of network — discards it
  • Verifies that the second argument of network is a compound term with functor vlan and arity 1
  • Binds VID to the single argument of vlan

If Host's structure differs at any point — wrong functor, wrong arity, wrong nesting — the unification fails. There is no partial result. There is no VID = undefined. The structural contract is enforced atomically.

Exercise 2.4.3: Extracting multiple fields simultaneously

?- Host = host(
       metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
       network(ip("10.0.0.5"), vlan(20))
   ),
   Host = host(metadata(mac(MAC), os(OS)), network(ip(IP), vlan(VID))).

MAC = "00:11:22:33:44:55",
OS = "linux-mint-22",
IP = "10.0.0.5",
VID = 20.

All four fields extracted in a single unification. In Python, this would be:

mac = host["metadata"]["mac"]       # KeyError if missing
os  = host["metadata"]["os"]        # KeyError if missing
ip  = host["network"]["ip"]         # KeyError if missing
vid = host["network"]["vlan"]       # KeyError if missing

Four operations, four potential KeyError exceptions, no structural verification that the nesting is correct, no check that mac is actually a string of the expected format, no atomicity — if the third line throws, ip is unset and vid is unset, but mac and os are already bound. In the Prolog version, either all four variables are bound (proof succeeds) or none are (proof fails). There is no intermediate state.

Exercise 2.4.4: Using structural patterns as predicates

Define predicates that use unification as their primary logic:

% Load into: /opt/logic-node/topology.pl
% Run with: swipl -l /opt/logic-node/topology.pl

% Host fact database
host_record(host(metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
                 network(ip("10.0.0.5"), vlan(20)))).

host_record(host(metadata(mac("AA:BB:CC:DD:EE:FF"), os("debian-12")),
                 network(ip("10.0.0.6"), vlan(20)))).

host_record(host(metadata(mac("11:22:33:44:55:66"), os("linux-mint-22")),
                 network(ip("10.0.1.1"), vlan(30)))).

% Extract VLAN from any host record
host_vlan(HostRecord, VID) :-
    HostRecord = host(_, network(_, vlan(VID))).

% Find all hosts on a given VLAN
hosts_on_vlan(VLAN, MAC, IP) :-
    host_record(Host),
    Host = host(metadata(mac(MAC), _), network(ip(IP), vlan(VLAN))).

% Find all hosts running a given OS
hosts_with_os(OS, MAC) :-
    host_record(host(metadata(mac(MAC), os(OS)), _)).
% Query: all hosts on VLAN 20
?- hosts_on_vlan(20, MAC, IP).
MAC = "00:11:22:33:44:55",
IP = "10.0.0.5" ;
MAC = "AA:BB:CC:DD:EE:FF",
IP = "10.0.0.6" ;
false.

% Query: all Mint hosts
?- hosts_with_os("linux-mint-22", MAC).
MAC = "00:11:22:33:44:55" ;
MAC = "11:22:33:44:55:66" ;
false.

% Query: what VLAN is 10.0.1.1 on?
?- host_record(Host), Host = host(_, network(ip("10.0.1.1"), vlan(VID))).
VID = 30.

2.4.4 Subnet Matching via Structural Unification

IP addresses represented as ip(O1, O2, O3, O4) with integer octets enable subnet membership tests without string operations:

% Subnet membership: host is on 10.0.0.0/24 if octets 1-3 match exactly
on_subnet_10_0_0(host(_, network(ip(10,0,0,_), _))).

% Generalised subnet check: first N octets must match
% For /24: match first 3 octets
in_subnet_24(ip(A,B,C,_), ip(A,B,C,_)).

% For /16: match first 2 octets
in_subnet_16(ip(A,B,_,_), ip(A,B,_,_)).
% Load the topology.pl file updated with integer-octet IP terms
?- Host = host(metadata(mac("AA:BB:CC:DD:EE:FF"), os("debian-12")),
               network(ip(10,0,0,6), vlan(20))),
   in_subnet_24(ip(10,0,0,6), ip(10,0,0,0)).
true.
% No parsing. No bit arithmetic. No CIDR library. Pure structural matching.

?- Host = host(metadata(mac("AA:BB:CC:DD:EE:FF"), os("debian-12")),
               network(ip(10,0,0,6), vlan(20))),
   in_subnet_24(ip(10,0,0,6), ip(192,168,1,0)).
false.
% Structural mismatch on first octet — fails immediately without inspecting remaining octets

The fail-fast behaviour on subnet mismatch is not a design decision in the code — it is the WAM's standard left-to-right argument unification. When ip(10,0,0,6) is unified against ip(192,168,1,0), the first argument unification 10 = 192 fails immediately. The WAM abandons the traversal. No further arguments are examined. This is faster than any bitmask operation on string-parsed IP addresses.

Performance Optimisation — JIT Multi-Argument Indexing: SWI-Prolog's Just-In-Time (JIT) indexer, enabled by default, analyses the argument patterns of clause heads as they are called and builds hash-based indexes automatically. For the host_record/1 predicate, if queries consistently pattern-match on the first and second octets of the ip/4 term, SWI-Prolog will construct a multi-argument index on those positions after the first few calls.

The practical consequence: once the JIT index is built, hosts_on_vlan(20, _, _) does not iterate the full host_record/1 clause database. It performs a hash lookup on the vlan/1 argument and retrieves only matching clauses. Subnet queries that consistently instantiate ip(10,0,_,_) receive a two-level hash index on octets one and two. For a topology database with 100,000 host records, this is the difference between O(n) linear scan and O(1) hash retrieval.

% The JIT indexer responds to calling patterns.
% This predicate structure encourages optimal indexing:
% — First argument of host_record/1 is always a host/2 compound
% — Consistently querying with vlan/1 instantiated builds a VLAN index
% — Consistently querying with ip/4 partially instantiated builds a subnet index

% Verify JIT indexing is active (it is by default in SWI-Prolog 10.x)
?- current_prolog_flag(optimise, X).
X = true.  % Optimisation (including JIT indexing) is enabled

% Inspect the clause index built for host_record/1 after warming up the predicate
% (requires the predicate to have been called at least once)
?- predicate_property(host_record(_), indexed(Index)).
Index = [1].  % Single-argument index on arg 1 initially
% After repeated calls with instantiated vlan/1, this may become:
% Index = [1, 2-2]  (multi-argument index including the nested vlan argument)

There is no code to write to enable this. The JIT indexer operates on observed calling patterns without programmer intervention. The engineering discipline is to keep clause heads structurally consistent — the same functor in the same argument position — so the indexer can build a stable, useful index. Mixing host_record(host(...)) and host_record(node(...)) in the same predicate defeats indexing.


2.5 Debugging the Binding Lifecycle

2.5.1 trace/0: The WAM's Execution Microscope

trace/0 enables SWI-Prolog's built-in four-port debugger, named after the four events in a predicate's lifecycle in the WAM:

Port Event Meaning
Call Predicate is being entered for the first time The WAM is attempting to prove this goal
Exit Predicate succeeded A proof was found; bindings are committed
Redo Predicate is being re-entered after backtracking The WAM is trying the next clause
Fail Predicate failed (no more clauses) No proof found; Trail will be unwound
?- trace.
true.

[trace]  ?- host_vlan(host(metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
                            network(ip("10.0.0.5"), vlan(20))),
                       VID).
   Call: (11) host_vlan(host(metadata(mac("00:11:22:33:44:55"),os("linux-mint-22")),
                              network(ip("10.0.0.5"),vlan(20))),_G345)
   Call: (12) host(metadata(mac("00:11:22:33:44:55"),os("linux-mint-22")),
                    network(ip("10.0.0.5"),vlan(20))) = host(_,network(_,vlan(_G345)))
   Exit: (12) host(metadata(mac("00:11:22:33:44:55"),os("linux-mint-22")),
                    network(ip("10.0.0.5"),vlan(20))) = host(_A,network(_B,vlan(20)))
   Exit: (11) host_vlan(host(metadata(mac("00:11:22:33:44:55"),os("linux-mint-22")),
                              network(ip("10.0.0.5"),vlan(20))),20)
VID = 20.

Reading the trace output:

  • Line 1 (Call 11): The host_vlan/2 predicate is called. The first argument is fully ground. The second (_G345) is unbound — the internal WAM name for VID before it is bound.
  • Line 2 (Call 12): The body of host_vlan is called: the unification =. Both terms are shown as they enter the unification engine.
  • Line 3 (Exit 12): The unification succeeded. Note that _A and _B (the anonymous _ variables) received bindings but they are discarded; vlan(20) shows the binding of VID to 20.
  • Line 4 (Exit 11): host_vlan exits successfully with VID = 20.

The four-port model makes the variable lifecycle visible: you can watch _G345 transition from unbound (Call) to bound (Exit) at the exact WAM instruction that caused the binding.

2.5.2 debug/3: Selective Observability for Production Logic Nodes

trace/0 is the correct tool during development. It halts the engine at every WAM port and requires manual stepping. On a production Logic Node processing topology queries continuously, trace/0 is not an option — it would freeze the engine on the first call.

debug/3 from library(debug) provides selective, non-halting observability. It emits log messages for specific named topics without interrupting execution. The engine continues running; the debug output is a side channel.

% Load the debug library (include in topology.pl)
:- use_module(library(debug)).

% Instrument host_vlan/2 with topic-specific debug messages
host_vlan_instrumented(HostRecord, VID) :-
    debug(topology, "Attempting VLAN extraction from: ~w", [HostRecord]),
    HostRecord = host(_, network(_, vlan(VID))),
    debug(topology, "Extracted VLAN ~w from host record", [VID]).

% For failures, instrument the calling predicate
hosts_on_vlan_instrumented(VLAN, MAC, IP) :-
    debug(topology, "Searching for hosts on VLAN ~w", [VLAN]),
    host_record(Host),
    debug(topology, "Attempting unification of: ~w", [Host]),
    ( Host = host(metadata(mac(MAC), _), network(ip(IP), vlan(VLAN)))
    -> debug(topology, "Matched: MAC=~w IP=~w VLAN=~w", [MAC, IP, VLAN])
    ;  debug(topology, "No match for VLAN ~w in: ~w", [VLAN, Host]),
       fail
    ).

Enable and disable the topology debug topic at runtime:

% Enable topology debug output to the console
?- debug(topology).

% Now run the instrumented query
?- hosts_on_vlan_instrumented(20, MAC, IP).
% DEBUG: topology: Searching for hosts on VLAN 20
% DEBUG: topology: Attempting unification of: host(metadata(...),network(...))
% DEBUG: topology: Matched: MAC=00:11:22:33:44:55 IP=10.0.0.5 VLAN=20
MAC = "00:11:22:33:44:55",
IP = "10.0.0.5" ;
% DEBUG: topology: Attempting unification of: host(metadata(...),network(...))
% DEBUG: topology: Matched: MAC=AA:BB:CC:DD:EE:FF IP=10.0.0.6 VLAN=20
MAC = "AA:BB:CC:DD:EE:FF",
IP = "10.0.0.6".

% Disable — no performance cost when topic is off
?- nodebug(topology).

The key property: when nodebug(topology) is active, the debug/3 calls compile to no-ops. There is no string formatting, no I/O, no overhead. The instrumentation exists in the source but is invisible at runtime until activated. This is the correct observability model for a Sovereign Node: always instrumented, selectively observable, zero cost when silent.

Multiple topics can be active simultaneously. A mature Logic Node deployment uses a topic hierarchy:

% Topic conventions for the Sovereign Node
% debug(topology, ...)    — fact ingestion and structural unification
% debug(security, ...)    — access control decisions and UWOC results
% debug(perf, ...)        — JIT index hits/misses, Trail depth anomalies
% debug(ingestion, ...)   — external input parsing and validation
% Enable all topics for full-system diagnosis
?- debug(_).

% Enable only security-relevant events
?- debug(security).

% Redirect debug output to a file for log aggregation
?- set_prolog_flag(debug_on_error, true),
   protocol('/var/log/logic-node/debug.log').

2.5.3 Observing a Unification Failure in the Trace

[trace]  ?- host_vlan(host(metadata(mac("00:11:22:33:44:55"), os("linux-mint-22")),
                            network(ip("10.0.0.5"), vlan(20))),
                       30).  % <-- asking for VID = 30, but actual VLAN is 20
   Call: (11) host_vlan(host(...), 30)
   Call: (12) host(...) = host(_, network(_, vlan(30)))
   Fail: (12) host(...) = host(_, network(_, vlan(30)))
   Fail: (11) host_vlan(host(...), 30)
false.

The Fail port on line 3 shows exactly where the unification engine diverged: vlan(20) failed to unify with vlan(30). The argument to vlan/1 — integer 20 — did not unify with integer 30. The failure propagated up through the Call stack. No partial bindings were committed.

Disable trace when done:

[trace]  ?- notrace.
true.
?-

2.5.4 gtrace/0: The Graphical Debugger

For engineers running SWI-Prolog on the Logic Node with the local Cinnamon desktop (via Proxmox console), gtrace/0 opens the graphical XPCE-based debugger. It presents the same four-port model with a visual call stack, variable binding panel, and step controls.

?- gtrace.
% Opens the GUI debugger window
% Then submit your query at the prompt as usual
[gtrace]  ?- hosts_on_vlan(20, MAC, IP).
% The GUI shows each execution step with the current variable bindings
% in a side panel — useful for complex multi-clause queries

gtrace/0 is not available in headless sessions. It requires the X11 display. On the Logic Node, use it at the Proxmox console during development. In automated pipeline verification (Chapter 6), trace/0 output is captured to a file for post-hoc analysis.

2.5.5 Monitoring the Trail: Binding Debt

The Trail records every variable binding since the last choice point. In long-running queries with many choice points, the Trail accumulates "binding debt" — bindings that have been made but not yet confirmed as permanent (because an active choice point exists that could trigger backtracking and undo them).

Monitor Trail depth during complex operations:

% Query trail statistics
?- statistics(trail, [Used, Available]).
Used = 0,
Available = 1048576.
% At query start: no bindings on trail, 1M slots available

% During a complex query, check mid-execution
complex_query_with_trail_monitor :-
    statistics(trail, [Before, _]),
    format("Trail depth before: ~w~n", [Before]),
    hosts_on_vlan(20, MAC, IP),
    statistics(trail, [After, _]),
    format("Trail depth after finding ~w/~w: ~w~n", [MAC, IP, After]).
?- complex_query_with_trail_monitor.
Trail depth before: 0
Trail depth after finding 00:11:22:33:44:55/10.0.0.5: 4
MAC = "00:11:22:33:44:55",
IP = "10.0.0.5" ;
Trail depth after finding AA:BB:CC:DD:EE:FF/10.0.0.6: 4
MAC = "AA:BB:CC:DD:EE:FF",
IP = "10.0.0.6" ;
false.

The Trail depth of 4 reflects the four variable bindings made by the host/2 unification in hosts_on_vlan/3. Each call to host_record/1 creates a choice point; the Trail is unwound back to that choice point's mark between solutions.

2.5.6 The Variable Lifecycle: A Complete Picture

The full lifecycle of a Prolog variable within a proof, made concrete:

% File: /opt/logic-node/lifecycle_demo.pl

demonstrate_lifecycle :-
    % PHASE 1: UNBOUND
    % X is a fresh heap cell: [REF → self]
    format("Phase 1 — X is unbound: ~w~n", [X]),  % prints: _G123 (WAM internal name)
    ( var(X) -> format("  var(X) = true~n") ; true ),

    % PHASE 2: BOUND (via unification)
    X = topology_record,
    % WAM: heap cell for X is now [REF → atom(topology_record)]
    % Trail: address of X's cell is pushed
    format("Phase 2 — X is bound: ~w~n", [X]),
    ( nonvar(X) -> format("  nonvar(X) = true~n") ; true ),

    % PHASE 3: SINGLE-ASSIGNMENT ENFORCEMENT
    % Attempt to rebind X — will fail
    format("Phase 3 — attempting rebind~n"),
    ( X = different_value
    -> format("  ERROR: rebind succeeded (should not happen)~n")
    ;  format("  CORRECT: rebind failed — single assignment enforced~n")
    ),

    % PHASE 4: CONSISTENT RE-UNIFICATION (same value)
    ( X = topology_record
    -> format("Phase 4 — re-unification with same value: PASS~n")
    ;  format("Phase 4 — FAIL~n")
    ).
?- demonstrate_lifecycle.
Phase 1 — X is unbound: _G234
  var(X) = true
Phase 2 — X is bound: topology_record
  nonvar(X) = true
Phase 3 — attempting rebind
  CORRECT: rebind failed — single assignment enforced
Phase 4 — re-unification with same value: PASS
true.

Phase 5 (BACKTRACK — Unbound) requires a choice point and is demonstrated with backtracking:

% Demonstrating Trail unwind — variable reverts to unbound after backtracking
?- ( X = first_binding,
     format("Inside branch: X = ~w, nonvar(X) = ~w~n", [X, true]),
     fail  % Force backtracking
   ; format("After backtrack: X is var = ~w~n", [var(X)])
   ).
Inside branch: X = first_binding, nonvar(X) = true
After backtrack: X is var = true
true.

The binding X = first_binding was made, recorded on the Trail, and then fully reversed when fail triggered backtracking. In the second clause of the disjunction, X is unbound again. The Trail unwind is invisible to the programmer — it is the WAM's automatic responsibility — but the effect is observable via var/1.


Outcome: From Assignment to Binding

2.6.1 The Conceptual Transition

The shift from assignment to binding is not cosmetic. It changes the structure of the code you write, the errors you make, and the security properties you get by default.

Property Imperative Assignment Prolog Binding
Direction Left-to-right: lhs ← rhs Bidirectional: both sides constrained
Mutability Mutable: can be overwritten Write-once per branch: overwrite = failure
Failure mode Silent: wrong value accepted Loud: inconsistency = immediate proof failure
Partial application Not applicable Natural: partially-bound terms propagate constraints
Rollback Manual (transactions, locks) Automatic (Trail unwind on backtrack)
Structural validation Separate step (schema, validator) Intrinsic: structure is the schema
Audit Requires logging infrastructure Intrinsic: trace/0 reconstructs full derivation

The engineer who has completed this chapter should find the following mental model natural:

A Prolog variable is not a storage location. It is an unknown in an equation. The proof system's job is to find values for all unknowns that satisfy all equations simultaneously. When it cannot, the equations are inconsistent, and that inconsistency is reported immediately.

2.6.2 Verification Checklist

Before proceeding to Chapter 3, verify the following in the REPL.

Unification mechanics:

% 1. Bidirectional binding
?- f(X, b) = f(a, Y), X == a, Y == b.
true.

% 2. Single-assignment enforcement
?- X = 1, X = 2.
false.

% 3. Structural extraction
?- host(metadata(mac(M), _), network(ip(I), _)) =
   host(metadata(mac("AA:BB:CC:DD:EE:FF"), os("debian")), network(ip("10.0.0.1"), vlan(10))).
M = "AA:BB:CC:DD:EE:FF",
I = "10.0.0.1".

WAM term representation:

% 4. Functor/arity inspection
?- functor(host(a, b), F, A).
F = host, A = 2.

% 5. Variable detection
?- X = f(Y), var(Y), nonvar(X).
true.

% 6. Compound detection
?- compound(ip(10,0,0,1)).
true.

Occurs check:

% 7. UWOC prevents cycles
?- unify_with_occurs_check(X, f(X)).
false.

% 8. Acyclicity test
?- acyclic_term(host(metadata(mac("AA"), os("linux")), network(ip(10,0,0,1), vlan(20)))).
true.

Lifecycle and debugging:

% 9. Trail — variable reverts after backtrack
?- ( X = bound_value, fail ; var(X) ).
true.

% 10. Statistics — trail depth tracking
?- statistics(trail, [Used, _]), Used =:= 0.
true.  % At top level, trail is empty

All ten checks passing confirms the Logic Node's reasoning substrate is sound for Chapter 3.

Exercises

Exercise 2.1 — Unification Outcomes Without running the queries, predict whether each of the following will succeed or fail, and if it succeeds, what bindings will result. Then verify in the REPL.

a) f(X, g(Y)) = f(g(a), Y)
b) [H|T] = [1, 2, 3]
c) point(X, X) = point(Y, 3)
d) host(A, B) = host(B, A)
e) foo(X, Y) = foo(Y, X)

Exercise 2.2 — Heap Cell Tracing For the term network(ip(10,0,0,5), vlan(20)), draw the WAM Heap cell layout (using the format from Section 2.2.2). Identify: the functor cells, the argument pointer cells, the integer cells, and the total number of heap cells consumed by this term.

Exercise 2.3 — Occurs Check Boundary Write a predicate safe_unify/2 that wraps =/2 with an acyclic_term/1 check on both arguments before unification. The predicate should succeed for normal terms and throw a structured error error(cyclic_term(offending_term), safe_unify/2) if either argument is cyclic. Test it with both ground terms and a cyclic term constructed with X = f(X).

Exercise 2.4 — Infrastructure Topology Queries Using the host_record/1 facts from Section 2.4.3, write predicates for the following queries:

  • cross_vlan_pair/4: find pairs of hosts (MAC1, IP1, MAC2, IP2) that are on different VLANs
  • os_vlan_consistency/2: given an OS name, verify all hosts with that OS are on the same VLAN (succeed if consistent, fail if not)
  • subnet_peers/3: given an IP, find all other hosts on the same /24 subnet

Exercise 2.5 — Trail Depth Analysis Add statistics(trail, ...) calls to the hosts_on_vlan/3 predicate from Section 2.4.3 to log the trail depth before and after each host_record/1 lookup. Run a query that retrieves all hosts. Explain the pattern of trail depth changes you observe, relating each change to a specific WAM event (binding, choice point creation, backtrack, Trail unwind).


Further Reading

  • Ait-Kaci, H. (1991). Warren's Abstract Machine: A Tutorial Reconstruction. MIT Press. — The definitive WAM exposition. Chapter 2 (unification) maps directly to this chapter.
  • Martelli, A. & Montanari, U. (1982). An Efficient Unification Algorithm. ACM Transactions on Programming Languages and Systems. — The theoretical basis of efficient unification without occurs check.
  • Plaisted, D.A. (1984). The Occur-Check Problem. Journal of Logic Programming. — Formal analysis of occurs-check omission and its consequences.
  • SWI-Prolog Manual: unify_with_occurs_check/2https://www.swi-prolog.org/pldoc/man?predicate=unify_with_occurs_check/2
  • SWI-Prolog Manual: Rational Trees and Cyclic Terms — https://www.swi-prolog.org/pldoc/man?section=cyclic
  • SWI-Prolog Manual: acyclic_term/1https://www.swi-prolog.org/pldoc/man?predicate=acyclic_term/1