Chapter 10: Definite Clause Grammars (DCGs)

A regular expression is a finite automaton description encoded as a write-only string. The engineer who writes ^(\d{1,3}\.){3}\d{1,3}$ cannot debug it by reading it. The engineer who inherits it cannot modify it without first reverse-engineering what it was attempting to prove. It has no types. It has no semantic actions. As written, it does not enforce that each captured octet is ≤ 255; a regex that does must spell the range out as a long alternation. It cannot generate canonical forms. It cannot be composed with other parsers through logical conjunction. It is a pattern-match that either succeeds or fails, delivering matched substrings into an untyped capture group array.

A DCG is a logical grammar. Each rule is a Horn clause. Each terminal is a list element (a character code, throughout this chapter). Each non-terminal is a predicate. Parsing is proof search. If the parse succeeds, the proof succeeds and semantic values are bound as Prolog terms. If the input violates the grammar, the proof fails. The failure itself is a plain false; because every production is a named predicate, the tracer or an explicit error-reporting rule can show which one rejected the input. The result of a successful DCG parse is not a string slice; it is a typed Prolog term: an integer, a list of integers, a Dict. There is nothing to "validate after parsing" because validity is the parse.

Five properties define the DCG as a sovereign infrastructure parsing primitive.

1. DCGs are executable grammars with O(1) concatenation. Every DCG rule, after compiler expansion, takes two extra arguments: the list of input codes (or tokens) before the rule consumes anything, and the list of codes that remain afterwards. This S0 - S pair is a difference list. Concatenating two difference lists is a single unification of a tail variable, O(1) regardless of the length consumed. No string copying, no buffer allocation, no substring extraction. The parser advances through the input code list by handing the tail of each list cell to the next goal. A deterministic parse of a 1,000-character log line visits each character code once; rules that backtrack revisit the codes they retry.

2. The --> notation compiles to standard Horn clauses. The DCG rule sentence --> noun_phrase, verb_phrase. expands at compile time to sentence(S0, S) :- noun_phrase(S0, S1), verb_phrase(S1, S). There is no DCG interpreter at runtime. No special execution mode. The expanded predicate is a standard Prolog clause called by the standard WAM execution engine. phrase/2 and phrase/3 are the entry points that construct the initial difference list pair from a flat code list. Understanding the expansion is understanding the performance model.

3. Semantic actions bind typed results during the parse. The {Goal} notation inside a DCG rule executes Goal as a standard Prolog goal without consuming any input. Semantic actions are where character code sequences become integers, integers become validated IP octets, and four validated octets become a 32-bit integer. The validation is part of the grammar: octet(N) --> digits(Ds), { number_codes(N, Ds), N >= 0, N =< 255 } fails immediately if the parsed digits exceed 255 — the failure propagates back through the DCG as a proof failure, not as a post-parse validation error. There is no "parse then validate" phase split. There is only "prove the input is a valid IP address."

4. DCGs operate over code lists, not atoms. string_codes(String, Codes) converts a Prolog string to a list of Unicode character code integers. The DCG operates on this list. The result is a typed Prolog term — an integer, a list of byte values, a Dict. The input string, once converted to codes, is eligible for garbage collection. The character codes themselves — small integers — are never interned in the Atom Table. Parsed tokens that need to survive past the parse (a hostname string confirmed to be syntactically valid) must pass through a controlled atom-interning step with a length bound and a whitelist check. Chapter 10's security mandate: string_codes in, typed terms out, no intermediate atoms from untrusted input.

5. DCG failures are locatable, composable, and backtrackable. When a DCG rule fails, Prolog backtracks to the most recent alternative; if none remains, phrase(ipv4_address(_), Codes) fails, which means the input is not a valid IPv4 address. The failure itself carries no location, but because every production is a named predicate, the tracer or an explicit error-reporting rule can identify which production rejected the input. DCG rules compose by conjunction (A, B in sequence), disjunction (A ; B for alternatives), and negation (\+). Existing parsers can be embedded in larger parsers without rewriting. An IP address parser can be embedded in a CIDR parser without the IP parser knowing it is no longer the top-level rule. Regular expressions offer no comparable composition beyond concatenating pattern strings.
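One way to surface the location of a failure is to wrap the productions that must succeed in a reporting rule. The sketch below is an illustration only: expect//2, range//2, and the parse_error/2 term are hypothetical names invented here, not library predicates, and the grammar ("Lo-Hi", two natural numbers separated by a dash) is a stand-in for a real one.

```prolog
% expect(NT, Label): hypothetical helper for this sketch.
% Runs non-terminal NT. If NT cannot match at the current position,
% throws parse_error(Label, chars_left(K)), K = length of unconsumed input.
% Written as a plain predicate with the two list arguments made explicit.
expect(NT, _Label, S0, S) :-
    phrase(NT, S0, S),
    !.                                  % commit to the first match
expect(_NT, Label, S0, _S) :-
    length(S0, Left),
    throw(parse_error(Label, chars_left(Left))).

digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

nat(N) --> digits(Ds), { number_codes(N, Ds) }.

% range(-Lo, -Hi): parses "Lo-Hi", for example "10-20".
range(Lo, Hi) -->
    expect(nat(Lo), lower_bound),
    expect([0'-],   separator),
    expect(nat(Hi), upper_bound).
```

With this wrapper, parsing the codes of "10-x" raises parse_error(upper_bound, chars_left(1)) instead of failing silently. The cut in expect/4 trades backtracking into NT for a definite error report, which suits fixed-format input but not grammars that rely on retrying earlier choices.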

10.1 The Physics of Difference Lists

10.1.1 The List Append Problem

Standard list append in Prolog is O(n) in the length of the first list:

append([], Ys, Ys).
append([H|Xs], Ys, [H|Zs]) :- append(Xs, Ys, Zs).

Each recursive call adds one element to the result list by copying from the first argument. Parsing a 1,000-character input with standard lists — appending segment results together — copies characters O(n²) times in the worst case (each intermediate segment is appended to the growing result).

This is the wrong model for parsing. A parser does not need to append strings — it needs to consume input from the front and report what remains. The difference list provides exactly this without copying.

10.1.2 The Difference List: `Whole - Hole`

A difference list represents a list segment as a pair of terms: the full list Whole and an unbound tail variable Hole. The list logically represented is the difference Whole - Hole: the elements of Whole up to, but not including, the position where Hole begins.

% The difference list [a, b, c] represented as a pair:
% Whole = [a, b, c | Hole],  Hole = _Unbound
% Written compactly: [a, b, c | H] - H

% When Hole is unified with [], the difference list becomes [a, b, c]:
?- DL = [a, b, c | H] - H,
   DL = Whole - [],
   Whole = [a, b, c].
Whole = [a, b, c].

% O(1) append of two difference lists:
% DL1 = L1 - H1,  DL2 = L2 - H2
% Result = L1 - H2  with H1 unified with L2
% This is a single unification: H1 = L2.

dl_append(L1-H1, L2-H2, L1-H2) :-
    H1 = L2.   % One unification. No list traversal.

?- A = [a,b|H1]-H1,
   B = [c,d|H2]-H2,
   dl_append(A, B, Result-[]).
Result = [a, b, c, d].
% Two two-element segments concatenated in O(1) by one unification
% (bindings for A, B, H1, and H2 omitted from the answer).

The unification H1 = L2 threads the output of the first segment directly into the input of the second. No copying. No traversal. The WAM binds the tail variable and the two lists are physically connected in the heap.
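The same single-unification step extends to any number of segments. A minimal sketch, assuming two helper predicates that are hypothetical names for this illustration (dl_concat_all/2 and dl_to_list/2); dl_append/3 is repeated from the text so the block loads on its own.

```prolog
% dl_append/3 as in the text: one unification joins the two segments.
dl_append(L1-H1, L2-H2, L1-H2) :-
    H1 = L2.

% dl_concat_all(+DiffLists, -DiffList): hypothetical helper.
% Joins a list of difference lists with one unification per segment.
dl_concat_all([], H-H).                 % empty difference list
dl_concat_all([DL|DLs], Result) :-
    dl_concat_all(DLs, Rest),
    dl_append(DL, Rest, Result).

% dl_to_list(+DiffList, -List): hypothetical helper.
% Closes the hole with [] to obtain an ordinary proper list.
dl_to_list(List-[], List).
```

A query such as dl_concat_all([[a,b|T1]-T1, [c|T2]-T2, [d,e|T3]-T3], DL), dl_to_list(DL, L) binds L to [a,b,c,d,e]. One caveat worth keeping in mind: a difference list can be extended only once, because appending binds its hole; reusing the same pair in a second append needs a fresh copy.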

10.1.3 Difference Lists in DCG: The `S0, S` Threading Pattern

After DCG expansion, every non-terminal receives two list arguments: the input before it runs (S0) and the remaining input after it finishes (S). These are the two ends of the difference list that the non-terminal consumes.

The threading pattern for a sequence A, B, C:

S0 → [ A consumes ] → S1 → [ B consumes ] → S2 → [ C consumes ] → S3

Each predicate receives the remaining tail from the previous one. No concatenation occurs anywhere — only unification of the tail variable with the next list cell. The physical model:

WAM Heap view of parsing "abc" with rules [a], [b], [c]:

Before parsing:
  S0 = [97, 98, 99 | Hole]
  (97='a', 98='b', 99='c' as character codes)

After [a] DCG terminal consumes 'a':
  S0 = [97 | S1]
  S1 = [98, 99 | Hole]   (tail variable passed forward)

After [b] consumes 'b':
  S1 = [98 | S2]
  S2 = [99 | Hole]

After [c] consumes 'c':
  S2 = [99 | S3]
  S3 = Hole = []   (end of input — Hole unified with [])

At no point is a new list allocated. Each "advance" is a single binding of a tail variable. For a 1,000-character parse, exactly 1,000 bindings occur and exactly 1,000 list cells are traversed — O(n) time, O(1) additional allocation beyond the original code list.
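The heap view above can be reproduced with an ordinary clause. The sketch below places a three-terminal DCG rule next to a hand-written predicate with the S0 to S3 threading spelled out; abc_by_hand/2 is a hypothetical name used only for this comparison.

```prolog
% DCG form: three terminals in sequence.
abc --> [0'a], [0'b], [0'c].

% Hand-written equivalent with the threading made explicit.
% Each unification peels one code off the front and names the tail.
abc_by_hand(S0, S3) :-
    S0 = [0'a|S1],
    S1 = [0'b|S2],
    S2 = [0'c|S3].
```

Both accept the codes of "abc" with an empty remainder. Calling abc_by_hand/2 on the codes of "abcd" leaves S3 bound to the one-element list holding the code for d, which is the behaviour phrase/3 exposes as its Rest argument.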

10.1.4 Diagram: Difference List Threading Through DCG Rules

%%{init: {"themeVariables": {"fontSize": "14px"}}}%%
flowchart TD
    INPUT["Input: string_codes(&quot;192.168.1.1&quot;, Codes)<br/>Codes = [49,57,50,46,49,54,56,46,49,46,49]<br/>(character codes for '192.168.1.1')"]

    PHRASE["phrase(ipv4_address(IntVal), Codes)<br/>Expands to: ipv4_address(IntVal, Codes, [])"]

    OCT1["octet(192, S0, S1)<br/>S0 = [49,57,50 | S1]<br/>Consumes codes for '192'<br/>Binds S1 = tail after '192'"]

    DOT1["terminal '.'<br/>S1 = [46 | S2]<br/>Consumes code 46 (dot)<br/>Binds S2 = tail after first dot"]

    OCT2["octet(168, S2, S3)<br/>Consumes codes for '168'<br/>Binds S3 = tail after '168'"]

    DOT2["terminal '.'<br/>S3 = [46 | S4]<br/>Binds S4 = tail after second dot"]

    OCT3["octet(1, S4, S5)<br/>Binds S5 = tail after '1'"]

    DOT3["terminal '.'<br/>S5 = [46 | S6]<br/>Binds S6"]

    OCT4["octet(1, S6, S7)<br/>S7 = []<br/>Final tail = empty list<br/>Parse succeeds"]

    ACTION["{semantic action}<br/>IntVal is 192*16777216 + 168*65536 + 1*256 + 1<br/>IntVal = 3232235777"]

    INPUT --->|"string_codes/2"| PHRASE
    PHRASE --->|"S0 threaded in"| OCT1
    OCT1 --->|"S1 threaded forward"| DOT1
    DOT1 --->|"S2 threaded forward"| OCT2
    OCT2 --->|"S3 threaded forward"| DOT2
    DOT2 --->|"S4 threaded forward"| OCT3
    OCT3 --->|"S5 threaded forward"| DOT3
    DOT3 --->|"S6 threaded forward"| OCT4
    OCT4 --->|"S7 = [] — success"| ACTION

    style INPUT fill:#1A2B4A,color:#FFFFFF
    style PHRASE fill:#1A4070,color:#FFFFFF
    style OCT1 fill:#1A4070,color:#FFFFFF
    style DOT1 fill:#3A3A1A,color:#FFFFFF
    style OCT2 fill:#1A4070,color:#FFFFFF
    style DOT2 fill:#3A3A1A,color:#FFFFFF
    style OCT3 fill:#1A4070,color:#FFFFFF
    style DOT3 fill:#3A3A1A,color:#FFFFFF
    style OCT4 fill:#1A4070,color:#FFFFFF
    style ACTION fill:#1A6B3A,color:#FFFFFF

Reading the diagram: Each box consumes a portion of the input code list and passes the remaining tail variable to the next rule. Blue boxes are octet parsers; brown boxes are dot terminals; green is the semantic action. No list is copied at any arrow — each arrow represents one tail variable binding.

10.2 The `-->` Expansion Mechanism

10.2.1 What the Compiler Does

The SWI-Prolog compiler transforms DCG rules to standard clauses at load time. The transformation is mechanical and complete — no DCG-specific runtime support is required beyond phrase/2 and phrase/3 as entry-point conveniences.

The transformation rules:

% DCG rule:
A --> B.
% Expands to:
A(S0, S) :- B_expanded(S0, S).

% Sequence:
A --> B, C.
% Expands to:
A(S0, S) :- B(S0, S1), C(S1, S).

% Terminal list:
A --> [t1, t2].
% Expands to:
A([t1, t2 | S], S).

% Semantic action (no input consumed):
A --> B, { Goal }, C.
% Expands to:
A(S0, S) :- B(S0, S1), Goal, C(S1, S).

% Disjunction:
A --> B ; C.
% Expands to:
A(S0, S) :- ( B(S0, S) ; C(S0, S) ).

% Empty:
A --> [].
% Expands to:
A(S, S).

Concrete example — the full expansion of a simple sentence grammar:

% DCG source:
greeting --> [hello], name.
name --> [world].
name --> [operator].

% Compiler expansion (logically equivalent form; the exact clause layout
% varies between Prolog systems and versions):
greeting(S0, S) :- S0 = [hello|S1], name(S1, S).
name([world|S],    S).
name([operator|S], S).

% phrase/2 entry point:
?- phrase(greeting, [hello, world]).
true.
?- phrase(greeting, [hello, operator]).
true.
?- phrase(greeting, [hi, world]).
false.   % 'hi' ≠ 'hello' — terminal mismatch at first rule
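The translation can be inspected directly rather than taken on trust. A minimal sketch, assuming the hypothetical helper name show_expansion/1; it relies only on expand_term/2 and portray_clause/1, both standard built-ins.

```prolog
% show_expansion(+Rule): hypothetical helper for this sketch.
% Translates a DCG rule with expand_term/2 and prints the resulting clause.
% The printed layout depends on the Prolog system; only its logical
% content (two extra list arguments threaded through the body) is fixed.
show_expansion(Rule) :-
    expand_term(Rule, Clause),
    portray_clause(Clause).
```

Calling show_expansion((greeting --> [hello], name)) prints a greeting/2 clause whose body calls name/2. The rule must be wrapped in an extra pair of parentheses because --> is an operator with priority above 999.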

10.2.2 `phrase/2` and `phrase/3`

phrase(RuleName, Codes) calls RuleName(Codes, []) — the full input must be consumed. phrase(RuleName, Codes, Rest) calls RuleName(Codes, Rest) — Rest is the unconsumed suffix after the rule matches. The second form enables embedding parsers as sub-parses within a larger grammar.

% phrase/2: entire input consumed
?- string_codes("192.168.1.1", Codes),
   phrase(ipv4_address(IntVal), Codes).
IntVal = 3232235777.

% phrase/3: partial match — Rest is the suffix
?- string_codes("192.168.1.1/24", Codes),
   phrase(ipv4_address(IntVal), Codes, Rest).
IntVal = 3232235777,
Rest = [47, 50, 52].   % [47]='/', [50]='2', [52]='4'
% phrase/3 consumed the IP address; Rest holds the CIDR suffix
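The same prefix-and-remainder pattern works for any leading field. The sketch below is self-contained and uses a hypothetical entry point, split_leading_nat/3, that peels a natural number off the front of a string and returns the rest as a string.

```prolog
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

% Greedy: the recursive clause comes first, so the longest run is tried first.
digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

nat(N) --> digits(Ds), { number_codes(N, Ds) }.

% split_leading_nat(+String, -N, -RestString): hypothetical helper.
% once/1 commits to the first (longest) match so that backtracking does
% not hand back shorter prefixes such as 4 with rest "2abc".
split_leading_nat(String, N, RestString) :-
    string_codes(String, Codes),
    once(phrase(nat(N), Codes, Rest)),
    string_codes(RestString, Rest).
```

For the input "42abc" this binds N to 42 and RestString to "abc". Without once/1, phrase/3 would also offer N = 4 on backtracking, which is why callers of phrase/3 on a greedy rule usually commit to the first answer.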

10.2.3 The Regex Comparison: Why DCGs Win Every Engineering Argument

% A Python regex for IPv4 validation:
% r'^((25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)\.){3}(25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)$'

That regex is correct, barely. It validates each octet in the range 0–255 using five alternative patterns. The engineer who wrote it is not present. The engineer who must modify it to also extract the 32-bit integer value of the address, handle leading zeros consistently, or parse IPv4-mapped IPv6 addresses faces a choice: extend the regex (adding unmaintainable complexity) or write a separate extraction pass (duplicating logic).

The DCG alternative:

% ipv4_address(IntVal) -- one rule, typed output, composable
ipv4_address(IntVal) -->
    octet(A), [0'.],
    octet(B), [0'.],
    octet(C), [0'.],
    octet(D),
    { IntVal is A*16777216 + B*65536 + C*256 + D }.

The term 0'. is standard Prolog character-code notation for . (code 46): 0' followed by a character denotes that character's code. The comparison table:

Property	Regular Expression	DCG
Octet range validation (0–255)	Embedded in alternation pattern — unreadable	`N >= 0, N =< 255` in semantic action
Extract 32-bit integer	Second pass required	Computed during parse via `{IntVal is ...}`
Composable into larger parser	Cannot embed; requires substring extraction	`phrase(ipv4_address(IP), Codes, Rest)`
Failure point	"Regex didn't match" — no location	Specific non-terminal that rejected input
Bidirectional	No, match only	In principle: a pure DCG can also generate, but the rules in this chapter use arithmetic guards and are written for parsing only
Readable by colleague	No	Yes — Horn clause with named non-terminals
Handles leading zeros consistently	Requires deliberate pattern	`number_codes/2` reads "007" as 7; rejecting leading zeros takes one extra guard in the semantic action
Modified without regressions	High risk	Add test case, verify proof still holds

10.3 Semantic Actions `{...}`

10.3.1 The `{Goal}` Syntax

Inside a DCG rule, {Goal} executes Goal as a standard Prolog goal without consuming any input from the difference list. The expansion:

% DCG with semantic action:
rule --> part_a, { validate(X) }, part_b.
% Expands to:
rule(S0, S) :- part_a(S0, S1), validate(X), part_b(S1, S).

The position of the semantic action in the rule body determines when it executes relative to the parse. A semantic action after a terminal or non-terminal can use variables bound by that terminal. A semantic action before a terminal can constrain what the terminal will accept.
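Both placements can be seen in a single rule. The sketch below parses a hypothetical length-prefixed field: one digit N followed by exactly N payload codes. The names counted//1 and codes//1 are invented for this illustration.

```prolog
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

% codes(?Cs): consume exactly the elements of Cs from the input.
% When Cs is a list of fresh variables, its length decides how many
% codes are taken.
codes([])     --> [].
codes([C|Cs]) --> [C], codes(Cs).

% counted(-Payload): one digit N, then exactly N payload codes.
counted(Payload) -->
    digit(D),
    { N is D - 0'0,            % action AFTER digit//1: uses the bound D
      length(Payload, N) },    % action BEFORE codes//1: fixes what it may take
    codes(Payload).
```

Parsing the codes of "3abc" binds Payload to the codes of "abc"; the codes of "3ab" are rejected because codes//1 has been told to take three elements and only two remain.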

10.3.2 Computing an IP Integer During the Parse

% digit(-Code) matches a single ASCII digit character code
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

% digits(-Codes) matches one or more digit character codes
digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

% octet(-N) parses 1–3 digits and validates the result is 0–255
% The semantic action fires DURING the parse — if N > 255, the rule FAILS.
% No post-parse validation step exists or is needed.

octet(N) -->
    digit(D1), digit(D2), digit(D3),
    { number_codes(N, [D1, D2, D3]),
      N >= 0, N =< 255 }.
octet(N) -->
    digit(D1), digit(D2),
    { number_codes(N, [D1, D2]) }.
octet(N) -->
    digit(D1),
    { number_codes(N, [D1]) }.

% ipv4_address(-IntVal) parses a dotted-quad and computes the 32-bit integer
ipv4_address(IntVal) -->
    octet(A), [0'.],
    octet(B), [0'.],
    octet(C), [0'.],
    octet(D),
    { A =< 255, B =< 255, C =< 255, D =< 255,   % belt-and-suspenders: octet/1 already checks
      IntVal is (A << 24) \/ (B << 16) \/ (C << 8) \/ D }.

% REPL: semantic action validation

?- string_codes("192.168.1.1", Cs), phrase(ipv4_address(N), Cs).
N = 3232235777.

?- string_codes("256.1.1.1", Cs), phrase(ipv4_address(N), Cs).
false.
% 256 > 255: the three-digit octet//1 clause fails its range check, and the
% shorter clauses cannot match the dot that must follow.
% No exception. No partial result. Clean failure.

?- string_codes("1.2.3", Cs), phrase(ipv4_address(N), Cs).
false.
% Only three octets — fourth octet rule never matches.

?- string_codes("10.0.0.0", Cs), phrase(ipv4_address(N), Cs).
N = 167772160.
% 10*16777216 = 167772160. Correct.
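The octet rules above accept leading zeros, because number_codes/2 reads the codes of "01" as the integer 1. Where a deployment needs to reject such spellings, one extra guard in the semantic action is enough. A minimal sketch, assuming the hypothetical names octet_strict//1 and no_leading_zero/1:

```prolog
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

% octet_strict(-N): hypothetical variant of octet//1 that also rejects
% leading zeros, so "1" is accepted and "01" or "001" is not.
octet_strict(N) -->
    digits(Ds),
    { length(Ds, Len), Len =< 3,
      no_leading_zero(Ds),
      number_codes(N, Ds),
      N =< 255 }.

% no_leading_zero(+Codes): a lone "0" is fine; otherwise the first
% code must not be the code for 0.
no_leading_zero([0'0]) :- !.
no_leading_zero([First|_]) :- First =\= 0'0.
```

Whether leading zeros should be rejected is a policy decision; some tools interpret a leading zero as octal, which is the usual argument for refusing the spelling outright.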

10.3.3 Semantic Actions Are Not "Mixing Concerns"

Placing arithmetic inside a grammar is not mixing concerns. An IP address is not the string "192.168.1.1" — it is the 32-bit integer 3232235777. The DCG's job is to accept a string and produce that integer. The arithmetic that converts four octets to one integer is as much a part of the grammar's specification as the rule that four octets separated by dots constitute an IP address. Separating them creates an artificial phase boundary with no engineering benefit and a real risk of the two phases disagreeing.

10.4 The Build: Infrastructure Network Parser

10.4.1 Architecture

Before the implementation: library(dcg/basics) ships with SWI-Prolog and provides maintained, widely used DCG primitives for digits, numbers, whitespace, and strings. The library is itself written in Prolog, so any throughput difference against hand-rolled rules should be measured, not assumed. The rules in this chapter are built from first principles so the mechanics are transparent: every digit//1, every decimal_digits//1, every whitespace//0 is written out so the S0/S threading is unambiguous. Once those mechanics are clear, the hand-rolled primitives can be replaced with their library(dcg/basics) equivalents:

Hand-rolled (pedagogic)	`library(dcg/basics)` (production)	Notes
`digit(D)`	`digit(D)`	Same interface; the library version classifies codes with `code_type/2`
`decimal_digits(Cs)`	`digits(Cs)`	Matches zero or more; add a guard to require one or more
`decimal_nat(N)`	`integer(N)`	Also accepts a sign; use only where that is correct
`whitespace`	`whites`	Zero or more blank codes
`rest_of_line(Cs)`	`string(Cs)`	Matches any code sequence, shortest first, extending on backtracking

% Equivalent with library(dcg/basics):
:- use_module(library(dcg/basics)).

% integer//1 matches an optional sign + digits → integer value directly
% whites//0 matches zero or more whitespace codes
% string//1 matches any sequence of codes, shortest first, extending on backtracking
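As a sketch of what the substitution looks like for one rule, the block below rebuilds an octet parser on top of digits//1 and whites//0 from library(dcg/basics). The names octet_lib//1 and padded_octet//1 are hypothetical and exist only for this illustration.

```prolog
:- use_module(library(dcg/basics)).

% octet_lib(-N): hypothetical library-based counterpart of octet//1.
% digits//1 accepts zero or more digits, so the guard Ds = [_|_] restores
% the "one or more" requirement and the length check caps it at three.
octet_lib(N) -->
    digits(Ds),
    { Ds = [_|_],
      length(Ds, Len), Len =< 3,
      number_codes(N, Ds),
      N =< 255 }.

% padded_octet(-N): the same octet after optional leading blanks.
padded_octet(N) --> whites, octet_lib(N).
```

integer//1 is deliberately not used here: it would also accept a sign, and a signed octet is not a valid octet.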

The network parser below implements everything explicitly. In a production deployment, the primitive rules can be swapped for their library(dcg/basics) counterparts. The grammar structure, semantic actions, and security contract stay the same; benchmark before assuming a throughput change, and account for the interface differences in the table above (digits//1 accepts zero digits, integer//1 accepts a sign).

logicadmin@logic-node-01:~$ nano /opt/logic-node/kb/parsers/network_parser.pl

%% =============================================================================
%% FILE:    /opt/logic-node/kb/parsers/network_parser.pl
%% PURPOSE: DCG parsers for IPv4 addresses, MAC addresses, and CIDR masks.
%%
%% NOTE ON PRIMITIVES: The primitive DCG rules (digit//1, decimal_digits//1,
%% whitespace//0) are implemented explicitly for pedagogic clarity. For
%% production deployments they can be replaced with the maintained
%% equivalents from library(dcg/basics):
%%   digit//1          → digit//1    (same interface)
%%   decimal_digits//1 → digits//1   (zero or more; guard for one or more)
%%   decimal_nat//1    → integer//1  (also accepts a sign)
%%   whitespace//0     → whites//0
%% Semantic actions, security guards, and entry points remain identical.
%%
%% ENTRY POINTS (all operate over character code lists from string_codes/2):
%%   parse_ipv4(+String, -IntVal)
%%   parse_mac(+String, -Bytes)
%%   parse_cidr(+String, -NetworkInt, -MaskBits)
%%   parse_ip_or_cidr(+String, -Result)
%%
%% SECURITY CONTRACT:
%%   — All entry points take Prolog strings, NOT atoms.
%%   — string_codes/2 converts to code list before any DCG rule fires.
%%   — DCG rules operate entirely on integer code lists — no atom creation.
%%   — Semantic actions produce integers and integer lists — no atoms.
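%%   — Only parse_ipv4/2, parse_mac/2, parse_cidr/3, and parse_ip_or_cidr/2
%%     are exported. Internal DCG rules are not exported, so callers are not
%%     meant to invoke raw phrase/2 on partial parsers with external code
%%     lists. (Module qualification can still reach them: the export list
%%     is a contract, not a sandbox.)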
%%   — All results are bounded:
%%       IntVal ∈ [0, 2^32−1]
%%       Bytes: list of exactly 6 integers in [0,255]
%%       MaskBits ∈ [0, 32]
%% =============================================================================

:- module(network_parser, [
    parse_ipv4/2,
    parse_mac/2,
    parse_cidr/3,
    parse_ip_or_cidr/2
]).

:- use_module(library(error)).
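%% To switch to the library primitives, enable this import and delete the
%% local definitions it replaces, so each non-terminal has one definition: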
%% :- use_module(library(dcg/basics)).

%% ---------------------------------------------------------------------------
%% PRIMITIVE TERMINALS
%% (Replaceable with library(dcg/basics) equivalents; see the header note)
%% ---------------------------------------------------------------------------

%% digit(-Code): matches exactly one ASCII digit character code
%% Explicit range is self-documenting; same interface as library(dcg/basics) digit//1.
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

%% hex_digit(-Value): matches one ASCII hex digit; binds its integer value 0-15
hex_digit(V) --> [D], {
    ( D >= 0'0, D =< 0'9 -> V is D - 0'0
    ; D >= 0'a, D =< 0'f -> V is D - 0'a + 10
    ; D >= 0'A, D =< 0'F -> V is D - 0'A + 10
    ; fail
    )
}.

%% whitespace: matches zero or more whitespace characters (space, tab)
whitespace --> [C], { memberchk(C, [0' , 0'\t]) }, whitespace.
whitespace --> [].

%% ---------------------------------------------------------------------------
%% DECIMAL INTEGER PARSING
%% ---------------------------------------------------------------------------

%% decimal_digits(-Codes): one or more ASCII digit codes
decimal_digits([D|Ds]) --> digit(D), decimal_digits(Ds).
decimal_digits([D])    --> digit(D).

%% decimal_nat(-N): parses a non-negative decimal integer
%% N is bound to the integer value; no upper bound enforced here.
decimal_nat(N) -->
    decimal_digits(Ds),
    { number_codes(N, Ds) }.

%% ---------------------------------------------------------------------------
%% OCTET PARSER (0–255)
%% ---------------------------------------------------------------------------

%% octet(-N): parses 1, 2, or 3 decimal digits;
%% semantic action enforces N ∈ [0, 255].
%% Tries 3 digits first (longest-match), then 2, then 1.

octet(N) -->
    digit(D1), digit(D2), digit(D3),
    \+ digit(_),            % No 4th digit: prevents matching "1234" as 123
    { number_codes(N, [D1, D2, D3]),
      N >= 0, N =< 255 }.
octet(N) -->
    digit(D1), digit(D2),
    \+ digit(_),
    { number_codes(N, [D1, D2]) }.
octet(N) -->
    digit(D1),
    \+ digit(_),
    { number_codes(N, [D1]) }.

%% Negative lookahead: in a DCG body, \+ digit(_) succeeds without consuming
%% input when the next code is not a digit (or the input is exhausted).
%% It must sit in the rule body, not inside {}: a goal inside {} is an
%% ordinary Prolog call and never sees the input list.
%% The lookahead stops octet//1 from over-consuming or under-consuming
%% digits (for example, matching "1234" as octet 123 or as octet 1).

%% octet_safe//1: lookahead-free variant, used by the rules below.
octet_safe(N) -->
    digit(D1), digit(D2), digit(D3),
    { number_codes(N, [D1, D2, D3]), N >= 0, N =< 255 }.
octet_safe(N) -->
    digit(D1), digit(D2),
    { number_codes(N, [D1, D2]), N >= 0, N =< 99 }.
octet_safe(N) -->
    digit(D1),
    { number_codes(N, [D1]) }.

%% ---------------------------------------------------------------------------
%% IPv4 ADDRESS
%% ---------------------------------------------------------------------------

%% ipv4_dcg(-A, -B, -C, -D): parses dotted-quad, binds four octets.
ipv4_dcg(A, B, C, D) -->
    octet_safe(A), [0'.],
    octet_safe(B), [0'.],
    octet_safe(C), [0'.],
    octet_safe(D).

%% ipv4_address(-IntVal): computes 32-bit integer from four octets.
%% IntVal = (A << 24) | (B << 16) | (C << 8) | D
ipv4_address(IntVal) -->
    ipv4_dcg(A, B, C, D),
    { IntVal is (A << 24) \/ (B << 16) \/ (C << 8) \/ D }.

%% ---------------------------------------------------------------------------
%% MAC ADDRESS (supports both ':' and '-' separators)
%% ---------------------------------------------------------------------------

%% hex_byte(-V): parses exactly two hex digits; value in [0, 255].
hex_byte(V) -->
    hex_digit(Hi), hex_digit(Lo),
    { V is Hi * 16 + Lo }.

%% mac_sep: matches the MAC address separator — ':' (code 58) or '-' (code 45).
mac_sep --> [0':].
mac_sep --> [0'-].

%% mac_address(-Bytes): parses a 6-byte MAC address.
%% Returns Bytes as a list of 6 integers in [0, 255].
%% Both "AA:BB:CC:DD:EE:FF" and "AA-BB-CC-DD-EE-FF" are accepted.
%% Mixed separators are accepted; canonical form is ':'.

mac_address([B1,B2,B3,B4,B5,B6]) -->
    hex_byte(B1), mac_sep,
    hex_byte(B2), mac_sep,
    hex_byte(B3), mac_sep,
    hex_byte(B4), mac_sep,
    hex_byte(B5), mac_sep,
    hex_byte(B6).

%% ---------------------------------------------------------------------------
%% CIDR MASK
%% ---------------------------------------------------------------------------

%% mask_bits(-N): parses 1 or 2 decimal digits; semantic action enforces N ∈ [0, 32].
mask_bits(N) -->
    digit(D1), digit(D2),
    { number_codes(N, [D1, D2]), N >= 0, N =< 32 }.
mask_bits(N) -->
    digit(D1),
    { number_codes(N, [D1]), N >= 0, N =< 32 }.

%% cidr_mask(-NetworkInt, -MaskBits): parses "A.B.C.D/N"
%% NetworkInt: 32-bit network address (host bits zeroed by semantic action)
%% MaskBits: prefix length [0, 32]

cidr_mask(NetworkInt, MaskBits) -->
    ipv4_address(AddrInt),
    [0'/],   % '/' separator (code 47)
    mask_bits(MaskBits),
    {
        % Compute network address by zeroing host bits
        HostBits is 32 - MaskBits,
        ( HostBits =:= 0 ->
            NetworkInt = AddrInt
        ;
            Mask is ((1 << 32) - 1) xor ((1 << HostBits) - 1),
            NetworkInt is AddrInt /\ Mask
        )
    }.

%% ---------------------------------------------------------------------------
%% COMBINED PARSER
%% ---------------------------------------------------------------------------

%% ip_or_cidr(-Result): parses either a CIDR notation or a bare IPv4 address.
%% Result: cidr(NetworkInt, MaskBits) or ipv4(IntVal)

ip_or_cidr(cidr(NetInt, Bits)) -->
    cidr_mask(NetInt, Bits).
ip_or_cidr(ipv4(IntVal)) -->
    ipv4_address(IntVal).

%% ---------------------------------------------------------------------------
%% ENTRY POINTS: String → Typed Result
%% ---------------------------------------------------------------------------

%% parse_ipv4(+String, -IntVal)
%% Parses a Prolog string (NOT an atom) containing an IPv4 address.
%% Throws error(parse_failure(ipv4, String), _) if String is not a valid IPv4 address.
%% IntVal is the 32-bit integer representation.

parse_ipv4(String, IntVal) :-
    must_be(string, String),
    string_codes(String, Codes),
    ( phrase(ipv4_address(IntVal), Codes) ->
        true
    ;
        throw(error(
            parse_failure(ipv4, String),
            context(parse_ipv4/2, 'Input is not a valid IPv4 address')
        ))
    ).

%% parse_mac(+String, -Bytes)
%% Bytes: list of 6 integers [0–255].

parse_mac(String, Bytes) :-
    must_be(string, String),
    string_codes(String, Codes),
    ( phrase(mac_address(Bytes), Codes) ->
        true
    ;
        throw(error(
            parse_failure(mac, String),
            context(parse_mac/2, 'Input is not a valid MAC address')
        ))
    ).

%% parse_cidr(+String, -NetworkInt, -MaskBits)

parse_cidr(String, NetworkInt, MaskBits) :-
    must_be(string, String),
    string_codes(String, Codes),
    ( phrase(cidr_mask(NetworkInt, MaskBits), Codes) ->
        true
    ;
        throw(error(
            parse_failure(cidr, String),
            context(parse_cidr/3, 'Input is not a valid CIDR notation')
        ))
    ).

%% parse_ip_or_cidr(+String, -Result)

parse_ip_or_cidr(String, Result) :-
    must_be(string, String),
    string_codes(String, Codes),
    ( phrase(ip_or_cidr(Result), Codes) ->
        true
    ;
        throw(error(
            parse_failure(ip_or_cidr, String),
            context(parse_ip_or_cidr/2, 'Input is not a valid IPv4 or CIDR')
        ))
    ).

% REPL: network_parser.pl

?- parse_ipv4("192.168.1.100", N).
N = 3232235876.

?- parse_ipv4("10.0.0.1", N).
N = 167772161.

?- parse_ipv4("256.1.1.1", N).
ERROR: parse_failure(ipv4, "256.1.1.1")
% Octet 256 > 255 — semantic action fails inside DCG.

?- parse_ipv4("10.0.0", N).
ERROR: parse_failure(ipv4, "10.0.0")
% Only 3 octets — ipv4_dcg/4 requires exactly 4.

?- parse_mac("AA:BB:CC:DD:EE:FF", Bytes).
Bytes = [170, 187, 204, 221, 238, 255].

?- parse_mac("AA-BB-CC-DD-EE-FF", Bytes).
Bytes = [170, 187, 204, 221, 238, 255].
% Hyphen separator accepted identically.

?- parse_mac("GG:BB:CC:DD:EE:FF", Bytes).
ERROR: parse_failure(mac, "GG:BB:CC:DD:EE:FF")
% 'G' is not a valid hex digit — hex_digit/1 fails.

?- parse_cidr("10.0.0.0/8", NetInt, Bits).
NetInt = 167772160, Bits = 8.
% 10.0.0.0 = 167772160. /8 mask — host bits already zero.

?- parse_cidr("192.168.1.100/24", NetInt, Bits).
NetInt = 3232235776, Bits = 24.
% Network address: 192.168.1.0 = 3232235776. Host bits zeroed.

?- parse_cidr("192.168.1.100/33", NetInt, Bits).
ERROR: parse_failure(cidr, "192.168.1.100/33")
% /33 > 32 — mask_bits semantic action fails.

?- parse_ip_or_cidr("10.1.2.3/16", R).
R = cidr(167837696, 16).

?- parse_ip_or_cidr("172.16.5.1", R).
R = ipv4(2886731009).
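The mask arithmetic behind the CIDR results can be checked in isolation. The sketch below uses two hypothetical helpers, prefix_mask/2 and int_to_dotted/2, which are not part of network_parser.pl; the second renders a 32-bit integer back into dotted-quad text, which is one way to produce a canonical form from a parsed value.

```prolog
% prefix_mask(+Bits, -Mask): 32-bit netmask for a prefix length 0..32.
% Shifting an all-ones word left by the host-bit count and truncating to
% 32 bits leaves ones in the network positions only.
prefix_mask(Bits, Mask) :-
    between(0, 32, Bits),
    Mask is (0xFFFFFFFF << (32 - Bits)) /\ 0xFFFFFFFF.

% int_to_dotted(+Int, -String): render a 32-bit integer as "A.B.C.D".
int_to_dotted(Int, String) :-
    between(0, 0xFFFFFFFF, Int),
    A is (Int >> 24) /\ 0xFF,
    B is (Int >> 16) /\ 0xFF,
    C is (Int >> 8)  /\ 0xFF,
    D is  Int        /\ 0xFF,
    format(string(String), "~d.~d.~d.~d", [A, B, C, D]).
```

For /24 the mask is 0xFFFFFF00 (4294967040). Applying it to 3232235876 (192.168.1.100) gives 3232235776, which int_to_dotted/2 renders as "192.168.1.0", matching the parse_cidr/3 result above.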

10.4.2 IP Containment Check: Using Parsed Integers

Once parsed to integers, network operations become integer arithmetic — no string manipulation required:

%% ip_in_network(+IPInt, +NetworkInt, +MaskBits)
%% True if the IP address IPInt falls within NetworkInt/MaskBits.

ip_in_network(IPInt, NetworkInt, MaskBits) :-
    must_be(integer, IPInt),
    must_be(integer, NetworkInt),
    must_be(integer, MaskBits),
    MaskBits >= 0, MaskBits =< 32,
    HostBits is 32 - MaskBits,
    ( HostBits =:= 0 ->
        IPInt =:= NetworkInt
    ;
        Mask is ((1 << 32) - 1) xor ((1 << HostBits) - 1),
        (IPInt /\ Mask) =:= NetworkInt
    ).

%% check_ip_allowed(+IPString, +PolicyList)
%% PolicyList: list of cidr(NetInt, Bits) terms from parse_cidr/3.
%% Fails if IPString parses to an address not in any policy network.
%% Throws parse_failure (from parse_ipv4/2) if IPString is not a valid address.

check_ip_allowed(IPString, PolicyList) :-
    parse_ipv4(IPString, IPInt),
    member(cidr(NetInt, Bits), PolicyList),
    ip_in_network(IPInt, NetInt, Bits),
    !.   % First matching policy suffices

?- parse_cidr("10.0.0.0/8", N, B),
   ip_in_network(167772161, N, B).   % 167772161 = 10.0.0.1
N = 167772160, B = 8,
true.

?- parse_cidr("10.0.0.0/8", N, B),
   ip_in_network(3232235777, N, B).  % 3232235777 = 192.168.1.1
false.   % 192.168.1.1 is not in 10.0.0.0/8

10.5 Security Context: The Parsing Trust Boundary

10.5.1 The 10,000 Log Line Attack

A syslog ingestion pipeline reads a file. The file contains 10,000 log lines. Each log line contains an IP address, a hostname, a severity, and a message. The naive implementation:

% DANGEROUS — do not implement
ingest_log_line_naive(Line) :-
    split_string(Line, " ", "", Parts),
    Parts = [IPStr, HostStr, SevStr | _],
    atom_string(IPAtom,   IPStr),    % WRONG: interns IPStr as atom
    atom_string(HostAtom, HostStr),  % WRONG: interns HostStr as atom
    atom_string(SevAtom,  SevStr),   % WRONG: interns SevStr as atom
    assertz(log_entry(IPAtom, HostAtom, SevAtom)).

If the log file is an authorised log from a known system, this is merely wasteful: up to three atoms interned per line, up to 30,000 atoms for the file, all kept alive by the asserted log_entry/3 facts. If the log file has been tampered with or is arriving from a compromised system, each unique "hostname" and "IP string" in the file becomes an Atom Table entry. An adversary constructing a log file with 10,000 unique malformed hostname strings interns 10,000 atoms per ingested file, one per ingest_log_line_naive/1 call. At 48 bytes per atom minimum, 10,000 unique strings ≈ 480 KB of Atom Table growth per ingested file. The growth persists for as long as any asserted fact references those atoms; reclaiming it requires retracting every such fact and then waiting for atom garbage collection to run.

The attack does not require a compromised log file. A noisy network producing legitimate DHCP requests from devices with randomly-generated hostnames (IoT devices, guest networks) produces the same Atom Table growth pattern from entirely legitimate data.

10.5.2 The Correct Ingestion Architecture

The correct architecture operates in four stages, none of which create atoms from untrusted strings:

Stage 1: Read       → Prolog string (not atom)
Stage 2: Code list  → string_codes/2 → list of integers
Stage 3: DCG parse  → integers only — typed results (IntVal, Bytes, etc.)
Stage 4: Intern     → ONLY if result passes whitelist/length check AND
                       the field type requires an atom (e.g., severity from
                       a closed vocabulary)

%% =============================================================================
%% FILE:    /opt/logic-node/kb/parsers/log_ingestion.pl
%% PURPOSE: Secure log line ingestion using DCG parsing.
%% =============================================================================

:- module(log_ingestion, [
    ingest_syslog_line/2,
    ingest_log_file/2
]).

:- use_module('/opt/logic-node/kb/parsers/network_parser').
:- use_module('/opt/logic-node/kb/ingestion/json_ingestion', [whitelist_filter_dict/3]).
:- use_module(library(error)).
:- use_module(library(readutil)).   % read_line_to_string/2 for line-by-line reads

%% ---------------------------------------------------------------------------
%% CLOSED VOCABULARIES — the ONLY values that become atoms
%% ---------------------------------------------------------------------------

%% known_severity(+Atom): closed set of syslog severity levels.
%% Only these atoms are permitted from the severity field of a log line.
known_severity(emergency).
known_severity(alert).
known_severity(critical).
known_severity(error).
known_severity(warning).
known_severity(notice).
known_severity(info).
known_severity(debug).

%% ---------------------------------------------------------------------------
%% TIMESTAMP DCG — RFC 3339 / ISO 8601
%%
%% Parses timestamps of the form: 2026-03-05T10:44:14Z
%%                              or 2026-03-05T10:44:14+00:00
%%
%% Strategy: capture the timestamp field as a code list, convert to a Prolog
%% string, and pass to parse_time/3. parse_time/3 produces a Unix epoch float
%% (e.g., 1772707454.0 for 2026-03-05T10:44:14Z). The timestamp string is then
%% discarded; only the float survives.
%%
%% Why parse_time/3 and not a hand-rolled DCG?
%%   parse_time/3 is SWI-Prolog's C-native ISO 8601 parser. It handles all
%%   RFC 3339 variants (UTC 'Z', numeric offsets '+HH:MM', fractional seconds)
%%   without us reproducing that complexity in a DCG. The float result is
%%   immediately usable for arithmetic:
%%
%%     get_time(Now),
%%     Now - EntryTimestamp < 300   % "within the last 5 minutes"
%%
%%   No string retained. No atom created. One float on the stack.
%% ---------------------------------------------------------------------------

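%% non_space(-Codes): matches one or more codes other than space (code 32).
%% 0'\s is the explicit escape for the space character; the bare form (0', then
%% a literal space) breaks silently when trailing whitespace is trimmed.
non_space([C|Cs]) --> [C], { C \= 0'\s }, non_space(Cs).
non_space([C])    --> [C], { C \= 0'\s }.

%% space: matches exactly one space character (code 32)
space --> [0'\s].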

%% rest_of_line(-Codes): matches all remaining codes
rest_of_line([C|Cs]) --> [C], rest_of_line(Cs).
rest_of_line([])     --> [].

%% timestamp_float(-EpochFloat):
%% Captures the next non-space token, attempts parse_time/3 on it.
%% EpochFloat is a Unix epoch float. The timestamp string is discarded.
%% Throws error(invalid_timestamp(TsString), _) if the token is not a
%% recognisable ISO 8601 / RFC 3339 timestamp. Fails only if there is no
%% non-space token to capture.

timestamp_float(EpochFloat) -->
    non_space(TsCodes),
    {
        string_codes(TsString, TsCodes),
        % parse_time/3: parse_time(+String, +Format, -Time)
        % Format atom 'iso_8601' handles RFC 3339 including Z and ±HH:MM offsets.
        ( parse_time(TsString, iso_8601, EpochFloat) ->
            true
        ;
            throw(error(
                invalid_timestamp(TsString),
                context(timestamp_float//1,
                    'Token is not a valid ISO 8601 / RFC 3339 timestamp')
            ))
        )
        % TsString is a local variable — GC-eligible after this clause exits.
        % EpochFloat is a float — not interned in the Atom Table.
    }.

%% ---------------------------------------------------------------------------
%% SYSLOG LINE DCG
%% Parses: "<ISO8601-TIMESTAMP> <IP> <severity> <message...>"
%%
%% Format: 2026-03-05T10:44:14Z 192.168.1.1 error Connection refused
%%
%% All typed outputs:
%%   TimestampFloat: Unix epoch float — immediately usable for time arithmetic
%%   IPInt:          32-bit integer — immediately usable for subnet checks
%%   SevAtom:        from known_severity/1 closed vocabulary — zero new atoms
%%   MessageString:  Prolog string — GC-eligible, never interned
%% ---------------------------------------------------------------------------

syslog_line(TimestampFloat, IPInt, SevAtom, MessageString) -->
    % Parse timestamp directly to Unix epoch float — no atom created
    timestamp_float(TimestampFloat),
    space,
    % Parse IP address directly to 32-bit integer — no atom created
    ipv4_address(IPInt),
    space,
    % Parse severity field as a code list
    non_space(SevCodes),
    space,
    % Remainder is the message
    rest_of_line(MsgCodes),
    % Semantic actions:
    {
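        % Convert severity codes to string for comparison
        string_codes(SevString, SevCodes),
        % Look the string up in the closed vocabulary. known_severity/1 binds
        % SevAtom to an existing atom first, so atom_string/2 only compares
        % text. Calling atom_string/2 with SevAtom unbound would intern the
        % untrusted string before the vocabulary check could reject it.
        ( known_severity(SevAtom),
          atom_string(SevAtom, SevString) ->
            true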
        ;
            throw(error(
                unknown_severity(SevString),
                context(syslog_line//4, 'Severity not in closed vocabulary')
            ))
        ),
        % Message: retain as string — NOT interned as atom
        string_codes(MessageString, MsgCodes)
    }.

%% ---------------------------------------------------------------------------
%% ENTRY POINTS
%% ---------------------------------------------------------------------------

%% ingest_syslog_line(+Line, -Entry)
%% Line: Prolog string (NOT atom).
%% Entry: log_entry{timestamp: Float, ip: IPInt, severity: SevAtom, message: Str}
%%
%% The timestamp float enables direct time-series arithmetic:
%%   get_time(Now), Now - Entry.timestamp < 300  % last 5 minutes

ingest_syslog_line(Line, Entry) :-
    must_be(string, Line),
    string_codes(Line, Codes),
    ( phrase(syslog_line(Ts, IPInt, SevAtom, MsgStr), Codes) ->
        Entry = log_entry{
            timestamp: Ts,
            ip:        IPInt,
            severity:  SevAtom,
            message:   MsgStr
        }
    ;
        throw(error(
            parse_failure(syslog_line, Line),
            context(ingest_syslog_line/2, 'Line does not match syslog format')
        ))
    ).

%% ingest_log_file(+FilePath, -Result)
%% Reads a syslog file line by line. Malformed lines are counted and skipped.
%% Result: ingest_result{entries: Entries, summary: Summary}, where Entries is
%% a list of log_entry{} Dicts in file order and Summary is an ingest_summary{} Dict.

ingest_log_file(FilePath, ingest_result{entries: Entries, summary: Summary}) :-
    must_be(atom, FilePath),
    setup_call_cleanup(
        open(FilePath, read, Stream),
        read_and_parse_lines(Stream, 0, Entries, Errors),
        close(Stream)
    ),
    length(Entries, NSuccess),
    % Evaluate the total; a bare NSuccess + Errors would store the term +/2.
    Total is NSuccess + Errors,
    Summary = ingest_summary{
        total_lines: Total,
        parsed:      NSuccess,
        rejected:    Errors
    }.

%% read_and_parse_lines(+Stream, +ErrorsSoFar, -Entries, -Errors)
%% read_line_to_string/2 (library(readutil)) returns each line as a Prolog
%% string and the atom end_of_file at the end of the stream. read_term/3 is
%% not usable here: it parses Prolog terms, and a syslog line is not one.
read_and_parse_lines(Stream, EAcc, Entries, Errors) :-
    read_line_to_string(Stream, Line),
    (   Line == end_of_file
    ->  Entries = [],
        Errors  = EAcc
    ;   catch(ingest_syslog_line(Line, Entry), _Error, fail)
    ->  % Success: emit the entry in file order, so no reverse/2 is needed
        Entries = [Entry | RestEntries],
        read_and_parse_lines(Stream, EAcc, RestEntries, Errors)
    ;   % Failure: skip line, increment error counter
        E1 is EAcc + 1,
        read_and_parse_lines(Stream, E1, Entries, Errors)
    ).

10.5.3 The Atom Interning Whitelist for Field Values

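The only field in the syslog entry that becomes an atom is severity, and only because it is validated against known_severity/1. The eight atoms (emergency, alert, critical, error, warning, notice, info, debug) were interned at load time when the known_severity/1 facts were compiled. For every accepted line the severity atom already exists, so processing 10,000 valid log lines from any source creates exactly zero new Atom Table entries for the severity field. The guarantee covers rejected lines only if the lookup runs with the atom already bound: known_severity(SevAtom) followed by atom_string(SevAtom, SevString) compares text against existing atoms, whereas calling atom_string/2 with SevAtom unbound interns the untrusted string before the vocabulary check can reject it.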

The IP address is never interned — it is a 32-bit integer. The message is a Prolog string — heap-allocated, GC-eligible, freed when the log entry is discarded.

This is the parsing trust boundary in operational terms:

Untrusted input → string_codes → integer codes → DCG rules → typed terms
                                                              ↓
                                                    integers:  never Atom Table
                                                    strings:   GC-eligible heap
                                                    atoms:     only from closed
                                                               vocabulary, already
                                                               interned at load time
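
The closed-vocabulary step at the bottom of this diagram can be sketched in isolation. This is a minimal sketch, not the chapter's module code: severity_from_codes/2 is a hypothetical helper, and the known_severity/1 facts are repeated only so the block is self-contained.

```prolog
%% Closed vocabulary, repeated here so the sketch loads on its own.
known_severity(emergency).
known_severity(alert).
known_severity(critical).
known_severity(error).
known_severity(warning).
known_severity(notice).
known_severity(info).
known_severity(debug).

%% severity_from_codes(+Codes, -Atom)
%% Hypothetical helper. Enumerates atoms that already exist and compares
%% their text with the input. atom_string/2 is only ever called with Atom
%% bound, so an unknown severity is rejected without being interned.
severity_from_codes(Codes, Atom) :-
    string_codes(String, Codes),
    known_severity(Atom),
    atom_string(Atom, String),
    !.
```

With this ordering, ?- string_codes("error", Cs), severity_from_codes(Cs, A). binds A to the existing atom error, while the codes of an unknown word simply fail. The cost is a linear scan of eight facts per line, which is negligible next to the parse itself.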

10.5.4 Bounding the Input: Length Guard Before Parsing

A log line that is 1,000,000 characters long is not a valid syslog line. It is also a Heap exhaustion vector — string_codes/2 on a megabyte string allocates a million-element list on the Heap before the DCG fires a single rule.

%% max_log_line_bytes(-MaxBytes)
%% Hard ceiling on input length before string_codes/2 is called.
%% Syslog lines are specified up to 1,024 bytes (RFC 3164).
%% Set to 2,048 for margin; reject anything longer without parsing.

max_log_line_bytes(2048).

ingest_syslog_line_bounded(Line, Entry) :-
    must_be(string, Line),
    max_log_line_bytes(Max),
    string_length(Line, Len),
    ( Len > Max ->
        throw(error(
            input_too_long(syslog_line, Len, Max),
            context(ingest_syslog_line_bounded/2,
                    'Log line exceeds RFC 3164 maximum length')
        ))
    ; true ),
    string_codes(Line, Codes),
    ( phrase(syslog_line(Ts, IPInt, SevAtom, MsgStr), Codes) ->
        Entry = log_entry{timestamp: Ts, ip: IPInt, severity: SevAtom, message: MsgStr}
    ;
        throw(error(parse_failure(syslog_line, Line),
                    context(ingest_syslog_line_bounded/2, 'Parse failure')))
    ).

The length check costs one string_length/2 call — O(1) in SWI-Prolog for strings (length is stored as metadata, not recomputed by traversal). This eliminates the Heap exhaustion vector before any list allocation.

Outcome: The Parsing Trust Boundary Model

10.6.1 The Conceptual Transition

Volume I's Logic Node answered questions about infrastructure it already knew. Every query was a proof over facts the operator had verified and loaded. The KB was a closed world with a known boundary.

Volume II's Logic Node must answer questions about infrastructure it is discovering from external sources. The external sources — API responses, log files, network packets, config file exports — are not trusted. They are not structured as Prolog terms. They are strings of characters that may or may not conform to the expected format, may or may not contain values within the expected ranges, and may or may not have been crafted to exploit the parser's handling of edge cases.

The parsing trust boundary is the architectural line between the string world (untrusted, unbounded, character codes) and the logical world (typed, validated, Prolog terms). The DCG is the boundary enforcement mechanism. Nothing crosses the boundary without passing through a DCG rule. Nothing that passes a DCG rule has an integer value outside its validated range. Nothing that passes a DCG rule interns an atom unless the atom was already known at load time.
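
The range guarantee can be illustrated with a single field. The following is a minimal sketch, assuming ASCII decimal input. digit//1, digits//1 and octet_sketch//1 are hypothetical names and are not the network_parser definitions.

```prolog
%% digit(-Code): one ASCII decimal digit code.
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

%% digits(-Codes): one or more digit codes, longest match first.
digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

%% octet_sketch(-N): at most three digits whose value is in [0, 255].
%% The range check is a semantic action, so an out-of-range value never
%% leaves the rule as a bound integer.
octet_sketch(N) -->
    digits(Ds),
    { length(Ds, Len), Len =< 3,
      number_codes(N, Ds),
      N =< 255 }.
```

Calling phrase(octet_sketch(N), Codes) with the codes of "255" binds N to 255. The codes of "256" fail outright, because no shorter prefix consumes the whole input.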

Volume I (KB-based reasoning)	Volume II (External input parsing)
All inputs authored by human operator	Inputs arrive from APIs, logs, network
Values typed at compile time	Values must be validated during parse
Atoms created at load time	Atoms created only from closed vocabulary
No input length concern	Length guard mandatory before string_codes/2
Correctness: KB fact is correct by operator intent	Correctness: DCG rule is correct by formal grammar
Failure: proof fails cleanly	Failure: parse fails cleanly — no partial state

10.6.2 Verification Checklist

?- use_module('/opt/logic-node/kb/parsers/network_parser').
true.

% 1. IPv4 round-trip
?- parse_ipv4("192.168.1.1", N),
   A is (N >> 24) /\ 255,
   B is (N >> 16) /\ 255,
   C is (N >>  8) /\ 255,
   D is  N        /\ 255,
   A =:= 192, B =:= 168, C =:= 1, D =:= 1.
true.   % ✓ Parsed integer decomposes to original octets

% 2. Invalid octet rejected
?- \+ parse_ipv4("256.1.1.1", _).
true.   % ✓ Octet 256 rejected by semantic action

% 3. MAC both separators
?- parse_mac("DE:AD:BE:EF:00:01", B1),
   parse_mac("DE-AD-BE-EF-00-01", B2),
   B1 = B2.
true.   % ✓ ':' and '-' produce identical byte lists

% 4. CIDR host bits zeroed
?- parse_cidr("192.168.1.100/24", NetInt, 24),
   A is (NetInt >> 24) /\ 255,
   D is  NetInt        /\ 255,
   A =:= 192, D =:= 0.
true.   % ✓ Host bits zeroed — .100 → .0

% 5. IP containment
?- parse_cidr("10.0.0.0/8", N, B),
   parse_ipv4("10.255.255.254", IP),
   ip_in_network(IP, N, B).
true.   % ✓ 10.255.255.254 is in 10.0.0.0/8

?- parse_cidr("10.0.0.0/8", N, B),
   parse_ipv4("11.0.0.1", IP),
   \+ ip_in_network(IP, N, B).
true.   % ✓ 11.0.0.1 is NOT in 10.0.0.0/8

% 6. No atoms created from IP strings
?- atom_count(Before),
   parse_ipv4("172.16.100.200", _),
   atom_count(After),
   After =:= Before.
true.   % ✓ parse_ipv4/2 creates zero new atoms

% 7. Input length guard fires
?- length(Cs, 3000), maplist(=(0'1), Cs),
   string_codes(LongLine, Cs),
   catch(ingest_syslog_line_bounded(LongLine, _),
         error(input_too_long(syslog_line, 3000, 2048), _),
         true).
true.   % ✓ 3000-character line rejected before string_codes/2 is called on it

Exercises

Exercise 10.1 — Difference List Mechanics Implement dl_length/2 that computes the length of a difference list without converting it to a proper list first. Verify: dl_length([a,b,c|H]-H, 3). Then implement dl_to_list/2 that closes the hole and returns the proper list. Explain why dl_length/2 cannot use length/2 directly and what the WAM does differently when the hole is unbound vs. bound to [].

Exercise 10.2 — DCG Expansion by Hand Given the DCG rule cidr_mask(NetInt, Bits) --> ipv4_address(Addr), [0'/], mask_bits(Bits), { ... }, write the fully-expanded standard Prolog clause that the compiler produces. Identify each S0..SN argument pair and which non-terminal or terminal produced each threading step. Verify your expansion is correct by temporarily asserting it as a clause and comparing its behaviour to phrase(cidr_mask(...), Codes).

Exercise 10.3 — IPv6 Address Parser Implement ipv6_address(-Groups) as a DCG where Groups is a list of 8 integers in [0, 65535], parsing standard colon-separated hex groups (e.g., "2001:0db8:85a3:0000:0000:8a2e:0370:7334"). Do not handle the :: abbreviation (that is Exercise 11.3). Implement parse_ipv6(+String, -Groups) as the entry point with the same security contract as parse_ipv4/2. Verify that groups outside [0, 65535] are rejected.

Exercise 10.4 — Log File Benchmark Generate a test log file with 10,000 lines: 9,000 valid syslog lines with varying severity levels and IP addresses, and 1,000 malformed lines (missing severity, IP out of range, wrong format). Run ingest_log_file/2 and verify: (1) the summary reports exactly 9,000 parsed and 1,000 rejected; (2) atom_count/1 shows zero growth after the ingest (all IP addresses and messages stayed as integers/strings); (3) the 8 severity atoms remain the only atoms created from log field values.

Exercise 10.5 — CIDR Containment Oracle Using parse_cidr/3 and ip_in_network/3, implement firewall_check_oracle/3:

firewall_check_oracle(+IPString, +PolicyStrings, -Decision)

Where PolicyStrings is a list of CIDR notation strings (e.g., ["10.0.0.0/8", "192.168.0.0/16"]) and Decision is permit if the IP falls within any policy network or deny(IPInt) otherwise. Verify that malformed CIDR strings in PolicyStrings throw parse_failure rather than silently producing incorrect containment decisions.
