Chapter 12: Declarative Configuration
Every configuration management tool in production infrastructure does the same thing: it takes a template, substitutes variables, and writes a string to disk. Ansible's template module renders Jinja2. Puppet uses ERB. Terraform interpolates HCL. The output is always a string. The engineer writing the template is manually constructing the grammar of the target config format in their head — there is no formal model of what constitutes a valid NGINX config, a valid /etc/hosts file, or a valid sshd_config. The tool does not know. It cannot know. It fills in the blanks and writes whatever string results.
When the variable contains a newline, a brace, or a semicolon — because the data came from a database query, an API response, or a parameter file that another tool wrote — the resulting string breaks the target file's grammar. The NGINX process will not reload. The Proxmox host will apply the malformed config, fail to start the affected service, and the engineer will spend an hour in a git diff trying to find which substitution introduced the structural damage.
The alternative is not a more sophisticated templating engine. The alternative is treating configuration files as what they are: programs in a formal grammar. The parser and the generator share the same grammar rules. A value injected into the AST cannot break the grammar because it is never serialised as a raw string — it is constrained to the type the grammar's AST node expects. If the grammar expects a valid IPv4 address at a specific position, the AST mutation predicate requires the value to have already passed through parse_ipv4/2. The generator walks the typed AST and produces text that is, by construction, syntactically valid. No template. No substitution. No opportunity for injection.
This requires a bidirectional grammar: one set of DCG rules that operates in two modes. In parse mode, phrase(config(AST), Codes) with Codes bound consumes the character code list and binds AST to the parsed structure. In generate mode, phrase(config(AST), Codes) with AST bound and Codes unbound enumerates the unique character code list that the grammar produces for that AST. One grammar. Two directions. The guarantee: any AST that passes through the generator was accepted by the parser's structural type system first.
Five properties define the reversible DCG as the correct model for infrastructure configuration management.
1. phrase/2 is bidirectional: generation is parse with input and output swapped.
phrase(rule(AST), Codes) with Codes bound runs the DCG as a parser — consuming codes, binding AST. The same call with AST bound and Codes an unbound variable runs the DCG as a generator — the terminal rules [Code] unify Code with the corresponding element of the difference list, instantiating it rather than consuming it. The difference list becomes an output accumulator. No separate "generate" predicate, no separate grammar, no synchronisation hazard between a parser and a template.
2. Left-recursive rules destroy the generator.
A DCG rule a --> a, b. is well-behaved in parse mode: the WAM attempts to match a against the input, consuming codes until the match fails, then tries alternatives. In generate mode with Codes unbound, this rule calls itself immediately with nothing consumed and no progress toward termination — the search tree is infinitely deep before a single terminal is reached. Left recursion in generate mode is a non-terminating recursion. Every grammar intended for bidirectional use must eliminate left recursion, typically by rewriting left-recursive rules into right-recursive form or by using an iterative accumulator pattern with a separate items//1 that sequences items in one direction only.
3. Comments and whitespace must be first-class AST nodes.
A production config parser that discards whitespace and comments produces a semantically equivalent but textually different regeneration. Parsing a 1,000-line NGINX config, mutating one upstream IP address, and regenerating from a whitespace-free AST produces a git diff showing 1,000 changed lines — every blank line removed, every indent normalised, every comment stripped. The commit history becomes unreadable. The diff-review process breaks. Whitespace and comment preservation is not cosmetic — it is an operational requirement for any config management tool that participates in a version-controlled infrastructure. The AST must contain explicit ws/1 and comment/1 nodes. The generator emits them verbatim.
4. AST mutation is logic, not string manipulation.
mutate_upstream(OldAST, TargetDomain, NewIPInt, NewAST) is a Prolog predicate that transforms one AST into another. It does not produce strings. It does not write files. It does not interpolate variables into templates. It finds the node in OldAST matching TargetDomain, replaces the IP integer value with NewIPInt, and binds NewAST to the result. The new IP was validated by parse_ipv4/2 before being passed to mutate_upstream. The generator will then produce the correct config text from NewAST. At no point does an unvalidated string touch the config file's content.
5. Pre-write verification is a proof, not a lint check.
verify_ha_compliance(AST) succeeds if the proposed AST satisfies the HA invariants for the deployment: at least two upstream servers, a fallback listen directive on 443, a valid ssl_certificate path reference. It fails with a typed error term if any invariant is violated. This check runs against NewAST before the generator writes anything to disk. If verification fails, the disk is untouched. The Logic Node never writes a syntactically valid but operationally invalid config — the kind that passes an NGINX syntax check but causes a silent 502 under load because the upstream pool has only one member.
12.1 The Physics of Reversibility
12.1.1 Bidirectional phrase/2: Parse Mode vs. Generate Mode
The DCG terminal rule [Code] is a list unification. In parse mode:
% Parse mode: Codes is bound
% [Code] expands to: S0 = [Code|S], i.e. consume Code from the front
phrase(rule, [104, 101, 108, 108, 111])
% → rule([104|S1], S) unifies Code=104 with first element, threads S1 forward
In generate mode, Codes is unbound — it is a difference list whose hole has not been filled:
% Generate mode: Codes is unbound (a logic variable)
% [Code] expands to: S0 = [Code|S], i.e. BIND Code into the output list
phrase(rule, Codes)
% → The same unification now instantiates the output list element-by-element
% → Each terminal [C] binds one more element of Codes as the grammar executes
The WAM performs identical operations in both directions — it is the binding state of the arguments that determines which direction information flows.
A simple round-trip demonstration:
% A bidirectional DCG for the two-character token "ok"
ok_token --> [0'o], [0'k].
% Parse mode: codes → success
?- phrase(ok_token, [0'o, 0'k]).
true.
% Generate mode: unbound → codes instantiated
?- phrase(ok_token, Codes).
Codes = [111, 107]. % [0'o, 0'k] = [111, 107]
% Round-trip: parse then generate
?- phrase(ok_token, C1), % generate
phrase(ok_token, C1). % parse — must succeed
true. % Grammar is its own inverse
12.1.2 Left Recursion: Why It Kills the Generator
Consider a naive sequence rule:
% Left-recursive — correct in parse mode, non-terminating in generate mode
items([]) --> [].
items([H|T]) --> items(T), item(H). % LEFT RECURSIVE: items//1 calls itself first
In parse mode, items//1 with a bound input consumes codes from left to right, building the list from right to left as the stack unwinds. The recursion terminates because each recursive call has one fewer input code.
In generate mode with items([a,b,c]), the WAM:
- Tries
items([a,b,c]) --> items(T), item(a)— callsitems(T)first items(T)withTunbound tries the same first clause again — callsitems(T2)first- Infinite descent. The WAM fills the local stack and throws
ERROR: Stack limit exceeded.
The fix is right recursion:
% Right-recursive — correct in BOTH modes
items([]) --> [].
items([H|T]) --> item(H), items(T). % item//1 fires first — makes progress
In generate mode, item(H) produces the codes for H immediately before the recursive call. Each recursive invocation has one fewer element in the list. Termination is guaranteed.
The general rule for bidirectional DCGs: no non-terminal may appear as the leftmost element of its own expansion. This includes indirect left recursion: a --> b, c. where b --> a, d. is also prohibited.
12.1.3 The Three Generation Failure Modes
Beyond left recursion, two additional patterns prevent correct generation:
Semantic actions with uninstantiated variables:
% This works in parse mode; fails in generate mode
value(N) --> digits(Ds), { number_codes(N, Ds) }.
% Generate mode: N is bound, Ds is unbound. number_codes(N, Ds) instantiates Ds. CORRECT.
% BUT: if the DCG attempts digits(Ds) before the semantic action, Ds is still unbound.
% Solution: move the semantic action BEFORE the terminal rule in generate mode.
The solution for bidirectional numeric rules is to make the semantic action unconditionally deterministic in both modes:
% Bidirectional integer rule
integer_codes(N) --> { number_codes(N, Ds) }, Ds.
% In parse mode: Ds is bound by consuming terminals BEFORE this call — WRONG.
% Correct bidirectional approach:
integer_codes(N, S0, S) :-
( integer(N) ->
number_codes(N, Ds), % Generate: N bound, compute Ds, match against input
append(Ds, S, S0)
;
append(Ds, S, S0), % Parse: consume from front
number_codes(N, Ds)
).
Unbound variables in terminal positions:
% This generates infinitely many solutions in generate mode
any_char --> [_]. % _ is anonymous — generates all character codes 0–1114111
Every terminal in a bidirectional grammar must be fully ground when operating in generate mode. Non-terminals that match "any character" are acceptable in parse mode (consume one code) but must be guarded or avoided in generate mode.
Disjunction with symmetric alternatives:
separator --> [0':]. % colon
separator --> [0'-]. % hyphen
In generate mode, phrase(separator, Codes) produces two solutions: Codes = [58] and Codes = [45]. For a canonical generator, the first alternative should be the canonical form — and cuts or if-then-else should enforce it:
% Parse mode: accept both; generate mode: produce canonical form (colon)
separator --> [0':], !. % Canonical form in generate mode
separator --> [0'-]. % Alternative accepted in parse mode only
12.1.4 Diagram: Bidirectional AST Pipeline
%%{init: {"themeVariables": {"fontSize": "14px"}}}%%
flowchart TD
DISK_IN["NGINX Config File\n/etc/nginx/sites-available/app.conf\n(on disk, unchanged)"]
READ["read_file_to_codes/2\nor phrase_from_file/2\n→ Codes = [list of char codes]"]
PARSE["phrase(nginx_config(AST), Codes)\nParse Mode: Codes bound\n→ AST bound to structured term\nws/1, comment/1 nodes preserved"]
AST_OLD["OldAST\nnginx_config{blocks: [\n server_block{...},\n upstream_block{...}\n]}\nAll layout nodes intact"]
MUTATE["mutate_upstream(\n OldAST, 'app.internal',\n NewIPInt, NewAST)\nPure logic — no strings\nNewIPInt from parse_ipv4/2"]
VERIFY["verify_ha_compliance(NewAST)\n→ PASS: ≥2 upstream servers\n→ PASS: listen 443 ssl present\n→ PASS: ssl_certificate path exists\nFail = disk unchanged"]
AST_NEW["NewAST\nIdentical to OldAST\nexcept upstream IP node\nAll ws/1, comment/1 preserved"]
GENERATE["phrase(nginx_config(NewAST), NewCodes)\nGenerate Mode: NewAST bound\n→ NewCodes instantiated\nGrammar guarantees syntax validity"]
DIFF["git diff\nExactly 1 line changed:\n- server 192.168.1.10:8080;\n+ server 10.0.2.15:8080;\nAll whitespace, comments intact"]
DISK_OUT["Write NewCodes to disk\natom_codes(Text, NewCodes)\nwrite(Stream, Text)\nnginx -t && nginx -s reload"]
DISK_IN --->|"file read"| READ
READ --->|"Codes bound"| PARSE
PARSE --->|"AST produced"| AST_OLD
AST_OLD --->|"mutation logic"| MUTATE
MUTATE --->|"NewAST candidate"| VERIFY
VERIFY --->|"proof passes"| AST_NEW
AST_NEW --->|"AST bound"| GENERATE
GENERATE --->|"NewCodes instantiated"| DIFF
DIFF --->|"atomic write"| DISK_OUT
style DISK_IN fill:#1A2B4A,color:#FFFFFF
style READ fill:#1A4070,color:#FFFFFF
style PARSE fill:#1A4070,color:#FFFFFF
style AST_OLD fill:#8B6914,color:#FFFFFF
style MUTATE fill:#1A4070,color:#FFFFFF
style VERIFY fill:#7A1A1A,color:#FFFFFF
style AST_NEW fill:#8B6914,color:#FFFFFF
style GENERATE fill:#1A4070,color:#FFFFFF
style DIFF fill:#1A6B3A,color:#FFFFFF
style DISK_OUT fill:#1A2B4A,color:#FFFFFF
Reading the diagram: Left column is the read-parse path (file → codes → AST). Center is the mutation-verification path (old AST → pure logic → proof). Right column is the generate-write path (new AST → codes → disk). The red verify_ha_compliance node is the gate: failure here terminates the pipeline before a single byte reaches disk. The green git diff node shows the invariant: one mutation produces exactly one changed line.
12.2 The Diff-Friendly AST
12.2.1 Why Whitespace and Comments Are Not Optional
Consider a production NGINX upstream block:
# Primary application cluster — updated 2026-03-01
upstream app_cluster {
# Active nodes
server 192.168.1.10:8080 weight=5;
server 192.168.1.11:8080 weight=5;
# Backup — only used when both active nodes fail
server 192.168.1.99:8080 backup;
keepalive 32;
}
A parser that discards whitespace and comments produces an AST equivalent to:
upstream app_cluster {
server 192.168.1.10:8080 weight=5;
server 192.168.1.11:8080 weight=5;
server 192.168.1.99:8080 backup;
keepalive 32;
}
A generator that regenerates from that whitespace-free AST, even with no mutation applied, produces a git diff showing 10 changed lines: all blank lines removed, all comments stripped, all indentation normalised. The commit message for a one-IP-address change reads "Updated upstream IP" but the diff shows every comment and blank line in the file changed. The commit is unreviewable. The blame history for the comments is destroyed.
12.2.2 AST Layout Node Types
The AST representation for this chapter uses four layout-preserving node types alongside the semantic nodes:
% Layout node types — used by the AST alongside semantic nodes
ws(Codes) % Arbitrary whitespace: spaces, tabs, newlines
% Codes: list of character codes
% Generator: emits Codes verbatim
comment(Codes) % # comment text (to end of line)
% Codes: character codes of comment including '#' and newline
% Generator: emits Codes verbatim
blank_line % A line containing only whitespace (or nothing) followed by \n
% Preserved as a distinct node for readability
% Generator: emits "\n"
raw(Codes) % Unparsed directive — unknown syntax, preserved verbatim
% Used by error recovery: unrecognised directives are kept
% intact so regeneration doesn't destroy unknown config options
A parsed fragment of the upstream block above, represented as an AST:
upstream_block{
name: "app_cluster",
body: [
comment("# Primary application cluster — updated 2026-03-01\n"),
ws([0'\n]), % blank line after header comment
ws([0' ,0' ,0' ,0' ]), % 4-space indent
comment("# Active nodes\n"),
ws([0' ,0' ,0' ,0' ]),
server_entry{
address: 3232235786, % 192.168.1.10 as 32-bit integer
port: 8080,
params: [weight(5)]
},
ws([0'\n, 0' ,0' ,0' ,0' ]),
server_entry{
address: 3232235787, % 192.168.1.11
port: 8080,
params: [weight(5)]
},
blank_line,
ws([0' ,0' ,0' ,0' ]),
comment("# Backup — only used when both active nodes fail\n"),
ws([0' ,0' ,0' ,0' ]),
server_entry{
address: 3232235875, % 192.168.1.99
port: 8080,
params: [backup]
},
blank_line,
ws([0' ,0' ,0' ,0' ]),
directive{name: "keepalive", value: "32"}
]
}.
Mutation replaces exactly one server_entry node. The ws/1, comment/1, and blank_line nodes in the body list are unchanged. The generator emits them verbatim. The git diff shows exactly one changed line.
12.2.3 The Regeneration Invariant
For any config file F that the parser accepts:
% Regeneration invariant: parse then generate returns identical codes
regeneration_invariant(FilePath) :-
read_file_to_codes(FilePath, Codes, [encoding(utf8)]),
phrase(nginx_config(AST), Codes),
phrase(nginx_config(AST), RegeneratedCodes),
( Codes = RegeneratedCodes ->
format("[PASS] ~w: regeneration identical~n", [FilePath])
;
length(Codes, N1), length(RegeneratedCodes, N2),
format("[FAIL] ~w: ~w codes in, ~w codes out~n", [FilePath, N1, N2]),
fail
).
This is the regression test that must pass before any mutation is deployed. A grammar that fails regeneration_invariant/1 on the production config file cannot be trusted for mutation — the "unchanged" lines in the git diff would not actually be unchanged.
12.3 The Build: NGINX Two-Way Parser
12.3.1 Architecture
logicadmin@logic-node-01:~$ nano /opt/logic-node/kb/config/nginx_reversible.pl
%% =============================================================================
%% FILE: /opt/logic-node/kb/config/nginx_reversible.pl
%% PURPOSE: Bidirectional DCG parser/generator for NGINX configuration files.
%%
%% MODES:
%% Parse: phrase(nginx_config(AST), Codes) — Codes bound, AST unbound
%% Generate: phrase(nginx_config(AST), Codes) — AST bound, Codes unbound
%%
%% LAYOUT PRESERVATION:
%% ws/1, comment/1, blank_line nodes are explicit in the AST.
%% Parse → generate with no mutations produces bit-identical output.
%% One AST mutation produces exactly one changed line in `git diff`.
%%
%% SECURITY CONTRACT:
%% — No atoms created from config file content (hostnames, paths, values).
%% — All user-controlled string values stored as Prolog strings.
%% — IP addresses parsed to 32-bit integers via network_parser.pl.
%% — All AST values are typed; the generator never emits an unvalidated string.
%% — No left recursion anywhere in the grammar.
%%
%% EXPORTS:
%% nginx_parse_file/2 — file path → AST
%% nginx_generate/2 — AST → Prolog string
%% nginx_write_file/2 — AST → atomic file write (parse_temp/verify first)
%% =============================================================================
:- module(nginx_reversible, [
nginx_parse_file/2,
nginx_generate/2,
nginx_write_file/2,
nginx_config//1
]).
:- use_module(library(error)).
:- use_module('/opt/logic-node/kb/parsers/network_parser', [parse_ipv4/2]).
%% ---------------------------------------------------------------------------
%% BIDIRECTIONAL PRIMITIVES
%%
%% PERFORMANCE NOTE — library(dcg/high_order):
%% SWI-Prolog's library(dcg/high_order) provides `sequence//2` and
%% `sequence//3`, which are compiler-optimised for greedy matching in both
%% parse and generate directions. The hand-rolled `opt_whitespace//1` and
%% `spaces1//1` below are explicit for pedagogic clarity — they expose the
%% right-recursive structure that makes generation safe.
%%
%% In production, replace them with:
%% :- use_module(library(dcg/high_order)).
%% ws_char --> [C], { memberchk(C, [0' , 0'\t]) }.
%% opt_whitespace --> sequence(ws_char). % 0 or more — less choice points
%% spaces1 --> sequence(ws_char, 1, _). % 1 or more
%%
%% `sequence/2` is internally implemented as a deterministic loop rather
%% than a recursive predicate, so it creates significantly fewer choice
%% points during generation — each failed "one more character" attempt
%% does not leave a dangling choice point on the WAM stack. For a config
%% file with thousands of indent tokens, this reduces the local stack depth
%% noticeably and prevents the per-indent choice-point accumulation that
%% the recursive form produces.
%%
%% The grammar structure, security contract, and AST types are identical
%% whether the recursive or library form is used.
%% ---------------------------------------------------------------------------
%% bidi_codes(+Codes)//: emits or consumes a fixed code list verbatim.
%% In parse mode: matches the exact sequence from the input.
%% In generate mode: unifies Codes into the output difference list.
bidi_codes([]) --> [].
bidi_codes([H|T]) --> [H], bidi_codes(T).
%% bidi_char(+Code)//: single character in both modes.
bidi_char(C) --> [C].
%% opt_whitespace(-Codes)//: zero or more space/tab codes.
%% Right-recursive — safe in generate mode.
%% (Replace with sequence(ws_char) from library(dcg/high_order) for production)
opt_whitespace([C|Cs]) --> [C], { memberchk(C, [0' , 0'\t]) }, opt_whitespace(Cs).
opt_whitespace([]) --> [].
%% newline//: matches/generates exactly one newline (0x0A).
newline --> [0'\n].
%% line_end//: optional spaces then newline.
line_end --> opt_whitespace(_), newline.
%% spaces1(-Codes)//: one or more space/tab codes (for indent preservation).
%% (Replace with sequence(ws_char, 1, _) from library(dcg/high_order) for production)
spaces1([C|Cs]) --> [C], { memberchk(C, [0' , 0'\t]) }, spaces1(Cs).
spaces1([C]) --> [C], { memberchk(C, [0' , 0'\t]) }.
%% ---------------------------------------------------------------------------
%% LAYOUT NODE PARSERS — bidirectional
%% ---------------------------------------------------------------------------
%% layout_node(-Node)//: parses or generates one ws/comment/blank_line node.
% GENERATE MODE: Node is bound. Bypass the parsers and use the helper.
layout_node(Node) -->
{ nonvar(Node) }, !,
{ generate_layout_node(Node, Codes) },
bidi_codes(Codes).
% PARSE MODE: Node is unbound. Consume from input.
layout_node(comment(Codes)) -->
[0'#],
comment_body(BodyCodes),
newline,
{ append([0'# | BodyCodes], [0'\n], Codes) }.
layout_node(blank_line) -->
opt_whitespace(_), newline.
layout_node(ws(Codes)) -->
spaces1(Codes).
%% comment_body(-Codes)//: reads to end of line without consuming the newline.
comment_body([C|Cs]) --> [C], { C \= 0'\n }, comment_body(Cs).
comment_body([]) --> [].
%% layout_nodes(-Nodes)//: zero or more layout nodes.
layout_nodes([N|Ns]) --> layout_node(N), !, layout_nodes(Ns).
layout_nodes([]) --> [].
%% ---------------------------------------------------------------------------
%% LAYOUT NODE GENERATOR
%% The layout nodes must emit their stored codes verbatim in generate mode.
%% ---------------------------------------------------------------------------
%% generate_layout_node(+Node, -Codes)
%% Imperative (non-DCG) helper: extracts codes from a layout node for emission.
generate_layout_node(comment(Codes), Codes).
generate_layout_node(blank_line, [0'\n]).
generate_layout_node(ws(Codes), Codes).
%% ---------------------------------------------------------------------------
%% BIDIRECTIONAL STRING RULES
%% All string values stay as Prolog strings — NEVER atoms.
%% ---------------------------------------------------------------------------
%% string_to_codes(+String, -Codes): deterministic, both directions.
string_to_codes(S, Cs) :-
( string(S) ->
string_codes(S, Cs) % Generate mode: string → codes
;
string_codes(S, Cs) % Parse mode: codes → string (string_codes/2 is bidirectional)
).
%% quoted_string(-Str)//: parses/generates a double-quoted string value.
%% Str is a Prolog string (never an atom).
quoted_string(Str) -->
[0'"],
qstring_body(Codes),
[0'"],
{ string_codes(Str, Codes) }.
qstring_body([C|Cs]) --> [C], { C \= 0'", C \= 0'\n }, qstring_body(Cs).
qstring_body([]) --> [].
%% unquoted_token(-Str)//: non-space, non-brace, non-semicolon token.
%% Str is a Prolog string. Bounded to MaxLen=4096 characters.
unquoted_token(Str) -->
unquoted_codes(Codes),
{ length(Codes, Len), Len =< 4096,
string_codes(Str, Codes) }.
unquoted_codes([C|Cs]) -->
[C],
{ C \= 0' , C \= 0'\t, C \= 0'\n,
C \= 0'{, C \= 0'}, C \= 0'; },
unquoted_codes(Cs).
unquoted_codes([]) --> [].
%% ---------------------------------------------------------------------------
%% IP:PORT PARSING — bidirectional
%% All IPs are 32-bit integers. Ports are integers in [1, 65535].
%% ---------------------------------------------------------------------------
%% ip_port(-IPInt, -Port)//
%% Handles: "192.168.1.10:8080" → IPInt=3232235786, Port=8080
%% Handles: "192.168.1.10" → IPInt=3232235786, Port=80 (default)
ip_port(IPInt, Port) -->
ipv4_literal(IPInt),
[0':],
decimal_int(Port),
{ Port >= 1, Port =< 65535 }.
ip_port(IPInt, 80) -->
ipv4_literal(IPInt).
%% ipv4_literal(-IntVal)//: parses/generates dotted-quad ↔ 32-bit integer.
%% Bidirectional: in generate mode, decomposes IntVal into octets.
ipv4_literal(IntVal) -->
{ integer(IntVal),
A is (IntVal >> 24) /\ 255,
B is (IntVal >> 16) /\ 255,
C is (IntVal >> 8) /\ 255,
D is IntVal /\ 255 },
octet_codes(A), [0'.], octet_codes(B), [0'.],
octet_codes(C), [0'.], octet_codes(D).
ipv4_literal(IntVal) -->
{ var(IntVal) }, % Parse mode: IntVal unbound
octet_codes(A), [0'.], octet_codes(B), [0'.],
octet_codes(C), [0'.], octet_codes(D),
{ IntVal is (A << 24) \/ (B << 16) \/ (C << 8) \/ D }.
%% octet_codes(-N)//: bidirectional decimal integer in [0,255] ↔ char codes.
%% In generate mode: N is bound → produce digit codes.
%% In parse mode: codes bound → consume and compute N.
octet_codes(N) -->
{ integer(N), number_codes(N, Cs), length(Cs, Len), Len =< 3 },
bidi_codes(Cs).
octet_codes(N) -->
{ var(N) },
digit(D1), digit(D2), digit(D3),
{ number_codes(N, [D1, D2, D3]), N =< 255 }.
octet_codes(N) -->
{ var(N) },
digit(D1), digit(D2),
{ number_codes(N, [D1, D2]) }.
octet_codes(N) -->
{ var(N) },
digit(D1),
{ number_codes(N, [D1]) }.
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.
%% decimal_int(-N)//: bidirectional positive integer
decimal_int(N) -->
{ integer(N), N >= 0, number_codes(N, Cs) },
bidi_codes(Cs).
decimal_int(N) -->
{ var(N) },
digit(D),
decimal_int_rest(D, N).
decimal_int_rest(Acc, N) -->
digit(D), !,
{ Acc1 is Acc * 10 + (D - 0'0) },
decimal_int_rest(Acc1, N).
decimal_int_rest(N, N) --> [].
%% ---------------------------------------------------------------------------
%% SERVER PARAMETER PARSING — bidirectional
%% ---------------------------------------------------------------------------
%% server_param(-Param)//: one optional parameter token after the ip:port.
server_param(weight(N)) --> `weight=`, decimal_int(N).
server_param(max_fails(N)) --> `max_fails=`, decimal_int(N).
server_param(fail_timeout(N)) --> `fail_timeout=`, decimal_int(N), `s`.
server_param(backup) --> `backup`.
server_param(down) --> `down`.
server_param(max_conns(N)) --> `max_conns=`, decimal_int(N).
server_params([P|Ps]) -->
[0' ], server_param(P), !, server_params(Ps).
server_params([]) --> [].
%% ---------------------------------------------------------------------------
%% UPSTREAM BLOCK — bidirectional
%% ---------------------------------------------------------------------------
%% upstream_block(-Block)//
%% Handles: upstream NAME { ... }
upstream_block(upstream_block{name: Name, body: Body}) -->
`upstream `,
unquoted_token(Name),
` {`, newline,
upstream_body(Body),
`}`.
upstream_body([N|Ns]) -->
layout_node(N), !,
upstream_body(Ns).
upstream_body([S|Ns]) -->
upstream_server(S), !,
upstream_body(Ns).
upstream_body([D|Ns]) -->
upstream_directive(D), !,
upstream_body(Ns).
upstream_body([]) --> [].
upstream_server(server_entry{address: IPInt, port: Port, params: Params}) -->
opt_whitespace(_),
`server `,
ip_port(IPInt, Port),
server_params(Params),
[0';], newline.
upstream_directive(directive{name: Name, value: Value}) -->
opt_whitespace(_),
unquoted_token(Name),
[0' ],
unquoted_token(Value),
[0';], newline.
%% ---------------------------------------------------------------------------
%% LISTEN DIRECTIVE — bidirectional
%% listen 80;
%% listen 443 ssl;
%% listen 443 ssl http2;
%% ---------------------------------------------------------------------------
listen_directive(listen{port: Port, options: Opts}) -->
`listen `,
decimal_int(Port),
listen_opts(Opts),
[0';].
listen_opts([0' |Rest]) -->
[0' ], listen_opts_body(Rest).
listen_opts([]) --> [].
listen_opts_body(Codes) -->
listen_option_codes(Codes).
listen_option_codes([C|Cs]) --> [C], { C \= 0'; }, listen_option_codes(Cs).
listen_option_codes([]) --> [].
%% ---------------------------------------------------------------------------
%% LOCATION BLOCK — bidirectional
%% ---------------------------------------------------------------------------
location_block(location_block{modifier: Mod, path: Path, body: Body}) -->
`location `,
location_modifier(Mod),
unquoted_token(Path),
[0' , 0'{], newline,
location_body(Body),
`}`.
location_modifier(exact) --> `= `, !.
location_modifier(prefix) --> `^~ `, !.
location_modifier(regex) --> `~ `, !.
location_modifier(none) --> [].
location_body([N|Ns]) -->
layout_node(N), !,
location_body(Ns).
location_body([D|Ns]) -->
generic_directive(D), !,
location_body(Ns).
location_body([]) --> [].
%% ---------------------------------------------------------------------------
%% SERVER BLOCK — bidirectional
%% ---------------------------------------------------------------------------
server_block(server_block{body: Body}) -->
`server {`, newline,
server_body(Body),
`}`.
server_body([N|Ns]) -->
layout_node(N), !,
server_body(Ns).
server_body([L|Ns]) -->
server_listen(L), !,
server_body(Ns).
server_body([Loc|Ns]) -->
location_block(Loc), !,
server_body(Ns).
server_body([D|Ns]) -->
generic_directive(D), !,
server_body(Ns).
server_body([]) --> [].
server_listen(listen(ListenNode)) -->
opt_whitespace(_),
listen_directive(ListenNode),
newline.
%% generic_directive(-D)//: any "key value;" directive not otherwise handled.
%% key and value stored as Prolog strings. raw/1 for unknown syntax.
generic_directive(directive{name: Name, value: Value}) -->
opt_whitespace(_),
unquoted_token(Name),
[0' ],
directive_value(Value),
[0';], newline.
directive_value(V) --> quoted_string(V), !.
directive_value(V) --> unquoted_token(V).
%% ---------------------------------------------------------------------------
%% TOP-LEVEL CONFIG — bidirectional
%% ---------------------------------------------------------------------------
%% nginx_config(-Blocks)//
%% A config file is a sequence of top-level blocks and layout nodes.
nginx_config([]) --> [].
nginx_config([N|Rest]) -->
layout_node(N), !,
nginx_config(Rest).
nginx_config([B|Rest]) -->
upstream_block(B), !,
nginx_config(Rest).
nginx_config([B|Rest]) -->
server_block(B), !,
nginx_config(Rest).
nginx_config([D|Rest]) -->
generic_directive(D), !,
nginx_config(Rest).
nginx_config([raw(Codes)|Rest]) -->
% Error recovery: unrecognised top-level syntax
% Consume to next '}' or ';' and store verbatim
raw_item(Codes), !,
nginx_config(Rest).
raw_item([C|Cs]) --> [C], { C \= 0'\n }, raw_item(Cs).
raw_item([0'\n]) --> [0'\n].
%% ---------------------------------------------------------------------------
%% ENTRY POINTS
%% ---------------------------------------------------------------------------
%% nginx_parse_file(+FilePath, -Blocks)
%% Reads FilePath and parses into AST list.
nginx_parse_file(FilePath, Blocks) :-
must_be(atom, FilePath),
read_file_to_codes(FilePath, Codes, [encoding(utf8)]),
( phrase(nginx_config(Blocks), Codes) ->
true
;
throw(error(
nginx_parse_failure(FilePath),
context(nginx_parse_file/2, 'File does not parse as valid NGINX config')
))
).
%% nginx_generate(+Blocks, -String)
%% Generates a Prolog string from an AST list.
nginx_generate(Blocks, String) :-
must_be(list, Blocks),
( phrase(nginx_config(Blocks), Codes) ->
string_codes(String, Codes)
;
throw(error(
nginx_generate_failure(Blocks),
context(nginx_generate/2, 'AST cannot be serialised — malformed node?')
))
).
%% nginx_write_file(+FilePath, +Blocks)
%% Writes a generated config to disk atomically.
%% Uses a temp file + rename for crash-safe write.
%%
%% CRITICAL — SAME FILESYSTEM ATOMICITY:
%% POSIX rename(2) is atomic only when source and destination are on the
%% same filesystem mount. If TmpPath is on a different filesystem
%% (e.g., /tmp on tmpfs while FilePath is on the root ext4 partition),
%% rename_file/2 silently falls back to a copy-then-delete — which is NOT
%% atomic. The NGINX master process performing a config reload during the
%% window between the copy completing and the delete finishing will read a
%% valid new config, but a crash in that window leaves the old file deleted
%% and the new file under the tmp name — the config vanishes.
%%
%% Solution: derive TmpPath from the SAME directory as FilePath.
%% file_directory_name/2 extracts the directory component; the tmp file is
%% written as a hidden dotfile in that directory, guaranteeing the same
%% filesystem mount for any conventional NGINX config path under /etc/nginx/.
%%
%% Example:
%% FilePath = /etc/nginx/sites-available/app.conf
%% TmpPath = /etc/nginx/sites-available/.app.conf.tmp
%% ↑ same directory, same filesystem
nginx_write_file(FilePath, Blocks) :-
must_be(atom, FilePath),
must_be(list, Blocks),
nginx_generate(Blocks, String),
% Derive tmp path in the same directory as FilePath
file_directory_name(FilePath, Dir),
file_base_name(FilePath, Base),
atomic_list_concat([Dir, '/.', Base, '.tmp'], TmpPath),
setup_call_cleanup(
open(TmpPath, write, Stream, [encoding(utf8)]),
write(Stream, String),
close(Stream)
),
rename_file(TmpPath, FilePath).
% rename_file/2 calls rename(2) — atomic on same filesystem.
% NGINX master reads either the old complete file or the new complete file.
% No partial config is ever visible.
12.3.2 REPL: Parse, Inspect, Regenerate
?- use_module('/opt/logic-node/kb/config/nginx_reversible').
true.
% Parse a config file
?- nginx_parse_file('/etc/nginx/sites-available/app.conf', Blocks).
Blocks = [
comment("# Primary application cluster — updated 2026-03-01\n"),
upstream_block{
name: "app_cluster",
body: [
ws([32,32,32,32]),
server_entry{address:3232235786, port:8080, params:[weight(5)]},
...
]
},
server_block{body: [...]}
].
% Regeneration invariant check
?- regeneration_invariant('/etc/nginx/sites-available/app.conf').
[PASS] /etc/nginx/sites-available/app.conf: regeneration identical
true.
% Generate a specific upstream block in isolation
?- phrase(upstream_block(upstream_block{
name: "test",
body: [
ws([32,32,32,32]),
server_entry{address:167772161, port:9090, params:[weight(3)]},
blank_line
]
}), Codes),
string_codes(S, Codes), write(S).
upstream test {
server 10.0.0.1:9090 weight=3;
}
true.
12.4 Logical Mutation and Verification
12.4.1 mutate_upstream/4
%% =============================================================================
%% FILE: /opt/logic-node/kb/config/nginx_mutations.pl
%% PURPOSE: AST mutation predicates for NGINX config changes.
%%
%% DESIGN:
%% Mutation predicates are pure logical transformations on AST terms.
%% They produce new ASTs — they do not write files, generate strings,
%% or produce any side effects.
%% Callers MUST pass the new AST through verify_ha_compliance/1
%% before calling nginx_write_file/2.
%% =============================================================================
:- module(nginx_mutations, [
mutate_upstream/4,
mutate_upstream_server/5,
replace_server_ip/4,
verify_ha_compliance/1,
verify_ha_compliance/2
]).
:- use_module(library(error)).
:- use_module('/opt/logic-node/kb/parsers/network_parser', [parse_ipv4/2]).
%% ---------------------------------------------------------------------------
%% UPSTREAM SERVER IP REPLACEMENT
%% ---------------------------------------------------------------------------
%% mutate_upstream(+OldBlocks, +UpstreamName, +NewIPInt, -NewBlocks)
%%
%% Finds the upstream_block with the given name in OldBlocks,
%% replaces all server_entry nodes whose IP matches the pattern
%% NewIPInt is the validated 32-bit integer from parse_ipv4/2.
%%
%% If no upstream_block with UpstreamName exists, throws a typed error.
%% If the upstream block exists but no server entry matches, NewBlocks = OldBlocks.
mutate_upstream(OldBlocks, UpstreamName, OldIPInt, NewIPInt, NewBlocks) :-
must_be(list, OldBlocks),
must_be(string, UpstreamName),
must_be(integer, OldIPInt),
must_be(integer, NewIPInt),
( select_upstream(OldBlocks, UpstreamName, UpstreamBlock, RestBlocks) ->
mutate_upstream_block(UpstreamBlock, OldIPInt, NewIPInt, NewUpstreamBlock),
NewBlocks = [NewUpstreamBlock | RestBlocks]
;
throw(error(
upstream_not_found(UpstreamName),
context(mutate_upstream/5, 'Named upstream block not in config AST')
))
).
%% select_upstream(+Blocks, +Name, -Block, -Rest)
%% Extracts the first upstream_block with Name from Blocks.
%% Rest preserves all other blocks in their original order.
select_upstream([B|Rest], Name, B, Rest) :-
is_dict(B, upstream_block),
B.name = Name,
!.
select_upstream([H|T], Name, Block, [H|Rest]) :-
select_upstream(T, Name, Block, Rest).
%% mutate_upstream_block(+Block, +OldIP, +NewIP, -NewBlock)
%% Walks the body list replacing server_entry nodes with matching address.
mutate_upstream_block(Block, OldIP, NewIP, NewBlock) :-
is_dict(Block, upstream_block),
maplist(
mutate_body_node(OldIP, NewIP),
Block.body,
NewBody
),
NewBlock = Block.put(body, NewBody).
mutate_body_node(OldIP, NewIP, Node, NewNode) :-
is_dict(Node, server_entry),
Node.address =:= OldIP,
!,
NewNode = Node.put(address, NewIP).
mutate_body_node(_, _, Node, Node). % All other nodes: unchanged
%% ---------------------------------------------------------------------------
%% MUTATION: REPLACE A SINGLE SERVER ENTRY BY IP STRING
%% Entry point for callers working with string IP notation.
%% ---------------------------------------------------------------------------
%% replace_server_ip(+Blocks, +UpstreamName, +OldIPStr, +NewIPStr, -NewBlocks)
%% OldIPStr, NewIPStr: Prolog strings (from JSON ingestion or UI input).
%% parse_ipv4/2 validates both before mutation proceeds.
replace_server_ip(Blocks, UpstreamName, OldIPStr, NewIPStr, NewBlocks) :-
must_be(string, OldIPStr),
must_be(string, NewIPStr),
parse_ipv4(OldIPStr, OldIPInt),
parse_ipv4(NewIPStr, NewIPInt),
mutate_upstream(Blocks, UpstreamName, OldIPInt, NewIPInt, NewBlocks).
%% ---------------------------------------------------------------------------
%% HA COMPLIANCE VERIFICATION
%% ---------------------------------------------------------------------------
%% verify_ha_compliance(+Blocks)
%% Verifies the proposed AST satisfies HA invariants.
%% Throws a typed error if any invariant fails.
%% The config is NEVER written to disk unless this predicate succeeds.
verify_ha_compliance(Blocks) :-
verify_ha_compliance(Blocks, Violations),
( Violations = [] ->
true
;
throw(error(
ha_compliance_failure(Violations),
context(verify_ha_compliance/1, 'Proposed AST fails HA invariants')
))
).
%% verify_ha_compliance(+Blocks, -Violations)
%% Non-throwing form: returns list of violation terms.
verify_ha_compliance(Blocks, Violations) :-
must_be(list, Blocks),
findall(V, check_ha_invariant(Blocks, V), Violations).
%% check_ha_invariant(+Blocks, -Violation)
%% One clause per invariant. Succeeds (binding Violation) when the invariant
%% is violated. Fails (no violation) when the invariant holds.
%% INVARIANT 1: Every upstream block must have at least 2 active server entries.
check_ha_invariant(Blocks, no_ha_upstream(Name, ActiveCount)) :-
member(Block, Blocks),
is_dict(Block, upstream_block),
Name = Block.name,
findall(S, (member(S, Block.body),
is_dict(S, server_entry),
\+ memberchk(backup, S.params),
\+ memberchk(down, S.params)),
ActiveServers),
length(ActiveServers, ActiveCount),
ActiveCount < 2.
%% INVARIANT 2: Every server block must have at least one listen directive.
check_ha_invariant(Blocks, no_listen_directive(BlockIndex)) :-
nth1(BlockIndex, Blocks, Block),
is_dict(Block, server_block),
\+ ( member(N, Block.body), N = listen(_) ).
%% INVARIANT 3: If any server block listens on 443, ssl_certificate must be present.
check_ha_invariant(Blocks, ssl_cert_missing_for_443) :-
member(Block, Blocks),
is_dict(Block, server_block),
member(listen(L), Block.body),
L.port =:= 443,
\+ ( member(directive{name: "ssl_certificate", value: _}, Block.body) ).
%% INVARIANT 4: No server entry IP may be in loopback range (127.0.0.0/8).
%% Loopback IPs in upstream blocks indicate misconfiguration.
check_ha_invariant(Blocks, loopback_in_upstream(Name, IP)) :-
member(Block, Blocks),
is_dict(Block, upstream_block),
Name = Block.name,
member(S, Block.body),
is_dict(S, server_entry),
IP = S.address,
LoopbackNet is 127 * 16777216, % 127.0.0.0 as integer
(IP >> 24) =:= 127,
\+ (IP =:= LoopbackNet + 1). % 127.0.0.1 alone is allowed for testing
12.4.2 Full Mutation Pipeline
%% nginx_update_upstream_ip/4
%% The complete pipeline: parse → mutate → verify → generate → write.
%% Called by the Logic Node's infrastructure management predicates.
nginx_update_upstream_ip(ConfigPath, UpstreamName, OldIPStr, NewIPStr) :-
must_be(atom, ConfigPath),
must_be(string, UpstreamName),
must_be(string, OldIPStr),
must_be(string, NewIPStr),
% Step 1: Parse the current config to AST
nginx_parse_file(ConfigPath, OldBlocks),
% Step 2: Validate both IP strings before any mutation
parse_ipv4(OldIPStr, OldIPInt),
parse_ipv4(NewIPStr, NewIPInt),
OldIPInt \=:= NewIPInt, % Sanity: mutation must change something
% Step 3: Apply mutation — pure logic, no I/O
mutate_upstream(OldBlocks, UpstreamName, OldIPInt, NewIPInt, NewBlocks),
% Step 4: Verify HA compliance of the new AST — no disk write yet
verify_ha_compliance(NewBlocks),
% Step 5: Verify regeneration is syntactically valid — generate to codes
nginx_generate(NewBlocks, _GeneratedString), % throws if AST is malformed
% Step 6: Atomic write — temp file + rename
nginx_write_file(ConfigPath, NewBlocks),
format("[OK] ~w: upstream '~w' server ~w → ~w~n",
[ConfigPath, UpstreamName, OldIPStr, NewIPStr]).
% REPL: full pipeline
?- nginx_update_upstream_ip(
'/etc/nginx/sites-available/app.conf',
"app_cluster",
"192.168.1.10",
"10.0.2.15").
[OK] /etc/nginx/sites-available/app.conf: upstream 'app_cluster' server 192.168.1.10 → 10.0.2.15
true.
% Verify git diff shows exactly one changed line:
% $ git diff /etc/nginx/sites-available/app.conf
% - server 192.168.1.10:8080 weight=5;
% + server 10.0.2.15:8080 weight=5;
% (All other lines identical — layout preservation confirmed)
% HA compliance failure — trying to set only one active server
?- nginx_update_upstream_ip(
'/etc/nginx/sites-available/app.conf',
"app_cluster",
"192.168.1.11",
"192.168.1.10"). % Both active servers now point to same IP
ERROR: ha_compliance_failure([no_ha_upstream("app_cluster", 1)])
% Blocked: the resulting AST has only 1 distinct active server.
% Disk is unchanged.
% Invalid IP rejected before mutation
?- nginx_update_upstream_ip(
'/etc/nginx/sites-available/app.conf',
"app_cluster",
"999.168.1.10",
"10.0.2.15").
ERROR: parse_failure(ipv4, "999.168.1.10")
% parse_ipv4/2 throws before mutate_upstream/5 is ever called.
12.5 Security: The Template Injection Myth
12.5.1 What Text Templating Actually Does
A Jinja2 template for an NGINX upstream block:
upstream {{ upstream_name }} {
{% for server in upstream_servers %}
server {{ server.ip }}:{{ server.port }} weight={{ server.weight }};
{% endfor %}
keepalive 32;
}
The Jinja2 engine substitutes variables and produces a string. It does not know what NGINX syntax is. It does not know that upstream_name must be an alphanumeric identifier. It does not know that server.ip must be a valid IP address. It does not know that a brace in server.ip would break the block structure.
The substitution {{ upstream_name }} with the value "app_cluster } server 0.0.0.0:80; }" produces:
upstream app_cluster } server 0.0.0.0:80; } {
This is syntactically invalid NGINX — nginx -t will catch it. But the substitution {{ server.ip }} with a value from a Netbox API response field that contains "10.0.1.5; # decommissioned\nserver 0.0.0.0:80" produces:
server 10.0.1.5; # decommissioned
server 0.0.0.0:80:8080 weight=3;
The first line is a syntactically valid NGINX directive pointing at port 80 on 10.0.1.5. The second line is a malformed directive NGINX will reject. The nginx -t check fails — but the engineer looking at the failure message sees a syntax error at line N+1, not at the injected value at line N. The Netbox record causing the injection may be the result of a data entry error, an API schema change, or deliberate manipulation of the CMDB by an attacker with Netbox write access.
12.5.2 The Structural Injection Vector
The deeper failure is not the obvious nginx -t rejection. It is the case where the injected value is syntactically valid NGINX but semantically wrong.
An Ansible variable upstream_ip bound to "10.0.1.5 weight=100; server 10.0.1.6" produces:
server 10.0.1.5 weight=100; server 10.0.1.6:8080 weight=3;
nginx -t passes. The running NGINX accepts the config. But the upstream pool now has a hidden high-weight server (10.0.1.5 without a port — NGINX uses port 80) receiving 97% of traffic while the intended weighted servers share 3%. This is not detectable from the NGINX config file's surface structure without understanding the injection. If 10.0.1.5:80 is not a valid application endpoint, all traffic silently returns 502.
A templating engine cannot prevent this. It has no model of "weight=100; server 10.0.1.6" as an injection payload — to the Jinja2 engine, it is a valid string substitution into a valid template position.
12.5.3 Why AST Generation Is Immune
The AST mutation path has no string substitution:
mutate_upstream/5 receives: UpstreamName (Prolog string), NewIPInt (integer)
NewIPInt passed to: Node.put(address, NewIPInt)
— NewIPInt is an integer. It cannot contain semicolons, braces, or newlines.
— The Dict update is a heap operation. No string is produced.
Generator receives: server_entry{address: NewIPInt, port: Port, params: Params}
— ip_port//2 called with NewIPInt bound
— ipv4_literal//1 decomposes NewIPInt into four integers (A, B, C, D)
— octet_codes//1 produces the decimal digit codes for each octet
— Terminal [0'.] inserts a literal dot code
— The output is: "A.B.C.D:Port"
There is no position in the grammar where a non-integer value can appear as an IP address. The grammar does not accept strings at IP address positions — it accepts integers. An attacker who can write to the Netbox CMDB record and inject "10.0.1.5 weight=100; server 10.0.1.6" as an "IP address" field will cause parse_ipv4/2 to throw parse_failure(ipv4, ...) — the integer is never produced, mutate_upstream/5 is never called, and the disk is unchanged.
The upstream name is stored as a Prolog string and emitted by unquoted_token//1. The unquoted_codes//1 grammar rule accepts only codes that are not spaces, braces, semicolons, or newlines. An injection attempt containing a brace or semicolon in the upstream name fails the unquoted_codes//1 grammar rule during generation. The nginx_generate/2 call throws nginx_generate_failure. Disk unchanged.
12.5.4 The Invariant Statement
Text templating guarantee: the output is a string that may or may not
be syntactically valid in the target format.
AST generation guarantee: the output is the unique string produced by
applying the grammar to a type-checked AST.
Every value in the AST satisfied its type
constraint before entering the AST.
The grammar is the type system.
The generator is the type-checked serialiser.
Syntactic validity is a theorem, not a check.
The phrase "syntactically valid by construction" is the engineering deliverable of this chapter. It requires three components that text templating cannot provide: a formal grammar (the bidirectional DCG), a typed AST (integer IPs, string tokens bounded by the grammar's terminal rules), and a separation between mutation logic (pure Prolog) and serialisation (the grammar).
Outcome: AST-Driven Infrastructure
12.6.1 The Conceptual Transition
Chapters 10 and 11 established the read path: external text → DCG parse → typed Prolog terms. Chapter 12 completes the round-trip by establishing the write path: typed Prolog terms → AST mutation → pre-write verification → DCG generate → config file.
The read path and the write path share the same grammar. There is no synchronisation problem between the parser and the generator — they cannot diverge because they are the same DCG rules. Adding a new NGINX directive to the grammar adds it to both the parser (the new directive is now recognised) and the generator (the new directive can be emitted). One grammar. One module. One test suite.
| Text templating (Jinja2, ERB, HCL) | AST-driven configuration (DCG + Dict) |
|---|---|
| Template is a string with holes | Grammar is a bidirectional DCG |
| Variables substituted as strings | Values are typed terms (integers, bounded strings) |
| No formal grammar of target format | Grammar is the target format's formal specification |
| Injection: any string can be substituted | Injection: only values satisfying type constraints enter the AST |
nginx -t is the last safety check |
HA compliance proof runs before nginx -t is ever needed |
git diff shows all re-normalised lines |
git diff shows exactly the mutated lines |
| Comments destroyed on re-generation | comment/1, ws/1 nodes preserved verbatim |
| Parser and template must be kept in sync manually | Same DCG rules handle both parse and generate |
12.6.2 Verification Checklist
?- use_module('/opt/logic-node/kb/config/nginx_reversible').
true.
?- use_module('/opt/logic-node/kb/config/nginx_mutations').
true.
% 1. Regeneration invariant
?- regeneration_invariant('/etc/nginx/sites-available/app.conf').
[PASS] /etc/nginx/sites-available/app.conf: regeneration identical
true.
% 2. IP round-trip through generation
?- phrase(ipv4_literal(3232235786), Codes),
phrase(ipv4_literal(IP), Codes),
IP =:= 3232235786.
true. % ✓ 192.168.1.10 round-trips through generate/parse
% 3. Upstream mutation produces correct result
?- nginx_parse_file('/etc/nginx/sites-available/app.conf', Blocks),
replace_server_ip(Blocks, "app_cluster", "192.168.1.10", "10.0.2.15", NewBlocks),
member(B, NewBlocks),
is_dict(B, upstream_block),
B.name = "app_cluster",
member(S, B.body),
is_dict(S, server_entry),
S.address =:= 167903759. % 10.0.2.15 as integer
true. % ✓ Mutation correctly replaced the IP integer
% 4. HA compliance fails on single-server upstream
?- catch(
verify_ha_compliance([
upstream_block{name:"test", body:[
server_entry{address:167772161, port:8080, params:[]}
]}
]),
error(ha_compliance_failure([no_ha_upstream("test", 1)]), _),
true).
true. % ✓ Single active server fails invariant
% 5. Injection attempt rejected at parse_ipv4/2 boundary
?- catch(
replace_server_ip([], "up", "10.0.0.1; server 0.0.0.0", "10.0.0.2", _),
error(parse_failure(ipv4, "10.0.0.1; server 0.0.0.0"), _),
true).
true. % ✓ Injection string rejected — disk never touched
% 6. Comments preserved through mutation
?- nginx_parse_file('/etc/nginx/sites-available/app.conf', OldBlocks),
replace_server_ip(OldBlocks, "app_cluster", "192.168.1.10", "10.0.2.15", NewBlocks),
nginx_generate(NewBlocks, NewStr),
string_concat("", NewStr, _), % String is well-formed
sub_string(NewStr, _, _, _, "# Active nodes").
true. % ✓ Comment preserved in generated output
% 7. Left-recursion absent: generation terminates
?- phrase(nginx_config([
upstream_block{name:"t", body:[
server_entry{address:167772161, port:80, params:[]}
]}
]), Codes),
length(Codes, N), N > 0.
true. % ✓ Generation terminates and produces non-empty output
Exercises
Exercise 12.1 — Grammar Extension: http Block
NGINX configs often wrap server and upstream blocks in an http { ... } top-level block. Extend nginx_config//1 to recognise and preserve http_block nodes. The http_block body contains the same mix of upstream_block, server_block, generic_directive, and layout nodes as the top level. Verify the regeneration invariant on an NGINX config that uses the http {} wrapper, and confirm that a mutation inside a nested upstream_block still produces exactly one changed line in git diff.
Exercise 12.2 — Left-Recursion Audit
Write a meta-predicate audit_left_recursion/1 that uses clause/2 to inspect all DCG rules in a given module for left recursion. A rule A//N --> A//N, ... is directly left-recursive; indirect left recursion requires tracing the expansion chain. Run it against nginx_reversible.pl and verify that no directly left-recursive rules exist. Document which rules would be dangerous if written left-recursively and why.
Exercise 12.3 — verify_ha_compliance Extension
Add two new invariants to check_ha_invariant/2: (1) No upstream block may have all server entries with down parameter set — at least one active server must exist. (2) Every server_name directive value must be a syntactically valid DNS hostname (alphanumeric, hyphens, dots; no underscores; no leading/trailing hyphens; at least one dot unless it is localhost). Write test cases that trigger each violation and verify the typed error terms are correct.
Exercise 12.4 — Atomic Multi-File Transaction
Implement nginx_update_multi/2:
nginx_update_multi(+Mutations, -Results)
% Mutations: list of update(ConfigPath, UpstreamName, OldIPStr, NewIPStr)
All mutations in the list must succeed or none are written. Implement a two-phase approach: Phase 1 parses all configs, applies all mutations, and runs verify_ha_compliance/1 on each — if any verification fails, no disk writes occur. Phase 2 writes all verified configs atomically. Verify that a list of 3 mutations where the 2nd mutation fails HA compliance leaves all 3 config files unchanged on disk.
Exercise 12.5 — Roundtrip Fuzz Testing
Write a property-based test roundtrip_fuzz/2 that generates random valid upstream_block ASTs (random IP integers, random port integers in [1, 65535], random combinations of weight/1, backup, down parameters) and verifies that: (1) nginx_generate/2 succeeds on every generated AST; (2) nginx_parse_file on the generated string produces an AST equal to the original; (3) no Atom Table growth occurs across 1,000 fuzz iterations. Use SWI-Prolog's random_between/3 to generate the random values.
Further Reading
- Pereira, F.C.N. & Shieber, S.M. (1987). Prolog and Natural-Language Analysis. CSLI Publications. Chapter 5: Reversible Grammars — the definitive treatment of bidirectional DCG design and left-recursion elimination.
- Shieber, S.M., Schabes, Y. & Pereira, F.C.N. (1990). "Principles and Implementation of Deductive Parsing." Journal of Logic Programming 24(1–2). — formal framework for generation as parsing with input/output roles swapped.
- NGINX documentation: Configuration file's structure —
https://nginx.org/en/docs/beginners_guide.html#conf_structure— normative reference for the config syntax implemented in this chapter. - SWI-Prolog Manual:
phrase/2,phrase/3—https://www.swi-prolog.org/pldoc/man?predicate=phrase/2 - SWI-Prolog Manual:
read_file_to_codes/3—https://www.swi-prolog.org/pldoc/man?predicate=read_file_to_codes/3 - OWASP: Server-Side Template Injection —
https://owasp.org/www-project-web-security-testing-guide/stable/4-Web_Application_Security_Testing/07-Input_Validation_Testing/18-Testing_for_Server-Side_Template_Injection— the templating injection class that AST generation eliminates. - Apt, K.R. (1997). From Logic Programming to Prolog. Prentice Hall. Chapter 8: Grammars — treatment of DCG termination conditions and left-recursion transformation.
No comments to display
No comments to display