ζ-Q3 ε.ι S8 Catala HMRC golden vectors — VALIDATED 2026-05-02

Date: 2026-05-02 — eighth ε.ι spike to fully complete this session.

Result summary

KILL-CONDITION-NOT-MET on strict reading:

  • 3/3 = 100% EXACT match on 3 NEW HMRC cases (S4-validated NRB cases EXCLUDED from denominator per spike spec).
  • Catala wall-clock: typecheck 0.017s + 3 × interpret ~0.04s = ~0.06s total (7500× headroom under 5-min CI budget); peak memory 27 MB.
  • No layer-failure encountered (substrate / tooling / provisioning / measurement all clean).

3 NEW HMRC cases validated:

CaseHMRC anchorStatuteExpected → ActualMatch
S8-CASE-1IHTM43041IHTA 1984 s.8A(4)100% TNRB at £600K estate → tnrb=£325K + chargeable=£0 + iht=£0
S8-CASE-2IHTM43020IHTA 1984 s.3£1M transfer with £50K exemptions → chargeable=£950K
S8-CASE-3IHTM46023 (substituted for IHTM43022)IHTA 1984 s.8FA + s.8D£2.2M / £400K residence → rnrb=£75K + chargeable=£1.8M + iht=£720K

Catala scope file: /tmp/spike-s8-catala/iht-extended.catala_en — extends S4’s uk-w-nrb.catala_en with rnrb_amount input on NilRateBandApplication + 2 NEW scopes (ChargeableTransferWithExemptions s.3 + ResidenceNilRateBandWithTapering s.8FA).

Combined S4+S8 evidence: 6/6 HMRC cases match exactly (3 S4 NRB + 3 S8 extended).

ε.ι Layer 5 lock-frame validated

“Layer 5 acceptance criterion = Catala scope ↔ HMRC-output match = 100% on selected golden vectors. Selected vectors are partner-firm-authored at Phase-1 build start (with pilot vector set hand-derived from HMRC IHT Manual sections IHTM43020 + IHTM43040 + IHTM43041 + IHTM46023 per S8 anchoring). Catala 1.1.0 is the production substrate; clerk start for stdlib resolution, --no-stdlib for spike-level CI gates. Failure relaxation: if 100% match cannot be achieved on a particular statute, document architectural-alternative (expert-reviewable-difference + partner-review burden +20-30%).”

Combined with S1+S2.5+S3+S4+S5+S8, ε.ι is now strongly de-risked across Layers 1, 2, 4, and 5. Layer 3 (operational separation) is process-level and not spike-able.

richard-task #220 disposition recommendation: SCOPE-DOWN

Old (DEFERRED): Catala formal-verification test-suite Phase-1 across 25-statute corpus per ξ.+ A-130 New (ADOPTED-NARROW per SCOPE-DOWN): Catala formal-verification test-suite Phase-1 across E&W IHT main rules (~6-12 statutes): NRB (s.7) + TNRB (s.8A-8C) + RNRB tapering (s.8D + s.8FA) + chargeable transfer with exemptions (s.3) + main rate (s.7(1)) + charity rate Sch 1A.

OptionPhase-1 effortRiskVerdict
PROMOTE (full 25-statute)6-9 weeksHIGH (untested edge cases: lifetime cumulation; BPR/APR; QSR; trust mechanics)NOT recommended
SCOPE-DOWN (E&W IHT main rules)2-3 weeksLOW (S4+S8 directly validate substrate)RECOMMENDED
DROP£0Forfeits acquirer-DD narrative legNOT recommended (substrate works)

Rationale: substrate proven (S4+S8 → 6/6 EXACT match). Risk concentrated in untested mechanics (19/25 statutes not yet exercised). Acquirer-DD narrative weight gain ≈ 80% of full scope at 25-30% of effort. Year-2+ optionality preserved (acquirer signal increase OR partner-firm demand OR Phase-1 audit close as reconsideration triggers).

Kill-condition for the scoped task: if Phase-1 partner-firm pilot produces F1 <0.85 against partner-authored golden vectors, pivot back to “expert-reviewable-difference” Layer 5 acceptance criterion.

Reconsideration trigger: Phase E Task 13 lock-decision (~Q4 2026) or Phase-1 audit close (~12-18 months post-build-start).

Currency display caveat

Catala 1.1.0 prints money values with $ prefix regardless of locale. $325,000.00 is semantically equal to expected £325,000.00 — only display-formatting differs. Production deploys would use a locale-aware formatter at the output boundary. This is a cosmetic/display question, not a semantic divergence — the numeric values match exactly. Documented in T-file §7 honesty caveats; does NOT affect the kill-condition (which is on numeric values).

IHTM43022 → IHTM46023 substitution

The prompt referenced IHTM43022 for RNRB tapering; in the live HMRC IHT Manual, RNRB-related guidance is in IHTM46xxx (IHTM46023 covers the £2M taper threshold). IHTM43022 is reserved for chargeable-event-on-trust mechanics. Substitution documented in /tmp/spike-s8-catala/sources.md §2.

No alternatives exercised

5 alternatives pre-staged per feedback_surface_alternatives_before_collapsing_synthesis_to_baseline:

  1. catala interpret → clerk run
  2. typecheck —no-stdlib for type isolation
  3. piecewise function for tapering
  4. custom stdlib stub
  5. Z3 / Lean 4 alternative formal-verification backend

Kill-condition not met → none exercised. Alternatives remain available for Phase-1 implementation if needed.

Cross-cutting disciplines exercised

  • feedback_universal_production_pipeline_sequence: STEP 1 utilise SEED first (S4 NRB scope reused; HMRC URLs anchored before Catala authoring) ✅
  • feedback_logging_contract_closure_within_same_session: T-file + arch-state §11 + arch-state changelog + Q-003 §10 + memory + MEMORY.md + active-work-log + plan §1.10 within same session ✅
  • feedback_kill_condition_strict_vs_spirit_reading_via_outcome_MITIGATED: 3/3 strict match + spirit met → outcome-VALIDATED (no MITIGATED ambiguity) ✅
  • feedback_test_theories_immediately_when_tabled: ε.ι Layer 5 100%-match acceptance criterion tested at theory-tabling time ✅
  • feedback_confront_richard_tasks_at_creation_time: richard-task #220 disposition recommendation produced based on substantive evidence (6/6 combined cases match) ✅
  • feedback_surface_alternatives_before_collapsing_synthesis_to_baseline: 5 alternatives pre-staged but NOT exercised because kill-condition not met ✅

Plan defects identified (plan v1.7 → v1.8 patch candidates)

  1. Line 812 hardcodes T-file as T-spike-eps-iota-S6-catala-hmrc-golden-... — should be S8 (S6 is AM-CDM precedent at line 873). Used corrected S8 slug.
  2. Plan §4 Task 6 Step 4 example uses stdin-JSON-input pattern that doesn’t match Catala 1.1.0’s interpret subcommand; working pattern is per S4’s test-scope-with-hardcoded-inputs style.
  3. Plan §4 Task 6 Step 1 lists IHTM43022 as RNRB anchor; correct anchor is IHTM46023 (IHTM46xxx is RNRB-specific section family).

Cross-references

  • T-file: ~/off-github/library/projects/inherit/T-spike-eps-iota-S8-catala-hmrc-golden-2026-05-02.md v1.0
  • Plan: ~/testatetech/docs-strategy/docs/superpowers/plans/2026-05-02-zeta-q3-eps-iota-derisking-spikes.md v1.7 §4 Task 6
  • Q-003 §10 (locked CCO/BFO 9 i-ζ classes) v1.8: ~/testatetech/docs-strategy/docs/superpowers/specs/2026-04-29-multi-phase-audit/answered-questions/Q-003-zeta-asset-taxonomy-CCO-BFO-rooted-9-classes-locked.md
  • Arch-state v3.26 §11 + Changelog: ~/testatetech/docs-strategy/docs/superpowers/specs/inherit-v2-architecture-state.md
  • Working artefacts: /tmp/spike-s8-catala/{iht-extended.catala_en, iht-extended-tests.catala_en, golden-vectors.json, match-table.json, sources.md, catala-typecheck.log}
  • Sibling spike memories: S1 / S2 / S2.5 / S3 / S2.6 / S2.10 / S4 / S2.9 / S2.9b / S5

Methodological observations for Phase E Task 13 lock-decision

  1. First spike to MEASURE 100% EXACT-match against an authoritative external source (HMRC IHT Manual). Acquirer-DD narrative now has a concrete claim: “Catala-verified golden vectors anchored line-for-line to HMRC IHT Manual”.
  2. Catala 1.1.0 + --no-stdlib empirically validated as the spike-level CI gate substrate; clerk start reserved for production stdlib resolution per S3 §3 working configuration. The 0.06s combined wall-clock allows running on every commit without budget concern.
  3. First richard-task disposition recommendation (#220 SCOPE-DOWN) produced from substantive evidence — operationalises the feedback_confront_richard_tasks_at_creation_time discipline.
  4. 10th spike in a row with logging-contract closed within same session as T-file authoring; only S1 had the historical 4.5-hour lag. Discipline fully validated across diverse spike topologies.
  5. Sustained alternatives-first discipline: 5 alternatives pre-staged for S8 even when expecting clean validation; pre-staging cost is low and option-value is high.