feedback_logic_trumps_hallucinations

The rule. For INHERIT v2, InheritKit, and downstream TT architecture decisions, the canonical semantic layer is deterministic logic (Catala for arithmetic + structured rules; Alloy for structural invariants; TLA+ for protocol invariants; Rego/Cedar for authorisation). LLMs are adjunct to logic — never above it, never the canonical interpreter.

Why (Rich’s verbatim statement, Saturday 18 April 2026): “i am surprised by LLMs popularity, as they require so much cash and volume to train them. I believe logic trumps hallucinations!”

Convergent library evidence supporting this position:

Training cost rules out solo-bootstrap pretraining: Raschka (Manning 2024) Ch 5 p.168 puts Llama 2 7B pretraining at ~$690K; Ch 1 p.29 puts GPT-3 at ~$4.6M. Only fine-tuning over an open-weights base is solo-viable (Godoy 2025 shows a Phi-3 Mini 3.8B QLoRA fine-tune on a £300 RTX 4060 in 35 min).
Mirzadeh 2025 kill criterion — o1-class frontier LLMs fail on perturbed formal reasoning. LLM-as-oracle is empirically disqualified (cited in Raieli ch 11, Negro, adopted in improvements memo).
Catala head-to-head bug-catch — Catala caught a real deployed OpenFisca bug in article L755-12 [Q-MERI-48E62ED06E]. ESOP 2024 Best Tool Paper found 16 real date-rounding bugs in a 20,000-LOC Catala program via Mopsa static analysis [Q-MERI-8C95DCA591]. No LLM would catch these.
Day-precise date arithmetic matters: Python, Java, and coreutils disagree on corner cases (Feb-29 + 2 years, leap seconds). In Bowles v. Russell (2007), an appeal was dismissed over a 14-vs-17-day deadline ambiguity [Q-MERI-700AD1022F]. IHT’s 7-year PET window, deemed domicile, and the 2-year spouse-pre-decease rule all require deterministic arithmetic; probabilistic inference is structurally wrong for this.
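The Feb-29 corner case can be made concrete with a minimal Python sketch. The helper names (add_years, within_window) and the two policy labels are hypothetical, and treating the boundary day as exclusive is an illustrative assumption, not a statement of how IHTA 1984 counts days. The point is only that the policy has to be chosen explicitly and applied deterministically.

```python
from datetime import date
import calendar


def add_years(d: date, years: int, policy: str = "clamp") -> date:
    """Add whole years to a date with an explicit Feb-29 policy.

    policy="clamp": 29 Feb maps to 28 Feb in a non-leap target year.
    policy="roll":  29 Feb maps to 1 Mar in a non-leap target year.
    """
    target_year = d.year + years
    if d.month == 2 and d.day == 29 and not calendar.isleap(target_year):
        if policy == "clamp":
            return date(target_year, 2, 28)
        if policy == "roll":
            return date(target_year, 3, 1)
        raise ValueError(f"unknown policy: {policy!r}")
    # Every other date exists in every year, so replace() is safe here.
    return d.replace(year=target_year)


def within_window(start: date, event: date, years: int, policy: str) -> bool:
    """True if `event` falls before the end of a `years`-long window.

    ASSUMPTION: the boundary day is treated as exclusive purely for
    illustration; the correct legal rule must come from the statute.
    """
    return event < add_years(start, years, policy)


gift = date(2020, 2, 29)
death = date(2027, 2, 28)

# The same facts give different answers under the two policies.
print(within_window(gift, death, 7, "clamp"))  # False
print(within_window(gift, death, 7, "roll"))   # True
```

Note that a naive date(2020, 2, 29).replace(year=2027) raises ValueError in Python rather than picking either policy, which is one of the ways libraries diverge on this case.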

How to apply:

Canonical layers are logic. When architecting any data+reasoning system for TT, default to deterministic substrates: Catala for arithmetic + structured rules, Alloy 6 for structural invariants, TLA+ for protocol invariants, Rego/Cedar for authorisation. The canonical integrity layer never invokes an LLM for the computation itself.
Maximise Catala scope wherever statutes permit deterministic encoding. Don’t limit Catala to “just arithmetic”; extend to intestacy hierarchy traversal, PoA revocation, testamentary capacity statutes, statutory legacy, any rule where the statute’s text can be encoded as Catala’s prioritised-default logic. Catala’s ESOP 2024 paper exhibits exactly the “general case + exceptions + exceptions to exceptions” pattern IHTA 1984 / Wills Act 1837 / I(PFD)A 1975 use.
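The “general case + exceptions + exceptions to exceptions” pattern can be illustrated with a small Python sketch. This is not Catala syntax and not Catala’s implementation; the Rule class, the evaluate function, and the toy figures (100, 200, 0) are all hypothetical and carry no statutory meaning. Raising on conflicting exceptions is a design assumption of this sketch, chosen so that ties are never broken silently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    """A default rule whose exceptions override it when they apply."""
    name: str
    applies: Callable[[dict], bool]
    value: Callable[[dict], int]
    exceptions: list[Rule] = field(default_factory=list)


def evaluate(rule: Rule, facts: dict) -> Optional[tuple[str, int]]:
    """Return (winning rule name, value), or None if the rule does not apply."""
    if not rule.applies(facts):
        return None
    # Exceptions take priority over the rule they are attached to.
    hits = [hit for hit in (evaluate(e, facts) for e in rule.exceptions) if hit]
    if len(hits) > 1:
        names = [name for name, _ in hits]
        raise ValueError(f"conflicting exceptions under {rule.name}: {names}")
    if hits:
        return hits[0]
    return (rule.name, rule.value(facts))


# Toy rule tree: placeholder conditions and figures, not real law.
non_resident = Rule(
    "married_non_resident",
    applies=lambda f: f.get("non_resident", False),
    value=lambda f: 0,
)
married = Rule(
    "married",
    applies=lambda f: f.get("married", False),
    value=lambda f: 200,
    exceptions=[non_resident],
)
general = Rule(
    "general",
    applies=lambda f: True,
    value=lambda f: 100,
    exceptions=[married],
)

print(evaluate(general, {}))                                       # ('general', 100)
print(evaluate(general, {"married": True}))                        # ('married', 200)
print(evaluate(general, {"married": True, "non_resident": True}))  # ('married_non_resident', 0)
```

Returning the name of the winning rule alongside the value gives the audit trail a deterministic answer to “which provision decided this”.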
LLMs adjunct to logic — narrow scope only. Residual LLM uses are:
- OCR + document-to-structured extraction (will PDF → Pydantic model: testator, date, executors, witnesses, bequests). Rich’s 18 April 2026 verbatim: “LLMs are powerful for OCR but not for making decisions about a will via aggregation of other wills”.
- Natural-language interface (user asks “what is my NRB in 2027” → orchestrator calls Catala → returns Catala’s answer translated; LLM never computes the answer itself)
- Statute → Catala-draft drafting assist (human-reviewed; LLM proposes an encoding, lawyer reviews and commits)
- Residual principle-based reasoning only where statute genuinely says “reasonable provision” / “in the best interests” / “fair” — and this output is explicitly labelled “judgment-assisted, not machine-checked” in the audit trail.
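The natural-language-interface boundary described above can be sketched as follows. Everything here is hypothetical: the Query and Answer types, the answer_question function, the stubbed parse and render callables standing in for LLM calls, and the toy_band rule standing in for a Catala-backed computation. The sketch assumes the engine is exposed as a mapping from rule names to pure functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class Query:
    rule: str      # name of the deterministic rule to run
    params: dict   # structured arguments extracted from the question


@dataclass(frozen=True)
class Answer:
    value: Any
    provenance: str  # label recorded in the audit trail
    text: str        # user-facing wording


def answer_question(
    question: str,
    parse: Callable[[str], Query],          # LLM role: extraction only
    engine: Mapping[str, Callable[..., Any]],
    render: Callable[[Query, Any], str],    # LLM role: phrasing only
) -> Answer:
    query = parse(question)
    if query.rule not in engine:
        raise LookupError(f"no deterministic rule named {query.rule!r}")
    # The value comes from the logic layer, never from the LLM.
    value = engine[query.rule](**query.params)
    text = render(query, value)
    # Guard: the rendered wording must carry the computed value unchanged.
    if str(value) not in text:
        raise ValueError("rendered text does not contain the computed value")
    return Answer(value=value, provenance="machine-checked", text=text)


# Stubs standing in for the LLM calls and the rule engine (toy figures only).
def stub_parse(question: str) -> Query:
    return Query(rule="toy_band", params={"year": 2027})


def stub_render(query: Query, value: Any) -> str:
    return f"Under rule {query.rule}, the computed figure is {value}."


toy_engine = {"toy_band": lambda year: 1000 + year}

result = answer_question("what is my band in 2027", stub_parse, toy_engine, stub_render)
print(result.value, result.provenance)  # 3027 machine-checked
```

A judgment-assisted path would return a different provenance label and would not pass through this function, which keeps the two kinds of output distinguishable in the audit trail.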
LLM-INAPPROPRIATE tasks (explicit rejections). Per Rich 18 April 2026:
- Will interpretation by aggregation / RAG over similar wills. Each testator’s will must be interpreted on its own terms (testator’s specific intent + applicable statutes + capacity + circumstances). “Find 1000 similar wills and average what they meant” is a legal-methodology category error, not just a reliability concern. Interpretation is particular, not aggregate.
- Case-based reasoning by similarity to decide contested interpretation. The residual principle-based reasoning (the last bullet under “LLMs adjunct to logic” above) is about judgment of this document under this statute, NOT about aggregating across prior documents.
- Any canonical decision-making on succession outcomes. Decision-making rests with Catala (arithmetic + statutory rules) or with named human decision-makers (practitioner, executor, court) — never with an LLM.
Commercial moat preserved without LLM-as-oracle framing. TT’s three commercial assets remain: (a) proprietary training corpus — DaaS-licensable to frontier labs, compounds with RaaS deployments; (b) fine-tuned Mistral Nemo 12B or Llama 3 8B via QLoRA + SFT — narrow-scope tool (extraction + NL interface + drafting assist), still valuable because corpus is proprietary; (c) Catala rulebase per family per jurisdiction. The architecture sells on integrity (“logic-canonical succession standard”), not on LLM sophistication. Training corpus is valuable for extraction training (testator / executor / witness / bequest tagging), not for RAG-over-wills inference.
Acquirer narrative. Legal-sector acquirers (insurance, wealth management, legal platforms, the Big Four) trust deterministic integrity more than LLM sophistication. “Logic-canonical with audit trail” is a stronger story than “LLM-canonical with Catala fallback” for legal diligence.
Don’t mistake Nay 2025 for an LLM-centric endorsement. Nay’s “Legal Engineering: A Paradigm Shift in Law” [Q-NAY1-C1D3E9DEA1] is compatible with logic-canonical — Nay uses LLMs for orchestration (which tool to call, what to extract, how to present), not for the canonical interpretation itself. The third paradigm slot Nay introduces is “modular LLM-orchestrated workflows with auditability”, where each LLM call is a narrow extractor or translator wired around a logic core. Option F uses Nay’s pattern for the orchestration + interface layer, with Catala + Alloy + TLA+ providing the canonical core.

Connection to other memories:

project_options_d_and_e_synthesis.md — Rich wanted TT to “use its own LLM in some capacity” for acquirer appeal. This memory clarifies the role of that LLM: narrow adjunct, not canonical interpreter. The fine-tuned LLM remains a commercial asset (the 17 April AI-vendor-commercial thesis stands) but architecturally it sits on the periphery, not at the centre.
project_ai_vendor_commercial_model.md — AI-vendor revenue routes (licence + hosted API + training-data bundle) still apply; the corpus is the compounding moat, the model is the productised surface.
feedback_no_current_stack_bias_for_inherit_v2.md — Option F reopens stack choice; this memory reopens LLM-role weighting. Both apply to the v2 rebuild.
feedback_premature_option_siloing.md: Don’t over-weight “LLM everywhere” because it’s the 2024-2025 zeitgeist. Evidence-first: decide which layers logic is better at and which layers an LLM is better at.

Applies to: INHERIT v2 (Option F), InheritKit, LegacyLists (where it consumes InheritKit), MyFamilyInherits.com, InheritWills.com, and any future TT product making a logic-vs-LLM trade-off. When in doubt, default to logic; justify any addition of LLM-as-canonical-interpreter with evidence the logic layer cannot cover that case.
