project_options_d_and_e

Trigger event: the Bratanič and Negro vocabulary-probe findings showed that full Option C (RDF/OWL + SHACL + JSON-LD + Rego + Akoma Ntoso + Oxigraph) has no endorsement from the 2025 practitioner community. Both books converge on Neo4j + Cypher + schema-in-prompt; SHACL, JSON-LD, triplestore products and competency questions each scored zero mentions. Negro engages with RDF/OWL as an ontology-publication format but sets handleVocabUris: IGNORE to actively discard Web-architectural URIs at runtime.
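As a rough illustration of what discarding vocabulary URIs at runtime means, the sketch below mimics the effect of handleVocabUris: IGNORE in plain Python: class and predicate IRIs are cut down to their local names before they become labels and property keys. The `local_name` and `to_node` helpers and the example.org IRIs are hypothetical; the neosemantics call is included only as an unexecuted Cypher string for reference.

```python
from typing import Dict, List, Tuple

# Cypher that selects this behaviour in neosemantics (n10s); reference only, not run here.
N10S_INIT = 'CALL n10s.graphconfig.init({handleVocabUris: "IGNORE"})'

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def local_name(iri: str) -> str:
    """Return the part of an IRI after its last '#' or '/' (hypothetical helper)."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[cut + 1:]


def to_node(subject: str, triples: List[Triple]) -> Dict:
    """Build a property-graph style node, dropping vocabulary namespaces."""
    node = {"uri": subject, "labels": [], "props": {}}
    for s, p, o in triples:
        if s != subject:
            continue
        if p == RDF_TYPE:
            node["labels"].append(local_name(o))   # class IRI becomes a bare label
        else:
            node["props"][local_name(p)] = o       # predicate IRI becomes a bare key
    return node


# Hypothetical triples; the ontology namespace is made up for illustration.
TRIPLES = [
    ("http://example.org/will/42", RDF_TYPE, "http://example.org/onto#Will"),
    ("http://example.org/will/42", "http://example.org/onto#signedOn", "2024-01-05"),
]
NODE = to_node("http://example.org/will/42", TRIPLES)
```

The node keeps its own identifier, but nothing in its labels or property keys records which vocabulary they came from, which is the trade-off the probe finding points at.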

Rich’s 18 April 2026 directive (verbatim): “when the books have been read, we need to have a good think about coming up with option D, and potentially E, which utilise the technologies you can see are, as of April 2026, regarded as the most current and usable. I still like my ‘Lines of Code’ metric but am open to other ways of looking at it. I think some potential investors/buyers will expect Testate Technologies to be using our own LLMS in some capacity”

Rich’s 18 April 2026 reinforcement (after Tamò-Larrieux read): “I am delighted to see option D. I would like you to be brave and work on D and E (AI-native) more than C”

Directional consequence: the synthesis should now centre D and E as Rich’s preferred direction, with C demoted to “legacy comparison baseline” alongside B. Be brave in the Option E design: a TT fine-tuned LLM as canonical interpreter is explicitly on the table, not a hedged alternative.

Tamò-Larrieux adds (19 April 2026):

Catala DSL already has worked inheritance tax examples — direct domain adjacency for Option E’s formal-rules layer
OpenFisca is production rule-as-code infrastructure across 30+ countries — viable simulation-grade engine
ACE Attempto controlled language has empirical readability evidence (Flesch-Kincaid +14, comprehension improvement) — a drafting-layer option
ELI/ECLI IRI identifiers are the European-Commission-backed pattern for legal-document identity — stable URIs for 80-year legislative replay
Zero mentions of Akoma Ntoso across a 209-page specialist legal-informatics book, which is even more damning than the Bratanič/Negro zeros: Akoma Ntoso is not current in the 2025 legal-informatics practitioner mainstream
Zero SHACL / SPARQL / JSON-LD / triplestores in Tamò-Larrieux — convergent disconfirmation across both KG+LLM and legal-informatics literatures
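To make the "stable identifiers for legislative replay" point concrete, here is a minimal sketch of point-in-time resolution: given several dated versions of one legal source, pick the version in force on a chosen date. The `SourceVersion` type, the `version_in_force` helper and the example.org IRIs are all hypothetical and do not follow any real ELI URI template.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SourceVersion:
    iri: str              # hypothetical stable identifier for this version
    in_force_from: date   # first day on which this version applies


def version_in_force(versions: List[SourceVersion], on: date) -> Optional[SourceVersion]:
    """Return the latest version whose start date is on or before `on`, else None."""
    ordered = sorted(versions, key=lambda v: v.in_force_from)
    starts = [v.in_force_from for v in ordered]
    idx = bisect_right(starts, on)
    return ordered[idx - 1] if idx else None


# Made-up versions of a made-up act.
VERSIONS = [
    SourceVersion("https://example.org/eli/act/1984/51/v1", date(1984, 7, 31)),
    SourceVersion("https://example.org/eli/act/1984/51/v2", date(2007, 10, 9)),
]
```

Replaying an estate dated 1990 would resolve to the first version and one dated 2020 to the second; a date before any version returns `None` rather than guessing.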

Constraints carried into D/E synthesis:

LOC metric preserved as primary elegance axis. Rich said “I still like my Lines of Code metric but am open to other ways of looking at it” — open to supplementary metrics (e.g. maintenance-weight ratio, time-to-new-jurisdiction, onboarding time, acquirer-valuation proxies) but LOC remains load-bearing.
Own-LLM requirement is new. TT must have a credible “we operate our own LLM layer” story for acquirer appeal. This changes the Option set — the standard and implementation now need to accommodate (a) TT-hosted fine-tuned LLM, (b) MCP server exposing standard operations, (c) embedded model-as-reference-interpreter possibility, (d) training-data strategy that converts INHERIT corpus into an LLM moat.
Currency filter (April 2026). Only technologies that the 2025-2026 practitioner literature endorses as mature + usable count as “current”. Disqualifies (by April 2026 evidence):
- SHACL as runtime validator (zero mentions across Bratanič + Negro)
- Triplestore products (Oxigraph, Jena, Stardog, GraphDB, Blazegraph, Virtuoso) at runtime
- Competency-question-driven design as primary methodology (zero mentions)
- Full pure-RDF stack as production substrate
Qualifies as current (April 2026 evidence):
- Neo4j + Cypher + neosemantics (Negro, Bratanič: both centre here)
- Pydantic + structured outputs for LLM extraction
- RDF/OWL as ontology-publication format (Negro engages, HPO/SNOMED/UMLS)
- Schema-in-prompt pattern
- Graph-RAG (vector + graph hybrid retrieval)
- MCP (Model Context Protocol) for agent interop
- JSON-LD for Web interop only (neutral: zero mentions in the probes so far, but no evidence against it as an optional Web-facing format)
- Rego/OPA for policy-as-code (from Jimmy Ray, O’Reilly 2024 — already in library)
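The schema-in-prompt and structured-output items above can be sketched with the standard library alone. This is a stand-in for what Pydantic would do, not Pydantic itself: a dataclass describes the extraction target, its fields are rendered into the prompt, and the model's JSON reply is validated against the same definition. The `Bequest` type and its field names are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Bequest:
    # Hypothetical extraction target; field names are illustrative only.
    beneficiary: str
    asset: str
    share_percent: float


def schema_prompt(cls) -> str:
    """Render the dataclass fields as a schema description to embed in a prompt."""
    lines = [f"- {f.name}: {f.type.__name__}" for f in fields(cls)]
    return "Return one JSON object with exactly these keys:\n" + "\n".join(lines)


def parse_reply(cls, reply: str):
    """Validate a model reply against the dataclass; raise ValueError on mismatch."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    expected = {f.name: f.type for f in fields(cls)}
    if set(data) != set(expected):
        raise ValueError(f"keys {sorted(data)} do not match {sorted(expected)}")
    for name, typ in expected.items():
        value = data[name]
        # Accept JSON integers for float fields, but never booleans.
        if typ is float and isinstance(value, int) and not isinstance(value, bool):
            value = data[name] = float(value)
        if not isinstance(value, typ):
            raise ValueError(f"{name} should be {typ.__name__}")
    return cls(**data)
```

Because the prompt text and the validator are generated from one definition, a schema change cannot silently leave the two out of step, which is the main appeal of the pattern.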

Likely shape of Option D (hypothesis, to be refined after books 3 + 4):

“LPG-first hybrid — property graph substrate, JSON Schema / Pydantic wire shape, RDF/OWL as published ontology layer for semantic interop, Rego for policy, TT fine-tuned LLM as model-layer reference implementation, MCP server for agent access.”

LPG (Neo4j / Memgraph / FalkorDB) as runtime substrate — endorsed
Schema: JSON Schema 2020-12 or Pydantic v2 (whichever Rich prefers; both work ergonomically with the schema-in-prompt pattern)
RDF/OWL core ontology published once (like Negro’s HPO/SNOMED pattern) for semantic interop — does not require RDF at runtime
Rego for jurisdiction-specific policy
TT-hosted fine-tuned LLM as canonical interpreter for cross-jurisdiction semantics
MCP server as the primary consumer-facing interface
JSON-LD optional for Web-agent interop
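For the MCP-server item in the shape above, the following sketch shows the rough message flow of a single tool call, assuming the JSON-RPC 2.0 `tools/call` request that MCP uses. It is a hand-rolled dispatcher for illustration only: a real server would use an MCP SDK and also handle initialisation, tool listing and transport. The `list_jurisdictions` tool and its return values are hypothetical.

```python
import json

# Hypothetical tool registry: tool name -> callable taking keyword arguments.
# The tool and its return values are placeholders, not part of any real TT API.
TOOLS = {
    "list_jurisdictions": lambda: ["EXAMPLE-A", "EXAMPLE-B"],
}


def handle(request_json: str) -> str:
    """Answer one JSON-RPC 2.0 request of the kind MCP uses for tool calls."""
    req = json.loads(request_json)
    base = {"jsonrpc": "2.0", "id": req.get("id")}

    if req.get("method") != "tools/call":
        base["error"] = {"code": -32601, "message": "Method not found"}
        return json.dumps(base)

    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        base["error"] = {"code": -32602, "message": "Unknown tool"}
        return json.dumps(base)

    result = tool(**params.get("arguments", {}))
    # Tool results go back as a list of content items; here a single text item.
    base["result"] = {
        "content": [{"type": "text", "text": json.dumps(result)}],
        "isError": False,
    }
    return json.dumps(base)


REQUEST = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "list_jurisdictions", "arguments": {}},
})
RESPONSE = json.loads(handle(REQUEST))
```

Keeping the standard operations behind named tools like this means the substrate choice (LPG, published ontology, Rego) stays invisible to the consuming agent.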

Likely shape of Option E (AI-native — Rich has explicitly asked this be developed, not hedged):

“AI-native standard — TT’s fine-tuned LLM IS the reference implementation; schema defines IO contracts; Catala DSL provides formal-rules layer; OpenFisca-style simulator provides what-if analysis; consumer interaction via MCP; conformance via model-based + DSL-executed test vectors.”

LLM-as-canonical-interpreter: TT fine-tuned LLM holds the cross-jurisdiction semantic knowledge; schema defines structured IO; model answers “what does this document mean under jurisdiction X?”
Catala DSL for formally-provable rules (inheritance tax, intestacy hierarchy, statutory legacy calculation) — Catala has worked inheritance-tax examples, direct domain match
OpenFisca-style simulation engine for what-if analysis across 21 jurisdictions — production pattern in 30+ countries
Controlled natural language (Attempto ACE or similar) for legal drafting layer — readable by non-lawyers, machine-parseable
ELI/ECLI IRI identifiers for legal-source grounding — stable across 80-year legislative replay
MCP server as the primary consumer-facing interface — agent-native
Training-data corpus becomes a commercial asset — fits AI-vendor commercial memo already in project context
Revenue model: MCP access fees + commercial embedded licence for InheritKit + training-data bundle licence to OpenAI/Anthropic/Google/Meta/xAI
Conformance: Catala-verified arithmetic + OpenFisca simulation test vectors + LLM-output comparison against golden reference
Moat: the fine-tuned LLM + training corpus + Catala rules library are three distinct commercial assets, each sellable separately or as a bundle
Highest acquirer-appeal (AI-native positioning) but highest execution risk — explicitly endorsed by Rich as the direction to be brave on
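The conformance idea above (rule-executed test vectors plus comparison of model output against a golden reference) can be sketched as a small harness. Everything here is an assumption made for illustration: `reference_tax` stands in for a DSL-executed rule, `stub_model` stands in for a wrapper around the fine-tuned LLM, and the threshold and rate are placeholders, not real tax parameters for any jurisdiction.

```python
from decimal import Decimal

# Placeholder parameters, NOT real tax figures for any jurisdiction.
THRESHOLD = Decimal("100000")
RATE = Decimal("0.10")


def reference_tax(estate: Decimal) -> Decimal:
    """Stand-in for a formally specified rule: flat rate on the excess over a threshold."""
    taxable = max(estate - THRESHOLD, Decimal("0"))
    return (taxable * RATE).quantize(Decimal("0.01"))


TEST_VECTORS = [Decimal("50000"), Decimal("100000"), Decimal("250000.50")]


def conformance_report(candidate, vectors=TEST_VECTORS):
    """Run a candidate over the vectors; return (input, expected, got) for each mismatch."""
    failures = []
    for estate in vectors:
        expected = reference_tax(estate)
        got = candidate(estate)
        if got != expected:
            failures.append((estate, expected, got))
    return failures


def stub_model(estate: Decimal) -> Decimal:
    """Hypothetical model wrapper that is deliberately wrong for large estates."""
    if estate < Decimal("200000"):
        return reference_tax(estate)
    return Decimal("0.00")
```

An empty report means the candidate agrees with the reference on every vector; the stub fails only on the largest estate, which is the kind of divergence a conformance suite is meant to surface.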

How to apply:

After Raieli completes (last of the 4 books), do a vocabulary-probe summary across all 4 books plus the 19 round-1 books — what DOES the 2025-2026 literature endorse for Rich’s stack?
Draft Option D spec with the LPG + RDF-published-ontology + LLM + MCP shape.
Test whether Option E is distinct from D or an extension of it; the two may collapse into one option.
Score B vs C vs D vs E on the existing 14-criterion scorecard, plus additions if Rich accepts them: maintenance-weight ratio, time-to-new-jurisdiction, acquirer-appeal-via-own-LLM, MCP-readiness.
Produce LOC estimates for D and E using the same baseline methodology as the C estimate (~135-190k).
Bring the 5-option comparison back to Rich for a decision gate, before any build.
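The vocabulary-probe summary in the first step can be sketched as a simple term counter. This is a minimal version that assumes each book is available as plain text; the sample snippets are placeholders, not quotations from any of the books, and the helper names are hypothetical.

```python
import re
from typing import Dict, List


def vocabulary_probe(texts: Dict[str, str], terms: List[str]) -> Dict[str, Dict[str, int]]:
    """Count case-insensitive, whole-token mentions of each term in each text."""
    report = {}
    for term in terms:
        # Lookarounds stop 'OWL' matching inside a longer word such as 'bowling'.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        report[term] = {name: len(pattern.findall(body)) for name, body in texts.items()}
    return report


def zero_terms(report: Dict[str, Dict[str, int]]) -> List[str]:
    """Terms with no mentions in any text: the candidates for the 'disqualified' list."""
    return [term for term, counts in report.items() if sum(counts.values()) == 0]


# Placeholder snippets, NOT quotations from any of the books.
SAMPLE_TEXTS = {
    "book_a": "We query the graph with Cypher. The Cypher statement is generated from the schema.",
    "book_b": "The ontology is published as OWL and imported into the property graph.",
}
SAMPLE_TERMS = ["Cypher", "OWL", "SHACL"]
REPORT = vocabulary_probe(SAMPLE_TEXTS, SAMPLE_TERMS)
```

Raw counts are only a first pass: a term can appear in a dismissive context, and short acronyms can collide with ordinary words, so zero and non-zero results both deserve a manual read before they feed the currency filter.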
