Trigger event: the Bratanič + Negro vocabulary-probe findings showed that full Option C (RDF/OWL + SHACL + JSON-LD + Rego + Akoma Ntoso + Oxigraph) does NOT have endorsement from the 2025 practitioner community. Both books converge on Neo4j + Cypher + schema-in-prompt; SHACL, JSON-LD, triplestore products and competency questions all scored zero mentions. Negro engages with RDF/OWL as an ontology-publication format but uses handleVocabUris: IGNORE to actively discard Web-architectural URIs at runtime.
Rich’s 18 April 2026 directive (verbatim): “when the books have been read, we need to have a good think about coming up with option D, and potentially E, which utilise the technologies you can see are, as of April 2026, regarded as the most current and usable. I still like my ‘Lines of Code’ metric but am open to other ways of looking at it. I think some potential investors/buyers will expect Testate Technologies to be using our own LLMS in some capacity”
Rich’s 18 April 2026 reinforcement (after Tamò-Larrieux read): “I am delighted to see option D. I would like you to be brave and work on D and E (AI-native) more than C”
Directional consequence: the synthesis should now centre D and E as Rich’s preferred direction, with C demoted to a “legacy comparison baseline” alongside B. Be brave in the Option E design: a TT fine-tuned LLM as canonical interpreter is explicitly on the table, not a hedging alternative.
Tamò-Larrieux adds (19 April 2026):
- Catala DSL already has worked inheritance-tax examples: direct domain adjacency for Option E’s formal-rules layer
- OpenFisca is production rules-as-code infrastructure across 30+ countries: a viable simulation-grade engine
- Attempto Controlled English (ACE) has empirical readability evidence (Flesch-Kincaid +14, improved comprehension): a drafting-layer option
- ELI (European Legislation Identifier) and ECLI (European Case Law Identifier) IRIs are the European-Commission-backed pattern for legal-document identity: stable URIs for 80-year legislative replay
- Zero Akoma Ntoso mentions across a 209-page specialist legal-informatics book: even more damning than the Bratanič/Negro zeros; Akoma Ntoso is not current in the 2025 legal-informatics practitioner mainstream
- Zero SHACL, SPARQL, JSON-LD or triplestore mentions in Tamò-Larrieux: convergent disconfirmation across both the KG+LLM and legal-informatics literatures
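To make the ECLI half of the identity pattern concrete, a minimal parser sketch follows. It assumes the five-part ECLI layout (the literal prefix ECLI, a two-letter country code, a court code of 1 to 7 characters, a four-digit year, and an ordinal of up to 25 alphanumeric characters or dots). The EcliId type and parse_ecli function are hypothetical names, not part of any official library, and ELI (whose URI template varies by member state) is deliberately not covered.

```python
import re
from dataclasses import dataclass

# Assumed ECLI layout: ECLI:<country>:<court>:<year>:<ordinal>
# country: 2 letters; court: 1-7 alphanumerics; year: 4 digits;
# ordinal: 1-25 alphanumerics or dots. Hypothetical helper, not an official API.
_ECLI_PATTERN = re.compile(
    r"ECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Z0-9]{1,7}):"
    r"(?P<year>\d{4}):(?P<ordinal>[A-Z0-9.]{1,25})"
)


@dataclass(frozen=True)
class EcliId:
    country: str
    court: str
    year: int
    ordinal: str

    def __str__(self) -> str:
        # Re-serialise to the canonical colon-separated form.
        return f"ECLI:{self.country}:{self.court}:{self.year}:{self.ordinal}"


def parse_ecli(identifier: str) -> EcliId:
    """Parse an ECLI string; raise ValueError if it does not fit the assumed layout."""
    match = _ECLI_PATTERN.fullmatch(identifier.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed ECLI: {identifier!r}")
    return EcliId(
        country=match["country"],
        court=match["court"],
        year=int(match["year"]),
        ordinal=match["ordinal"],
    )


if __name__ == "__main__":
    ecli = parse_ecli("ECLI:EU:C:2014:317")
    print(ecli.country, ecli.court, ecli.year, ecli.ordinal)
```

Round-tripping through str() yields a normalised key that could be stored on a legal-source node; whether TT would key its 80-year replay on identifiers in exactly this way is still an open design question.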
Constraints carried into D/E synthesis:
- LOC metric preserved as the primary elegance axis. Rich said “I still like my ‘Lines of Code’ metric but am open to other ways of looking at it”, so he is open to supplementary metrics (e.g. maintenance-weight ratio, time-to-new-jurisdiction, onboarding time, acquirer-valuation proxies), but LOC remains load-bearing.
- Own-LLM requirement is new. TT must have a credible “we operate our own LLM layer” story for acquirer appeal. This changes the option set: the standard and its implementation now need to accommodate (a) a TT-hosted fine-tuned LLM, (b) an MCP server exposing standard operations, (c) the possibility of an embedded model acting as reference interpreter, and (d) a training-data strategy that converts the INHERIT corpus into an LLM moat.
- Currency filter (April 2026): only technologies that the 2025-2026 practitioner literature endorses as mature and usable count as “current”. Disqualified on April 2026 evidence:
  - SHACL as a runtime validator (zero mentions across Bratanič + Negro)
  - Triplestore products (Oxigraph, Jena, Stardog, GraphDB, Blazegraph, Virtuoso) at runtime
  - Competency-question-driven design as the primary methodology (zero mentions)
  - A full pure-RDF stack as the production substrate
- Qualifies as current (April 2026 evidence):
  - Neo4j + Cypher + neosemantics (both Negro and Bratanič centre here)
  - Pydantic + structured outputs for LLM extraction
  - RDF/OWL as an ontology-publication format (Negro engages with it via HPO/SNOMED/UMLS)
  - Schema-in-prompt pattern
  - Graph-RAG (vector + graph hybrid retrieval)
  - MCP (Model Context Protocol) for agent interop
  - JSON-LD for Web interop (neutral: no strong evidence either way yet)
  - Rego/OPA for policy-as-code (Jimmy Ray, O’Reilly 2024; already in the library)
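Graph-RAG is the least self-explanatory item in the list above, so here is a minimal, dependency-free sketch of the hybrid idea: a vector-similarity step picks seed nodes, then a graph step pulls in their neighbours as extra context. It assumes toy data throughout; the node names, three-dimensional vectors and edges are hypothetical, and in a Neo4j-based build the two steps would presumably be a vector-index query and a Cypher traversal rather than Python dicts.

```python
import math
from typing import Dict, List, Sequence, Set

# Toy in-memory stand-ins for a vector index and a property graph.
# Node IDs, vectors and edges are hypothetical illustration data.
EMBEDDINGS: Dict[str, List[float]] = {
    "will": [0.9, 0.1, 0.0],
    "executor": [0.7, 0.3, 0.0],
    "intestacy_rule": [0.1, 0.9, 0.1],
    "domicile": [0.0, 0.2, 0.9],
}
EDGES: Dict[str, Set[str]] = {
    "will": {"executor", "domicile"},
    "executor": {"will"},
    "intestacy_rule": {"domicile"},
    "domicile": {"will", "intestacy_rule"},
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec: Sequence[float], k: int = 1, hops: int = 1) -> List[str]:
    """Vector step: take the k most similar nodes as seeds.
    Graph step: expand outward `hops` times, appending newly reached neighbours."""
    ranked = sorted(EMBEDDINGS, key=lambda n: cosine(query_vec, EMBEDDINGS[n]), reverse=True)
    context = ranked[:k]
    frontier = set(context)
    for _ in range(hops):
        reached: Set[str] = set()
        for node in frontier:
            reached |= EDGES.get(node, set())
        new_nodes = sorted(reached - set(context))  # sorted for deterministic output
        context.extend(new_nodes)
        frontier = set(new_nodes)
    return context


if __name__ == "__main__":
    print(hybrid_retrieve([1.0, 0.0, 0.0]))
    print(hybrid_retrieve([1.0, 0.0, 0.0], hops=2))
```

The point of the hybrid is visible in the second call: "intestacy_rule" is not similar to the query vector at all, yet it reaches the context because the graph connects it to something that is.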
Likely shape of Option D (hypothesis, to be refined after books 3 + 4):
“LPG-first hybrid — property graph substrate, JSON Schema / Pydantic wire shape, RDF/OWL as published ontology layer for semantic interop, Rego for policy, TT fine-tuned LLM as model-layer reference implementation, MCP server for agent access.”
- LPG (Neo4j / Memgraph / FalkorDB) as the runtime substrate: endorsed
- Schema: JSON Schema 2020-12 or Pydantic v2 (whichever Rich prefers; both work ergonomically with schema-in-prompt)
- RDF/OWL core ontology published once (like Negro’s HPO/SNOMED pattern) for semantic interop; this does not require RDF at runtime
- Rego for jurisdiction-specific policy
- TT-hosted fine-tuned LLM as canonical interpreter for cross-jurisdiction semantics
- MCP server as the primary consumer-facing interface
- JSON-LD optional for Web-agent interop
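The D shape hinges on one schema doing double duty: it travels inside the LLM prompt (schema-in-prompt) and it is published as the contract for the MCP-facing operation. The sketch below illustrates that under stated assumptions: the operation name interpret_document, all field names and the shallow validator are hypothetical, the standard library stands in for Pydantic v2 or a full JSON Schema validator, and only the MCP tool-definition fields (name, description, inputSchema) are taken from the protocol.

```python
import json
from typing import Any, Dict, List

# Hypothetical IO contracts for an Option D "interpret_document" operation.
# Field names are illustrative only; they are not a settled TT schema.
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "document_id": {"type": "string"},
        "jurisdiction": {"type": "string"},
    },
    "required": ["document_id", "jurisdiction"],
    "additionalProperties": False,
}
OUTPUT_SCHEMA: Dict[str, Any] = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "instrument_type": {"type": "string"},
        "formally_valid": {"type": "boolean"},
        "cited_sources": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["instrument_type", "formally_valid", "cited_sources"],
    "additionalProperties": False,
}

_JSON_TYPES = {"string": str, "boolean": bool, "array": list, "object": dict}


def build_prompt(document_text: str, jurisdiction: str) -> str:
    """Schema-in-prompt: the output contract is embedded verbatim in the prompt."""
    return (
        f"Interpret the following document under the law of {jurisdiction}.\n"
        "Reply with JSON only, conforming to this JSON Schema:\n"
        f"{json.dumps(OUTPUT_SCHEMA, indent=2)}\n\n"
        f"Document:\n{document_text}"
    )


def validate_shallow(schema: Dict[str, Any], data: Dict[str, Any]) -> List[str]:
    """Tiny subset of JSON Schema checks: required keys, extra keys, top-level types.
    A real build would use a full validator instead (e.g. Pydantic v2)."""
    props = schema["properties"]
    errors = [f"missing: {key}" for key in schema["required"] if key not in data]
    if not schema.get("additionalProperties", True):
        errors += [f"unexpected: {key}" for key in data if key not in props]
    for key, value in data.items():
        expected = _JSON_TYPES.get(props.get(key, {}).get("type", ""))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"wrong type: {key}")
    return errors


def mcp_tool_descriptor() -> Dict[str, Any]:
    """Expose the input contract as an MCP tool definition."""
    return {
        "name": "interpret_document",
        "description": "Interpret an estate document under a given jurisdiction.",
        "inputSchema": INPUT_SCHEMA,
    }
```

In use, the model's reply would be parsed with json.loads and passed through validate_shallow(OUTPUT_SCHEMA, reply); a non-empty error list would trigger a retry or rejection rather than being stored in the graph.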
Likely shape of Option E (AI-native; Rich has explicitly asked that this be developed, not hedged):
“AI-native standard — TT’s fine-tuned LLM IS the reference implementation; schema defines IO contracts; Catala DSL provides formal-rules layer; OpenFisca-style simulator provides what-if analysis; consumer interaction via MCP; conformance via model-based + DSL-executed test vectors.”
- LLM-as-canonical-interpreter: the TT fine-tuned LLM holds the cross-jurisdiction semantic knowledge, the schema defines structured IO, and the model answers “what does this document mean under jurisdiction X?”
- Catala DSL for formally provable rules (inheritance tax, intestacy hierarchy, statutory legacy calculation); Catala has worked inheritance-tax examples, a direct domain match
- OpenFisca-style simulation engine for what-if analysis across 21 jurisdictions, a production pattern in 30+ countries
- Controlled natural language (Attempto Controlled English or similar) for the legal drafting layer: readable by non-lawyers, machine-parseable
- ELI/ECLI IRI identifiers for legal-source grounding, stable across 80-year legislative replay
- MCP server as the primary consumer-facing interface (agent-native)
- The training-data corpus becomes a commercial asset, which fits the AI-vendor commercial memo already in the project context
- Revenue model: MCP access fees + commercial embedded licence for InheritKit + training-data bundle licence to OpenAI/Anthropic/Google/Meta/xAI
- Conformance: Catala-verified arithmetic + OpenFisca simulation test vectors + LLM-output comparison against golden reference
- Moat: the fine-tuned LLM + training corpus + Catala rules library are three distinct commercial assets, each sellable separately or as a bundle
- Highest acquirer appeal (AI-native positioning) but also the highest execution risk; Rich has explicitly endorsed this as the direction to be brave on
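The conformance idea for E combines two kinds of test vector: deterministic arithmetic vectors run against the formal-rules layer, and golden-reference comparisons run against the model. The sketch below assumes a Python function standing in for a Catala-compiled rule (Catala itself is not invoked), a flat-rate tax above a threshold whose figures are purely illustrative and not any jurisdiction's law, a stub in place of the TT model, and exact field-by-field matching; every name in it is hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, List

# Hypothetical stand-in for a Catala-compiled rule: flat-rate tax above a
# tax-free threshold. Threshold and rate are illustrative, not real law.
THRESHOLD = Decimal("100000")
RATE = Decimal("0.40")


def illustrative_estate_tax(estate_value: Decimal) -> Decimal:
    taxable = max(estate_value - THRESHOLD, Decimal("0"))
    return (taxable * RATE).quantize(Decimal("0.01"))


@dataclass(frozen=True)
class ArithmeticVector:
    estate_value: Decimal
    expected_tax: Decimal


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    golden: Dict[str, object]  # reference answer, assumed agreed by reviewers


def run_arithmetic(vectors: List[ArithmeticVector]) -> List[str]:
    """Run each vector through the rule; return a description of every mismatch."""
    failures = []
    for v in vectors:
        got = illustrative_estate_tax(v.estate_value)
        if got != v.expected_tax:
            failures.append(f"{v.estate_value}: expected {v.expected_tax}, got {got}")
    return failures


def run_golden(
    cases: List[GoldenCase], model: Callable[[str], Dict[str, object]]
) -> List[str]:
    """Exact field-by-field comparison of model output against the golden reference."""
    failures = []
    for case in cases:
        answer = model(case.prompt)
        for field, expected in case.golden.items():
            if answer.get(field) != expected:
                failures.append(
                    f"{case.prompt!r}: {field!r} expected {expected!r}, got {answer.get(field)!r}"
                )
    return failures


if __name__ == "__main__":
    vectors = [
        ArithmeticVector(Decimal("50000"), Decimal("0.00")),
        ArithmeticVector(Decimal("250000"), Decimal("60000.00")),
    ]
    cases = [GoldenCase("hypothetical question 1", {"formally_valid": False})]
    stub_model = lambda prompt: {"formally_valid": False}  # placeholder for the TT model
    print(run_arithmetic(vectors), run_golden(cases, stub_model))
```

Exact matching suits booleans and enumerations; free-text fields would need a looser comparator, and choosing one is part of the execution risk noted above.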
How to apply:
- After the Raieli read completes (the last of the 4 books), do a vocabulary-probe summary across all 4 books plus the 19 round-1 books: what DOES the 2025-2026 literature endorse for Rich’s stack?
- Draft Option D spec with the LPG + RDF-published-ontology + LLM + MCP shape.
- Test whether Option E is distinct from D or an extension of it; the two may collapse into one option.
- Score B vs C vs D vs E on the existing 14-criterion scorecard, plus additions if Rich accepts them: maintenance-weight ratio, time-to-new-jurisdiction, acquirer-appeal-via-own-LLM, MCP-readiness.
- Produce LOC estimates for D and E using the same baseline methodology as the C estimate (~135-190k LOC).
- Bring the 5-option comparison back to Rich for a decision gate, before any build.
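The scoring step could be mechanised along the lines of the sketch below. It is a minimal sketch assuming a 1-5 scale where higher is better and a weighted mean as the aggregation rule. The criterion keys, weights and scores are hypothetical placeholders (the existing 14 criteria are not reproduced here), and a raw LOC estimate would need converting into a higher-is-better score before it could be fed in.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder criteria and weights (not the real 14-criterion scorecard).
# Scores use an assumed 1-5 scale where higher is better.
WEIGHTS: Dict[str, float] = {
    "loc_elegance": 3.0,  # assumed heavier weight because LOC stays load-bearing
    "maintenance_weight_ratio": 1.0,
    "time_to_new_jurisdiction": 1.0,
    "acquirer_appeal_own_llm": 1.0,
    "mcp_readiness": 1.0,
}


def weighted_total(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean over the criteria in `weights`; refuses to score partial sheets."""
    missing = sorted(set(weights) - set(scores))
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    return sum(weights[c] * scores[c] for c in weights) / sum(weights.values())


def rank_options(
    options: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (option, weighted total) pairs, best first."""
    totals = [(name, weighted_total(scores, weights)) for name, scores in options.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)
```

Keeping the weights in one table makes it cheap to re-run the ranking if Rich accepts only some of the proposed additions.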