Spike replacement research — Round 4 (2026-04-25)

After Round 4 dispatched 22 spike candidates totalling ~17-19 weeks of empirical work, Rich asked: “can deeper research avoid the spikes?” Six parallel subagents harvested public evidence (published benchmarks, production case studies, vendor docs, library papers) to classify each spike as FULLY-REPLACED / PARTLY-REPLACED / TRULY-EMPIRICAL.

Result: ~10-11 weeks of spike-time avoided. Original 17-19 weeks → 3 weeks of residual empirical work, most parallelisable. 7 spikes FULLY-REPLACED, 6 PARTLY-REPLACED with reduced scope, 0 TRULY-EMPIRICAL (none unanswerable from public evidence).

Five strategic findings worth surviving compaction

1. Cedar wins 19.5pp from research alone (D-R4-01)

Public-evidence-only scorecard: Cedar 7.50 / Rego 5.55 / 14.5pp headroom even before unmeasured WASM bundle size (15% weight). Sensitivity-robust: worst-case bundle-size flip leaves Cedar ahead by ~12pp. Recommendation: switch to Cedar subject to 2-day bundle-size measurement; preserve Rego as thin Y2+ OSCAL layer. Apple/Styra acquihire (Aug 2025) is a material strategic tailwind — Apple hired Hinrichs/Sandall/Koponen + several Styra engineers; Styra DAS commercial offering sunsetting. OPA remains CNCF-governed but loses commercial maintainer underwriter. Cedar is NOT a research toy — Siemens 590M auth-calls/month, Twilio Flex, Stedi, FIS managing $50T payments are production deployments.

2. Catala→WASM Path A already in production (D-R4-02)

@catala-lang/french-law is a published npm package (v0.10.0, Feb 2026). catala-dsfr is a French-government-funded browser demo at code.gouv.fr/demos/catala — runs Catala programs in production using exact pipeline we’d use (Catala→OCaml→_api_web.ml→js_of_ocaml→JS). We don’t invent the path; we inherit it. wasm_of_ocaml (Tarides + Jane Street, full-released Feb 2025) is upgrade path if needed.

Critical correction to T32: Cloudflare Workers compressed bundle limit is 3 MiB Free / 10 MiB Paid (NOT 10MB/50MB as T32 stated). Pyodide baseline alone is ~24 MB → Path B (Pyodide) ruled out by size before any code added. T32 file requires amendment.

3. LinkML adoption inherits 4 T-files via Mungall intellectual lineage (SPK-LINKML-01)

Mungall + Matentzoglu + Hegde steward LinkML (T13) AND co-author SSSOM (T10) AND maintain ROBOT (T8) AND build OntoGPT (T17). Adopting LinkML inherits 4 T-programme integrations as ONE toolchain, not 4 separate bets. Far stronger commercial + governance posture than original spike framing implied. Biolink Model = 14,775 LOC + 527 slots → INHERIT v2 LinkML estimate 7,500-13,000 LOC (vs current 28K v6.6 + 8K extensions). Gaia-X is non-biomedical precedent (migrated from manual SHACL to LinkML 2024) — INHERIT not first cross-domain adopter.

Highest-risk LinkML generator: gen-shacl (medium fidelity; CEUR-WS Vol-3705 documents equals_string_in + some any_of patterns NOT translating; ifabsent bugs only fixed late 2025). gen-owl simple is OWL-2-Full leaning; production tool is linkml-owl (separate package). SPK-LINKML-01 collapses 6 weeks → 7-day LITE confirmation spike.

4. Claude PDF-native + Citations + Haiku 4.5 hybrid is the dark-horse winner (T-NEW-CLAUDE-01)

Path B’ (hybrid) recommended: Claude PDF-native + Citations API + Haiku 4.5 + LinkML schema constraint. Was NOT in original spike framing. Cost at 10k wills/mo: Path B’ ~£230/mo vs Path A (OntoGPT+GPT-4o default) ~£850/mo (4-6× gap). Critical nuance: OntoGPT defaults to GPT-4o but supports Claude via LiteLLM — original “Claude vs OntoGPT” framing was partly SPIRES-prompting-methodology vs prompt-engineering, not LLM-vs-LLM. Hybrid Path B’ is best of both. Mini-spike: 50 hand-marked wills, 2-3 days, ~£1,500. Stop-conditions: field-F1 <65% OR Citations grounding <80% OR ANY beneficiary hallucination triggers full 2-week spike.

5. Citations + Structured Outputs really do return 400 (T-NEW-CLAUDE-04)

Anthropic docs explicitly state incompatibility — returns 400 if both enabled. Two-call workaround (cite then structure) is the documented pattern. INHERIT v2 explainability pipeline (catala-explain commercial-artifact play from Phase-C Step 1) MUST plan for two-call pattern from Day 1. Hard architectural constraint, not optional.

Eight new architecture-state cross-cutting amendments (A-NEW-T-CROSS-10..17)

  • A-NEW-T-CROSS-10 Citations + Structured Outputs two-call pattern mandatory (T-NEW-CLAUDE-04)
  • A-NEW-T-CROSS-11 D1 EU jurisdiction=eu hard provisioning constraint for EU tenants (must be set at create-time; immutable)
  • A-NEW-T-CROSS-12 MCP Streamable HTTP locked transport; SSE deprecated by mid-2026 (Anthropic Connectors gate)
  • A-NEW-T-CROSS-13 Cloudflare Code Mode SDK pattern for InheritKit MCP (2 tools, ~1k tokens, 99.9% reduction confirmed)
  • A-NEW-T-CROSS-14 OMG Commons cmns-rlcmp:Role canonical (FIBO fibo-fnd-pty-rl:Role being progressively deprecated since OMG Commons 1.0 2023)
  • A-NEW-T-CROSS-15 UK AKN URIs derived; legislation.gov.uk ELI form canonical (UK National Archives have NOT published canonical /akn/uk/... namespace)
  • A-NEW-T-CROSS-16 Jurisdiction-aware section/article addressing (UK paths /section/9 vs Swiss anchors #art_457)
  • A-NEW-T-CROSS-17 Swiss tri-language parallel ingestion + reconciliation (DE/FR/IT equally authoritative on fedlex; v2 symmetric-jurisdictional Phase-1 implies parallel not single-language citation)

Spike-replacement final scoreboard

SpikeVerdictOriginalResidual
Cedar vs RegoPARTLY1 week2 days
Catala→WASMPARTLY1-2 weeks2.5 days
Claude PDF vs OntoGPTPARTLY2 weeks2-3 days mini-eval
SPK-LINKML-01PARTLY6 weeks7 days LITE
6 small-cluster spikes5 FULLY + 1 PARTLY6.5 days30 min
RSC + IRI + AKN cluster (3)1 PARTLY + 2 FULLY6 days0
TOTAL7 FULLY + 6 PARTLY + 0 TRULY-EMPIRICAL17-19 weeks~3 weeks

Six reports authored (not committed)

All at ~/testatetech/docs-strategy/docs/superpowers/scoping/2026-04-25-scorecards/spikes/:

  1. spike-replacement-1-cedar-vs-rego.md (~1,950 words)
  2. spike-replacement-2-catala-wasm.md (~1,800 words)
  3. spike-replacement-3-linkml-round-trip.md (~2,600 words)
  4. spike-replacement-4-claude-pdf-vs-ontogpt.md (~1,800 words)
  5. spike-replacement-5-small-spike-cluster.md (~2,800 words)
  6. spike-replacement-6-rsc-iri-akn-cluster.md (~3,200 words)

Plus: spike-1-cedar-vs-rego-t30.md (the original 1-week spike spec authored before the research-replacement run; status draft).

Research-replacement methodology — replicable pattern

Trigger: ≥2 weeks of code-spike work proposed AND public evidence likely to exist. Method: Dispatch focused subagents (~5 hours each, 6 in parallel) with narrow questions + cite-everything discipline. Cost: ~£100-150 subagent + Claude time across 6 dispatches. Yield: ~10-11 weeks of code-spike work avoided + 5 strategic findings + 8 architecture-state amendments. Discipline: “research is cheaper than spikes when public evidence exists” — generalises beyond INHERIT.

Cumulative T-programme + spike-replacement totals (after this round)

  • 34 T-files research-complete (Rounds 1-4)
  • 22 spike candidates → 13 closed-by-research (7 FULLY + 6 PARTLY) + 9 in residual queue (3 weeks aggregate)
  • 175+ top-tier discoveries
  • 115+ Phase-5 tensions queued
  • 30+ A-NEW amendment candidates including 13 cross-cutting A-NEW-T-CROSS-01..17
  • £190-290 cumulative research cost (4 rounds + spike-replacement)
  • £180-750/mo Phase-1 cost-displacement identified
  • 0 retroactive disruption to locked decisions

Next-step Rich-decisions queued (priority order)

  1. D-R4-01 Cedar adoption — confirmed-by-research; pending 2-day bundle-size measurement only
  2. D-R4-02 Catala→WASM Path A lock — confirmed-by-research; pending 2.5-day measurement
  3. SPK-LINKML-01 LITE authorisation — 7 working days at InheritKit kick-off (down from 6 weeks)
  4. T-NEW-CLAUDE-01 mini-eval — 50 hand-marked wills, 2-3 days
  5. D-R3-01 T22 vs T26 observability — Better Stack vs Sentry-absorbs scorecard (still pending from Round 3)
  6. D-R3-02 Option G §1.8 revise — MCP server Vercel Fluid → Cloudflare Workers + DOs
  7. D-R3-04 Stripe Sessions 2026 top-up — by 7 May (keynote 29-30 April)