Spike replacement research — Round 4 (2026-04-25)
After Round 4 dispatched 22 spike candidates totalling ~17-19 weeks of empirical work, Rich asked: “can deeper research avoid the spikes?” Six parallel subagents harvested public evidence (published benchmarks, production case studies, vendor docs, library papers) to classify each spike as FULLY-REPLACED / PARTLY-REPLACED / TRULY-EMPIRICAL.
Result: ~10-11 weeks of spike-time avoided. Original 17-19 weeks → 3 weeks of residual empirical work, most parallelisable. 7 spikes FULLY-REPLACED, 6 PARTLY-REPLACED with reduced scope, 0 TRULY-EMPIRICAL (none unanswerable from public evidence).
Five strategic findings worth surviving compaction
1. Cedar wins 19.5pp from research alone (D-R4-01)
Public-evidence-only scorecard: Cedar 7.50 / Rego 5.55, i.e. 19.5pp headroom even before the unmeasured WASM bundle size (15% weight) is scored. Sensitivity-robust: worst-case bundle-size flip leaves Cedar ahead by ~12pp. Recommendation: switch to Cedar subject to 2-day bundle-size measurement; preserve Rego as thin Y2+ OSCAL layer. Apple/Styra acquihire (Aug 2025) is a material strategic tailwind: Apple hired Hinrichs/Sandall/Koponen + several Styra engineers; Styra DAS commercial offering sunsetting. OPA remains CNCF-governed but loses its commercial maintainer underwriter. Cedar is NOT a research toy: Siemens (590M auth-calls/month), Twilio Flex, Stedi and FIS (managing $50T payments) are production deployments.
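The scorecard arithmetic can be checked with a short sketch. It assumes the totals are on a 10-point scale (which is what makes 7.50 vs 5.55 a 19.5pp gap). The 5-point swing on the bundle-size criterion is purely an illustrative assumption that happens to land on the ~12pp worst case; it is not taken from the scorecard itself.

```python
def lead_pp(score_a: float, score_b: float, scale: float = 10.0) -> float:
    """Lead of A over B, in percentage points of the scoring scale."""
    return round((score_a - score_b) / scale * 100, 2)


def lead_after_swing(lead: float, weight: float, swing_points: float,
                     scale: float = 10.0) -> float:
    """Lead after one weighted criterion moves against A by `swing_points`."""
    return round(lead - weight * swing_points / scale * 100, 2)


cedar, rego = 7.50, 5.55           # public-evidence-only totals quoted above
base_lead = lead_pp(cedar, rego)   # lead before bundle size is measured

# ASSUMPTION: bundle size (15% weight) swings 5 points against Cedar.
worst_case = lead_after_swing(base_lead, weight=0.15, swing_points=5.0)

print(base_lead, worst_case)
```

Under these assumptions the lead stays positive even if the one unmeasured criterion goes against Cedar, which is the sense in which the recommendation is sensitivity-robust.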
2. Catala→WASM Path A already in production (D-R4-02)
@catala-lang/french-law is a published npm package (v0.10.0, Feb 2026). catala-dsfr is a French-government-funded browser demo at code.gouv.fr/demos/catala that runs Catala programs in production using the exact pipeline we’d use (Catala→OCaml→_api_web.ml→js_of_ocaml→JS). We don’t invent the path; we inherit it. wasm_of_ocaml (Tarides + Jane Street, full release Feb 2025) is the upgrade path if needed.
Critical correction to T32: Cloudflare Workers compressed bundle limit is 3 MiB Free / 10 MiB Paid (NOT 10MB/50MB as T32 stated). Pyodide baseline alone is ~24 MB → Path B (Pyodide) ruled out by size before any code added. T32 file requires amendment.
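A minimal sketch of the size gate implied by this correction, assuming the limits are 3 MiB and 10 MiB measured on the gzip-compressed bundle and taking the ~24 MB Pyodide figure at face value. The function names and plan keys are hypothetical.

```python
import gzip

MIB = 1024 * 1024
# Compressed-bundle limits as stated in the correction above.
WORKERS_LIMIT_BYTES = {"free": 3 * MIB, "paid": 10 * MIB}


def compressed_size(bundle: bytes) -> int:
    """Size of the bundle after gzip compression, in bytes."""
    return len(gzip.compress(bundle))


def fits_plan(compressed_bytes: int, plan: str) -> bool:
    """True if a compressed bundle of this size is within the plan's limit."""
    return compressed_bytes <= WORKERS_LIMIT_BYTES[plan]


# Pyodide baseline (~24 MB) exceeds even the Paid limit before any code is added.
pyodide_baseline = 24 * 1000 * 1000
print(fits_plan(pyodide_baseline, "paid"))
```

The same check can be run against the real Catala→JS artefact during the 2.5-day measurement by passing its bytes to `compressed_size`.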
3. LinkML adoption inherits 4 T-files via Mungall intellectual lineage (SPK-LINKML-01)
Mungall + Matentzoglu + Hegde steward LinkML (T13) AND co-author SSSOM (T10) AND maintain ROBOT (T8) AND build OntoGPT (T17). Adopting LinkML inherits 4 T-programme integrations as ONE toolchain, not 4 separate bets. Far stronger commercial + governance posture than original spike framing implied. Biolink Model = 14,775 LOC + 527 slots → INHERIT v2 LinkML estimate 7,500-13,000 LOC (vs current 28K v6.6 + 8K extensions). Gaia-X is non-biomedical precedent (migrated from manual SHACL to LinkML 2024) — INHERIT not first cross-domain adopter.
Highest-risk LinkML generator: gen-shacl (medium fidelity; CEUR-WS Vol-3705 documents equals_string_in + some any_of patterns NOT translating; ifabsent bugs only fixed late 2025). gen-owl simple is OWL-2-Full leaning; production tool is linkml-owl (separate package). SPK-LINKML-01 collapses 6 weeks → 7-day LITE confirmation spike.
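One way the LITE spike could triage schemas before running gen-shacl is to flag the constructs named above for manual review. The sketch below walks a schema already loaded as plain Python dicts; the example schema and all function names are hypothetical, and a hit only means "check the generated SHACL", not "this will fail".

```python
# Constructs the text above identifies as risky for gen-shacl.
RISKY_KEYS = {"equals_string_in", "any_of", "ifabsent"}


def find_risky_constructs(node, path=""):
    """Return slash-separated paths to every risky key in a nested schema."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}" if path else key
            if key in RISKY_KEYS:
                hits.append(child)
            hits.extend(find_risky_constructs(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_risky_constructs(item, f"{path}[{index}]"))
    return hits


# HYPOTHETICAL schema fragment, for illustration only.
schema = {
    "classes": {
        "Will": {
            "attributes": {
                "status": {"equals_string_in": ["draft", "signed"]},
                "executor": {"any_of": [{"range": "Person"},
                                        {"range": "Organisation"}]},
                "title": {"range": "string"},
            }
        }
    }
}

for hit in find_risky_constructs(schema):
    print(hit)
```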
4. Claude PDF-native + Citations + Haiku 4.5 hybrid is the dark-horse winner (T-NEW-CLAUDE-01)
Path B’ (hybrid) recommended: Claude PDF-native + Citations API + Haiku 4.5 + LinkML schema constraint. Was NOT in original spike framing. Cost at 10k wills/mo: Path B’ ~£230/mo vs Path A (OntoGPT+GPT-4o default) ~£850/mo (4-6× gap). Critical nuance: OntoGPT defaults to GPT-4o but supports Claude via LiteLLM — original “Claude vs OntoGPT” framing was partly SPIRES-prompting-methodology vs prompt-engineering, not LLM-vs-LLM. Hybrid Path B’ is best of both. Mini-spike: 50 hand-marked wills, 2-3 days, ~£1,500. Stop-conditions: field-F1 <65% OR Citations grounding <80% OR ANY beneficiary hallucination triggers full 2-week spike.
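The stop-conditions can be written down as an explicit gate so the mini-spike has an unambiguous outcome. This is a sketch only: the dataclass and field names are hypothetical, and it assumes F1 and grounding are expressed as fractions in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class MiniEvalResult:
    field_f1: float                  # field-level F1 across the 50 wills
    citation_grounding: float        # share of extractions grounded by Citations
    beneficiary_hallucinations: int  # count of invented beneficiaries


def full_spike_reasons(result: MiniEvalResult) -> list[str]:
    """Return the stop-conditions that fired; an empty list means proceed."""
    reasons = []
    if result.field_f1 < 0.65:
        reasons.append("field-F1 below 65%")
    if result.citation_grounding < 0.80:
        reasons.append("Citations grounding below 80%")
    if result.beneficiary_hallucinations > 0:
        reasons.append("beneficiary hallucination observed")
    return reasons


print(full_spike_reasons(MiniEvalResult(0.82, 0.91, 0)))
print(full_spike_reasons(MiniEvalResult(0.82, 0.91, 1)))
```

Any non-empty result triggers the full 2-week spike, matching the OR semantics of the stop-conditions.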
5. Citations + Structured Outputs really do return 400 (T-NEW-CLAUDE-04)
Anthropic docs explicitly state incompatibility — returns 400 if both enabled. Two-call workaround (cite then structure) is the documented pattern. INHERIT v2 explainability pipeline (catala-explain commercial-artifact play from Phase-C Step 1) MUST plan for two-call pattern from Day 1. Hard architectural constraint, not optional.
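The two-call pattern can be sketched as an orchestration function that never enables both features in the same request. The model calls are injected as plain callables so the sketch makes no claims about SDK signatures; `cite_call`, `structure_call` and the returned dictionary shape are all hypothetical.

```python
from typing import Callable


def extract_with_citations(
    document: str,
    cite_call: Callable[[str], dict],
    structure_call: Callable[[str], dict],
) -> dict:
    """Call 1 gathers cited prose; call 2 turns that prose into structured fields."""
    # Call 1: Citations enabled, Structured Outputs disabled.
    cited = cite_call(document)
    # Call 2: Structured Outputs enabled, Citations disabled.
    fields = structure_call(cited["text"])
    return {"fields": fields, "citations": cited["citations"]}


# Stub implementations standing in for the two real API requests.
def fake_cite_call(document: str) -> dict:
    return {"text": "Executor: Jane Doe", "citations": [{"quote": "Jane Doe"}]}


def fake_structure_call(text: str) -> dict:
    return {"executor": text.split(": ", 1)[1]}


result = extract_with_citations("...will text...", fake_cite_call, fake_structure_call)
print(result)
```

Keeping the two calls behind one function means the explainability pipeline can depend on a single interface from Day 1, whatever the underlying requests look like.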
Eight new architecture-state cross-cutting amendments (A-NEW-T-CROSS-10..17)
- A-NEW-T-CROSS-10 Citations + Structured Outputs two-call pattern mandatory (T-NEW-CLAUDE-04)
- A-NEW-T-CROSS-11 D1 EU jurisdiction=eu hard provisioning constraint for EU tenants (must be set at create-time; immutable)
- A-NEW-T-CROSS-12 MCP Streamable HTTP locked transport; SSE deprecated by mid-2026 (Anthropic Connectors gate)
- A-NEW-T-CROSS-13 Cloudflare Code Mode SDK pattern for InheritKit MCP (2 tools, ~1k tokens, 99.9% reduction confirmed)
- A-NEW-T-CROSS-14 OMG Commons cmns-rlcmp:Role canonical (FIBO fibo-fnd-pty-rl:Role being progressively deprecated since OMG Commons 1.0 2023)
- A-NEW-T-CROSS-15 UK AKN URIs derived; legislation.gov.uk ELI form canonical (UK National Archives have NOT published canonical /akn/uk/... namespace)
- A-NEW-T-CROSS-16 Jurisdiction-aware section/article addressing (UK paths /section/9 vs Swiss anchors #art_457)
- A-NEW-T-CROSS-17 Swiss tri-language parallel ingestion + reconciliation (DE/FR/IT equally authoritative on fedlex; v2 symmetric-jurisdictional Phase-1 implies parallel not single-language citation)
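Amendments 16 and 17 can be illustrated together: the locator syntax depends on jurisdiction, and a Swiss citation expands to three parallel references. This is a sketch under assumptions. The base URLs are placeholders, and putting the language code as a path segment before the anchor is assumed for illustration rather than confirmed against fedlex.

```python
SWISS_LANGUAGES = ("de", "fr", "it")  # all three equally authoritative


def locator(jurisdiction: str, number: str) -> str:
    """Jurisdiction-specific suffix addressing a section or article."""
    if jurisdiction == "uk":
        return f"/section/{number}"   # UK: path segment
    if jurisdiction == "ch":
        return f"#art_{number}"       # Switzerland: fragment anchor
    raise ValueError(f"unsupported jurisdiction: {jurisdiction}")


def swiss_parallel_refs(base_url: str, article: str) -> dict[str, str]:
    """One reference per language, for parallel ingestion and reconciliation."""
    return {
        lang: f"{base_url}/{lang}{locator('ch', article)}"
        for lang in SWISS_LANGUAGES
    }


# Placeholder base URLs, not real statute addresses.
print("https://example.org/uk-act" + locator("uk", "9"))
print(swiss_parallel_refs("https://example.org/ch-code", "457"))
```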
Spike-replacement final scoreboard
| Spike | Verdict | Original | Residual |
|---|---|---|---|
| Cedar vs Rego | PARTLY | 1 week | 2 days |
| Catala→WASM | PARTLY | 1-2 weeks | 2.5 days |
| Claude PDF vs OntoGPT | PARTLY | 2 weeks | 2-3 days mini-eval |
| SPK-LINKML-01 | PARTLY | 6 weeks | 7 days LITE |
| 6 small-cluster spikes | 5 FULLY + 1 PARTLY | 6.5 days | 30 min |
| RSC + IRI + AKN cluster (3) | 1 PARTLY + 2 FULLY | 6 days | 0 |
| TOTAL | 7 FULLY + 6 PARTLY + 0 TRULY-EMPIRICAL | 17-19 weeks | ~3 weeks |
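As a rough cross-check of the Residual column, the rows can be summed directly. The sketch assumes an 8-hour working day (to convert the 30-minute item) and a 5-day working week; both are assumptions, not figures from the reports.

```python
HOURS_PER_DAY = 8   # ASSUMPTION: used only to convert "30 min" to days
DAYS_PER_WEEK = 5   # ASSUMPTION: working days per week

# (low, high) residual effort in working days, taken from the table above.
residual_days = {
    "Cedar vs Rego": (2.0, 2.0),
    "Catala→WASM": (2.5, 2.5),
    "Claude PDF vs OntoGPT": (2.0, 3.0),
    "SPK-LINKML-01": (7.0, 7.0),
    "6 small-cluster spikes": (0.5 / HOURS_PER_DAY, 0.5 / HOURS_PER_DAY),
    "RSC + IRI + AKN cluster": (0.0, 0.0),
}

low_days = sum(low for low, _ in residual_days.values())
high_days = sum(high for _, high in residual_days.values())
low_weeks = low_days / DAYS_PER_WEEK
high_weeks = high_days / DAYS_PER_WEEK

print(f"{low_days:.2f}-{high_days:.2f} days = {low_weeks:.2f}-{high_weeks:.2f} weeks")
```

Under these assumptions the rows sum to just under three working weeks if run serially, consistent with the ~3 weeks in the TOTAL row.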
Six reports authored (not committed)
All at ~/testatetech/docs-strategy/docs/superpowers/scoping/2026-04-25-scorecards/spikes/:
- spike-replacement-1-cedar-vs-rego.md (~1,950 words)
- spike-replacement-2-catala-wasm.md (~1,800 words)
- spike-replacement-3-linkml-round-trip.md (~2,600 words)
- spike-replacement-4-claude-pdf-vs-ontogpt.md (~1,800 words)
- spike-replacement-5-small-spike-cluster.md (~2,800 words)
- spike-replacement-6-rsc-iri-akn-cluster.md (~3,200 words)
Plus: spike-1-cedar-vs-rego-t30.md (the original 1-week spike spec authored before the research-replacement run; status draft).
Research-replacement methodology — replicable pattern
- Trigger: ≥2 weeks of code-spike work proposed AND public evidence likely to exist.
- Method: dispatch focused subagents (~5 hours each, 6 in parallel) with narrow questions + cite-everything discipline.
- Cost: ~£100-150 subagent + Claude time across 6 dispatches.
- Yield: ~10-11 weeks of code-spike work avoided + 5 strategic findings + 8 architecture-state amendments.
- Discipline: “research is cheaper than spikes when public evidence exists”; this generalises beyond INHERIT.
Cumulative T-programme + spike-replacement totals (after this round)
- 34 T-files research-complete (Rounds 1-4)
- 22 spike candidates → 13 closed-by-research (7 FULLY + 6 PARTLY) + 9 in residual queue (3 weeks aggregate)
- 175+ top-tier discoveries
- 115+ Phase-5 tensions queued
- 30+ A-NEW amendment candidates including 17 cross-cutting A-NEW-T-CROSS-01..17
- £190-290 cumulative research cost (4 rounds + spike-replacement)
- £180-750/mo Phase-1 cost-displacement identified
- 0 retroactive disruption to locked decisions
Next-step Rich-decisions queued (priority order)
- D-R4-01 Cedar adoption — confirmed-by-research; pending 2-day bundle-size measurement only
- D-R4-02 Catala→WASM Path A lock — confirmed-by-research; pending 2.5-day measurement
- SPK-LINKML-01 LITE authorisation — 7 working days at InheritKit kick-off (down from 6 weeks)
- T-NEW-CLAUDE-01 mini-eval — 50 hand-marked wills, 2-3 days
- D-R3-01 T22 vs T26 observability — Better Stack vs Sentry-absorbs scorecard (still pending from Round 3)
- D-R3-02 Option G §1.8 revise — MCP server Vercel Fluid → Cloudflare Workers + DOs
- D-R3-04 Stripe Sessions 2026 top-up — by 7 May (keynote 29-30 April)