Phase-1 INHERIT v2 uses vendor-API LLMs (Scenario 2), not fine-tuned models. Rich locked this 2026-04-22 after reviewing LLM cost scenarios 0-5.
What’s committed
- Scenario 2 (vendor API for NL extraction + narration + AI-tip + RAG help) is the Phase-1 default. Annual cost envelope £4K-£12K at Phase-1 scale.
- Cloudflare AI Gateway specifically (NOT “Cloudflare or similar”) — single committed gateway for provider-swap, rate-limiting, cost metering, logging, fallback routing.
- Haiku 4.5 default tier across all user-facing flows. Sonnet 4.6 is an upgrade requiring task-specific evidence (not speculative). No Opus / frontier-tier models in Phase-1 production. Dev-time use (Rich’s own Claude subscription for strategy work) is unaffected.
- Hard monthly ceiling: £1,500/month at Phase 1. Alerts before breach; provisioning blocks further calls on breach until Rich or operations approves increase.
- Per-session cost limit, configurable in Advisory BackOffice (not hard-coded in v2). Auto-fallback to form-based flow when exceeded. Implements new mega-spec claim A4.32b.
What’s NOT a Phase-1 possibility
Scenario 3 (fine-tuning per Q4 memo) is gated on any one of:
- TT ARR ≥ £1m sustained for 12 consecutive months with ≥6 months post-spend runway, OR
- Acquirer-greenlit (post-acquisition), OR
- Customer-funded — a single customer pays ≥£40-60K upfront for the specific fine-tune; ships under InheritKit commercial license; TT retains IP.
Explicitly NOT a gate:
- Competitive-response to a rival shipping a succession-specialist model (Q4 memo trigger #4) — TT does not burn capital chasing competitors without revenue.
- Cross-task aggregation performance signal (Q4 memo trigger #5) — insufficient without revenue.
Q4 memo’s performance triggers become advisory only — they prioritize which task to fine-tune when the revenue gate opens, not whether to open it.
How to apply
- Designing any LLM-touching feature in Phase 1: default to Haiku 4.5 via Cloudflare Gateway; upgrade to Sonnet only with task-specific evidence that Haiku plateaus.
- Architecting any flow that uses LLM calls: include form-based fallback path + per-session cost tracking from Day 1. Per-session limit lives as a BackOffice-admin-editable config value.
- Budget planning: treat £1,500/month as hard ceiling through Phase 1; model revenue-growth-to-£1m-ARR separately as the revenue-gate precondition for Scenario 3.
- When asked about fine-tuning: always reply “revenue-gated, not a Phase-1 possibility.” Only revisit when ARR + runway gates test positive.
- Partner / investor conversations: frame LLMs as an additive UX-boundary layer, NOT as the core product. The core is Catala + JSON Schema + JSON-LD + SPARQL. LLMs are swappable convenience.
Why
- Solo-bootstrap (R3.21) — fine-tune infrastructure + eval-corpus annotation (£5-£20K/task) is significant capital. Should follow revenue, not lead it.
- ISO/JTC 1 PAS fast-track evaluates the standard, not the AI layer. LLMs irrelevant to ISO story.
- Clean-break discipline — LLM layer is swappable via Cloudflare Gateway; if a vendor API becomes unavailable or uneconomic, TT swaps without touching core artefacts.
- Catala does the rule-work deterministically — LLMs add UX-boundary convenience, not computation. Scenario 0 (no LLMs in production) remains a defensible fallback at any time.
Cross-references
docs/superpowers/specs/2026-04-22-inherit-v2-architecture-proposal.mdv1.3+ §“LLM posture — Phase-1 Scenario 2 commitment”docs/superpowers/specs/2026-04-21-ambiguity-4-r34-llm-route.mdv1.1+ §D5 (triggers downgraded to advisory only)- Mega-spec claims: A4.32 (Cloudflare specifically, Haiku default), A4.32b (per-session fallback, BackOffice-editable), A4.32c (£1,500/month ceiling)
- R3.4 cascade: trigger-gated → revenue-gated
Do NOT
- Pitch fine-tuning as a Phase-1 capability, even prospectively. R3.4 is strictly revenue-gated.
- Default to Sonnet or Opus in any new flow. Haiku 4.5 is default.
- Suggest building a custom AI gateway; Cloudflare is committed.
- Exceed £1,500/month LLM budget at Phase 1 without Rich’s explicit approval.
- Treat competitive-response or aggregation triggers as revenue-gate substitutes.