ν.β D5 CLOSED — Q-018 InheritKitBench eval framework
Spike: D5 | Date closed: 2026-05-04 | Outcome: outcome-VALIDATED-WITH-NOTE
Key findings
- 5-task eval framework designed: E1 sara_numeric + E2 sara_entailment + E3 learned_hands_estates (LegalBench-derived) + E4 IK-Will-author + E5 IK-Trigger-detect (IK SDK design-time). Full framework spec in T-file §5.
- Kill condition NOT triggered: Phase-5 eval framework build ~£2-2.5K (LegalBench harness + IK gold-set + CI), well below the £10K threshold. Only the HARNESS cost is gated; InheritKitBench Phase A content (£30-50K) is separate.
- LegalBench fixtures missing at D5 time: E1-E3 are design-time estimates. Fixture acquisition: pip install datasets, then load_dataset('nguha/legalbench', task); ~5-10 min, free. Phase-5 prerequisite.
- LLM inference cost negligible: $2.86 (~£2.26) per full 5-trial run. Annual (4 quarterly runs): ~£9. NOT a gating constraint.
- IK SDK E4/E5 design-time only: IK SDK Phase-2+ prerequisite. D5 v2.0 triggered by SDK delivery.
- T82 methodology anchored: BEA 2e fitness functions map cleanly to E1-E5 task types (atomic/holistic × triggered × static/dynamic).
- sara_numeric precedent from ε.ι S4: F1=1.000 on 3 HMRC cases confirms legal numeric reasoning is tractable. E1 extends to 192 LegalBench examples.
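The cost figures in the findings above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the constants are copied from the findings, while the function names and the exact comparison used for the kill gate (triggered when the high estimate reaches the threshold) are assumptions made for illustration, not the spike's actual tooling.

```python
# Figures are copied from the findings above; all names are illustrative.
HARNESS_ESTIMATE_GBP = (2_000, 2_500)  # Phase-5 harness build, low/high
KILL_THRESHOLD_GBP = 10_000            # kill-condition threshold
RUN_COST_USD = 2.86                    # one full 5-trial run
RUN_COST_GBP = 2.26                    # same run, as quoted in GBP
RUNS_PER_YEAR = 4                      # one full run per quarter


def kill_condition_triggered(estimate_high_gbp, threshold_gbp=KILL_THRESHOLD_GBP):
    """Assumption: the gate fires when the high estimate reaches the threshold."""
    return estimate_high_gbp >= threshold_gbp


def annual_inference_cost_gbp(run_cost_gbp=RUN_COST_GBP, runs=RUNS_PER_YEAR):
    """Yearly LLM inference cost for the quarterly eval cadence."""
    return run_cost_gbp * runs


if __name__ == "__main__":
    high = HARNESS_ESTIMATE_GBP[1]
    print("kill triggered:", kill_condition_triggered(high))
    print("headroom (GBP):", KILL_THRESHOLD_GBP - high)
    print("annual inference (GBP):", round(annual_inference_cost_gbp(), 2))
    print("implied GBP per USD:", round(RUN_COST_GBP / RUN_COST_USD, 2))
```

Four runs at £2.26 come to £9.04, consistent with the ~£9 annual figure, and the high harness estimate leaves £7,500 of headroom under the threshold.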
Files created
- T-file: /home/richardd/off-github/library/projects/inherit/T-spike-nu-beta-D5-q018-inheritkitbench-eval-framework-2026-05-04.md
- Q-NU-014: /home/richardd/testatetech/docs-strategy/docs/superpowers/specs/2026-04-29-multi-phase-audit/current-questions/Q-NU-014-q018-inheritkitbench-eval-framework.md
- Closure bundle: /tmp/spike-nu-beta-D5/closure-bundle.md
Reactivation trigger for D5 v2.0
All of:
- LegalBench fixtures acquired (~/tools/inherit-spike-data/legalbench-fixtures/*.jsonl)
- IK SDK Phase-2 sprint 1 delivered
- 22-spike Spike 19 formally closed
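The all-of trigger above, together with the fixture acquisition it depends on, could be checked along the following lines. This is a hedged sketch, not the project's tooling: the helper names, the one-file-per-task layout, the "test" split, and the reading of "fixtures acquired" as "at least one .jsonl file present" are all assumptions. Only the dataset id, the three task names, and the fixtures path come from this note. Depending on the installed `datasets` version, loading this dataset may need extra options.

```python
import json
from pathlib import Path

# Path and task names come from this note; everything else is hypothetical.
FIXTURES_DIR = Path.home() / "tools" / "inherit-spike-data" / "legalbench-fixtures"
LEGALBENCH_TASKS = ("sara_numeric", "sara_entailment", "learned_hands_estates")


def fetch_task_rows(task, split="test"):
    """Load one LegalBench task. The split name is an assumption."""
    # Imported lazily so the rest of the module works without `datasets`.
    from datasets import load_dataset
    return load_dataset("nguha/legalbench", task)[split]


def write_jsonl(rows, path):
    """Write an iterable of dict rows as JSON Lines; return the row count."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count


def fixtures_acquired(fixtures_dir=FIXTURES_DIR):
    """Assumption: at least one .jsonl file counts as 'acquired'."""
    return any(Path(fixtures_dir).glob("*.jsonl"))


def d5_v2_ready(fixtures_dir, sdk_sprint1_delivered, spike19_closed):
    """All three reactivation conditions must hold."""
    return all((
        fixtures_acquired(fixtures_dir),
        sdk_sprint1_delivered,
        spike19_closed,
    ))


if __name__ == "__main__":
    # Needs network access and the `datasets` package.
    for task in LEGALBENCH_TASKS:
        write_jsonl(fetch_task_rows(task), FIXTURES_DIR / f"{task}.jsonl")
    print(d5_v2_ready(FIXTURES_DIR, sdk_sprint1_delivered=False, spike19_closed=False))
```

The SDK and Spike 19 conditions are passed in as plain booleans because this note does not say where their status is recorded.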
Proposed arch-state §13.2 row
| D5 | Q-NU-014 | outcome-VALIDATED-WITH-NOTE | 5-task IK eval framework; E1-E3 LegalBench design-time (fixtures pending); E4-E5 IK-SDK design-time (Phase-2+); Phase-5 harness £2-2.5K; kill NOT-MET | T-spike-nu-beta-D5-q018-inheritkitbench-eval-framework-2026-05-04.md |