Rule: When ANY theory is tabled — in a scorecard cell, lock decision, A-N amendment row, plan section, scoping discussion, or casual analysis — IF it is load-bearing on a downstream decision, schedule a ~½-1 day spike-validation IMMEDIATELY. Do NOT defer “test the theory” to Phase-1 build, Phase-2+ uplift, or “later” without an explicit reconsideration trigger.

Theories accumulate as latent decision-debt. Each untested theory carries some probability of being wrong, and if it is wrong, downstream commitments need reversal. The cost of that reversal scales with how many decisions have been chained on top of the untested theory.

Why: Per Rich-directive 2026-05-02T~14:30 BST: “i am enjoying running these spikes - it is good to test if theorys work - we need to test all theorys as soon as they are tabled.”

The first 6 hours of ζ-Q3 ε.ι derisking work (2026-05-02 BST) demonstrated the value:

| Spike | Theory tabled | Tested when | Outcome | If untested → cost |
| --- | --- | --- | --- | --- |
| S1 | “v6.6 is high-quality 80% SEED across 8+ jurisdictions” | Same morning as plan v1.0 commit | VALIDATED (depth-4.6/5; 21 dedicated jurisdictions; cost story revises DOWNWARD) | If untested + assumed-true: Phase-1 cost story locks +£40-80K higher than necessary; partner-pitch under-prepared |
| S2 | “schema-automator+funowl handles production OWL bulk-import” | ~2hr after S1 | KILL-CONDITION-MET (3 distinct funowl failure modes in 21s combined) | If untested: ε.ι Layer 1 locks + Phase-1 build hits the wall at first CCO ingest attempt; ~3wk wasted before discovery |
| S2.5 | “owlready2 is a viable alternative OWL loader” | Triggered by S2 MITIGATED + Rich-directive | VALIDATED (1.2s combined wall-clock for CCO+IAO+IOF; 28-class PoC end-to-end) | If untested: ε.ι Layer 1 collapses to ω.η pattern; loses distinguishing component |
| S3 | “LinkML annotations transfer through gen-pydantic + gen-typescript + gen-json-schema” | ~2hr after S2.5 | PARTIALLY-MITIGATED (3-of-5 layers PASS; YAML-as-canonical alternative validates Catala scope-binding) | If untested: ε.ι Layer 2 OntoUML standards-track narrative breaks at first codegen attempt |
| S2.10 | “Cedar Analysis works for INHERIT v2 inheritance-policy shapes” | Triggered by Rich-directive after #219 surfaced | IN FLIGHT 2026-05-02T~14:00 BST | If untested + auto-committed-as-task #219: 3wk Phase-1 build budget consumed before discovery of viability |

Theories tabled but NOT yet tested (post-2026-05-02T14:30 directive; this is the new flag-list):

| Theory | Where tabled | Load-bearing? | Spike candidate | Status |
| --- | --- | --- | --- | --- |
| owlready2 emitter scales to production CCO 1431 classes | S2.5 closure cascade | YES: without scale, ε.ι Layer 1 PoC doesn’t translate to Phase-1 | S2.6 ~1 day | NOT YET DISPATCHED |
| graph-RAG over v6.6 corpus is Phase-1 viable | ζ-Q2 ξ.+ A-130 + richard-task #209 | YES: load-bearing on ξ.+ aspirational distinctiveness | S2.9 ~1-2 days | NOT YET DISPATCHED |
| Catala formal-verification at “25-statute corpus” scope is Phase-1 viable | richard-task #220 | YES: load-bearing on Phase-1 scope decision | DEFERRED until S4+S5+S8 outcomes (per #220 disposition 2026-05-02) | DEFERRED with explicit trigger |
| OntoGPT-assisted authoring for per-jurisdiction refinements is Phase-1 viable | ε.ι Layer 4 + ξ.+ commitments | YES: load-bearing on ε.ι Layer 4 cost story | S4 ~2 days (UK&W NRB E2E pipeline) | NOT YET DISPATCHED |
| MLP partner-firm wants the format we plan to deliver | S10 scope | YES: load-bearing on partner-pitch + universal-production-pipeline-sequence STEP 3 | S10 (out-of-band) ~Mon 11 May 2026 | SCHEDULED |

How to apply:

  1. At theory-tabling time (scorecard cell / lock decision / amendment row / plan section / scoping discussion / casual analysis): apply a 60-second triage — is this LOAD-BEARING on a downstream decision? If yes:

    • Spike-able in ½-1 day: SCHEDULE the spike NOW. Add to current spike-suite or dispatch as standalone confrontation spike (per feedback_confront_richard_tasks_at_creation_time). Don’t auto-create a richard-task that defers the test.
    • Spike-able but >1 day: AUTHOR the spike scope inline (kill-condition + concrete next-action + reconsideration trigger). Schedule for next-available spike-suite wave. Don’t treat the theory as confirmed-by-default.
    • Not spike-able (e.g. partner-firm signal that requires conversation): SCHEDULE the test (e.g. partner conversation date) with explicit reconsideration trigger if test doesn’t happen by deadline.
  2. In aspirational uplifts (A-130-style multi-leg uplifts where each leg adds Phase-1 commitments): apply alternatives-first + theory-test-immediately to EACH leg, not the uplift as a whole. ξ.+ A-130 had 4 legs (Cedar Analysis + Catala formal-verification + standards-track + ICAIL paper); each was an untested theory; only Cedar got spike-tested at S2.10; the others got DEFERRED or DROPPED on confrontation.

  3. In post-spike cascade work: when a spike outcome surfaces a NEW theory (“X works at small scale” → “does X scale to production?”), schedule the scale-test as the NEXT spike (S{N+1}.5 or S{N}.6 convention). Don’t let scale-validation accumulate as a latent assumption.

  4. In retroactive review: when reviewing existing tasks/theories, apply the (a)/(b)/(c) test from feedback_confront_richard_tasks_at_creation_time PLUS the theory-immediacy test. Theories that are load-bearing AND deferred without trigger should be either spike-validated immediately OR explicitly dropped.

  5. In end-of-turn reporting: when a session reports outcomes, surface load-bearing theories that haven’t been tested. Don’t treat “I assumed X” as evidence; mark X as a theory needing a spike OR an explicit reconsideration trigger.
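
The 60-second triage in step 1 can be sketched as a small decision function. This is only an illustrative sketch: the `Theory` record, the `Action` names, and the `triage` function are hypothetical and do not exist in any project tooling. The 1-day threshold comes from the rule above; treating an unestimated spike as “>1 day” is an added assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical labels for the triage outcomes described in step 1."""
    NO_SPIKE_NEEDED = "not load-bearing; rule does not fire"
    SPIKE_NOW = "schedule the spike now"
    SCOPE_INLINE = "author spike scope inline; schedule for next spike-suite wave"
    SCHEDULE_TEST = "schedule the non-spike test with a reconsideration trigger"


@dataclass
class Theory:
    statement: str
    load_bearing: bool                 # is a downstream decision chained on it?
    spikeable: bool = True             # False e.g. for partner-firm signals
    est_days: Optional[float] = None   # estimated spike effort in days


def triage(theory: Theory) -> Action:
    """Map a freshly tabled theory to the action the rule prescribes."""
    if not theory.load_bearing:
        return Action.NO_SPIKE_NEEDED
    if not theory.spikeable:
        return Action.SCHEDULE_TEST
    # Assumption: an unknown estimate is treated as a >1 day spike.
    if theory.est_days is not None and theory.est_days <= 1.0:
        return Action.SPIKE_NOW
    return Action.SCOPE_INLINE


# Rows taken from the flag-list above, with their stated spike estimates.
examples = [
    Theory("owlready2 emitter scales to production CCO 1431 classes", True, True, 1.0),
    Theory("OntoGPT-assisted authoring is Phase-1 viable", True, True, 2.0),
    Theory("MLP partner-firm wants the format we plan to deliver", True, False),
]
for t in examples:
    print(f"{triage(t).name}: {t.statement}")
```

Run against those three rows, the sketch yields SPIKE_NOW, SCOPE_INLINE, and SCHEDULE_TEST respectively, matching the three bullets under step 1.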

Boundary tests (when this rule fires STRONGLY):

  • ✓ Theories about tool viability for a Phase-1 commitment (Cedar Analysis works / owlready2 reads CCO / graph-RAG retrieval works on v6.6)
  • ✓ Theories about cost / scope / scale (3wk fits Phase-1 budget / 25-statute corpus is right size / 1000+ classes scale)
  • ✓ Theories about ecosystem maturity (LinkML 1.10 production-stable / owlready2 maintenance-active / Cedar Analysis ecosystem)
  • ✓ Theories about partner-firm or external-stakeholder reception (MLP wants this format / OASIS TC accepts our charter)
  • ✓ Theories embedded in scorecard cells (criterion n scores 4/5 because of theory T)

Boundary tests (when this rule does NOT apply):

  • ✓ Theories about subjective preferences (Rich likes British English) — no objective spike possible
  • ✓ Theories about Year-2+ horizons (e.g. “DOLCE might mature by 2028”) — too far out for immediate spike; defer with explicit reconsideration trigger
  • ✓ Theories already empirically grounded by prior work (e.g. T-file evidence, library-grounded research) — re-spiking just to double-check is wasteful

Codification trigger: Rich-directive 2026-05-02T~14:30 BST: “i am enjoying running these spikes - it is good to test if theorys work - we need to test all theorys as soon as they are tabled.” Generalisation of feedback_confront_richard_tasks_at_creation_time from tasks to all theories. Validated by 6-hour pattern of S1+S2+S2.5+S3+S2.10 spikes — every spike either confirmed a load-bearing theory or productively killed a tool-specific theory while preserving synthesis.

Related memories:

  • feedback_confront_richard_tasks_at_creation_time.md — narrower rule for tasks; this rule generalises to theories
  • feedback_surface_alternatives_before_collapsing_synthesis_to_baseline.md — when a tested theory’s tool fails, surface alternatives BEFORE falling back to baseline
  • feedback_iri_verification_before_lock.md — narrower rule for IRI choices specifically
  • feedback_actively_use_t_files_in_scorecard_authoring.md — closely related: read evidence before authoring scorecards
  • feedback_universal_production_pipeline_sequence.md — related sequencing rule for jurisdiction-content authoring
  • feedback_logging_contract_closure_within_same_session.md — analogous “act immediately, not later” discipline applied to T-file authoring

Forward integration:

  • Refined-prompt v3.6 → v3.7 candidate: add Step 14 — “TEST THEORIES IMMEDIATELY WHEN TABLED — at theory-tabling time, apply 60-second triage; if load-bearing + spike-able in ½-1 day, schedule the spike NOW; don’t defer to richard-tasks; don’t treat untested theories as confirmed.”
  • Plan §1.7 cross-cutting disciplines — add this as 5th codified discipline alongside the existing four
  • Q-locking cascade: amendment row should cite spike evidence per load-bearing theory (or explicit “DEFERRED until trigger X” if spike not viable in-session)
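
The Q-locking check in the last bullet could be enforced mechanically. The following is a minimal sketch under stated assumptions: `TheoryCitation` and `uncited_theories` are hypothetical names, and the field layout is invented for illustration rather than taken from any existing amendment-row schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TheoryCitation:
    """Hypothetical record of one theory referenced by an amendment row."""
    theory: str
    load_bearing: bool
    spike_evidence: Optional[str] = None   # spike ID that tested it, e.g. "S2.5"
    deferred_until: Optional[str] = None   # explicit reconsideration trigger


def uncited_theories(citations: List[TheoryCitation]) -> List[str]:
    """Return load-bearing theories with neither spike evidence nor a trigger."""
    return [
        c.theory
        for c in citations
        if c.load_bearing and not (c.spike_evidence or c.deferred_until)
    ]


# Illustrative row built from entries in the tables above.
row = [
    TheoryCitation("owlready2 is a viable alternative OWL loader", True,
                   spike_evidence="S2.5"),
    TheoryCitation("Catala formal-verification at 25-statute scope is Phase-1 viable",
                   True, deferred_until="S4+S5+S8 outcomes"),
    TheoryCitation("graph-RAG over v6.6 corpus is Phase-1 viable", True),
]
print(uncited_theories(row))
```

Here only the graph-RAG theory is flagged, since it is load-bearing but has neither a spike result nor an explicit deferral trigger; an amendment row would be lockable only when the returned list is empty.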