feedback_test_theories_immediately_when_tabled

Rule: When ANY theory is tabled — in a scorecard cell, lock decision, A-N amendment row, plan section, scoping discussion, or casual analysis — IF it is load-bearing on a downstream decision, schedule a ~½-1 day spike-validation IMMEDIATELY. Do NOT defer “test the theory” to Phase-1 build, Phase-2+ uplift, or “later” without an explicit reconsideration trigger.

Theories accumulate as latent decision-debt. Each untested theory carries a probability that, if it turns out wrong, downstream commitments need reversal — and the cost of reversal scales with how many decisions have been chained on top of the untested theory.
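
As a rough illustration of how this debt compounds, the sketch below assumes a deliberately simple linear model: expected reversal cost is the probability the theory is wrong, times the number of decisions chained on it, times a flat rework cost per decision. The function name, the linear form, and every number are hypothetical; none of them are measurements from the spikes.

```python
def expected_reversal_cost(p_wrong: float, chained_decisions: int,
                           rework_days_per_decision: float) -> float:
    """Expected rework (in days) if an untested theory turns out wrong.

    Assumption: every decision chained on the theory is redone at a flat cost.
    Real reversals may scale non-linearly.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if chained_decisions < 0 or rework_days_per_decision < 0:
        raise ValueError("counts and costs must be non-negative")
    return p_wrong * chained_decisions * rework_days_per_decision


# Hypothetical numbers: the same theory, tested early vs. tested late.
early = expected_reversal_cost(0.25, chained_decisions=1, rework_days_per_decision=2.0)
late = expected_reversal_cost(0.25, chained_decisions=6, rework_days_per_decision=2.0)
print(early, late)
```

Under these assumed inputs the probability of being wrong is unchanged; the expected cost grows only because more decisions were chained on the theory before it was tested.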

Why: Per Rich-directive 2026-05-02T~14:30 BST: “i am enjoying running these spikes - it is good to test if theorys work - we need to test all theorys as soon as they are tabled.”

The first 6 hours of ζ-Q3 ε.ι derisking work (2026-05-02 BST) demonstrated the value:

| Spike | Theory tabled | Tested when | Outcome | If untested → cost |
| --- | --- | --- | --- | --- |
| S1 | “v6.6 is high-quality 80% SEED across 8+ jurisdictions” | Same morning as plan v1.0 commit | VALIDATED (depth-4.6/5; 21 dedicated jurisdictions; cost story revises DOWNWARD) | If untested + assumed-true: Phase-1 cost story locks +£40-80K higher than necessary; partner-pitch under-prepared |
| S2 | “schema-automator+funowl handles production OWL bulk-import” | ~2hr after S1 | KILL-CONDITION-MET (3 distinct funowl failure modes in 21s combined) | If untested: ε.ι Layer 1 locks + Phase-1 build hits the wall at first CCO ingest attempt; ~3wk wasted before discovery |
| S2.5 | “owlready2 is a viable alternative OWL loader” | Triggered by S2 MITIGATED + Rich-directive | VALIDATED (1.2s combined wall-clock for CCO+IAO+IOF; 28-class PoC end-to-end) | If untested: ε.ι Layer 1 collapses to ω.η pattern; loses distinguishing component |
| S3 | “LinkML annotations transfer through gen-pydantic + gen-typescript + gen-json-schema” | ~2hr after S2.5 | PARTIALLY-MITIGATED (3-of-5 layers PASS; YAML-as-canonical alternative validates Catala scope-binding) | If untested: ε.ι Layer 2 OntoUML standards-track narrative breaks at first codegen attempt |
| S2.10 | “Cedar Analysis works for INHERIT v2 inheritance-policy shapes” | Triggered by Rich-directive after #219 surfaced | IN FLIGHT 2026-05-02T~14:00 BST | If untested + auto-committed-as-task #219: 3wk Phase-1 build budget consumed before discovery of viability |

Theories tabled but NOT yet tested (post-2026-05-02T14:30 directive; this is the new flag-list):

| Theory | Where tabled | Load-bearing? | Spike candidate | Status |
| --- | --- | --- | --- | --- |
| owlready2 emitter scales to production CCO 1431 classes | S2.5 closure cascade | YES: without scale, ε.ι Layer 1 PoC doesn’t translate to Phase-1 | S2.6 ~1 day | NOT YET DISPATCHED |
| graph-RAG over v6.6 corpus is Phase-1 viable | ζ-Q2 ξ.+ A-130 + richard-task #209 | YES: load-bearing on ξ.+ aspirational distinctiveness | S2.9 ~1-2 days | NOT YET DISPATCHED |
| Catala formal-verification at “25-statute corpus” scope is Phase-1 viable | richard-task #220 | YES: load-bearing on Phase-1 scope decision | DEFERRED until S4+S5+S8 outcomes (per #220 disposition 2026-05-02) | DEFERRED with explicit trigger |
| OntoGPT-assisted authoring for per-jurisdiction refinements is Phase-1 viable | ε.ι Layer 4 + ξ.+ commitments | YES: load-bearing on ε.ι Layer 4 cost story | S4 ~2 days (UK&W NRB E2E pipeline) | NOT YET DISPATCHED |
| MLP partner-firm wants the format we plan to deliver | S10 scope | YES: load-bearing on partner-pitch + universal-production-pipeline-sequence STEP 3 | S10 (out-of-band) ~Mon 11 May 2026 | SCHEDULED |

How to apply:

At theory-tabling time (scorecard cell / lock decision / amendment row / plan section / scoping discussion / casual analysis): apply a 60-second triage — is this LOAD-BEARING on a downstream decision? If yes:
- Spike-able in ½-1 day: SCHEDULE the spike NOW. Add to current spike-suite or dispatch as standalone confrontation spike (per feedback_confront_richard_tasks_at_creation_time). Don’t auto-create a richard-task that defers the test.
- Spike-able but >1 day: AUTHOR the spike scope inline (kill-condition + concrete next-action + reconsideration trigger). Schedule for next-available spike-suite wave. Don’t treat the theory as confirmed-by-default.
- Not spike-able (e.g. partner-firm signal that requires conversation): SCHEDULE the test (e.g. partner conversation date) with explicit reconsideration trigger if test doesn’t happen by deadline.
In aspirational uplifts (A-130-style multi-leg uplifts where each leg adds Phase-1 commitments): apply alternatives-first + theory-test-immediately to EACH leg, not the uplift as a whole. ξ.+ A-130 had 4 legs (Cedar Analysis + Catala formal-verification + standards-track + ICAIL paper); each was an untested theory; only Cedar got spike-tested at S2.10; the others got DEFERRED or DROPPED on confrontation.
In post-spike cascade work: when a spike outcome surfaces a NEW theory (“X works at small scale” → “does X scale to production?”), schedule the scale-test as the NEXT spike (S{N+1}.5 or S{N}.6 convention). Don’t let scale-validation accumulate as a latent assumption.
In retroactive review: when reviewing existing tasks/theories, apply the (a)/(b)/(c) test from feedback_confront_richard_tasks_at_creation_time PLUS the theory-immediacy test. Theories that are load-bearing AND deferred without trigger should be either spike-validated immediately OR explicitly dropped.
In end-of-turn reporting: when a session reports outcomes, surface load-bearing theories that haven’t been tested. Don’t treat “I assumed X” as evidence; mark X as a theory needing a spike OR an explicit reconsideration trigger.
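
The theory-tabling triage described above can be sketched as a small decision function. This is a minimal illustration, not tooling that exists in the project: the `Theory` and `Action` types, the field names, and the choice to encode "not spike-able" as `spike_days=None` are all assumptions made for the example. The 1-day threshold follows the "½-1 day" wording of the rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NO_SPIKE_REQUIRED = "not load-bearing; rule does not fire"
    SCHEDULE_SPIKE_NOW = "schedule the spike now"
    AUTHOR_SCOPE_INLINE = "author spike scope inline; schedule for next spike-suite wave"
    SCHEDULE_EXTERNAL_TEST = "schedule the test with an explicit reconsideration trigger"


@dataclass
class Theory:
    statement: str
    load_bearing: bool
    spike_days: Optional[float]  # estimated spike effort; None means not spike-able


def triage(theory: Theory) -> Action:
    """Map a freshly tabled theory to the next action (hypothetical helper)."""
    if not theory.load_bearing:
        return Action.NO_SPIKE_REQUIRED
    if theory.spike_days is None:
        # e.g. a partner-firm signal that requires a conversation
        return Action.SCHEDULE_EXTERNAL_TEST
    if theory.spike_days <= 1.0:
        return Action.SCHEDULE_SPIKE_NOW
    return Action.AUTHOR_SCOPE_INLINE


# Effort estimates taken from the flag-list above.
scale = Theory("owlready2 emitter scales to production CCO 1431 classes", True, 1.0)
ontogpt = Theory("OntoGPT-assisted authoring is Phase-1 viable", True, 2.0)
partner = Theory("MLP partner-firm wants the format we plan to deliver", True, None)

for t in (scale, ontogpt, partner):
    print(t.statement, "->", triage(t).name)
```

Returning an explicit action for every branch, including the not-load-bearing one, keeps an untested theory from being silently treated as confirmed-by-default.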

Boundary tests (when this rule fires STRONGLY):

✓ Theories about tool viability for a Phase-1 commitment (Cedar Analysis works / owlready2 reads CCO / graph-RAG retrieval works on v6.6)
✓ Theories about cost / scope / scale (3wk fits Phase-1 budget / 25-statute corpus is right size / 1000+ classes scale)
✓ Theories about ecosystem maturity (LinkML 1.10 production-stable / owlready2 maintenance-active / Cedar Analysis ecosystem)
✓ Theories about partner-firm or external-stakeholder reception (MLP wants this format / OASIS TC accepts our charter)
✓ Theories embedded in scorecard cells (criterion n scores 4/5 because of theory T)

Boundary tests (when this rule does NOT apply):

✓ Theories about subjective preferences (Rich likes British English) — no objective spike possible
✓ Theories about Year-2+ horizons (e.g. “DOLCE might mature by 2028”) — too far out for immediate spike; defer with explicit reconsideration trigger
✓ Theories already empirically grounded by prior work (e.g. T-file evidence, library-grounded research) — re-spiking just to double-check is wasteful

Codification trigger: Rich-directive 2026-05-02T~14:30 BST: “i am enjoying running these spikes - it is good to test if theorys work - we need to test all theorys as soon as they are tabled.” Generalisation of feedback_confront_richard_tasks_at_creation_time from tasks to all theories. Validated by 6-hour pattern of S1+S2+S2.5+S3+S2.10 spikes — every spike either confirmed a load-bearing theory or productively killed a tool-specific theory while preserving synthesis.

Related memories:

feedback_confront_richard_tasks_at_creation_time.md — narrower rule for tasks; this rule generalises to theories
feedback_surface_alternatives_before_collapsing_synthesis_to_baseline.md — when a tested theory’s tool fails, surface alternatives BEFORE falling back to baseline
feedback_iri_verification_before_lock.md — narrower rule for IRI choices specifically
feedback_actively_use_t_files_in_scorecard_authoring.md — closely related: read evidence before authoring scorecards
feedback_universal_production_pipeline_sequence.md — related sequencing rule for jurisdiction-content authoring
feedback_logging_contract_closure_within_same_session.md — analogous “act immediately, not later” discipline applied to T-file authoring

Forward integration:

Refined-prompt v3.6 → v3.7 candidate: add Step 14 — “TEST THEORIES IMMEDIATELY WHEN TABLED — at theory-tabling time, apply 60-second triage; if load-bearing + spike-able in ½-1 day, schedule the spike NOW; don’t defer to richard-tasks; don’t treat untested theories as confirmed.”
Plan §1.7 cross-cutting disciplines: add this as the 5th codified discipline alongside the existing four
Q-locking cascade: amendment row should cite spike evidence per load-bearing theory (or explicit “DEFERRED until trigger X” if spike not viable in-session)
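
The Q-locking requirement above could be checked mechanically along the following lines. This is a sketch under stated assumptions: the `TheoryCitation` record and its field names are hypothetical and do not describe an existing amendment-row schema. The example entries reuse theories and statuses from the tables earlier in this note.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TheoryCitation:
    theory: str
    load_bearing: bool
    spike_evidence: Optional[str] = None    # e.g. a spike ID such as "S2.5"
    deferral_trigger: Optional[str] = None  # e.g. "S4+S5+S8 outcomes"


def uncited_theories(row: List[TheoryCitation]) -> List[str]:
    """Return load-bearing theories that cite neither spike evidence nor a deferral trigger."""
    return [
        c.theory
        for c in row
        if c.load_bearing and not (c.spike_evidence or c.deferral_trigger)
    ]


amendment_row = [
    TheoryCitation("owlready2 is a viable alternative OWL loader", True,
                   spike_evidence="S2.5"),
    TheoryCitation("Catala formal-verification at 25-statute corpus scope is Phase-1 viable",
                   True, deferral_trigger="S4+S5+S8 outcomes"),
    TheoryCitation("graph-RAG over v6.6 corpus is Phase-1 viable", True),
]

print(uncited_theories(amendment_row))
```

A non-empty result would mean the row is not ready to lock: each listed theory needs either a spike or an explicit "DEFERRED until trigger X" entry.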
