feedback v3.12 release — 9 improvements bundled per Q-022 /review-plan findings

Locked: 2026-05-05T08:38 BST. The v3.11 → v3.12 directive amendment landed at docs/superpowers/specs/2026-04-29-multi-phase-audit/refined-end-of-turn-directive.md.

Trigger: Parallel session 2026-05-05T07:30 BST framed ζ-Q22 Phase-1 gate sequence dependency under refined-prompt v3.11, ran 3 spikes (S1 lock-BLOCKING gate-DAG analysis + D1 Bazel cost research + D2 remote-cache benchmark), and surfaced 9 specific weaknesses in v3.11 itself. Q-022 lock paused; this v3.12 amendment lands BEFORE Q-022’s final answer to keep the directive honest.

Why this is feedback, not just directive text

The 9 improvements address meta-failure modes that occurred under v3.11 in production. Codifying as feedback (auto-loaded discipline) ensures:

Future Claude sessions internalise WHY each improvement exists, not just WHAT it says.
If a future session is tempted to skip pre-flight, excuse missing substrate via intellectual-honesty, optimistically estimate reversal cost, or pattern-match to consecutive-bold-synthesis, the discipline-memory fires AHEAD of the directive’s procedural rule.
The 9 improvements together represent a coherent enforcement-and-honesty package; treating them as separate disciplines would dilute the package narrative.

The 9 improvements

| # | Discipline | Where it lands | Severity | Failure mode it prevents |
|---|---|---|---|---|
| 1 | Substrate pre-flight gate | step 6 (new sub-paragraph after intro) | MED enforcement | Substrate-skip masquerading as intellectual-honesty (Q-022 ran with 2 web tracks + S1 codebase analysis — no halt) |
| 2 | Option commitment classification (DISCIPLINE vs CONCRETE-ARTEFACT-AUTHORING) | step 4d (new) | MED enforcement | Phase-leak via option commitments straddling phase boundaries (Q-022 σ̃.η committed Phase-5 artefact-authoring under Phase-2 lock) |
| 3 | Symmetric ÷2 perturbation on most-relied-on criterion | step 7 (extended sensitivity perturbations bullet) | MED structural | Single-criterion over-reliance hidden by ×2-amplification asymmetry in 8-perturbation default |
| 4 | Reversal-cost realism guard (deployed-state / DD-commitments / consumer-impacts) | step 7 (extended Reversal cost bullet) | MED honesty | Optimistic reversal-cost claims; £0 marginal claims used to look cheap when deployed infra makes them costly |
| 5 | NEW INFERENCE-NOTE TRANSIENT sub-mode (`outcome-VALIDATED-WITH-INFERENCE-NOTE`) | step 6f maturity vocabulary (2 → 3 sub-modes) | MED integrity | Inferred-vs-cited substrate orphaning verification follow-up (Q-022 S1 returned VALIDATED with gate names INFERRED rather than cited) |
| 6 | sub_clarifications_locked discipline codification (composite options must be expressible as named-key dictionary) | step 7 (new bullet) | MED codification | Composite-option lock-time ambiguity from emergent-but-uncodified discipline (Q-006 onward locked sub_clarifications without formal directive requirement) |
| 7 | N≥5 consecutive-bold-synthesis counter-pattern check | step 9 (extended addendum) | MED anti-bias | Bold-synthesis pattern-match bias (TWELVE consecutive bold-synthesis locks Q-006..Q-021 passed without explicit counter-pattern check) |
| 8 | R3a/R3b ramp-up-vs-steady-state cost split | step 7 (extended R3 row) | LOW-MED honesty | Build-amortised vs steady-state cost conflation (Q-022 D2 found σ̃.η R3 over-estimated 30-100×) |
| 9 | Phase-N+1 uplift trigger 3-field codification (metric / cadence / artefact) | step 7 (new bullet) | LOW completeness | Vague Phase-N+1 uplift triggers without metric + cadence + artefact spec |
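Improvement #3 is easier to see with a worked case. The Python below is a minimal sketch, not the directive's procedure: the weighted-sum scoring model, the criterion names, the weights, the scores, and every function name are assumptions made for illustration. It shows why a ×2-only perturbation can hide over-reliance on one criterion while the matching ÷2 exposes it.

```python
def weighted_score(scores, weights):
    """Weighted sum of criterion scores (an assumed scoring model)."""
    return sum(scores[c] * w for c, w in weights.items())


def most_relied_on(scores, weights):
    """Criterion contributing the largest weighted share to this option's total."""
    return max(weights, key=lambda c: scores[c] * weights[c])


def symmetric_perturbation(options, weights, recommended):
    """Re-rank all options with the top criterion's weight doubled and halved."""
    criterion = most_relied_on(options[recommended], weights)
    winners = {}
    for label, factor in (("x2", 2.0), ("div2", 0.5)):
        perturbed = dict(weights)
        perturbed[criterion] = weights[criterion] * factor
        winners[label] = max(
            options, key=lambda name: weighted_score(options[name], perturbed)
        )
    return criterion, winners


# Hypothetical scorecard: two options scored on three criteria.
WEIGHTS = {"c1": 3, "c2": 1, "c3": 1}
OPTIONS = {
    "A": {"c1": 5, "c2": 1, "c3": 1},
    "B": {"c1": 2, "c2": 5, "c3": 5},
}

criterion, winners = symmetric_perturbation(OPTIONS, WEIGHTS, recommended="A")
print(criterion, winners)
```

With these made-up numbers, option A wins at baseline (17 vs 16) and still wins under ×2 on c1, but loses to B under ÷2. A perturbation set that only amplifies c1 would report the recommendation as stable.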

Why this discipline pattern works

Why: v3.11 was a tier-collapse (a single specific change with substantial restructure implications). Production use over 13 consecutive locks (Q-006..Q-021) validated the uniform default empirically, but also exposed enforcement, honesty, and structural gaps that v3.11 didn’t gate. Bundling the fixes as v3.12 (rather than v3.11.1) reflects that they coherently address “v3.11’s gaps under production use” as a single narrative.

How to apply: when authoring or reviewing a Q-framing under v3.12+, run the 9 disciplines INTEGRATED, not as a checklist:

Improvement #1 (substrate pre-flight) fires BEFORE step 7 framing; if it halts, complete substrate then re-enter.
Improvement #2 (option commitment classification) fires DURING option authoring, before scoring.
Improvements #3, #4, #6, #8, #9 fire DURING scorecard authoring as quality checks.
Improvement #5 fires DURING spike-prompt authoring + at outcome-rating time.
Improvement #7 fires AFTER scoring + recommendation, BEFORE final reasoning summary.
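The firing points above can be written down as a lookup, which makes it easy to confirm that all 9 improvements are attached to some point in the flow. This is only a sketch: the stage labels and function names are hypothetical, and the dictionary is not a claim about the relative order of spike-prompt authoring and the other stages, which the list above does not specify.

```python
# Stage labels are hypothetical; improvement numbers follow the list of 9.
FIRES_AT = {
    "before_step_7_framing": [1],
    "option_authoring_before_scoring": [2],
    "scorecard_authoring": [3, 4, 6, 8, 9],
    "spike_prompt_authoring": [5],
    "outcome_rating": [5],
    "after_recommendation_before_summary": [7],
}


def improvements_for(stage):
    """Improvements that fire at the given stage."""
    return FIRES_AT[stage]


def stages_for(improvement):
    """Stages at which a given improvement fires (some fire more than once)."""
    return [stage for stage, nums in FIRES_AT.items() if improvement in nums]


def uncovered(total=9):
    """Improvement numbers that are not attached to any stage."""
    fired = {n for nums in FIRES_AT.values() for n in nums}
    return sorted(set(range(1, total + 1)) - fired)


print(uncovered())
print(stages_for(5))
```

Improvement #5 is the only one attached to two stages, which matches its description: it shapes the spike prompt and is applied again when the outcome is rated.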

What this discipline does NOT do

Does NOT alter v3.11’s Harvard-depth uniform default (load-bearing; recently locked).
Does NOT alter step 6 sub-letters 6a-6g (verified clean; preserved).
Does NOT bundle F4 (closing-question boundary test) or F6 (research-completeness scorecard) — both deferred to v3.13 because their specs are still unwritten.
Does NOT retroactively re-rate prior locks Q-001..Q-021 — they stand under v3.6..v3.11 as recorded; v3.12 applies prospectively.

Empirical evidence cited inline (in directive §3 v3.12 evolution history entry)

Q-022 framing under v3.11 ran with 2 web tracks + S1 codebase analysis; pre-flight halt did not fire because v3.11 had no formal pre-flight gate. → improvement #1.
Q-022 σ̃.η option committed to “master plan §15 amendment + per-gate fitness-function manifesto” under a Phase-2 lock — both Phase-5 territory. → improvement #2.
Q-022 8-perturbation set was structurally biased toward ×2 amplification (only c8 ÷2). → improvement #3.
Q-022 σ̃.η→σ̃.γ degrade estimate of “3-5 days” was implausibly optimistic given Phase-1 deployed Bazel + acquirer-DD §3.4 + partner-firm SDK consumer impacts. → improvement #4.
Q-022 S1 spike returned VALIDATED on lock-BLOCKING gate-DAG analysis but gates 1-22 baseline names were INFERRED rather than directly cited from canonical A-21 §13 row enumeration. → improvement #5.
Q-006 onward locked sub_clarifications blocks, but the directive didn’t formally require them. → improvement #6.
TWELVE consecutive bold-synthesis locks Q-006..Q-021 passed without explicit counter-pattern check. → improvement #7.
Q-022 D2 spike found σ̃.η R3 over-estimated 30-100× because the original framing bundled build + run into single £K/yr figure. → improvement #8.
Phase-1.5+ uplift triggers in Q-013 / Q-015 / Q-019 lacked formal 3-field minimum (metric / cadence / artefact). → improvement #9.
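The cost conflation behind improvement #8 is simple arithmetic, sketched below. The class name, field names, and the figures are hypothetical and are not the Q-022 D2 numbers; the sketch only assumes that the bundled figure quoted a one-off build cost as if it recurred every year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class R3Split:
    r3a_ramp_up: float       # one-off build / ramp-up cost, in £K (hypothetical)
    r3b_steady_state: float  # recurring run cost, in £K/yr (hypothetical)

    def bundled_per_year(self):
        """The conflated figure: build + run quoted as a single £K/yr number."""
        return self.r3a_ramp_up + self.r3b_steady_state

    def overestimate_factor(self):
        """How far the bundled figure overstates the true steady-state cost."""
        if self.r3b_steady_state <= 0:
            raise ValueError("steady-state cost must be positive")
        return self.bundled_per_year() / self.r3b_steady_state


# Illustrative numbers only.
example = R3Split(r3a_ramp_up=60.0, r3b_steady_state=2.0)
print(example.bundled_per_year(), example.overestimate_factor())
```

Reporting R3a and R3b as separate rows keeps the one-off cost visible without letting it inflate the recurring figure that options are compared on.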

What this means for future Q-framings

When Rich pastes the v3.12+ refined-prompt at the close of a Q-locking answer:

Step 6 ends with a “Substrate Pre-Flight” 3-line block BEFORE the Research Sweep Summary table. If this block is missing, that’s a v3.12 violation.
The step 7 SOURCE-ATTRIBUTION TABLE is followed by the existing anti-bias scan, trade-off examination, composability, and cost decomposition, THEN by step 4d’s option commitment classification check (NEW in v3.12).
Each option’s scorecard carries cost rows R1, R2, R3 (or R3a + R3b if the split applies), R4, R5, and R6.
Each option’s reversal cost row is scrutinised against deployed-state / DD-commitments / consumer-impacts.
The 8-perturbation set includes ≥1 symmetric ÷2 perturbation on the most-relied-on criterion.
If recommendation is composite (≥3 components OR ≥2 substrates), sub_clarifications_locked: block is pre-authored with named keys.
If recommendation includes Phase-N+1 uplift triggers, each trigger has metric + cadence + artefact spec.
If THIS lock would be the Nth consecutive bold-synthesis lock with N ≥ 5, step 9 reasoning explicitly runs counter-pattern check.
Step 6f maturity vocabulary table now lists 3 TRANSIENT sub-modes (METHODOLOGICAL-SUBSTITUTION + PROVISIONING-NOTE + INFERENCE-NOTE).
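Three of the checks above are mechanical enough to sketch as a validator: the composite test (≥3 components OR ≥2 substrates), the 3-field minimum on uplift triggers, and the N ≥ 5 consecutive bold-synthesis check. The record layout, key names, and example values below are hypothetical; the directive does not define a machine-readable lock format in this note.

```python
REQUIRED_TRIGGER_FIELDS = ("metric", "cadence", "artefact")


def is_composite(n_components, n_substrates):
    """Composite means >=3 components OR >=2 substrates."""
    return n_components >= 3 or n_substrates >= 2


def check_lock(lock):
    """Return a list of violations for a hypothetical lock record."""
    violations = []

    # Composite recommendations need a non-empty named-key dictionary.
    if is_composite(lock["components"], lock["substrates"]):
        block = lock.get("sub_clarifications_locked")
        if not isinstance(block, dict) or not block:
            violations.append("composite option lacks sub_clarifications_locked keys")

    # Every uplift trigger needs metric + cadence + artefact.
    for i, trigger in enumerate(lock.get("uplift_triggers", [])):
        missing = [f for f in REQUIRED_TRIGGER_FIELDS if not trigger.get(f)]
        if missing:
            violations.append(f"uplift trigger {i} missing: {', '.join(missing)}")

    # This lock would be the Nth consecutive bold-synthesis lock.
    n = lock.get("prior_consecutive_bold", 0) + 1
    if lock.get("bold_synthesis") and n >= 5 and not lock.get("counter_pattern_checked"):
        violations.append(f"N={n} bold-synthesis lock without counter-pattern check")

    return violations


# Hypothetical record that trips all three checks.
EXAMPLE_LOCK = {
    "components": 3,
    "substrates": 1,
    "sub_clarifications_locked": {},
    "uplift_triggers": [{"metric": "cache hit rate", "cadence": "quarterly"}],
    "bold_synthesis": True,
    "prior_consecutive_bold": 4,
    "counter_pattern_checked": False,
}

for violation in check_lock(EXAMPLE_LOCK):
    print(violation)
```

The remaining items in the list (pre-flight block placement, reversal-cost scrutiny, the symmetric ÷2 perturbation) depend on judgement about content and are not covered by this sketch.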

Substrate documents

Directive (v3.12): docs/superpowers/specs/2026-04-29-multi-phase-audit/refined-end-of-turn-directive.md
Predecessor memory: project_refined_prompt_v3_11_released_harvard_uniform_2026_05_05.md (this dir)
Predecessor discipline memory: feedback_harvard_depth_uniform_default_for_all_questions.md (this dir; v3.11)

CHANGELOG

2026-05-05T08:38 — feedback memory authored at v3.12 release. Codifies the 9 improvements as feedback discipline (not just directive text) so future sessions internalise WHY each improvement exists. Composes with v3.11’s feedback_harvard_depth_uniform_default_for_all_questions — v3.11 set Harvard-depth as uniform substrate; v3.12 hardens enforcement / honesty / structural gates around it.
