feedback v3.12 release — 9 improvements bundled per Q-022 /review-plan findings
Locked: 2026-05-05T08:38 BST. Directive v3.11 → v3.12 amendment landed at docs/superpowers/specs/2026-04-29-multi-phase-audit/refined-end-of-turn-directive.md v3.12.
Trigger: Parallel session 2026-05-05T07:30 BST framed ζ-Q22 Phase-1 gate sequence dependency under refined-prompt v3.11, ran 3 spikes (S1 lock-BLOCKING gate-DAG analysis + D1 Bazel cost research + D2 remote-cache benchmark), and surfaced 9 specific weaknesses in v3.11 itself. Q-022 lock paused; this v3.12 amendment lands BEFORE Q-022’s final answer to keep the directive honest.
Why this is feedback, not just directive text
The 9 improvements address meta-failure modes that occurred under v3.11 in production. Codifying as feedback (auto-loaded discipline) ensures:
- Future Claude sessions internalise WHY each improvement exists, not just WHAT it says.
- If a future session is tempted to skip pre-flight, excuse missing substrate via intellectual-honesty, optimistically estimate reversal cost, or pattern-match to consecutive-bold-synthesis, the discipline-memory fires AHEAD of the directive’s procedural rule.
- The 9 improvements together represent a coherent enforcement-and-honesty package; treating them as separate disciplines would dilute the package narrative.
The 9 improvements
| # | Discipline | Where it lands | Severity | Failure mode it prevents |
|---|---|---|---|---|
| 1 | Substrate pre-flight gate | step 6 (new sub-paragraph after intro) | MED enforcement | Substrate-skip masquerading as intellectual-honesty (Q-022 ran with 2 web tracks + S1 codebase analysis — no halt) |
| 2 | Option commitment classification (DISCIPLINE vs CONCRETE-ARTEFACT-AUTHORING) | step 4d (new) | MED enforcement | Phase-leak via option commitments straddling phase boundaries (Q-022 σ̃.η committed Phase-5 artefact-authoring under Phase-2 lock) |
| 3 | Symmetric ÷2 perturbation on most-relied-on criterion | step 7 (extended sensitivity perturbations bullet) | MED structural | Single-criterion over-reliance hidden by ×2-amplification asymmetry in 8-perturbation default |
| 4 | Reversal-cost realism guard (deployed-state / DD-commitments / consumer-impacts) | step 7 (extended Reversal cost bullet) | MED honesty | Optimistic reversal-cost claims, e.g. "£0 marginal" figures that make a reversal look cheap when deployed infra makes it costly |
| 5 | NEW INFERENCE-NOTE TRANSIENT sub-mode (outcome-VALIDATED-WITH-INFERENCE-NOTE) | step 6f maturity vocabulary (2 → 3 sub-modes) | MED integrity | Inferred substrate rated as if cited, leaving the verification follow-up orphaned (Q-022 S1 returned VALIDATED with gate names INFERRED rather than cited) |
| 6 | sub_clarifications_locked discipline codification (composite options must be expressible as named-key dictionary) | step 7 (new bullet) | MED codification | Composite-option lock-time ambiguity from emergent-but-uncodified discipline (Q-006 onward locked sub_clarifications without formal directive requirement) |
| 7 | N≥5 consecutive-bold-synthesis counter-pattern check | step 9 (extended addendum) | MED anti-bias | Bold-synthesis pattern-match bias (TWELVE consecutive bold-synthesis locks Q-006..Q-021 passed without explicit counter-pattern check) |
| 8 | R3a/R3b ramp-up-vs-steady-state cost split | step 7 (extended R3 row) | LOW-MED honesty | Build-amortised vs steady-state cost conflation (Q-022 D2 found σ̃.η R3 over-estimated 30-100×) |
| 9 | Phase-N+1 uplift trigger 3-field codification (metric / cadence / artefact) | step 7 (new bullet) | LOW completeness | Vague Phase-N+1 uplift triggers without metric + cadence + artefact spec |
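Improvement #3 in the table above can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the directive's scoring method: the weighted-sum scorecard, the criterion names (`c1`..`c3`), the weights, and the scores are all hypothetical. It shows how a ÷2 perturbation on the most-relied-on criterion can flip a ranking that the ×2 perturbation leaves intact.

```python
def weighted_total(scores, weights):
    """Weighted-sum score for one option (assumed scoring model)."""
    return sum(scores[c] * weights[c] for c in weights)


def most_relied_on(scores, weights):
    """Criterion contributing the largest weighted share to an option's total."""
    return max(weights, key=lambda c: scores[c] * weights[c])


def symmetric_perturbations(options, weights, winner):
    """Re-rank options with the winner's most-relied-on weight at x2 and at /2."""
    crit = most_relied_on(options[winner], weights)
    results = {}
    for label, factor in (("x2", 2.0), ("div2", 0.5)):
        perturbed = dict(weights)
        perturbed[crit] = weights[crit] * factor
        results[label] = max(
            options, key=lambda o: weighted_total(options[o], perturbed)
        )
    return crit, results


# Hypothetical scorecard: option A leans heavily on criterion c1.
weights = {"c1": 3.0, "c2": 1.0, "c3": 1.0}
options = {
    "A": {"c1": 9, "c2": 4, "c3": 4},
    "B": {"c1": 6, "c2": 8, "c3": 8},
}

baseline = max(options, key=lambda o: weighted_total(options[o], weights))
crit, results = symmetric_perturbations(options, weights, baseline)
print(baseline, crit, results)
```

In this made-up scorecard, A wins at baseline and under ×2, but B wins once the weight on `c1` is halved. A perturbation set that only amplifies would never surface that dependence.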
Why this discipline pattern works
Why: v3.11 was a tier-collapse (single specific change with substantial restructure implications); production use over 13 consecutive locks (Q-006..Q-021) validated the uniform default empirically, but also exposed enforcement / honesty / structural gaps that v3.11 didn’t gate. Bundling the 9 improvements as v3.12 (rather than v3.11.1) reflects that they coherently address “v3.11’s gaps under production use” as a single narrative.
How to apply: when authoring or reviewing a Q-framing under v3.12+, run the 9 disciplines INTEGRATED, not as a checklist:
- Improvement #1 (substrate pre-flight) fires BEFORE step 7 framing; if it halts, complete substrate then re-enter.
- Improvement #2 (option commitment classification) fires DURING option authoring, before scoring.
- Improvements #3, #4, #6, #8, #9 fire DURING scorecard authoring as quality checks.
- Improvement #5 fires DURING spike-prompt authoring + at outcome-rating time.
- Improvement #7 fires AFTER scoring + recommendation, BEFORE final reasoning summary.
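The firing points listed above can be held as a simple lookup, sketched here in Python. The stage labels are hypothetical names invented for this sketch; only the improvement numbers and their grouping come from the list. No ordering between stages is implied beyond what the list itself states.

```python
# Hypothetical stage labels; improvement numbers are taken from the list above.
FIRES_AT = {
    "before_step7_framing": [1],
    "option_authoring_before_scoring": [2],
    "scorecard_authoring": [3, 4, 6, 8, 9],
    "spike_prompt_authoring_and_outcome_rating": [5],
    "after_recommendation_before_final_summary": [7],
}


def disciplines_for(stage):
    """Improvements expected to fire at a given stage (empty list if unknown)."""
    return FIRES_AT.get(stage, [])


def unassigned(total=9):
    """Improvement numbers not mapped to any stage; should be empty."""
    assigned = {n for nums in FIRES_AT.values() for n in nums}
    return sorted(set(range(1, total + 1)) - assigned)


print(disciplines_for("scorecard_authoring"))
print(unassigned())
```

The `unassigned` helper is only a sanity check that all nine improvements have a firing point.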
What this discipline does NOT do
- Does NOT alter v3.11’s Harvard-depth uniform default (load-bearing; recently locked).
- Does NOT alter step 6 sub-letters 6a-6g (verified clean; preserved).
- Does NOT bundle F4 (closing-question boundary test) or F6 (research-completeness scorecard) — both deferred to v3.13 because their specs are still unwritten.
- Does NOT retroactively re-rate prior locks Q-001..Q-021 — they stand under v3.6..v3.11 as recorded; v3.12 applies prospectively.
Empirical evidence cited inline (in directive §3 v3.12 evolution history entry)
- Q-022 framing under v3.11 ran with 2 web tracks + S1 codebase analysis; pre-flight halt did not fire because v3.11 had no formal pre-flight gate. → improvement #1.
- Q-022 σ̃.η option committed to “master plan §15 amendment + per-gate fitness-function manifesto” under a Phase-2 lock — both Phase-5 territory. → improvement #2.
- Q-022 8-perturbation set was structurally biased toward ×2 amplification (only c8 ÷2). → improvement #3.
- Q-022 σ̃.η→σ̃.γ degrade estimate of “3-5 days” was implausibly optimistic given Phase-1 deployed Bazel + acquirer-DD §3.4 + partner-firm SDK consumer impacts. → improvement #4.
- Q-022 S1 spike returned VALIDATED on lock-BLOCKING gate-DAG analysis but gates 1-22 baseline names were INFERRED rather than directly cited from canonical A-21 §13 row enumeration. → improvement #5.
- Q-006 onward locked sub_clarifications blocks but the directive didn’t formally require it. → improvement #6.
- TWELVE consecutive bold-synthesis locks Q-006..Q-021 passed without explicit counter-pattern check. → improvement #7.
- Q-022 D2 spike found σ̃.η R3 over-estimated 30-100× because the original framing bundled build + run into single £K/yr figure. → improvement #8.
- Phase-1.5+ uplift triggers in Q-013 / Q-015 / Q-019 lacked formal 3-field minimum (metric / cadence / artefact). → improvement #9.
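The build-versus-run conflation behind improvement #8 can be shown with a short Python sketch. All figures below are hypothetical and are not the Q-022 D2 numbers; the class and field names are assumptions made for illustration. The sketch separates a one-off ramp-up cost (R3a) from a recurring steady-state cost (R3b) and compares that with a single bundled annual figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class R3Cost:
    ramp_up_gbp: float          # R3a: one-off build / ramp-up cost
    steady_gbp_per_year: float  # R3b: recurring steady-state cost

    def bundled_per_year(self):
        """Conflated figure: year-1 total presented as if it recurred annually."""
        return self.ramp_up_gbp + self.steady_gbp_per_year

    def total_over(self, years):
        """Split figure: pay ramp-up once, steady-state every year."""
        return self.ramp_up_gbp + self.steady_gbp_per_year * years

    def overestimate_factor(self, years):
        """How far the bundled figure overstates cost over the horizon."""
        return (self.bundled_per_year() * years) / self.total_over(years)


# Hypothetical option: expensive to build, cheap to run.
cost = R3Cost(ramp_up_gbp=9_000, steady_gbp_per_year=1_000)
print(cost.bundled_per_year())              # bundled annual figure
print(cost.total_over(5))                   # split total over 5 years
print(round(cost.overestimate_factor(5), 2))
```

The size of the over-estimate depends on the horizon and on the ratio of ramp-up to steady-state cost, which is why the split is worth recording as two rows.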
What this means for future Q-framings
When Rich pastes the v3.12+ refined-prompt at the close of a Q-locking answer:
- Step 6 ends with a “Substrate Pre-Flight” 3-line block BEFORE the Research Sweep Summary table. If this block is missing, that’s a v3.12 violation.
- Step 7 SOURCE-ATTRIBUTION TABLE is followed by anti-bias scan + trade-off examination + composability + cost decomposition (existing) THEN by step 4d’s option commitment classification check (NEW v3.12).
- Each option’s scorecard row carries R1, R2, R3 (or R3a + R3b if split applies), R4, R5, R6 cost rows.
- Each option’s reversal cost row is scrutinised against deployed-state / DD-commitments / consumer-impacts.
- 8-perturbation set includes ≥1 symmetric ÷2 on the most-relied-on criterion.
- If recommendation is composite (≥3 components OR ≥2 substrates), `sub_clarifications_locked:` block is pre-authored with named keys.
- If recommendation includes Phase-N+1 uplift triggers, each trigger has metric + cadence + artefact spec.
- If THIS lock would be the Nth consecutive bold-synthesis lock with N ≥ 5, step 9 reasoning explicitly runs counter-pattern check.
- Step 6f maturity vocabulary table now lists 3 TRANSIENT sub-modes (METHODOLOGICAL-SUBSTITUTION + PROVISIONING-NOTE + INFERENCE-NOTE).
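Three of the checks above are mechanical enough to sketch in Python: the named-key `sub_clarifications_locked` block for composite recommendations, the 3-field uplift trigger, and the N ≥ 5 counter-pattern check. This is a rough sketch, not a tool the directive defines. The lock record is assumed to be a plain dictionary, and every key name other than `sub_clarifications_locked` is hypothetical.

```python
REQUIRED_TRIGGER_FIELDS = ("metric", "cadence", "artefact")


def is_composite(components, substrates):
    """Composite per the rule above: >=3 components OR >=2 substrates."""
    return components >= 3 or substrates >= 2


def v312_violations(lock):
    """Return human-readable problems found in a (hypothetical) lock record."""
    problems = []

    if is_composite(lock.get("components", 0), lock.get("substrates", 0)):
        sub = lock.get("sub_clarifications_locked")
        named = (
            isinstance(sub, dict)
            and len(sub) > 0
            and all(isinstance(k, str) and k for k in sub)
        )
        if not named:
            problems.append(
                "composite recommendation lacks named-key sub_clarifications_locked"
            )

    for i, trigger in enumerate(lock.get("uplift_triggers", [])):
        missing = [f for f in REQUIRED_TRIGGER_FIELDS if not trigger.get(f)]
        if missing:
            problems.append(f"uplift trigger {i} missing: {', '.join(missing)}")

    if lock.get("consecutive_bold_synthesis", 0) >= 5 and not lock.get(
        "counter_pattern_check_run", False
    ):
        problems.append("counter-pattern check required (N >= 5) but not run")

    return problems


# Hypothetical lock record with two deliberate gaps.
lock = {
    "components": 3,
    "substrates": 1,
    "sub_clarifications_locked": {"cache_backend": "decided at lock time"},
    "uplift_triggers": [{"metric": "p95 build time", "cadence": "monthly"}],
    "consecutive_bold_synthesis": 5,
    "counter_pattern_check_run": False,
}
print(v312_violations(lock))
```

The remaining checks in the list (block placement, reversal-cost scrutiny, perturbation symmetry) need judgement or document structure and are left out of this sketch.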
Substrate documents
- Directive (v3.12): `docs/superpowers/specs/2026-04-29-multi-phase-audit/refined-end-of-turn-directive.md`
- Predecessor memory: `project_refined_prompt_v3_11_released_harvard_uniform_2026_05_05.md` (this dir)
- Predecessor discipline memory: `feedback_harvard_depth_uniform_default_for_all_questions.md` (this dir; v3.11)
CHANGELOG
- 2026-05-05T08:38: feedback memory authored at v3.12 release. Codifies the 9 improvements as feedback discipline (not just directive text) so future sessions internalise WHY each improvement exists. Composes with v3.11’s `feedback_harvard_depth_uniform_default_for_all_questions`: v3.11 set Harvard-depth as uniform substrate; v3.12 hardens enforcement / honesty / structural gates around it.