refined-prompt v3.10 → v3.11 released — Harvard-depth uniform default for ALL Qs

Released: 2026-05-05T06:10 BST (directive frontmatter lastmod); release commit 254afb0 on origin/main 2026-05-05T06:15 BST.

Trigger

Rich directive 2026-05-05 BST: “i want Harvard depth for all questions.”

Issued after Q-014 V5+ + Q-015 λ.η demonstrated Harvard-depth value empirically in the 24-hour window 2026-05-04 → 2026-05-05.

What shipped

Six edits across step 6 + step 7 + step 9 of the directive body, plus frontmatter / §1 heading / §3 evolution history / cost-benefit summary heading:

EditLocationChange
1Step 6 RESEARCH SWEEP introHarvard-depth MANDATORY for all Qs; tactical-Q tier removed; v3.2 ESCALATE-to-Harvard-depth-automatically (i)-(iv) trigger list RETIRED; uniform time-box ~30-45 min minimum / ~45-60 min foundational; empirical effects (a)-(d) cited inline
2Step 6b TIER-2 PGVECTOR QUERY conditionsWas conditional on (i)-(iii) OR-clauses; now MANDATORY for every Q
3Step 6b cite top 3-5 chunksWas “OPTIONAL for tactical / MANDATORY for Harvard-depth”; now MANDATORY for every Q
4Step 6e WEB RESEARCH headerWas “3-5 parallel for Harvard-depth / 1-2 searches min for tactical”; now “3-5 parallel for every Q / 5-7 for foundational substrate-architectural”
5Step 6e content (sub-clause ii)Same shift as header; “1-2 searches min for tactical Qs” carve-out retired
6Step 7 9th synthesis-via-composition optionDrop “for Harvard-depth Qs” qualifier; Q-015 λ.η added as second exemplar alongside ε.ι
7Step 9 INTELLECTUAL-HONESTY ADDENDUMWording shift “Harvard-depth escalation” → “Harvard-depth research surfacing substrate-correcting findings”; Q-015 λ.η example added

Plus:

  • Frontmatter version "3.10""3.11"; lastmod with v3.11 narrative; tags extended (harvard-depth-uniform-default, F-harvard-uniform)
  • §1 prompt heading bumped v3.10 → v3.11
  • §3 Evolution history v3.11 entry appended (with empirical evidence + reasoning + 6-edit summary)
  • Cost-benefit summary heading extended to include v3.11

Empirical evidence (cited inline in step 6)

Q-014 V5+ alignment-axiom verify-tier (2026-05-05T05:44 BST)

  • Original framing under refined-prompt v3.9 used 14 criteria + ~45-min research and reached margin 0.21 (borderline).
  • Rich-fragility-signal “there seems to be a lot of uncertainty” triggered manual escalation to Harvard-depth.
  • Harvard-depth re-framing extended frame to 16 criteria (+c4b sssom-java compatibility + c15 A-20 IRI-verification-scar) and lifted margin to 0.38 ROBUST~81% growth in margin.
  • Three substrate-correcting findings: curation_rule MappingSet-level NOT per-row; extension_definitions YAML metadata preserved by sssom-py 0.4.19; sssom-java —extensions-to-other round-trips losslessly.
  • Result: V5+ at Wsum 4.69 margin 0.38 ROBUST; A-21 CI gates 25 → 27 (TWO new gates); A-158; arch-state v3.53.

Q-015 λ.η Wills follow-through (2026-05-05 BST; Rich asked for Harvard-depth explicitly)

  • Spike 5 substrate (2026-05-03) had locked λ.ζ at margin 0.41 with 14 criteria. Rich explicitly asked for Harvard-depth re-research before Q-015 framing.
  • Harvard-depth surfaced 4 post-Spike-5 substrate-correcting findings:
    1. EUDI ARF advanced from Spike 5’s v1.4 → v2.8.0 (~7 month spec advancement; end-of-2026 EU member state wallet deadline)
    2. W3C VC 2.0 family — JOSE/COSE securing pathway now W3C Rec (May 2025) alongside SD-JWT VC + Data Integrity (substrate cited only SD-JWT VC)
    3. Render Method + Confidence Method publishing as Recommendations September 2026
    4. Government response to Law Commission Modernising Wills Law Final Report deadline 16 May 2026 = T-11 days from now (Phase-1.5+ trigger may fire mid-Phase-1 build)
  • Frame extended 14 → 17 criteria (+c16 EUDI ARF v2.8.0 + W3C VC 2.0 family conformance; +c17 Q-005..Q-014 cascade alignment).
  • NEW λ.η bold-synthesis option emerged consuming Q-005 + Q-008 + Q-009 + Q-012 + Q-013 + Q-014 cascade Day-1 — composes 7 production-grade patterns. Won at Wsum 4.88 / margin 0.29 ROBUST vs λ.ζ runner-up 4.59. 7/8 perturbations favour λ.η.
  • A-159; arch-state v3.54; Phase 2 architecture-options 15 substantive locks COMPLETE; Harvard-depth pattern validated TWICE in 24h window.

Why uniform: the reasoning

v3.2 (2026-05-02) deliberately introduced a two-tier carve-out (~30-45 min foundational / ~15-20 min tactical) to manage time-cost on routine Qs. v3.11 reverses on three grounds:

  1. Substrate-correcting-findings effect operates on apparently-tactical Qs too. Tactical-mode 1-2 web searches min cannot surface ARF version advances or W3C Rec advancements unless they happen to match a topic-mapping keyword. Q-015 demonstrated that Spike 5’s 4.83/0.41 substrate would have locked λ.ζ; under uniform Harvard-depth, λ.η wins by consuming the cascade.
  2. The auto-trigger list was incomplete. v3.2’s (i)-(iv) auto-escalation criteria failed to fire on Q-015 (Spike 5 had derisked the Q already; Rich’s “Harvard-depth please” directive bypassed auto-detection). Rather than patch with (v) Rich-explicit-request as a fifth trigger (my original proposal), uniformly defaulting to Harvard-depth eliminates the gap entirely.
  3. Cost-benefit is empirically POSITIVE under uniform default. Net +10-25 min per previously-tactical Q. Substrate-correcting findings prevent Phase-1.5+ retrofit costs (~£2-5K each per finding) that compound over the build’s life.

What’s NOT changed

  • Step 6 sub-letters 6a-6g UNCHANGED — verified sed -n '/^6\. /,/^7\. /p' refined-end-of-turn-directive.md | grep -cE '^ [a-z]\.' returns 7.
  • Total sub-letter count (step 4 + 6 + 8) = 14 unchanged.
  • Existing step 6f SPIKES preserved unchanged.
  • Step 7 SPIKE-CANDIDATE SCAN bullet (v3.10 / F-spike-loop) preserved unchanged.
  • Q-001..Q-014 prior locks UNAFFECTED — they stand under v3.6..v3.10 as recorded.

F4 + F6 status

F4 (closing-question boundary test) + F6 (research-completeness scorecard) remain queued separately for a future v3.12 bundle. They were NOT bundled into v3.11 because:

  • v3.11 is a tier-collapse (single specific change with substantial restructure implications); bundling F4 + F6 would mix tier-collapse with new-feature additions.
  • F4 + F6 specs are still not written.

First production use

ζ-Q16 = the next ζ-Q in the natural-sequence ask cycle.

Validation criteria (per the directive’s empirical-effects paragraph in step 6):

  1. Tier-2 pgvector query MANDATORY — must run for ζ-Q16 even if topic-mapping is rich.
  2. 3-5 parallel web tracks MANDATORY — must run even if T-files cover the topic strongly.
  3. Time-box ~30-45 min minimum — even on apparently-tactical Qs.
  4. Frame extension expected — Harvard-depth typically extends +1-3 criteria capturing newly-surfaced dimensions.
  5. Substrate-correcting findings expected on Qs whose substrate predates recent cascade locks.

Commit cascade

CommitSubject
254afb0audit: refined-prompt v3.10 → v3.11 — Harvard-depth uniform default for ALL Qs
(next)audit: v3.11 release record — Harvard-depth uniform default for ALL Qs

Substrate files

  • Directive (v3.11): docs/superpowers/specs/2026-04-29-multi-phase-audit/refined-end-of-turn-directive.md (commit 254afb0)
  • Audit-record (forthcoming v1.0): docs/superpowers/specs/2026-04-29-multi-phase-audit/audit-records/2026-05-05-v3.11-release.md
  • Codified discipline: feedback_harvard_depth_uniform_default_for_all_questions.md in this memory dir

CHANGELOG

  • 2026-05-05T06:20 — memory file authored at the close of v3.11 release commit cascade. Captures trigger / what-shipped / empirical evidence (Q-014 V5+ + Q-015 λ.η) / reasoning / what’s-not-changed / F4-F6 status / first-production-use trigger / commit cascade. Companion to feedback_harvard_depth_uniform_default_for_all_questions.md + audit-record.