Harvard-depth research is the uniform default for ALL Qs

Locked: 2026-05-05 BST per Rich directive “i want Harvard depth for all questions.”

Codified in: refined-end-of-turn-directive v3.11 (2026-05-05T06:10 BST; commit 254afb0).

The rule

Every audit Q under refined-prompt v3.11+ gets the full Harvard-research-professor depth treatment, regardless of perceived tactical vs foundational tier. There is no “tactical Q ~15-20 min” carve-out.

Specifically:

Persona: adopt the persona of a Harvard research professor who has dedicated themselves to this specific question.
Mode: hypothesis-driven multi-track parallel investigation; cross-domain analogy scan; library-as-pattern-source; production-grade verification at each component-claim.
Time-box: ~30-45 min minimum per Q; ~45-60 min for foundational architectural Qs (Phase-2 substrate-architectural; Phase-5 closure-prep; any Q where bold-synthesis-preference fires strong).
Step 6b TIER-2 PGVECTOR QUERY: MANDATORY for every Q (run python3 query_library.py "<keyword>" --k 8 --type t-file).
Step 6e WEB RESEARCH: 3-5 parallel tracks for every Q; 5-7 for foundational substrate-architectural decisions.
Step 7 9th synthesis-via-composition option: include explicitly when it surfaces (no longer gated to “Harvard-depth Qs” only).

Why: Empirically observed effects of Harvard-depth

Two empirical observations in the 24-hour window 2026-05-04 → 2026-05-05 demonstrated that Harvard-depth pays for itself even on apparently-tactical Qs:

Q-014 V5+ alignment-axiom verify-tier (2026-05-05T05:44 BST)

Original framing under refined-prompt v3.9 used 14 criteria + ~45-min research + reached Wsum margin 0.21 (borderline).
Rich-fragility-signal “there seems to be a lot of uncertainty” triggered manual escalation to Harvard-depth research (per v3.2 ESCALATE auto-trigger (i) — but Rich invoked it manually, signalling the auto-trigger pattern was insufficient on its own).
Harvard-depth re-framing extended the frame to 16 criteria (+c4b sssom-java compatibility + c15 A-20 IRI-verification-scar) and lifted margin to 0.38 ROBUST — ~81% growth in margin.
Three substrate-correcting findings emerged: (i) curation_rule is MappingSet-level NOT per-row; (ii) extension_definitions YAML metadata is preserved by sssom-py 0.4.19; (iii) sssom-java --extensions-to-other round-trips losslessly.
Mini-spike outcome-VALIDATED end-to-end pipeline parse → validate → convert → ROBOT reason ALL stages PASS.
Result: V5+ locked at 4.69 margin 0.38 ROBUST; A-21 CI gates 25 → 27 (TWO new gates in single lock-event); A-158; arch-state v3.53.

Q-015 λ.η Wills follow-through (2026-05-05 BST; Rich explicitly asked for Harvard-depth)

Spike 5 substrate (2026-05-03) had locked λ.ζ at margin 0.41 with 14 criteria. Under refined-prompt v3.7 substrate-only (no Harvard-depth re-research), λ.ζ would have stood as Spike 5’s recommendation.
Rich’s explicit Harvard-depth directive triggered re-research that surfaced 4 post-Spike-5 substrate-correcting findings the spike substrate could not have known:
1. EUDI ARF advanced from Spike 5’s v1.4 → v2.8.0 today (~7 month spec advancement; end-of-2026 EU member state wallet deadline)
2. W3C VC 2.0 family Recs — JOSE/COSE securing pathway is now W3C Rec (May 2025) alongside SD-JWT VC + Data Integrity (substrate cited only SD-JWT VC)
3. Render Method + Confidence Method publishing as Recommendations September 2026 (substrate didn’t reference upcoming Recs)
4. Government response to Law Commission Modernising Wills Law Final Report deadline 16 May 2026 = T-11 days from now (Phase-1.5+ trigger may fire during Phase-1 build’s first sprint)
Plus the Q-005..Q-014 cascade locks (κ.δ + R.δ + ι.ε + π.ζ + Δ.η + V5+) all post-date Spike 5.
Frame extended 14 → 17 criteria (+c16 EUDI ARF v2.8.0 + W3C VC 2.0 family conformance; +c17 Q-005..Q-014 cascade alignment).
NEW λ.η bold-synthesis option emerged that consumes Q-005 + Q-008 + Q-009 + Q-012 + Q-013 + Q-014 cascade Day-1 — composes 7 production-grade patterns. Won at Wsum 4.88 / margin 0.29 ROBUST vs λ.ζ runner-up 4.59. 7/8 perturbations favour λ.η.
Average Wsum across 7 options 3.69 (lifted from Spike 5’s ~3.40 estimate via research-grounded scoring + 17-criterion frame).

Why uniform: the reasoning

v3.2 (2026-05-02) deliberately introduced a two-tier carve-out (~30-45 min foundational / ~15-20 min tactical) to manage time-cost on routine Qs. v3.11 reverses that based on the empirical evidence above:

Substrate-correcting-findings effect operates on apparently-tactical Qs too. Tactical-mode 1-2 web searches min cannot surface ARF version advances or W3C Rec advancements unless they happen to match a topic-mapping keyword. Q-015 demonstrated the cost of missing them: Spike 5’s 4.83/0.41 substrate would have locked λ.ζ; under uniform Harvard-depth, λ.η wins by consuming the cascade.
The auto-trigger list was incomplete. v3.2’s (i)-(iv) auto-escalation criteria failed to fire on Q-015 (Spike 5 had derisked the Q already; Rich’s “Harvard-depth please” directive bypassed auto-detection). Adding (v) Rich-explicit-request would patch the immediate gap, but uniformly defaulting to Harvard-depth eliminates the gap entirely.
Cost-benefit is empirically POSITIVE under uniform default. Net +10-25 min per previously-tactical Q. Substrate-correcting findings prevent Phase-1.5+ retrofit costs (~£2-5K each per finding) that compound over the build’s life.

When this rule applies

Every audit Q under refined-prompt v3.11+ across all phases (Phase 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9).
Every Q-NU-NNN cascade-Q that promotes to formal Q-NNN under refined-prompt v3.11+.
All future ζ-Q ask cycles + per-module Phase-3 ask cycles + per-jurisdiction Phase-4 ask cycles + per-brand Phase-6+ ask cycles.

Anti-patterns

Treating “tactical Q” as a license to skip Tier-2 pgvector / 3-5 parallel tracks: under v3.11 every Q gets the full treatment. The phrase “this is just a tactical Q” is a signal to STOP and apply Harvard-depth, not a justification for skipping.
Reading the prior v3.2 ESCALATE auto-trigger list as still active: it’s RETIRED in v3.11 (criteria are moot since they always fire under uniform default). The retirement is documented in the directive frontmatter lastmod prose.
Skipping Tier-2 pgvector “for time”: the 2-5 min Tier-2 pgvector query is the cheapest source of substrate-correcting findings. It must run for every Q.
Stopping at 1-2 web searches: the v3.11 minimum is 3-5 parallel tracks (5-7 for foundational substrate-architectural).

How to apply

When framing any Q under v3.11:

State the Q topic + phase + Q-count status (steps 1-5 of directive).
Run the Research Sweep (step 6) UNIFORMLY:
- 6a T-files topic-mapping
- 6b Tier-2 pgvector query (MANDATORY)
- 6c Library check
- 6d Memory check (two passes: retrieval-discipline first, content second)
- 6e Web research (3-5 parallel tracks; 5-7 for foundational)
- 6f Inline-mandatory-before-framing spike if applicable
- 6g Amazon check if no other coverage
Output Research Sweep Summary BEFORE step 7.
Frame the question (step 7) using all Harvard-depth findings + 17-criterion frame extension if applicable.
Apply step 8 decision-quality-check + step 9 intellectual-honesty addendum (acknowledging if Harvard-depth surfaced substrate-correcting findings).
If a SPIKE-CANDIDATE SCAN bullet (step 7 v3.10+ F-spike-loop) surfaces an evidence-thin question, surface as BLOCKING/DEFERABLE candidate per F-spike-loop discipline.

Time-cost expectations

Q tier	Pre-v3.11 time-box	v3.11 time-box	Delta
Tactical	~15-20 min	~30-45 min	+10-25 min
Substantive	~30-45 min	~30-45 min	unchanged
Foundational architectural	~30-45 min	~45-60 min	+0-15 min

For remaining ~60-90 audit Qs across Phase 2 closure + Phase 3-9: ~+10-30 hours total wall-clock cost, bounded.

CHANGELOG

2026-05-05T06:15 — feedback memory authored at the close of the v3.11 release commit cascade. Codifies the discipline + Q-014 V5+ + Q-015 λ.η empirical evidence + reasoning + anti-patterns + how-to-apply. Companion to refined-prompt v3.11 directive amendment (commit 254afb0).

TT Claude Memory

Explorer

feedback_harvard_depth_uniform_default_for_all_questions