Recursive review-and-revise pattern (session-validated 2026-05-25)

Rule: For any substantive substrate (design spec / launch-prompt / convention change / new SKILL): EVERY artefact should pass through three checkpoints before “approved”:

/review-plan agent dispatch — independent critical review by general-purpose agent that has NOT seen the authoring session’s context. Catches blind spots the author can’t see.
Visible-evidence forcing function — ## §0 Pre-authoring checklist results body section with verbatim command + verbatim output (per upgraded feedback_grep_workspace_before_authoring_substrate 5-item checklist). Frontmatter claim alone is insufficient — empirically failed twice this session.
Empirical spike (where decision is non-trivial) — install + test the actual tool/standard against real TT substrate before committing to it. Rich-rejected “first candidates” 2026-05-25T~14:00 BST: “we will not rush into accepting the first candidates”.

Why (empirical evidence from 2026-05-25 ~5h session):

Recursive iteration absorbed more TT-custom into community standards at each pass: v1 (MADR overlay) → v2 (Structured MADR; rejected on 5-star single-maintainer signal) → v3 (3-layer ISO) → v4 (5-tier with AGENTS.md) → v5 (9-layer institutional) → v6 (FAIR/W3C-anchored). TT-internal LoC claim went from ~700 → ~50 → eventually ~10 (claimed) → ~400-600 (honest after audit). Without recursive review, v1 would have shipped.
/review-plan caught what self-checking didn’t: the design spec v1.0 was authored with explicit superpowers:brainstorming + verifying-before-author + grep-workspace-before-authoring-substrate memory cited — yet still had 5 HIGH + 6 MEDIUM findings on independent review including the structural collision (frontmatter-conventions.md was already at v1.4 when v1.0 proposed creating v1.4). Self-checking has systematic blind spots that independent review surfaces.
Visible-evidence forcing function: the launch-prompt v1.0 claimed checklist application in honest_substrate_read_declaration: frontmatter but item 4 (tool sanity check) was punted to fresh session — a dodge. The /review-plan caught this. v1.1 required BODY §0 section with verbatim install + run output; the dodge became impossible.
Spike empirical validation prevented bad adoptions: of 6 candidate standards spiked, 5 had non-trivial adoption frictions only visible from empirical use. 3 of the 5 would have been wrong adoptions if the design spec had locked them without spiking (memory-md wrong abstraction; architecture-guard wrong domain; mcp_agent_mail license-incompatible).

How to apply (substrate-authoring workflow):

For any substantive substrate authoring (design spec / launch-prompt / new convention / arch-state amendment / standards adoption decision):

Author v1.0 with explicit honest_substrate_read_declaration: frontmatter + visible ## §0 Pre-authoring checklist results body section per the 5-item checklist (target read; LoC count; entity enumeration; tool sanity check actually run; cross-ref resolution).
Dispatch /review-plan agent before claiming “approved” status. Frame the review brief to surface what the author may have missed: empirical claims, scope overreach/underreach, candidate selections that may have better SOTA alternatives, missing rollback paths.
Apply review findings in v1.1 with §“Review findings” audit-trail section preserving the v1.0 critique verbatim. v1.0 stays as audit record; v1.1 supersedes.
If candidate-selection is non-trivial, dispatch a spike-suite before adoption: each candidate gets a closure-bundle T-file with hypothesis + method + verbatim evidence + outcome (VALIDATED / VALIDATED-WITH-NOTE / FALSIFIED). Aggregate findings into a single decision-substrate doc with COMMIT / DEFER / REJECT verdicts per candidate.
Only after spike findings, author the roadmap / execution launch-prompt / engagement programme that consumes the validated decisions.

Cost: ~5-10% premium on authoring time (e.g., 1.5h author + 0.5h review + revise = 2h vs 1.5h author-and-ship). Spike-suite is bigger investment (~5-7h for 6 spikes) but only when candidate-selection is non-trivial.

ROI: prevents the lock-then-revise cycles that consumed ~24h of substrate-churn earlier in this 30h window (PE-47 v1.13/v1.14/v1.15/v1.16 cascade; workspace-rename dispatch v1.0 needs-revision; frontmatter-standards-stack design v1.0 needs-revision).

Empirical anti-patterns observed (STOP if you catch yourself doing these):

Authoring a substrate body BEFORE running the 5-item checklist on real workspace state — the discipline-citation pattern (memory referenced in frontmatter) is insufficient; the discipline must produce visible verifiable artefacts.
First-pass candidate selection without alternatives research — the v1.0 design spec picked W3C OWL backwardCompatibleWith for floor-pin without checking semantics (it’s semantically inverted); first-pass picks are systematically too narrow.
Committing to a candidate based on README description without empirical install + use — memory-md’s “decision-record” framing in its README is technically true but its abstraction differs from TT’s Q-NNN cascade-Q pattern; only spike install revealed the structural mismatch.
Treating /review-plan as optional — even with explicit grep-discipline + verifying-before-author + brainstorming skills loaded, self-checking has blind spots independent review surfaces.
Marking spec status approved without spike validation when adoption is non-trivial — the original “10-tier institutional standards stack” claim would have shipped 4-6 wrong adoptions without empirical spiking.

Related disciplines:

grep-workspace-before-authoring-substrate (5-item pre-authoring checklist; this memory PRESUMES that discipline is applied AND extended via the §0 body section forcing function)
verify-before-author (6-method toolkit for verifying claims before lock; spike-suite is method 7 — empirical-via-tool-install)
scorecards-one-at-a-time-optimal-sequence (knock-on-effect sequencing; this memory adds: review checkpoint AFTER each scorecard / option-set, not just at the end)
concurrent-burst-race-condition-count-24h (file-edit race issue that motivated QW-4; example of empirical evidence accumulating that justifies a discipline change)
research-artefact-forward-traceability (research artefacts get bidirectional cite-back at authoring; recursive review adds: at-revision cite-back too)
batch-compression-lowers-defer-threshold (batch compression makes “do now” cheaper than “defer”; spike-suite is the cousin: empirical-validate now beats commit-and-revise later)

Session evidence (commits to verify the empirical claim):

Design spec v1.0 → review findings → v1.1 launch-prompt cycle: commits fc24857 (spec v1.0 approved) → a3ed284 (spec v1.0 needs-revision after review) → 57fe0b5 (launch-prompt v1.0) → b2b9cf7 (launch-prompt v1.1 after review)
Spike-suite dispatch → empirical findings → adoption decisions cycle: commit 5ea604f (spike-suite launch-prompt v1.0) → f273d65 (findings v1.0 with 3 COMMIT + 5 DEFER + 1 REJECT)
QW-4 buy-not-build win: commit 8889b6e + 6 follow-on commits (b256cf1, 00e6b83, c444faa, d1f402e, 275af02, 2c9369d) — example of an adoption that DIDN’T need a spike because the SOTA candidate (native Claude Code TaskList) was already in production use

Cross-reference for spike-driven decisions:

Spike-suite findings doc: docs/superpowers/specs/2026-05-25-standardisation-spike-suite-findings.md v1.1 (commit f273d65)
Per-spike T-files: ~/off-github/library/projects/inherit/T-spike-sN-*-2026-05-25.md (6 files)

Strategic positioning:

This pattern aligns with Rich’s operator-first frame (per project_strategic_position_operator_first_2026_05_22): community-standard SOTA adoption is buy-not-build; TT-custom invention only where genuine frontier exists. The recursive review-and-revise loop is the discipline that prevents committing to wrong buys + prevents over-investing in build-from-scratch when a buy candidate validates.

TT Claude Memory

Explorer

feedback_recursive_review_and_revise_pattern_2026_05_25

Recursive review-and-revise pattern (session-validated 2026-05-25)

Graph View

Backlinks