Recursive review-and-revise pattern (session-validated 2026-05-25)
Rule: For any substantive substrate (design spec / launch-prompt / convention change / new SKILL): EVERY artefact should pass through three checkpoints before “approved”:
- /review-plan agent dispatch: independent critical review by a general-purpose agent that has NOT seen the authoring session’s context. Catches blind spots the author can’t see.
- Visible-evidence forcing function: a `## §0 Pre-authoring checklist results` body section with verbatim command + verbatim output (per the upgraded `feedback_grep_workspace_before_authoring_substrate` 5-item checklist). A frontmatter claim alone is insufficient; it empirically failed twice this session.
- Empirical spike (where the decision is non-trivial): install + test the actual tool/standard against real TT substrate before committing to it. Rich rejected “first candidates” 2026-05-25T~14:00 BST: “we will not rush into accepting the first candidates”.
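The visible-evidence checkpoint lends itself to a mechanical check. The sketch below is one possible way to test whether a substrate body actually carries a `## §0 Pre-authoring checklist results` section containing fenced verbatim output, rather than only a frontmatter claim. The function name `has_visible_evidence` and the rule "at least one non-empty fenced block under §0" are assumptions made for illustration, not an existing TT tool.

```python
SECTION_HEADING = "## §0 Pre-authoring checklist results"
FENCE = "`" * 3  # built at runtime so this example never embeds a literal fence


def has_visible_evidence(markdown: str) -> bool:
    """Return True only if the §0 section exists and contains fenced output.

    Assumption: "visible evidence" means at least one non-blank line inside
    a fenced code block that sits under the §0 heading.
    """
    in_section = False
    in_fence = False
    fenced_lines = 0
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence  # toggle on every opening/closing fence
            continue
        if in_fence:
            if in_section and stripped:
                fenced_lines += 1
            continue
        if stripped == SECTION_HEADING:
            in_section = True
        elif line.startswith("## "):
            in_section = False  # any other level-2 heading ends §0
    return fenced_lines > 0
```

A document whose only claim lives in frontmatter returns `False` here, which is the failure mode the forcing function is meant to expose.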
Why (empirical evidence from 2026-05-25 ~5h session):
- Recursive iteration absorbed more TT-custom into community standards at each pass: v1 (MADR overlay) → v2 (Structured MADR; rejected on 5-star single-maintainer signal) → v3 (3-layer ISO) → v4 (5-tier with AGENTS.md) → v5 (9-layer institutional) → v6 (FAIR/W3C-anchored). TT-internal LoC claim went from ~700 → ~50 → eventually ~10 (claimed) → ~400-600 (honest after audit). Without recursive review, v1 would have shipped.
- /review-plan caught what self-checking didn’t: the design spec v1.0 was authored with explicit `superpowers:brainstorming` + `verifying-before-author` + `grep-workspace-before-authoring-substrate` memory cited, yet still had 5 HIGH + 6 MEDIUM findings on independent review, including the structural collision (`frontmatter-conventions.md` was already at v1.4 when v1.0 proposed creating v1.4). Self-checking has systematic blind spots that independent review surfaces.
- Visible-evidence forcing function: the launch-prompt v1.0 claimed checklist application in `honest_substrate_read_declaration:` frontmatter, but item 4 (tool sanity check) was punted to a fresh session, which was a dodge. The /review-plan caught this. v1.1 required a BODY §0 section with verbatim install + run output; the dodge became impossible.
- Spike empirical validation prevented bad adoptions: of 6 candidate standards spiked, 5 had non-trivial adoption frictions only visible from empirical use. 3 of the 5 would have been wrong adoptions if the design spec had locked them without spiking (memory-md: wrong abstraction; architecture-guard: wrong domain; mcp_agent_mail: license-incompatible).
How to apply (substrate-authoring workflow):
For any substantive substrate authoring (design spec / launch-prompt / new convention / arch-state amendment / standards adoption decision):
- Author v1.0 with explicit `honest_substrate_read_declaration:` frontmatter + visible `## §0 Pre-authoring checklist results` body section per the 5-item checklist (target read; LoC count; entity enumeration; tool sanity check actually run; cross-ref resolution).
- Dispatch /review-plan agent before claiming “approved” status. Frame the review brief to surface what the author may have missed: empirical claims, scope overreach/underreach, candidate selections that may have better SOTA alternatives, missing rollback paths.
- Apply review findings in v1.1 with a §“Review findings” audit-trail section preserving the v1.0 critique verbatim. v1.0 stays as audit record; v1.1 supersedes.
- If candidate-selection is non-trivial, dispatch a spike-suite before adoption: each candidate gets a closure-bundle T-file with hypothesis + method + verbatim evidence + outcome (VALIDATED / VALIDATED-WITH-NOTE / FALSIFIED). Aggregate findings into a single decision-substrate doc with COMMIT / DEFER / REJECT verdicts per candidate.
- Only after spike findings, author the roadmap / execution launch-prompt / engagement programme that consumes the validated decisions.
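The spike-suite step above describes a small data shape: one record per candidate (hypothesis, method, verbatim evidence, outcome) and one verdict per candidate in the aggregated decision doc. A minimal sketch of that shape follows. The names `SpikeRecord` and `aggregate`, and the placeholder candidates used in the usage note, are hypothetical; how an outcome maps to a verdict is deliberately left to the human decision and is not encoded here.

```python
from collections import Counter
from dataclasses import dataclass

OUTCOMES = {"VALIDATED", "VALIDATED-WITH-NOTE", "FALSIFIED"}
VERDICTS = {"COMMIT", "DEFER", "REJECT"}


@dataclass(frozen=True)
class SpikeRecord:
    """One closure-bundle T-file, reduced to the fields named in the workflow."""
    candidate: str
    hypothesis: str
    method: str
    evidence: str  # verbatim install/run output, not a paraphrase
    outcome: str   # must be one of OUTCOMES


def aggregate(records: list[SpikeRecord], verdicts: dict[str, str]) -> Counter:
    """Tally verdicts, refusing records that lack evidence or a valid verdict."""
    for record in records:
        if record.outcome not in OUTCOMES:
            raise ValueError(f"{record.candidate}: unknown outcome {record.outcome!r}")
        if not record.evidence.strip():
            raise ValueError(f"{record.candidate}: missing verbatim evidence")
        if verdicts.get(record.candidate) not in VERDICTS:
            raise ValueError(f"{record.candidate}: no valid verdict recorded")
    return Counter(verdicts[record.candidate] for record in records)
```

For example, two placeholder records for `tool-a` and `tool-b` with verdicts `{"tool-a": "COMMIT", "tool-b": "REJECT"}` yield a tally with one COMMIT and one REJECT; omitting a verdict for a spiked candidate raises `ValueError` rather than silently passing.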
Cost: ~5-10% premium on authoring time is the claimed figure, but the example given (1.5h author + 0.5h review + revise = 2h vs 1.5h author-and-ship) works out to ~33% for that artefact. Spike-suite is a bigger investment (~5-7h for 6 spikes) but applies only when candidate-selection is non-trivial.
ROI: prevents the lock-then-revise cycles that consumed ~24h of substrate-churn earlier in this 30h window (PE-47 v1.13/v1.14/v1.15/v1.16 cascade; workspace-rename dispatch v1.0 needs-revision; frontmatter-standards-stack design v1.0 needs-revision).
Empirical anti-patterns observed (STOP if you catch yourself doing these):
- Authoring a substrate body BEFORE running the 5-item checklist on real workspace state — the discipline-citation pattern (memory referenced in frontmatter) is insufficient; the discipline must produce visible verifiable artefacts.
- First-pass candidate selection without alternatives research: the v1.0 design spec picked W3C OWL `backwardCompatibleWith` for floor-pin without checking semantics (it’s semantically inverted); first-pass picks are systematically too narrow.
- Committing to a candidate based on README description without empirical install + use: memory-md’s “decision-record” framing in its README is technically true but its abstraction differs from TT’s Q-NNN cascade-Q pattern; only spike install revealed the structural mismatch.
- Treating /review-plan as optional — even with explicit grep-discipline + verifying-before-author + brainstorming skills loaded, self-checking has blind spots independent review surfaces.
- Marking spec status `approved` without spike validation when adoption is non-trivial: the original “10-tier institutional standards stack” claim would have shipped 4-6 wrong adoptions without empirical spiking.
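The anti-patterns above all amount to setting `approved` while a checkpoint is still open. One way to make that explicit is a gate that lists the open blockers before the status may change. This is a sketch only: `ArtefactState`, its field names, and `approval_blockers` are hypothetical, and the flags would have to be filled in from whatever the workspace actually records.

```python
from dataclasses import dataclass


@dataclass
class ArtefactState:
    """Hypothetical snapshot of where one substrate artefact stands."""
    has_section_zero_evidence: bool       # §0 body section with verbatim output
    review_dispatched: bool               # /review-plan run by an independent agent
    review_findings_applied: bool         # v1.1 carries the "Review findings" trail
    candidate_selection_nontrivial: bool  # does adoption need a spike at all?
    spike_suite_complete: bool


def approval_blockers(state: ArtefactState) -> list[str]:
    """Return reasons the artefact may not yet be marked approved."""
    blockers = []
    if not state.has_section_zero_evidence:
        blockers.append("§0 body section with verbatim evidence is missing")
    if not state.review_dispatched:
        blockers.append("/review-plan has not been dispatched")
    elif not state.review_findings_applied:
        blockers.append("review findings not yet applied in a revised version")
    if state.candidate_selection_nontrivial and not state.spike_suite_complete:
        blockers.append("non-trivial adoption has not been spiked")
    return blockers
```

An empty list means no known blocker remains; anything else is a reason to stop before writing `approved`. An adoption that needs no spike simply sets `candidate_selection_nontrivial` to `False`.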
Related disciplines:
- grep-workspace-before-authoring-substrate (5-item pre-authoring checklist; this memory PRESUMES that discipline is applied AND extended via the §0 body section forcing function)
- verify-before-author (6-method toolkit for verifying claims before lock; spike-suite is method 7 — empirical-via-tool-install)
- scorecards-one-at-a-time-optimal-sequence (knock-on-effect sequencing; this memory adds: review checkpoint AFTER each scorecard / option-set, not just at the end)
- concurrent-burst-race-condition-count-24h (file-edit race issue that motivated QW-4; example of empirical evidence accumulating that justifies a discipline change)
- research-artefact-forward-traceability (research artefacts get bidirectional cite-back at authoring; recursive review adds: at-revision cite-back too)
- batch-compression-lowers-defer-threshold (batch compression makes “do now” cheaper than “defer”; spike-suite is the cousin: empirical-validate now beats commit-and-revise later)
Session evidence (commits to verify the empirical claim):
- Design spec v1.0 → review findings → v1.1 launch-prompt cycle: commits `fc24857` (spec v1.0 approved) → `a3ed284` (spec v1.0 needs-revision after review) → `57fe0b5` (launch-prompt v1.0) → `b2b9cf7` (launch-prompt v1.1 after review)
- Spike-suite dispatch → empirical findings → adoption decisions cycle: commit `5ea604f` (spike-suite launch-prompt v1.0) → `f273d65` (findings v1.0 with 3 COMMIT + 5 DEFER + 1 REJECT)
- QW-4 buy-not-build win: commit `8889b6e` + 6 follow-on commits (`b256cf1`, `00e6b83`, `c444faa`, `d1f402e`, `275af02`, `2c9369d`): an example of an adoption that DIDN’T need a spike because the SOTA candidate (native Claude Code TaskList) was already in production use
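Since these commits are cited as evidence, it may help to confirm that each abbreviated hash still resolves in the repository. The sketch below shells out to `git cat-file -e`, which exits non-zero when the object does not exist. The helper names `commit_exists` and `missing_commits` are illustrative, and it assumes it is run from inside the repository that holds these commits.

```python
import subprocess
from typing import Callable, Iterable

# Abbreviated hashes exactly as cited in the session evidence list.
CITED = [
    "fc24857", "a3ed284", "57fe0b5", "b2b9cf7",
    "5ea604f", "f273d65",
    "8889b6e", "b256cf1", "00e6b83", "c444faa", "d1f402e", "275af02", "2c9369d",
]


def commit_exists(sha: str, repo: str = ".") -> bool:
    """Ask git whether `sha` resolves to a commit object in `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "-e", f"{sha}^{{commit}}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_commits(
    shas: Iterable[str],
    exists: Callable[[str], bool] = commit_exists,
) -> list[str]:
    """Return the cited hashes that do not resolve, preserving input order."""
    return [sha for sha in shas if not exists(sha)]
```

Calling `missing_commits(CITED)` from the repository root returns an empty list when every cited commit is present. The `exists` parameter is injectable so the logic can be exercised without a repository.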
Cross-reference for spike-driven decisions:
- Spike-suite findings doc: `docs/superpowers/specs/2026-05-25-standardisation-spike-suite-findings.md` v1.1 (commit `f273d65`)
- Per-spike T-files: `~/off-github/library/projects/inherit/T-spike-sN-*-2026-05-25.md` (6 files)
Strategic positioning:
This pattern aligns with Rich’s operator-first frame (per project_strategic_position_operator_first_2026_05_22): community-standard SOTA adoption is buy-not-build; TT-custom invention only where genuine frontier exists. The recursive review-and-revise loop is the discipline that prevents committing to wrong buys + prevents over-investing in build-from-scratch when a buy candidate validates.