Grep the workspace before authoring substrate
Rule: Before locking any substrate that asserts a fact about the workspace (file lists, naming patterns applied to all repos, cross-ref scopes, sweep targets, “files to check” enumerations), run grep -rlE against the workspace to discover the empirical state. Closed-set lists authored from memory are an anti-pattern.
Why: Three failures in 24h (2026-05-24 → 2026-05-25) from the same root cause:
- PE-47 v1.13 (locked overnight 2026-05-25T02:35): claimed feature-dir naming <role>-phase-1/ was a substrate-correcting “deviation” from canonical-shared inherit-v2-phase-1/. Didn’t check Spec-Kit’s actual convention (NNN-semantic-slug/). Reversed in v1.14 + v1.15 + v1.16 after SOTA research surfaced the right pattern.
- PE-47 v1.15 (2026-05-25T09:00): locked feature-dir mapping at 5 names (standard / inheritkit / ias / www / test-suite) without checking the rule applied uniformly. Rich spotted inconsistency: 3 different naming patterns mixed (semantic / repo-name-minus-prefix / last-segment). Re-locked as pure-semantic-role after the diagnostic. Cost: 2 unnecessary lock-revisions + 5 extra commits + substrate-churn cascade.
- Workspace-rename dispatch v1.0 (2026-05-25T09:30): authored Step 6 file list (7 docs) from memory. /review-plan audit caught: ≥61 actual grep hits across the workspace + 3 HIGH-severity missed scopes (live worktrees, package.json JSON metadata, .specify/** subtree). Marked needs-revision before dispatch.
Common thread: Claude pattern-matches what the substrate “should” look like, locks confidently, skips the grep -rlE discovery step. The damage compounds when the substrate then directs OTHER Claude sessions to act on the false closed-set — the executing session inherits the blind spots.
How to apply:
When you are about to author ANY of these artefact-shapes:
- A “sweep targets” list (files to update across a workspace)
- A mapping table claiming to cover N cases (repos / modules / files / etc.)
- A naming-rule lock applied to multiple entities
- Cross-reference scope for a refactor
- A “files to check” enumeration in a dispatch doc
- A list of CI workflows / configs / scripts to update
- An assertion that “all instances of X are at Y”
Do FIRST (before committing the substrate):
# Discover the empirical scope across markdown + JSON + YAML + TOML + scripts
grep -rlE '<pattern>' ~/testatetech/ \
  --include='*.md' --include='*.json' --include='*.yml' --include='*.yaml' --include='*.toml' \
  --include='*.sh' --include='*.py' --include='*.ts' --include='*.js' \
  --exclude-dir=.git --exclude-dir=node_modules \
  2>/dev/null | wc -l

# For directory-state checks (especially worktrees + tmp dirs)
for r in code-inherit-v2 code-inheritkit code-ias code-inheritv2-www code-inheritv2-test-suite; do
  git -C ~/testatetech/"$r" worktree list 2>/dev/null
done

# For naming-rule locks: explicit enumeration of ALL cases the rule covers
# (not just the first 1-2; test the rule against EVERY case)
# apply_rule is a placeholder for however the rule under test is computed
for case in case1 case2 case3 case4 case5; do
  echo "Rule produces: $(apply_rule "$case")"
done

Then:
- Replace closed-set lists with grep-driven discovery in the substrate itself (e.g., dispatch instruction reads: “Step N: run grep -rlE to produce file list, then iterate”, not “the file list is X, Y, Z”)
- Append the actual hit-counts to the substrate’s “clean-state” section so future-Claude knows the true scope at execution time
- Test the naming rule against every case explicitly (not just the first 1-2)
- For dispatch docs specifically: the executing session should ALSO grep-discover, not trust the authored list. Trust the empirical, not the inherited
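The first two bullets above can be sketched as a pair of shell functions that a dispatch step might call at execution time. This is a minimal illustration, not the workspace's actual tooling: discover_targets and run_step are hypothetical names, and the --include set is an assumed subset of the one used earlier.

```shell
# Hypothetical helper: list every file under ROOT whose content matches PATTERN.
# Usage: discover_targets PATTERN ROOT
discover_targets() {
  local pattern="$1" root="$2"
  grep -rlE \
    --include='*.md' --include='*.json' --include='*.yml' --include='*.yaml' \
    --exclude-dir=.git --exclude-dir=node_modules \
    -- "$pattern" "$root" 2>/dev/null | sort
}

# Hypothetical dispatch step: print the hit count first (so it can be pasted
# into the clean-state section), then iterate over the discovered files.
# Usage: run_step PATTERN ROOT
run_step() {
  local pattern="$1" root="$2" targets count f
  targets="$(discover_targets "$pattern" "$root")"
  # grep -c . counts non-empty lines; "|| true" keeps a zero count from failing
  count="$(printf '%s\n' "$targets" | grep -c . || true)"
  echo "hits: $count"
  printf '%s\n' "$targets" | while IFS= read -r f; do
    if [ -n "$f" ]; then
      echo "would update: $f"
    fi
  done
}
```

Called as run_step '<pattern>' ~/testatetech/, this prints the true scope before anything is touched; the echo would be swapped for the real per-file action.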
Anti-pattern signals (STOP if you catch yourself doing these):
- Typing a list of file paths without having just run grep -rlE to produce them
- Asserting “these are all the places X appears” without wc -l evidence
- Authoring a mapping table for N cases where you only mentally tested 1-2
- Locking a “canonical” name without enumerating all N entities the canonical applies to
- Treating an LLM-recalled file list as the same epistemic class as an actual grep result
Verification check before locking: ask yourself “did the file list / mapping / pattern come from a grep run in THIS session, or from my memory of what should be there?” If memory: STOP, grep, then revise.
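One way to make that check mechanical is to diff an already-authored list against a fresh grep run. The sketch below assumes the authored list is a text file with one path per line, written in the same form grep prints; missed_by_memory is a hypothetical name.

```shell
# Hypothetical check: print files that grep finds but the authored list omits.
# Usage: missed_by_memory AUTHORED_LIST_FILE PATTERN ROOT
missed_by_memory() {
  local authored="$1" pattern="$2" root="$3" a b
  a="$(mktemp)"
  b="$(mktemp)"
  sort -u "$authored" > "$a"
  grep -rlE --exclude-dir=.git -- "$pattern" "$root" 2>/dev/null | sort -u > "$b"
  # comm -13 hides lines unique to the authored list and lines common to both,
  # leaving only the paths that appear solely in the grep output
  comm -13 "$a" "$b"
  rm -f "$a" "$b"
}
```

Any output at all means the authored list is an incomplete closed set and should be revised before locking.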
5-item pre-authoring checklist (elevated 2026-05-25T12:15 BST after 4th occurrence in ~30h — frontmatter standards-stack design spec v1.0 was authored claiming “v1.3 → v1.4” when the target was already at v1.4, plus 5 other empirical-fact errors caught by /review-plan audit):
Before writing ANY substrate body that asserts facts about workspace state, complete ALL 5 items:
- Read target file’s current version + status: head -10 <target-file> BEFORE proposing a version bump. Multiple authoring sessions may have advanced the target since memory was last read. The substrate-version-collision pattern (proposing vN → vN+1 when target is already at vN+1) is a sentinel for this failure.
- Count actual LoC of any code being replaced: wc -l <script> BEFORE asserting “this replaces ~X LoC”. Estimates from memory drift; actual counts are 1-command-away.
- Enumerate ALL entities the substrate covers: grep -rl '<pattern>' <scope> BEFORE asserting “N files use this”. Do not estimate; count. If the substrate claims to handle multiple cases (file types, status values, naming patterns), explicitly enumerate each case + verify each is covered.
- Run the validation tools the substrate proposes on a sample: BEFORE asserting “JSON Schema validates as expected” or “yamllint catches this” or “owl:propertyX has these semantics”, actually run the tool on a representative file. If proposing a vocabulary or schema change, sanity-check with the canonical W3C/ISO/IETF spec text (not memory of what it might mean).
- Confirm cross-refs in the proposed substrate resolve to existing files: ls or gh api BEFORE listing path/IRI references. The PE-47 cascade + spec v1.0 cross-ref claims that didn’t resolve are the empirical evidence this matters.
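Items 1, 2, 3 and 5 are each a single command, so they can be gathered into one evidence dump. The following is a minimal sketch under stated assumptions: preauthor_evidence is a hypothetical name, cross-refs are assumed to be local paths listed one per line in a file, and item 4 is left out because it depends on whichever validation tool the substrate proposes.

```shell
# Hypothetical evidence collector for checklist items 1, 2, 3 and 5.
# Usage: preauthor_evidence TARGET_FILE REPLACED_SCRIPT PATTERN SCOPE REFS_FILE
preauthor_evidence() {
  local target="$1" script="$2" pattern="$3" scope="$4" refs="$5" ref
  echo "item1 target head:"
  head -10 "$target"
  # tr strips the padding some wc implementations add
  echo "item2 loc: $(wc -l < "$script" | tr -d ' ')"
  echo "item3 files: $(grep -rl -- "$pattern" "$scope" 2>/dev/null | wc -l | tr -d ' ')"
  while IFS= read -r ref; do
    if [ -z "$ref" ]; then
      continue
    fi
    if [ -e "$ref" ]; then
      echo "item5 ok: $ref"
    else
      echo "item5 MISSING: $ref"
    fi
  done < "$refs"
}
```

The output is meant to be pasted verbatim into the substrate, so that the counts and the version line come from this session's commands and not from recall.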
Apply this checklist EXPLICITLY by writing the checklist results into the substrate’s honest_substrate_read_declaration: frontmatter field OR §0 of the substrate body. Visible-discipline-application is a forcing function; memory-only application is not. Empirically validated: the 4 occurrences in 30h all had memories cited but the checklist was not visibly applied.
Related memories:
- banner_sweep_grep_all_occurrences (sister discipline; banner-rewrites must grep all hits)
- concurrent_burst_race_condition_count_24h (live-worktree state matters; concurrent sessions race)
- verify_before_author (verifying-before-author skill; 6-method toolkit)
- verify_after_author_via_directory_ls (post-author verification; method-7 candidate)
- git_mv_with_unstaged_edits_loses_modifications (worktree-edge-case where unverified state causes silent data loss)
- research_artefact_forward_traceability (research findings need bidirectional cite-back; same discipline applied to research)
- architecture_state_file_discipline (state files exist BECAUSE memory-based assumptions drift)
Discipline lineage: This memory generalises three sister memories that each cover one slice of the problem. The unifying claim: trust empirical-grep output, never memory-recall, for any substrate that asserts facts about workspace state.
Sources of the 2026-05-25 evidence:
- SCF v1.13 → v1.14 → v1.15 → v1.16 supersession chain (PE-47 lock-revisions; in docs-strategy/docs/superpowers/specs/2026-04-29-multi-phase-audit/batch-imp-24-phase-d-double-prime-path-e-pivot-research-findings-v1.0.md)
- Workspace-rename dispatch v1.0 §10.5 review findings (in docs-strategy/docs/superpowers/specs/2026-04-29-multi-phase-audit/workspace-rename-dispatch-v1.0.md; commit 1e90a71)
- /review-plan audit transcript (in this session’s claude-mem corpus 2026-05-25T09:45 BST)