Rule: Before finalising a scorecard, check each criterion against this list of weak-criteria patterns. If a criterion matches, either remove it (redistribute weight to discriminating criteria) or rename/reframe it to measure what actually matters.

Why: Rich caught four weak criteria in one session (2026-04-24) across T-NEW-02 / T-NEW-CL-05 / T-NEW-CL-06 / T-NEW-CL-02 scorecards. Pattern strong enough to codify. Weak criteria waste weight that should discriminate between genuinely different structural options.

The four observed weak-criteria patterns:

| # | Criterion | Why weak | Fix |
|---|-----------|----------|-----|
| 1 | Registry-bump amortisation | Amendments batch at wave close → one registry bump regardless of amendment count (per feedback_architecture_state_file_discipline). Criterion measures a cost that doesn’t exist. | Remove; redistribute weight |
| 2 | A-number-economy | A-## IDs are unbounded sequential integers. No scarcity. Lookup is O(1) by ID. Mature architectures have MORE amendments, not fewer. | Remove; redistribute |
| 3 | v6.6 design fidelity | Per feedback_v2_clean_break_from_v6_6, v2 is clean-break; literal-shape-preservation pulls against the point of v2. | Rename to “Improvement over v6.6” (measures quality of change, not preservation) |
| 4 | v6.6 migration path | v2 has no runtime dependency on v6.6 data; LL v1 migration is Year-2+; partner familiarity already covered by partner-facing-clarity. Double-counting + low-priority. | De-weight sharply (5→1) or remove |

How to identify a weak criterion during scorecard authoring:

  1. Scarcity test: does the criterion measure a resource that’s genuinely finite? (e.g., registry IDs are unbounded → scarcity criterion weak)
  2. Clean-break test: does the criterion penalise deviation from v6.6 / legacy design? (per clean-break memory: should measure improvement, not preservation)
  3. Double-counting test: does another criterion measure the same thing? (e.g., partner-familiarity + partner-facing-clarity = same concern)
  4. Temporal-mismatch test: does the criterion measure a Year-2+ concern that shouldn’t drive Phase-1 decisions?
  5. Effect-size test: do the top 2-3 options score identically on this criterion? (non-discriminating → low utility)
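The effect-size test (test 5) is the most mechanical of the five, so it can be sketched in code. The following is a minimal illustration only: the data layout (a dict of weights and a dict of per-option scores), the function name `non_discriminating`, and all option names and numbers are hypothetical and not taken from any real scorecard.

```python
def non_discriminating(scores, weights, top_n=3):
    """Return criteria on which the top_n options (ranked by weighted total)
    all receive the same score, i.e. criteria that fail the effect-size test."""
    totals = {
        opt: sum(weights[c] * s[c] for c in weights)
        for opt, s in scores.items()
    }
    top = sorted(totals, key=totals.get, reverse=True)[:top_n]
    return [c for c in weights if len({scores[o][c] for o in top}) == 1]


# Hypothetical scorecard: weights per criterion, scores per option.
weights = {"reversibility": 5, "standards_alignment": 4, "a_number_economy": 3}
scores = {
    "option_a": {"reversibility": 4, "standards_alignment": 3, "a_number_economy": 3},
    "option_b": {"reversibility": 2, "standards_alignment": 5, "a_number_economy": 3},
    "option_c": {"reversibility": 3, "standards_alignment": 2, "a_number_economy": 3},
}

flagged = non_discriminating(scores, weights)
print(flagged)  # ['a_number_economy']
```

A flagged criterion is a candidate for review, not an automatic removal: tests 1 to 4 still need a human judgment about scarcity, clean-break, double-counting, and timing.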

What to do when a weak criterion is caught mid-scorecard:

  1. Be transparent with Rich — acknowledge the weak criterion openly; don’t hide behind the winner
  2. Show impact of removal — recompute top-2 options with criterion removed; does the winner change?
  3. Offer redistribution options — present 2-3 candidate redistributions to discriminating criteria
  4. Let Rich decide — preference between redistribution targets is a design-priority judgment
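Steps 2 and 3 above can be sketched as a small recomputation. This is an illustration under stated assumptions, not the scorecard tooling itself: the helper names (`weighted_totals`, `drop_criterion`, `winner`), the proportional redistribution rule, and every weight and score below are hypothetical.

```python
def weighted_totals(scores, weights):
    """Weighted sum per option."""
    return {opt: sum(w * s[c] for c, w in weights.items()) for opt, s in scores.items()}


def drop_criterion(weights, weak, targets=None):
    """Remove `weak` and spread its weight over `targets` in proportion to
    their current weights (default: all remaining criteria)."""
    kept = {c: w for c, w in weights.items() if c != weak}
    targets = list(targets) if targets else list(kept)
    base = sum(kept[c] for c in targets)
    freed = weights[weak]
    return {c: w + (freed * w / base if c in targets else 0.0) for c, w in kept.items()}


def winner(totals):
    return max(totals, key=totals.get)


# Hypothetical top-2 options and weights.
weights = {"reversibility": 5, "standards_alignment": 4, "v66_migration_path": 5}
scores = {
    "option_a": {"reversibility": 4, "standards_alignment": 3, "v66_migration_path": 2},
    "option_b": {"reversibility": 3, "standards_alignment": 3, "v66_migration_path": 4},
}

before = winner(weighted_totals(scores, weights))
after_weights = drop_criterion(weights, "v66_migration_path")
after = winner(weighted_totals(scores, after_weights))
print(before, "->", after)  # option_b -> option_a
```

In this made-up case the weak criterion was deciding the outcome, which is exactly the situation worth surfacing. Passing different `targets` lists to `drop_criterion` gives the 2-3 candidate redistributions to present; the choice between them remains a design-priority judgment.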

Session pattern observations:

  • Weak criteria emerge more often in SD-level scorecards (sub-decisions) than in parent scorecards — parents have clearer structural contours; SDs risk over-specifying
  • Weak criteria tend to cluster around “load” / “cost” / “preservation” framing — check these dimensions specifically
  • Rich’s catch rate is high (4 catches in one session): if he asks “explain criterion X”, there’s a good chance it’s weak

Cross-reference:

  • feedback_reframe_beats_reweight — related discipline; weak-criteria removal is a specific reframe type
  • feedback_architecture_state_file_discipline — establishes registry-bump-is-cheap principle
  • feedback_v2_clean_break_from_v6_6 — establishes v6.6 is reference-only; fidelity-to-v6.6 pulls wrong direction

When this pattern does NOT apply:

  • Criteria measuring genuine discriminating structural dimensions (atomic discipline, reversibility, standards alignment, etc.) are strong — keep at weight
  • “Precedent consistency” is strong when a pattern has won multiple times — not weak even if it’s about past decisions
  • “v1 design learning adoption” (renamed from v6.6 fidelity) is strong — measures whether v2 absorbs v6.6’s real-world learning

Applying forward: Check new scorecard criterion lists against the 5-test checklist before finalising scoring. Catching weak criteria at authoring time is better than catching them at review time.