Rule: Before finalising a scorecard, check each criterion against this list of weak-criteria patterns. If a criterion matches, either remove it (redistribute weight to discriminating criteria) or rename/reframe it to measure what actually matters.
Why: Rich caught four weak criteria in one session (2026-04-24) across T-NEW-02 / T-NEW-CL-05 / T-NEW-CL-06 / T-NEW-CL-02 scorecards. Pattern strong enough to codify. Weak criteria waste weight that should discriminate between genuinely different structural options.
The four observed weak-criteria patterns:
| # | Criterion | Why weak | Fix |
|---|---|---|---|
| 1 | Registry-bump amortisation | Amendments batch at wave close → one registry bump regardless of amendment count (per feedback_architecture_state_file_discipline). Criterion measures a cost that doesn’t exist. | Remove; redistribute weight |
| 2 | A-number-economy | A-## IDs are unbounded sequential integers. No scarcity. Lookup is O(1) by ID. Mature architectures have MORE amendments, not fewer. | Remove; redistribute |
| 3 | v6.6 design fidelity | Per feedback_v2_clean_break_from_v6_6, v2 is clean-break; literal-shape-preservation pulls against the point of v2. | Rename to “Improvement over v6.6” (measures quality of change, not preservation) |
| 4 | v6.6 migration path | v2 has no runtime dependency on v6.6 data; LL v1 migration is Year-2+; partner familiarity already covered by partner-facing-clarity. Double-counting + low-priority. | De-weight sharply (5→1) or remove |
How to identify a weak criterion during scorecard authoring:
- Scarcity test: does the criterion measure a resource that’s genuinely finite? (e.g., registry IDs are unbounded → scarcity criterion weak)
- Clean-break test: does the criterion penalise deviation from v6.6 / legacy design? (per feedback_v2_clean_break_from_v6_6: criteria should measure improvement, not preservation)
- Double-counting test: does another criterion measure the same thing? (e.g., partner-familiarity + partner-facing-clarity = same concern)
- Temporal-mismatch test: does the criterion measure a Year-2+ concern that shouldn’t drive Phase-1 decisions?
- Effect-size test: do the top 2-3 options score identically on this criterion? (non-discriminating → low utility)
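The effect-size test is the most mechanical of the five, so it can be sketched in code. The following is a minimal illustration, not an established tool: the data shape, the function names, and every score and weight are hypothetical, invented for the example.

```python
from typing import Dict, List

# Hypothetical data shape: option name -> {criterion name -> raw score}.
Scores = Dict[str, Dict[str, float]]


def weighted_total(option_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * score over the criteria listed in `weights`."""
    return sum(weights[c] * option_scores[c] for c in weights)


def non_discriminating_criteria(
    scores: Scores, weights: Dict[str, float], top_n: int = 3
) -> List[str]:
    """Return criteria on which the top_n options (ranked by weighted total) all score the same."""
    ranked = sorted(scores, key=lambda o: weighted_total(scores[o], weights), reverse=True)
    top = ranked[:top_n]
    return [c for c in weights if len({scores[o][c] for o in top}) == 1]


# Illustrative numbers only; not taken from any real scorecard.
weights = {"reversibility": 5, "standards_alignment": 4, "a_number_economy": 3}
scores = {
    "A": {"reversibility": 4, "standards_alignment": 5, "a_number_economy": 3},
    "B": {"reversibility": 3, "standards_alignment": 4, "a_number_economy": 3},
    "C": {"reversibility": 2, "standards_alignment": 2, "a_number_economy": 3},
}

flagged = non_discriminating_criteria(scores, weights, top_n=2)
print(flagged)  # ['a_number_economy']
```

A flagged criterion is only a candidate for removal. The other four tests remain judgment calls that this sketch does not attempt to automate.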
What to do when a weak criterion is caught mid-scorecard:
- Be transparent with Rich — acknowledge the weak criterion openly; don’t hide behind the winner
- Show impact of removal — recompute top-2 options with criterion removed; does the winner change?
- Offer redistribution options — present 2-3 candidate redistributions to discriminating criteria
- Let Rich decide — preference between redistribution targets is a design-priority judgment
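The "show impact of removal" step can be sketched as follows. This is a hedged illustration under stated assumptions: proportional redistribution is only one of the candidate redistributions that could be offered, and the function names, criteria keys, weights, and scores are all hypothetical.

```python
from typing import Dict, Tuple

Scores = Dict[str, Dict[str, float]]


def weighted_total(option_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * score over the criteria listed in `weights`."""
    return sum(weights[c] * option_scores[c] for c in weights)


def redistribute(weights: Dict[str, float], removed: str) -> Dict[str, float]:
    """Drop `removed` and scale the remaining weights so the total weight is unchanged.

    Assumption: proportional scaling. Other redistributions are equally valid candidates.
    """
    total = sum(weights.values())
    kept = {c: w for c, w in weights.items() if c != removed}
    scale = total / sum(kept.values())
    return {c: w * scale for c, w in kept.items()}


def winner(scores: Scores, weights: Dict[str, float]) -> str:
    """Option with the highest weighted total."""
    return max(scores, key=lambda o: weighted_total(scores[o], weights))


def removal_impact(scores: Scores, weights: Dict[str, float], removed: str) -> Tuple[str, str, bool]:
    """Return (winner before, winner after removal, whether the winner changed)."""
    before = winner(scores, weights)
    after = winner(scores, redistribute(weights, removed))
    return before, after, before != after


# Illustrative numbers only; not taken from any real scorecard.
weights = {"reversibility": 5, "standards_alignment": 4, "v66_migration_path": 5}
scores = {
    "A": {"reversibility": 3, "standards_alignment": 3, "v66_migration_path": 5},
    "B": {"reversibility": 4, "standards_alignment": 4, "v66_migration_path": 2},
}

result = removal_impact(scores, weights, "v66_migration_path")
print(result)  # ('A', 'B', True)
```

In this made-up case the weak criterion was carrying option A, so removing it flips the winner to B. That is exactly the situation where the removal needs to be shown openly and the redistribution choice left to Rich.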
Session pattern observations:
- Weak criteria emerge more often in SD-level scorecards (sub-decisions) than in parent scorecards — parents have clearer structural contours; SDs risk over-specifying
- Weak criteria tend to cluster around “load” / “cost” / “preservation” framing — check these dimensions specifically
- Rich’s catch rate is high (4 catches in one session): if he asks “explain criterion X”, there’s a good chance it’s weak
Cross-reference:
- feedback_reframe_beats_reweight: related discipline; weak-criteria removal is a specific reframe type
- feedback_architecture_state_file_discipline: establishes registry-bump-is-cheap principle
- feedback_v2_clean_break_from_v6_6: establishes v6.6 is reference-only; fidelity-to-v6.6 pulls wrong direction
When this pattern does NOT apply:
- Criteria measuring genuine discriminating structural dimensions (atomic discipline, reversibility, standards alignment, etc.) are strong — keep at weight
- “Precedent consistency” is strong when a pattern has won multiple times — not weak even if it’s about past decisions
- “v1 design learning adoption” (renamed from v6.6 fidelity) is strong — measures whether v2 absorbs v6.6’s real-world learning
Applying forward: Check new scorecard criterion lists against the 5-test checklist before finalising scoring. Catching weak criteria at authoring time is better than catching them at review time.