For any Claude agent reading a book from ~/off-github/library/indexed/<slug>/, the primary read source is <slug>-text.txt (produced by ~/off-github/library/extract-pdf-text.sh via pdftotext -layout + [[PAGE N]] markers at form-feeds). The PDF stays canonical for pagination + visuals; only open it directly to verify a diagram, table, or fine typography.
Why: Phase 2.5 parallel dispatch on Tuesday 21 April 2026 had 2 of 7 agents fail against the Claude Code Read tool’s 32 MB request-size limit when reading image-heavy PDFs (Cheshire 18 MB, Hallaq 4.9 MB). Recovery via EPUB worked but Rich pushed back on “EPUB-first” — EPUBs are flowable (page refs not stable), not all books have them (Meeker, Labra Gayo are PDF-only), and the Read tool doesn’t support EPUB natively (requires Python zip extraction). Text extraction via pdftotext -layout is the standard Unix approach: poppler-utils already installed, 1-2 sec per book, 3-60× size reduction (Allemang 52 MB → 0.9 MB extreme; Cheshire 17.5 MB → 6.1 MB typical), page numbers preserved via [[PAGE N]] markers.
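The marker step can be pictured with a short Python sketch. It is not the contents of extract-pdf-text.sh, which is not reproduced here. It assumes that pdftotext ends each page with a form feed, that the marker goes on its own line ahead of each page's text, and that N is the sequential PDF page index. The function name add_page_markers is hypothetical.

```python
FORM_FEED = "\f"


def add_page_markers(raw_text: str) -> str:
    """Prefix each form-feed-delimited page with a [[PAGE N]] marker line.

    Assumption: raw_text is pdftotext output in which every page,
    including the last, is terminated by a form feed.
    """
    pages = raw_text.split(FORM_FEED)
    # The form feed after the final page leaves one empty trailing chunk;
    # drop it so no marker is emitted for a page that does not exist.
    if pages and pages[-1] == "":
        pages.pop()
    return "".join(
        f"[[PAGE {number}]]\n{page}"
        for number, page in enumerate(pages, start=1)
    )
```

Putting each marker on its own line keeps it easy to grep for and stops it from running into the page's first line of text.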
How to apply:
- In agent prompts (notes-taking, critique, research), direct agents at <slug>-text.txt with Read(offset, limit) + grep [[PAGE 345]] for navigation.
- Do NOT send large PDF page ranges to the model unless the agent specifically needs to see a diagram; send text.
- If <slug>-text.txt is missing for a book, run ~/off-github/library/extract-pdf-text.sh <slug> (idempotent; skips existing files without --force).
- At indexing time, always extract text as step 4 of the per-book indexing workflow (see filing-system.md v1.2).
- The principle is general: if Claude keeps biting off more than it can chew, the fix is pre-extraction + chunked consumption, not bigger reads or format-switching.
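As a rough illustration of the navigation step, the Python sketch below turns a page range into an (offset, limit) pair for a chunked read. It assumes the [[PAGE N]] markers appear at most once per page and that the reader counts lines from 1. The helper names page_offsets and read_window are hypothetical and not part of any existing tooling.

```python
import re

PAGE_MARKER = re.compile(r"\[\[PAGE (\d+)\]\]")


def page_offsets(text: str) -> dict[int, int]:
    """Return {page number: 1-based line number of its marker}."""
    offsets: dict[int, int] = {}
    # Split on "\n" only: str.splitlines() would also break on form feeds,
    # which would shift line numbers relative to a newline-based reader.
    for line_no, line in enumerate(text.split("\n"), start=1):
        match = PAGE_MARKER.search(line)
        if match:
            offsets.setdefault(int(match.group(1)), line_no)
    return offsets


def read_window(text: str, first_page: int, last_page: int) -> tuple[int, int]:
    """Return (offset, limit) covering first_page..last_page inclusive.

    Raises KeyError if first_page has no marker in the text.
    """
    offsets = page_offsets(text)
    start = offsets[first_page]
    total_lines = len(text.split("\n"))
    # Stop just before the next page's marker, or run to end of file.
    end = offsets.get(last_page + 1, total_lines + 1)
    return start, end - start
```

For example, read_window(text, 345, 346) gives the starting line and line count for those two pages, so the agent pulls a small slice of text rather than the whole book.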
Do NOT:
- Default to EPUB-first — Rich rejected this; pagination and PDF-only books make it worse.
- Feed raw PDF page ranges to agents for bulk text extraction — that’s what tripped the 32 MB limit twice.
- Assume the Read tool will handle large PDFs gracefully — it won’t if image content is present.