WCAG 2.2 AA Coverage Enhancement Plan
1. Architecture Map of the Current Pipeline
Entry Points
| Flow | Entry File | Endpoint |
|---|---|---|
| PDF upload | workers/api/src/routes/convert.ts | POST /api/convert/:fileId |
| URL fetch | workers/api/src/routes/gateway.ts | POST /api/gateway/convert |
| Bulk URLs | workers/api/src/routes/gateway.ts | POST /api/gateway/bulk |
| HTML remediation | workers/api/src/routes/remediate.ts | POST /api/remediate/html |
| V2 remediation | workers/api/src/routes/remediate-v2.ts | POST /remediate/remediate |
| PPTX remediation | workers/api/src/routes/remediate-pptx.ts | Dedicated PPTX flow |
| DOCX remediation | workers/api/src/routes/remediate-docx.ts | Dedicated DOCX flow |
Page Chunking & Routing
PDF upload
↓ pdf-complexity-detector.ts ← Zero-cost binary inspection (fonts, XObjects, vector paths) classifies each page as text | math | image | mixed | table | dense-table
↓ chunk-boundary-detector.ts ← Auto-split at ~30-50 pages using PDF outline/bookmarks; hard-split at MAX_CHUNK_SIZE_PAGES if no natural breaks
↓ chunk-scheduler.ts ← Polls the Supabase `chunk_jobs` table with optimistic locking; sequential chunk-N+1 visibility
↓ chunk-processor.ts ← Per-chunk: extract page range → SmartCascade → store to R2

Model Routing per Page Type (Smart Cascade)
File: workers/api/src/services/smart-cascade-converter.ts
| Page Type | Tier 1 (cheap) | Tier 2 (escalation) | Escalation Trigger |
|---|---|---|---|
| Text-only | Marker API ($0.001/pg) | Gemini 2.5 Flash ($0.005/pg) | Quality score < 80 |
| Math | Marker + Temml MathML | Gemini 2.5 Flash | LaTeX rendering failures |
| Image/visual | Gemini 2.5 Flash ($0.005/pg) | Claude Sonnet 4.6 agentic ($0.15/pg) | Score stall after 2 passes |
| Mixed | Claude Sonnet 4.6 agentic (direct) | — | — |
| Table | Gemini 2.5 Flash | Vision table extractor (Claude Sonnet) | Complex table detection |
| Dense table | Vision table extractor (Claude Sonnet) | — | — |
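The routing table above can be sketched as a simple lookup. The names here (`PageType`, `TierConfig`, `routeForPageType`) are illustrative assumptions, not the actual `smart-cascade-converter.ts` API:

```typescript
// Illustrative lookup for the per-page-type routing table; names are
// hypothetical, not the real smart-cascade-converter.ts exports.
type PageType = 'text' | 'math' | 'image' | 'mixed' | 'table' | 'dense-table';

interface TierConfig {
  tier1: string;
  tier2?: string;        // escalation target, if any
  escalateWhen?: string; // human-readable trigger
}

const ROUTES: Record<PageType, TierConfig> = {
  text: { tier1: 'marker', tier2: 'gemini-2.5-flash', escalateWhen: 'quality score < 80' },
  math: { tier1: 'marker+temml', tier2: 'gemini-2.5-flash', escalateWhen: 'LaTeX rendering failure' },
  image: { tier1: 'gemini-2.5-flash', tier2: 'claude-sonnet-agentic', escalateWhen: 'score stall after 2 passes' },
  mixed: { tier1: 'claude-sonnet-agentic' },
  table: { tier1: 'gemini-2.5-flash', tier2: 'vision-table-extractor', escalateWhen: 'complex table detected' },
  'dense-table': { tier1: 'vision-table-extractor' },
};

function routeForPageType(type: PageType): TierConfig {
  return ROUTES[type];
}
```

Note that `mixed` and `dense-table` pages go straight to their strongest backend and never escalate, which is why those rows have no Tier 2.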
Post-Processing Pipeline
File: workers/api/src/services/post-processing-pipeline.ts
After all chunks are assembled (chunk-assembler.ts), output runs through these sequential steps:
1. structurePages (page-structurer.ts) — page breaks, headers/footers, page numbers
2. optimizeDeterministic (ux-optimizer.ts) — CSS injection, MathML normalization, long-description aria-describedby
3. runValidators (validators.ts) — MathML, table, heading validators with structured reporting
4. addMathReadingAnnotations (latex-math-renderer.ts) — LaTeX → screen-reader “reads as” hints
5. polishVisuals (visual-polisher.ts) — LLM-powered CSS enhancement (Gemini Flash, CSS-only)
6. enhanceAccessibility (wcag-validator.ts) — ARIA roles, landmarks, lang, skip links, viewport, <b>→<strong>, sr-only class
7. validateAndFix (wcag-validator.ts) — Deterministic WCAG AA validation + auto-fix (no LLM)
8. wrapInDocument (utils/html.ts) — Full HTML document + XHTML IR generation
Axe-Core Fix Loop (Optional)
File: workers/api/src/services/axe-fixer.ts
After the deterministic pipeline, an optional browser-based axe-core audit runs up to 3 iterations:
- runAxeAudit() → identifies violations → applyFixes() → re-audit
- Handles ~30 axe rule IDs deterministically (contrast, headings, lists, ARIA, tables, etc.)
- Reverts if a fix introduces regression
Seams Where Post-Processing Could Be Inserted
The pipeline in post-processing-pipeline.ts is sequential and well-factored. New passes can be inserted:
- Between step 6 (enhanceAccessibility) and step 7 (validateAndFix) — this is where the afterEnhance callback already exists (line 55). Ideal for AI-powered passes that need full document context.
- After step 7 (validateAndFix) — for passes that should run on the “final” HTML before document wrapping.
- After the axe-fixer loop — for passes that need the fully-fixed HTML (e.g., screen-reader simulation).
- In chunk-processor.ts — for per-page passes before assembly (e.g., reading-order verification per page).
- In image-enhancer.ts — for per-image passes (e.g., alt-text self-critique, long-description generation).
2. Current WCAG 2.2 AA Coverage Audit
Criteria Catalogue
The codebase maintains a full WCAG 2.1 A+AA catalogue in workers/api/src/services/wcag-criteria-map.ts (50 success criteria). Many criteria are correctly marked typicallyNA: true for static converted documents (audio/video, keyboard traps, timing, etc.).
Criteria We Handle Well (Automated + Tested)
| SC | Name | Implementation | Files |
|---|---|---|---|
| 1.1.1 | Non-text Content | Alt text generation (Gemini Flash), quality gate with retry, isAltTextAcceptable() blocklist | image-enhancer.ts, image-description-pipeline.ts, wcag-validator.ts |
| 1.3.1 | Info and Relationships | Heading hierarchy fix, table header/scope injection, <b>→<strong> semantics, list structure fixes | wcag-validator.ts, axe-fixer.ts, ux-optimizer.ts |
| 1.4.3 | Contrast (Minimum) | Inline style detection + ratio calculation (4.5:1 / 3:1 large text). Axe-fixer can fix contrast. | wcag-validator.ts:248-272, axe-fixer.ts |
| 2.4.1 | Bypass Blocks | Skip-to-main-content link auto-injected | wcag-validator.ts:112-120 |
| 2.4.2 | Page Titled | <title> injected from filename | wcag-validator.ts:89-101 |
| 2.4.4 | Link Purpose | Poor-link-text detector (regex patterns), LLM-assisted replacement | workers/remediate/src/wcag-impl.ts:211-341 |
| 2.4.6 | Headings and Labels | Heading hierarchy normalization (h1→h3 → h1→h2) | wcag-validator.ts, axe-fixer.ts |
| 3.1.1 | Language of Page | lang attribute injected on <html> | wcag-validator.ts:84-86 |
| 4.1.1 | Parsing | Duplicate ID removal, valid ARIA attributes, structural validation | axe-fixer.ts |
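The 1.4.3 ratio calculation referenced above follows the standard WCAG relative-luminance formula. This is a self-contained sketch of that formula, not the actual `wcag-validator.ts` implementation:

```typescript
// WCAG relative luminance (sRGB) and contrast ratio, per the WCAG
// definition of "contrast ratio". Sketch only; wcag-validator.ts may differ.
function relativeLuminance(r: number, g: number, b: number): number {
  // Linearize each 0-255 sRGB channel before weighting.
  const lin = (channel: number): number => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number],
): number {
  const l1 = relativeLuminance(...fg);
  const l2 = relativeLuminance(...bg);
  const [hi, lo] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (hi + 0.05) / (lo + 0.05); // AA: >= 4.5 normal text, >= 3.0 large text
}
```

Black on white yields the maximum 21:1; `#767676` on white sits just above the 4.5:1 AA threshold for normal-size text.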
Criteria We Handle Weakly or Partially
| SC | Name | Current State | Gap |
|---|---|---|---|
| 1.1.1 | Non-text Content | Alt text generated for all images | No long descriptions for complex images (charts, infographics, data visualizations). Short alt text misses data content. isAltTextAcceptable() is a heuristic blocklist check, not a semantic quality assessment. |
| 1.3.1 | Info and Relationships | Tables get <th> + scope | No <caption> generation. Definition lists (<dl>) validated but not generated. Form label association is in the remediate worker but not the convert pipeline. |
| 1.3.2 | Meaningful Sequence | MCID-based reading order checked in PDF scorer | No reading-order verification for HTML output. Multi-column layouts from vision models may produce wrong reading order. The prompt says “maintain reading order” but there’s no post-hoc verification. |
| 1.4.3 | Contrast | Inline styles checked | CSS class-based colors and inherited styles not checked. Only the axe-fixer catches these (requires browser rendering). |
| 1.4.5 | Images of Text | Listed in criteria map | No detection or remediation. Scanned PDFs with text-as-image are OCR’d but there’s no check that the OCR fully replaces the image. |
| 2.4.6 | Headings and Labels | Hierarchy validated | No check for heading descriptiveness — empty headings caught, but “Chapter 1” or “Section A” pass. |
| 3.1.2 | Language of Parts | Listed in criteria map with custom rule ai-lang-of-parts | Not implemented in the convert pipeline. No per-element lang attribute detection for multilingual documents. |
| 4.1.2 | Name, Role, Value | Button names, link names, input labels | No validation for custom ARIA roles/states beyond what axe catches. |
Criteria We Skip Entirely
| SC | Name | Status | Notes |
|---|---|---|---|
| 1.4.4 | Resize Text | Not tested | Would require viewport rendering; intrinsically met by responsive CSS injected in step 6 |
| 1.4.10 | Reflow | Not tested | Same — CSS ensures reflow, but not verified |
| 1.4.11 | Non-text Contrast | Not tested | UI component borders/icons — partially N/A for converted docs |
| 1.4.12 | Text Spacing | Not tested | CSS allows text-spacing overrides, but not verified |
| 2.4.7 | Focus Visible | Not verified | :focus-visible CSS injected (line 188 wcag-validator.ts) but never validated |
| 4.1.3 | Status Messages | Not addressed | N/A for static converted documents |
Summary: Coverage Estimate
- 50 WCAG 2.1 A+AA success criteria in the catalogue
- ~18 are N/A for static converted documents (audio, video, keyboard traps, timing, etc.)
- ~32 applicable criteria remain
- ~20 well-handled with automated detection + fix
- ~8 partially handled (detection exists, fix is incomplete or shallow)
- ~4 listed but untested (CSS-based criteria assumed-passing)
Estimated coverage: ~62-70% of applicable criteria are fully passing. The ~70% figure the user cited aligns with this analysis.
3. Cost + Model Inventory
Per-Page Model Usage
| Service | Model | Provider | Input $/M tok | Output $/M tok | Approx Cost/Page | When Used |
|---|---|---|---|---|---|---|
| Marker API | — | Datalab | — | — | $0.001 | Text-only pages |
| Smart cascade T1 | Gemini 2.5 Flash | Google | $0.15 | $2.50 | $0.005 | Vision pages (first try) |
| Smart cascade T2 | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.15 | Vision pages (escalation) |
| Image enhancer | Gemini 2.5 Flash | Google | $0.15 | $2.50 | $0.0003/image | Alt text generation |
| Mathpix equations | Mathpix API | Datalab | — | — | $0.002/image | Equation images |
| Visual polish | Gemini 2.5 Flash | Google | $0.15 | $2.50 | ~$0.002 | CSS-only polish (optional) |
| Struct metadata | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Cached/amortized | PDF structure extraction |
Prompt Caching
Currently used in:
- agentic-vision-converter.ts — cache_control: { type: 'ephemeral' } on PDF documents and system prompts
- struct-table-extractor.ts — ephemeral cache for table extraction
- score-metadata-extractor.ts — ephemeral cache for metadata
Tracked fields: cacheCreationInputTokens, cacheReadInputTokens in token usage.
Cost Tracking
Fully implemented. The cost-ledger.ts service records every AI call to a cost_ledger Supabase table with:
- user_id, file_id, product, operation_type, model, backend
- input_tokens, output_tokens, estimated_cost_usd, metadata
- Structured JSON logging to stdout for Loki/Grafana
Additionally:
- budget-estimator.ts — pre-conversion cost estimate based on page complexity
- dollar-budget.ts — hard cost cap per conversion with graceful stop
- credits.ts — user credit system (3 credits/page worst case)
- llm-cost.ts — centralized per-model pricing table
4. Enhancement Opportunities (Ranked by Coverage-Gain-per-Dollar)
Enhancement 1: Cross-Page Heading Hierarchy + Coherence Check
WCAG SC(s): 1.3.1 (Info and Relationships), 2.4.6 (Headings and Labels), 2.4.10 (Section Headings)
Problem: Each chunk is processed independently. Heading levels may be inconsistent across chunk boundaries (chunk 1 ends at h2, chunk 2 restarts at h1). The existing analyzeHeadings() in wcag-validator.ts only validates within a single HTML fragment.
Approach: After chunk assembly, run a single cheap-model pass over the entire heading tree (extracted as a flat list of {level, text, position}). The model normalizes levels for cross-chunk coherence and flags non-descriptive headings.
- Estimated incremental cost: ~$0.001-0.003 per document (Gemini 2.5 Flash, <2K tokens total — headings only, not full HTML)
- Files to touch: chunk-assembler.ts (extract heading tree post-assembly), new heading-coherence-checker.ts, post-processing-pipeline.ts (insert step)
- Risk/Complexity: S — deterministic heading extraction, small LLM call, easy to validate
- Coverage gain: Improves 1.3.1, 2.4.6 from partial to full for multi-chunk documents
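The deterministic half of this pass can run before any LLM call: clamp heading-level jumps so no heading sits more than one level deeper than its predecessor. A minimal sketch (`HeadingNode` and `normalizeHeadingLevels` are hypothetical names; the model pass for descriptiveness would sit on top of this):

```typescript
interface HeadingNode {
  level: number;    // 1-6, as emitted by the chunk's converter
  text: string;
  position: number; // element index in the assembled HTML
}

// No heading may be more than one level deeper than the one before it,
// so a chunk that emits h4 right after the previous chunk ended at h2
// is pulled up to h3. The first heading is forced to h1.
function normalizeHeadingLevels(headings: HeadingNode[]): HeadingNode[] {
  let prev = 0;
  return headings.map((h) => {
    const level = prev === 0 ? 1 : Math.min(h.level, prev + 1);
    prev = level;
    return { ...h, level };
  });
}
```

This fixes level skips across chunk boundaries deterministically; the cheap-model call is only needed for judgment calls (a chunk legitimately restarting at h1, and non-descriptive heading text).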
Enhancement 2: Alt-Text Self-Critique (Cheap Model)
WCAG SC(s): 1.1.1 (Non-text Content)
Problem: Current alt text quality gate (isAltTextAcceptable()) uses heuristic blocklist checks (too short, generic phrases like “image of”). It doesn’t assess whether alt text is semantically adequate for the image content. Complex images like charts get short alt text that misses the data story.
Approach: After initial alt text generation, run a cheap self-critique pass that checks:
- Does the alt text describe the image’s purpose (not just appearance)?
- For charts/data: does it convey the key data point or trend?
- Is it too verbose (>150 chars) or too terse (<15 chars)?
- Does it avoid redundancy with surrounding text?
Use Gemini 2.0 Flash Lite ($0.075/$0.30 per M tokens) — cheapest option that can compare text against an image.
- Estimated incremental cost: ~$0.0002 per image ($0.002 for a 10-image document)
- Files to touch: image-enhancer.ts (add critique step after generation), image-description-pipeline.ts (wire critique into pipeline)
- Risk/Complexity: S — single cheap call per image, clear pass/fail output
- Coverage gain: Upgrades 1.1.1 from “generates alt text” to “generates quality-assured alt text”
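The length and redundancy checks in the list above are deterministic and could run as a zero-cost pre-filter before the model call. A sketch, with hypothetical names (the purpose and data-story checks still require the vision model):

```typescript
interface AltCritiqueInput {
  alt: string;
  surroundingText: string; // text adjacent to the image in the document
}

interface AltCritiqueResult {
  pass: boolean;
  reasons: string[];
}

// Deterministic pre-filter: encodes only the bounds checks from the plan.
// Semantic adequacy (purpose, key data point) needs the model pass.
function critiqueAltTextHeuristics(input: AltCritiqueInput): AltCritiqueResult {
  const reasons: string[] = [];
  const alt = input.alt.trim();
  if (alt.length < 15) reasons.push('too terse (<15 chars)');
  if (alt.length > 150) reasons.push('too verbose (>150 chars)');
  if (alt.length > 0 && input.surroundingText.toLowerCase().includes(alt.toLowerCase())) {
    reasons.push('redundant with surrounding text');
  }
  return { pass: reasons.length === 0, reasons };
}
```

Only images that pass this filter need the Flash Lite critique call, which keeps the per-document cost at the low end of the estimate.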
Enhancement 3: Deep-Vision Long Description for Complex Images
WCAG SC(s): 1.1.1 (Non-text Content — long description for complex images)
Problem: Charts, infographics, data visualizations, and complex diagrams get a short alt text but no long description. WCAG 1.1.1 requires “an alternative that serves an equivalent purpose” — for a bar chart, that means conveying the data, not just “bar chart showing quarterly revenue.”
Approach: Gate with the existing diagramType classifier from image-enhancer.ts:
- If diagramType is chart, diagram, or illustration AND the image is >100KB (not a simple icon):
  - Send to a capable vision model (Gemini 2.5 Flash) with a structured prompt asking for data extraction
  - Generate a long description (<500 words) with a data table if applicable
  - Inject as an aria-describedby-linked <details> element (pattern already exists in ux-optimizer.ts:1242)
- Estimated incremental cost: ~$0.005 per complex image. A typical doc has 0-3 complex images → $0.00-$0.015/doc
- Files to touch: image-enhancer.ts (add long-description generation), ux-optimizer.ts (inject <details> element), image-description-pipeline.ts (wire in)
- Risk/Complexity: M — needs prompt engineering for data extraction; <details> injection pattern exists but needs expansion
- Coverage gain: Addresses the biggest remaining gap in 1.1.1 — complex image descriptions
Legal note: Under ADA Title II (DOJ rule effective April 2027/2028 per CLAUDE.md compliance deadlines), complex images in government documents must have equivalent text alternatives. This is a high-priority SC for the target market.
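The injection step might look like the following string-level sketch (`injectLongDescription` is a hypothetical helper; the real pattern lives in ux-optimizer.ts and would operate on a parsed DOM, not strings):

```typescript
// Link an <img> to a generated long description via aria-describedby and a
// <details> disclosure. String-level sketch only; real code should operate
// on a parsed DOM and HTML-escape longDesc before injection.
function injectLongDescription(imgHtml: string, imgId: string, longDesc: string): string {
  const descId = `${imgId}-longdesc`;
  const linkedImg = imgHtml.replace('<img ', `<img aria-describedby="${descId}" `);
  const details = [
    `<details id="${descId}">`,
    `<summary>Image description</summary>`,
    `<p>${longDesc}</p>`,
    `</details>`,
  ].join('');
  return `${linkedImg}\n${details}`;
}
```

Keeping the description in a collapsed `<details>` element means sighted users see a compact disclosure while screen readers get the full data story through `aria-describedby`.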
Enhancement 4: Reading-Order Verification for Multi-Column Layouts
WCAG SC(s): 1.3.2 (Meaningful Sequence)
Problem: Vision models sometimes output multi-column content in wrong reading order (interleaving columns, or reading across instead of down). The convert pipeline has no post-hoc verification — it trusts the model output.
Approach: After conversion (per-page, in chunk-processor.ts), compare the HTML’s text sequence against the PDF’s text extraction order (from unpdf extractText). If >20% of text segments are reordered, flag for re-processing or route to a reading-order jury:
- Extract text blocks from HTML (split on block elements)
- Extract text from PDF (already available from complexity detector)
- Compute Kendall tau correlation between the two orderings
- If τ < 0.7: send the page image + current HTML to a vision model asking specifically about reading order
- Estimated incremental cost: $0 for the deterministic check (most pages pass). ~$0.005 per flagged page for vision jury. Estimated 5-10% of pages flag → $0.0025-0.005/page average.
- Files to touch: New reading-order-verifier.ts, chunk-processor.ts (call after conversion), post-processing-pipeline.ts (optional doc-level pass)
- Risk/Complexity: M — comparing text-extraction orderings is non-trivial for complex layouts; the false-positive rate needs tuning
- Coverage gain: Moves 1.3.2 from “not verified” to “verified with automated check”
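The Kendall tau computation in step 3 is straightforward once each HTML text block is mapped to its index in the PDF extraction order. A sketch, assuming `order[i]` is the PDF index of the i-th HTML block and indices are unique (no ties):

```typescript
// Kendall tau of `order` against the ascending reference 0..n-1.
// tau = 1: identical order; tau = -1: fully reversed.
function kendallTau(order: number[]): number {
  const n = order.length;
  if (n < 2) return 1;
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      if (order[i] < order[j]) concordant++;
      else discordant++;
    }
  }
  return (concordant - discordant) / ((n * (n - 1)) / 2);
}

const READING_ORDER_TAU_THRESHOLD = 0.7; // from the plan: flag below 0.7

function needsReadingOrderJury(order: number[]): boolean {
  return kendallTau(order) < READING_ORDER_TAU_THRESHOLD;
}
```

The O(n²) pair scan is fine at page scale (tens of blocks); a merge-sort inversion count would handle larger inputs if the doc-level pass needs it.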
Enhancement 5: Form-Field Label-Association Jury
WCAG SC(s): 1.3.1 (Info and Relationships), 1.3.5 (Identify Input Purpose), 3.3.2 (Labels or Instructions)
Problem: The premium-form-converter.ts generates form HTML but label↔input association relies on the initial vision model output. The remediate worker (wcag-impl.ts:220-246) detects unlabeled inputs but only in the remediation flow — not the convert flow. No jury pass verifies the association is correct (label text matches the field’s purpose).
Approach: After form conversion, run a cheap model pass that:
- Extracts all <label>/<input> pairs
- Checks each for/id pairing is correct
- Verifies label text semantically matches field purpose
- Adds autocomplete attributes per 1.3.5 (currently missing entirely)
- Estimated incremental cost: ~$0.002 per form page. Most docs have 0 form pages → $0 for typical docs.
- Files to touch: premium-form-converter.ts (add post-conversion jury), new form-label-jury.ts
- Risk/Complexity: S — small scope (form pages only), clear validation criteria
- Coverage gain: Moves 1.3.5 from “partially” to “full” for form documents. Also addresses 3.3.2.
Legal note: Forms in government PDFs are high-scrutiny items under ADA Title II. Incorrect label association is one of the most commonly cited WCAG violations in DOJ settlement agreements.
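The deterministic for/id pairing check (the second bullet above) can be a simple scan before the model pass. A regex-level sketch that is deliberately naive, ignoring wrapping `<label>`s, `aria-label`, and `aria-labelledby`, which the jury or a real DOM walk would need to handle (`findUnlabeledInputIds` is a hypothetical name):

```typescript
// Find <input> ids with no matching <label for="...">. Regex sketch only;
// intentionally ignores wrapping labels and ARIA labelling attributes.
function findUnlabeledInputIds(html: string): string[] {
  const labeledIds = new Set<string>();
  for (const m of html.matchAll(/<label\b[^>]*\bfor="([^"]+)"/g)) {
    labeledIds.add(m[1]);
  }
  const unlabeled: string[] = [];
  for (const m of html.matchAll(/<input\b[^>]*\bid="([^"]+)"/g)) {
    if (!labeledIds.has(m[1])) unlabeled.push(m[1]);
  }
  return unlabeled;
}
```

Only inputs flagged here (plus the semantic label-vs-purpose check) need the cheap-model call, which is why the cost stays near zero for typical documents with no forms.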
Enhancement 6: Language-of-Parts Detection
WCAG SC(s): 3.1.2 (Language of Parts)
Problem: The criteria map lists ai-lang-of-parts as a custom rule but it’s not implemented. Multilingual documents (common in government and academic PDFs) have no per-element lang attribute annotation.
Approach: After document assembly, scan text blocks for language switches using a lightweight language-detection library (e.g., cld3 or Gemini Flash with a simple prompt). Insert lang attributes on elements containing non-primary-language text.
- Estimated incremental cost: ~$0.001 per document (Gemini 2.0 Flash Lite, text-only, <1K tokens for language classification). For the library approach: $0 (CPU-only).
- Files to touch: New language-of-parts-detector.ts, post-processing-pipeline.ts (new step after enhanceAccessibility)
- Risk/Complexity: S — well-defined problem, small scope
- Coverage gain: Moves 3.1.2 from “not implemented” to “automated”
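A sketch of the CPU-only variant using Unicode script property escapes. The script-to-language mapping below is a heuristic assumption (Han alone cannot distinguish zh from ja, for instance), which is exactly why cld3 or a model call remains the fallback for ambiguous blocks:

```typescript
// If >30% of a text block's characters belong to one non-Latin Unicode
// script, guess a lang tag for it. Heuristic sketch; mapping is approximate.
const SCRIPT_LANG_GUESSES: Array<[RegExp, string]> = [
  [/\p{Script=Cyrillic}/u, 'ru'],
  [/\p{Script=Greek}/u, 'el'],
  [/\p{Script=Arabic}/u, 'ar'],
  [/\p{Script=Hangul}/u, 'ko'],
  [/\p{Script=Hiragana}|\p{Script=Katakana}/u, 'ja'], // check kana before Han
  [/\p{Script=Han}/u, 'zh'],
];

function detectNonLatinLang(text: string): string | null {
  if (text.length === 0) return null;
  for (const [re, lang] of SCRIPT_LANG_GUESSES) {
    const matches = text.match(new RegExp(re.source, 'gu'));
    if (matches && matches.length / text.length > 0.3) return lang;
  }
  return null; // Latin or undetermined: leave the element's lang inherited
}
```

A block that returns a non-null guess different from the document's primary language would get a `lang` attribute on its nearest containing element.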
Enhancement 7: Confidence Scoring + Human Review Queue
WCAG SC(s): All — meta-enhancement
Problem: The pipeline ships everything with equal confidence. A perfect text-only page and a mangled multi-column layout with complex images get the same treatment. There’s no way for downstream consumers to know which items need human review.
Approach: Add a per-element confidence score to the output:
- Leverage the existing QualityScore (8-dimension breakdown in quality-scorer.ts)
- Extend with per-element granularity: each <img>, <table>, and heading gets a confidence tag
- Elements with confidence < threshold populate a requiresHumanReview array in the output
- The array includes: element type, WCAG SC at risk, reason, location in document
This is the scaffolding that all other enhancements feed into.
- Estimated incremental cost: $0 (deterministic aggregation of existing scores)
- Files to touch: quality-scorer.ts (add per-element scoring), shared types in packages/shared/src/types.ts, output formatting in convert.ts
- Risk/Complexity: M — needs type changes across the shared package + API output, but no new AI calls
- Coverage gain: Enables human-in-the-loop for the ~10% of elements that automated passes can’t confidently handle
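The requiresHumanReview output could take a shape like this. Field names are illustrative, not the final packages/shared types:

```typescript
// Illustrative shape for the requiresHumanReview array described above.
interface ReviewItem {
  elementType: 'img' | 'table' | 'heading' | 'form';
  wcagSc: string;     // WCAG success criterion at risk, e.g. '1.1.1'
  reason: string;     // why confidence is low
  location: string;   // selector or position in the document
  confidence: number; // 0-100, derived from the per-element quality score
}

// Deterministic aggregation: keep only low-confidence elements, worst first,
// so reviewers triage the riskiest items at the top of the queue.
function buildReviewQueue(items: ReviewItem[], threshold = 60): ReviewItem[] {
  return items
    .filter((i) => i.confidence < threshold)
    .sort((a, b) => a.confidence - b.confidence);
}
```

Because this is pure aggregation of scores the pipeline already computes, the $0 cost estimate holds: no new model calls, only type plumbing.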
Enhancement 8: Screen-Reader Simulation Read-Back
WCAG SC(s): 1.3.1, 1.3.2, 2.4.6, 4.1.2 — structural coherence validation
Problem: Even with all the above passes, the final output may have structural issues that are only apparent when “read” linearly as a screen reader would — e.g., a table caption that’s been orphaned from its table, or a heading that doesn’t match the content that follows it.
Approach: Serialize the final HTML to a linear text stream (strip tags, preserve element boundaries with markers). Send to a cheap model asking: “Does this read coherently as a document? Flag any points where the reading order breaks, content seems out of place, or a heading doesn’t match what follows.”
- Estimated incremental cost: ~$0.005-0.01 per document (Gemini 2.5 Flash, full-document text, ~5K-10K tokens)
- Files to touch: New screen-reader-simulator.ts, post-processing-pipeline.ts (final step before wrapInDocument)
- Risk/Complexity: L — LLM judgment on coherence is subjective; needs careful prompt engineering to avoid false positives. Results should route to requiresHumanReview, not auto-fix.
- Coverage gain: Catches structural issues that slip through rule-based checks. Primary value is quality assurance, not direct SC coverage.
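The serialization step, stripping tags while preserving element boundaries with markers, might look like this regex-level sketch (a real implementation would walk the DOM; `linearizeHtml` is a hypothetical name):

```typescript
// Serialize HTML to a linear text stream, marking block-element boundaries
// so the model can see where each structural unit begins. Sketch only.
function linearizeHtml(html: string): string {
  return html
    // Replace opening block tags with a newline + [tag] marker.
    .replace(/<(h[1-6]|p|li|td|th|caption|figcaption)\b[^>]*>/g, (_m, tag) => `\n[${tag}] `)
    // Drop every remaining tag.
    .replace(/<[^>]+>/g, ' ')
    // Collapse runs of spaces/tabs but keep the newline boundaries.
    .replace(/[ \t]+/g, ' ')
    .trim();
}
```

The `[h2]`, `[p]`, `[td]` markers let the model flag mismatches like "this [h2] is followed by content about a different topic" with a stable reference back into the document.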
Enhancement 9: Opus-Tier Jury on Low-Confidence Items
WCAG SC(s): All — quality escalation for the hardest items
Problem: Some elements are genuinely hard — complex data visualizations, unusual table structures, ambiguous reading order. Cheap models produce low-confidence results. Currently these ship as-is.
Approach: After confidence scoring (Enhancement 7), items with confidence < 40% and high WCAG impact (images, tables, forms) are sent to Claude Opus for a single review pass. Opus output replaces the original only if it scores higher.
- Estimated incremental cost: ~$0.05-0.10 per low-confidence item. With <10% of elements flagged and ~2-5 per doc: $0.10-0.50/doc (only for complex documents).
- Files to touch: New opus-jury.ts, post-processing-pipeline.ts (conditional step gated on confidence scores)
- Risk/Complexity: M — cost management is critical; must have a hard budget cap. A feature flag is essential.
- Coverage gain: Targeted improvement on the hardest 5-10% of elements where cheaper models fail.
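The hard budget cap can be a simple charge-gate that refuses further jury items once the cap would be exceeded (`JuryBudget` is a hypothetical sketch; the `maxOpusCostUsd` option would feed the constructor, and rejected items stay on the human-review queue):

```typescript
// Hard per-document budget gate for the Opus pass. tryCharge() is called
// with the estimated cost before each jury item; once the cap would be
// exceeded, remaining items fall back to human review instead of Opus.
class JuryBudget {
  private spentUsd = 0;
  constructor(private readonly capUsd: number) {}

  tryCharge(costUsd: number): boolean {
    if (this.spentUsd + costUsd > this.capUsd) return false;
    this.spentUsd += costUsd;
    return true;
  }

  get remainingUsd(): number {
    return this.capUsd - this.spentUsd;
  }
}
```

Charging against the estimate before the call (not after) is what makes the cap hard: the jury can never start an item it cannot afford.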
Summary: Enhancement Priority Matrix
| # | Enhancement | WCAG SCs | Cost/Doc | Complexity | Coverage Lift | Priority |
|---|---|---|---|---|---|---|
| 1 | Heading coherence check | 1.3.1, 2.4.6 | $0.002 | S | Medium | P1 |
| 2 | Alt-text self-critique | 1.1.1 | $0.002 | S | Medium | P1 |
| 3 | Long desc for complex images | 1.1.1 | $0.015 | M | High | P1 |
| 4 | Reading-order verification | 1.3.2 | $0.003 | M | High | P1 |
| 5 | Form label-association jury | 1.3.1, 1.3.5, 3.3.2 | $0.002 | S | Medium (form docs) | P2 |
| 6 | Language-of-parts detection | 3.1.2 | $0.001 | S | Medium (multilingual) | P2 |
| 7 | Confidence scoring + human queue | All (meta) | $0 | M | Foundational | P1 |
| 8 | Screen-reader simulation | 1.3.1, 1.3.2, 2.4.6 | $0.008 | L | Low-Medium | P3 |
| 9 | Opus jury on low-confidence | All | $0.10-0.50 | M | High (targeted) | P3 |
Estimated Total Cost Impact
For a typical 20-page document (mostly text, 5 images, 1 chart, 0 forms):
- Current pipeline cost: ~$0.05-0.20
- All P1 enhancements: +$0.02-0.03 (+10-60%)
- All P1+P2: +$0.02-0.04
- All P1+P2+P3: +$0.05-0.55 (Opus jury dominates when triggered)
Expected Coverage Lift
- Current: ~70% of applicable WCAG 2.2 AA criteria fully passing
- After P1 enhancements: ~80-82% (heading coherence, alt-text quality, reading order, long descriptions)
- After P1+P2: ~83-85% (forms + language-of-parts)
- After P1+P2+P3: ~85-88% (screen-reader simulation catches edge cases, Opus jury handles hard items)
The remaining ~12-15% are criteria that require browser-based interactive testing (reflow, text spacing, focus visible) or are genuinely N/A but marked as applicable for conservatism in VPAT reporting.
Legal Jurisdictions to Note
- ADA Title II (DOJ rule): Deadlines April 26, 2027 (pop ≥50K) and April 26, 2028 (pop <50K) per the compliance schedule in CLAUDE.md. All enhancements addressing SC 1.1.1, 1.3.1, and 1.3.2 are directly relevant to government PDF compliance.
- Section 508 (US Federal): Requires WCAG 2.0 AA conformance; all enhancements apply.
- EN 301 549 (EU): Maps to WCAG 2.1 AA; language-of-parts (Enhancement 6) is specifically flagged in EU accessibility audits of multilingual documents.
- AODA (Ontario): Requires WCAG 2.0 AA; same SC applicability.
Important: Automated remediation cannot serve as a substitute for human attestation of WCAG conformance. The output should clearly indicate which criteria were automatically verified vs. which require human review (Enhancement 7). VPAT reports generated from this pipeline already use conservative “partially supports” / “not-verified” language (per wcag-criteria-map.ts honesty rules), and this should continue.
Implementation Status (P1 Enhancements)
All P1 enhancements have been implemented on branch feature/wcag-coverage-enhancements.
Files Created
| Enhancement | File | Tests |
|---|---|---|
| Confidence scoring (#7) | workers/api/src/services/confidence-scorer.ts | __tests__/services/confidence-scorer.test.ts (16 tests) |
| Heading coherence (#1) | workers/api/src/services/heading-coherence-checker.ts | __tests__/services/heading-coherence-checker.test.ts (11 tests) |
| Alt-text critique (#2) | workers/api/src/services/alt-text-critique.ts | __tests__/services/alt-text-critique.test.ts (6 tests) |
| Long descriptions (#3) | workers/api/src/services/long-description-generator.ts | __tests__/services/long-description-generator.test.ts (14 tests) |
| Reading-order verifier (#4) | workers/api/src/services/reading-order-verifier.ts | __tests__/services/reading-order-verifier.test.ts (12 tests) |
Files Modified
| File | Changes |
|---|---|
| workers/api/src/services/post-processing-pipeline.ts | Added 3 new pipeline steps (heading coherence, reading-order check, confidence scoring) with feature flags |
| workers/api/src/services/image-enhancer.ts | Added enableAltTextCritique flag and critique call after retry logic |
| workers/api/src/services/image-description-pipeline.ts | Added long description generation for qualifying complex images |
Feature Flags
All enhancements are disabled by default. Enable via PostProcessOptions or EnhancerConfig:
| Flag | Where | Purpose |
|---|---|---|
| enableConfidenceScoring | PostProcessOptions | Per-element confidence scores + requiresHumanReview output |
| confidenceThreshold | PostProcessOptions | Confidence threshold (default: 60) |
| enableHeadingCoherence | PostProcessOptions | Cross-chunk heading hierarchy normalization |
| enableReadingOrderCheck | PostProcessOptions | PDF ↔ HTML reading-order comparison |
| pdfTextPages | PostProcessOptions | PDF text per page (required for reading-order check) |
| enableAltTextCritique | EnhancerConfig | Semantic alt-text quality verification |
Cost Tracking
- Alt-text critique: Uses estimateLlmCost() with model gemini-2.0-flash-lite. Token usage is tracked in CritiqueResult and flows through enhanceImagesInHtml token totals.
- Long descriptions: Uses estimateLlmCost() with model gemini-2.5-flash. Token usage is tracked in LongDescriptionResult and logged in the image pipeline.
- Heading coherence: No LLM calls — deterministic, zero cost.
- Confidence scoring: No LLM calls — deterministic, zero cost.
- Reading-order verifier: No LLM calls — deterministic, zero cost.
- Form label jury: No LLM calls — deterministic, zero cost.
- Language-of-parts: No LLM calls — deterministic Unicode analysis, zero cost.
Implementation Status (P2 Enhancements)
P2 enhancements implemented on branch feature/wcag-coverage-enhancements.
Files Created
| Enhancement | File | Tests |
|---|---|---|
| Form label jury (#5) | workers/api/src/services/form-label-jury.ts | __tests__/services/form-label-jury.test.ts (26 tests) |
| Language-of-parts (#6) | workers/api/src/services/language-of-parts-detector.ts | __tests__/services/language-of-parts-detector.test.ts (19 tests) |
Feature Flags (P2)
| Flag | Where | Purpose |
|---|---|---|
| enableFormLabelJury | PostProcessOptions | Label↔input validation + autocomplete injection |
| enableLanguageOfParts | PostProcessOptions | Unicode-based per-element lang attribute detection |
Implementation Status (P3 Enhancements)
P3 enhancements implemented on branch feature/wcag-coverage-enhancements.
Files Created
| Enhancement | File | Tests |
|---|---|---|
| Screen-reader sim (#8) | workers/api/src/services/screen-reader-simulator.ts | __tests__/services/screen-reader-simulator.test.ts (13 tests) |
| Opus jury (#9) | workers/api/src/services/opus-jury.ts | __tests__/services/opus-jury.test.ts (10 tests) |
Feature Flags (P3)
| Flag | Where | Purpose |
|---|---|---|
| enableScreenReaderSim | PostProcessOptions | Coherence validation via linearized read-back |
| enableOpusJury | PostProcessOptions | Claude Opus review of low-confidence items |
| maxOpusCostUsd | PostProcessOptions | Hard budget cap for Opus jury (default: $0.50/doc) |
Cost Tracking (P3)
- Screen-reader sim: Gemini 2.5 Flash, ~$0.005-0.01/doc. Token usage tracked in ScreenReaderSimResult.
- Opus jury: Claude Opus 4.6, ~$0.05-0.10/item. Token usage tracked per-verdict. Hard budget cap prevents runaway costs.