WCAG 2.2 AA Coverage Enhancement Plan
1. Architecture Map of the Current Pipeline
Entry Points
| Flow | Entry File | Endpoint |
|---|---|---|
| PDF upload | workers/api/src/routes/convert.ts | POST /api/convert/:fileId |
| URL fetch | workers/api/src/routes/gateway.ts | POST /api/gateway/convert |
| Bulk URLs | workers/api/src/routes/gateway.ts | POST /api/gateway/bulk |
| HTML remediation | workers/api/src/routes/remediate.ts | POST /api/remediate/html |
| V2 remediation | workers/api/src/routes/remediate-v2.ts | POST /remediate/remediate |
| PPTX remediation | workers/api/src/routes/remediate-pptx.ts | Dedicated PPTX flow |
| DOCX remediation | workers/api/src/routes/remediate-docx.ts | Dedicated DOCX flow |
Page Chunking & Routing
PDF upload
↓ pdf-complexity-detector.ts ← Zero-cost binary inspection (fonts, XObjects, vector paths) classifies each page as text | math | image | mixed | table | dense-table
↓ chunk-boundary-detector.ts ← Auto-split at ~30-50 pages using PDF outline/bookmarks; hard-split at MAX_CHUNK_SIZE_PAGES if no natural breaks
↓ chunk-scheduler.ts ← Polls the Supabase `chunk_jobs` table with optimistic locking; sequential chunk-N+1 visibility
↓ chunk-processor.ts ← Per-chunk: extract page range → SmartCascade → store to R2

Model Routing per Page Type (Smart Cascade)
File: workers/api/src/services/smart-cascade-converter.ts
| Page Type | Tier 1 (cheap) | Tier 2 (escalation) | Escalation Trigger |
|---|---|---|---|
| Text-only | Marker API ($0.001/pg) | Gemini 2.5 Flash ($0.005/pg) | Quality score < 80 |
| Math | Marker + Temml MathML | Gemini 2.5 Flash | LaTeX rendering failures |
| Image/visual | Gemini 2.5 Flash ($0.005/pg) | Claude Sonnet 4.6 agentic ($0.15/pg) | Score stall after 2 passes |
| Mixed | Claude Sonnet 4.6 agentic (direct) | — | — |
| Table | Gemini 2.5 Flash | Vision table extractor (Claude Sonnet) | Complex table detection |
| Dense table | Vision table extractor (Claude Sonnet) | — | — |
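The routing table above can be sketched as a simple lookup. The names here (`PageType`, `TierConfig`, `routeForPageType`) are illustrative assumptions, not the actual `smart-cascade-converter.ts` API:

```typescript
// Illustrative lookup for the per-page-type routing table; names are
// hypothetical, not the real smart-cascade-converter.ts exports.
type PageType = 'text' | 'math' | 'image' | 'mixed' | 'table' | 'dense-table';

interface TierConfig {
  tier1: string;
  tier2?: string;        // escalation target, if any
  escalateWhen?: string; // human-readable trigger
}

const ROUTES: Record<PageType, TierConfig> = {
  text: { tier1: 'marker', tier2: 'gemini-2.5-flash', escalateWhen: 'quality score < 80' },
  math: { tier1: 'marker+temml', tier2: 'gemini-2.5-flash', escalateWhen: 'LaTeX rendering failure' },
  image: { tier1: 'gemini-2.5-flash', tier2: 'claude-sonnet-agentic', escalateWhen: 'score stall after 2 passes' },
  mixed: { tier1: 'claude-sonnet-agentic' },
  table: { tier1: 'gemini-2.5-flash', tier2: 'vision-table-extractor', escalateWhen: 'complex table detected' },
  'dense-table': { tier1: 'vision-table-extractor' },
};

function routeForPageType(type: PageType): TierConfig {
  return ROUTES[type];
}
```

Note that `mixed` and `dense-table` pages go straight to their strongest backend and never escalate, which is why those rows have no Tier 2.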
Post-Processing Pipeline
File: workers/api/src/services/post-processing-pipeline.ts
After all chunks are assembled (chunk-assembler.ts), output runs through these sequential steps:
1. structurePages (page-structurer.ts) — page breaks, headers/footers, page numbers
2. optimizeDeterministic (ux-optimizer.ts) — CSS injection, MathML normalization, long-description aria-describedby
3. runValidators (validators.ts) — MathML, table, heading validators with structured reporting
4. addMathReadingAnnotations (latex-math-renderer.ts) — LaTeX → screen-reader “reads as” hints
5. polishVisuals (visual-polisher.ts) — LLM-powered CSS enhancement (Gemini Flash, CSS-only)
6. enhanceAccessibility (wcag-validator.ts) — ARIA roles, landmarks, lang, skip links, viewport, <b>→<strong>, sr-only class
7. validateAndFix (wcag-validator.ts) — Deterministic WCAG AA validation + auto-fix (no LLM)
8. wrapInDocument (utils/html.ts) — Full HTML document + XHTML IR generation
Axe-Core Fix Loop (Optional)
File: workers/api/src/services/axe-fixer.ts
After the deterministic pipeline, an optional browser-based axe-core audit runs up to 3 iterations:
- runAxeAudit() → identifies violations → applyFixes() → re-audit
- Handles ~30 axe rule IDs deterministically (contrast, headings, lists, ARIA, tables, etc.)
- Reverts if a fix introduces regression
Seams Where Post-Processing Could Be Inserted
The pipeline in post-processing-pipeline.ts is sequential and well-factored. New passes can be inserted:
- Between step 6 (enhanceAccessibility) and step 7 (validateAndFix) — this is where the afterEnhance callback already exists (line 55). Ideal for AI-powered passes that need full document context.
- After step 7 (validateAndFix) — for passes that should run on the “final” HTML before document wrapping.
- After the axe-fixer loop — for passes that need the fully-fixed HTML (e.g., screen-reader simulation).
- In chunk-processor.ts — for per-page passes before assembly (e.g., reading-order verification per page).
- In image-enhancer.ts — for per-image passes (e.g., alt-text self-critique, long-description generation).
2. Current WCAG 2.2 AA Coverage Audit
Criteria Catalogue
The codebase maintains a full WCAG 2.1 A+AA catalogue in workers/api/src/services/wcag-criteria-map.ts (50 success criteria). Many criteria are correctly marked typicallyNA: true for static converted documents (audio/video, keyboard traps, timing, etc.).
Criteria We Handle Well (Automated + Tested)
| SC | Name | Implementation | Files |
|---|---|---|---|
| 1.1.1 | Non-text Content | Alt text generation (Gemini Flash), quality gate with retry, isAltTextAcceptable() blocklist | image-enhancer.ts, image-description-pipeline.ts, wcag-validator.ts |
| 1.3.1 | Info and Relationships | Heading hierarchy fix, table header/scope injection, <b>→<strong> semantics, list structure fixes | wcag-validator.ts, axe-fixer.ts, ux-optimizer.ts |
| 1.4.3 | Contrast (Minimum) | Inline style detection + ratio calculation (4.5:1 / 3:1 large text). Axe-fixer can fix contrast. | wcag-validator.ts:248-272, axe-fixer.ts |
| 2.4.1 | Bypass Blocks | Skip-to-main-content link auto-injected | wcag-validator.ts:112-120 |
| 2.4.2 | Page Titled | <title> injected from filename | wcag-validator.ts:89-101 |
| 2.4.4 | Link Purpose | Poor-link-text detector (regex patterns), LLM-assisted replacement | workers/remediate/src/wcag-impl.ts:211-341 |
| 2.4.6 | Headings and Labels | Heading hierarchy normalization (h1→h3 → h1→h2) | wcag-validator.ts, axe-fixer.ts |
| 3.1.1 | Language of Page | lang attribute injected on <html> | wcag-validator.ts:84-86 |
| 4.1.1 | Parsing | Duplicate ID removal, valid ARIA attributes, structural validation | axe-fixer.ts |
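The 1.4.3 ratio calculation referenced above follows the standard WCAG relative-luminance formula. This is a self-contained sketch of that formula, not the actual `wcag-validator.ts` implementation:

```typescript
// WCAG relative luminance (sRGB) and contrast ratio, per the WCAG
// definition of "contrast ratio". Sketch only; wcag-validator.ts may differ.
function relativeLuminance(r: number, g: number, b: number): number {
  // Linearize each 0-255 sRGB channel before weighting.
  const lin = (channel: number): number => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number],
): number {
  const l1 = relativeLuminance(...fg);
  const l2 = relativeLuminance(...bg);
  const [hi, lo] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (hi + 0.05) / (lo + 0.05); // AA: >= 4.5 normal text, >= 3.0 large text
}
```

Black on white yields the maximum 21:1; `#767676` on white sits just above the 4.5:1 AA threshold for normal-size text.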
Criteria We Handle Weakly or Partially
| SC | Name | Current State | Gap |
|---|---|---|---|
| 1.1.1 | Non-text Content | Alt text generated for all images | No long descriptions for complex images (charts, infographics, data visualizations). Short alt text misses data content. isAltTextAcceptable() is a heuristic blocklist check, not a semantic quality assessment. |
| 1.3.1 | Info and Relationships | Tables get <th> + scope | No <caption> generation. Definition lists (<dl>) validated but not generated. Form label association is in the remediate worker but not the convert pipeline. |
| 1.3.2 | Meaningful Sequence | MCID-based reading order checked in PDF scorer | No reading-order verification for HTML output. Multi-column layouts from vision models may produce wrong reading order. The prompt says “maintain reading order” but there’s no post-hoc verification. |
| 1.4.3 | Contrast | Inline styles checked | CSS class-based colors and inherited styles not checked. Only the axe-fixer catches these (requires browser rendering). |
| 1.4.5 | Images of Text | Listed in criteria map | No detection or remediation. Scanned PDFs with text-as-image are OCR’d but there’s no check that the OCR fully replaces the image. |
| 2.4.6 | Headings and Labels | Hierarchy validated | No check for heading descriptiveness — empty headings caught, but “Chapter 1” or “Section A” pass. |
| 3.1.2 | Language of Parts | Listed in criteria map with custom rule ai-lang-of-parts | Not implemented in the convert pipeline. No per-element lang attribute detection for multilingual documents. |
| 4.1.2 | Name, Role, Value | Button names, link names, input labels | No validation for custom ARIA roles/states beyond what axe catches. |
Criteria We Skip Entirely
| SC | Name | Status | Notes |
|---|---|---|---|
| 1.4.4 | Resize Text | Not tested | Would require viewport rendering; intrinsically met by responsive CSS injected in step 6 |
| 1.4.10 | Reflow | Not tested | Same — CSS ensures reflow, but not verified |
| 1.4.11 | Non-text Contrast | Not tested | UI component borders/icons — partially N/A for converted docs |
| 1.4.12 | Text Spacing | Not tested | CSS allows text-spacing overrides, but not verified |
| 2.4.7 | Focus Visible | Not verified | :focus-visible CSS injected (line 188 wcag-validator.ts) but never validated |
| 4.1.3 | Status Messages | Not addressed | N/A for static converted documents |
Summary: Coverage Estimate
- 50 WCAG 2.1 A+AA success criteria in the catalogue
- ~18 are N/A for static converted documents (audio, video, keyboard traps, timing, etc.)
- ~32 applicable criteria remain
- ~20 well-handled with automated detection + fix
- ~8 partially handled (detection exists, fix is incomplete or shallow)
- ~4 listed but untested (CSS-based criteria assumed-passing)
Estimated coverage: ~62-70% of applicable criteria are fully passing. The ~70% figure the user cited aligns with this analysis.
3. Cost + Model Inventory
Per-Page Model Usage
| Service | Model | Provider | Input $/M tok | Output $/M tok | Approx Cost/Page | When Used |
|---|---|---|---|---|---|---|
| Marker API | — | Datalab | — | — | $0.001 | Text-only pages |
| Smart cascade T1 | Gemini 2.5 Flash | Google | $0.15 | $2.50 | $0.005 | Vision pages (first try) |
| Smart cascade T2 | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.15 | Vision pages (escalation) |
| Image enhancer | Gemini 2.5 Flash | Google | $0.15 | $2.50 | $0.0003/image | Alt text generation |
| Mathpix equations | Mathpix API | Datalab | — | — | $0.002/image | Equation images |
| Visual polish | Gemini 2.5 Flash | Google | $0.15 | $2.50 | ~$0.002 | CSS-only polish (optional) |
| Struct metadata | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Cached/amortized | PDF structure extraction |
Prompt Caching
Currently used in:
- agentic-vision-converter.ts — cache_control: { type: 'ephemeral' } on PDF documents and system prompts
- struct-table-extractor.ts — ephemeral cache for table extraction
- score-metadata-extractor.ts — ephemeral cache for metadata
Tracked fields: cacheCreationInputTokens, cacheReadInputTokens in token usage.
Cost Tracking
Fully implemented. The cost-ledger.ts service records every AI call to a cost_ledger Supabase table with:
- user_id, file_id, product, operation_type, model, backend
- input_tokens, output_tokens, estimated_cost_usd, metadata
- Structured JSON logging to stdout for Loki/Grafana
Additionally:
- budget-estimator.ts — pre-conversion cost estimate based on page complexity
- dollar-budget.ts — hard cost cap per conversion with graceful stop
- credits.ts — user credit system (3 credits/page worst case)
- llm-cost.ts — centralized per-model pricing table
4. Enhancement Opportunities (Ranked by Coverage-Gain-per-Dollar)
Enhancement 1: Cross-Page Heading Hierarchy + Coherence Check
WCAG SC(s): 1.3.1 (Info and Relationships), 2.4.6 (Headings and Labels), 2.4.10 (Section Headings)
Problem: Each chunk is processed independently. Heading levels may be inconsistent across chunk boundaries (chunk 1 ends at h2, chunk 2 restarts at h1). The existing analyzeHeadings() in wcag-validator.ts only validates within a single HTML fragment.
Approach: After chunk assembly, run a single cheap-model pass over the entire heading tree (extracted as a flat list of {level, text, position}). The model normalizes levels for cross-chunk coherence and flags non-descriptive headings.
- Estimated incremental cost: ~$0.001-0.003 per document (Gemini 2.5 Flash, <2K tokens total — headings only, not full HTML)
- Files to touch: chunk-assembler.ts (extract heading tree post-assembly), new heading-coherence-checker.ts, post-processing-pipeline.ts (insert step)
- Risk/Complexity: S — deterministic heading extraction, small LLM call, easy to validate
- Coverage gain: Improves 1.3.1, 2.4.6 from partial to full for multi-chunk documents
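The deterministic half of this pass can run before any LLM call: clamp heading-level jumps so no heading sits more than one level deeper than its predecessor. A minimal sketch (`HeadingNode` and `normalizeHeadingLevels` are hypothetical names; the model pass for descriptiveness would sit on top of this):

```typescript
interface HeadingNode {
  level: number;    // 1-6, as emitted by the chunk's converter
  text: string;
  position: number; // element index in the assembled HTML
}

// No heading may be more than one level deeper than the one before it,
// so a chunk that emits h4 right after the previous chunk ended at h2
// is pulled up to h3. The first heading is forced to h1.
function normalizeHeadingLevels(headings: HeadingNode[]): HeadingNode[] {
  let prev = 0;
  return headings.map((h) => {
    const level = prev === 0 ? 1 : Math.min(h.level, prev + 1);
    prev = level;
    return { ...h, level };
  });
}
```

This fixes level skips across chunk boundaries deterministically; the cheap-model call is only needed for judgment calls (a chunk legitimately restarting at h1, and non-descriptive heading text).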
Enhancement 2: Alt-Text Self-Critique (Cheap Model)
WCAG SC(s): 1.1.1 (Non-text Content)
Problem: Current alt text quality gate (isAltTextAcceptable()) uses heuristic blocklist checks (too short, generic phrases like “image of”). It doesn’t assess whether alt text is semantically adequate for the image content. Complex images like charts get short alt text that misses the data story.
Approach: After initial alt text generation, run a cheap self-critique pass that checks:
- Does the alt text describe the image’s purpose (not just appearance)?
- For charts/data: does it convey the key data point or trend?
- Is it too verbose (>150 chars) or too terse (<15 chars)?
- Does it avoid redundancy with surrounding text?
Use Gemini 2.0 Flash Lite ($0.075/$0.30 per M tokens) — cheapest option that can compare text against an image.
- Estimated incremental cost: ~$0.0002 per image ($0.002 for a 10-image document)
- Files to touch: image-enhancer.ts (add critique step after generation), image-description-pipeline.ts (wire critique into pipeline)
- Risk/Complexity: S — single cheap call per image, clear pass/fail output
- Coverage gain: Upgrades 1.1.1 from “generates alt text” to “generates quality-assured alt text”
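The length and redundancy checks in the list above are deterministic and could run as a zero-cost pre-filter before the model call. A sketch, with hypothetical names (the purpose and data-story checks still require the vision model):

```typescript
interface AltCritiqueInput {
  alt: string;
  surroundingText: string; // text adjacent to the image in the document
}

interface AltCritiqueResult {
  pass: boolean;
  reasons: string[];
}

// Deterministic pre-filter: encodes only the bounds checks from the plan.
// Semantic adequacy (purpose, key data point) needs the model pass.
function critiqueAltTextHeuristics(input: AltCritiqueInput): AltCritiqueResult {
  const reasons: string[] = [];
  const alt = input.alt.trim();
  if (alt.length < 15) reasons.push('too terse (<15 chars)');
  if (alt.length > 150) reasons.push('too verbose (>150 chars)');
  if (alt.length > 0 && input.surroundingText.toLowerCase().includes(alt.toLowerCase())) {
    reasons.push('redundant with surrounding text');
  }
  return { pass: reasons.length === 0, reasons };
}
```

Only images that pass this filter need the Flash Lite critique call, which keeps the per-document cost at the low end of the estimate.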
Enhancement 3: Deep-Vision Long Description for Complex Images
WCAG SC(s): 1.1.1 (Non-text Content — long description for complex images)
Problem: Charts, infographics, data visualizations, and complex diagrams get a short alt text but no long description. WCAG 1.1.1 requires “an alternative that serves an equivalent purpose” — for a bar chart, that means conveying the data, not just “bar chart showing quarterly revenue.”
Approach: Gate with the existing diagramType classifier from image-enhancer.ts:
- If diagramType is chart, diagram, or illustration AND the image is >100KB (not a simple icon):
  - Send to a capable vision model (Gemini 2.5 Flash) with a structured prompt asking for data extraction
  - Generate a long description (<500 words) with a data table if applicable
  - Inject as an aria-describedby-linked <details> element (pattern already exists in ux-optimizer.ts:1242)
- Estimated incremental cost: ~$0.005 per complex image. A typical doc has 0-3 complex images → $0.00-$0.015/doc
- Files to touch: image-enhancer.ts (add long-description generation), ux-optimizer.ts (inject <details> element), image-description-pipeline.ts (wire in)
- Risk/Complexity: M — needs prompt engineering for data extraction; <details> injection pattern exists but needs expansion
- Coverage gain: Addresses the biggest remaining gap in 1.1.1 — complex image descriptions
Legal note: Under ADA Title II (DOJ rule effective April 2027/2028 per CLAUDE.md compliance deadlines), complex images in government documents must have equivalent text alternatives. This is a high-priority SC for the target market.
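The injection step might look like the following string-level sketch (`injectLongDescription` is a hypothetical helper; the real pattern lives in ux-optimizer.ts and would operate on a parsed DOM, not strings):

```typescript
// Link an <img> to a generated long description via aria-describedby and a
// <details> disclosure. String-level sketch only; real code should operate
// on a parsed DOM and HTML-escape longDesc before injection.
function injectLongDescription(imgHtml: string, imgId: string, longDesc: string): string {
  const descId = `${imgId}-longdesc`;
  const linkedImg = imgHtml.replace('<img ', `<img aria-describedby="${descId}" `);
  const details = [
    `<details id="${descId}">`,
    `<summary>Image description</summary>`,
    `<p>${longDesc}</p>`,
    `</details>`,
  ].join('');
  return `${linkedImg}\n${details}`;
}
```

Keeping the description in a collapsed `<details>` element means sighted users see a compact disclosure while screen readers get the full data story through `aria-describedby`.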
Enhancement 4: Reading-Order Verification for Multi-Column Layouts
WCAG SC(s): 1.3.2 (Meaningful Sequence)
Problem: Vision models sometimes output multi-column content in wrong reading order (interleaving columns, or reading across instead of down). The convert pipeline has no post-hoc verification — it trusts the model output.
Approach: After conversion (per-page, in chunk-processor.ts), compare the HTML’s text sequence against the PDF’s text extraction order (from unpdf extractText). If >20% of text segments are reordered, flag for re-processing or route to a reading-order jury:
- Extract text blocks from HTML (split on block elements)
- Extract text from PDF (already available from complexity detector)
- Compute Kendall tau correlation between the two orderings
- If τ < 0.7: send the page image + current HTML to a vision model asking specifically about reading order
- Estimated incremental cost: $0 for the deterministic check (most pages pass). ~$0.005 per flagged page for vision jury. Estimated 5-10% of pages flag → $0.0025-0.005/page average.
- Files to touch: New reading-order-verifier.ts, chunk-processor.ts (call after conversion), post-processing-pipeline.ts (optional doc-level pass)
- Risk/Complexity: M — comparing text-extraction orderings is non-trivial for complex layouts; the false-positive rate needs tuning
- Coverage gain: Moves 1.3.2 from “not verified” to “verified with automated check”
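The Kendall tau computation in step 3 is straightforward once each HTML text block is mapped to its index in the PDF extraction order. A sketch, assuming `order[i]` is the PDF index of the i-th HTML block and indices are unique (no ties):

```typescript
// Kendall tau of `order` against the ascending reference 0..n-1.
// tau = 1: identical order; tau = -1: fully reversed.
function kendallTau(order: number[]): number {
  const n = order.length;
  if (n < 2) return 1;
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      if (order[i] < order[j]) concordant++;
      else discordant++;
    }
  }
  return (concordant - discordant) / ((n * (n - 1)) / 2);
}

const READING_ORDER_TAU_THRESHOLD = 0.7; // from the plan: flag below 0.7

function needsReadingOrderJury(order: number[]): boolean {
  return kendallTau(order) < READING_ORDER_TAU_THRESHOLD;
}
```

The O(n²) pair scan is fine at page scale (tens of blocks); a merge-sort inversion count would handle larger inputs if the doc-level pass needs it.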
Enhancement 5: Form-Field Label-Association Jury
WCAG SC(s): 1.3.1 (Info and Relationships), 1.3.5 (Identify Input Purpose), 3.3.2 (Labels or Instructions)
Problem: The premium-form-converter.ts generates form HTML but label↔input association relies on the initial vision model output. The remediate worker (wcag-impl.ts:220-246) detects unlabeled inputs but only in the remediation flow — not the convert flow. No jury pass verifies the association is correct (label text matches the field’s purpose).
Approach: After form conversion, run a cheap model pass that:
- Extracts all <label>/<input> pairs
- Checks each for/id pairing is correct
- Verifies label text semantically matches field purpose
- Adds autocomplete attributes per 1.3.5 (currently missing entirely)
- Estimated incremental cost: ~$0.002 per form page. Most docs have 0 form pages → $0 for typical docs.
- Files to touch: premium-form-converter.ts (add post-conversion jury), new form-label-jury.ts
- Risk/Complexity: S — small scope (form pages only), clear validation criteria
- Coverage gain: Moves 1.3.5 from “partially” to “full” for form documents. Also addresses 3.3.2.
Legal note: Forms in government PDFs are high-scrutiny items under ADA Title II. Incorrect label association is one of the most commonly cited WCAG violations in DOJ settlement agreements.
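The deterministic for/id pairing check (the second bullet above) can be a simple scan before the model pass. A regex-level sketch that is deliberately naive, ignoring wrapping `<label>`s, `aria-label`, and `aria-labelledby`, which the jury or a real DOM walk would need to handle (`findUnlabeledInputIds` is a hypothetical name):

```typescript
// Find <input> ids with no matching <label for="...">. Regex sketch only;
// intentionally ignores wrapping labels and ARIA labelling attributes.
function findUnlabeledInputIds(html: string): string[] {
  const labeledIds = new Set<string>();
  for (const m of html.matchAll(/<label\b[^>]*\bfor="([^"]+)"/g)) {
    labeledIds.add(m[1]);
  }
  const unlabeled: string[] = [];
  for (const m of html.matchAll(/<input\b[^>]*\bid="([^"]+)"/g)) {
    if (!labeledIds.has(m[1])) unlabeled.push(m[1]);
  }
  return unlabeled;
}
```

Only inputs flagged here (plus the semantic label-vs-purpose check) need the cheap-model call, which is why the cost stays near zero for typical documents with no forms.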
Enhancement 6: Language-of-Parts Detection
WCAG SC(s): 3.1.2 (Language of Parts)
Problem: The criteria map lists ai-lang-of-parts as a custom rule but it’s not implemented. Multilingual documents (common in government and academic PDFs) have no per-element lang attribute annotation.
Approach: After document assembly, scan text blocks for language switches using a lightweight language-detection library (e.g., cld3 or Gemini Flash with a simple prompt). Insert lang attributes on elements containing non-primary-language text.
- Estimated incremental cost: ~$0.001 per document (Gemini 2.0 Flash Lite, text-only, <1K tokens for language classification). For the library approach: $0 (CPU-only).
- Files to touch: New language-of-parts-detector.ts, post-processing-pipeline.ts (new step after enhanceAccessibility)
- Risk/Complexity: S — well-defined problem, small scope
- Coverage gain: Moves 3.1.2 from “not implemented” to “automated”
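A sketch of the CPU-only variant using Unicode script property escapes. The script-to-language mapping below is a heuristic assumption (Han alone cannot distinguish zh from ja, for instance), which is exactly why cld3 or a model call remains the fallback for ambiguous blocks:

```typescript
// If >30% of a text block's characters belong to one non-Latin Unicode
// script, guess a lang tag for it. Heuristic sketch; mapping is approximate.
const SCRIPT_LANG_GUESSES: Array<[RegExp, string]> = [
  [/\p{Script=Cyrillic}/u, 'ru'],
  [/\p{Script=Greek}/u, 'el'],
  [/\p{Script=Arabic}/u, 'ar'],
  [/\p{Script=Hangul}/u, 'ko'],
  [/\p{Script=Hiragana}|\p{Script=Katakana}/u, 'ja'], // check kana before Han
  [/\p{Script=Han}/u, 'zh'],
];

function detectNonLatinLang(text: string): string | null {
  if (text.length === 0) return null;
  for (const [re, lang] of SCRIPT_LANG_GUESSES) {
    const matches = text.match(new RegExp(re.source, 'gu'));
    if (matches && matches.length / text.length > 0.3) return lang;
  }
  return null; // Latin or undetermined: leave the element's lang inherited
}
```

A block that returns a non-null guess different from the document's primary language would get a `lang` attribute on its nearest containing element.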
Enhancement 7: Confidence Scoring + Human Review Queue
WCAG SC(s): All — meta-enhancement
Problem: The pipeline ships everything with equal confidence. A perfect text-only page and a mangled multi-column layout with complex images get the same treatment. There’s no way for downstream consumers to know which items need human review.
Approach: Add a per-element confidence score to the output:
- Leverage the existing QualityScore (8-dimension breakdown in quality-scorer.ts)
- Extend with per-element granularity: each <img>, <table>, and heading gets a confidence tag
- Elements with confidence < threshold populate a requiresHumanReview array in the output
- The array includes: element type, WCAG SC at risk, reason, location in document
This is the scaffolding that all other enhancements feed into.
- Estimated incremental cost: $0 (deterministic aggregation of existing scores)
- Files to touch: quality-scorer.ts (add per-element scoring), shared types in packages/shared/src/types.ts, output formatting in convert.ts
- Risk/Complexity: M — needs type changes across the shared package + API output, but no new AI calls
- Coverage gain: Enables human-in-the-loop for the ~10% of elements that automated passes can’t confidently handle
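The requiresHumanReview output could take a shape like this. Field names are illustrative, not the final packages/shared types:

```typescript
// Illustrative shape for the requiresHumanReview array described above.
interface ReviewItem {
  elementType: 'img' | 'table' | 'heading' | 'form';
  wcagSc: string;     // WCAG success criterion at risk, e.g. '1.1.1'
  reason: string;     // why confidence is low
  location: string;   // selector or position in the document
  confidence: number; // 0-100, derived from the per-element quality score
}

// Deterministic aggregation: keep only low-confidence elements, worst first,
// so reviewers triage the riskiest items at the top of the queue.
function buildReviewQueue(items: ReviewItem[], threshold = 60): ReviewItem[] {
  return items
    .filter((i) => i.confidence < threshold)
    .sort((a, b) => a.confidence - b.confidence);
}
```

Because this is pure aggregation of scores the pipeline already computes, the $0 cost estimate holds: no new model calls, only type plumbing.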
Enhancement 8: Screen-Reader Simulation Read-Back
WCAG SC(s): 1.3.1, 1.3.2, 2.4.6, 4.1.2 — structural coherence validation
Problem: Even with all the above passes, the final output may have structural issues that are only apparent when “read” linearly as a screen reader would — e.g., a table caption that’s been orphaned from its table, or a heading that doesn’t match the content that follows it.
Approach: Serialize the final HTML to a linear text stream (strip tags, preserve element boundaries with markers). Send to a cheap model asking: “Does this read coherently as a document? Flag any points where the reading order breaks, content seems out of place, or a heading doesn’t match what follows.”
- Estimated incremental cost: ~$0.005-0.01 per document (Gemini 2.5 Flash, full-document text, ~5K-10K tokens)
- Files to touch: New screen-reader-simulator.ts, post-processing-pipeline.ts (final step before wrapInDocument)
- Risk/Complexity: L — LLM judgment on coherence is subjective; needs careful prompt engineering to avoid false positives. Results should route to requiresHumanReview, not auto-fix.
- Coverage gain: Catches structural issues that slip through rule-based checks. Primary value is quality assurance, not direct SC coverage.
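The serialization step, stripping tags while preserving element boundaries with markers, might look like this regex-level sketch (a real implementation would walk the DOM; `linearizeHtml` is a hypothetical name):

```typescript
// Serialize HTML to a linear text stream, marking block-element boundaries
// so the model can see where each structural unit begins. Sketch only.
function linearizeHtml(html: string): string {
  return html
    // Replace opening block tags with a newline + [tag] marker.
    .replace(/<(h[1-6]|p|li|td|th|caption|figcaption)\b[^>]*>/g, (_m, tag) => `\n[${tag}] `)
    // Drop every remaining tag.
    .replace(/<[^>]+>/g, ' ')
    // Collapse runs of spaces/tabs but keep the newline boundaries.
    .replace(/[ \t]+/g, ' ')
    .trim();
}
```

The `[h2]`, `[p]`, `[td]` markers let the model flag mismatches like "this [h2] is followed by content about a different topic" with a stable reference back into the document.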
Enhancement 9: Opus-Tier Jury on Low-Confidence Items
WCAG SC(s): All — quality escalation for the hardest items
Problem: Some elements are genuinely hard — complex data visualizations, unusual table structures, ambiguous reading order. Cheap models produce low-confidence results. Currently these ship as-is.
Approach: After confidence scoring (Enhancement 7), items with confidence < 40% and high WCAG impact (images, tables, forms) are sent to Claude Opus for a single review pass. Opus output replaces the original only if it scores higher.
- Estimated incremental cost: ~$0.05-0.10 per low-confidence item. With <10% of elements flagged and ~2-5 per doc: $0.10-0.50/doc (only for complex documents).
- Files to touch: New opus-jury.ts, post-processing-pipeline.ts (conditional step gated on confidence scores)
- Risk/Complexity: M — cost management is critical; must have a hard budget cap. A feature flag is essential.
- Coverage gain: Targeted improvement on the hardest 5-10% of elements where cheaper models fail.
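The hard budget cap can be a simple charge-gate that refuses further jury items once the cap would be exceeded (`JuryBudget` is a hypothetical sketch; the `maxOpusCostUsd` option would feed the constructor, and rejected items stay on the human-review queue):

```typescript
// Hard per-document budget gate for the Opus pass. tryCharge() is called
// with the estimated cost before each jury item; once the cap would be
// exceeded, remaining items fall back to human review instead of Opus.
class JuryBudget {
  private spentUsd = 0;
  constructor(private readonly capUsd: number) {}

  tryCharge(costUsd: number): boolean {
    if (this.spentUsd + costUsd > this.capUsd) return false;
    this.spentUsd += costUsd;
    return true;
  }

  get remainingUsd(): number {
    return this.capUsd - this.spentUsd;
  }
}
```

Charging against the estimate before the call (not after) is what makes the cap hard: the jury can never start an item it cannot afford.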
Summary: Enhancement Priority Matrix
| # | Enhancement | WCAG SCs | Cost/Doc | Complexity | Coverage Lift | Priority |
|---|---|---|---|---|---|---|
| 1 | Heading coherence check | 1.3.1, 2.4.6 | $0.002 | S | Medium | P1 |
| 2 | Alt-text self-critique | 1.1.1 | $0.002 | S | Medium | P1 |
| 3 | Long desc for complex images | 1.1.1 | $0.015 | M | High | P1 |
| 4 | Reading-order verification | 1.3.2 | $0.003 | M | High | P1 |
| 5 | Form label-association jury | 1.3.1, 1.3.5, 3.3.2 | $0.002 | S | Medium (form docs) | P2 |
| 6 | Language-of-parts detection | 3.1.2 | $0.001 | S | Medium (multilingual) | P2 |
| 7 | Confidence scoring + human queue | All (meta) | $0 | M | Foundational | P1 |
| 8 | Screen-reader simulation | 1.3.1, 1.3.2, 2.4.6 | $0.008 | L | Low-Medium | P3 |
| 9 | Opus jury on low-confidence | All | $0.10-0.50 | M | High (targeted) | P3 |
Estimated Total Cost Impact
For a typical 20-page document (mostly text, 5 images, 1 chart, 0 forms):
- Current pipeline cost: ~$0.05-0.20
- All P1 enhancements: +$0.02-0.03 (+10-60%)
- All P1+P2: +$0.02-0.04
- All P1+P2+P3: +$0.05-0.55 (Opus jury dominates when triggered)
Expected Coverage Lift
- Current: ~70% of applicable WCAG 2.2 AA criteria fully passing
- After P1 enhancements: ~80-82% (heading coherence, alt-text quality, reading order, long descriptions)
- After P1+P2: ~83-85% (forms + language-of-parts)
- After P1+P2+P3: ~85-88% (screen-reader simulation catches edge cases, Opus jury handles hard items)
The remaining ~12-15% are criteria that require browser-based interactive testing (reflow, text spacing, focus visible) or are genuinely N/A but marked as applicable for conservatism in VPAT reporting.
Legal Jurisdictions to Note
- ADA Title II (DOJ rule): Deadlines April 26, 2027 (pop ≥50K) and April 26, 2028 (pop <50K) per the compliance schedule in CLAUDE.md. All enhancements addressing SC 1.1.1, 1.3.1, and 1.3.2 are directly relevant to government PDF compliance.
- Section 508 (US Federal): Requires WCAG 2.0 AA conformance; all enhancements apply.
- EN 301 549 (EU): Maps to WCAG 2.1 AA; language-of-parts (Enhancement 6) is specifically flagged in EU accessibility audits of multilingual documents.
- AODA (Ontario): Requires WCAG 2.0 AA; same SC applicability.
Important: Automated remediation cannot serve as a substitute for human attestation of WCAG conformance. The output should clearly indicate which criteria were automatically verified vs. which require human review (Enhancement 7). VPAT reports generated from this pipeline already use conservative “partially supports” / “not-verified” language (per wcag-criteria-map.ts honesty rules), and this should continue.
Implementation Status (P1 Enhancements)
All P1 enhancements have been implemented on branch feature/wcag-coverage-enhancements.
Files Created
| Enhancement | File | Tests |
|---|---|---|
| Confidence scoring (#7) | workers/api/src/services/confidence-scorer.ts | __tests__/services/confidence-scorer.test.ts (16 tests) |
| Heading coherence (#1) | workers/api/src/services/heading-coherence-checker.ts | __tests__/services/heading-coherence-checker.test.ts (11 tests) |
| Alt-text critique (#2) | workers/api/src/services/alt-text-critique.ts | __tests__/services/alt-text-critique.test.ts (6 tests) |
| Long descriptions (#3) | workers/api/src/services/long-description-generator.ts | __tests__/services/long-description-generator.test.ts (14 tests) |
| Reading-order verifier (#4) | workers/api/src/services/reading-order-verifier.ts | __tests__/services/reading-order-verifier.test.ts (12 tests) |
Files Modified
| File | Changes |
|---|---|
| workers/api/src/services/post-processing-pipeline.ts | Added 3 new pipeline steps (heading coherence, reading-order check, confidence scoring) with feature flags |
| workers/api/src/services/image-enhancer.ts | Added enableAltTextCritique flag and critique call after retry logic |
| workers/api/src/services/image-description-pipeline.ts | Added long description generation for qualifying complex images |
Feature Flags
All enhancements are disabled by default. Enable via PostProcessOptions or EnhancerConfig:
| Flag | Where | Purpose |
|---|---|---|
| enableConfidenceScoring | PostProcessOptions | Per-element confidence scores + requiresHumanReview output |
| confidenceThreshold | PostProcessOptions | Confidence threshold (default: 60) |
| enableHeadingCoherence | PostProcessOptions | Cross-chunk heading hierarchy normalization |
| enableReadingOrderCheck | PostProcessOptions | PDF ↔ HTML reading-order comparison |
| pdfTextPages | PostProcessOptions | PDF text per page (required for reading-order check) |
| enableAltTextCritique | EnhancerConfig | Semantic alt-text quality verification |
Cost Tracking
- Alt-text critique: Uses estimateLlmCost() with model gemini-2.0-flash-lite. Token usage is tracked in CritiqueResult and flows through enhanceImagesInHtml token totals.
- Long descriptions: Uses estimateLlmCost() with model gemini-2.5-flash. Token usage is tracked in LongDescriptionResult and logged in the image pipeline.
- Heading coherence: No LLM calls — deterministic, zero cost.
- Confidence scoring: No LLM calls — deterministic, zero cost.
- Reading-order verifier: No LLM calls — deterministic, zero cost.
- Form label jury: No LLM calls — deterministic, zero cost.
- Language-of-parts: No LLM calls — deterministic Unicode analysis, zero cost.
Implementation Status (P2 Enhancements)
P2 enhancements implemented on branch feature/wcag-coverage-enhancements.
Files Created
| Enhancement | File | Tests |
|---|---|---|
| Form label jury (#5) | workers/api/src/services/form-label-jury.ts | __tests__/services/form-label-jury.test.ts (26 tests) |
| Language-of-parts (#6) | workers/api/src/services/language-of-parts-detector.ts | __tests__/services/language-of-parts-detector.test.ts (19 tests) |
Feature Flags (P2)
| Flag | Where | Purpose |
|---|---|---|
| enableFormLabelJury | PostProcessOptions | Label↔input validation + autocomplete injection |
| enableLanguageOfParts | PostProcessOptions | Unicode-based per-element lang attribute detection |
Implementation Status (P3 Enhancements)
P3 enhancements implemented on branch feature/wcag-coverage-enhancements.
Files Created
| Enhancement | File | Tests |
|---|---|---|
| Screen-reader sim (#8) | workers/api/src/services/screen-reader-simulator.ts | __tests__/services/screen-reader-simulator.test.ts (13 tests) |
| Opus jury (#9) | workers/api/src/services/opus-jury.ts | __tests__/services/opus-jury.test.ts (10 tests) |
Feature Flags (P3)
| Flag | Where | Purpose |
|---|---|---|
| enableScreenReaderSim | PostProcessOptions | Coherence validation via linearized read-back |
| enableOpusJury | PostProcessOptions | Claude Opus review of low-confidence items |
| maxOpusCostUsd | PostProcessOptions | Hard budget cap for Opus jury (default: $0.50/doc) |
Cost Tracking (P3)
- Screen-reader sim: Gemini 2.5 Flash, ~$0.005-0.01/doc. Token usage tracked in ScreenReaderSimResult.
- Opus jury: Claude Opus 4.6, ~$0.05-0.10/item. Token usage tracked per-verdict. Hard budget cap prevents runaway costs.