
WCAG Enhancement Feature Flags

This guide explains how to enable, test, and compare the WCAG coverage enhancement features added to the post-processing pipeline. All features are disabled by default and controlled via environment variables.

Both the PDF conversion flow and the HTML remediation flow run through the same runPostProcessing() pipeline, so every flag listed here applies to both.


Quick Start

Set environment variables on your server, in .dev.vars (local), or in your Cloudflare Worker / Lambda configuration. Every flag is a simple boolean — set to "true" to enable.

Terminal window
# Zero-cost deterministic checks (recommended to enable first)
WCAG_HEADING_COHERENCE=true
WCAG_FORM_LABEL_JURY=true
WCAG_LANGUAGE_OF_PARTS=true
WCAG_CONFIDENCE_SCORING=true
WCAG_READING_ORDER_CHECK=true
# LLM-powered checks (add cost — see estimates below)
WCAG_SCREEN_READER_SIM=true # requires GEMINI_API_KEY
WCAG_OPUS_JURY=true # requires ANTHROPIC_API_KEY

Changes take effect on the next request — no redeploy needed for runtime-env systems (Lambda, Cloudflare Workers). For Docker/PM2 deployments, restart the process after changing env vars.

To disable a flag, set it to any value other than "true" or remove it.
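The "only the literal string "true" enables a feature" convention above can be sketched as a small helper. `isFlagEnabled` is an illustrative name, not necessarily the pipeline's actual function:

```typescript
// Hypothetical helper: a flag counts as enabled only when its value is
// exactly the string "true"; "TRUE", "1", and unset all leave it disabled.
function isFlagEnabled(
  env: Record<string, string | undefined>,
  name: string
): boolean {
  return env[name] === "true";
}

const env = { WCAG_HEADING_COHERENCE: "true", WCAG_OPUS_JURY: "1" };
console.log(isFlagEnabled(env, "WCAG_HEADING_COHERENCE")); // true
console.log(isFlagEnabled(env, "WCAG_OPUS_JURY"));         // false
```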


Flag Reference

WCAG_HEADING_COHERENCE

WCAG SCs: 1.3.1 (Info and Relationships), 2.4.6 (Headings and Labels)
What it does: Normalizes heading hierarchy across chunk boundaries in multi-chunk documents. Fixes level skips (h1→h3 becomes h1→h2), shifts documents that start at h2+ down to h1, and flags non-descriptive headings (e.g., “Chapter 1”, empty headings, numeric-only headings) for human review.
Cost: $0 — deterministic, no LLM calls
Prerequisites: None
Pipeline step: Runs after UX optimization, before validators (step 2.25)
Conformance report: Emits heading-coherence (pass/fixed) and heading-descriptiveness (pass/warning) rules

What to look for when testing:

  • Convert a multi-chunk PDF (>30 pages). Check if heading levels are consistent across page boundaries.
  • Look for [post-processing] START heading-coherence in server logs.
  • In the VPAT, SC 1.3.1 and 2.4.6 should show “Supports” or “Partially Supports” instead of “Not Verified” when headings were checked.
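The two deterministic fixes described above (clamp level skips, shift a document that starts at h2+ down to h1) can be sketched as a single pass over the heading levels. This is an illustrative sketch, not the pipeline's actual implementation:

```typescript
// Hypothetical sketch: force the first heading to h1, then clamp every
// subsequent heading so it never skips more than one level deeper than
// its predecessor (h1 -> h3 becomes h1 -> h2). Moving shallower is fine.
function normalizeHeadingLevels(levels: number[]): number[] {
  const out: number[] = [];
  let prev = 0;
  for (const level of levels) {
    const fixed = prev === 0 ? 1 : Math.min(level, prev + 1);
    out.push(fixed);
    prev = fixed;
  }
  return out;
}

// A document starting at h2 with an h2 -> h4 skip:
console.log(normalizeHeadingLevels([2, 4, 2, 3])); // [1, 2, 2, 3]
```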

WCAG_FORM_LABEL_JURY

WCAG SCs: 1.3.1 (Info and Relationships), 1.3.5 (Identify Input Purpose), 3.3.2 (Labels or Instructions)
What it does: Validates that every <input>, <select>, and <textarea> has a properly associated <label>, aria-label, aria-labelledby, or title. Adds autocomplete attributes based on 22 label-text patterns (name, email, phone, address, date of birth, SSN→off, credit card, password, etc.). Reports missing labels, empty labels, missing IDs, and duplicate IDs.
Cost: $0 — deterministic, no LLM calls
Prerequisites: None
Pipeline step: Runs after enhance-accessibility (step 6.1)
Conformance report: Emits form-label-association (pass/fail) and autocomplete-added (pass/fixed) rules. When form fields are found, SCs 1.3.5 and 3.3.2 switch from “Not Applicable” to their actual conformance level.

What to look for when testing:

  • Convert or remediate a PDF with form fields (e.g., a government application form).
  • Check that the output HTML has autocomplete="given-name", autocomplete="email", etc. on the appropriate fields.
  • In the VPAT, SC 1.3.5 should show “Supports” (with auto-remediation note) instead of “Not Applicable” for documents that contain forms.
  • Compare a form PDF with and without this flag — the “without” version will have no autocomplete attributes.
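The label-text → autocomplete mapping might look like the sketch below. The patterns and token choices here are a hypothetical subset of the 22 described above, not the pipeline's actual table:

```typescript
// Hypothetical subset of the label-text -> autocomplete mapping.
// Sensitive fields like SSN map to "off" to suppress autofill.
const AUTOCOMPLETE_PATTERNS: Array<[RegExp, string]> = [
  [/first\s*name|given\s*name/i, "given-name"],
  [/e-?mail/i, "email"],
  [/phone|telephone/i, "tel"],
  [/date\s*of\s*birth|birth\s*date/i, "bday"],
  [/ssn|social\s*security/i, "off"],
];

// Returns the autocomplete token for a label, or undefined if no pattern matches.
function autocompleteFor(labelText: string): string | undefined {
  for (const [pattern, token] of AUTOCOMPLETE_PATTERNS) {
    if (pattern.test(labelText)) return token;
  }
  return undefined;
}

console.log(autocompleteFor("First name")); // "given-name"
console.log(autocompleteFor("SSN"));        // "off"
```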

WCAG_LANGUAGE_OF_PARTS

WCAG SCs: 3.1.2 (Language of Parts)
What it does: Scans text blocks for non-Latin scripts using Unicode range analysis. Detects 19 scripts: CJK (Chinese, Japanese, Korean), Arabic, Hebrew, Cyrillic, Greek, Devanagari, Tamil, Telugu, Bengali, Gujarati, Gurmukhi, Thai, Khmer, Myanmar, Georgian, Armenian, and Ethiopic. Adds lang attributes to elements containing ≥3 characters of a detected non-primary script.
Cost: $0 — deterministic Unicode analysis, no LLM calls
Prerequisites: None
Pipeline step: Runs after form-label-jury (step 6.2)
Conformance report: Emits lang-of-parts (pass/fixed) rule

What to look for when testing:

  • Convert a multilingual document (e.g., an academic paper with Chinese or Arabic citations).
  • Inspect the output HTML — elements containing non-primary-language text should have lang="zh", lang="ar", etc.
  • In the VPAT, SC 3.1.2 should show “Supports” with a note like “Annotated 3 element(s) with lang attributes (zh, ar)”.

Limitation: This detects scripts, not languages within the same script. It cannot distinguish French from Spanish (both Latin script). Latin-script language detection would require an LLM or a language-detection library, which is deferred to a future enhancement.
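The Unicode-range approach can be sketched as follows. The ranges and lang codes below cover only a few of the 19 scripts and are illustrative; in particular, mapping CJK ideographs to "zh" is a simplification of the script-vs-language ambiguity noted above:

```typescript
// Hypothetical sketch of Unicode-range script detection for a few scripts.
// The real pipeline's ranges and lang mapping may differ.
const SCRIPT_RANGES: Array<[RegExp, string]> = [
  [/[\u4e00-\u9fff]/g, "zh"], // CJK Unified Ideographs (could also be ja/ko text)
  [/[\u0600-\u06ff]/g, "ar"], // Arabic
  [/[\u0590-\u05ff]/g, "he"], // Hebrew
  [/[\u0400-\u04ff]/g, "ru"], // Cyrillic
];

// Returns a lang code when >= 3 characters of a non-primary script are present,
// matching the threshold described above.
function detectPartLang(text: string): string | undefined {
  for (const [range, lang] of SCRIPT_RANGES) {
    const matches = text.match(range);
    if (matches && matches.length >= 3) return lang;
  }
  return undefined;
}

console.log(detectPartLang("引用文献见第三章")); // "zh"
console.log(detectPartLang("hello world"));      // undefined
```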


WCAG_CONFIDENCE_SCORING

WCAG SCs: All (meta-enhancement)
What it does: Assigns a confidence score (0–100) to every image, table, and heading in the output HTML based on existing quality signals (alt text quality, table headers, heading hierarchy). Elements scoring below the threshold (default: 60) populate a requiresHumanReview array in the API response.
Cost: $0 — deterministic signal aggregation, no LLM calls
Prerequisites: None. Required for WCAG_OPUS_JURY.
Pipeline step: Runs after reading-order check (step 6.5)
Conformance report: Emits confidence-review-needed (warning) when low-confidence items exist

What to look for when testing:

  • Convert a PDF with images that have generic or missing alt text.
  • Check the API response for confidenceResult.requiresHumanReview — it should list the low-confidence elements with their WCAG criteria, reason, and excerpt.
  • In the VPAT, SC 1.1.1 should show “Partially Supports” with a note about low-confidence items if any images were flagged.

Configuration: Set confidenceThreshold in PostProcessOptions to adjust the threshold (default: 60). Lower values flag fewer items; higher values flag more.
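One way to aggregate quality signals into a 0–100 score is to start at 100 and subtract penalties. The signals, penalty weights, and helper names below are assumptions for illustration, not the pipeline's actual scoring:

```typescript
// Hypothetical image-confidence scoring: subtract a penalty for each
// missing or weak quality signal, floored at 0.
interface ImageSignals {
  hasAlt: boolean;
  altLength: number;
  altIsGeneric: boolean; // e.g. alt="image" or alt="chart"
}

function imageConfidence(s: ImageSignals): number {
  let score = 100;
  if (!s.hasAlt) {
    score -= 60;
  } else {
    if (s.altLength < 10) score -= 25;
    if (s.altIsGeneric) score -= 30;
  }
  return Math.max(0, score);
}

// Items below the threshold (default 60) go into requiresHumanReview.
function needsHumanReview(score: number, threshold = 60): boolean {
  return score < threshold;
}

const score = imageConfidence({ hasAlt: false, altLength: 0, altIsGeneric: false });
console.log(score, needsHumanReview(score)); // 40 true
```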


WCAG_READING_ORDER_CHECK

WCAG SCs: 1.3.2 (Meaningful Sequence)
What it does: Compares the text sequence in the converted HTML against the PDF’s native text extraction order using trigram-based Kendall tau correlation. Pages where >30% of text segments appear reordered (correlation < 0.7) are flagged.
Cost: $0 — deterministic text comparison, no LLM calls
Prerequisites: pdfTextPages must be passed in PostProcessOptions (populated automatically in the PDF conversion flow via unpdf extractText). Not available for HTML remediation (no source PDF).
Pipeline step: Runs after language-of-parts (step 6.25)
Conformance report: Emits reading-order-verified (pass/warning) rule

What to look for when testing:

  • Convert a multi-column PDF (e.g., a two-column academic paper or newspaper).
  • Check server logs for [post-processing] DONE reading-order-check flagged=N — if N > 0, pages with reading-order issues were detected.
  • In the VPAT, SC 1.3.2 should show “Supports” (all pages pass) or “Partially Supports” with specific page numbers flagged.

Note: This check only runs during PDF conversion, not HTML remediation, because it requires the original PDF text for comparison.
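The correlation step can be sketched with plain Kendall tau: given the positions that the HTML's text segments occupy in the PDF's native order, tau measures how well the two sequences agree (1 = identical order, -1 = fully reversed). The trigram matching that produces those positions is omitted here, so this is a sketch of the comparison only:

```typescript
// Hypothetical sketch: Kendall tau over the PDF-order ranks of the HTML's
// text segments, computed from concordant vs. discordant pairs.
function kendallTau(ranks: number[]): number {
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < ranks.length; i++) {
    for (let j = i + 1; j < ranks.length; j++) {
      if (ranks[j] > ranks[i]) concordant++;
      else if (ranks[j] < ranks[i]) discordant++;
    }
  }
  const total = concordant + discordant;
  return total === 0 ? 1 : (concordant - discordant) / total;
}

// A page is flagged when correlation drops below the 0.7 threshold above.
const pageFlagged = (ranks: number[]) => kendallTau(ranks) < 0.7;

console.log(kendallTau([0, 1, 2, 3])); // 1  (same order)
console.log(kendallTau([3, 2, 1, 0])); // -1 (reversed)
```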


WCAG_SCREEN_READER_SIM

WCAG SCs: 1.3.1, 1.3.2, 2.4.6, 4.1.2
What it does: Serializes the final HTML into a linear text stream with structural markers ([HEADING 2: ...], [IMAGE: alt="..."], [TABLE CAPTION: ...], [LINK: "..." → url]) that mimics how a screen reader announces content. Sends the stream to Gemini 2.5 Flash to flag coherence issues: orphaned captions, heading/content mismatches, alt-text contradictions, and abrupt topic shifts suggesting reading-order corruption.
Cost: ~$0.005–0.01 per document (Gemini 2.5 Flash, ~5K–10K tokens)
Prerequisites: GEMINI_API_KEY must be set
Pipeline step: Runs after confidence scoring (step 6.75)
Conformance report: Emits screen-reader-coherence (pass/fail/warning) rule

What to look for when testing:

  • Convert a complex document with tables, charts, and multi-level headings.
  • Check the API response for screenReaderSimResult.issues — each issue includes type, location, description, and severity.
  • Findings go to requiresHumanReview — this pass never auto-fixes, because coherence judgments are subjective.
  • In the VPAT, SCs 1.3.1, 1.3.2, and 2.4.6 benefit from the coherence check — showing “Supports” when no issues are found.

Important: This is an AI-powered check. The model may produce false positives (flagging coherent content as incoherent) or miss real issues. Treat its output as advisory, not definitive.
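The linear-stream serialization with structural markers can be sketched as below. A real implementation would walk the DOM; this version takes pre-extracted nodes, and the `StreamNode` type is an assumption for illustration:

```typescript
// Hypothetical sketch of the screen-reader stream serialization, using the
// marker formats described above.
type StreamNode =
  | { kind: "heading"; level: number; text: string }
  | { kind: "image"; alt: string }
  | { kind: "link"; text: string; href: string }
  | { kind: "text"; text: string };

function serializeForScreenReader(nodes: StreamNode[]): string {
  return nodes
    .map((n) => {
      switch (n.kind) {
        case "heading": return `[HEADING ${n.level}: ${n.text}]`;
        case "image":   return `[IMAGE: alt="${n.alt}"]`;
        case "link":    return `[LINK: "${n.text}" → ${n.href}]`;
        case "text":    return n.text;
      }
    })
    .join("\n");
}

console.log(serializeForScreenReader([
  { kind: "heading", level: 2, text: "Results" },
  { kind: "image", alt: "Bar chart of quarterly revenue" },
]));
```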


WCAG_OPUS_JURY

WCAG SCs: All (quality escalation for hardest items)
What it does: After confidence scoring, sends elements with confidence < 40% and high WCAG impact (images, tables only) to Claude Opus 4.6 for a single targeted review. Opus evaluates whether the element’s accessibility treatment is adequate and provides a corrected HTML snippet if it can improve it. Originals are only replaced if Opus produces a fix.
Cost: ~$0.05–0.10 per reviewed item. Hard budget cap: maxOpusCostUsd (default: $0.50/document). Typical documents have 0–3 items reviewed.
Prerequisites: ANTHROPIC_API_KEY must be set. WCAG_CONFIDENCE_SCORING must be enabled (Opus jury uses its requiresHumanReview output).
Pipeline step: Runs after screen-reader sim (step 6.9)
Conformance report: Opus verdicts are attached to the items in requiresHumanReview but do not emit separate rules — they improve the underlying elements that other rules already check.

What to look for when testing:

  • Convert a PDF with complex charts or tables that the pipeline struggles with (low-quality alt text, missing table headers).
  • Enable WCAG_CONFIDENCE_SCORING first and check which items fall below the threshold.
  • Then enable WCAG_OPUS_JURY and re-convert the same document.
  • Compare the output HTML — Opus-improved elements should have better alt text or table structure.
  • Check opusJuryResult.verdicts in the API response for Opus’s assessment of each reviewed item.
  • Check opusJuryResult.totalCostUsd to verify the budget cap is working.

Budget control: Set maxOpusCostUsd in PostProcessOptions to limit per-document Opus spend. Default is $0.50. When the budget is exhausted, remaining items are skipped (counted in skippedDueToBudget).
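The budget cap semantics (review items until the per-document budget would be exceeded, count the rest in skippedDueToBudget) can be sketched as follows; `planOpusReviews` and `ReviewItem` are hypothetical names:

```typescript
// Hypothetical sketch of the maxOpusCostUsd budget cap.
interface ReviewItem {
  id: string;
  estimatedCostUsd: number;
}

function planOpusReviews(items: ReviewItem[], maxOpusCostUsd = 0.5) {
  const reviewed: ReviewItem[] = [];
  let spent = 0;
  let skippedDueToBudget = 0;
  for (const item of items) {
    // Skip any item whose cost would push total spend past the cap.
    if (spent + item.estimatedCostUsd <= maxOpusCostUsd) {
      reviewed.push(item);
      spent += item.estimatedCostUsd;
    } else {
      skippedDueToBudget++;
    }
  }
  return { reviewed, totalCostUsd: spent, skippedDueToBudget };
}

const plan = planOpusReviews(
  [{ id: "chart-1", estimatedCostUsd: 0.3 }, { id: "table-2", estimatedCostUsd: 0.3 }],
  0.5
);
console.log(plan.reviewed.length, plan.skippedDueToBudget); // 1 1
```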


How to Compare Results

Method 1: Before/After on the Same Document

  1. Convert a test PDF with all flags disabled (baseline):

    Terminal window
    # Ensure no WCAG_* vars are set
    unset WCAG_HEADING_COHERENCE WCAG_FORM_LABEL_JURY WCAG_LANGUAGE_OF_PARTS \
    WCAG_CONFIDENCE_SCORING WCAG_READING_ORDER_CHECK \
    WCAG_SCREEN_READER_SIM WCAG_OPUS_JURY
  2. Save the output HTML and VPAT.

  3. Enable the deterministic flags:

    Terminal window
    export WCAG_HEADING_COHERENCE=true
    export WCAG_FORM_LABEL_JURY=true
    export WCAG_LANGUAGE_OF_PARTS=true
    export WCAG_CONFIDENCE_SCORING=true
    export WCAG_READING_ORDER_CHECK=true
  4. Re-convert the same PDF. Save the output HTML and VPAT.

  5. Diff the two VPATs — criteria that were “Not Verified” should now show “Supports”, “Partially Supports”, or “Does Not Support” with specific remarks.

  6. Diff the HTML — look for added autocomplete attributes, lang attributes, and normalized heading levels.

Method 2: Staged Rollout

Enable flags one at a time and convert the same test document after each:

  1. WCAG_HEADING_COHERENCE: heading levels in output, SC 1.3.1/2.4.6 in VPAT
  2. + WCAG_FORM_LABEL_JURY: autocomplete attributes, SC 1.3.5/3.3.2 in VPAT
  3. + WCAG_LANGUAGE_OF_PARTS: lang attributes on non-English text, SC 3.1.2 in VPAT
  4. + WCAG_CONFIDENCE_SCORING: requiresHumanReview in API response
  5. + WCAG_READING_ORDER_CHECK: readingOrderResult.flaggedPages in API response, SC 1.3.2
  6. + WCAG_SCREEN_READER_SIM: screenReaderSimResult.issues in API response
  7. + WCAG_OPUS_JURY: compare alt text quality before/after Opus review

Method 3: Staging Server

Deploy to staging with all flags enabled:

Terminal window
npm run stage

Set the env vars in your .staging-deploy manifest or staging server config. Convert several representative test PDFs and review the VPATs.


Test Documents

Use these document types to exercise specific flags:

  • Multi-column academic paper (>30 pages): HEADING_COHERENCE, READING_ORDER_CHECK, SCREEN_READER_SIM
  • Government form (fillable PDF): FORM_LABEL_JURY, CONFIDENCE_SCORING
  • Multilingual report (English + CJK/Arabic/Cyrillic): LANGUAGE_OF_PARTS
  • Data-heavy report with charts/infographics: CONFIDENCE_SCORING, OPUS_JURY, SCREEN_READER_SIM
  • Simple text-only PDF: all flags should pass with no changes (verify no regressions)

Monitoring

All enhancement steps log timing to stdout in the format:

[post-processing] START heading-coherence (htmlLen=45230)
[post-processing] DONE heading-coherence (12ms) headings=8 adjusted=2 nonDesc=1

When WCAG_SCREEN_READER_SIM or WCAG_OPUS_JURY are enabled, cost is logged:

[post-processing] DONE screen-reader-sim (1230ms) issues=2 stream=8432chars cost=$0.0067
[post-processing] DONE opus-jury (3450ms) reviewed=1 improved=1 cost=$0.0823 skipped=0

These logs flow to Loki/Grafana via the standard structured logging pipeline.


Cost Summary

  • WCAG_HEADING_COHERENCE: $0
  • WCAG_FORM_LABEL_JURY: $0
  • WCAG_LANGUAGE_OF_PARTS: $0
  • WCAG_CONFIDENCE_SCORING: $0
  • WCAG_READING_ORDER_CHECK: $0
  • WCAG_SCREEN_READER_SIM: ~$0.005–0.01 (Gemini 2.5 Flash)
  • WCAG_OPUS_JURY: ~$0.05–0.50, budget-capped (Claude Opus 4.6)
  • All deterministic flags: $0
  • All flags enabled: ~$0.06–0.52

Conformance Report Impact

With no flags enabled, the VPAT reports these SCs as “Not Verified”:

  • 1.3.2 (Meaningful Sequence)
  • 1.3.5 (Identify Input Purpose) — shown as N/A for non-form docs
  • 2.4.6 (Headings and Labels) — partially covered by axe-core only
  • 3.1.2 (Language of Parts) — listed but never checked
  • 3.3.2 (Labels or Instructions) — shown as N/A for non-form docs

With all deterministic flags enabled, these upgrade to:

  • 1.3.2 → “Supports” (reading order verified) or “Partially Supports” (pages flagged)
  • 1.3.5 → “Supports” (autocomplete added) for form documents, stays “N/A” for non-form
  • 2.4.6 → “Supports” (heading coherence + descriptiveness checked)
  • 3.1.2 → “Supports” (lang attributes added, or none needed because no non-primary-script text was found)
  • 3.3.2 → “Supports” (form labels validated) for form documents, stays “N/A” for non-form

With LLM flags also enabled:

  • 1.3.1, 1.3.2, 2.4.6 additionally benefit from screen-reader coherence validation
  • 1.1.1 benefits from Opus jury improving low-confidence alt text