
WCAG Static Checks Implementation Plan

This document is the implementation plan for adding the 11 statically testable WCAG 2.1 criteria identified in WCAG-COVERAGE-GAPS.md. Each item is scoped, prioritized, and ready to implement.

Last updated: 2026-03-07


Environment Notes

  • The validator runs on Node.js (not Cloudflare Workers). Any npm package is available.
  • Puppeteer is already installed and used for PDF rendering/screenshots — it is available as a tool if a check needs a rendered DOM, though all items in this plan are purely HTML/text analysis.
  • The validator lives in workers/api/src/services/wcag-validator.ts (currently ~1,480 lines). At the end of Phase 2 it will exceed 2,000 lines; at that point the file should be split into focused modules (see Refactoring note at the end).
  • All new rules follow the existing patterns: add to ALL_RULES, push to violations or warnings, push to evaluatedRules, add tests in src/__tests__/services/wcag-validator.test.ts.
  • New checks that are heuristic-based (false positives are possible) should produce warnings, not violations. Checks that are deterministic should produce violations.

Phased Plan

Phase | Items | Rationale
--- | --- | ---
1 | 1.4.5, 4.1.3, 1.3.3, 1.1.1 | Low effort, no new dependencies, high signal-to-noise ratio
2 | 4.1.2, 2.4.6, 1.4.1, 3.2.4, 1.3.2 | Medium effort, pure regex/string analysis, no new dependencies
3 | 4.1.1, 3.1.2 | Higher complexity: 4.1.1 needs a DOM parser, 3.1.2 needs Unicode range analysis

Phase 1 — Quick Wins

Item 1 — 1.4.5: Images of Text (AA)

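WCAG SC: 1.4.5 Images of Text (Level AA)
Current state: The images-of-text-no-exception rule already implements this heuristic at AAA (SC 1.4.9, Images of Text (No Exception)). The AA version is completely absent.
Difference from AAA version: Both criteria allow purely decorative images and logos (logotypes count as essential). AA additionally allows images of text that the user can customize, and images of text where the technology in use cannot achieve the same visual presentation with real text. AAA drops those two allowances.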

Implementation:

Add a new check block in the AA section (before the AAA section) in validateWCAG:

// images-of-text — WCAG 1.4.5 (AA)
{
  const imgMatches = html.matchAll(/<img\b[^>]*>/gi);
  let foundImagesOfText = false;
  for (const match of imgMatches) {
    const tag = match[0];
    // Match each quote style separately so an apostrophe inside alt="..." does not truncate the value
    const altMatch = tag.match(/\salt\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    const alt = (altMatch?.[1] ?? altMatch?.[2] ?? '').trim();
    // Skip decorative images (and missing alt, which image-alt reports) and likely logos
    if (alt === '') continue;
    if (/logo|icon|badge|seal|signature/i.test(alt)) continue;
    // Flag images with long prose alt text, which likely marks an image of text
    if (alt.length > 80 && alt.split(/\s+/).length > 8) {
      foundImagesOfText = true;
      warnings.push({ id: 'images-of-text', ... });
    }
  }
  evaluatedRules.push({ id: 'images-of-text', result: foundImagesOfText ? 'warning' : 'pass', ... });
}

Add to ALL_RULES: { id: 'images-of-text', level: 'AA', description: 'Images should not be used to present text', helpUrl: '...' }

Severity: Warning (heuristic)
Files changed: wcag-validator.ts
Tests to write (6):

  1. Passes for alt="" (decorative)
  2. Passes for alt="Company logo"
  3. Passes for short alt text (alt="Chart showing sales")
  4. Warns for long prose alt text at AA level
  5. At level: 'AA', images-of-text is evaluated and the AAA rule images-of-text-no-exception is not
  6. At level: 'AAA', images-of-text and images-of-text-no-exception are evaluated as separate rules
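A pattern such as `["']([^"']*)["']` stops at the first quote of either kind, so alt="Company's logo" is read as "Company". Several checks in this plan read attribute values out of a start tag (alt in Items 1 and 4, value in Item 8), and a shared helper along the following lines would let them all handle quoting the same way. This is a sketch: `getAttr` is a hypothetical name, and like the checks in this plan it works on the text of a single start tag rather than a parsed DOM.

```typescript
/**
 * Hypothetical helper: read one attribute from the text of a single HTML start tag.
 * Handles double-quoted, single-quoted and unquoted values.
 * Returns null when the attribute is absent and "" for a bare attribute such as `disabled`.
 */
function getAttr(tag: string, name: string): string | null {
  // Escape regex metacharacters in the attribute name.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Require whitespace before the name so "alt" does not match inside "data-alt".
  const re = new RegExp(
    `\\s${escaped}(?:\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s"'=<>\`]+)))?(?=[\\s/>])`,
    'i',
  );
  const m = tag.match(re);
  if (!m) return null;
  return m[1] ?? m[2] ?? m[3] ?? '';
}
```

In Item 1, for example, the alt lookup would become `const alt = (getAttr(tag, 'alt') ?? '').trim();`, and a `null` result still distinguishes a missing alt from an empty one.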

Item 2 — 4.1.3: Status Messages (AA)

WCAG SC: 4.1.3 Status Messages (Level AA)
Current state: Not checked.
What it means: Any region that can receive dynamically injected status or error messages must be identified with role="status", role="alert", role="log", or aria-live so assistive technology can announce it without it receiving focus.

Implementation:

The check has two parts:

  1. If form elements exist in the document, warn if no live region is present — forms commonly produce validation feedback.
  2. If any element has a class name or ID containing “alert”, “error”, “notice”, “status”, “notification”, “message”, or “feedback”, warn if the document has no live region.

// status-messages — WCAG 4.1.3 (AA)
{
  // status, alert and log are live regions by default. timer and marquee are omitted because
  // their implicit aria-live value is "off"; aria-live="off" likewise announces nothing.
  const hasLiveRegion = /\srole\s*=\s*["'](status|alert|log)["']/i.test(html)
    || /\saria-live\s*=\s*["']?(polite|assertive)/i.test(html);
  const hasForms = /<form[\s>]/i.test(html);
  const hasFeedbackPatterns = /\s(?:class|id)\s*=\s*["'][^"']*(?:alert|error|notice|status|notification|message|feedback)[^"']*["']/i.test(html);
  const needsLiveRegion = hasForms || hasFeedbackPatterns;
  const passed = !needsLiveRegion || hasLiveRegion;
  if (!passed) {
    warnings.push({ id: 'status-messages', ... });
  }
  evaluatedRules.push({ id: 'status-messages', result: passed ? 'pass' : 'warning', ... });
}

Severity: Warning (heuristic: converted PDFs rarely have forms, but DOCX conversions may)
Files changed: wcag-validator.ts
Tests to write (6):

  1. Passes when no forms and no feedback patterns
  2. Passes when form present AND role="alert" present
  3. Passes when form present AND aria-live present
  4. Warns when form present and no live region
  5. Warns when element with class="error-message" present and no live region
  6. Passes when no indicators at all (pure document content)
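The live-region test is easy to get subtly wrong: aria-live="off" explicitly disables announcements, and the timer and marquee roles have an implicit aria-live value of off. A stricter detector, sketched below as a hypothetical standalone function, counts only regions that assistive technology will actually announce. Like the check in this item, it answers a document-level question and does not tie a live region to a particular form.

```typescript
// Hypothetical helper for status-messages (WCAG 4.1.3).
// True if the document contains at least one region that assistive technology
// will announce without moving focus.
function hasActiveLiveRegion(html: string): boolean {
  // status, alert and log are live regions by default; timer and marquee are left out
  // because their implicit aria-live value is "off". The role attribute may hold a
  // space-separated fallback list, so each name is matched as a whole word in the value.
  const liveRole = /\srole\s*=\s*["'][^"']*\b(?:status|alert|log)\b[^"']*["']/i;
  // aria-live="off" disables announcements, so only polite and assertive count.
  const liveAttr = /\saria-live\s*=\s*["']?\s*(?:polite|assertive)\b/i;
  return liveRole.test(html) || liveAttr.test(html);
}
```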

Item 3 — 1.3.3: Sensory Characteristics (A)

WCAG SC: 1.3.3 Sensory Characteristics (Level A)
Current state: Not checked.
What it means: Instructions must not rely solely on sensory characteristics such as shape, color, size, or spatial location. “Click the green button” or “see the diagram on the left” fails if that is the only way to identify the item.

Implementation:

Pattern match against the text content of the document body. False positive rate is manageable since these are specific phrases. Produce a warning, not a violation, since context determines whether color/shape is the only indicator.

// sensory-characteristics — WCAG 1.3.3 (A)
{
  // Scan only body text: drop <head>, <script> and <style> content before stripping tags
  const bodyHtml = html.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i)?.[1] ?? html;
  const bodyText = bodyHtml
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ');
  const sensoryPatterns = [
    /\b(click|select|press|tap|choose)\s+the\s+(red|green|blue|yellow|orange|purple|pink|gray|grey|black|white)\s+\w+/i,
    /\b(the|a)\s+(red|green|blue|yellow|orange|purple|pink|gray|grey|black|white)\s+(button|link|box|section|area|icon|image)\b/i,
    /\b(the\s+)?(box|section|area|panel|column|diagram)\s+(on\s+the\s+)?(left|right|top|bottom|above|below)\b/i,
    // Restricted to UI nouns so phrases such as "the square root" do not match
    /\bthe\s+(round|square|circular|rectangular|triangular)\s+(button|link|icon|box|shape|symbol)\b/i,
    /\b(the\s+)?(small|large|big|tiny)\s+(button|link|icon|box)\b/i,
  ];
  const matches: string[] = [];
  for (const pattern of sensoryPatterns) {
    const m = bodyText.match(pattern);
    if (m) matches.push(m[0].trim());
  }
  const passed = matches.length === 0;
  if (!passed) {
    warnings.push({ id: 'sensory-characteristics', nodes: matches.map(m => ({ html: m, ... })), ... });
  }
  evaluatedRules.push({ id: 'sensory-characteristics', result: passed ? 'pass' : 'warning', ... });
}

Severity: Warning
Files changed: wcag-validator.ts
Tests to write (6):

  1. Passes for normal body text with no sensory references
  2. Warns for “click the green button”
  3. Warns for “the box on the right”
  4. Warns for “the round icon”
  5. Warns for “see the diagram below” (spatial)
  6. Does not false-positive on “the green fields of Ireland” (sensory but not instructional)
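Two refinements to the text scan are easiest to test in isolation: limiting it to body text without script or style content, and collecting every match instead of the first per pattern (`String.prototype.match` without the `g` flag returns only the first). The function names below are hypothetical, and only two of the five patterns from this item are reproduced to keep the example short.

```typescript
// Hypothetical helpers for sensory-characteristics (WCAG 1.3.3).

// Text of the <body>: script/style/template content removed, tags replaced by spaces,
// whitespace collapsed. Falls back to the whole input when there is no <body>.
function extractBodyText(html: string): string {
  const body = html.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i)?.[1] ?? html;
  return body
    .replace(/<(script|style|template)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Every phrase matched by any pattern, not just the first hit per pattern.
function findSensoryPhrases(text: string, patterns: RegExp[]): string[] {
  const found: string[] = [];
  for (const pattern of patterns) {
    // matchAll needs the global flag; build a copy rather than mutating the caller's regex.
    const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
    for (const m of text.matchAll(new RegExp(pattern.source, flags))) {
      found.push(m[0].trim());
    }
  }
  return found;
}

// Two of the patterns from this item, used as an example.
const SENSORY_PATTERNS: RegExp[] = [
  /\b(click|select|press|tap|choose)\s+the\s+(red|green|blue|yellow|orange|purple|pink|gray|grey|black|white)\s+\w+/i,
  /\b(the\s+)?(box|section|area|panel|column|diagram)\s+(on\s+the\s+)?(left|right|top|bottom|above|below)\b/i,
];
```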

Item 4 — 1.1.1: Meaningful Alt Text (A, partial)

WCAG SC: 1.1.1 Non-text Content (Level A), extending the existing image-alt check
Current state: We detect a missing alt attribute. We do not detect a present but meaningless alt attribute.
What it means: alt="image" is technically present but useless to a screen reader user. Common bad values include generic words, filenames, and placeholder text injected by conversion tools.

Implementation:

Add a second pass after the existing image-alt check. Introduce a new rule ID image-alt-meaningful to keep it separate from the missing-alt check.

// image-alt-meaningful — WCAG 1.1.1 (A, partial)
{
  const MEANINGLESS_ALT = /^(image|img|photo|photograph|picture|graphic|figure|icon|screenshot|scan|page|untitled|placeholder|temp|tmp|null|undefined|none|blank|spacer|\s*)$/i;
  const FILENAME_PATTERN = /\.(png|jpg|jpeg|gif|webp|svg|bmp|tiff?|pdf)$/i;
  const GENERIC_PREFIX = /^(img_|image_|photo_|fig_|figure_|scan_|page_)\d+/i;
  const imgMatches2 = html.matchAll(/<img\b[^>]*>/gi);
  let hasMeaninglessAlt = false;
  for (const match of imgMatches2) {
    const tag = match[0];
    // Match each quote style separately so an apostrophe inside alt="..." does not truncate the value
    const altMatch = tag.match(/\salt\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    if (!altMatch) continue; // already caught by image-alt
    const alt = (altMatch[1] ?? altMatch[2] ?? '').trim();
    if (alt === '') continue; // decorative, which is valid
    if (MEANINGLESS_ALT.test(alt) || FILENAME_PATTERN.test(alt) || GENERIC_PREFIX.test(alt)) {
      hasMeaninglessAlt = true;
      violations.push({ id: 'image-alt-meaningful', impact: 'serious', ... });
    }
  }
  evaluatedRules.push({ id: 'image-alt-meaningful', result: hasMeaninglessAlt ? 'fail' : 'pass', ... });
}

Severity: Violation (deterministic: these values are always wrong)
Files changed: wcag-validator.ts
Tests to write (7):

  1. Passes for descriptive alt text
  2. Passes for alt="" (decorative is valid)
  3. Fails for alt="image"
  4. Fails for alt="photo"
  5. Fails for alt="img_001.png" (filename)
  6. Fails for alt="figure_3.jpg" (filename with prefix)
  7. Does not double-report when alt is missing (that is image-alt’s job)
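The per-value test can be pulled out into a pure function so that the cases above can be unit-tested without building HTML. A sketch using the same three patterns (the function name is hypothetical; the empty-string alternative is dropped from MEANINGLESS_ALT because the value is trimmed and an empty alt returns early):

```typescript
// Hypothetical pure predicate for image-alt-meaningful (WCAG 1.1.1).
const MEANINGLESS_ALT = /^(image|img|photo|photograph|picture|graphic|figure|icon|screenshot|scan|page|untitled|placeholder|temp|tmp|null|undefined|none|blank|spacer)$/i;
const FILENAME_PATTERN = /\.(png|jpg|jpeg|gif|webp|svg|bmp|tiff?|pdf)$/i;
const GENERIC_PREFIX = /^(img_|image_|photo_|fig_|figure_|scan_|page_)\d+/i;

// True when alt is present and non-empty but carries no description.
// An empty alt ("") is valid for decorative images and returns false.
function isMeaninglessAlt(rawAlt: string): boolean {
  const alt = rawAlt.trim();
  if (alt === '') return false;
  return MEANINGLESS_ALT.test(alt) || FILENAME_PATTERN.test(alt) || GENERIC_PREFIX.test(alt);
}
```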

Phase 2 — Medium Complexity

Item 5 — 4.1.2: ARIA Role Validation (A, partial)

WCAG SC: 4.1.2 Name, Role, Value (Level A)
Current state: We check button and link names, but not ARIA role validity or attribute compatibility.
What it means: Using role="badvalue", or putting aria-checked on an element whose role does not support it, makes the accessibility tree incorrect.

Implementation:

Define two lookup structures — valid ARIA roles, and which aria-* attributes require a compatible role. No new dependencies; pure string matching.

// aria-role-valid + aria-allowed-attr — WCAG 4.1.2 (A)
const VALID_ARIA_ROLES = new Set([
  // WAI-ARIA 1.1 concrete roles
  'alert','alertdialog','application','article','banner','button','cell',
  'checkbox','columnheader','combobox','complementary','contentinfo','definition',
  'dialog','directory','document','feed','figure','form','grid','gridcell','group',
  'heading','img','link','list','listbox','listitem','log','main','marquee','math',
  'menu','menubar','menuitem','menuitemcheckbox','menuitemradio','navigation','none',
  'note','option','presentation','progressbar','radio','radiogroup','region','row',
  'rowgroup','rowheader','scrollbar','search','searchbox','separator','slider',
  'spinbutton','status','switch','tab','table','tablist','tabpanel','term','textbox',
  'timer','toolbar','tooltip','tree','treegrid','treeitem',
  // Added in WAI-ARIA 1.2; without these, valid markup such as role="paragraph" would fail
  'blockquote','caption','code','deletion','emphasis','generic','insertion','meter',
  'paragraph','strong','subscript','superscript','time',
  // If converted documents use DPUB-ARIA roles (doc-chapter, doc-footnote, ...), list them here too
]);
// aria-* attributes that are only valid on specific roles
const ROLE_RESTRICTED_ATTRS: Record<string, string[]> = {
  'aria-checked': ['checkbox','menuitemcheckbox','menuitemradio','option','radio','switch','treeitem'],
  // Includes link and menuitem, so the common disclosure pattern <a href="#" aria-expanded="false"> is allowed
  'aria-expanded': ['application','button','checkbox','columnheader','combobox','grid','gridcell','link','listbox',
    'menuitem','menuitemcheckbox','menuitemradio','option','row','rowheader','switch','tab','treeitem'],
  'aria-selected': ['gridcell','option','row','tab','treeitem','columnheader','rowheader'],
  'aria-pressed': ['button'],
  'aria-level': ['heading','listitem','row','treeitem'],
  'aria-multiline': ['textbox','searchbox'],
  'aria-readonly': ['checkbox','combobox','grid','gridcell','listbox','radiogroup','slider','spinbutton','textbox'],
};

Two sub-checks:

  1. aria-role-valid — flag any role="..." value not in the valid roles set.
  2. aria-allowed-attr — flag any aria-* attribute on an element whose role (or implicit role) does not support it.
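The plan defines the lookup tables but not the two sub-checks. The sketch below is one way to wire them up, under several assumptions: the tables are trimmed copies of the ones in this item, `IMPLICIT_ROLES` is a small hypothetical subset of the HTML element-to-role mappings (anything unlisted is treated as generic), and the role attribute is split on whitespace because ARIA allows a fallback list such as role="switch checkbox", where the first recognised token is used. Reporting a role attribute only when none of its tokens is known is a design choice here, not a requirement of the specification.

```typescript
// Hypothetical sketch of the aria-role-valid and aria-allowed-attr sub-checks (WCAG 4.1.2).
// Tables are trimmed for the example; the full ones are VALID_ARIA_ROLES and ROLE_RESTRICTED_ATTRS.
const VALID_ROLES = new Set(['button', 'checkbox', 'heading', 'link', 'listitem', 'navigation', 'option', 'paragraph', 'radio', 'switch']);
const RESTRICTED_ATTRS: Record<string, string[]> = {
  'aria-checked': ['checkbox', 'menuitemcheckbox', 'menuitemradio', 'option', 'radio', 'switch', 'treeitem'],
  'aria-pressed': ['button'],
};
// Hypothetical, incomplete subset of HTML element -> implicit ARIA role mappings.
const IMPLICIT_ROLES: Record<string, string> = {
  a: 'link', button: 'button', li: 'listitem', nav: 'navigation', p: 'paragraph',
  h1: 'heading', h2: 'heading', h3: 'heading', h4: 'heading', h5: 'heading', h6: 'heading',
};

type AriaIssue = { rule: 'aria-role-valid' | 'aria-allowed-attr'; tag: string; detail: string };

function checkAria(html: string): AriaIssue[] {
  const issues: AriaIssue[] = [];
  for (const m of html.matchAll(/<([a-z][a-z0-9-]*)\b([^>]*)>/gi)) {
    const [tag, rawName, attrs] = m;
    const name = rawName.toLowerCase();

    // Explicit role: a space-separated fallback list is allowed; the first known token wins.
    let role: string | undefined;
    const roleAttr = attrs.match(/\srole\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    if (roleAttr) {
      const tokens = (roleAttr[1] ?? roleAttr[2] ?? '').toLowerCase().split(/\s+/).filter(Boolean);
      role = tokens.find(t => VALID_ROLES.has(t));
      if (!role && tokens.length > 0) {
        issues.push({ rule: 'aria-role-valid', tag, detail: `unknown role "${tokens.join(' ')}"` });
      }
    }

    // Implicit role; <a> only maps to link when it has an href. Unlisted elements count as generic.
    let implicit = IMPLICIT_ROLES[name] ?? 'generic';
    if (name === 'a' && !/\shref\s*=/i.test(attrs)) implicit = 'generic';
    const effective = role ?? implicit;

    for (const a of attrs.matchAll(/\s(aria-[a-z]+)\s*=/gi)) {
      const attr = a[1].toLowerCase();
      const allowed = RESTRICTED_ATTRS[attr];
      if (allowed && !allowed.includes(effective)) {
        issues.push({ rule: 'aria-allowed-attr', tag, detail: `${attr} is not supported on role "${effective}"` });
      }
    }
  }
  return issues;
}
```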

Severity: Violation for invalid role values; Warning for mismatched aria attributes
Files changed: wcag-validator.ts
Tests to write (9):

  1. Passes for role="button" (valid)
  2. Passes for role="navigation" (valid)
  3. Fails for role="badvalue"
  4. Fails for role="dropdown" (not in spec)
  5. Passes for <div role="checkbox" aria-checked="true">
  6. Warns for <div aria-checked="true"> (no role, implicit role is generic)
  7. Warns for <p aria-pressed="false"> (pressed only valid on button)
  8. Passes when no ARIA roles present
  9. Multiple invalid roles reported individually

Item 6 — 2.4.6: Headings and Labels (AA)

WCAG SC: 2.4.6 Headings and Labels (Level AA)
Current state: We check heading order and emptiness but not whether heading text is descriptive.
What it means: Headings must describe the topic of the section. “Section 2”, “Continued”, or a single-character heading fails this criterion.

Implementation:

// heading-descriptive — WCAG 2.4.6 (AA)
{
  // section/chapter/part may be followed by a number or roman numeral, so "Section 2" and "Part IV" match
  const NON_DESCRIPTIVE = /^(?:(?:section|chapter|part)(?:\s+(?:\d+(?:\.\d+)*|[ivxlc]+))?|continued?|see\s+(?:above|below)|n\/a|tbd|todo|untitled|\d+(?:\.\d+)*)\.?$/i;
  const headingMatches = html.matchAll(/<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi);
  let hasNonDescriptive = false;
  for (const m of headingMatches) {
    const text = m[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    if (text.length === 0) continue; // caught by empty-heading
    if (text.length < 3 || NON_DESCRIPTIVE.test(text)) {
      hasNonDescriptive = true;
      warnings.push({ id: 'heading-descriptive', ... });
    }
  }
  evaluatedRules.push({ id: 'heading-descriptive', result: hasNonDescriptive ? 'warning' : 'pass', ... });
}

Severity: Warning (heuristic: short headings are not always wrong, e.g. <h2>FAQ</h2>)
Files changed: wcag-validator.ts
Tests to write (7):

  1. Passes for <h2>Introduction to Machine Learning</h2>
  2. Passes for <h2>FAQ</h2> (short but a real acronym — test that 3-char headings pass)
  3. Warns for <h2>Section 2</h2>
  4. Warns for <h3>Continued</h3>
  5. Warns for <h2>1.3</h2> (purely numeric)
  6. Warns for <h2>N/A</h2>
  7. Does not double-report empty headings (already caught by empty-heading)
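A bare `section` alternative does not match “Section 2”, because the heading text continues after the keyword. The predicate below, a hypothetical standalone version of this check, accepts an optional number or roman numeral after section, chapter and part, and can be run against the seven cases above without building HTML.

```typescript
// Hypothetical predicate for heading-descriptive (WCAG 2.4.6).
const NON_DESCRIPTIVE_HEADING =
  /^(?:(?:section|chapter|part)(?:\s+(?:\d+(?:\.\d+)*|[ivxlc]+))?|continued?|see\s+(?:above|below)|n\/a|tbd|todo|untitled|\d+(?:\.\d+)*)\.?$/i;

// Takes the inner HTML of a heading; inline markup is stripped and whitespace collapsed.
function isNonDescriptiveHeading(innerHtml: string): boolean {
  const text = innerHtml.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
  if (text.length === 0) return false; // empty headings are reported by empty-heading
  return text.length < 3 || NON_DESCRIPTIVE_HEADING.test(text);
}
```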

Item 7 — 1.4.1: Use of Color (A)

WCAG SC: 1.4.1 Use of Color (Level A)
Current state: Not checked.
What it means: Color must not be the only visual means of conveying information, indicating an action, prompting a response, or distinguishing a visual element. Classic failures are marking required fields or errors only by turning their text red, or a table where red rows mean “failed” with no text or icon indicator.

Implementation:

Two heuristics:

  1. Detect <span>, <td>/<th>, <p>, <div> or <li> elements that set an inline color and have no other semantic indicator (their own text does not include words like “error”, “warning”, “required”, etc., and they carry no aria-label, role, or icon).
  2. Detect <span style="color:..."> that wraps a single word in a sentence without an accompanying icon or symbol.
// use-of-color — WCAG 1.4.1 (A)
{
  // Capture tag name, attributes and inner HTML; the style attribute is inspected separately
  const candidatePattern = /<(span|td|th|p|div|li)\b([^>]*)>([\s\S]*?)<\/\1>/gi;
  const SEMANTIC_INDICATORS = /\b(error|warning|caution|danger|required|invalid|valid|success|fail|passed?|notice|alert|important|critical)\b/i;
  let colorOnlyFound = false;
  for (const m of html.matchAll(candidatePattern)) {
    const attrs = m[2];
    const styleMatch = attrs.match(/\sstyle\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    const style = styleMatch?.[1] ?? styleMatch?.[2] ?? '';
    // Foreground color anywhere in the style; the lookbehind excludes background-color, border-color, etc.
    if (!/(?<![-\w])color\s*:/i.test(style)) continue;
    const inner = m[3].replace(/<[^>]+>/g, '').trim();
    // Only flag if: no aria-label/role, no semantic text indicator, no icon
    if (/aria-label|role\s*=/i.test(attrs)) continue;
    if (SEMANTIC_INDICATORS.test(inner)) continue;
    if (/<img|<svg|&#/i.test(m[3])) continue; // has icon/symbol
    if (inner.length > 0 && inner.length < 60) {
      colorOnlyFound = true;
      warnings.push({ id: 'use-of-color', ... });
    }
  }
  evaluatedRules.push({ id: 'use-of-color', result: colorOnlyFound ? 'warning' : 'pass', ... });
}

Severity: Warning (heuristic: we cannot know with certainty that color is the only indicator)
Files changed: wcag-validator.ts
Tests to write (6):

  1. Passes for <span style="color:red">Error: file not found</span> (has semantic word)
  2. Passes for <span style="color:red" aria-label="Error">X</span> (has aria-label)
  3. Warns for <span style="color:red">Smith</span> (color only, no indicator)
  4. Warns for <td style="color:red">42</td> (data cell colored with no label)
  5. Passes for a <span> without any color style
  6. Does not flag elements with both color and an icon child element
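Detecting the foreground color declaration is the fragile part of this check: in a regex, `\bcolor\s*:` also matches inside `background-color:` because a hyphen counts as a word boundary, and a value pattern like `[^;"']+["']` only matches when color is the last declaration. Splitting the style attribute into declarations avoids both. The helper names below are hypothetical.

```typescript
// Hypothetical helpers for use-of-color (WCAG 1.4.1).

// Value of the style attribute in a start tag ("" if absent); handles both quote styles.
function inlineStyle(startTag: string): string {
  const m = startTag.match(/\sstyle\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
  return m?.[1] ?? m?.[2] ?? '';
}

// True if the declarations set the foreground `color` property. Comparing whole property
// names means background-color and border-color are not mistaken for color, and the position
// of the declaration does not matter. A ";" inside a url() value would confuse this split.
function setsForegroundColor(style: string): boolean {
  return style
    .split(';')
    .map(decl => decl.split(':')[0].trim().toLowerCase())
    .some(prop => prop === 'color');
}
```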

Item 8 — 3.2.4: Consistent Identification (AA)

WCAG SC: 3.2.4 Consistent Identification (Level AA)
Current state: Not checked.
What it means: Components with the same function must be identified consistently. If one submit control is <button>Submit</button> and another is <a href="#">Submit</a>, screen reader users get an inconsistent experience. WCAG scopes this criterion to a set of web pages; within a single converted document it is applied as a heuristic.

Implementation:

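Collect all interactive elements (buttons, links, and submit/button inputs) by their visible label. If the same label appears on more than one element type, flag it. The validator cannot tell whether the elements perform the same action, so a shared label is used as a proxy for a shared function.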

// consistent-identification — WCAG 3.2.4 (AA)
{
  type InteractiveEl = { tag: string; label: string; html: string };
  // Strip markup, collapse whitespace and lower-case so "Submit" and " submit " compare equal
  const normalise = (s: string) => s.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim().toLowerCase();
  const elements: InteractiveEl[] = [];
  for (const m of html.matchAll(/<button\b[^>]*>([\s\S]*?)<\/button>/gi))
    elements.push({ tag: 'button', label: normalise(m[1]), html: m[0] });
  for (const m of html.matchAll(/<a\s[^>]*href[^>]*>([\s\S]*?)<\/a>/gi))
    elements.push({ tag: 'a', label: normalise(m[1]), html: m[0] });
  for (const m of html.matchAll(/<input\b[^>]*\stype\s*=\s*["']?(submit|button)\b[^>]*>/gi)) {
    const valMatch = m[0].match(/\svalue\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    // A submit input without a value gets an implementation-defined label meaning "Submit"
    const fallback = m[1].toLowerCase() === 'submit' ? 'submit' : '';
    const val = normalise(valMatch?.[1] ?? valMatch?.[2] ?? fallback);
    if (val) elements.push({ tag: 'input', label: val, html: m[0] });
  }
  const labelToTags = new Map<string, Set<string>>();
  for (const el of elements) {
    if (!el.label) continue;
    if (!labelToTags.has(el.label)) labelToTags.set(el.label, new Set());
    labelToTags.get(el.label)!.add(el.tag);
  }
  let inconsistent = false;
  for (const [label, tags] of labelToTags) {
    if (tags.size > 1) {
      inconsistent = true;
      warnings.push({ id: 'consistent-identification', ... });
    }
  }
  evaluatedRules.push({ id: 'consistent-identification', result: inconsistent ? 'warning' : 'pass', ... });
}

Severity: Warning
Files changed: wcag-validator.ts
Tests to write (5):

  1. Passes when all “Submit” labels are <button>
  2. Passes when labels differ (<button>Submit</button> and <a>Read more</a>)
  3. Warns when <button>Submit</button> and <a href="#">Submit</a> both present
  4. Passes when no interactive elements present
  5. Case-insensitive comparison (“submit” vs “Submit” treated as same label)
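The grouping step can be exercised on its own. The sketch below collects interactive elements, normalises their labels (markup stripped, whitespace collapsed, lower-cased) and returns every label that appears on more than one element type. The function names are hypothetical, and a submit input with no value is assumed to carry the label "submit", since the HTML standard leaves the default label implementation-defined.

```typescript
// Hypothetical sketch for consistent-identification (WCAG 3.2.4).
type Interactive = { tag: 'button' | 'a' | 'input'; label: string };

function normaliseLabel(raw: string): string {
  return raw.replace(/<[^>]+>/g, '').replace(/&nbsp;/gi, ' ').replace(/\s+/g, ' ').trim().toLowerCase();
}

function collectInteractive(html: string): Interactive[] {
  const out: Interactive[] = [];
  for (const m of html.matchAll(/<button\b[^>]*>([\s\S]*?)<\/button>/gi)) {
    out.push({ tag: 'button', label: normaliseLabel(m[1]) });
  }
  for (const m of html.matchAll(/<a\s[^>]*\bhref\b[^>]*>([\s\S]*?)<\/a>/gi)) {
    out.push({ tag: 'a', label: normaliseLabel(m[1]) });
  }
  for (const m of html.matchAll(/<input\b[^>]*>/gi)) {
    const tag = m[0];
    const type = tag.match(/\stype\s*=\s*["']?([a-z]+)/i)?.[1]?.toLowerCase();
    if (type !== 'submit' && type !== 'button') continue;
    const value = tag.match(/\svalue\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    // Assumption: a value-less submit input is labelled "submit".
    const label = normaliseLabel(value?.[1] ?? value?.[2] ?? (type === 'submit' ? 'submit' : ''));
    out.push({ tag: 'input', label });
  }
  return out;
}

// Labels used by more than one element type, in first-seen order.
function inconsistentLabels(html: string): string[] {
  const byLabel = new Map<string, Set<string>>();
  for (const el of collectInteractive(html)) {
    if (!el.label) continue;
    const tags = byLabel.get(el.label) ?? new Set<string>();
    tags.add(el.tag);
    byLabel.set(el.label, tags);
  }
  return Array.from(byLabel.entries())
    .filter(([, tags]) => tags.size > 1)
    .map(([label]) => label);
}
```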

Item 9 — 1.3.2: Meaningful Sequence — Layout Table Detection (A)

WCAG SC: 1.3.2 Meaningful Sequence (Level A)
Current state: Not checked. Layout tables from PDFs are a common conversion artifact.
What it means: Content must be presented in a sequence that makes sense. A table used purely for visual layout (not data) breaks reading order for screen readers.

Implementation:

Detect tables that look like layout tables rather than data tables. A table is likely a layout table if it has: no <th> elements, no <caption>, no summary attribute, and its cells contain only block-level elements (divs, paragraphs) rather than data values.

// layout-table — WCAG 1.3.2 (A)
{
  // Non-greedy matching stops at the first </table>, so nested tables are only approximated
  const tables = html.match(/<table[\s>][\s\S]*?<\/table>/gi) || [];
  let layoutTableFound = false;
  for (const table of tables) {
    // role and summary are read from the <table> tag itself, not from descendants
    const openTag = table.match(/^<table\b[^>]*>/i)?.[0] ?? '';
    const hasHeader = /<th[\s>]/i.test(table);
    const hasCaption = /<caption[\s>]/i.test(table);
    const hasSummary = /\ssummary\s*=/i.test(openTag);
    const isExplicitLayout = /\srole\s*=\s*["'](presentation|none)["']/i.test(openTag);
    const hasRole = !isExplicitLayout && /\srole\s*=/i.test(openTag);
    if (isExplicitLayout) continue; // explicitly marked as layout
    if (hasHeader || hasCaption || hasSummary || hasRole) continue; // likely a data table
    // Layout indicator: every non-empty cell starts with block-level content.
    // Empty spacer cells are common in layout tables and are ignored.
    const cellContents = [...table.matchAll(/<td\b[^>]*>([\s\S]*?)<\/td>/gi)]
      .map(m => m[1].replace(/&nbsp;|&#160;/gi, '').trim())
      .filter(c => c.length > 0);
    const blockOnlyCells = cellContents.filter(c =>
      /^<(div|p|ul|ol|h[1-6]|section|article|header|footer)[\s>]/i.test(c)
    );
    if (blockOnlyCells.length > 0 && blockOnlyCells.length === cellContents.length) {
      layoutTableFound = true;
      warnings.push({ id: 'layout-table', ... });
    }
  }
  evaluatedRules.push({ id: 'layout-table', result: layoutTableFound ? 'warning' : 'pass', ... });
}

Severity: Warning (heuristic)
Files changed: wcag-validator.ts
Tests to write (6):

  1. Passes for a standard data table with <th> headers
  2. Passes for a table with <caption>
  3. Passes for a table with role="presentation" (explicitly declared layout)
  4. Warns for a table with no <th>, no <caption>, and cells containing only <div> or <p> children
  5. Passes for a table with no <th> but cells containing plain text values (could be data)
  6. Passes for an empty table
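As a per-table predicate, the heuristic is straightforward to test against the six cases above. In this sketch (the function name is hypothetical), role and summary are read only from the opening <table> tag, so an ARIA role on a descendant such as a button inside a cell does not hide a layout table, and empty or &nbsp;-only spacer cells are ignored. Treating any explicit role on the table as intentional is an assumption of the sketch.

```typescript
// Hypothetical predicate for layout-table (WCAG 1.3.2); takes the HTML of one <table>.
const BLOCK_START = /^<(div|p|ul|ol|h[1-6]|section|article|header|footer)[\s>]/i;

function isLikelyLayoutTable(tableHtml: string): boolean {
  const openTag = tableHtml.match(/^\s*<table\b[^>]*>/i)?.[0] ?? '';
  const role = openTag.match(/\srole\s*=\s*["']([^"']*)["']/i)?.[1]?.trim().toLowerCase();
  // presentation/none: already declared as layout. Any other role: treated as intentional.
  if (role) return false;
  if (/\ssummary\s*=/i.test(openTag)) return false;
  if (/<th[\s>]/i.test(tableHtml) || /<caption[\s>]/i.test(tableHtml)) return false;

  // Non-empty cells only; &nbsp; spacer cells count as empty.
  const cells = Array.from(
    tableHtml.matchAll(/<td\b[^>]*>([\s\S]*?)<\/td>/gi),
    m => m[1].replace(/&nbsp;|&#160;/gi, '').trim(),
  ).filter(c => c.length > 0);
  return cells.length > 0 && cells.every(c => BLOCK_START.test(c));
}
```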

Phase 3 — Higher Complexity

Item 10 — 4.1.1: HTML Parsing / Invalid Nesting (A, partial)

WCAG SC: 4.1.1 Parsing (Level A)
Current state: We detect duplicate id values. We do not detect invalid element nesting or malformed markup.
What it means: Block elements (<div>, <p>, <ul>) inside elements that only allow phrasing content (<span>, <em>, <button>), and interactive elements nested inside <a> or <button>, are invalid HTML and break the accessibility tree in some browsers and AT. <a> itself is not phrasing-only: HTML5 gives it a transparent content model, so <a href="#"><p>text</p></a> is valid wherever the <p> alone would be.

Dependency: Add node-html-parser to the API’s package.json. It is a fast, lightweight HTML parser with no browser dependencies, suitable for this environment.

npm install node-html-parser

Implementation:

// At the top of wcag-validator.ts
import { parse, HTMLElement } from 'node-html-parser';
// invalid-nesting — WCAG 4.1.1 (A)
{
  // Elements whose content model is phrasing only. a, map and object are omitted
  // because their content model is transparent (it depends on their parent).
  const INLINE_ELEMENTS = new Set(['abbr','acronym','b','bdo','big','br','button','cite',
    'code','dfn','em','i','img','input','kbd','label','output','q','samp',
    'select','small','span','strong','sub','sup','textarea','time','tt','u','var']);
  const BLOCK_ELEMENTS = new Set(['div','p','ul','ol','li','table','thead','tbody','tr','th','td',
    'h1','h2','h3','h4','h5','h6','blockquote','pre','figure','figcaption','section',
    'article','header','footer','main','nav','aside','dl','dt','dd','form','fieldset',
    'address','hr']);
  // Interactive elements that must not appear anywhere inside <a> or <button>
  const INTERACTIVE = new Set(['a','button','input','select','textarea']);
  const root = parse(html);
  const nestingViolations: string[] = [];
  // ancestors: lower-case tag names of the open elements, outermost first
  const walk = (node: HTMLElement, ancestors: string[]): void => {
    for (const child of node.childNodes) {
      if (!(child instanceof HTMLElement)) continue; // skip text and comment nodes
      const childTag = child.rawTagName?.toLowerCase();
      if (!childTag) continue;
      const parentTag = ancestors[ancestors.length - 1];
      // Block element inside a phrasing-only element is invalid
      if (parentTag && INLINE_ELEMENTS.has(parentTag) && BLOCK_ELEMENTS.has(childTag)) {
        nestingViolations.push(`<${childTag}> inside <${parentTag}>`);
      }
      // Interactive inside <a>/<button> at any depth (covers <a> in <a> and <button> in <button>)
      if (INTERACTIVE.has(childTag)) {
        const container = [...ancestors].reverse().find(t => t === 'a' || t === 'button');
        if (container) nestingViolations.push(`<${childTag}> nested inside <${container}>`);
      }
      walk(child, [...ancestors, childTag]);
    }
  };
  walk(root, []);
  const passed = nestingViolations.length === 0;
  if (!passed) {
    for (const v of [...new Set(nestingViolations)]) {
      violations.push({ id: 'invalid-nesting', impact: 'moderate', ... });
    }
  }
  evaluatedRules.push({ id: 'invalid-nesting', result: passed ? 'pass' : 'fail', ... });
}

Severity: Violation (deterministic: invalid HTML is always wrong)
New dependency: node-html-parser
Files changed: wcag-validator.ts, package.json
Tests to write (8):

  1. Passes for valid block-in-block (<div><p>text</p></div>)
  2. Passes for valid inline-in-block (<p><strong>text</strong></p>)
  3. Fails for <span><div>text</div></span> (block inside inline)
  4. Passes for <a href="#"><p>text</p></a> (<a> is transparent, so block content is allowed where its parent allows it)
  5. Fails for <a href="#"><a href="#">nested</a></a> (anchor inside anchor)
  6. Fails for <button><button>Click</button></button>
  7. Passes for <a href="#"><span>text</span></a> (inline inside anchor, which is valid)
  8. Deduplicates the same violation type reported multiple times
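Interactive content is forbidden anywhere inside <a> or <button>, not only as a direct child, so <button><span><a href="#">x</a></span></button> is also invalid. Keeping the decision logic separate from the parser makes that rule easy to test: the hypothetical function below takes the list of open ancestor tags and the tag being visited. Its interactive set is a simplified subset of the HTML definition (it ignores, for example, that <a> without href and <input type="hidden"> are not interactive).

```typescript
// Hypothetical, parser-independent core of the interactive-nesting rule.
// `ancestors` lists lower-case tag names of the open elements, outermost first.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'details', 'embed', 'iframe', 'input', 'label', 'select', 'textarea']);
// Elements whose content model forbids interactive descendants (subset).
const NO_INTERACTIVE_DESCENDANTS = new Set(['a', 'button']);

function interactiveNestingProblem(ancestors: string[], child: string): string | null {
  if (!INTERACTIVE_TAGS.has(child)) return null;
  // Search from the innermost ancestor outwards so the message names the closest container.
  for (let i = ancestors.length - 1; i >= 0; i--) {
    if (NO_INTERACTIVE_DESCENDANTS.has(ancestors[i])) {
      return `<${child}> nested inside <${ancestors[i]}>`;
    }
  }
  return null;
}
```

In a recursive walk, the caller would pass the current ancestor list for each element and extend it with the element's own tag before visiting its children.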

Item 11 — 3.1.2: Language of Parts (AA)

WCAG SC: 3.1.2 Language of Parts (Level AA)
Current state: We verify the document-level lang attribute but not inline language changes.
What it means: When a passage switches language, the element containing it must declare the new language via lang="xx" so AT can switch its pronunciation engine.

Implementation:

Use Unicode block ranges to detect characters from scripts other than the document's own. If the document is declared as a Latin-script language such as English (lang="en") but contains CJK, Arabic, Cyrillic, Hebrew, Devanagari, Greek, or Thai characters outside of elements with a lang attribute, flag it.

This does not require a language detection library — Unicode ranges are sufficient for script-level detection.

// lang-of-parts — WCAG 3.1.2 (AA)
{
  const docLangMatch = html.match(/<html\b[^>]*\slang\s*=\s*["']([^"']+)["']/i);
  const docLang = (docLangMatch?.[1] ?? 'en').toLowerCase().split('-')[0];
  // Scripts that are always a different language from Latin-based documents.
  // Greek requires a run of 3+ letters so isolated math symbols (α, β, Δ) do not fire.
  const FOREIGN_SCRIPT_RANGES: Array<{ name: string; pattern: RegExp; langs: string[] }> = [
    { name: 'CJK', pattern: /[\u4E00-\u9FFF\u3040-\u30FF\uAC00-\uD7AF]/, langs: ['zh','ja','ko'] },
    { name: 'Arabic', pattern: /[\u0600-\u06FF\u0750-\u077F]/, langs: ['ar','fa','ur'] },
    { name: 'Cyrillic', pattern: /[\u0400-\u04FF]/, langs: ['ru','uk','bg','sr'] },
    { name: 'Hebrew', pattern: /[\u0590-\u05FF]/, langs: ['he','yi'] },
    { name: 'Devanagari', pattern: /[\u0900-\u097F]/, langs: ['hi','mr','sa'] },
    { name: 'Greek', pattern: /[\u0370-\u03FF]{3,}/, langs: ['el'] },
    { name: 'Thai', pattern: /[\u0E00-\u0E7F]/, langs: ['th'] },
  ];
  // Only check when document is declared as a Latin-based language
  const LATIN_LANGS = new Set(['en','fr','de','es','it','pt','nl','sv','da','no','fi','pl','cs','ro','hu']);
  if (!LATIN_LANGS.has(docLang)) {
    // Skip: the document is already declared non-Latin; a different check would be needed
    evaluatedRules.push({ id: 'lang-of-parts', result: 'pass', ... });
  } else {
    // Work on <body> only, so the <html lang> element itself is not stripped below
    const bodyHtml = html.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i)?.[1] ?? html;
    // Strip elements that declare their own language (closing tag matched by name via \1;
    // nested elements of the same name are only approximated), then <math>, <script>
    // and <style>, which are not natural-language text
    const withoutLanggedElements = bodyHtml
      .replace(/<([a-z][a-z0-9]*)\b[^>]*\s(?:xml:)?lang\s*=\s*["'][^"']+["'][^>]*>[\s\S]*?<\/\1>/gi, ' ')
      .replace(/<(math|script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ');
    const plainText = withoutLanggedElements.replace(/<[^>]+>/g, ' ');
    const unlabelledForeignScripts: string[] = [];
    for (const script of FOREIGN_SCRIPT_RANGES) {
      if (script.pattern.test(plainText) && !script.langs.includes(docLang)) {
        unlabelledForeignScripts.push(script.name);
      }
    }
    const passed = unlabelledForeignScripts.length === 0;
    if (!passed) {
      warnings.push({ id: 'lang-of-parts',
        description: `Document contains ${unlabelledForeignScripts.join(', ')} script characters without a lang attribute on the containing element`, ... });
    }
    evaluatedRules.push({ id: 'lang-of-parts', result: passed ? 'pass' : 'warning', ... });
  }
}

Severity: Warning (we can detect the script but cannot determine the exact language)
Files changed: wcag-validator.ts
Tests to write (7):

  1. Passes for a Latin-script English document with no foreign characters
  2. Warns for an English document containing CJK characters without a lang attribute on a parent element
  3. Warns for an English document containing Arabic characters without lang
  4. Warns for an English document containing Cyrillic characters without lang
  5. Passes when foreign script characters are inside an element with a lang attribute (e.g. <span lang="ja">日本語</span>)
  6. Passes when the document itself is declared as a non-Latin language (e.g. lang="zh")
  7. Passes for mathematical symbols (Greek letters in equations are common and should not fire)
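Test 7 expects isolated Greek letters used as mathematical symbols not to fire, which a single-character Greek pattern cannot satisfy. One way to express this is a minimum run length for Greek only; another refinement is matching the closing tag by name when stripping lang-tagged elements, so inline markup inside <span lang="ja"> does not end the match early. Both are sketched below with hypothetical names. `stripLangTagged` should be applied to body HTML rather than the whole document, otherwise <html lang="…"> itself would be stripped.

```typescript
// Hypothetical helpers for lang-of-parts (WCAG 3.1.2).
// Each pattern detects one non-Latin script. Greek requires a run of three letters,
// so isolated mathematical symbols such as α, β or Δ do not count as Greek text.
const SCRIPT_PATTERNS: Array<{ name: string; pattern: RegExp }> = [
  { name: 'CJK', pattern: /[\u4E00-\u9FFF\u3040-\u30FF\uAC00-\uD7AF]/ },
  { name: 'Arabic', pattern: /[\u0600-\u06FF\u0750-\u077F]/ },
  { name: 'Cyrillic', pattern: /[\u0400-\u04FF]/ },
  { name: 'Hebrew', pattern: /[\u0590-\u05FF]/ },
  { name: 'Devanagari', pattern: /[\u0900-\u097F]/ },
  { name: 'Greek', pattern: /[\u0370-\u03FF]{3,}/ },
  { name: 'Thai', pattern: /[\u0E00-\u0E7F]/ },
];

// Remove elements that declare their own language. The closing tag is matched by name (\1),
// so inline markup inside the element does not end the match early. Nested elements with
// the same tag name are still only approximated.
function stripLangTagged(bodyHtml: string): string {
  return bodyHtml.replace(/<([a-z][a-z0-9]*)\b[^>]*\s(?:xml:)?lang\s*=\s*["'][^"']+["'][^>]*>[\s\S]*?<\/\1>/gi, ' ');
}

// Names of the non-Latin scripts present in the text.
function detectForeignScripts(text: string): string[] {
  return SCRIPT_PATTERNS.filter(s => s.pattern.test(text)).map(s => s.name);
}
```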

Refactoring Note

The validator is expected to exceed 2,000 lines by the end of Phase 2. At that point it should be split into focused modules:

workers/api/src/services/wcag/
  index.ts — re-exports validateWCAG, applyWCAGFixes, validateAndFix
  rules-level-a.ts — all Level A checks
  rules-level-aa.ts — all Level AA checks
  rules-level-aaa.ts — all Level AAA checks
  fixes.ts — applyWCAGFixes implementation
  enhance.ts — enhanceAccessibility implementation
  types.ts — shared interfaces
  aria-data.ts — ARIA role lookup tables (used by Phase 2 Item 5)

This split can happen at the start of Phase 2 as a preparatory step, or at the end of Phase 1 if the file is already feeling unwieldy.


Test Count Summary

Phase | Items | New tests
--- | --- | ---
Phase 1 | 4 items | ~25 tests
Phase 2 | 5 items | ~33 tests
Phase 3 | 2 items | ~15 tests
Total | 11 items | ~73 new tests

New Dependencies

Package | Phase | Reason
--- | --- | ---
node-html-parser | Phase 3 | DOM walking for invalid nesting checks (Item 10). Lightweight, no browser required.

No other new dependencies are required. All other checks use regex and string operations only.


New Rule IDs Summary

Rule ID | SC | Level | Severity
--- | --- | --- | ---
images-of-text | 1.4.5 | AA | Warning
status-messages | 4.1.3 | AA | Warning
sensory-characteristics | 1.3.3 | A | Warning
image-alt-meaningful | 1.1.1 | A | Violation
aria-role-valid | 4.1.2 | A | Violation
aria-allowed-attr | 4.1.2 | A | Warning
heading-descriptive | 2.4.6 | AA | Warning
use-of-color | 1.4.1 | A | Warning
consistent-identification | 3.2.4 | AA | Warning
layout-table | 1.3.2 | A | Warning
invalid-nesting | 4.1.1 | A | Violation
lang-of-parts | 3.1.2 | AA | Warning