Equation Workflow

How math equations are detected, processed, and rendered as accessible HTML.

Overview

The pipeline handles math equations through multiple detection and rendering strategies depending on the source material:

Source Type	Detection Method	Rendering Path
Digital PDF with math fonts	Font names + symbol analysis	Marker + temml (LaTeX to MathML)
Digital PDF with LaTeX source	LaTeX pattern matching	Marker + temml
Scanned page with typed math	Mathpix OCR	Mathpix MathML
Scanned page with handwritten math	Gemini vision classification + Mathpix OCR	Mathpix MathML
Individual equation images	Gemini `diagramType: 'equation'` + Mathpix `processImage()`	Mathpix MathML via image pipeline

All paths produce MathML output with “(reads as …)” plain-English annotations for screen reader accessibility.

Detection Phase

1. Text-Layer Math Detection (`math-detector.ts`)

For PDFs with an extractable text layer, the detector uses weighted pattern scoring:

Math Unicode characters (weight 3): \u2211, \u222B, \u221A, \u00B1, etc.
LaTeX display math (weight 5): $$...$$ patterns
LaTeX inline math (weight 4): $...$ patterns
LaTeX commands (weight 4): \frac, \sqrt, \sum, \int, trig functions
Math environments (weight 5): \begin{equation}, \begin{align}, etc.
MathML markup (weight 5): existing <math> tags
Equation patterns (weight 2): x = ..., a^2 + b^2

Score thresholds: 0-3 = no math, 4-7 = uncertain, 8+ = math detected.
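
The scoring scheme can be sketched roughly as follows. This is an illustration, not the code in math-detector.ts: the regexes, the names (`WEIGHTED_PATTERNS`, `scoreMathText`), and the rule that each category counts at most once are all assumptions. Only the weights and thresholds come from the list above.

```typescript
type MathVerdict = "no-math" | "uncertain" | "math";

interface WeightedPattern {
  name: string;
  weight: number;
  regex: RegExp;
}

// Weights follow the list above; the regexes are illustrative guesses.
const WEIGHTED_PATTERNS: WeightedPattern[] = [
  { name: "math-unicode", weight: 3, regex: /[\u2211\u222B\u221A\u00B1]/ },
  { name: "latex-display", weight: 5, regex: /\$\$[\s\S]+?\$\$/ },
  { name: "latex-inline", weight: 4, regex: /(^|[^$])\$[^$\n]+\$([^$]|$)/ },
  { name: "latex-command", weight: 4, regex: /\\(frac|sqrt|sum|int|sin|cos|tan)\b/ },
  { name: "math-environment", weight: 5, regex: /\\begin\{(equation|align)\*?\}/ },
  { name: "mathml", weight: 5, regex: /<math[\s>]/ },
  { name: "equation-pattern", weight: 2, regex: /\b[a-z]\s*=\s*\S|\b[a-z]\^\d/i },
];

function scoreMathText(text: string): { score: number; verdict: MathVerdict } {
  // Assumption: a category contributes its weight once, however often it matches.
  let score = 0;
  for (const pattern of WEIGHTED_PATTERNS) {
    if (pattern.regex.test(text)) score += pattern.weight;
  }
  // Thresholds from the text: 0-3 no math, 4-7 uncertain, 8+ math detected.
  const verdict: MathVerdict =
    score >= 8 ? "math" : score >= 4 ? "uncertain" : "no-math";
  return { score, verdict };
}
```

Under these assumptions, a lone inline expression such as `$x^2$` lands in the uncertain band, while a `\begin{equation}` block containing `\frac` clears the 8-point threshold.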

2. PDF Complexity Detection (`pdf-complexity-detector.ts`)

Zero-LLM pre-check that reads the PDF binary directly:

Detects math font names: cmsy, cmmi, cmex, stix, cambria math, etc.
Counts paintImageMaskXObject operations (1-bit masks used for math glyphs)
Treats 15 or more image masks on a page combined with 4 or more distinct font sizes as a strong math signal
Classifies pages as text, math, image, mixed, table, or dense-table
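
A minimal sketch of the two math signals described above. The `PageStats` shape and the function names are hypothetical, and how the real detector combines the signals is not stated here, so the final `||` is an assumption.

```typescript
// Hypothetical per-page statistics gathered from the PDF binary.
interface PageStats {
  fontNames: string[];
  imageMaskCount: number; // number of paintImageMaskXObject operations
  distinctFontSizes: number;
}

// Font-name fragments listed above (the source list ends with "etc.").
const MATH_FONT_HINTS = ["cmsy", "cmmi", "cmex", "stix", "cambria math"];

function hasMathFont(fontNames: string[]): boolean {
  return fontNames.some((name) => {
    const lower = name.toLowerCase();
    return MATH_FONT_HINTS.some((hint) => lower.includes(hint));
  });
}

function hasStrongMaskSignal(stats: PageStats): boolean {
  // 15+ image masks together with 4+ distinct font sizes.
  return stats.imageMaskCount >= 15 && stats.distinctFontSizes >= 4;
}

function looksLikeMath(stats: PageStats): boolean {
  // Assumption: either signal alone is enough to flag the page.
  return hasMathFont(stats.fontNames) || hasStrongMaskSignal(stats);
}
```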

3. Vision-Based Detection (Image Pipeline)

For scanned documents with no text layer:

Gemini Flash analyzes each extracted image
Returns diagramType: 'equation' when it identifies mathematical content
This triggers Mathpix refinement in the image description pipeline

Rendering Paths

Path A: Marker + temml (Digital PDFs with LaTeX)

Used when the complexity detector classifies a page as math and the page has extractable text.

PDF page
  -> Marker API (extracts text, outputs <math>raw LaTeX</math>)
  -> temml converts LaTeX to MathML
  -> addMathReadingAnnotations() adds "(reads as ...)" annotations
  -> Quality check (temml failure rate <= 30%)
  -> Accept or escalate to Mathpix

Cost: ~$0.006/page (Marker only). temml is a local library, no API cost.

How temml works: Marker outputs equations as <math>5.021 \times 10^4</math> — valid LaTeX but not valid MathML. temml converts to proper <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>5.021</mn><mo>\u00D7</mo><msup><mn>10</mn><mn>4</mn></msup></math>.
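
The convert-then-gate step might look like the sketch below. The converter is injected as a plain function so the example runs without temml installed; in the pipeline that role is played by temml. `convertMarkerMath`, `ConversionResult`, and the choice to leave a failed equation untouched are assumptions for illustration.

```typescript
// Stand-in signature for a LaTeX-to-MathML converter such as temml.
type LatexToMathml = (latex: string) => string;

interface ConversionResult {
  html: string;
  total: number;
  failed: number;
  accepted: boolean; // false means "escalate to Mathpix"
}

const MAX_FAILURE_RATE = 0.3;

function convertMarkerMath(html: string, convert: LatexToMathml): ConversionResult {
  let total = 0;
  let failed = 0;
  // Marker wraps raw LaTeX in bare <math> tags; convert each one in place.
  const out = html.replace(/<math>([\s\S]*?)<\/math>/g, (whole: string, latex: string) => {
    total += 1;
    try {
      return convert(latex);
    } catch {
      failed += 1;
      return whole; // assumption: keep the original markup on failure
    }
  });
  const failureRate = total === 0 ? 0 : failed / total;
  return { html: out, total, failed, accepted: failureRate <= MAX_FAILURE_RATE };
}
```

With one failure in four equations the page is accepted (25%); with one failure in two it is escalated (50%).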

Path B: Mathpix Full-Page OCR (Math/Dense-Table/Scanned Pages)

Used when:

Complexity detector classifies a page as math and the temml failure rate exceeds 30%
Page is classified as dense-table
Page is classified as image (scanned) and Mathpix detects equations

PDF page
  -> Render to PNG (via Puppeteer)
  -> Mathpix /v3/text API (OCR with math + text modes)
  -> Returns HTML with <math> MathML + <mathml> blocks
  -> Strip hidden <latex> tags, show <mathml> content
  -> addMathReadingAnnotations() adds "(reads as ...)" annotations
  -> Quality check
  -> Accept or escalate to vision cascade
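
The tag clean-up step can be sketched as below, assuming the Mathpix HTML wraps each equation in a `<mathml>` element next to a sibling `<latex>` element. The function name and the exact attribute handling are hypothetical.

```typescript
function unwrapMathpixHtml(html: string): string {
  return html
    // Drop the hidden LaTeX source entirely.
    .replace(/<latex\b[^>]*>[\s\S]*?<\/latex>/gi, "")
    // Remove the <mathml> wrapper but keep the <math> element inside it.
    .replace(/<mathml\b[^>]*>([\s\S]*?)<\/mathml>/gi, "$1");
}
```

Regex-based rewriting is enough for a sketch; a production version would more likely use an HTML parser so that nested or malformed markup is handled safely.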

Cost: ~$0.01-0.10/page (Mathpix API).

Scanned handwritten math: For image pages, the cascade tries Mathpix first and falls through to Gemini/Claude vision when no equations are found or quality is low. Mathpix handles handwritten math OCR well and detects equations that the complexity detector misses, because scanned pages have no text layer or math fonts to analyze. If hasEquations is true in the Mathpix response and quality passes the threshold, the output is accepted.

Path C: Image Pipeline (Individual Equation Images)

Used when individual equation images are extracted from the PDF (e.g., equations embedded as raster images in the source).

PDF -> Extract images (unpdf + sharp)
  -> Gemini Flash classifies each image
  -> diagramType === 'equation'?
      |-- No:  Normal alt text injection
      |-- Yes: Mathpix processImage() API
                -> Returns { mathml, latex }
                -> Store in ImageAnalysis.mathpixMathml / mathpixLatex
                -> injectAltText() replaces <img> with <math> MathML
                -> addMathReadingAnnotations() adds "(reads as ...)"

Cost: $0.0003/image (Gemini classification) + $0.002/equation (Mathpix).

Graceful degradation: If Mathpix fails, the vision model’s alt text is preserved. The equation remains an <img> with descriptive alt text rather than being rendered as MathML.
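
The fallback behavior can be illustrated with a small sketch. The `mathpixMathml` and `mathpixLatex` field names come from the flow above; the other fields, `renderImage`, and `escapeAttr` are hypothetical and do not reproduce the real `injectAltText()`.

```typescript
interface ImageAnalysis {
  src: string;
  altText: string; // description from the vision model
  diagramType: string;
  mathpixMathml?: string; // set only when Mathpix succeeded
  mathpixLatex?: string;
}

function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderImage(analysis: ImageAnalysis): string {
  // Equation with MathML available: replace the image with the MathML.
  if (analysis.diagramType === "equation" && analysis.mathpixMathml) {
    return analysis.mathpixMathml;
  }
  // Otherwise (not an equation, or Mathpix failed): keep the <img> with alt text.
  return `<img src="${escapeAttr(analysis.src)}" alt="${escapeAttr(analysis.altText)}">`;
}
```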

“(reads as …)” Annotations

MathML screen reader support is inconsistent across browsers and assistive technology. Every rendered equation gets a plain-English annotation:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow><mn>2</mn><mi>x</mi><mo>+</mo><mn>3</mn><mi>y</mi><mo>+</mo><mi>z</mi><mo>=</mo><mn>17</mn></mrow>
</math>
<p class="math-annotation">
  <span class="math-reading sr-only">(reads as "2x plus 3y plus z equals 17")</span>
</p>

The latexToPlainEnglish() function in latex-math-renderer.ts handles conversion:

LaTeX	Plain English
`\frac{a}{b}`	“a over b”
`x^2`	“x squared”
`x^3`	“x cubed”
`x^{n}`	“x to the power of n”
`\sqrt{x}`	“square root of x”
`\alpha`, `\beta`, `\pi`	“alpha”, “beta”, “pi”
`\int_0^\infty`	“integral from 0 to infinity of”
`\sum`	“sum of”
`\times`	“times”
`\pm`	“plus or minus”
`\leq`	“less than or equal to”
`\neq`	“not equal to”
`=`, `+`, `-`	“equals”, “plus”, “minus”

The annotation uses the sr-only CSS class, so it is exposed to screen readers but hidden visually; sighted users see it only if they inspect the HTML.
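
To make the mapping concrete, here is a small rule-based sketch that covers a subset of the table. It is not the real latexToPlainEnglish(); the name `toPlainEnglishSketch`, the rule order, and the regexes are assumptions. Greek letters, integrals, and sums are left out for brevity.

```typescript
// Ordered rewrite rules; earlier rules run first.
const READING_RULES: Array<[RegExp, string]> = [
  [/\\frac\{([^{}]+)\}\{([^{}]+)\}/g, "$1 over $2"],
  [/\\sqrt\{([^{}]+)\}/g, "square root of $1"],
  [/\^(?:\{2\}|2(?!\d))/g, " squared"],
  [/\^(?:\{3\}|3(?!\d))/g, " cubed"],
  [/\^\{([^{}]+)\}/g, " to the power of $1"],
  [/\\times/g, " times "],
  [/\\pm/g, " plus or minus "],
  [/\\leq/g, " less than or equal to "],
  [/\\neq/g, " not equal to "],
  [/=/g, " equals "],
  [/\+/g, " plus "],
  [/-/g, " minus "],
];

function toPlainEnglishSketch(latex: string): string {
  let out = latex;
  for (const [pattern, replacement] of READING_RULES) {
    out = out.replace(pattern, replacement);
  }
  // Collapse the extra spaces introduced by the replacements.
  return out.replace(/\s+/g, " ").trim();
}

console.log(toPlainEnglishSketch("2x + 3y + z = 17"));
// prints "2x plus 3y plus z equals 17"
```

Ordering matters: the squared and cubed rules must run before the general power rule, otherwise `x^{2}` would read as "x to the power of 2".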

Cost Tracking

All equation processing costs are tracked and reported:

Image Description Pipeline (`image-description-pipeline.ts`)

interface ImageDescriptionResult {
  totalCostUsd: number;             // Gemini + Mathpix combined
  mathpixEquationsProcessed: number; // Count of images sent to Mathpix
  costBreakdown: {
    geminiCostUsd: number;           // $0.0003/image
    mathpixCostUsd: number;          // $0.002/equation
  }
}
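
As a worked example of how these fields relate, the sketch below derives them from two counts using the per-unit prices quoted in the comments above. `CostSummary` and `summarizeImageCosts` are hypothetical names, and the sketch assumes every image sent to Mathpix is billed, whether or not the call succeeds.

```typescript
const GEMINI_COST_PER_IMAGE_USD = 0.0003;
const MATHPIX_COST_PER_EQUATION_USD = 0.002;

interface CostSummary {
  totalCostUsd: number;
  mathpixEquationsProcessed: number;
  costBreakdown: { geminiCostUsd: number; mathpixCostUsd: number };
}

function summarizeImageCosts(
  imagesClassified: number,
  equationsSentToMathpix: number,
): CostSummary {
  const geminiCostUsd = imagesClassified * GEMINI_COST_PER_IMAGE_USD;
  const mathpixCostUsd = equationsSentToMathpix * MATHPIX_COST_PER_EQUATION_USD;
  return {
    totalCostUsd: geminiCostUsd + mathpixCostUsd,
    mathpixEquationsProcessed: equationsSentToMathpix,
    costBreakdown: { geminiCostUsd, mathpixCostUsd },
  };
}
```

For a document with 10 extracted images, 3 of them equations, this gives roughly $0.003 for Gemini plus $0.006 for Mathpix, about $0.009 in total.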

Smart Cascade (`smart-cascade-converter.ts`)

Mathpix page-level calls are tracked via TokenUsage:

model: 'mathpix'
estimatedCostUsd: 0.01 (image pages) or 0.10 (math/dense-table pages)

Gateway (`gateway.ts`)

Cost metadata recorded to the cost ledger:

metadata: {
  imagesDescribed: number,
  imageDescriptionCostUsd: number,
  mathpixEquationsProcessed: number,
  imageDescCostBreakdown: { geminiCostUsd, mathpixCostUsd } | null
}

Routing Decision Tree

PDF Page
  |
  ├── Has text layer?
  |     |
  |     ├── Math fonts detected? ──> contentType: 'math'
  |     |     |
  |     |     ├── Marker + temml (~$0.006/page, Marker only)
  |     |     |     |
  |     |     |     ├── temml success rate >= 70% ──> ACCEPT
  |     |     |     └── temml failure rate > 30%  ──> Try Mathpix
  |     |     |
  |     |     └── Mathpix page OCR ($0.10)
  |     |           |
  |     |           ├── Quality >= threshold ──> ACCEPT
  |     |           └── Quality < threshold  ──> Vision cascade
  |     |
  |     ├── Dense table detected? ──> contentType: 'dense-table'
  |     |     └── Mathpix page OCR ($0.10) ──> quality check ──> accept or escalate
  |     |
  |     └── Plain text ──> contentType: 'text'
  |           └── Marker + LLM structuring
  |
  └── No text layer (scanned)?
        |
        └── contentType: 'image'
              |
              ├── Mathpix probe ($0.01)
              |     |
              |     ├── hasEquations && quality >= threshold ──> ACCEPT (with annotations)
              |     └── No equations or low quality ──> fall through
              |
              └── Vision cascade (Gemini Flash -> Claude agentic)
                    |
                    └── Image pipeline (parallel):
                          Gemini classifies extracted images
                          diagramType 'equation' ──> Mathpix processImage ($0.002)
                          Other types ──> alt text only
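
The tree above can be read as an ordered list of strategies per content type, tried until one passes its quality check. The sketch below encodes that reading; the step names, `plannedSteps`, and `firstAccepted` are hypothetical, and only the four content types shown in the tree are covered.

```typescript
type RoutedContentType = "text" | "math" | "dense-table" | "image";

type Step =
  | "marker-temml"
  | "mathpix-page-ocr"
  | "marker-llm"
  | "mathpix-probe"
  | "vision-cascade";

// Strategies in the order the tree tries them.
function plannedSteps(contentType: RoutedContentType): Step[] {
  switch (contentType) {
    case "math":
      return ["marker-temml", "mathpix-page-ocr", "vision-cascade"];
    case "dense-table":
      return ["mathpix-page-ocr", "vision-cascade"];
    case "image":
      return ["mathpix-probe", "vision-cascade"];
    case "text":
      return ["marker-llm"];
  }
}

// Returns the first step whose output passes the supplied quality check.
function firstAccepted(steps: Step[], accepts: (step: Step) => boolean): Step | null {
  for (const step of steps) {
    if (accepts(step)) return step;
  }
  return null;
}
```

For example, a scanned page whose Mathpix probe finds no equations would fall through to the second entry, the vision cascade.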

Key Files

File	Role
`services/math-detector.ts`	Text-based math detection (fonts, symbols, LaTeX patterns)
`services/pdf-complexity-detector.ts`	Binary PDF analysis, page classification
`services/latex-math-renderer.ts`	temml LaTeX-to-MathML + “(reads as …)” annotations
`services/equation-renderer.ts`	Replaces equation `<img>` tags with MathML via Mathpix
`services/mathpix-pdf.ts`	Mathpix API client (PDF + image endpoints)
`services/image-enhancer.ts`	Vision model image analysis + equation injection in `injectAltText()`
`services/image-description-pipeline.ts`	Parallel image processing with Mathpix equation refinement
`services/smart-cascade-converter.ts`	Page routing: Mathpix probe for scanned image pages
`routes/gateway.ts`	Orchestrator: wires Mathpix credentials, records costs

Background

See equation-rendering-problem.md for the original problem analysis that led to the temml solution for Marker’s raw LaTeX output. The scanned handwritten math pipeline (Mathpix probe for image pages) was added to handle cases where no text layer exists and the complexity detector cannot detect math fonts or symbols.