Equation Workflow
How math equations are detected, processed, and rendered as accessible HTML.
Overview
The pipeline handles math equations through multiple detection and rendering strategies depending on the source material:
| Source Type | Detection Method | Rendering Path |
|---|---|---|
| Digital PDF with math fonts | Font names + symbol analysis | Marker + temml (LaTeX to MathML) |
| Digital PDF with LaTeX source | LaTeX pattern matching | Marker + temml |
| Scanned page with typed math | Mathpix OCR | Mathpix MathML |
| Scanned page with handwritten math | Gemini vision classification + Mathpix OCR | Mathpix MathML |
| Individual equation images | Gemini diagramType: 'equation' + Mathpix processImage() | Mathpix MathML via image pipeline |
All paths produce MathML output with "(reads as ...)" plain-English annotations for screen reader accessibility.
Detection Phase
1. Text-Layer Math Detection (math-detector.ts)
For PDFs with an extractable text layer, the detector uses weighted pattern scoring:
- Math Unicode characters (weight 3): `\u2211`, `\u222B`, `\u221A`, `\u00B1`, etc.
- LaTeX display math (weight 5): `$$...$$` patterns
- LaTeX inline math (weight 4): `$...$` patterns
- LaTeX commands (weight 4): `\frac`, `\sqrt`, `\sum`, `\int`, trig functions
- Math environments (weight 5): `\begin{equation}`, `\begin{align}`, etc.
- MathML markup (weight 5): existing `<math>` tags
- Equation patterns (weight 2): `x = ...`, `a^2 + b^2`
Score thresholds: 0-3 = no math, 4-7 = uncertain, 8+ = math detected.
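
To make the scoring concrete, here is a minimal sketch of how weights and thresholds like these could combine. It is not the code in math-detector.ts: the regexes are simplified guesses, the names `SIGNALS`, `scoreMath`, and `classifyScore` are hypothetical, and it assumes each category contributes its weight at most once per text block.

```typescript
// Hypothetical sketch of weighted pattern scoring; not the real math-detector.ts.
type MathVerdict = "no-math" | "uncertain" | "math";

interface MathSignal {
  name: string;
  weight: number;
  matches: (text: string) => boolean;
}

const DISPLAY_MATH = /\$\$[\s\S]+?\$\$/g;

// Weights follow the list above; the regexes are simplified assumptions.
const SIGNALS: MathSignal[] = [
  { name: "unicode-math", weight: 3, matches: (t) => /[\u2211\u222B\u221A\u00B1]/.test(t) },
  { name: "latex-display", weight: 5, matches: (t) => /\$\$[\s\S]+?\$\$/.test(t) },
  // Display math is removed first so $$...$$ is not counted again as inline math.
  { name: "latex-inline", weight: 4, matches: (t) => /\$[^$\n]+\$/.test(t.replace(DISPLAY_MATH, " ")) },
  { name: "latex-command", weight: 4, matches: (t) => /\\(frac|sqrt|sum|int|sin|cos|tan)(?![a-zA-Z])/.test(t) },
  { name: "math-environment", weight: 5, matches: (t) => /\\begin\{(equation|align)\*?\}/.test(t) },
  { name: "mathml", weight: 5, matches: (t) => /<math[\s>]/.test(t) },
  { name: "equation-pattern", weight: 2, matches: (t) => /\b[a-z]\s*=\s*\S|[a-z]\^\d/i.test(t) },
];

// Assumption: each category counts once, however many times it matches.
function scoreMath(text: string): number {
  return SIGNALS.reduce((sum, s) => (s.matches(text) ? sum + s.weight : sum), 0);
}

// Thresholds from the text: 0-3 = no math, 4-7 = uncertain, 8+ = math detected.
function classifyScore(score: number): MathVerdict {
  if (score >= 8) return "math";
  if (score >= 4) return "uncertain";
  return "no-math";
}
```

Under these assumptions, a sentence containing two dollar amounts scores 4 and lands in the uncertain band, which is the kind of false positive the middle band is meant to absorb.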
2. PDF Complexity Detection (pdf-complexity-detector.ts)
Zero-LLM pre-check that reads the PDF binary directly:
- Detects math font names: cmsy, cmmi, cmex, stix, cambria math, etc.
- Counts `paintImageMaskXObject` operations (1-bit masks used for math glyphs)
- 15+ image masks per page + 4+ distinct font sizes = strong math signal
- Classifies pages as `text`, `math`, `image`, `mixed`, `table`, or `dense-table`
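
The two math signals above can be sketched as a pair of checks. This is an illustration only: `PageSignals`, `hasMathFont`, `hasStrongMaskSignal`, and `looksLikeMath` are hypothetical names, and treating either signal alone as sufficient is an assumption, since the text does not say how the detector combines them.

```typescript
// Hypothetical per-page signals; the real detector reads these from the PDF binary.
interface PageSignals {
  fontNames: string[];       // font names referenced by the page
  imageMaskCount: number;    // number of paintImageMaskXObject operations
  distinctFontSizes: number; // count of distinct font sizes on the page
}

// Font name fragments listed in the text ("etc." in the source means this list is partial).
const MATH_FONT_HINTS = ["cmsy", "cmmi", "cmex", "stix", "cambria math"];

function hasMathFont(fontNames: string[]): boolean {
  return fontNames.some((name) => {
    const lower = name.toLowerCase();
    return MATH_FONT_HINTS.some((hint) => lower.includes(hint));
  });
}

// 15+ image masks and 4+ distinct font sizes on the same page.
function hasStrongMaskSignal(signals: PageSignals): boolean {
  return signals.imageMaskCount >= 15 && signals.distinctFontSizes >= 4;
}

// Assumption: either signal alone is enough to flag the page as math.
function looksLikeMath(signals: PageSignals): boolean {
  return hasMathFont(signals.fontNames) || hasStrongMaskSignal(signals);
}
```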
3. Vision-Based Detection (Image Pipeline)
For scanned documents with no text layer:
- Gemini Flash analyzes each extracted image
- Returns `diagramType: 'equation'` when it identifies mathematical content
- This triggers Mathpix refinement in the image description pipeline
Rendering Paths
Path A: Marker + temml (Digital PDFs with LaTeX)
Used when the complexity detector identifies math content type with extractable text.

    PDF page
      -> Marker API (extracts text, outputs <math>raw LaTeX</math>)
      -> temml converts LaTeX to MathML
      -> addMathReadingAnnotations() adds "(reads as ...)" annotations
      -> Quality check (temml failure rate < 30%)
      -> Accept or escalate to Mathpix

Cost: ~$0.006/page (Marker only). temml is a local library, no API cost.
How temml works: Marker outputs equations as <math>5.021 \times 10^4</math>, which is valid LaTeX but not valid MathML. temml converts this to proper <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>5.021</mn><mo>\u00D7</mo><msup><mn>10</mn><mn>4</mn></msup></math>.
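
The Path A quality check can be sketched as a failure-rate gate. The converter is injected so the example stays self-contained; `runQualityGate`, `Converter`, and `GateResult` are hypothetical names, and treating a rate of exactly 30% as a failure is an assumption, because the text only gives "< 30%" for acceptance.

```typescript
// `convert` stands in for a LaTeX-to-MathML converter; it is expected to throw on failure.
type Converter = (latex: string) => string;

interface GateResult {
  failureRate: number;
  decision: "accept" | "escalate-to-mathpix";
  mathml: Array<string | null>; // null marks an equation that failed to convert
}

function runQualityGate(
  equations: string[],
  convert: Converter,
  maxFailureRate = 0.3, // 30% threshold from the text
): GateResult {
  const mathml = equations.map((latex) => {
    try {
      return convert(latex);
    } catch {
      return null;
    }
  });
  const failures = mathml.filter((m) => m === null).length;
  // A page with no equations has nothing to fail, so it counts as 0% failure.
  const failureRate = equations.length === 0 ? 0 : failures / equations.length;
  const decision = failureRate < maxFailureRate ? "accept" : "escalate-to-mathpix";
  return { failureRate, decision, mathml };
}
```

With temml, `convert` might wrap `temml.renderToString`, provided parse failures surface as exceptions. How the real pipeline detects a temml failure is not stated here.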
Path B: Mathpix Full-Page OCR (Math/Dense-Table/Scanned Pages)
Used when:
- Complexity detector classifies a page as `math` and temml fails > 30%
- Page is classified as `dense-table`
- Page is classified as `image` (scanned) and Mathpix detects equations

    PDF page
      -> Render to PNG (via Puppeteer)
      -> Mathpix /v3/text API (OCR with math + text modes)
      -> Returns HTML with <math> MathML + <mathml> blocks
      -> Strip hidden <latex> tags, show <mathml> content
      -> addMathReadingAnnotations() adds "(reads as ...)" annotations
      -> Quality check
      -> Accept or escalate to vision cascade

Cost: ~$0.01-0.10/page (Mathpix API).
Scanned handwritten math: For image pages, the cascade now tries Mathpix first before falling through to Gemini/Claude vision. Mathpix excels at handwritten math OCR: it detects equations that the complexity detector misses (since scanned pages have no text layer or math fonts to analyze). If hasEquations is true in the Mathpix response and quality passes the threshold, the output is accepted.
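
The Path B triggers and the scanned-page acceptance rule can be sketched as two small predicates. The names `PageState`, `shouldTryMathpix`, `MathpixProbeResult`, and `acceptProbe` are hypothetical, as is the numeric `quality` score. The text names `hasEquations` and a quality threshold but gives neither the threshold's value nor its scale, so both stay parameters here.

```typescript
type ContentType = "text" | "math" | "image" | "mixed" | "table" | "dense-table";

interface PageState {
  contentType: ContentType;
  temmlFailureRate?: number; // set only for math pages that already went through Path A
}

// Should this page be sent to Mathpix page-level OCR at all?
function shouldTryMathpix(page: PageState): boolean {
  switch (page.contentType) {
    case "math":
      return (page.temmlFailureRate ?? 0) > 0.3; // temml fails > 30%
    case "dense-table":
      return true;
    case "image":
      return true; // scanned page: Mathpix is probed before the vision cascade
    default:
      return false;
  }
}

// Hypothetical shape of the fields the cascade inspects on a probe response.
interface MathpixProbeResult {
  hasEquations: boolean;
  quality: number;
}

// Accept the Mathpix output for a scanned page, otherwise fall through to vision.
function acceptProbe(result: MathpixProbeResult, threshold: number): boolean {
  return result.hasEquations && result.quality >= threshold;
}
```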
Path C: Image-Level Equation Refinement (Individual Equation Images)
Used when individual equation images are extracted from the PDF (e.g., equations embedded as raster images in the source).

    PDF
      -> Extract images (unpdf + sharp)
      -> Gemini Flash classifies each image
      -> diagramType === 'equation'?
         |-- No: Normal alt text injection
         |-- Yes: Mathpix processImage() API
                  -> Returns { mathml, latex }
                  -> Store in ImageAnalysis.mathpixMathml / mathpixLatex
                  -> injectAltText() replaces <img> with <math> MathML
                  -> addMathReadingAnnotations() adds "(reads as ...)"

Cost: $0.0003/image (Gemini classification) + $0.002/equation (Mathpix).
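
As a worked example of the Path C cost arithmetic, the sketch below multiplies the two per-unit rates quoted above. The function and constant names are hypothetical, and it assumes every extracted image is classified by Gemini while only equation images incur the Mathpix charge.

```typescript
const GEMINI_COST_PER_IMAGE_USD = 0.0003;    // classification, per image
const MATHPIX_COST_PER_EQUATION_USD = 0.002; // refinement, per equation image

interface PathCCost {
  geminiCostUsd: number;
  mathpixCostUsd: number;
  totalCostUsd: number;
}

function estimatePathCCost(imagesClassified: number, equationsRefined: number): PathCCost {
  const geminiCostUsd = imagesClassified * GEMINI_COST_PER_IMAGE_USD;
  const mathpixCostUsd = equationsRefined * MATHPIX_COST_PER_EQUATION_USD;
  return { geminiCostUsd, mathpixCostUsd, totalCostUsd: geminiCostUsd + mathpixCostUsd };
}
```

For a document with 10 extracted images of which 2 are equations, this comes to roughly 10 × $0.0003 + 2 × $0.002 = $0.007.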
Graceful degradation: If Mathpix fails, the vision model's alt text is preserved. The equation remains as an <img> with descriptive alt text rather than rendered MathML.
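
The fallback can be sketched as a single function that chooses between MathML and the original image. `mathpixMathml` is the field named in the flow above; `altText`, `ImageAnalysisSketch`, `renderEquationImage`, and `escapeAttr` are hypothetical names used only for illustration.

```typescript
interface ImageAnalysisSketch {
  altText: string;        // hypothetical field: description from the vision model
  mathpixMathml?: string; // set only when Mathpix refinement succeeded
}

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderEquationImage(src: string, analysis: ImageAnalysisSketch): string {
  const mathml = analysis.mathpixMathml?.trim();
  if (mathml) {
    return mathml; // Mathpix succeeded: the <img> is replaced by MathML
  }
  // Mathpix failed or was skipped: keep the image with its descriptive alt text
  return `<img src="${escapeAttr(src)}" alt="${escapeAttr(analysis.altText)}">`;
}
```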
"(reads as ...)" Annotations
MathML screen reader support is inconsistent across browsers and assistive technology. Every rendered equation gets a plain-English annotation:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
      <mrow><mn>2</mn><mi>x</mi><mo>+</mo><mn>3</mn><mi>y</mi><mo>+</mo><mi>z</mi><mo>=</mo><mn>17</mn></mrow>
    </math>
    <p class="math-annotation">
      <span class="math-reading sr-only">(reads as "2x plus 3y plus z equals 17")</span>
    </p>

The latexToPlainEnglish() function in latex-math-renderer.ts handles conversion:
| LaTeX | Plain English |
|---|---|
| \frac{a}{b} | "a over b" |
| x^2 | "x squared" |
| x^3 | "x cubed" |
| x^{n} | "x to the power of n" |
| \sqrt{x} | "square root of x" |
| \alpha, \beta, \pi | "alpha", "beta", "pi" |
| \int_0^\infty | "integral from 0 to infinity of" |
| \sum | "sum of" |
| \times | "times" |
| \pm | "plus or minus" |
| \leq | "less than or equal to" |
| \neq | "not equal to" |
| =, +, - | "equals", "plus", "minus" |
The annotation uses the sr-only CSS class, so it is hidden visually but remains available to screen readers and other assistive technology, and is still present in the HTML source.
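
A rough sketch of how such a conversion and annotation could work is shown below. It is not the actual latexToPlainEnglish() in latex-math-renderer.ts: it covers only part of the table using ordered regex replacements, and `latexToPlainEnglishSketch`, `buildAnnotation`, and `escapeHtml` are hypothetical names. Among other gaps, it handles no nested braces and no integral or sum limits.

```typescript
// Ordered rewrite rules: structural commands first, single-character operators last.
const READING_RULES: Array<[RegExp, string]> = [
  [/\\frac\{([^{}]+)\}\{([^{}]+)\}/g, "$1 over $2"],
  [/\\sqrt\{([^{}]+)\}/g, "square root of $1"],
  [/\^2/g, " squared"],
  [/\^3/g, " cubed"],
  [/\^\{([^{}]+)\}/g, " to the power of $1"],
  [/\\times(?![a-zA-Z])/g, " times "],
  [/\\pm(?![a-zA-Z])/g, " plus or minus "],
  [/\\leq(?![a-zA-Z])/g, " less than or equal to "],
  [/\\neq(?![a-zA-Z])/g, " not equal to "],
  [/\\(alpha|beta|pi)(?![a-zA-Z])/g, "$1"],
  [/=/g, " equals "],
  [/\+/g, " plus "],
  [/-/g, " minus "],
];

function latexToPlainEnglishSketch(latex: string): string {
  let out = latex;
  for (const [pattern, replacement] of READING_RULES) {
    out = out.replace(pattern, replacement);
  }
  return out.replace(/\s+/g, " ").trim(); // collapse the padding added by the rules
}

// Escape text for use as HTML element content.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Builds the visually hidden annotation paragraph placed after the <math> element.
function buildAnnotation(latex: string): string {
  const reading = escapeHtml(latexToPlainEnglishSketch(latex));
  return `<p class="math-annotation"><span class="math-reading sr-only">(reads as "${reading}")</span></p>`;
}
```

Rule order matters in a design like this: the braced commands have to be rewritten before the single-character operators, or a minus sign inside a fraction would be expanded before the fraction is recognized.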
Cost Tracking
All equation processing costs are tracked and reported:
Image Description Pipeline (image-description-pipeline.ts)

    ImageDescriptionResult {
      totalCostUsd: number;              // Gemini + Mathpix combined
      mathpixEquationsProcessed: number; // Count of images sent to Mathpix
      costBreakdown: {
        geminiCostUsd: number;           // $0.0003/image
        mathpixCostUsd: number;          // $0.002/equation
      }
    }

Smart Cascade (smart-cascade-converter.ts)
Mathpix page-level calls are tracked via TokenUsage:
- `model: 'mathpix'`
- `estimatedCostUsd: 0.01` (image pages) or `0.10` (math/dense-table pages)
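
A minimal sketch of how a page-level usage record with these two fields could be built and summed follows. `TokenUsageSketch` shows only the two fields listed above, and the real TokenUsage type may carry more. `mathpixPageUsage` and `totalEstimatedCost` are hypothetical helpers.

```typescript
// Only the two fields named in the text; the real TokenUsage may have others.
interface TokenUsageSketch {
  model: string;
  estimatedCostUsd: number;
}

type MathpixPageType = "image" | "math" | "dense-table";

// $0.01 for scanned image pages, $0.10 for math and dense-table pages.
function mathpixPageUsage(contentType: MathpixPageType): TokenUsageSketch {
  return {
    model: "mathpix",
    estimatedCostUsd: contentType === "image" ? 0.01 : 0.10,
  };
}

function totalEstimatedCost(usages: TokenUsageSketch[]): number {
  return usages.reduce((sum, usage) => sum + usage.estimatedCostUsd, 0);
}
```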
Gateway (gateway.ts)
Cost metadata recorded to the cost ledger:

    metadata: {
      imagesDescribed: number,
      imageDescriptionCostUsd: number,
      mathpixEquationsProcessed: number,
      imageDescCostBreakdown: { geminiCostUsd, mathpixCostUsd } | null
    }

Routing Decision Tree

    PDF Page
    |
    ├── Has text layer?
    |   |
    |   ├── Math fonts detected? ──> contentType: 'math'
    |   |   |
    |   |   └── Marker + temml (free)
    |   |       |
    |   |       ├── temml success rate >= 70% ──> ACCEPT
    |   |       └── temml failure rate > 30% ──> Try Mathpix
    |   |           |
    |   |           └── Mathpix page OCR ($0.10)
    |   |               |
    |   |               ├── Quality >= threshold ──> ACCEPT
    |   |               └── Quality < threshold ──> Vision cascade
    |   |
    |   ├── Dense table detected? ──> contentType: 'dense-table'
    |   |   └── Mathpix page OCR ($0.10) ──> quality check ──> accept or escalate
    |   |
    |   └── Plain text ──> contentType: 'text'
    |       └── Marker + LLM structuring
    |
    ├── No text layer (scanned)?
    |   └── contentType: 'image'
    |       ├── Mathpix probe ($0.01)
    |       |   |
    |       |   ├── hasEquations && quality >= threshold ──> ACCEPT (with annotations)
    |       |   └── No equations or low quality ──> fall through
    |       └── Vision cascade (Gemini Flash -> Claude agentic)
    |
    └── Image pipeline (parallel):
        Gemini classifies extracted images
        diagramType 'equation' ──> Mathpix processImage ($0.002)
        Other types ──> alt text only

Key Files
| File | Role |
|---|---|
| services/math-detector.ts | Text-based math detection (fonts, symbols, LaTeX patterns) |
| services/pdf-complexity-detector.ts | Binary PDF analysis, page classification |
| services/latex-math-renderer.ts | temml LaTeX-to-MathML + "(reads as ...)" annotations |
| services/equation-renderer.ts | Replaces equation <img> tags with MathML via Mathpix |
| services/mathpix-pdf.ts | Mathpix API client (PDF + image endpoints) |
| services/image-enhancer.ts | Vision model image analysis + equation injection in injectAltText() |
| services/image-description-pipeline.ts | Parallel image processing with Mathpix equation refinement |
| services/smart-cascade-converter.ts | Page routing: Mathpix probe for scanned image pages |
| routes/gateway.ts | Orchestrator: wires Mathpix credentials, records costs |
Background
See equation-rendering-problem.md for the original problem analysis that led to the temml solution for Marker's raw LaTeX output. The scanned handwritten math pipeline (Mathpix probe for image pages) was added to handle cases where no text layer exists and the complexity detector cannot detect math fonts or symbols.