Conversion Cascade: Tiers and Branches

How the PDF→HTML converter routes each page. All code lives in workers/api/src/services/smart-cascade-converter.ts.

Top-level branch: budgetMode

chunk-scheduler calls one of two per-page functions (line 660):

| qualityTier | budgetMode | Function | Behavior |
| --- | --- | --- | --- |
| budget | true | processPageBudget (line 1369) | Cheapest viable backend, no escalation |
| standard | false | processPage (line 913) | Full cascade with quality gates |
| premium | false | processPage | Same cascade, different chunk/fidelity settings elsewhere |
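
The mapping in the table can be sketched as a small selector. This is a minimal illustration, not the real caller: the `QualityTier` type, the `PageFunctionChoice` shape, and `selectPageFunction` are hypothetical names, and the sketch assumes budgetMode is derived purely from qualityTier.

```typescript
// Hypothetical types; the real signatures in smart-cascade-converter.ts may differ.
type QualityTier = "budget" | "standard" | "premium";

interface PageFunctionChoice {
  budgetMode: boolean;
  fn: "processPageBudget" | "processPage";
}

// Maps a quality tier to the per-page function named in the table above.
// Assumption: only the "budget" tier sets budgetMode.
function selectPageFunction(qualityTier: QualityTier): PageFunctionChoice {
  const budgetMode = qualityTier === "budget";
  return { budgetMode, fn: budgetMode ? "processPageBudget" : "processPage" };
}
```

Standard and premium resolve to the same function here; per the table, their differences live in chunk/fidelity settings elsewhere.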

Standard branch: processPage

Tiers are tried top-to-bottom. The first tier that meets qualityThreshold (and visualLayoutThreshold when configured) wins. A tier failing its threshold or throwing escalates to the next tier.

| Order | Tier | Applies to | Backend | ~Cost/page |
| --- | --- | --- | --- | --- |
| pre | skip-blank | blank pages | pdf-lib structure probe | $0 |
| Tier 0 | toc-text-layer | detectTocPage.isToc === true | pdf.js text layer → <nav><ol> | $0 |
| Tier 0 | mathpix | contentType = dense-table | MathPix image API | ~$0.01 |
| Tier 0 | marker-api | text, table, dense-table (fallback) | Marker API | ~$0.006 |
| Tier 0 | marker+temml | math | Marker + local LaTeX→MathML | ~$0.006 |
| Tier 0 | mathpix | math (when marker+temml fails) | MathPix image API | ~$0.01 |
| Tier 0 | mathpix | image (probe for handwritten equations) | MathPix image API | ~$0.01 |
| Vision 1 | gemini-flash | anything still unresolved | gemini-2.5-flash vision | ~$0.005 |
| Vision 1a | gemini-flash-iterative | structural OK but visual-layout fails | same model, re-prompted with layout feedback | ~$0.005 |
| Vision 2 | agentic-vision | last resort | claude-sonnet-4-6 vision (agentic) | ~$0.15 |

Vision tiers are defined by DEFAULT_TIERS (line 49) and can be overridden via SmartCascadeConfig.tiers.

The mixed content type is special: it skips the cheap vision tiers and starts directly at agentic-vision, the Anthropic-backed tier (startTierIdx at line 1195).

The last tier is always accepted: if agentic-vision falls below threshold, we log a warning and return the output anyway.
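
The escalation rules above (first passing tier wins, a miss or a throw escalates, the last tier is accepted regardless) can be sketched as a loop. This is a simplified illustration under assumptions, not the code in processPage: `TierResult`, `Tier`, `Thresholds`, `CascadeOutcome`, `meetsThresholds`, and `runCascade` are hypothetical names, scores are assumed to be numbers where higher is better, and the behavior when the final tier throws is a guess.

```typescript
// Illustrative shapes only; the real tier and config types may differ.
interface TierResult {
  html: string;
  quality: number;       // structural score, higher is better (assumed)
  visualLayout?: number; // present only when layout scoring ran (assumed)
}

interface Tier {
  id: string;
  run: (pageNumber: number) => Promise<TierResult>;
}

interface Thresholds {
  qualityThreshold: number;
  visualLayoutThreshold?: number;
}

interface CascadeOutcome {
  tierId: string;
  result: TierResult;
  belowThreshold: boolean;
}

function meetsThresholds(result: TierResult, t: Thresholds): boolean {
  if (result.quality < t.qualityThreshold) return false;
  if (t.visualLayoutThreshold === undefined) return true;
  // Assumption: a missing layout score fails when a layout threshold is configured.
  return (result.visualLayout ?? 0) >= t.visualLayoutThreshold;
}

async function runCascade(
  pageNumber: number,
  tiers: readonly Tier[],
  thresholds: Thresholds,
  startTierIdx = 0 // e.g. the index of agentic-vision for mixed pages
): Promise<CascadeOutcome> {
  const remaining = tiers.slice(startTierIdx);
  const lastTier = remaining[remaining.length - 1];
  for (const tier of remaining) {
    try {
      const result = await tier.run(pageNumber);
      if (meetsThresholds(result, thresholds)) {
        return { tierId: tier.id, result, belowThreshold: false };
      }
      if (tier === lastTier) {
        // The last tier is accepted even below threshold, with a warning.
        console.warn(`page ${pageNumber}: ${tier.id} below threshold, returning anyway`);
        return { tierId: tier.id, result, belowThreshold: true };
      }
    } catch (err) {
      // A throwing tier escalates exactly like a missed threshold.
      console.warn(`page ${pageNumber}: ${tier.id} threw, escalating`, err);
    }
  }
  // Assumption: if the final tier throws (or no tiers exist) there is nothing to return.
  throw new Error(`page ${pageNumber}: no output (last tier threw or no tiers configured)`);
}
```

Passing a non-zero `startTierIdx` models the mixed-page shortcut without a separate code path.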

Budget branch: processPageBudget

No quality escalation: whichever backend returns content first wins. No MathPix, no agentic-vision, no iteration, no visual-layout scoring.

| Order | Tier | Condition | Backend | ~Cost/page |
| --- | --- | --- | --- | --- |
| pre | skip-blank | blank page | pdf-lib probe | $0 |
| Tier 0 | toc-text-layer | detectTocPage.isToc === true | pdf.js text layer → <nav><ol> | $0 |
| 1 | budget:marker | MARKER_API_KEY present | Marker API | ~$0.006 |
| 1a | temml overlay | contentType = math (inline) | local LaTeX→MathML | $0 |
| 2 | budget:gemini-flash | Marker returned <20 chars, threw, or no key | gemini-2.5-flash single pass | ~$0.005 |
| fallback | budget:none | all backends failed | placeholder <p> | $0 |
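
The fallthrough in rows 1, 2, and fallback can be sketched as follows. This is a hedged illustration that omits skip-blank, the TOC tier, and the temml overlay. The `Backend` type, `BudgetBackends`, `tryBackend`, `processPageBudgetSketch`, and the placeholder wording are all hypothetical; an absent `marker` entry stands in for a missing MARKER_API_KEY.

```typescript
// A backend takes a page number and resolves to HTML (hypothetical shape).
type Backend = (pageNumber: number) => Promise<string>;

interface BudgetBackends {
  marker?: Backend;      // undefined models a missing MARKER_API_KEY
  geminiFlash?: Backend; // undefined models an unconfigured Gemini path
}

interface BudgetOutcome {
  tier: "budget:marker" | "budget:gemini-flash" | "budget:none";
  html: string;
}

// Marker output shorter than this is treated as a failure, per the table.
const MIN_MARKER_CHARS = 20;

// Runs a backend if it exists; returns null when it is absent or throws.
async function tryBackend(backend: Backend | undefined, pageNumber: number): Promise<string | null> {
  if (!backend) return null;
  try {
    return await backend(pageNumber);
  } catch {
    return null;
  }
}

async function processPageBudgetSketch(pageNumber: number, backends: BudgetBackends): Promise<BudgetOutcome> {
  const markerHtml = await tryBackend(backends.marker, pageNumber);
  if (markerHtml !== null && markerHtml.length >= MIN_MARKER_CHARS) {
    return { tier: "budget:marker", html: markerHtml };
  }

  // Single Gemini Flash pass: no scoring, no iteration.
  const geminiHtml = await tryBackend(backends.geminiFlash, pageNumber);
  if (geminiHtml !== null && geminiHtml.length > 0) {
    return { tier: "budget:gemini-flash", html: geminiHtml };
  }

  // Placeholder text is illustrative only.
  return { tier: "budget:none", html: `<p>Page ${pageNumber} could not be converted.</p>` };
}
```

Nothing here scores the output, which is the point of the budget branch: the only gate is whether a backend returned usable content at all.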

Content types

The page classifier labels each page as one of:

  • text: body prose
  • table: gridded tables
  • dense-table: typewritten/monospaced columnar data without gridlines
  • math: equations / LaTeX-y content
  • image: scanned / image-only (no usable text layer)
  • mixed: combined layout; skips cheap vision and goes straight to agentic-vision

Classification drives which Tier-0 branch runs before falling through to the vision cascade.
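
As a compact view of that routing, the Tier-0 backends per content type can be written as a lookup table. This is a sketch assembled from the standard-branch table: `ContentType`, `TIER0_ROUTES`, and `tier0BackendsFor` are hypothetical names, and the empty list for mixed is an assumption based on mixed pages going straight to agentic-vision. skip-blank and toc-text-layer run before this lookup and do not depend on content type.

```typescript
// The six labels produced by the page classifier.
type ContentType = "text" | "table" | "dense-table" | "math" | "image" | "mixed";

// Tier-0 backends in the order they are tried before the vision cascade.
const TIER0_ROUTES: Record<ContentType, readonly string[]> = {
  text: ["marker-api"],
  table: ["marker-api"],
  "dense-table": ["mathpix", "marker-api"], // marker-api is the fallback
  math: ["marker+temml", "mathpix"],        // mathpix runs when marker+temml fails
  image: ["mathpix"],                       // probe for handwritten equations
  mixed: [],                                // assumed: no Tier-0 attempt
};

function tier0BackendsFor(contentType: ContentType): readonly string[] {
  return TIER0_ROUTES[contentType];
}
```

Using `Record<ContentType, ...>` means the compiler flags any content type added to the union without a route.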

Tier 0: deterministic TOC

Runs in both branches. Detects Table-of-Contents / List of Figures / List of Tables pages from the pdf.js text layer and emits a deterministic <nav><ol> with .toc-label / .toc-title / .toc-byline / .toc-page spans (toc-text-converter.ts). Built to defeat the vision-model failure mode of silently collapsing repeating TOC rows (e.g. 4 CASE entries → 1). The detector lives in toc-detector.ts; see __tests__/services/toc-detector.test.ts for the behavior contract.
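
To show why a deterministic emitter cannot collapse rows, here is a minimal sketch of rendering parsed entries to the markup described above. It is not the code in toc-text-converter.ts: the `TocEntry` shape, `escapeHtml`, `tocSpan`, and `renderToc` are hypothetical, and the exact nesting of spans inside each list item is an assumption. Only the class names come from this document.

```typescript
// Hypothetical parsed row; the real detector output may carry more fields.
interface TocEntry {
  label?: string;  // e.g. "CASE 3"
  title: string;
  byline?: string;
  page?: string;
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emits a span for a field, or an empty string when the field is absent.
function tocSpan(className: string, text: string | undefined): string {
  return text ? `<span class="${className}">${escapeHtml(text)}</span>` : "";
}

// One <li> per input entry, so N rows in always means N rows out.
function renderToc(entries: readonly TocEntry[]): string {
  const items = entries.map((entry) => {
    const spans = [
      tocSpan("toc-label", entry.label),
      tocSpan("toc-title", entry.title),
      tocSpan("toc-byline", entry.byline),
      tocSpan("toc-page", entry.page),
    ].filter((s) => s !== "");
    return `<li>${spans.join(" ")}</li>`;
  });
  return `<nav><ol>${items.join("")}</ol></nav>`;
}
```

Because the output is a pure map over the detected entries, four CASE rows always produce four list items, which is the guarantee the vision tiers could not give.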

Where things go wrong

  • TOC pages rendered as prose / collapsed rows → Tier 0 didn't fire. Grep container logs for TOC page detected vs marker-api on the page in question. If the detector returned low confidence, extend toc-detector.ts; don't loosen the cascade.
  • Math pages with broken equations → temml fail rate >30% and the MathPix fallback didn't run, either because MATHPIX_APP_ID/KEY is missing or because the page ran in budget mode, which never calls MathPix.
  • Image pages with no content → skip-blank incorrectly fired, or budget:gemini-flash path not configured.
  • Budget output looks worse than standard → expected. Budget tries Marker first, falls back to a single gemini-flash pass, and never escalates on quality; re-run with qualityTier=standard if the page needs the full vision cascade.

Related files

  • smart-cascade-converter.ts: the router described above
  • toc-detector.ts / toc-text-converter.ts: Tier-0 TOC path
  • pdf-complexity-detector.ts: content-type classification
  • quality-gate.ts: scorePageQuality / completeness checks
  • chunk-processor.ts: the caller that selects branch per chunk