HTML Edit Version History: Retention

The preview editor’s HTML save endpoint snapshots the prior HTML to R2 and records a row in public.html_edits before each overwrite (issue #693 section A). This document covers what is kept, where it lives, and how old versions are pruned.

Storage layout

  • Live HTML: users/{userId}/output/{fileId}/index.html, pointed at by UploadedFile.outputR2Key.
  • Snapshots: users/{userId}/output/{fileId}/versions/{ISO-timestamp}-{source}.html where {source} is one of edit, restore, or fix.
  • Index table: public.html_edits rows reference the R2 key plus the user, file, byte size, and source.
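
As a rough illustration of the key layout above, the following TypeScript sketch builds both keys. The function names, the SnapshotSource type, and the use of Date.toISOString() for {ISO-timestamp} are assumptions made for illustration; the save endpoint may format the timestamp differently.

```typescript
// Hypothetical helpers: the document specifies the key layout, not these functions.
type SnapshotSource = "edit" | "restore" | "fix";

// Key of the live HTML (the value UploadedFile.outputR2Key points at).
function liveHtmlKey(userId: string, fileId: string): string {
  return `users/${userId}/output/${fileId}/index.html`;
}

// Key of one snapshot under the file's versions/ prefix.
// Assumption: {ISO-timestamp} is the output of Date.toISOString().
function snapshotKey(
  userId: string,
  fileId: string,
  source: SnapshotSource,
  at: Date = new Date(),
): string {
  return `users/${userId}/output/${fileId}/versions/${at.toISOString()}-${source}.html`;
}
```

Because every snapshot shares the users/{userId}/output/{fileId}/ prefix with the live file, purging that prefix removes the live HTML and all of its versions together.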

Retention policy

Per file we keep the 20 most recent snapshots. When a new snapshot is inserted, pruneOldHtmlVersions() (workers/api/src/routes/files.ts) selects all html_edits rows beyond rank 20 (ordered by created_at DESC), deletes the corresponding R2 blobs, then deletes the rows.

Pruning is fire-and-forget: failures are logged but don’t block the user’s save. Orphaned blobs are recoverable (a future sweep can list and reconcile); orphaned rows are not, so we delete blobs first.
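
The policy above can be sketched roughly as follows. This is not the code in workers/api/src/routes/files.ts: the HtmlEditRow shape, the EditStore and BlobStore interfaces, and the function names are hypothetical stand-ins for the html_edits table and the R2 bucket.

```typescript
const HTML_VERSION_RETENTION_COUNT = 20;

// Hypothetical row shape; the real html_edits columns may differ.
interface HtmlEditRow {
  id: number;
  fileId: string;
  r2Key: string;
  createdAt: Date;
}

// Hypothetical storage abstractions standing in for the database and R2.
interface EditStore {
  listByFile(fileId: string): Promise<HtmlEditRow[]>;
  deleteRows(ids: number[]): Promise<void>;
}
interface BlobStore {
  deleteKeys(keys: string[]): Promise<void>;
}

// Pure selection step: everything beyond rank `keep`, newest first
// (mirrors ORDER BY created_at DESC).
function selectStaleVersions(
  rows: HtmlEditRow[],
  keep: number = HTML_VERSION_RETENTION_COUNT,
): HtmlEditRow[] {
  const newestFirst = [...rows].sort(
    (a, b) => b.createdAt.getTime() - a.createdAt.getTime(),
  );
  return newestFirst.slice(keep);
}

// Deletes blobs first, then rows, in the order described above.
async function pruneSketch(
  fileId: string,
  edits: EditStore,
  blobs: BlobStore,
): Promise<number> {
  const stale = selectStaleVersions(await edits.listByFile(fileId));
  if (stale.length === 0) return 0;
  await blobs.deleteKeys(stale.map((row) => row.r2Key));
  await edits.deleteRows(stale.map((row) => row.id));
  return stale.length;
}

// Fire-and-forget: log and swallow errors so a failed prune never fails the save.
function pruneInBackground(fileId: string, edits: EditStore, blobs: BlobStore): void {
  pruneSketch(fileId, edits, blobs).catch((err: unknown) => {
    console.error("html version prune failed", { fileId, err });
  });
}
```

Keeping the selection step pure makes the rank cutoff easy to unit-test without touching R2 or the database.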

Tuning

The cap lives in HTML_VERSION_RETENTION_COUNT at the top of the prune function. To change it:

  1. Update the constant.
  2. Decide whether existing files should retroactively shrink (run a one-off prune against all file_ids) or simply enforce the new cap on next save.

Cost model

  • A typical snapshot is 20–200 KB of HTML. At 20 versions per file, that is roughly 0.4–4 MB of overhead per file.
  • R2 standard pricing puts 1 GB of stored snapshots at roughly $0.015/month; for 10,000 actively edited files this is < $1/month. We do not record a cost-ledger entry per snapshot: too noisy for the size.
  • PDF regenerations triggered by the Refresh PDF button do record a cost-ledger entry with operationType: 'pdf-refresh' (currently a $0.005 placeholder pending real telemetry).
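
The storage estimate above can be checked with a short calculation. This is a back-of-the-envelope sketch: the function names are hypothetical, and it assumes decimal units (1 KB = 1,000 bytes) and a flat rate of about $0.015 per GB-month.

```typescript
// Assumptions (not from the codebase): decimal units and a flat per-GB-month rate.
const BYTES_PER_KB = 1_000;
const BYTES_PER_GB = 1_000_000_000;
const R2_USD_PER_GB_MONTH = 0.015; // approximate rate quoted in the cost model

// Total stored snapshot volume in GB.
function snapshotStorageGb(
  files: number,
  versionsPerFile: number,
  snapshotKb: number,
): number {
  return (files * versionsPerFile * snapshotKb * BYTES_PER_KB) / BYTES_PER_GB;
}

// Monthly storage cost in USD for a given volume.
function monthlyCostUsd(
  storedGb: number,
  usdPerGbMonth: number = R2_USD_PER_GB_MONTH,
): number {
  return storedGb * usdPerGbMonth;
}

// Upper end of the typical range: 10,000 files x 20 versions x 200 KB.
const worstCaseGb = snapshotStorageGb(10_000, 20, 200);
const worstCaseUsd = monthlyCostUsd(worstCaseGb);
console.log(`${worstCaseGb} GB, about $${worstCaseUsd.toFixed(2)}/month`);
```

At the upper end of the typical snapshot size this comes to about 40 GB and $0.60/month, consistent with the < $1/month figure.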

Open follow-ups

  • Time-based sweep for files whose owners are inactive: currently, snapshots for an abandoned file persist until the file itself is deleted (the FK cascade then removes the rows, but R2 blobs are only cleaned up when the parent file’s R2 prefix is purged).
  • Per-user retention overrides (e.g. paid plans keep 100).
  • Restore-from-disposed-version recovery: beyond our retention window the R2 blob is gone. If we need long-term archival, add a “pin” flag on html_edits that exempts the row from pruning.