
S3 Customer Storage - Implementation Plan

Goal

Allow a tenant to provide credentials for S3-compatible storage in Settings, so that uploaded files and generated outputs are stored in a customer-controlled bucket instead of our default hosted storage.

Outcome

For opted-in tenants:

  • source files land in a customer bucket
  • generated outputs land in a customer bucket
  • retention and bucket policy are customer-controlled
  • our hosted environment processes files transiently but does not persist long-term copies after processing completes

Current State

The codebase already supports S3-compatible object storage in parts of the stack. The missing work is productization:

  • tenant-scoped configuration
  • secure credential handling
  • routing logic by tenant
  • validation and health checks
  • admin UI and support tooling

Non-Goals

  • Cross-cloud object stores with non-S3 APIs
  • Customer-managed queues or databases
  • Full customer-side processing

Product Design

Settings Surface

Add a new Settings section: Storage.

Fields:

  • provider label (default: S3-compatible)
  • bucket name
  • region
  • endpoint URL (optional for AWS S3; required for MinIO, Cloudflare R2, and other S3-compatible services)
  • access key ID
  • secret access key
  • optional session token
  • optional prefix/path namespace
  • retention mode toggle:
    • hosted default
    • customer-managed storage only

Controls:

  • Validate connection
  • Save
  • Disable customer storage

Status panel:

  • validation status
  • last successful validation time
  • last error
  • effective storage target for new jobs

User Experience Rules

  • Existing files remain where they were created unless explicitly migrated.
  • New files use the currently active storage target.
  • If customer storage is enabled but unhealthy, new jobs fail with a clear configuration error.
  • We do not silently fall back to hosted storage unless the tenant explicitly allows fallback.
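The rules above come down to one decision made at job creation. The following is a minimal Python sketch of that decision. It assumes a `healthy` flag derived from the last validation result. `StorageMode`, `TenantStorageState`, `StorageConfigurationError`, and `resolve_target_for_new_job` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class StorageMode(Enum):
    HOSTED = "hosted"
    CUSTOMER_S3 = "customer_s3"


class StorageConfigurationError(Exception):
    """Raised when customer storage is enabled but cannot be used for new jobs."""


@dataclass(frozen=True)
class TenantStorageState:
    mode: StorageMode
    healthy: bool                # assumed to come from the last validation result
    allow_hosted_fallback: bool


def resolve_target_for_new_job(state: TenantStorageState) -> StorageMode:
    """Choose where a new job's files go, following the UX rules."""
    if state.mode is StorageMode.HOSTED:
        return StorageMode.HOSTED
    if state.healthy:
        return StorageMode.CUSTOMER_S3
    if state.allow_hosted_fallback:
        # Fall back only because the tenant explicitly opted in.
        return StorageMode.HOSTED
    # No silent fallback: surface a clear configuration error instead.
    raise StorageConfigurationError(
        "Customer storage is enabled but failed its last health check. "
        "Fix the Storage settings or allow hosted fallback."
    )
```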

Technical Plan

Phase 1 - Tenant Configuration Model

Data model

Add a tenant-scoped storage configuration record with:

  • tenant_id
  • mode (hosted, customer_s3)
  • bucket
  • region
  • endpoint
  • path_prefix
  • access_key_id
  • secret_encrypted
  • session_token_encrypted
  • allow_hosted_fallback
  • validation_status
  • validation_error
  • validated_at
  • created_by
  • updated_by
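One possible shape for this record is shown below as a Python dataclass. The field names follow the list above. The types, the defaults, the `validation_status` values, and the `missing_fields` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TenantStorageConfig:
    tenant_id: str
    mode: str = "hosted"                      # "hosted" or "customer_s3"
    bucket: Optional[str] = None
    region: Optional[str] = None
    endpoint: Optional[str] = None            # None means the provider default (AWS)
    path_prefix: str = ""
    access_key_id: Optional[str] = None
    secret_encrypted: Optional[bytes] = None  # ciphertext only, never plaintext
    session_token_encrypted: Optional[bytes] = None
    allow_hosted_fallback: bool = False
    validation_status: str = "unvalidated"    # hypothetical values: unvalidated, ok, failed
    validation_error: Optional[str] = None
    validated_at: Optional[datetime] = None
    created_by: Optional[str] = None
    updated_by: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Fields that must be set before customer_s3 mode can be activated."""
        if self.mode != "customer_s3":
            return []
        required = {
            "bucket": self.bucket,
            "region": self.region,
            "access_key_id": self.access_key_id,
            "secret_encrypted": self.secret_encrypted,
        }
        return [name for name, value in required.items() if not value]
```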

Security

  • Encrypt secrets before persistence.
  • Never return raw secrets after save.
  • Support secret rotation without deleting the whole config.
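The sketch below covers all three rules. It assumes encryption is delegated to an injected `encrypt` callable, which stands in for whatever KMS or secret-manager integration the platform uses. `StoredCredentials`, `StorageConfig`, `save_credentials`, `rotate_credentials`, and `public_view` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

# Hypothetical hook into the platform's KMS or secret manager.
Encrypt = Callable[[str], bytes]


@dataclass(frozen=True)
class StoredCredentials:
    access_key_id: str
    secret_encrypted: bytes
    session_token_encrypted: Optional[bytes] = None


@dataclass(frozen=True)
class StorageConfig:
    bucket: str
    region: str
    credentials: StoredCredentials


def save_credentials(access_key_id: str, secret: str,
                     session_token: Optional[str], encrypt: Encrypt) -> StoredCredentials:
    # Secrets are encrypted before the record reaches persistence.
    return StoredCredentials(
        access_key_id=access_key_id,
        secret_encrypted=encrypt(secret),
        session_token_encrypted=encrypt(session_token) if session_token else None,
    )


def rotate_credentials(config: StorageConfig, access_key_id: str, secret: str,
                       encrypt: Encrypt, session_token: Optional[str] = None) -> StorageConfig:
    # Swap only the credentials; bucket, region, and other settings are kept.
    return replace(config, credentials=save_credentials(access_key_id, secret, session_token, encrypt))


def public_view(config: StorageConfig) -> dict:
    """API response after save: never includes the raw or encrypted secret."""
    return {
        "bucket": config.bucket,
        "region": config.region,
        "access_key_id": config.credentials.access_key_id,
        "secret_access_key": "********",
        "has_session_token": config.credentials.session_token_encrypted is not None,
    }
```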

Phase 2 - Validation and Health Checks

Create a storage validation service that performs:

  1. bucket existence validation
  2. scoped write test to a temporary key
  3. read-back verification
  4. delete verification
  5. optional prefix enforcement check
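The five steps can be sketched against any client that exposes the boto3 S3 client methods `head_bucket`, `put_object`, `get_object`, and `delete_object`. The probe key layout and the `ValidationResult` type are assumptions. The sketch also interprets "prefix enforcement" to mean that writes outside the configured prefix should be denied. It records any unexpected success as a warning rather than a failure.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValidationResult:
    ok: bool
    failed_step: Optional[str] = None
    error: Optional[str] = None
    warnings: List[str] = field(default_factory=list)


def validate_storage(client, bucket: str, path_prefix: str = "") -> ValidationResult:
    """Run the validation steps with an S3-style client such as a boto3 S3 client."""
    prefix = path_prefix.strip("/")
    base = f"{prefix}/" if prefix else ""
    probe_key = f"{base}.storage-validation/{uuid.uuid4().hex}"
    payload = b"storage-validation-probe"
    step = "bucket_exists"
    try:
        client.head_bucket(Bucket=bucket)                               # 1. bucket exists
        step = "write"
        client.put_object(Bucket=bucket, Key=probe_key, Body=payload)   # 2. scoped write
        step = "read_back"
        body = client.get_object(Bucket=bucket, Key=probe_key)["Body"].read()  # 3. read-back
        if body != payload:
            return ValidationResult(False, step, "read-back content did not match")
        step = "delete"
        client.delete_object(Bucket=bucket, Key=probe_key)              # 4. delete
    except Exception as exc:  # a fuller version would classify provider errors
        return ValidationResult(False, step, f"{type(exc).__name__}: {exc}")

    result = ValidationResult(ok=True)
    if prefix:
        # 5. Optional prefix check (assumption: writes outside the prefix should fail).
        outside_key = f".storage-validation-outside/{uuid.uuid4().hex}"
        try:
            client.put_object(Bucket=bucket, Key=outside_key, Body=payload)
        except Exception:
            pass  # denied, which is the desired outcome
        else:
            result.warnings.append("credentials can write outside the configured prefix")
            client.delete_object(Bucket=bucket, Key=outside_key)
    return result
```

Recording `failed_step` lets the status panel say which step broke, not just that validation failed.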

Validation must catch:

  • invalid credentials
  • wrong region
  • endpoint TLS errors
  • missing permissions
  • bucket policy conflicts
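One way to turn raw failures into these categories is sketched below. The mapping table is an illustrative assumption built from common S3 error codes. Providers differ, and HEAD requests return no error body, so clients often report only the HTTP status. The sketch reads the code from `exc.response["Error"]["Code"]`, which is where botocore's `ClientError` puts it. It detects TLS problems by looking for `ssl.SSLError` anywhere in the exception chain.

```python
import ssl

# Assumed mapping from S3 error codes to categories shown in the Storage panel.
# Codes vary by provider and operation: HEAD requests carry no error body, so
# clients often surface only the HTTP status (for example "403").
ERROR_CATEGORIES = {
    "InvalidAccessKeyId": "invalid_credentials",
    "SignatureDoesNotMatch": "invalid_credentials",
    "ExpiredToken": "invalid_credentials",
    "AuthorizationHeaderMalformed": "wrong_region",
    "PermanentRedirect": "wrong_region",
    "301": "wrong_region",
    "NoSuchBucket": "bucket_not_found",
    "404": "bucket_not_found",
    # AccessDenied covers both missing IAM permissions and explicit bucket-policy
    # denies; the code alone cannot tell the two apart.
    "AccessDenied": "permission_or_policy_denied",
    "403": "permission_or_policy_denied",
}


def classify_storage_error(exc: BaseException) -> str:
    # Walk the exception chain so a wrapped TLS failure is still recognised.
    current, seen = exc, set()
    while current is not None and id(current) not in seen:
        if isinstance(current, ssl.SSLError):
            return "endpoint_tls_error"
        seen.add(id(current))
        current = current.__cause__ or current.__context__
    # botocore's ClientError exposes the parsed error as exc.response["Error"]["Code"].
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code", "")
    return ERROR_CATEGORIES.get(code, "unknown")
```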

Phase 3 - Storage Routing

Introduce tenant-aware storage resolution:

  1. determine tenant storage mode at job creation
  2. persist the resolved storage target on the file/job record
  3. use the resolved target consistently for:
    • original upload
    • intermediate assets if needed
    • final HTML
    • ZIP bundles
    • reports

Persisting the resolved target at job creation means that editing tenant settings mid-job does not change where that job's files are written.
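The following is a minimal sketch of pinning the target on the job record. It assumes tenant settings arrive as a plain dict. `StorageTarget`, `JobRecord`, and `create_job` are hypothetical names. Leaving credentials out of the snapshot is also an assumption, not something the plan specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageTarget:
    """Where a job's files live; copied from tenant settings at job creation."""
    mode: str                # "hosted" or "customer_s3"
    bucket: str
    region: str
    endpoint: Optional[str]
    path_prefix: str


@dataclass(frozen=True)
class JobRecord:
    job_id: str
    tenant_id: str
    storage_target: StorageTarget  # persisted with the job and never re-resolved


def create_job(job_id: str, tenant_id: str, tenant_settings: dict) -> JobRecord:
    # Copy the location fields by value. Credentials are deliberately not copied;
    # in this sketch they are looked up by tenant at use time, so key rotation
    # does not strand in-flight jobs.
    target = StorageTarget(
        mode=tenant_settings["mode"],
        bucket=tenant_settings["bucket"],
        region=tenant_settings["region"],
        endpoint=tenant_settings.get("endpoint"),
        path_prefix=tenant_settings.get("path_prefix", ""),
    )
    return JobRecord(job_id, tenant_id, target)
```

Because the location is copied by value, later edits to the tenant's Storage settings leave in-flight jobs untouched. Resolving credentials at use time has a trade-off: a rotated key must still grant access to the pinned bucket.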

Phase 4 - Settings UI

Add a Settings UI and admin API for:

  • create/update config
  • validate config
  • disable config
  • rotate keys
  • view current status

UI requirements:

  • clear warnings about required permissions
  • copyable example IAM policy
  • one-time reveal behavior for new secrets
  • redact secrets on reload
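The copyable policy can be generated from the tenant's bucket and prefix. The sketch below targets AWS IAM. It grants `s3:ListBucket` on the bucket, which is the permission that authorizes the HeadBucket existence check. It also grants put, get, and delete on objects under the prefix, which covers the write, read-back, and delete validation steps. Other S3-compatible providers may use a different permission model. The function name `example_iam_policy` and the `Sid` values are hypothetical.

```python
import json


def example_iam_policy(bucket: str, path_prefix: str = "") -> str:
    """Build a copyable least-privilege IAM policy for the Storage settings page."""
    prefix = path_prefix.strip("/")
    object_resource = f"arn:aws:s3:::{bucket}/{prefix + '/' if prefix else ''}*"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # HeadBucket (the existence check) is authorized by s3:ListBucket.
                "Sid": "BucketAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Upload, read-back/download, and delete of objects under the prefix.
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": object_resource,
            },
        ],
    }
    return json.dumps(policy, indent=2)
```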

Phase 5 - Retention and Deletion Semantics

Define what “we do not retain documents after processing” means operationally.

Hosted-side requirements:

  • temporary working files deleted on successful completion
  • temporary working files deleted on failure after TTL
  • logs must not contain document content or raw presigned URLs
  • retries must respect customer storage location
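The two deletion rules for temporary working files can be sketched as follows. The sketch assumes each job gets its own local working directory under a shared root and that a periodic sweeper runs. `finish_job`, `sweep_expired`, and the 24-hour `FAILED_JOB_TTL_SECONDS` are hypothetical; the real TTL should be the documented one.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical value; the real TTL should come from the documented retention policy.
FAILED_JOB_TTL_SECONDS = 24 * 60 * 60


def finish_job(work_dir: Path, succeeded: bool) -> None:
    """Delete a job's hosted working directory as soon as the job succeeds."""
    if succeeded:
        shutil.rmtree(work_dir, ignore_errors=True)
    # Failed jobs keep their files until sweep_expired removes them after the TTL.


def sweep_expired(work_root: Path, ttl_seconds: float = FAILED_JOB_TTL_SECONDS,
                  now: Optional[float] = None) -> List[Path]:
    """Remove job directories whose mtime is older than the TTL (run periodically)."""
    now = time.time() if now is None else now
    removed = []
    for job_dir in sorted(work_root.iterdir()):
        # Directory mtime is used as a cheap proxy for "last activity".
        if job_dir.is_dir() and now - job_dir.stat().st_mtime > ttl_seconds:
            shutil.rmtree(job_dir, ignore_errors=True)
            removed.append(job_dir)
    return removed
```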

Customer-side requirements:

  • generated URLs must be presigned and time-limited
  • path layout should isolate tenant data
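One possible tenant-isolating layout is sketched below: `<prefix>/tenants/<tenant>/jobs/<job>/<kind>/<filename>`. The layout and the `object_key` helper are hypothetical. The sketch also assumes original upload names have already been mapped to safe filenames, because it rejects any segment that could escape the tenant's path.

```python
import re

# Hypothetical layout: <prefix>/tenants/<tenant>/jobs/<job>/<kind>/<filename>
_SAFE_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def object_key(path_prefix: str, tenant_id: str, job_id: str,
               kind: str, filename: str) -> str:
    """Build an object key that keeps each tenant's data under its own path."""
    for segment in (tenant_id, job_id, kind, filename):
        # Reject "/", "..", and other characters that could escape the tenant path.
        if not _SAFE_SEGMENT.fullmatch(segment) or segment in (".", ".."):
            raise ValueError(f"unsafe key segment: {segment!r}")
    parts = [p for p in path_prefix.strip("/").split("/") if p]
    parts += ["tenants", tenant_id, "jobs", job_id, kind, filename]
    return "/".join(parts)
```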

Phase 6 - Support and Migration

Add:

  • admin troubleshooting page
  • migration script for moving an opted-in tenant’s historical files, if needed
  • runbook for storage outages and credential rotation
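A possible core for the migration script is sketched below. It defaults to a dry run, skips records that are already migrated so it can be re-run safely, and verifies each copy before repointing the record and deleting the hosted copy. `ObjectStore`, `FileRecord`, and `migrate_files` are hypothetical. Using the same key on both sides is a simplification.

```python
from dataclasses import dataclass
from typing import List, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface over hosted or customer storage."""
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


@dataclass
class FileRecord:
    key: str
    location: str  # "hosted" or "customer_s3"


def migrate_files(records: List[FileRecord], hosted: ObjectStore,
                  customer: ObjectStore, dry_run: bool = True) -> int:
    """Copy hosted files to customer storage; return how many were (or would be) moved."""
    moved = 0
    for rec in records:
        if rec.location == "customer_s3":
            continue  # already migrated, so re-running the script is safe
        if dry_run:
            moved += 1
            continue
        data = hosted.get(rec.key)
        customer.put(rec.key, data)  # same key on both sides, for simplicity
        # Verify the copy before repointing the record and deleting the hosted copy.
        if customer.get(rec.key) != data:
            raise RuntimeError(f"verification failed for {rec.key}")
        rec.location = "customer_s3"
        hosted.delete(rec.key)
        moved += 1
    return moved
```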

Backend Work Items

  • Add tenant storage config schema and migrations
  • Add encrypted secret persistence
  • Add storage validation service
  • Add tenant-aware storage resolver
  • Add APIs for CRUD + validation
  • Update file-processing pipeline to persist resolved storage target
  • Add audit events for config changes
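An audit event for a config change could record the actor, a UTC timestamp, and a per-field diff. For secret fields, it records only that the value changed. The event name, the `SECRET_FIELDS` set, and `storage_config_audit_event` are hypothetical.

```python
from datetime import datetime, timezone

# Values of these fields are never written to the audit log.
SECRET_FIELDS = {"secret_access_key", "session_token",
                 "secret_encrypted", "session_token_encrypted"}


def storage_config_audit_event(tenant_id: str, actor: str,
                               before: dict, after: dict) -> dict:
    """Describe who changed which storage settings, and when."""
    changes = {}
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue
        # For secrets, record only that the value changed.
        changes[name] = "<changed>" if name in SECRET_FIELDS else {"from": old, "to": new}
    return {
        "type": "storage_config.updated",  # hypothetical event name
        "tenant_id": tenant_id,
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
        "changes": changes,
    }
```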

Frontend Work Items

  • Add Storage section in Settings
  • Add validation and save flows
  • Add error and status display
  • Add admin-visible diagnostic metadata

Infrastructure Work Items

  • Key encryption support for stored secrets
  • Optional secret-manager abstraction if not already present
  • Alerting for repeated validation failures
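"Repeated" failures could be tracked as a per-tenant streak of consecutive failures. The alert fires once, when the streak crosses a threshold, and a success resets the streak. The threshold of 3 and the `ValidationFailureAlerter` class are assumptions. This in-memory version would need persistent storage in a multi-process deployment.

```python
from collections import defaultdict


class ValidationFailureAlerter:
    """Signal one alert when a tenant's validation fails N times in a row."""

    def __init__(self, threshold: int = 3):  # threshold is an assumption
        self.threshold = threshold
        self._streaks = defaultdict(int)

    def record(self, tenant_id: str, succeeded: bool) -> bool:
        """Return True exactly when an alert should be sent."""
        if succeeded:
            self._streaks[tenant_id] = 0
            return False
        self._streaks[tenant_id] += 1
        # Alert on the crossing only, not on every subsequent failure.
        return self._streaks[tenant_id] == self.threshold
```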

Dependencies

  • tenant settings framework
  • audit logging
  • encryption for stored secrets

Risks

  • Misconfigured bucket policies can create hard-to-debug failures.
  • Presigned URL handling can leak access if logged improperly.
  • Mixed hosted and customer storage in the same tenant can complicate support.
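One mitigation for the presigned-URL risk is a logging filter that strips query strings from any URL before a record is written. Presigned URLs carry their authorization in the query string, for example `X-Amz-Credential` and `X-Amz-Signature`. The sketch below uses only the standard `logging` and `re` modules. `redact_urls` and `RedactPresignedUrlFilter` are hypothetical names. Dropping every query string, not just signed ones, is a deliberately conservative choice.

```python
import logging
import re

# Match an http(s) URL followed by a query string, up to the next whitespace.
_URL_QUERY = re.compile(r"(https?://[^\s?#]+)\?\S*")


def redact_urls(text: str) -> str:
    # Presigned URLs carry their authorization in the query string,
    # so the whole query string is replaced.
    return _URL_QUERY.sub(r"\1?<redacted>", text)


class RedactPresignedUrlFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so URLs passed as %-style args are caught too.
        record.msg = redact_urls(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to handlers rather than to individual loggers matters here. Logger-level filters do not apply to records propagated from child loggers, so a filter on one logger would miss them.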

Acceptance Criteria

  • A tenant can save and validate S3-compatible credentials from Settings.
  • New files for that tenant are stored in the customer bucket.
  • Failed validations prevent activation unless explicitly overridden by an admin.
  • Hosted temporary copies are deleted after processing per documented TTL.
  • Audit logs show who changed the storage configuration and when.

Estimated Effort

  • Backend and schema: 4-6 days
  • Frontend settings UI: 2-3 days
  • Validation, QA, runbooks: 2-3 days
  • Total: 8-12 days