Bring Your Own LLM Keys - Implementation Plan

Goal

Allow a tenant to provide its own AI provider keys so document and image analysis runs against customer-controlled model accounts instead of our shared hosted accounts.

Minimum Product Promise

If a tenant provides at least one supported vision-capable model provider, the core conversion path keeps working without touching our hosted AI credentials.

Current State

The stack already supports multiple AI providers by environment variable. The missing work is per-tenant configuration, routing, validation, fallback policy, and observability.

Supported Provider Model

Phase 1 should focus on providers that already exist in the codebase and have stable API usage patterns:

  • Google Gemini / Vertex AI
  • Anthropic
  • OpenAI, where it is already used in active code paths

Later phases can add:

  • Mistral
  • Azure OpenAI
  • self-hosted OpenAI-compatible endpoints

Product Design

Settings Surface

Add a new Settings section: AI Providers.

For each provider:

  • enabled toggle
  • provider type
  • API key or service-account credentials
  • endpoint override if applicable
  • model selection
  • vision capability indicator
  • optional rate-limit cap

Tenant-level controls:

  • default provider order
  • hosted fallback policy (allow or disable)

Status panel:

  • last validation result
  • tested models
  • vision-capable providers available
  • current effective routing order

Functional Requirements

  • A tenant can store multiple providers.
  • Routing chooses the first healthy provider that satisfies the task.
  • Tasks requiring vision must only use providers marked vision-capable.
  • If hosted fallback is disabled and no healthy provider exists, the job fails immediately with a clear tenant-facing error.

Technical Plan

Phase 1 - Tenant Provider Schema

Add tenant-scoped AI provider records with:

  • tenant_id
  • provider
  • label
  • credentials_encrypted
  • endpoint
  • default_model
  • supports_vision
  • is_enabled
  • priority
  • validation_status
  • validation_error
  • validated_at
  • created_by
  • updated_by

Also add tenant-level policy:

  • allow_hosted_ai_fallback
  • require_customer_managed_ai
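
As a minimal sketch, the two record types above could be expressed as dataclasses. The names mirror the field lists; the types, defaults, and class names (`TenantAIProvider`, `TenantAIPolicy`) are illustrative assumptions, not a committed schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical in-memory shape of a tenant-scoped provider record.
# Field names mirror the schema list above; types are illustrative.
@dataclass
class TenantAIProvider:
    tenant_id: str
    provider: str                      # e.g. "gemini", "anthropic", "openai"
    label: str
    credentials_encrypted: bytes
    endpoint: Optional[str] = None     # only for overridable providers
    default_model: Optional[str] = None
    supports_vision: bool = False
    is_enabled: bool = True
    priority: int = 100                # lower value = tried first
    validation_status: str = "unvalidated"
    validation_error: Optional[str] = None
    validated_at: Optional[datetime] = None
    created_by: Optional[str] = None
    updated_by: Optional[str] = None

# Tenant-level policy flags from the list above.
@dataclass
class TenantAIPolicy:
    tenant_id: str
    allow_hosted_ai_fallback: bool = True
    require_customer_managed_ai: bool = False
```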

Phase 2 - Secret Handling

  • Encrypt provider credentials at rest.
  • Support one-time input and redaction after save.
  • Support key rotation without breaking unrelated providers.
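
A sketch of the secret-handling surface, assuming per-provider credential slots. The cipher here is a base64 placeholder so the example runs; a real deployment would substitute KMS-backed AES-GCM (or Fernet from the `cryptography` package) behind the same two callables. `CredentialStore` and its method names are hypothetical:

```python
import base64
from typing import Callable

# Placeholder cipher for illustration only -- NOT encryption. Swap in a
# KMS-backed AEAD in production; the storage interface stays the same.
encrypt: Callable[[str], bytes] = lambda s: base64.b64encode(s.encode())
decrypt: Callable[[bytes], str] = lambda b: base64.b64decode(b).decode()

class CredentialStore:
    """One slot per provider, so rotating one key leaves the others intact."""
    def __init__(self) -> None:
        self._secrets: dict[str, bytes] = {}   # provider_id -> ciphertext

    def save(self, provider_id: str, plaintext: str) -> str:
        self._secrets[provider_id] = encrypt(plaintext)
        # One-time input: the caller only ever gets a redacted echo back.
        return "•" * 8 + plaintext[-4:]

    def rotate(self, provider_id: str, new_plaintext: str) -> None:
        # Overwrites a single provider's slot; unrelated providers untouched.
        self._secrets[provider_id] = encrypt(new_plaintext)

    def reveal_for_runtime(self, provider_id: str) -> str:
        # Only the conversion pipeline calls this; never surfaced in the UI.
        return decrypt(self._secrets[provider_id])
```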

Phase 3 - Validation Service

Each configured provider needs a health test that verifies:

  • credential validity
  • requested model availability
  • vision support if the tenant expects it
  • quota / permission sanity

The validation path should not send customer documents. It should use a synthetic health-check prompt or provider metadata endpoint.

Phase 4 - Runtime Provider Resolution

Build a tenant-aware provider selection layer:

  1. load tenant provider policy
  2. determine task requirements:
    • text only
    • vision required
    • high-context / long-output path
  3. choose the highest-priority healthy provider
  4. record which provider actually handled the request

Persist provider choice on the job record for supportability.

Phase 5 - Failure and Fallback Behavior

Explicitly define:

  • whether retries stay on the same provider or fail over
  • when hosted fallback is allowed
  • what user-facing error appears when no compliant provider is available

Recommended default:

  • retry once on same provider for transient failures
  • fail over to next healthy tenant provider
  • never use hosted fallback unless tenant policy permits it
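
The recommended default can be sketched as a single loop: one retry on the same provider for transient failures, then fail over down the tenant's provider list, and hosted credentials only when policy permits. `call` and `hosted_call` are stand-ins for the real provider invocations:

```python
class TransientError(Exception):
    """Retryable failure: timeout, 429, transient 5xx."""

def run_with_failover(provider_order, call, allow_hosted_fallback, hosted_call=None):
    for provider in provider_order:
        for _attempt in range(2):         # original try + one retry
            try:
                return call(provider)
            except TransientError:
                continue                  # retry once on the same provider
            except Exception:
                break                     # hard failure: fail over immediately
    if allow_hosted_fallback and hosted_call is not None:
        return hosted_call()
    # Tenant-facing error for the "no compliant provider" case.
    raise RuntimeError("No compliant AI provider available for this tenant")
```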

Phase 6 - Billing and Analytics

If customers bring their own keys, we need a billing policy decision:

  • Do we still charge for conversion only?
  • Do we discount hosted AI costs?
  • Do we expose provider usage counters in the UI?

Even if pricing does not change immediately, we need internal analytics for:

  • jobs by provider
  • failures by provider
  • average latency by provider
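
These counters fall out of the per-job provider attribution recorded in Phase 4. A sketch of the aggregation, assuming each job row carries the provider, an outcome flag, and a latency:

```python
from collections import defaultdict

def aggregate(jobs):
    """jobs: iterable of (provider, succeeded: bool, latency_ms: float)."""
    stats = defaultdict(lambda: {"jobs": 0, "failures": 0, "latency_sum": 0.0})
    for provider, ok, latency_ms in jobs:
        s = stats[provider]
        s["jobs"] += 1
        s["failures"] += 0 if ok else 1
        s["latency_sum"] += latency_ms
    # Emit jobs, failures, and mean latency keyed by provider.
    return {
        p: {"jobs": s["jobs"], "failures": s["failures"],
            "avg_latency_ms": s["latency_sum"] / s["jobs"]}
        for p, s in stats.items()
    }
```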

Backend Work Items

  • Add tenant AI provider schema and migrations
  • Add encrypted credential storage
  • Add provider validation service
  • Add provider selection abstraction
  • Refactor current env-based provider resolution to allow tenant override
  • Add job-level provider attribution
  • Add audit events for provider config changes

Frontend Work Items

  • Add AI Providers settings UI
  • Add priority ordering UI
  • Add validation, enable/disable, and rotation flows
  • Add provider-health status

Documentation Work Items

  • Customer guide for supported providers and required permissions
  • Support runbook for common validation errors
  • Security note describing how provider credentials are used and what is (and is not) retained

Dependencies

  • secret encryption/storage
  • tenant settings framework
  • provider abstraction cleanup in the conversion pipeline

Risks

  • Per-tenant provider routing increases operational complexity.
  • Provider-specific edge cases can leak into product behavior.
  • Some providers may have inconsistent model naming or permissions APIs.

Acceptance Criteria

  • A tenant can add and validate at least one vision-capable provider from Settings.
  • A conversion job for that tenant uses the tenant’s provider instead of hosted credentials.
  • Hosted fallback behavior is enforced exactly as configured.
  • Logs and admin tools show which provider handled each job.

Estimated Effort

  • Backend and provider abstraction: 5-7 days
  • Frontend settings UI: 2-3 days
  • Validation, testing, runbooks: 2-3 days
  • Total: 9-13 days