Secret Rotation System
Automated secret rotation across Cloudflare Workers, AWS SSM, and on-prem Docker hosts with zero-downtime rolling deploys.
1. System Overview
The rotation system has three components:
| Component | File | Purpose |
|---|---|---|
| Rotation workflow | .github/workflows/rotate-secrets.yml | Full end-to-end rotation via GitHub Actions |
| Reminder workflow | .github/workflows/rotation-reminder.yml | Opens a quarterly GitHub issue with a rotation checklist |
| Inventory & runbook | docs/admin/secrets-inventory-and-rotation.md | Complete secrets inventory, manual procedures, rotation log |
Supporting scripts:
| Script | Purpose |
|---|---|
| scripts/update-env-secret.sh | Updates a single key in .env.node-server with automatic backup |
| scripts/smoke-test.sh | Health check: asserts HTTP 200 and valid response body |
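To make the "single key with automatic backup" behavior concrete, here is a minimal sketch of how such an update could work. It is not the contents of scripts/update-env-secret.sh; the function name `update_env_key`, the `.bak` suffix handling, and the `chmod 600` step are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: set KEY=VALUE in an env file, keeping a .bak copy.
# Usage: update_env_key <file> <key> <value>
update_env_key() {
  local file="$1" key="$2" value="$3" tmp
  [ -f "$file" ] || { echo "missing env file: $file" >&2; return 1; }

  # Back up first so a failed rotation can be rolled back.
  cp -p "$file" "$file.bak" || return 1

  tmp=$(mktemp "$file.XXXXXX") || return 1
  # Copy every line except an existing assignment for this key
  # (grep exits 1 when nothing is left, which is not an error here)...
  grep -v "^${key}=" "$file" > "$tmp" || true
  # ...then append the new assignment, so the key ends up last in the file.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"

  # Assumed policy: owner-only permissions, then swap in with one rename.
  chmod 600 "$tmp"
  mv "$tmp" "$file"
}
```

Writing to a temporary file and renaming it means a reader of the env file never sees a half-written version. The value is appended with `printf` rather than substituted with `sed`, so characters such as `/`, `&`, or `=` in a key need no escaping.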
How a rotation flows
    You generate a new key in the provider dashboard
      ↓
    gh workflow run rotate-secrets.yml -f key_name=new_value
      ↓
    Phase 1: Cloud secrets
      Cloudflare Worker secrets (wrangler secret put)
      AWS SSM Parameter Store (aws ssm put-parameter)
      (runs in parallel)
      ↓
    Phase 2: Test server (10.1.1.17)
      SSH in → update .env.node-server → restart container
      Wait for /health → smoke test
      FAILS HERE? → workflow stops, production untouched
      ↓
    Phase 3: Production (10.1.1.4)
      SSH in → backup .env.node-server → update secrets
      SIGTERM api-node-1 → drain → restart → wait healthy
      SIGTERM api-node-2 → drain → restart → wait healthy
      Restart sidecars → final public health check
      FAILS HERE? → rollback from .bak, restart container
      ↓
    You revoke the old key in the provider dashboard

2. One-Time Setup
2.1 Add the SSH key to GitHub Actions
The workflow SSHes into 10.1.1.4 and 10.1.1.17. It needs the private key that both servers already trust.
- Copy the private key content:

      cat ~/.ssh/nightly-audit

- Go to the GitHub repo → Settings → Secrets and variables → Actions
- Click New repository secret:
  - Name: SERVER_SSH_KEY
  - Value: paste the entire private key (including the -----BEGIN and -----END lines)
- Click Add secret
2.2 Verify existing GitHub secrets
These should already exist from the deploy workflows. Confirm in Settings → Secrets:
| Secret | Used for |
|---|---|
| CLOUDFLARE_API_TOKEN | wrangler secret put to update Cloudflare Worker secrets |
| AWS_ACCESS_KEY_ID | aws ssm put-parameter to update Lambda config |
| AWS_SECRET_ACCESS_KEY | Same as above |
| PACKAGES_TOKEN | Not used by rotation, but verify it exists for deploys |
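The same check can be done from a terminal. The sketch below is one possible approach, not part of the repository: `missing_secrets` is a hypothetical helper that reads secret names on stdin and prints any required name that is absent. Feeding it from `gh secret list --json name --jq '.[].name'` assumes an authenticated `gh` CLI recent enough to support `--json` on `secret list`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each required secret name that is absent from
# the list of names read on stdin. Returns non-zero if anything is missing.
# Usage: <names on stdin> | missing_secrets NAME [NAME...]
missing_secrets() {
  local present name missing=0
  present=$(cat)
  for name in "$@"; do
    # -x: whole-line match, -F: literal string, so FOO does not match FOO_BAR.
    if ! printf '%s\n' "$present" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Example invocation (run inside the repo; requires gh, so left as a comment):
#   gh secret list --json name --jq '.[].name' \
#     | missing_secrets SERVER_SSH_KEY CLOUDFLARE_API_TOKEN \
#         AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PACKAGES_TOKEN
```

No output and a zero exit status mean every listed secret exists. GitHub never returns secret values, so this only confirms presence, not correctness.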
2.3 Verify server access
From your Mac, confirm SSH works to both servers (run the command once per host):

    ssh -i ~/.ssh/nightly-audit [email protected] "hostname && docker ps --format '{{.Names}}' | head -5"

2.4 Verify scripts are on both servers
The workflow copies scripts via scp on each run, but for manual use:
    # Copy to both servers
    scp -i ~/.ssh/nightly-audit scripts/update-env-secret.sh scripts/smoke-test.sh \
      [email protected]:~/accessible/scripts/

    scp -i ~/.ssh/nightly-audit scripts/update-env-secret.sh \
      [email protected]:~/accessible/scripts/

3. Usage
3.1 Rotate a single key
- Generate a new key in the provider's dashboard (do NOT revoke the old one yet)
- Trigger the workflow:

      gh workflow run rotate-secrets.yml -f anthropic_api_key=sk-ant-api03-newkeyhere

- Watch the run:

      gh run watch

- Once the run succeeds, revoke the old key in the provider dashboard
- Log the rotation in docs/admin/secrets-inventory-and-rotation.md, Section 5
3.2 Rotate multiple keys at once
Pass multiple -f flags. All keys are applied together in a single rolling restart:

    gh workflow run rotate-secrets.yml \
      -f anthropic_api_key=sk-ant-api03-newkey \
      -f gemini_api_key=AIzaSy-newkey \
      -f openai_api_key=sk-svcacct-newkey

3.3 Rotate high-impact keys
Some keys require extra care:
JWT Secret (invalidates all active sessions)
    # Generate a strong random secret
    NEW_JWT=$(openssl rand -base64 48)

    # Rotate during low-traffic hours
    gh workflow run rotate-secrets.yml -f jwt_secret="$NEW_JWT"

Users will need to re-authenticate after this rotation.
Supabase Service Role Key
This key is generated by Supabase, so you cannot choose the value.
- Go to https://supabase.com/dashboard/project/vuvwmfxssjosfphzpzim/settings/api
- Click Generate new keys (this regenerates anon key, service role key, and JWT secret simultaneously)
- Copy the new service_role key and the new JWT secret
- Rotate the service role key and JWT secret in one run (the anon key is handled in the next step):

      gh workflow run rotate-secrets.yml \
        -f supabase_service_role_key=eyJhbG... \
        -f jwt_secret=new-jwt-secret-from-supabase

- Update the SUPABASE_ANON_KEY in any frontend .env files manually (this key is public but should stay current)
Stripe Keys
Stripe supports rolling keys: both old and new work during the overlap.
- Go to https://dashboard.stripe.com/apikeys
- Click Roll key on the secret key
- Copy the new key (old key stays valid for 24h by default)
- Rotate:

      gh workflow run rotate-secrets.yml -f stripe_secret_key=sk_live_newkey

- For webhook secrets, create a new webhook endpoint in Stripe, test it, then delete the old one
3.4 Trigger from the GitHub UI
- Go to the repo on GitHub → Actions tab
- Select Rotate Secrets from the left sidebar
- Click Run workflow
- Fill in only the keys you want to rotate (leave the rest blank)
- Click Run workflow
3.5 What to do if the workflow fails
Phase 1 (cloud secrets) fails:
- Cloudflare or AWS credentials may be expired
- Check:

      gh run view --log-failed

- Fix credentials in GitHub Secrets, then re-run the workflow
Phase 2 (test server) fails:
- The new key may be invalid, or the test server is down
- Production was NOT touched, so it is safe to investigate
- SSH in and check:

      ssh -i ~/.ssh/nightly-audit [email protected] "docker logs accessible-pdf-pptx-remediate --tail 30"

- The .env.node-server.bak on the test server has the previous values
Phase 3 (production) fails:
- The workflow auto-rolled back: .env.node-server.bak was restored and the failed container restarted
- Check which node failed:

      gh run view --log-failed

- SSH in and verify:

      ssh -i ~/.ssh/nightly-audit [email protected] "docker ps"

- The old key should still work since you haven't revoked it yet
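The Phase 3 behavior described above (restart, wait for health, restore the .bak on failure) can be sketched as follows. This is a simplified illustration, not the workflow's actual steps: `wait_healthy`, `restart_or_rollback`, and the `HEALTH_ATTEMPTS` / `HEALTH_DELAY` variables are hypothetical names, and the restart and health commands are passed in by the caller because the sketch does not assume how the containers are managed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "restart, wait for health, roll back on failure".

# wait_healthy <attempts> <delay_seconds> <command...>
# Re-runs the command until it succeeds or the attempts run out.
wait_healthy() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# restart_or_rollback <env_file> <restart_command> <health_command>
# Both commands are names of functions or scripts supplied by the caller.
restart_or_rollback() {
  local env_file="$1" restart_cmd="$2" health_cmd="$3"
  "$restart_cmd" || return 1
  if wait_healthy "${HEALTH_ATTEMPTS:-10}" "${HEALTH_DELAY:-3}" "$health_cmd"; then
    return 0
  fi
  echo "health check failed; restoring $env_file.bak" >&2
  cp -p "$env_file.bak" "$env_file" || return 1
  "$restart_cmd"   # bring the service back up on the previous values
  return 1         # still report failure so the rotation stops here
}

# Example wiring (assumed container name; PORT is a placeholder):
#   restart_service() { docker restart api-node-1 >/dev/null; }
#   check_health()    { curl -fsS "http://localhost:PORT/health" >/dev/null; }
#   restart_or_rollback .env.node-server restart_service check_health
```

Returning non-zero even after a successful rollback is deliberate: the service is healthy again, but the rotation itself did not complete, so the caller should not move on to the next node. If the env file is only read when the container is created, the restart command would need to recreate the container rather than restart it.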
4. Available Secrets
All inputs are optional; only provide the ones you're rotating:
| Workflow Input | Env Variable | Deployed To |
|---|---|---|
| anthropic_api_key | ANTHROPIC_API_KEY | Cloudflare, Docker |
| gemini_api_key | GEMINI_API_KEY | Cloudflare, Docker |
| openai_api_key | OPENAI_API_KEY | Cloudflare, Docker |
| mistral_api_key | MISTRAL_API_KEY | Cloudflare, Docker |
| marker_api_key | MARKER_API_KEY | Cloudflare, Docker |
| mathpix_app_key | MATHPIX_APP_KEY | Cloudflare, Docker |
| stripe_secret_key | STRIPE_SECRET_KEY | Cloudflare, Docker |
| stripe_webhook_secret | STRIPE_WEBHOOK_SECRET | Cloudflare, Docker |
| resend_api_key | RESEND_API_KEY | Cloudflare, AWS SSM, Docker |
| jwt_secret | JWT_SECRET | Cloudflare, AWS SSM, Docker |
| supabase_service_role_key | SUPABASE_SERVICE_ROLE_KEY | Cloudflare, AWS SSM, Docker |
| ses_webhook_secret | SES_WEBHOOK_SECRET | Cloudflare, Docker |
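Every row in the table pairs a lowercase workflow input with the same name in uppercase, and blank inputs are skipped. A minimal sketch of that filtering, under the assumption that inputs arrive as `name=value` arguments (`collect_rotations` is a hypothetical helper, not a function from the workflow):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ENV_NAME=value for every input that has a
# non-empty value, uppercasing the input name to get the env variable name.
# Usage: collect_rotations name=value [name=value...]
collect_rotations() {
  local pair name value
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;            # well-formed name=value
      *) continue ;;     # ignore arguments without "="
    esac
    name="${pair%%=*}"   # text before the first "="
    value="${pair#*=}"   # text after the first "=" (may itself contain "=")
    [ -n "$value" ] || continue
    printf '%s=%s\n' "$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')" "$value"
  done
}
```

For example, `collect_rotations anthropic_api_key=sk-new gemini_api_key=` prints only `ANTHROPIC_API_KEY=sk-new`, because the Gemini input was left blank.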
5. Quarterly Rotation Schedule
The rotation-reminder.yml workflow runs automatically on the 1st of January, April, July, and October. It creates a GitHub issue with a prioritized checklist.
Rotation priority tiers:
| Tier | Frequency | Keys |
|---|---|---|
| High | Every 90 days | AWS_ACCESS_KEY_ID/SECRET, CLOUDFLARE_API_TOKEN, PACKAGES_TOKEN, JWT_SECRET |
| Standard | Every 6 months | ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY, STRIPE_SECRET_KEY, RESEND_API_KEY |
| Low | Annually or on suspicion | MISTRAL_API_KEY, MARKER_API_KEY, MATHPIX_APP_KEY, VAPID_PRIVATE_KEY, TELEGRAM_BOT_TOKEN |
6. Next Steps: Automation Roadmap
6.1 Eliminate plaintext .env files on Docker hosts (Priority: HIGH)
The .env.node-server files on 10.1.1.4 and 10.1.1.17 are the weakest link. Two approaches to eliminate them:
Option A: Pull secrets from AWS SSM at container startup
Add an entrypoint wrapper that fetches secrets from SSM before starting the app:
    #!/bin/bash
    # Entrypoint wrapper: export each secret from SSM, then start the app.
    for PARAM in ANTHROPIC_API_KEY GEMINI_API_KEY STRIPE_SECRET_KEY JWT_SECRET; do
      VALUE=$(aws ssm get-parameter \
        --name "/accessible-pdf/production/$PARAM" \
        --with-decryption \
        --query Parameter.Value \
        --output text 2>/dev/null)
      # A failed or empty lookup is skipped silently, leaving the variable unset.
      if [[ -n "$VALUE" ]]; then
        export "$PARAM=$VALUE"
      fi
    done
    # Replace this shell with the container's real command.
    exec "$@"

Effort: Modify each Dockerfile to use the wrapper entrypoint. Add IAM credentials to the Docker host (instance profile or env var). Move all secrets into SSM.
Benefit: Secrets never touch disk. Rotation becomes: update SSM → restart container. No .env file management.
Option B: Docker Swarm secrets
Convert from docker compose to Docker Swarm mode. Secrets are encrypted at rest and only mounted in-memory at /run/secrets/.
Effort: Higher. Requires Swarm init, service definitions, and code changes to read from /run/secrets/ instead of env vars.
Recommendation: Option A (SSM at startup). It's the smallest change and aligns with the existing AWS infrastructure.
6.2 Fully automated key rotation for supported providers (Priority: MEDIUM)
Some providers support API-driven key rotation. A scheduled GitHub Action could rotate these without human involvement:
| Provider | API Support | Automation Path |
|---|---|---|
| AWS IAM | Full | AWS Secrets Manager auto-rotation with a Lambda rotator function |
| Stripe | Full | POST /v1/api_keys/roll: Stripe rolls the key and both keys work during the overlap |
| GitHub PAT | Full | Create fine-grained tokens with expiry via GitHub API, revoke old ones |
| Cloudflare | Full | Create/revoke API tokens via Cloudflare API |
| Anthropic | None | Manual, dashboard only |
| OpenAI | None | Manual, dashboard only |
| Resend | None | Manual, dashboard only |
| Supabase | None | Manual, regenerates all keys simultaneously |
Implementation plan:
- Start with AWS IAM: highest risk (long-lived credentials), best tooling (Secrets Manager has built-in rotation)
- Add Stripe: the rolling key API makes this straightforward
- Add GitHub PAT: use fine-grained tokens with 90-day expiry, auto-create replacements
- Add Cloudflare API token: rotate via API, update the GitHub Actions secret via gh secret set
For providers without rotation APIs, the quarterly reminder issue is the automation ceiling.
6.3 Secret scanning and leak detection (Priority: MEDIUM)
Add automated detection for accidental secret exposure:
- Enable GitHub secret scanning on the repo (Settings → Code security → Secret scanning)
  - GitHub natively detects leaked API keys for Stripe, AWS, Anthropic, OpenAI, and others
  - Sends alerts and can auto-revoke with partner providers
- Add gitleaks to CI:

      # Add to .github/workflows/test.yml
      - name: Scan for secrets
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  This blocks PRs that accidentally include secrets.
- Pre-commit hook (local):

      # .pre-commit-config.yaml
      repos:
        - repo: https://github.com/gitleaks/gitleaks
          rev: v8.18.0
          hooks:
            - id: gitleaks
6.4 Centralized secrets dashboard (Priority: LOW)
Build a simple internal page that shows:
- Last rotation date for each key (read from docs/admin/secrets-inventory-and-rotation.md, Section 5)
- Days until next rotation due
- Color-coded status: green (current), yellow (due soon), red (overdue)
- Direct links to provider dashboards for manual rotation
This could be a static page generated by a GitHub Action that reads the rotation log and publishes to an internal URL.
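The status logic for such a page is small. The sketch below shows one way to compute it, assuming GNU `date` (the `-d` flag, as found on Linux runners) and a hypothetical 14-day "due soon" window; `rotation_status` and its arguments are illustrative names, and the interval would come from the tier table in Section 5 (for example, 90 for the High tier).

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a key by how close its next rotation is.
# Usage: rotation_status <last_rotated YYYY-MM-DD> <interval_days> [today YYYY-MM-DD]
# Prints "<color> <days_remaining>". Assumes GNU date (-d).
rotation_status() {
  local last_rotated="$1" interval_days="$2" today="${3:-$(date -u +%Y-%m-%d)}"
  local warn_days=14   # assumed threshold for "due soon"
  local last_s today_s elapsed remaining

  # Work in UTC so daylight-saving shifts cannot change the day count.
  last_s=$(date -u -d "$last_rotated" +%s) || return 1
  today_s=$(date -u -d "$today" +%s) || return 1
  elapsed=$(( (today_s - last_s) / 86400 ))
  remaining=$(( interval_days - elapsed ))

  if [ "$remaining" -lt 0 ]; then
    echo "red $remaining"        # overdue
  elif [ "$remaining" -le "$warn_days" ]; then
    echo "yellow $remaining"     # due soon
  else
    echo "green $remaining"      # current
  fi
}
```

For a key on a 90-day interval last rotated on 2025-01-01, `rotation_status 2025-01-01 90 2025-03-25` prints `yellow 7`. The optional third argument exists so the function can be checked against fixed dates.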
6.5 Migrate to HashiCorp Vault or AWS Secrets Manager (Priority: LOW, long-term)
The current system works well for the current scale. If the number of services or secrets grows significantly, consider a dedicated secrets manager:
- AWS Secrets Manager: Native integration with Lambda, supports automatic rotation, audit trail via CloudTrail
- HashiCorp Vault: More flexible, supports dynamic secrets (short-lived credentials generated on demand), but requires running and maintaining Vault infrastructure
When to consider: When you exceed ~30 secrets, add more servers, or need audit compliance (SOC 2, HIPAA) that requires formal secret access logging.
7. Security Notes
- Never revoke the old key before the new one is deployed and verified. The workflow enforces this by testing before production.
- Workflow inputs are masked with ::add-mask:: so values don't appear in GitHub Actions logs.
- The SERVER_SSH_KEY secret grants access to production servers. Rotate it periodically and limit who can trigger workflows.
- Concurrency lock: The workflow uses concurrency: secret-rotation to prevent two rotations running simultaneously.
- Rollback is automatic on production: if a container fails health checks, the .env.node-server.bak backup is restored.
- Cloud secrets are NOT rolled back on failure. This is safe because the old key remains valid until you manually revoke it in the provider dashboard.
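As an illustration of the masking note, a step can register each value with the runner by printing the `::add-mask::` workflow command before anything else touches it. The helper below is a sketch under assumptions, not the workflow's code: `mask_values` is a hypothetical name, and it emits one command per line of a value because a multi-line value masked as a single string may not match what later appears in the log line by line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit an ::add-mask:: workflow command for every
# non-empty line of every value passed in.
# Usage: mask_values "$value" ["$value"...]
mask_values() {
  local value line
  for value in "$@"; do
    printf '%s\n' "$value" | while IFS= read -r line; do
      if [ -n "$line" ]; then
        printf '::add-mask::%s\n' "$line"
      fi
    done
  done
}
```

Inside a workflow step the output goes to stdout, where the runner picks it up. Masking only affects log output from that point on, so it should run at the very start of the job.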