AWS Deployment Status

Completed (2026-03-30 — 2026-03-31)

  • Create AWS account (using root account 696081601609)
  • Create IAM user cdk-deployer with CLI access
  • Configure AWS CLI profile accessible
  • Install CDK CLI, bootstrap CDK in us-east-1
  • Set billing alarm ($100/month)
  • Store all secrets in SSM Parameter Store (16 parameters under /accessible-pdf/shared/)
  • Refactor CDK for multi-environment (staging/production)
  • Generalize storage to support any S3-compatible provider (R2, S3, MinIO)
  • Deploy staging stacks (all 7)
  • Deploy production stacks (all 7)
  • Create GitHub Actions CI/CD workflow (deploy-aws.yml)
  • Build Lambda entry point (npm run build:lambda)
  • Lambda API working: auth, files, tags, notifications, teams, upload, convert enqueue
  • Push worker Docker image to ECR
  • SQS convert flow: Lambda → SQS → EC2 worker (preflight + chunk creation)
  • ChunkScheduler running on EC2 worker (processes conversions with Puppeteer/Gemini)
  • End-to-end conversion tested and working on AWS (no Node server)
  • Canary routing: canary users (AWS_CANARY_EMAILS) → AWS, everyone else → CF Worker
  • GitHub secrets and environments configured
  • ACM certificate issued for api-pdf.theaccessible.org
  • API Gateway custom domain configured
  • ASG min=1 so ChunkScheduler is always available

Remaining

1. Add AWS logging to admin dashboard (#209)

EC2 worker and Lambda logs need to be visible in pdf.theaccessible.org/admin. Options: Supabase conversion_logs table or CloudWatch Logs API.

2. DNS cutover

  • Remove CF Worker route for api-pdf.theaccessible.org
  • Add CNAME: api-pdf → d-5exruqu2od.execute-api.us-east-1.amazonaws.com (proxied)
  • Monitor CloudWatch dashboard for 24-48 hours
  • Decommission Cloudflare tunnel on 10.1.1.4

3. Set up GCE cold standby failover

A pre-built GCE instance that stays stopped (~$3/month for disk). On AWS failure, start it and reroute traffic.

GCE setup (one-time)

# Select the project and reserve a static IP for the standby
gcloud config set project pdf-theaccessible-org
gcloud compute addresses create accessible-failover --region=us-central1

# Run the Cloud Build defined in cloudbuild.yaml
gcloud builds submit --config=cloudbuild.yaml .

# Create the standby instance; the startup script pulls and runs the image at boot
gcloud compute instances create accessible-failover \
  --zone=us-central1-a \
  --machine-type=e2-standard-2 \
  --image-family=cos-stable \
  --image-project=cos-cloud \
  --boot-disk-size=30GB \
  --address=accessible-failover \
  --metadata=startup-script='#!/bin/bash
docker-credential-gcr configure-docker
docker pull us-central1-docker.pkg.dev/pdf-theaccessible-org/pdf-api/node-server:latest
docker run -d --restart=unless-stopped \
  -p 8790:8790 \
  --env-file /etc/accessible-pdf/.env \
  us-central1-docker.pkg.dev/pdf-theaccessible-org/pdf-api/node-server:latest'

# Copy the env file to the instance, then stop it until it is needed
gcloud compute scp .env.node-server accessible-failover:/etc/accessible-pdf/.env --zone=us-central1-a
gcloud compute instances stop accessible-failover --zone=us-central1-a

Failover procedure (~2-3 min)

gcloud compute instances start accessible-failover --zone=us-central1-a
# Wait for health, then switch DNS in Cloudflare
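
The "wait for health" step can be scripted so the DNS switch only happens once the standby actually responds. A minimal sketch, assuming the container answers HTTP on port 8790; the /health path, the function names, and the retry defaults (36 attempts, 5 seconds apart, about 3 minutes) are hypothetical and not taken from the deployment.

```shell
#!/bin/sh
# Sketch only: poll a health URL until it answers, then return so the
# operator can switch DNS. The /health path used below is an assumption.

# probe URL: succeeds when the endpoint returns an HTTP status below 400.
probe() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as probe succeeds, 1 if every attempt fails.
wait_for_health() {
  url=$1
  attempts=${2:-36}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (FAILOVER_IP is a placeholder for the reserved static address):
#   wait_for_health "http://$FAILOVER_IP:8790/health" && echo "ready for DNS switch"
```

Keeping the probe in its own function makes it easy to swap curl for another check without touching the retry loop.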

Optional: Cloudflare Load Balancer (~$5/month)

Automates failover — monitors AWS health endpoint, switches to GCE on failure.

4. Fix GitHub Actions billing (#207)

The Deploy AWS workflow is disabled. Fix billing, then gh workflow enable "Deploy AWS".

5. Fix zombie Chrome processes (#208)

Add --init to Docker containers or fix Puppeteer cleanup on error.
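
To check whether either fix works, it helps to count defunct Chrome processes before and after. A minimal sketch, assuming a Linux ps that supports the stat and comm output columns; count_zombies is a hypothetical helper name, and the default pattern "chrome" is a guess at the process name.

```shell
#!/bin/sh
# Sketch only: count zombie (defunct) processes whose command name matches
# a pattern. A zombie has a process state that begins with "Z" in ps output.

# count_zombies [NAME_PATTERN]   (default pattern: chrome)
count_zombies() {
  ps -e -o stat= -o comm= |
    awk -v pat="${1:-chrome}" '$1 ~ /^Z/ && $2 ~ pat { n++ } END { print n + 0 }'
}

# Example: print the number of defunct chrome processes
#   count_zombies chrome
```

With docker run --init, Docker starts a small init process as PID 1 that reaps orphaned children, so the count should stay near zero even when Puppeteer exits without closing the browser.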

6. Optimize ASG scaling

Currently min=1 (always-on). Long-term: refactor so the SQS message stays in-flight until chunks are fully processed, allowing scale-to-zero.
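
One way to keep a message in flight is to extend its visibility timeout periodically while processing runs, and delete it only after the chunks finish. A minimal sketch using aws sqs change-message-visibility; the function names and the interval and timeout defaults are hypothetical, and the real worker would more likely do this in application code than in shell. SQS caps total visibility at 12 hours from the first receive.

```shell
#!/bin/sh
# Sketch only: keep an SQS message invisible to other consumers while a
# local process is still working on it.

# extend_visibility QUEUE_URL RECEIPT_HANDLE SECONDS
extend_visibility() {
  aws sqs change-message-visibility \
    --queue-url "$1" \
    --receipt-handle "$2" \
    --visibility-timeout "$3"
}

# heartbeat_while PID QUEUE_URL RECEIPT_HANDLE [INTERVAL] [TIMEOUT]
# Re-extends the timeout every INTERVAL seconds for as long as PID is alive.
# Returns 1 if an extension fails (for example, the receipt handle expired).
heartbeat_while() {
  pid=$1
  queue_url=$2
  receipt=$3
  interval=${4:-60}
  timeout=${5:-180}
  while kill -0 "$pid" 2>/dev/null; do
    extend_visibility "$queue_url" "$receipt" "$timeout" || return 1
    sleep "$interval"
  done
  return 0
}
```

With messages held in flight this way, the ASG could scale on visible plus in-flight message counts and drop to zero when both are empty.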

Architecture

User → CF Worker (canary check)
  → canary users (AWS_CANARY_EMAILS): proxy to AWS Lambda
  → everyone else: CF Worker → Node server (10.1.1.4)

AWS Flow:
  Lambda API (auth, files, enqueue) → SQS
    → EC2 Spot Worker (preflight, chunk creation)
    → EC2 ChunkScheduler (Gemini vision, Puppeteer, remediation)
    → R2 (output) + Supabase (metadata)
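
The canary check amounts to a membership test on the user's email. A minimal shell sketch of that decision, assuming AWS_CANARY_EMAILS holds a comma-separated list and that matching is case-insensitive; both are assumptions, route_for is a hypothetical name, and the real check runs inside the CF Worker and may differ.

```shell
#!/bin/sh
# Sketch only: decide which backend a user would be routed to.
# Prints "aws" for canary users and "node" for everyone else.

# route_for EMAIL
route_for() {
  email=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Normalize the list: lowercase, strip whitespace around the commas.
  canaries=$(printf '%s' "${AWS_CANARY_EMAILS:-}" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
  if [ -n "$email" ]; then
    # Wrap both sides in commas so only whole addresses match.
    case ",$canaries," in
      *",$email,"*) echo "aws"; return 0 ;;
    esac
  fi
  echo "node"
}

# Example:
#   AWS_CANARY_EMAILS="alice@example.com,bob@example.com" route_for "bob@example.com"
```

Wrapping the list in commas avoids partial matches, so an address that is only a suffix of a canary address still routes to the Node server.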

Key References

| Resource | Value |
| --- | --- |
| AWS Account | 696081601609 (us-east-1) |
| CLI Profile | accessible |
| Production API | https://x2tx47vflk.execute-api.us-east-1.amazonaws.com/ |
| Staging API | https://uvhlpf575b.execute-api.us-east-1.amazonaws.com/ |
| Custom Domain | api-pdf.theaccessible.org → d-5exruqu2od.execute-api.us-east-1.amazonaws.com |
| Prod ECR | 696081601609.dkr.ecr.us-east-1.amazonaws.com/accessible-pdf-production-worker |
| CloudWatch Dashboard | accessible-pdf-production-dashboard |
| CDK Stacks | infra/cdk/lib/stacks/ |
| Env Config | infra/cdk/lib/env-config.ts |
| CI/CD Workflow | .github/workflows/deploy-aws.yml (disabled) |
| Canary Config | workers/api/wrangler.toml → AWS_CANARY_URL, AWS_CANARY_EMAILS |

Deploy Commands (Manual)

# Build Lambda
cd workers/api && npm run build:lambda

# Deploy CDK (set DEPLOY_ENV to staging or production)
cd infra/cdk
DEPLOY_ENV=production CDK_DEFAULT_ACCOUNT=696081601609 CDK_DEFAULT_REGION=us-east-1 \
  AWS_PROFILE=accessible npx cdk deploy --all --require-approval never

# Build and push Docker image (from 10.1.1.4)
cd ~/accessible && git pull
# Log Docker in to ECR first if needed (assumes the accessible profile exists on this host)
aws ecr get-login-password --profile accessible --region us-east-1 \
  | docker login --username AWS --password-stdin 696081601609.dkr.ecr.us-east-1.amazonaws.com
docker build -f infra/cdk/docker/worker/Dockerfile -t 696081601609.dkr.ecr.us-east-1.amazonaws.com/accessible-pdf-production-worker:latest .
docker push 696081601609.dkr.ecr.us-east-1.amazonaws.com/accessible-pdf-production-worker:latest

# Refresh EC2 instances to pick up new image
aws autoscaling start-instance-refresh \
  --profile accessible --region us-east-1 \
  --auto-scaling-group-name "AccessiblePdfProd-Compute-WorkerAsgASGC59A845D-UUF3it2ENmxm" \
  --preferences '{"MinHealthyPercentage":0}'
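
After starting an instance refresh, it can help to wait for it to finish before testing the new image. A minimal sketch using aws autoscaling describe-instance-refreshes, assuming the first entry returned is the most recent refresh; refresh_status and wait_for_refresh are hypothetical helper names, and the polling defaults are arbitrary.

```shell
#!/bin/sh
# Sketch only: poll the most recent instance refresh until it ends.

# refresh_status ASG_NAME: prints the status of the first listed refresh.
refresh_status() {
  aws autoscaling describe-instance-refreshes \
    --profile accessible --region us-east-1 \
    --auto-scaling-group-name "$1" \
    --query 'InstanceRefreshes[0].Status' --output text
}

# wait_for_refresh ASG_NAME [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 on Successful, 1 on a terminal failure or timeout,
# 2 if the status call itself fails.
wait_for_refresh() {
  asg=$1
  attempts=${2:-60}
  delay=${3:-30}
  i=1
  while [ "$i" -le "$attempts" ]; do
    status=$(refresh_status "$asg") || return 2
    case "$status" in
      Successful) return 0 ;;
      Failed|Cancelled|RollbackFailed|RollbackSuccessful) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
#   wait_for_refresh "AccessiblePdfProd-Compute-WorkerAsgASGC59A845D-UUF3it2ENmxm"
```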