AWS Deployment Status
Completed (2026-03-30 — 2026-03-31)
- Create AWS account (using root account 696081601609)
- Create IAM user cdk-deployer with CLI access
- Configure AWS CLI profile accessible
- Install CDK CLI, bootstrap CDK in us-east-1
- Set billing alarm ($100/month)
- Store all secrets in SSM Parameter Store (16 parameters under /accessible-pdf/shared/)
- Refactor CDK for multi-environment (staging/production)
- Generalize storage to support any S3-compatible provider (R2, S3, MinIO)
- Deploy staging stacks (all 7)
- Deploy production stacks (all 7)
- Create GitHub Actions CI/CD workflow (deploy-aws.yml)
- Build Lambda entry point (npm run build:lambda)
- Lambda API working: auth, files, tags, notifications, teams, upload, convert enqueue
- Push worker Docker image to ECR
- SQS convert flow: Lambda → SQS → EC2 worker (preflight + chunk creation)
- ChunkScheduler running on EC2 worker (processes conversions with Puppeteer/Gemini)
- End-to-end conversion tested and working on AWS (no Node server)
- Canary routing: [email protected] → AWS, everyone else → CF Worker
- GitHub secrets and environments configured
- ACM certificate issued for api-pdf.theaccessible.org
- API Gateway custom domain configured
- ASG min=1 so ChunkScheduler is always available
Remaining
1. Add AWS logging to admin dashboard (#209)
EC2 worker and Lambda logs need to be visible in pdf.theaccessible.org/admin.
Options: Supabase conversion_logs table or CloudWatch Logs API.
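If the CloudWatch Logs API option is chosen, the dashboard backend would run a query roughly like the one below. This is a minimal sketch that uses the AWS CLI rather than an SDK. The function name recent_errors, the ERROR filter pattern, and the one-hour window are assumptions for illustration. The log group name is passed in by the caller because this document does not list the actual log groups.

```shell
# Print messages from the last hour that match "ERROR" in one log group.
# Usage: recent_errors <log-group-name>
recent_errors() {
  # filter-log-events expects the start time in milliseconds since the epoch
  start_ms=$(( ($(date +%s) - 3600) * 1000 ))
  aws logs filter-log-events \
    --profile accessible --region us-east-1 \
    --log-group-name "$1" \
    --filter-pattern "ERROR" \
    --start-time "$start_ms" \
    --query 'events[].message' --output text
}
```

The Supabase option would instead have the worker and Lambda write rows to conversion_logs directly, which avoids granting the dashboard any AWS credentials.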
2. DNS cutover
- Remove CF Worker route for api-pdf.theaccessible.org
- Add CNAME: api-pdf → d-5exruqu2od.execute-api.us-east-1.amazonaws.com (proxied)
- Monitor CloudWatch dashboard for 24-48 hours
- Decommission Cloudflare tunnel on 10.1.1.4
3. Set up GCE cold standby failover
A pre-built GCE instance that stays stopped (~$3/month for disk). On AWS failure, start it and reroute traffic.
GCE setup (one-time)

    gcloud config set project pdf-theaccessible-org
    gcloud compute addresses create accessible-failover --region=us-central1
    gcloud builds submit --config=cloudbuild.yaml .

    gcloud compute instances create accessible-failover \
      --zone=us-central1-a \
      --machine-type=e2-standard-2 \
      --image-family=cos-stable \
      --image-project=cos-cloud \
      --boot-disk-size=30GB \
      --address=accessible-failover \
      --metadata=startup-script='#!/bin/bash
    docker-credential-gcr configure-docker
    docker pull us-central1-docker.pkg.dev/pdf-theaccessible-org/pdf-api/node-server:latest
    docker run -d --restart=unless-stopped \
      -p 8790:8790 \
      --env-file /etc/accessible-pdf/.env \
      us-central1-docker.pkg.dev/pdf-theaccessible-org/pdf-api/node-server:latest'

    gcloud compute scp .env.node-server accessible-failover:/etc/accessible-pdf/.env --zone=us-central1-a
    gcloud compute instances stop accessible-failover --zone=us-central1-a

Failover procedure (~2-3 min)

    gcloud compute instances start accessible-failover --zone=us-central1-a
    # Wait for health, then switch DNS in Cloudflare

Optional: Cloudflare Load Balancer (~$5/month)
Automates failover — monitors AWS health endpoint, switches to GCE on failure.
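Without the load balancer, the "wait for health" step of the manual failover procedure could be scripted instead of checked by hand. The helper below is a minimal sketch: the name wait_for_health is hypothetical, and the /health path and FAILOVER_IP variable in the usage comment are assumptions (only port 8790 comes from the setup above). The health check is passed in as a command so that any probe can be used.

```shell
# Retry a health-check command until it succeeds or attempts run out.
# Usage: wait_for_health <max_attempts> <sleep_seconds> <command...>
wait_for_health() {
  max_attempts=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the probe quietly; success ends the wait
    if "$@" >/dev/null 2>&1; then
      echo "healthy after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$sleep_seconds"
  done
  echo "not healthy after $max_attempts attempt(s)" >&2
  return 1
}

# Example (assumed endpoint): poll every 5 s for up to 3 minutes
# wait_for_health 36 5 curl -fsS --max-time 5 "http://$FAILOVER_IP:8790/health"
```

Switch DNS in Cloudflare only after the helper returns 0. A non-zero exit means the standby never came up and needs investigation first.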
4. Fix GitHub Actions billing (#207)
The Deploy AWS workflow is currently disabled. Fix billing first, then re-enable it with gh workflow enable "Deploy AWS".
5. Fix zombie Chrome processes (#208)
Add --init to Docker containers or fix Puppeteer cleanup on error.
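docker run --init starts a small init process as PID 1 in the container, and that process reaps orphaned children such as Chrome processes left behind by a crashed Puppeteer session. To confirm that either fix works, the number of defunct processes can be compared before and after. The counter below is a sketch: the function name count_zombies is hypothetical, and "chrome" as the process name is an assumption that may need adjusting for the installed browser binary.

```shell
# Count defunct (zombie) processes whose command name matches a pattern.
# Reads lines of "<STAT> <COMMAND>" on stdin; a STAT beginning with Z marks a zombie.
# Usage: ps -e -o stat= -o comm= | count_zombies chrome
count_zombies() {
  awk -v pat="$1" '$1 ~ /^Z/ && $2 ~ pat { n++ } END { print n + 0 }'
}
```

Run inside the worker container (or on the EC2 host), a count that keeps growing between conversions suggests the cleanup is still leaking processes.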
6. Optimize ASG scaling
Currently min=1 (always-on). Long-term: refactor so the SQS message stays in-flight until chunks are fully processed, allowing scale-to-zero.
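One way to keep an SQS message in flight is a heartbeat that extends the message's visibility timeout until the work is done, and deletes the message only afterwards. The loop below is a sketch of that pattern, not the worker's actual code: hold_message, the two callback functions, and the QUEUE_URL and RECEIPT_HANDLE variables are all hypothetical names.

```shell
# Extend a message's visibility until a completion check passes.
# Usage: hold_message <done_fn> <extend_fn> <interval_seconds> <max_beats>
hold_message() {
  done_fn=$1
  extend_fn=$2
  interval=$3
  max_beats=$4
  beats=0
  while ! "$done_fn"; do
    if [ "$beats" -ge "$max_beats" ]; then
      echo "gave up after $beats heartbeat(s)" >&2
      return 1
    fi
    # Push the visibility deadline out again; stop if the call fails
    "$extend_fn" || return 1
    beats=$((beats + 1))
    sleep "$interval"
  done
  echo "done after $beats heartbeat(s)"
}

# Example callbacks (assumptions):
# chunks_done() { ...check chunk state for this conversion...; }
# extend() {
#   aws sqs change-message-visibility --queue-url "$QUEUE_URL" \
#     --receipt-handle "$RECEIPT_HANDLE" --visibility-timeout 300
# }
# hold_message chunks_done extend 120 300
```

SQS caps the total visibility timeout at 12 hours from when the message was first received, so max_beats should be chosen so the loop ends before that limit.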
Architecture

    User → CF Worker (canary check)
      → [email protected]: proxy to AWS Lambda
      → everyone else: CF Worker → Node server (10.1.1.4)

    AWS Flow:
      Lambda API (auth, files, enqueue)
      → SQS
      → EC2 Spot Worker (preflight, chunk creation)
      → EC2 ChunkScheduler (Gemini vision, Puppeteer, remediation)
      → R2 (output) + Supabase (metadata)

Key References

| Resource | Value |
|---|---|
| AWS Account | 696081601609 (us-east-1) |
| CLI Profile | accessible |
| Production API | https://x2tx47vflk.execute-api.us-east-1.amazonaws.com/ |
| Staging API | https://uvhlpf575b.execute-api.us-east-1.amazonaws.com/ |
| Custom Domain | api-pdf.theaccessible.org → d-5exruqu2od.execute-api.us-east-1.amazonaws.com |
| Prod ECR | 696081601609.dkr.ecr.us-east-1.amazonaws.com/accessible-pdf-production-worker |
| CloudWatch Dashboard | accessible-pdf-production-dashboard |
| CDK Stacks | infra/cdk/lib/stacks/ |
| Env Config | infra/cdk/lib/env-config.ts |
| CI/CD Workflow | .github/workflows/deploy-aws.yml (disabled) |
| Canary Config | workers/api/wrangler.toml → AWS_CANARY_URL, AWS_CANARY_EMAILS |
Deploy Commands (Manual)

    # Build Lambda
    cd workers/api && npm run build:lambda

    # Deploy CDK (staging or production)
    cd infra/cdk
    DEPLOY_ENV=production CDK_DEFAULT_ACCOUNT=696081601609 CDK_DEFAULT_REGION=us-east-1 \
      AWS_PROFILE=accessible npx cdk deploy --all --require-approval never

    # Build and push Docker image (from 10.1.1.4)
    cd ~/accessible && git pull
    docker build -f infra/cdk/docker/worker/Dockerfile -t 696081601609.dkr.ecr.us-east-1.amazonaws.com/accessible-pdf-production-worker:latest .
    docker push 696081601609.dkr.ecr.us-east-1.amazonaws.com/accessible-pdf-production-worker:latest

    # Refresh EC2 instances to pick up new image
    aws autoscaling start-instance-refresh \
      --profile accessible --region us-east-1 \
      --auto-scaling-group-name "AccessiblePdfProd-Compute-WorkerAsgASGC59A845D-UUF3it2ENmxm" \
      --preferences '{"MinHealthyPercentage":0}'
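
An instance refresh runs asynchronously, so the command above returns before any instance is replaced. Progress can be checked with describe-instance-refreshes. The wrapper below is a small sketch: the function name latest_refresh_status is hypothetical, and it assumes the first entry returned is the most recent refresh.

```shell
# Print the status of the first listed instance refresh for an Auto Scaling group.
# Usage: latest_refresh_status <auto-scaling-group-name>
latest_refresh_status() {
  aws autoscaling describe-instance-refreshes \
    --profile accessible --region us-east-1 \
    --auto-scaling-group-name "$1" \
    --query 'InstanceRefreshes[0].Status' --output text
}

# Example:
# latest_refresh_status "AccessiblePdfProd-Compute-WorkerAsgASGC59A845D-UUF3it2ENmxm"
```

The status moves through values such as Pending and InProgress before ending at Successful, Failed, or Cancelled. With MinHealthyPercentage set to 0 and a single instance, expect the ChunkScheduler to be briefly unavailable while the instance is replaced.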