
ECS Deployment

Template files: deploy/ecs/

All commands below assume you are running from the repository root.

  • VPC with subnets (public for testing, private + NAT Gateway for production)
  • Database URL stored in AWS Secrets Manager
Create the secret if it does not already exist:
aws secretsmanager create-secret --name dbward/database-url \
--secret-string "postgres://user:pass@mydb.rds.amazonaws.com:5432/app"

2. Deploy stack (server only, without agent)

aws cloudformation deploy --stack-name dbward \
--template-file deploy/ecs/template.yaml \
--parameter-overrides \
VpcId=vpc-xxx \
SubnetIds=subnet-aaa \
DatabaseUrlSecretArn=arn:aws:secretsmanager:REGION:ACCOUNT:secret:dbward/database-url-XXXXXX \
--capabilities CAPABILITY_NAMED_IAM
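
The DatabaseUrlSecretArn value ends in a random suffix that Secrets Manager appends to the secret name, so it is easier to look it up than to type it. A minimal sketch, assuming the secret name used earlier (dbward/database-url); `secret_arn` is a hypothetical helper name and is not part of the templates:

```shell
# Hypothetical helper: print the full ARN (including the random suffix) of a secret.
secret_arn() {  # usage: secret_arn SECRET_NAME
  aws secretsmanager describe-secret --secret-id "$1" --query ARN --output text
}

# Example usage (commented out so that sourcing this snippet has no side effects):
# DB_SECRET_ARN=$(secret_arn dbward/database-url)
# Then pass DatabaseUrlSecretArn="$DB_SECRET_ARN" in --parameter-overrides.
```

The same helper should also work for the agent token secret once it has been created.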

3. Bootstrap tokens (one-time, after first deploy)

TASK=$(aws ecs list-tasks --cluster dbward --service-name server \
--query 'taskArns[0]' --output text)
# Read admin token
aws ecs execute-command --cluster dbward --task $TASK --container server \
--interactive --command "cat /data/admin-token"
# Read agent token
aws ecs execute-command --cluster dbward --task $TASK --container server \
--interactive --command "cat /data/agent-token"
# Store agent token in Secrets Manager
aws secretsmanager create-secret --name dbward/agent-token --secret-string "dbw_..."
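
The dbw_... value above is a placeholder for the token printed by the previous command. ECS Exec output also includes session banner lines, so the token has to be picked out of it. A small sketch of that filtering step; `extract_token` is a hypothetical helper, and the pattern assumes a token is the dbw_ prefix followed by letters, digits, underscores, or hyphens, which the source does not confirm:

```shell
# Hypothetical helper: read command output on stdin and print the first dbw_ token found.
# ASSUMPTION: tokens match dbw_[A-Za-z0-9_-]+ ; adjust the pattern if yours differ.
extract_token() {
  grep -Eo 'dbw_[A-Za-z0-9_-]+' | head -n 1
}

# Example usage (commented out; whether the exec output can be piped like this
# may depend on your Session Manager plugin and terminal setup):
# AGENT_TOKEN=$(aws ecs execute-command --cluster dbward --task "$TASK" --container server \
#   --interactive --command "cat /data/agent-token" | extract_token)
# aws secretsmanager create-secret --name dbward/agent-token --secret-string "$AGENT_TOKEN"
```

Keeping the token in a shell variable also avoids pasting it into a command that ends up in shell history.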

Deploy the agent stack, reusing the cluster name and server security group ID from the server stack's outputs:

CLUSTER=$(aws cloudformation describe-stacks --stack-name dbward \
--query "Stacks[0].Outputs[?OutputKey=='ClusterName'].OutputValue" --output text)
SERVER_SG=$(aws cloudformation describe-stacks --stack-name dbward \
--query "Stacks[0].Outputs[?OutputKey=='ServerSecurityGroupId'].OutputValue" --output text)
aws cloudformation deploy --stack-name dbward-agent \
--template-file deploy/ecs/agent.yaml \
--parameter-overrides \
ClusterName=$CLUSTER \
VpcId=vpc-xxx \
SubnetIds=subnet-aaa \
ServerSecurityGroupId=$SERVER_SG \
AgentTokenSecretArn=arn:aws:secretsmanager:REGION:ACCOUNT:secret:dbward/agent-token-XXXXXX \
DatabaseUrlSecretArn=arn:aws:secretsmanager:REGION:ACCOUNT:secret:dbward/database-url-XXXXXX \
--capabilities CAPABILITY_NAMED_IAM

To roll out a new image version, redeploy the server stack with a different ImageTag:

aws cloudformation deploy --stack-name dbward \
--template-file deploy/ecs/template.yaml \
--parameter-overrides ImageTag=v0.1.3 \
--capabilities CAPABILITY_NAMED_IAM

Store the server config in SSM Parameter Store instead of passing it inline. This avoids heredoc quoting issues and lets you change the config without updating the template:

# 1. Create parameter
aws ssm put-parameter --name /dbward/server-config --type SecureString --value '
state_dir = "/data"
[auth]
mode = "token"
[[databases]]
name = "app"
environments = ["production"]
'
# 2. Deploy with SSM
aws cloudformation deploy --stack-name dbward \
--template-file deploy/ecs/template.yaml \
--parameter-overrides \
VpcId=vpc-xxx \
SubnetIds=subnet-aaa \
ConfigSource=ssm \
SsmConfigParameter=/dbward/server-config \
--capabilities CAPABILITY_NAMED_IAM
# 3. Update config (no redeploy needed, just restart)
aws ssm put-parameter --name /dbward/server-config --overwrite --value '...'
aws ecs update-service --cluster dbward --service server --force-new-deployment
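
Step 3 can be wrapped in a small function that reads the config from a local file instead of an inline string. This is a sketch, not part of the templates: `update_server_config` is a hypothetical name, and it relies on the AWS CLI's `file://` prefix for loading a parameter value from a file and on the `aws ecs wait services-stable` waiter:

```shell
# Hypothetical helper: upload a local TOML file as the server config,
# force a new deployment, and block until the service is stable again.
update_server_config() {  # usage: update_server_config path/to/server.toml
  aws ssm put-parameter --name /dbward/server-config --type SecureString \
    --overwrite --value "file://$1" &&
  aws ecs update-service --cluster dbward --service server \
    --force-new-deployment >/dev/null &&
  aws ecs wait services-stable --cluster dbward --services server
}

# Example usage (commented out so that sourcing this snippet has no side effects):
# update_server_config ./server.toml
```

Chaining with `&&` means the service is only restarted if the parameter update succeeded.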

S3 Result Storage (Recommended for Production)

Deploy the storage stack separately (long-lived, independent of app lifecycle):

# 1. Deploy storage stack
aws cloudformation deploy --stack-name dbward-storage \
--template-file deploy/ecs/storage.yaml \
--parameter-overrides RetentionDays=30
# 2. Get bucket name
BUCKET=$(aws cloudformation describe-stacks --stack-name dbward-storage \
--query "Stacks[0].Outputs[?OutputKey=='BucketName'].OutputValue" --output text)
# 3. Deploy ECS stack with S3
aws cloudformation deploy --stack-name dbward \
--template-file deploy/ecs/template.yaml \
--parameter-overrides \
ResultStorageBackend=s3 \
ResultBucketName=$BUCKET \
...other params... \
--capabilities CAPABILITY_NAMED_IAM

The storage stack provides:

  • S3 bucket with encryption (AES256) and public access block
  • Lifecycle rule (auto-delete after RetentionDays)
  • DenyInsecureTransport bucket policy

The ECS stack’s ServerTaskRole gets minimal S3 permissions automatically:

  • s3:PutObject/GetObject/DeleteObject/PutObjectTagging on results/*
  • s3:ListBucket with prefix condition results/*

For stricter isolation, add a Deny policy to the bucket manually:

{
"Sid": "DenyAccessExceptAllowedPrincipals",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": ["arn:aws:s3:::BUCKET", "arn:aws:s3:::BUCKET/*"],
"Condition": {
"ArnNotEquals": {
"aws:PrincipalArn": [
"arn:aws:iam::ACCOUNT:role/dbward-server-task",
"arn:aws:iam::ACCOUNT:role/YOUR_CFN_ROLE",
"arn:aws:iam::ACCOUNT:root"
]
}
}
}

⚠️ Do NOT add this via CloudFormation — it can lock out CFn itself. Apply manually after both stacks are stable.
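
Before applying the statement by hand, the BUCKET, ACCOUNT, and YOUR_CFN_ROLE placeholders need real values. A minimal sketch that renders the statement from arguments; `render_deny_statement` is a hypothetical helper, and the role name dbward-server-task is copied from the statement above:

```shell
# Hypothetical helper: print the Deny statement with placeholders filled in.
render_deny_statement() {  # usage: render_deny_statement BUCKET ACCOUNT_ID CFN_ROLE_NAME
  cat <<EOF
{
  "Sid": "DenyAccessExceptAllowedPrincipals",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": ["arn:aws:s3:::$1", "arn:aws:s3:::$1/*"],
  "Condition": {
    "ArnNotEquals": {
      "aws:PrincipalArn": [
        "arn:aws:iam::$2:role/dbward-server-task",
        "arn:aws:iam::$2:role/$3",
        "arn:aws:iam::$2:root"
      ]
    }
  }
}
EOF
}

# Example usage (commented out):
# render_deny_statement "$BUCKET" 123456789012 my-cfn-role > deny-statement.json
```

Note that `aws s3api put-bucket-policy` replaces the whole bucket policy, so the rendered statement should be appended to the Statement array of the existing policy (which already contains DenyInsecureTransport) rather than uploaded on its own.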

deploy/ecs/
├── template.yaml # Server: Cluster, EFS, Service Connect, IAM, ALB (optional)
├── agent.yaml # Agent: TaskDef, Service, SG, IAM (separate lifecycle)
└── storage.yaml # S3 bucket for result storage (optional, long-lived)
  • Server (template.yaml): Fargate + EFS (SQLite persistent storage, survives restarts)
  • Agent (agent.yaml): Fargate (stateless, polls server for tasks, independent scaling)
  • Service-to-service: Service Connect (sidecar proxy, agent resolves server:3000)
  • Secrets: AWS Secrets Manager → ECS environment variable injection
  • Storage: S3 bucket (separate stack) or local EFS
  • ALB: Optional (EnableAlb=true)

Server state (tokens, workflows, audit logs) is stored in SQLite on EFS:

  • EFS FileSystem with encryption at rest + in transit
  • Access Point enforces UID 10001 (dbward-server user)
  • IAM authorization restricts mount to ServerTaskRole only
  • Single replica — no concurrent writer concern

Agent connects to server via http://server:3000 using ECS Service Connect:

  • No DNS resolution issues (sidecar proxy handles routing)
  • Faster failover than DNS-based service discovery
  • No need for fixed IPs or load balancers for internal traffic

Subnet and egress requirements:

  • Production: Private subnets + NAT Gateway. Set AssignPublicIp: DISABLED in the template.
  • Testing: Public subnets OK (template defaults to AssignPublicIp: ENABLED).
  • Server egress: HTTPS (443), NFS/EFS (2049), DNS (53).
  • Agent egress: server (3000 via Service Connect), database (configurable), HTTPS (443), DNS (53).

To tear everything down, delete the stacks in reverse order of creation, then delete the secrets:

aws cloudformation delete-stack --stack-name dbward-agent
aws cloudformation delete-stack --stack-name dbward
aws cloudformation delete-stack --stack-name dbward-storage
aws secretsmanager delete-secret --secret-id dbward/database-url --force-delete-without-recovery
aws secretsmanager delete-secret --secret-id dbward/agent-token --force-delete-without-recovery
# Note: EFS and S3 bucket have DeletionPolicy: Retain. Delete manually if no longer needed.
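
`aws cloudformation delete-stack` returns as soon as deletion has started, so back-to-back calls may try to delete the server stack while the agent stack still uses its cluster and security group. A sketch that waits for each deletion to finish before starting the next; `delete_stack_and_wait` is a hypothetical helper name:

```shell
# Hypothetical helper: start deleting a stack and block until it is gone.
delete_stack_and_wait() {  # usage: delete_stack_and_wait STACK_NAME
  aws cloudformation delete-stack --stack-name "$1" &&
  aws cloudformation wait stack-delete-complete --stack-name "$1"
}

# Example usage (commented out so that sourcing this snippet deletes nothing):
# for stack in dbward-agent dbward dbward-storage; do
#   delete_stack_and_wait "$stack" || break
# done
```

The waiter exits non-zero if deletion fails, which stops the loop before the next stack is touched.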

Server stack parameters (template.yaml):

| Parameter | Required | Description |
|---|---|---|
| VpcId | | VPC ID |
| SubnetIds | | Subnet IDs (at least one) |
| ImageRepository | | Container image repo (default: ghcr.io/dbward-dev/dbward-server) |
| ImageTag | | Image tag (default: latest) |
| ServerConfigToml | | server.toml content |
| EnableAlb | | Create ALB (default: false) |
| AlbSubnetIds | | Public subnets for ALB |
| EcrRepositoryName | | ECR repo name (enables ECR pull permissions) |
| AllowedIngressCidr | | CIDR for server access (default: 10.0.0.0/8) |
| ResultStorageBackend | | "local" or "s3" (default: local) |
| ResultBucketName | | S3 bucket name (required when backend=s3) |

Agent stack parameters (agent.yaml):

| Parameter | Required | Description |
|---|---|---|
| ClusterName | | ECS Cluster name (from server stack) |
| VpcId | | VPC ID |
| SubnetIds | | Subnet IDs |
| ServerSecurityGroupId | | Server SG ID (from server stack) |
| AgentTokenSecretArn | | Secrets Manager ARN for agent token |
| DatabaseUrlSecretArn | | Secrets Manager ARN for DB URL |
| ImageRepository | | Container image repo |
| ImageTag | | Image tag |
| AgentDesiredCount | | Agent replicas (default: 1) |
| AgentConfigToml | | agent.toml content |
| DatabasePort | | DB port for SG rule (default: 5432) |
| EcrRepositoryName | | ECR repo name |

The CloudFormation templates configure ECS container health checks (/health) and ALB target group health checks (/ready) automatically. To receive alerts when something fails, add:

  • ECS service events → EventBridge rule → SNS topic for deploy failures and task crashes
  • ALB unhealthy targets → CloudWatch Alarm on UnHealthyHostCount metric
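
The two alerting paths above could be set up with the AWS CLI roughly as follows. This is a sketch under assumptions: the rule name, alarm name, period, and evaluation settings are illustrative, the SNS topic must already exist with a policy that lets EventBridge publish to it, and the event pattern matches ECS events for every cluster in the region rather than only dbward:

```shell
# Hypothetical helper: route ECS task, deployment, and service events to an SNS topic.
create_ecs_event_rule() {  # usage: create_ecs_event_rule SNS_TOPIC_ARN
  aws events put-rule --name dbward-ecs-events \
    --event-pattern '{"source":["aws.ecs"],"detail-type":["ECS Task State Change","ECS Deployment State Change","ECS Service Action"]}' &&
  aws events put-targets --rule dbward-ecs-events --targets "Id=sns,Arn=$1"
}

# Hypothetical helper: alarm when the ALB target group reports any unhealthy target.
# Dimension values look like targetgroup/NAME/ID and app/NAME/ID.
create_unhealthy_host_alarm() {  # usage: create_unhealthy_host_alarm TARGET_GROUP LOAD_BALANCER SNS_TOPIC_ARN
  aws cloudwatch put-metric-alarm --alarm-name dbward-unhealthy-hosts \
    --namespace AWS/ApplicationELB --metric-name UnHealthyHostCount \
    --dimensions "Name=TargetGroup,Value=$1" "Name=LoadBalancer,Value=$2" \
    --statistic Maximum --period 60 --evaluation-periods 2 \
    --threshold 0 --comparison-operator GreaterThanThreshold \
    --alarm-actions "$3"
}
```

The alarm only applies when the stack was deployed with EnableAlb=true, since the metric comes from the ALB target group.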

For application-level monitoring (agent offline detection, queue depth), poll GET /api/agents with a metrics.view token. See Server Health checks.