AWS ECS

AWS ECS container orchestration for running Docker containers. Use when deploying containerized applications, configuring task definitions, setting up services, managing clusters, or troubleshooting container issues.

Details​

| Property | Value |
| --- | --- |
| Skill Directory | .github/skills/aws-ecs/ |
| Phase | General |
| User Invocable | ✅ Yes |
| Usage | /aws-ecs Container workload type, issue, or configuration to look up (e.g. 'Fargate task definition', 'service with load balancer', 'container keeps restarting') |

Documentation​

AWS ECS

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service. Run containers on AWS Fargate (serverless) or EC2 instances.

Core Concepts​

Cluster​

Logical grouping of tasks or services. Can contain Fargate tasks, EC2 instances, or both.

Task Definition​

Blueprint for your application. Defines containers, resources, networking, and IAM roles.

Task​

Running instance of a task definition. Can run standalone or as part of a service.

Service​

Maintains desired count of tasks. Handles deployments, load balancing, and auto scaling.

Launch Types​

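| Type | Description | Use Case |
| --- | --- | --- |
| Fargate | Serverless; billed for the vCPU and memory each task requests | Most workloads |
| EC2 | Self-managed instances | GPU, specific instance types, host-level control |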

Common Patterns​

Create a Fargate Cluster​

AWS CLI:

# Create cluster
aws ecs create-cluster --cluster-name my-cluster

# Or create it with Fargate and Fargate Spot capacity providers
aws ecs create-cluster \
  --cluster-name my-cluster \
  --capacity-providers FARGATE FARGATE_SPOT \
  --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1 \
    capacityProvider=FARGATE_SPOT,weight=1

Register Task Definition​

cat > task-definition.json << 'EOF'
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://task-definition.json
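
Fargate accepts only certain task-level `cpu`/`memory` pairs, and registration fails for any other pair. The helper below is a minimal sketch of a pre-flight check for the values in `task-definition.json`. `valid_fargate_size` is a hypothetical name, and the ranges are hard-coded assumptions taken from the Fargate task size table, so check them against the current AWS documentation before relying on them.

```shell
# valid_fargate_size CPU MEMORY
# Returns 0 when the task-level cpu (CPU units) and memory (MiB) pair is an
# accepted Fargate combination, 1 otherwise.
# Hypothetical helper; the ranges below are assumptions and may lag behind AWS.
valid_fargate_size() {
  local cpu="$1" mem="$2" min max step
  # Reject anything that is not a plain non-negative integer.
  case "$mem" in
    ''|*[!0-9]*) return 1 ;;
  esac
  case "$cpu" in
    256)
      # 0.25 vCPU pairs only with 512, 1024, or 2048 MiB.
      case "$mem" in 512|1024|2048) return 0 ;; *) return 1 ;; esac ;;
    512)   min=1024;  max=4096;   step=1024 ;;
    1024)  min=2048;  max=8192;   step=1024 ;;
    2048)  min=4096;  max=16384;  step=1024 ;;
    4096)  min=8192;  max=30720;  step=1024 ;;
    8192)  min=16384; max=61440;  step=4096 ;;
    16384) min=32768; max=122880; step=8192 ;;
    *) return 1 ;;
  esac
  # Memory must sit inside the range and on an allowed increment.
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] &&
    [ $(( (mem - min) % step )) -eq 0 ]
}

# Example:
#   valid_fargate_size 256 512 && echo "ok to register"
```

Running the check before `register-task-definition` catches a bad size locally, without a round trip to the API.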

Create Service with Load Balancer​

aws ecs create-service \
  --cluster my-cluster \
  --service-name web-service \
  --task-definition web-app:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678,subnet-87654321],securityGroups=[sg-12345678],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/1234567890123456,containerName=web,containerPort=8080" \
  --health-check-grace-period-seconds 60

Run Standalone Task​

aws ecs run-task \
  --cluster my-cluster \
  --task-definition my-batch-job:1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678],securityGroups=[sg-12345678],assignPublicIp=ENABLED}"

Update Service (Deploy New Image)​

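# Register new task definition with updated image
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Update service to use the new revision.
# --force-new-deployment is only required when redeploying without a task
# definition change (for example, to re-pull a mutable tag such as :latest).
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --task-definition web-app:2 \
  --force-new-deployment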
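
`update-service` returns as soon as the deployment is accepted, not when it finishes. As a rough sketch, a deploy script can block on the `aws ecs wait services-stable` waiter, which polls `describe-services` and exits non-zero if the service does not settle in time. The function name `deploy_and_wait` is hypothetical.

```shell
# deploy_and_wait CLUSTER SERVICE TASK_DEF
# Points the service at a task definition revision, then blocks until the
# service reports a steady state. Hypothetical wrapper around two CLI calls.
deploy_and_wait() {
  local cluster="$1" service="$2" task_def="$3"
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --task-definition "$task_def" \
    --query 'service.taskDefinition' \
    --output text || return 1
  # The waiter polls describe-services and fails if the service has not
  # stabilised within its built-in retry budget.
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}

# Example:
#   deploy_and_wait my-cluster web-service web-app:2 || echo "deployment did not stabilise" >&2
```

The exit status makes the function usable as a gate in a CI step.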

Auto Scaling​

# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 10

# Target tracking policy
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'
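
To confirm that the policy is actually acting on the service, the recent scaling activities can be listed. This is a minimal sketch; `scaling_activities` is a hypothetical helper name, and it assumes the same `service/<cluster>/<service>` resource ID format used above.

```shell
# scaling_activities CLUSTER SERVICE
# Prints up to five recorded scaling activities for an ECS service as
# tab-separated start time, status code, and description.
# Hypothetical helper name.
scaling_activities() {
  local cluster="$1" service="$2"
  aws application-autoscaling describe-scaling-activities \
    --service-namespace ecs \
    --resource-id "service/${cluster}/${service}" \
    --query 'ScalingActivities[:5].[StartTime,StatusCode,Description]' \
    --output text
}

# Example:
#   scaling_activities my-cluster web-service
```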

CLI Reference​

Cluster Management​

| Command | Description |
| --- | --- |
| aws ecs create-cluster | Create cluster |
| aws ecs describe-clusters | Get cluster details |
| aws ecs list-clusters | List clusters |
| aws ecs delete-cluster | Delete cluster |

Task Definitions​

| Command | Description |
| --- | --- |
| aws ecs register-task-definition | Create task definition |
| aws ecs describe-task-definition | Get task definition |
| aws ecs list-task-definitions | List task definitions |
| aws ecs deregister-task-definition | Deregister version |

Services​

| Command | Description |
| --- | --- |
| aws ecs create-service | Create service |
| aws ecs update-service | Update service |
| aws ecs describe-services | Get service details |
| aws ecs delete-service | Delete service |

Tasks​

| Command | Description |
| --- | --- |
| aws ecs run-task | Run standalone task |
| aws ecs stop-task | Stop running task |
| aws ecs describe-tasks | Get task details |
| aws ecs list-tasks | List tasks |

Best Practices​

Security​

  • Use task roles for AWS API access (not access keys)
  • Use execution roles for ECR/Secrets access
  • Store secrets in Secrets Manager or Parameter Store
  • Use private subnets with NAT gateway
  • Enable CloudTrail for API auditing

Performance​

  • Right-size CPU/memory: monitor utilization and adjust
  • Use Fargate Spot for fault-tolerant workloads (up to 70% savings)
  • Enable Container Insights for monitoring
  • Use service discovery for internal communication
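
Container Insights is a cluster-level setting. A minimal sketch of turning it on for an existing cluster follows; the function name `enable_container_insights` is hypothetical, while `update-cluster-settings` and the `containerInsights` setting name come from the ECS CLI.

```shell
# enable_container_insights CLUSTER
# Turns on CloudWatch Container Insights for an existing cluster.
# Hypothetical wrapper name.
enable_container_insights() {
  local cluster="$1"
  aws ecs update-cluster-settings \
    --cluster "$cluster" \
    --settings name=containerInsights,value=enabled
}

# Example:
#   enable_container_insights my-cluster
```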

Reliability​

  • Deploy across multiple AZs
  • Configure health checks properly
  • Set appropriate deregistration delay
  • Use the deployment circuit breaker to stop and roll back failed deployments:

aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }'
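
The deregistration delay mentioned in the list above is an attribute of the load balancer target group, not of the ECS service. The sketch below shows one way to set it, assuming an ALB or NLB target group. `set_deregistration_delay` is a hypothetical helper, and the 0 to 3600 second bound reflects the documented range for `deregistration_delay.timeout_seconds`.

```shell
# set_deregistration_delay TARGET_GROUP_ARN SECONDS
# Sets how long the load balancer keeps draining connections to a task that
# is being deregistered. Hypothetical helper; returns 2 on invalid input.
set_deregistration_delay() {
  local tg_arn="$1" seconds="$2"
  case "$seconds" in
    ''|*[!0-9]*)
      echo "seconds must be a non-negative integer" >&2
      return 2 ;;
  esac
  if [ "$seconds" -gt 3600 ]; then
    echo "seconds must be between 0 and 3600" >&2
    return 2
  fi
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$tg_arn" \
    --attributes "Key=deregistration_delay.timeout_seconds,Value=${seconds}"
}

# Example (30 seconds instead of the 300-second default):
#   set_deregistration_delay "$TARGET_GROUP_ARN" 30
```

A shorter delay speeds up deployments, while a longer one gives in-flight requests more time to finish, so the right value depends on the application's request duration.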

Cost Optimization​

  • Use Fargate Spot for batch workloads
  • Right-size task resources
  • Scale to zero when not needed
  • Use capacity providers for mixed Fargate/Spot
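
Scaling a service to zero keeps its definition in place while stopping all of its tasks. A minimal sketch, with the hypothetical helper name `scale_service`. If the service is also registered with Application Auto Scaling, a later scaling activity may raise the count back to the configured minimum capacity.

```shell
# scale_service CLUSTER SERVICE COUNT
# Sets the desired task count of a service and prints the new value.
# Hypothetical helper; returns 2 when COUNT is not a non-negative integer.
scale_service() {
  local cluster="$1" service="$2" count="$3"
  case "$count" in
    ''|*[!0-9]*)
      echo "count must be a non-negative integer" >&2
      return 2 ;;
  esac
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --desired-count "$count" \
    --query 'service.desiredCount' \
    --output text
}

# Example: stop all tasks overnight, restore them in the morning
#   scale_service my-cluster web-service 0
#   scale_service my-cluster web-service 2
```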

Troubleshooting​

Task Fails to Start​

Check:

# Describe the first stopped task returned for the cluster
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks "$(aws ecs list-tasks --cluster my-cluster --desired-status STOPPED --query 'taskArns[0]' --output text)"

Common causes:

  • Image not found (ECR permissions)
  • Secrets access denied
  • Network configuration (subnets, security groups)
  • Resource limits exceeded
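
Most of these causes show up in the task's `stoppedReason` and `stopCode` fields or in the per-container `reason`. The sketch below narrows the `describe-tasks` output to those fields and handles the case where no stopped task is returned. `last_stop_reason` is a hypothetical helper name.

```shell
# last_stop_reason CLUSTER
# Prints the stop code, stop reason, and per-container exit details for the
# first stopped task that list-tasks returns. Hypothetical helper name.
last_stop_reason() {
  local cluster="$1" task_arn
  task_arn="$(aws ecs list-tasks --cluster "$cluster" \
    --desired-status STOPPED --query 'taskArns[0]' --output text)" || return 1
  # With --output text, an empty query result is printed as "None".
  if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
    echo "no stopped tasks found in $cluster" >&2
    return 1
  fi
  aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query 'tasks[0].{stopCode:stopCode,stoppedReason:stoppedReason,containers:containers[].{name:name,exitCode:exitCode,reason:reason}}'
}

# Example:
#   last_stop_reason my-cluster
```

ECS keeps stopped tasks visible only for a limited time, so this is best run soon after the failure.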

Container Keeps Restarting​

Debug:

# Check CloudWatch logs
# (with the awslogs driver, the stream name is <prefix>/<container-name>/<task-id>)
aws logs get-log-events \
  --log-group-name /ecs/web-app \
  --log-stream-name "ecs/web/abc123"

# Check task details (replace task-arn with the task's ARN or ID)
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks task-arn \
  --query 'tasks[0].containers[0].{reason:reason,exitCode:exitCode}'

Causes:

  • Health check failing
  • Application crashing
  • Out of memory
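
When the task definition sets `awslogs-stream-prefix`, the log stream for a container is named `<prefix>/<container-name>/<task-id>`, where the task ID is the last segment of the task ARN. The helper below is a small sketch that builds that name. `log_stream_for_task` is a hypothetical name, and the ARN in the example is a placeholder.

```shell
# log_stream_for_task PREFIX CONTAINER TASK_ARN
# Builds the awslogs stream name for one container of a task.
# Hypothetical helper; the task ID is taken as the text after the last "/".
log_stream_for_task() {
  local prefix="$1" container="$2" task_arn="$3"
  printf '%s/%s/%s\n' "$prefix" "$container" "${task_arn##*/}"
}

# Example:
#   aws logs get-log-events \
#     --log-group-name /ecs/web-app \
#     --log-stream-name "$(log_stream_for_task ecs web "$TASK_ARN")"
```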

Service Stuck Deploying​

# Check deployment status
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].deployments'

# Check events
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].events[:5]'

Cannot Pull Image from ECR​

Check that the execution role's policy includes this statement:

{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
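
Rather than maintaining this statement by hand, one option is to attach the AWS managed policy `AmazonECSTaskExecutionRolePolicy`, which covers these ECR actions along with writing to CloudWatch Logs. A minimal sketch follows. `attach_execution_policy` is a hypothetical helper name, and the role name in the example matches the one used in the task definition above. Access to Secrets Manager or Parameter Store still needs its own policy.

```shell
# attach_execution_policy ROLE_NAME
# Attaches the AWS managed task execution policy to an existing IAM role.
# Hypothetical wrapper name.
attach_execution_policy() {
  local role_name="$1"
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
}

# Example:
#   attach_execution_policy ecsTaskExecutionRole
```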
