AWS ECS
AWS ECS container orchestration for running Docker containers. Use when deploying containerized applications, configuring task definitions, setting up services, managing clusters, or troubleshooting container issues.
Details​
| Property | Value |
|---|---|
| Skill Directory | .github/skills/aws-ecs/ |
| Phase | General |
| User Invocable | ✅ Yes |
| Usage | /aws-ecs Container workload type, issue, or configuration to look up (e.g. 'Fargate task definition', 'service with load balancer', 'container keeps restarting') |
Documentation​
AWS ECS
Amazon Elastic Container Service (ECS) is a fully managed container orchestration service. Run containers on AWS Fargate (serverless) or EC2 instances.
Core Concepts​
Cluster​
Logical grouping of tasks or services. Can contain Fargate tasks, EC2 instances, or both.
Task Definition​
Blueprint for your application. Defines containers, resources, networking, and IAM roles.
Task​
Running instance of a task definition. Can run standalone or as part of a service.
Service​
Maintains desired count of tasks. Handles deployments, load balancing, and auto scaling.
Launch Types​
| Type | Description | Use Case |
|---|---|---|
| Fargate | Serverless; pay for the vCPU and memory each task requests | Most workloads |
| EC2 | Self-managed instances | GPU, specific instance types, host-level control |
Common Patterns​
Create a Fargate Cluster​
AWS CLI:
# Create cluster
aws ecs create-cluster --cluster-name my-cluster
# With capacity providers
aws ecs create-cluster \
--cluster-name my-cluster \
--capacity-providers FARGATE FARGATE_SPOT \
--default-capacity-provider-strategy \
capacityProvider=FARGATE,weight=1 \
capacityProvider=FARGATE_SPOT,weight=1
Register Task Definition​
cat > task-definition.json << 'EOF'
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
EOF
aws ecs register-task-definition --cli-input-json file://task-definition.json
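Fargate accepts only certain task-level `cpu` and `memory` pairs, and a definition with an unsupported pair is typically rejected at registration. The sketch below checks a pair before you register. The function name `fargate_size_ok` is hypothetical, and the ranges are the Linux task sizes as documented at the time of writing, so verify them against the current ECS documentation.

```shell
# Hypothetical helper: succeed if a task-level cpu/memory pair
# (CPU units, MiB) is a supported Fargate task size.
fargate_size_ok() {
  fso_cpu=$1
  fso_mem=$2
  case "$fso_cpu" in
    256)
      # 0.25 vCPU allows exactly three memory sizes.
      [ "$fso_mem" -eq 512 ] || [ "$fso_mem" -eq 1024 ] || [ "$fso_mem" -eq 2048 ]
      return $?
      ;;
    512)   fso_min=1024;  fso_max=4096;   fso_step=1024 ;;
    1024)  fso_min=2048;  fso_max=8192;   fso_step=1024 ;;
    2048)  fso_min=4096;  fso_max=16384;  fso_step=1024 ;;
    4096)  fso_min=8192;  fso_max=30720;  fso_step=1024 ;;
    8192)  fso_min=16384; fso_max=61440;  fso_step=4096 ;;
    16384) fso_min=32768; fso_max=122880; fso_step=8192 ;;
    *)     return 1 ;;
  esac
  # Memory must lie in the range and on the step grid for this CPU size.
  [ "$fso_mem" -ge "$fso_min" ] && [ "$fso_mem" -le "$fso_max" ] &&
    [ $(( (fso_mem - fso_min) % fso_step )) -eq 0 ]
}

# Usage (not run here):
#   fargate_size_ok 256 512 && \
#     aws ecs register-task-definition --cli-input-json file://task-definition.json
```

The `"cpu": "256"` and `"memory": "512"` values in the task definition above pass this check.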
Create Service with Load Balancer​
aws ecs create-service \
--cluster my-cluster \
--service-name web-service \
--task-definition web-app:1 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-12345678,subnet-87654321],
securityGroups=[sg-12345678],
assignPublicIp=DISABLED
}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/1234567890123456,containerName=web,containerPort=8080" \
--health-check-grace-period-seconds 60
Run Standalone Task​
aws ecs run-task \
--cluster my-cluster \
--task-definition my-batch-job:1 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-12345678],
securityGroups=[sg-12345678],
assignPublicIp=ENABLED
}"
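For batch jobs it is often useful to block until the task stops and then act on its exit code. The following is a minimal sketch under that assumption: `run_task_and_wait` is a hypothetical wrapper around `run-task`, the `tasks-stopped` waiter, and `describe-tasks`, and it inspects only the first container. The waiter polls for a limited time, so long-running jobs may need their own loop.

```shell
# Hypothetical helper: run a one-off Fargate task, wait for it to stop, and
# succeed only if its first container exited with code 0.
run_task_and_wait() {
  rtw_cluster=$1
  rtw_taskdef=$2
  rtw_subnet=$3
  rtw_sg=$4

  rtw_arn=$(aws ecs run-task \
    --cluster "$rtw_cluster" \
    --task-definition "$rtw_taskdef" \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[$rtw_subnet],securityGroups=[$rtw_sg],assignPublicIp=ENABLED}" \
    --query 'tasks[0].taskArn' \
    --output text) || return 1

  # If the task could not be placed, tasks[] is empty and the CLI prints "None".
  [ -n "$rtw_arn" ] && [ "$rtw_arn" != "None" ] || return 1

  aws ecs wait tasks-stopped --cluster "$rtw_cluster" --tasks "$rtw_arn" || return 1

  rtw_code=$(aws ecs describe-tasks \
    --cluster "$rtw_cluster" \
    --tasks "$rtw_arn" \
    --query 'tasks[0].containers[0].exitCode' \
    --output text) || return 1

  printf '%s\n' "$rtw_code"
  [ "$rtw_code" = "0" ]
}

# Usage (not run here):
#   run_task_and_wait my-cluster my-batch-job:1 subnet-12345678 sg-12345678
```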
Update Service (Deploy New Image)​
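# Register a new task definition revision that references the updated image
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Point the service at the new revision. Changing the task definition starts a
# deployment by itself; --force-new-deployment is only required when redeploying
# an unchanged revision (for example, to re-pull a mutable tag such as :latest).
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --task-definition web-app:2 \
  --force-new-deployment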
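Hard-coding the revision (`web-app:2`) breaks as soon as another revision is registered. As a sketch, the hypothetical `deploy_new_revision` function below captures the ARN that `register-task-definition` returns, passes it to `update-service`, and then waits with the `services-stable` waiter. The function name and argument order are assumptions for illustration.

```shell
# Hypothetical helper: register the task definition file and roll the service
# to whichever revision ECS assigned, instead of hard-coding a revision number.
deploy_new_revision() {
  dnr_cluster=$1
  dnr_service=$2
  dnr_file=$3

  # --query with --output text makes the CLI print only the new revision's ARN.
  dnr_arn=$(aws ecs register-task-definition \
    --cli-input-json "file://$dnr_file" \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text) || return 1

  aws ecs update-service \
    --cluster "$dnr_cluster" \
    --service "$dnr_service" \
    --task-definition "$dnr_arn" >/dev/null || return 1

  # Block until the deployment settles (the waiter polls describe-services).
  aws ecs wait services-stable \
    --cluster "$dnr_cluster" \
    --services "$dnr_service" || return 1

  printf '%s\n' "$dnr_arn"
}

# Usage (not run here):
#   deploy_new_revision my-cluster web-service task-definition.json
```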
Auto Scaling​
# Register scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/my-cluster/web-service \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 2 \
--max-capacity 10
# Target tracking policy
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/my-cluster/web-service \
--scalable-dimension ecs:service:DesiredCount \
--policy-name cpu-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 120
}'
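Both scaling commands identify the service by a resource ID of the form `service/<cluster>/<service>`. The sketch below builds that ID once and uses it to list recent scaling activities, which can help confirm that the policy is acting. The function names are hypothetical.

```shell
# Hypothetical helper: build the Application Auto Scaling resource ID
# for an ECS service.
ecs_resource_id() {
  printf 'service/%s/%s\n' "$1" "$2"
}

# Hypothetical helper: list a few scaling activities for a service.
scaling_activities() {
  aws application-autoscaling describe-scaling-activities \
    --service-namespace ecs \
    --resource-id "$(ecs_resource_id "$1" "$2")" \
    --max-items 5 \
    --query 'ScalingActivities[].{time:StartTime,status:StatusCode,cause:Cause}' \
    --output table
}

# Usage (not run here):
#   scaling_activities my-cluster web-service
```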
CLI Reference​
Cluster Management​
| Command | Description |
|---|---|
| aws ecs create-cluster | Create cluster |
| aws ecs describe-clusters | Get cluster details |
| aws ecs list-clusters | List clusters |
| aws ecs delete-cluster | Delete cluster |
Task Definitions​
| Command | Description |
|---|---|
| aws ecs register-task-definition | Create task definition |
| aws ecs describe-task-definition | Get task definition |
| aws ecs list-task-definitions | List task definitions |
| aws ecs deregister-task-definition | Deregister version |
Services​
| Command | Description |
|---|---|
| aws ecs create-service | Create service |
| aws ecs update-service | Update service |
| aws ecs describe-services | Get service details |
| aws ecs delete-service | Delete service |
Tasks​
| Command | Description |
|---|---|
| aws ecs run-task | Run standalone task |
| aws ecs stop-task | Stop running task |
| aws ecs describe-tasks | Get task details |
| aws ecs list-tasks | List tasks |
Best Practices​
Security​
- Use task roles for AWS API access (not access keys)
- Use execution roles for ECR/Secrets access
- Store secrets in Secrets Manager or Parameter Store
- Use private subnets with NAT gateway
- Enable CloudTrail for API auditing
Performance​
- Right-size CPU and memory: monitor utilization and adjust
- Use Fargate Spot for fault-tolerant workloads (up to 70% savings)
- Enable Container Insights for monitoring
- Use service discovery for internal communication
Reliability​
- Deploy across multiple AZs
- Configure health checks properly
- Set appropriate deregistration delay
- Use circuit breaker for deployments
aws ecs update-service \
--cluster my-cluster \
--service web-service \
--deployment-configuration '{
"deploymentCircuitBreaker": {
"enable": true,
"rollback": true
}
}'
Cost Optimization​
- Use Fargate Spot for batch workloads
- Right-size task resources
- Scale to zero when not needed
- Use a capacity provider strategy to mix Fargate and Fargate Spot
Troubleshooting​
Task Fails to Start​
Check:
# View the first stopped task returned (ECS keeps stopped tasks visible only briefly)
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks "$(aws ecs list-tasks --cluster my-cluster --desired-status STOPPED --query 'taskArns[0]' --output text)"
Common causes:
- Image not found (ECR permissions)
- Secrets access denied
- Network configuration (subnets, security groups)
- Resource limits exceeded
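The `stoppedReason` field in the `describe-tasks` output usually points at one of the causes above. As a rough sketch, the hypothetical `explain_stopped_reason` function maps a few common reason strings to the first thing to check. The patterns cover only some of the messages ECS can return, and the suggested checks are general guidance rather than a definitive diagnosis.

```shell
# Hypothetical helper: map a task's stoppedReason text to a first check.
explain_stopped_reason() {
  case "$1" in
    *CannotPullContainerError*)
      echo "image pull failed: check the image URI, ECR permissions, and the network route to the registry" ;;
    *ResourceInitializationError*)
      echo "task setup failed: check secrets access, the execution role, and subnet/security group egress" ;;
    *OutOfMemoryError*)
      echo "container exceeded its memory limit: raise the memory setting or reduce usage" ;;
    *"health checks"*)
      echo "health checks failed: check the health check path, port, and grace period" ;;
    *"Essential container in task exited"*)
      echo "application exited: check the container logs and exit code" ;;
    *)
      echo "unrecognized reason: inspect the container reasons and logs" ;;
  esac
}

# Usage (not run here; TASK_ARN is a placeholder you set yourself):
#   reason=$(aws ecs describe-tasks --cluster my-cluster --tasks "$TASK_ARN" \
#     --query 'tasks[0].stoppedReason' --output text)
#   explain_stopped_reason "$reason"
```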
Container Keeps Restarting​
Debug:
# Check CloudWatch logs
aws logs get-log-events \
--log-group-name /ecs/web-app \
--log-stream-name "ecs/web/abc123"
# Check task details
aws ecs describe-tasks \
--cluster my-cluster \
--tasks task-arn \
--query 'tasks[0].containers[0].{reason:reason,exitCode:exitCode}'
Common causes:
- Health check failing
- Application crashing (non-zero exit code)
- Out of memory (often exit code 137 with an OutOfMemoryError reason)
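The `exitCode` returned by the `describe-tasks` query above follows the usual container convention: values above 128 mean the process was terminated by signal `exitCode - 128`. The sketch below applies that convention. The function name `explain_exit_code` is hypothetical, and the note on 137 is a common interpretation, not a guarantee.

```shell
# Hypothetical helper: interpret a container exit code.
explain_exit_code() {
  eec=$1
  if [ "$eec" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$eec" -eq 137 ]; then
    # 137 = 128 + 9 (SIGKILL)
    echo "killed by SIGKILL (signal 9): often out of memory or a stop timeout"
  elif [ "$eec" -gt 128 ] && [ "$eec" -le 255 ]; then
    echo "terminated by signal $((eec - 128))"
  else
    echo "application error (exit code $eec)"
  fi
}

# Usage (not run here):
#   explain_exit_code 143
```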
Service Stuck Deploying​
# Check deployment status
aws ecs describe-services \
--cluster my-cluster \
--services web-service \
--query 'services[0].deployments'
# Check events
aws ecs describe-services \
--cluster my-cluster \
--services web-service \
--query 'services[0].events[:5]'
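For services that use the ECS rolling update deployment type, each deployment also reports a `rolloutState`. The sketch below reads it for the `PRIMARY` deployment and prints a short summary. The function name `deployment_state` and the wording of the hints are assumptions, and other deployment controllers may not populate this field.

```shell
# Hypothetical helper: report whether the service's PRIMARY deployment has
# completed, failed, or is still rolling out.
deployment_state() {
  ds_state=$(aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query 'services[0].deployments[?status==`PRIMARY`] | [0].rolloutState' \
    --output text) || return 2

  case "$ds_state" in
    COMPLETED)
      echo "deployment completed" ;;
    FAILED)
      echo "deployment failed: read the service events and stopped-task reasons"
      return 1 ;;
    IN_PROGRESS)
      echo "deployment still in progress: new tasks may be failing health checks" ;;
    *)
      echo "no rollout state reported ($ds_state)" ;;
  esac
}

# Usage (not run here):
#   deployment_state my-cluster web-service
```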
Cannot Pull Image from ECR​
Check that the task execution role includes a policy statement allowing these actions:
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
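Pull failures are also caused by an image URI that points at the wrong account, region, repository, or tag. As a sketch, the hypothetical `parse_ecr_image` function splits a tag-style ECR URI into those parts so each can be checked. It assumes the `account.dkr.ecr.region.amazonaws.com/repo:tag` form and does not handle digest references (`repo@sha256:...`).

```shell
# Hypothetical helper: split an ECR image URI into
# "account region repository tag".
parse_ecr_image() {
  pei_uri=$1
  pei_host=${pei_uri%%/*}            # registry host, before the first "/"
  pei_rest=${pei_uri#*/}             # repository[:tag], after the first "/"
  pei_account=${pei_host%%.*}        # first label of the host
  pei_region=${pei_host#*.dkr.ecr.}  # drop "<account>.dkr.ecr."
  pei_region=${pei_region%%.*}       # keep the label before ".amazonaws.com"
  pei_repo=${pei_rest%%:*}
  case "$pei_rest" in
    *:*) pei_tag=${pei_rest##*:} ;;
    *)   pei_tag=latest ;;           # an untagged reference defaults to "latest"
  esac
  printf '%s %s %s %s\n' "$pei_account" "$pei_region" "$pei_repo" "$pei_tag"
}

# Usage (not run here): confirm that the tag exists in the repository.
#   set -- $(parse_ecr_image 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest)
#   aws ecr describe-images --region "$2" --registry-id "$1" \
#     --repository-name "$3" --image-ids imageTag="$4"
```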