aws-cloudwatch
title: "Aws Cloudwatch" sidebar_label: "Aws Cloudwatch" description: "AWS CloudWatch monitoring for logs, metrics, alarms, and dashboards. Use when setting up monitoring, creating alarms, querying logs with Insights, configuring metric filters, building dashboards, or troubleshooting application issues."​
Aws Cloudwatch
AWS CloudWatch monitoring for logs, metrics, alarms, and dashboards. Use when setting up monitoring, creating alarms, querying logs with Insights, configuring metric filters, building dashboards, or troubleshooting application issues.
Details​
| Property | Value |
|---|---|
| Skill Directory | .github/skills/aws-cloudwatch/ |
| Phase | General |
| User Invocable | ✅ Yes |
| Usage | /aws-cloudwatch Metric namespace, service, or monitoring scenario to look up (e.g. 'Lambda error rate alarm', 'Logs Insights query', 'custom metric', 'dashboard widget') |
Documentation​
AWS CloudWatch
Amazon CloudWatch provides monitoring and observability for AWS resources and applications. It collects metrics, logs, and events, enabling you to monitor, troubleshoot, and optimize your AWS environment.
Table of Contents​
Core Concepts​
Metrics​
Time-ordered data points published to CloudWatch. Key components:
- Namespace: Container for metrics (e.g.,
AWS/Lambda) - Metric name: Name of the measurement (e.g.,
Invocations) - Dimensions: Name-value pairs for filtering (e.g.,
FunctionName=MyFunc) - Statistics: Aggregations (Sum, Average, Min, Max, SampleCount, pN)
Logs​
Log data from AWS services and applications:
- Log groups: Collections of log streams
- Log streams: Sequences of log events from same source
- Log events: Individual log entries with timestamp and message
Alarms​
Automated actions based on metric thresholds:
- States: OK, ALARM, INSUFFICIENT_DATA
- Actions: SNS notifications, Auto Scaling, EC2 actions
Common Patterns​
Create a Metric Alarm​
AWS CLI:
# CPU utilization alarm for EC2
aws cloudwatch put-metric-alarm \
--alarm-name "HighCPU-i-1234567890abcdef0" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
--ok-actions arn:aws:sns:us-east-1:123456789012:alerts
boto3:
import boto3
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_alarm(
AlarmName='HighCPU-i-1234567890abcdef0',
MetricName='CPUUtilization',
Namespace='AWS/EC2',
Statistic='Average',
Period=300,
Threshold=80.0,
ComparisonOperator='GreaterThanThreshold',
EvaluationPeriods=2,
Dimensions=[
{'Name': 'InstanceId', 'Value': 'i-1234567890abcdef0'}
],
AlarmActions=['arn:aws:sns:us-east-1:123456789012:alerts'],
OKActions=['arn:aws:sns:us-east-1:123456789012:alerts']
)
Lambda Error Rate Alarm (Math Expression)​
aws cloudwatch put-metric-alarm \
--alarm-name "LambdaErrorRate-MyFunction" \
--metrics '[
{
"Id": "errors",
"MetricStat": {
"Metric": {
"Namespace": "AWS/Lambda",
"MetricName": "Errors",
"Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
},
"Period": 60,
"Stat": "Sum"
},
"ReturnData": false
},
{
"Id": "invocations",
"MetricStat": {
"Metric": {
"Namespace": "AWS/Lambda",
"MetricName": "Invocations",
"Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
},
"Period": 60,
"Stat": "Sum"
},
"ReturnData": false
},
{
"Id": "errorRate",
"Expression": "errors/invocations*100",
"Label": "Error Rate",
"ReturnData": true
}
]' \
--threshold 5 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 3 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
Query Logs with Insights​
# Find errors in Lambda logs
aws logs start-query \
--log-group-name /aws/lambda/MyFunction \
--start-time $(date -d '1 hour ago' +%s) \
--end-time $(date +%s) \
--query-string '
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50
'
# Get query results
aws logs get-query-results --query-id <query-id>
boto3:
import boto3
import time
logs = boto3.client('logs')
# Start query
response = logs.start_query(
logGroupName='/aws/lambda/MyFunction',
startTime=int(time.time()) - 3600,
endTime=int(time.time()),
queryString='''
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50
'''
)
query_id = response['queryId']
# Wait for results
while True:
result = logs.get_query_results(queryId=query_id)
if result['status'] == 'Complete':
break
time.sleep(1)
for row in result['results']:
print(row)
Create Metric Filter​
# Extract error count metric from log group
aws logs put-metric-filter \
--log-group-name /aws/lambda/MyFunction \
--filter-name ErrorCount \
--filter-pattern "ERROR" \
--metric-transformations \
metricName=ErrorCount,metricNamespace=MyApp,metricValue=1,defaultValue=0
Publish Custom Metrics​
import boto3
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_data(
Namespace='MyApp',
MetricData=[
{
'MetricName': 'OrdersProcessed',
'Value': 1,
'Unit': 'Count',
'Dimensions': [
{'Name': 'Environment', 'Value': 'Production'},
{'Name': 'OrderType', 'Value': 'Standard'}
]
}
]
)
Create Dashboard​
cat > dashboard.json << 'EOF'
{
"widgets": [
{
"type": "metric",
"x": 0, "y": 0, "width": 12, "height": 6,
"properties": {
"title": "Lambda Invocations",
"metrics": [
["AWS/Lambda", "Invocations", "FunctionName", "MyFunction"]
],
"period": 60,
"stat": "Sum",
"region": "us-east-1"
}
},
{
"type": "log",
"x": 12, "y": 0, "width": 12, "height": 6,
"properties": {
"title": "Recent Errors",
"query": "SOURCE '/aws/lambda/MyFunction' | filter @message like /ERROR/ | limit 20",
"region": "us-east-1"
}
}
]
}
EOF
aws cloudwatch put-dashboard \
--dashboard-name MyAppDashboard \
--dashboard-body file://dashboard.json
CLI Reference​
Metrics Commands​
| Command | Description |
|---|---|
aws cloudwatch put-metric-data | Publish custom metrics |
aws cloudwatch get-metric-data | Retrieve metric values |
aws cloudwatch get-metric-statistics | Get aggregated statistics |
aws cloudwatch list-metrics | List available metrics |
Alarms Commands​
| Command | Description |
|---|---|
aws cloudwatch put-metric-alarm | Create or update alarm |
aws cloudwatch describe-alarms | List alarms |
aws cloudwatch set-alarm-state | Manually set alarm state |
aws cloudwatch delete-alarms | Delete alarms |
Logs Commands​
| Command | Description |
|---|---|
aws logs create-log-group | Create log group |
aws logs put-log-events | Write log events |
aws logs filter-log-events | Search log events |
aws logs start-query | Start Insights query |
aws logs put-metric-filter | Create metric filter |
aws logs put-retention-policy | Set log retention |
Best Practices​
Metrics​
- Use dimensions wisely — too many creates metric explosion
- Aggregate before publishing — batch custom metrics
- Use high-resolution metrics (1-second) only when needed
- Set meaningful units for custom metrics
Alarms​
- Use composite alarms for complex conditions
- Set appropriate evaluation periods to avoid flapping
- Include OK actions to track recovery
- Use anomaly detection for dynamic thresholds
Logs​
- Set retention policies — don't keep logs forever
- Use structured logging (JSON) for better querying
- Create metric filters for key events
- Use Contributor Insights for top-N analysis
Cost Optimization​
- Delete unused dashboards
- Reduce log retention for non-critical logs
- Avoid high-resolution metrics unless necessary
Troubleshooting​
Missing Metrics​
Causes:
- Service not publishing yet (wait 1-5 minutes)
- Wrong namespace/dimensions
- Detailed monitoring not enabled (EC2)
Debug:
# List metrics for a namespace
aws cloudwatch list-metrics \
--namespace AWS/Lambda \
--dimensions Name=FunctionName,Value=MyFunction
Alarm Stuck in INSUFFICIENT_DATA​
Causes:
- Metric not being published
- Dimensions mismatch
- Evaluation period too short
Debug:
# Check if metric has data
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Invocations \
--dimensions Name=FunctionName,Value=MyFunction \
--start-time $(date -d '1 hour ago' -u +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 60 \
--statistics Sum
Log Events Not Appearing​
Causes:
- IAM permissions missing
- CloudWatch Logs agent not running
- Log group doesn't exist
Debug:
# Check log streams
aws logs describe-log-streams \
--log-group-name /aws/lambda/MyFunction \
--order-by LastEventTime \
--descending \
--limit 5