{"skill":{"slug":"aws-ecs-monitor","displayName":"aws-ecs-monitor","summary":"AWS ECS production health monitoring with CloudWatch log analysis — monitors ECS service health, ALB targets, SSL certificates, and provides deep CloudWatch...","description":"---\nname: aws-ecs-monitor\nversion: 1.0.1\ndescription: AWS ECS production health monitoring with CloudWatch log analysis — monitors ECS service health, ALB targets, SSL certificates, and provides deep CloudWatch log analysis for error categorization, restart detection, and production alerts.\nmetadata:\n  openclaw:\n    requires:\n      bins: [\"aws\", \"curl\", \"python3\"]\n      anyBins: [\"openssl\"]\n---\n\n# AWS ECS Monitor\n\nProduction health monitoring and log analysis for AWS ECS services.\n\n## What It Does\n\n- **Health Checks**: HTTP probes against your domain, ECS service status (desired vs running), ALB target group health, SSL certificate expiry\n- **Log Analysis**: Pulls CloudWatch logs, categorizes errors (panics, fatals, OOM, timeouts, 5xx), detects container restarts, filters health check noise\n- **Auto-Diagnosis**: Reads health status and automatically investigates failing services via log analysis\n\n## Prerequisites\n\n- `aws` CLI configured with appropriate IAM permissions:\n  - `ecs:ListServices`, `ecs:DescribeServices`\n  - `elasticloadbalancing:DescribeTargetGroups`, `elasticloadbalancing:DescribeTargetHealth`\n  - `logs:FilterLogEvents`, `logs:DescribeLogGroups`\n- `curl` for HTTP health checks\n- `python3` for JSON processing and log analysis\n- `openssl` for SSL certificate checks (optional)\n\n## Configuration\n\nAll configuration is via environment variables:\n\n| Variable | Required | Default | Description |\n|---|---|---|---|\n| `ECS_CLUSTER` | **Yes** | — | ECS cluster name |\n| `ECS_REGION` | No | `us-east-1` | AWS region |\n| `ECS_DOMAIN` | No | — | Domain for HTTP/SSL checks (skip if unset) |\n| `ECS_SERVICES` | No | auto-detect | Comma-separated service names to monitor |\n| `ECS_HEALTH_STATE` | No | `./data/ecs-health.json` | Path to write health state JSON |\n| `ECS_HEALTH_OUTDIR` | No | `./data/` | Output directory for logs and alerts |\n| `ECS_LOG_PATTERN` | No | `/ecs/{service}` | CloudWatch log group pattern (`{service}` is replaced) |\n| `ECS_HTTP_ENDPOINTS` | No | — | Comma-separated `name=url` pairs for HTTP probes |\n\n## Directories Written\n\n- **`ECS_HEALTH_STATE`** (default: `./data/ecs-health.json`) — Health state JSON file\n- **`ECS_HEALTH_OUTDIR`** (default: `./data/`) — Output directory for logs, alerts, and analysis reports\n\n## Scripts\n\n### `scripts/ecs-health.sh` — Health Monitor\n\n```bash\n# Full check\nECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh\n\n# JSON output only\nECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --json\n\n# Quiet mode (no alerts, just status file)\nECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet\n```\n\nExit codes: `0` = healthy, `1` = unhealthy/degraded, `2` = script error\n\n### `scripts/cloudwatch-logs.sh` — Log Analyzer\n\n```bash\n# Pull raw logs from a service\nECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh pull my-api --minutes 30\n\n# Show errors across all services\nECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh errors all --minutes 120\n\n# Deep analysis with error categorization\nECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose --minutes 60\n\n# Detect container restarts\nECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh restarts my-api\n\n# Auto-diagnose from health state file\nECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh auto-diagnose\n\n# Summary across all services\nECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120\n```\n\nOptions: `--minutes N` (default: 60), `--json`, `--limit N` (default: 200), `--verbose`\n\n## Auto-Detection\n\nWhen `ECS_SERVICES` is not set, both scripts auto-detect services from the cluster:\n\n```bash\naws ecs list-services --cluster $ECS_CLUSTER\n```\n\nLog groups are resolved by pattern (default `/ecs/{service}`). Override with `ECS_LOG_PATTERN`:\n\n```bash\n# If your log groups are /ecs/prod/my-api, /ecs/prod/my-frontend, etc.\nECS_LOG_PATTERN=\"/ecs/prod/{service}\" ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose\n```\n\n## Integration\n\nThe health monitor can trigger the log analyzer for auto-diagnosis when issues are detected. Set `ECS_HEALTH_OUTDIR` to a shared directory and run both scripts together:\n\n```bash\nexport ECS_CLUSTER=my-cluster\nexport ECS_DOMAIN=example.com\nexport ECS_HEALTH_OUTDIR=./data\n\n# Run health check (auto-triggers log analysis on failure)\n./scripts/ecs-health.sh\n\n# Or run log analysis independently\n./scripts/cloudwatch-logs.sh auto-diagnose --minutes 30\n```\n\n## Error Categories\n\nThe log analyzer classifies errors into:\n\n- `panic` — Go panics\n- `fatal` — Fatal errors\n- `oom` — Out of memory\n- `timeout` — Connection/request timeouts\n- `connection_error` — Connection refused/reset\n- `http_5xx` — HTTP 500-level responses\n- `python_traceback` — Python tracebacks\n- `exception` — Generic exceptions\n- `auth_error` — Permission/authorization failures\n- `structured_error` — JSON-structured error logs\n- `error` — Generic ERROR-level messages\n\nHealth check noise (GET/HEAD `/health` from ALB) is automatically filtered from error counts and HTTP status distribution.\n","topics":["Log Analysis","Health"],"tags":{"latest":"1.0.1"},"stats":{"comments":0,"downloads":2608,"installsAllTime":98,"installsCurrent":7,"stars":0,"versions":2},"createdAt":1770066711366,"updatedAt":1778987949777},"latestVersion":{"version":"1.0.1","createdAt":1771529230156,"changelog":"Fix security scan flags: declare runtime dependencies, document env vars and write paths","license":null},"metadata":{"setup":[],"os":null,"systems":null},"owner":{"handle":"briancolinger","userId":"s1703pbrqfdv5rjpy9mp3x44ah83qysr","displayName":"Brian Colinger","image":"https://avatars.githubusercontent.com/u/8125655?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1779941191688}}