
Autoscaling Policy Designer

v1.0.0

Design autoscaling policies based on traffic patterns, cost constraints, and performance SLOs


Install

openclaw skills install autoscaling-policy-designer

Autoscaling Policy Designer

Design autoscaling policies that balance performance, cost, and reliability. This skill teaches an AI agent to analyze historical traffic patterns, recommend scaling thresholds, configure Kubernetes HPA/KEDA or cloud-native autoscalers, simulate behavior under load, and model the cost impact of different scaling strategies.

Use when: "design autoscaling", "scaling policy", "HPA configuration", "KEDA setup", "scale to zero", "autoscaling thresholds", "scaling costs", "traffic spike handling", "over-provisioned", "under-provisioned"

Commands

1. analyze -- Study traffic patterns

Before designing a policy, understand the workload. Collect metrics, identify patterns, and classify the traffic shape.

Step 1: Collect historical utilization data

# Kubernetes: Get CPU/memory utilization over 7 days from Prometheus
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=avg(rate(container_cpu_usage_seconds_total{namespace="production",pod=~"api-.*"}[5m])) by (pod)' \
  --data-urlencode "start=$(date -d '7 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' | python3 -c "
import json, sys

data = json.load(sys.stdin)
for series in data['data']['result']:
    pod = series['metric'].get('pod', 'aggregate')
    values = sorted(float(v[1]) for v in series['values'])
    print(f'{pod}:')
    print(f'  min:  {values[0]:.3f} cores')
    print(f'  avg:  {sum(values)/len(values):.3f} cores')
    print(f'  max:  {values[-1]:.3f} cores')
    print(f'  p95:  {values[int(len(values)*0.95)]:.3f} cores')
    print(f'  p99:  {values[int(len(values)*0.99)]:.3f} cores')
"

# AWS: Get CloudWatch CPU utilization for an ASG
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value="$ASG_NAME" \
  --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 3600 \
  --statistics Average Maximum \
  --output json | python3 -c "
import json, sys
data = json.load(sys.stdin)
points = sorted(data['Datapoints'], key=lambda x: x['Timestamp'])
for p in points:
    print(f'{p[\"Timestamp\"]:>25}  avg={p[\"Average\"]:5.1f}%  max={p[\"Maximum\"]:5.1f}%')
"

Step 2: Identify the traffic pattern class

Classify the workload as DAILY_CYCLE, SPIKE, WEEKLY_CYCLE, or STEADY_STATE, because each pattern calls for a different scaling strategy:

import json
from collections import defaultdict
from datetime import datetime

def classify_traffic(timestamps_values):
    """Classify traffic into a pattern type based on 7 days of hourly data."""
    by_hour = defaultdict(list)
    by_weekday = defaultdict(list)

    for ts, val in timestamps_values:
        dt = datetime.fromtimestamp(float(ts))
        by_hour[dt.hour].append(float(val))
        by_weekday[dt.weekday()].append(float(val))

    hourly_avgs = {h: sum(v)/len(v) for h, v in by_hour.items()}
    weekday_avgs = {d: sum(v)/len(v) for d, v in by_weekday.items()}

    peak_hour = max(hourly_avgs, key=hourly_avgs.get)
    trough_hour = min(hourly_avgs, key=hourly_avgs.get)
    peak_to_trough = hourly_avgs[peak_hour] / max(hourly_avgs[trough_hour], 0.001)

    weekday_avg = sum(weekday_avgs.get(d, 0) for d in range(5)) / 5
    weekend_avg = sum(weekday_avgs.get(d, 0) for d in range(5, 7)) / 2

    all_values = [v for _, v in timestamps_values]
    max_val = max(float(v) for v in all_values)
    avg_val = sum(float(v) for v in all_values) / len(all_values)
    spike_ratio = max_val / max(avg_val, 0.001)

    pattern = {
        "peak_hour": f"{peak_hour}:00",
        "trough_hour": f"{trough_hour}:00",
        "peak_to_trough_ratio": round(peak_to_trough, 1),
        "weekday_vs_weekend_ratio": round(weekday_avg / max(weekend_avg, 0.001), 1),
        "spike_ratio": round(spike_ratio, 1),
    }

    if peak_to_trough > 3:
        pattern["type"] = "DAILY_CYCLE"
        pattern["strategy"] = "Predictive scaling + reactive HPA. Pre-warm before peak hours."
    elif spike_ratio > 5:
        pattern["type"] = "SPIKE"
        pattern["strategy"] = "Aggressive scale-up (short stabilization window), conservative scale-down."
    elif weekday_avg / max(weekend_avg, 0.001) > 2:
        pattern["type"] = "WEEKLY_CYCLE"
        pattern["strategy"] = "Scheduled scaling for weekday/weekend transitions + HPA for within-day variation."
    else:
        pattern["type"] = "STEADY_STATE"
        pattern["strategy"] = "Simple target-tracking policy. Right-size the baseline."

    return pattern

# Example: parse Prometheus query_range output
# result = classify_traffic(data['data']['result'][0]['values'])
# print(json.dumps(result, indent=2))
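To sanity-check the classifier without a live Prometheus, feed it synthetic samples. This sketch (all values hypothetical) builds seven days of hourly data with a strong day/night cycle, which should classify as DAILY_CYCLE:

import json, math, time

# Seven days of hourly samples following a sinusoidal daily cycle
base = time.time() - 7 * 24 * 3600
samples = []
for i in range(7 * 24):
    hour = i % 24
    load = 0.2 + 0.8 * max(0.0, math.sin((hour - 6) * math.pi / 12))
    samples.append((base + i * 3600, load))

print(json.dumps(classify_traffic(samples), indent=2))
# Expect type=DAILY_CYCLE with a peak-to-trough ratio around 5x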

Step 3: Analyze request-level metrics (for RPS-based scaling)

# Get requests per second over 7 days
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=sum(rate(http_requests_total{namespace="production",service="api"}[5m]))' \
  --data-urlencode "start=$(date -d '7 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' | python3 -c "
import json, math, sys
data = json.load(sys.stdin)
values = [(float(v[0]), float(v[1])) for v in data['data']['result'][0]['values']]
rps_values = [v for _, v in values]
print(f'RPS over 7 days:')
print(f'  min:  {min(rps_values):.0f} rps')
print(f'  avg:  {sum(rps_values)/len(rps_values):.0f} rps')
print(f'  max:  {max(rps_values):.0f} rps')
print(f'  p99:  {sorted(rps_values)[int(len(rps_values)*0.99)]:.0f} rps')
print(f'  Capacity per pod (from load tests): ~200 rps')
print(f'  Min pods needed at peak: {math.ceil(max(rps_values)/200)}')
print(f'  Min pods needed at trough: {max(1, int(min(rps_values)/200))}')
"

# Get response latency percentiles to determine SLO baseline
curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="api"}[5m])) by (le))' | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
p99 = float(data['data']['result'][0]['value'][1])
print(f'Current p99 latency: {p99*1000:.0f}ms')
if p99 < 0.2:
    print('SLO headroom: GOOD (p99 < 200ms)')
elif p99 < 0.5:
    print('SLO headroom: TIGHT (p99 200-500ms)')
else:
    print('SLO headroom: CRITICAL (p99 > 500ms, scaling may be needed now)')
"

Report template

## Traffic Pattern Analysis

**Service:** api-service
**Period:** YYYY-MM-DD to YYYY-MM-DD (7 days)
**Data source:** Prometheus

### Utilization Summary
- CPU: avg 0.35 cores, p95 1.2 cores, max 2.1 cores
- Memory: avg 512MB, p95 780MB, max 1.1GB
- RPS: avg 450, p95 1,200, max 2,800

### Pattern Classification
- **Type:** DAILY_CYCLE
- **Peak hours:** 09:00-17:00 UTC
- **Trough hours:** 02:00-06:00 UTC
- **Peak-to-trough ratio:** 4.2x
- **Weekend reduction:** 60% lower than weekday

### Scaling Implications
- Minimum pods needed at trough: 3
- Minimum pods needed at peak: 14
- Currently running: 10 (fixed) -- overprovisioned at night, tight at peak
- Recommended strategy: Predictive scaling + reactive HPA

2. design -- Create a scaling policy

Based on the traffic analysis, generate a concrete autoscaler configuration.

Step 1: Kubernetes HPA (resource-based)

# Standard HPA for daily-cycle workloads
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3           # Floor: handles trough traffic + one pod failure
  maxReplicas: 25          # Ceiling: cost cap
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # React to traffic in 1 min
      policies:
        - type: Percent
          value: 100                    # Can double capacity per minute
          periodSeconds: 60
        - type: Pods
          value: 4                      # Or 4 pods per minute (wins at low counts)
          periodSeconds: 60
      selectPolicy: Max                 # Use whichever adds more pods
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25                     # Remove at most 25% per 2 min
          periodSeconds: 120
      selectPolicy: Min                 # Use whichever removes fewer pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65        # Target 65% -- headroom for spikes
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75

Step 2: KEDA (event-driven scaling)

For workloads that should scale based on queue depth, RPS, or custom metrics.

# KEDA ScaledObject for a queue-processing worker
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: worker
  pollingInterval: 15
  cooldownPeriod: 120
  minReplicaCount: 0        # Scale to zero when queue is empty
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        # In production, inject the AMQP URI via a TriggerAuthentication
        # secret reference instead of inline credentials
        host: amqp://user:pass@rabbitmq.production:5672/
        queueName: jobs
        queueLength: "10"    # 1 pod per 10 queued messages
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{service="api"}[2m]))
        threshold: "100"     # 1 pod per 100 rps
        activationThreshold: "5"  # Don't scale from zero until 5 rps
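With multiple triggers, KEDA exposes each one as an external metric and the underlying HPA scales to whichever demands the most replicas, roughly ceil(metric / threshold) per trigger. A sketch with hypothetical live values:

import math

# Each trigger contributes desired = ceil(metric_value / threshold);
# the HPA takes the max across triggers (values below are hypothetical)
queue_depth, per_pod_queue = 180, 10   # rabbitmq: queueLength "10"
rps, per_pod_rps = 420, 100            # prometheus: threshold "100"

desired = max(math.ceil(queue_depth / per_pod_queue),
              math.ceil(rps / per_pod_rps))
print(f"desired replicas: {desired}")  # max(18, 5) = 18, capped at 50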

Step 3: AWS Auto Scaling Group policy

# Create a target-tracking scaling policy for an ASG
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name "$ASG_NAME" \
  --policy-name "cpu-target-tracking" \
  --policy-type TargetTrackingScaling \
  --estimated-instance-warmup 300 \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 65.0
  }'

# Add a scheduled scaling action for known daily pattern
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name "$ASG_NAME" \
  --scheduled-action-name "morning-scaleup" \
  --recurrence "0 8 * * MON-FRI" \
  --min-size 6 \
  --desired-capacity 8

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name "$ASG_NAME" \
  --scheduled-action-name "evening-scaledown" \
  --recurrence "0 20 * * *" \
  --min-size 2 \
  --desired-capacity 3

Step 4: Validate the design

# Check current HPA status
kubectl get hpa -n production -o wide

# Verify HPA can read the metrics it needs
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/production/pods" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for pod in data['items']:
    name = pod['metadata']['name']
    for c in pod['containers']:
        cpu = c['usage']['cpu']
        mem = c['usage']['memory']
        print(f'{name}: cpu={cpu}, mem={mem}')
"

# Check if custom metrics API is available (needed for RPS-based scaling)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" 2>/dev/null && echo "Custom metrics API available" || echo "Custom metrics API NOT available -- install prometheus-adapter"
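Once the HPA exists, its status conditions (AbleToScale, ScalingActive, ScalingLimited in autoscaling/v2) confirm whether it can actually act. A small parser for the kubectl JSON output:

import json, sys

# Reads `kubectl get hpa api-hpa -n production -o json` from stdin
hpa = json.load(sys.stdin)
for cond in hpa.get('status', {}).get('conditions', []):
    # ScalingLimited is the one condition where status=True is bad
    healthy = (cond['status'] == 'True') != (cond['type'] == 'ScalingLimited')
    marker = 'OK ' if healthy else 'WARN'
    print(f"[{marker}] {cond['type']}: {cond.get('reason', '')} {cond.get('message', '')}")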

3. simulate -- Model behavior under load

Before deploying a scaling policy, simulate how it would react to different traffic scenarios.

Step 1: Replay historical traffic against the proposed policy

import math

def simulate_hpa(traffic_rps, capacity_per_pod, target_utilization,
                 min_replicas, max_replicas, scaleup_window_s, scaledown_window_s,
                 interval_s=60):
    """Simulate HPA behavior over a traffic timeline."""
    current_replicas = min_replicas
    history = []
    scaleup_cooldown = 0
    scaledown_cooldown = 0

    for i, rps in enumerate(traffic_rps):
        timestamp_min = i * interval_s // 60
        total_capacity = current_replicas * capacity_per_pod
        utilization = rps / max(total_capacity, 1)

        desired = max(min_replicas, min(max_replicas,
                      int(rps / (capacity_per_pod * target_utilization)) + 1))

        if desired > current_replicas and scaleup_cooldown <= 0:
            # Scale up: can double at most
            scale_to = min(desired, current_replicas * 2, max_replicas)
            current_replicas = scale_to
            scaleup_cooldown = scaleup_window_s // interval_s
            event = "SCALE UP"
        elif desired < current_replicas and scaledown_cooldown <= 0:
            # Scale down: remove at most 25%
            scale_to = max(desired, int(current_replicas * 0.75), min_replicas)
            current_replicas = scale_to
            scaledown_cooldown = scaledown_window_s // interval_s
            event = "SCALE DOWN"
        else:
            event = ""

        scaleup_cooldown = max(0, scaleup_cooldown - 1)
        scaledown_cooldown = max(0, scaledown_cooldown - 1)

        slo_ok = utilization < 0.85  # SLO: stay under 85% utilization
        history.append({
            "minute": timestamp_min,
            "rps": rps,
            "replicas": current_replicas,
            "utilization": round(utilization * 100, 1),
            "slo_ok": slo_ok,
            "event": event
        })

    return history

# Scenario 1: Normal daily cycle (24 hours, 1-min intervals)
daily_traffic = [int(200 + 800 * max(0, math.sin((h - 6) * math.pi / 12)))
                 for h in range(24) for _ in range(60)]

result = simulate_hpa(
    traffic_rps=daily_traffic,
    capacity_per_pod=200,
    target_utilization=0.65,
    min_replicas=3,
    max_replicas=25,
    scaleup_window_s=60,
    scaledown_window_s=300
)

slo_violations = sum(1 for r in result if not r['slo_ok'])
max_replicas_used = max(r['replicas'] for r in result)
print(f"Daily cycle simulation:")
print(f"  SLO violations: {slo_violations} / {len(result)} minutes ({slo_violations/len(result)*100:.1f}%)")
print(f"  Max replicas used: {max_replicas_used}")
print(f"  Scale events: {sum(1 for r in result if r['event'])}")

Step 2: Simulate a traffic spike

# Scenario 2: 10x traffic spike lasting 15 minutes
spike_traffic = [300] * 60 + [3000] * 15 + [300] * 60  # baseline, spike, recovery

result = simulate_hpa(
    traffic_rps=spike_traffic,
    capacity_per_pod=200,
    target_utilization=0.65,
    min_replicas=3,
    max_replicas=25,
    scaleup_window_s=60,
    scaledown_window_s=300
)

# Find how long until capacity catches up
spike_start = 60
for r in result[spike_start:]:
    if r['utilization'] < 85:
        catch_up_min = r['minute'] - spike_start
        print(f"Capacity caught up in {catch_up_min} minutes after spike start")
        break
else:
    print("WARNING: Capacity never caught up during spike")

slo_violations_during_spike = sum(1 for r in result[60:75] if not r['slo_ok'])
print(f"SLO violations during spike: {slo_violations_during_spike} / 15 minutes")

Step 3: Check for flapping

# Scenario 3: Oscillating traffic (tests stabilization windows)
import random
oscillating = [500 + 300 * (1 if i % 6 < 3 else -1) + random.randint(-50, 50)
               for i in range(120)]

result = simulate_hpa(
    traffic_rps=oscillating,
    capacity_per_pod=200,
    target_utilization=0.65,
    min_replicas=3,
    max_replicas=25,
    scaleup_window_s=60,
    scaledown_window_s=300
)

scale_events = [r for r in result if r['event']]
print(f"Oscillation test: {len(scale_events)} scale events in {len(result)} minutes")
if len(scale_events) > 20:
    print("WARNING: Possible flapping. Increase stabilization windows.")
else:
    print("OK: Scaling is stable under oscillating load.")

4. cost -- Project scaling costs

Model the monthly cost of the autoscaling policy versus alternatives.

Step 1: Calculate cost for different strategies

import json

def model_monthly_cost(
    strategy,
    min_pods, max_pods,
    cpu_per_pod, mem_gb_per_pod,
    cpu_cost_hr, mem_cost_hr_gb,
    peak_hours_per_day=8,
    avg_pods_at_peak=None,
    avg_pods_off_peak=None
):
    """Model monthly cost of a scaling strategy."""
    hours_per_month = 730  # 8760 hours / 12 months

    if strategy == "fixed_at_peak":
        pods = max_pods
        cost = pods * hours_per_month * (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb)
        return {"strategy": strategy, "monthly_cost": round(cost, 2), "avg_pods": pods}

    elif strategy == "fixed_at_average":
        pods = (min_pods + max_pods) // 2
        cost = pods * hours_per_month * (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb)
        return {"strategy": strategy, "monthly_cost": round(cost, 2), "avg_pods": pods,
                "risk": "Under-provisioned at peak, SLO violations likely"}

    elif strategy == "autoscaled":
        peak_hours = peak_hours_per_day * 30.4
        off_peak_hours = hours_per_month - peak_hours
        peak_pods = avg_pods_at_peak or int(max_pods * 0.7)
        off_peak_pods = avg_pods_off_peak or min_pods
        cost = ((peak_pods * peak_hours + off_peak_pods * off_peak_hours) *
                (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb))
        return {"strategy": strategy, "monthly_cost": round(cost, 2),
                "avg_pods_peak": peak_pods, "avg_pods_off_peak": off_peak_pods}

    elif strategy == "scale_to_zero":
        # For batch/worker: assume active only when queue has items
        active_hours = peak_hours_per_day * 30.4
        avg_pods = avg_pods_at_peak or max_pods // 2
        cost = avg_pods * active_hours * (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb)
        return {"strategy": strategy, "monthly_cost": round(cost, 2),
                "active_hours_per_month": round(active_hours, 0)}

# Compare strategies
params = dict(min_pods=3, max_pods=20, cpu_per_pod=0.5, mem_gb_per_pod=1.0,
              cpu_cost_hr=0.048, mem_cost_hr_gb=0.006, peak_hours_per_day=8,
              avg_pods_at_peak=14, avg_pods_off_peak=3)

strategies = ["fixed_at_peak", "fixed_at_average", "autoscaled", "scale_to_zero"]
results = []
for s in strategies:
    results.append(model_monthly_cost(strategy=s, **params))

baseline = results[0]["monthly_cost"]
print(f"{'Strategy':<20} {'Monthly Cost':>12} {'vs Fixed Peak':>14}")
print("-" * 48)
for r in results:
    savings = (1 - r["monthly_cost"] / baseline) * 100
    print(f"{r['strategy']:<20} ${r['monthly_cost']:>10.2f} {savings:>+12.1f}%")

Step 2: Factor in spot/preemptible instances

# AWS: Compare on-demand vs spot pricing for the instance type
aws ec2 describe-spot-price-history \
  --instance-types m5.large \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)" \
  --query 'SpotPriceHistory[*].{AZ:AvailabilityZone,Price:SpotPrice,Time:Timestamp}' \
  --output table

# GKE: Check if node pool supports spot VMs
gcloud container node-pools describe "$NODE_POOL" \
  --cluster "$CLUSTER" --zone "$ZONE" \
  --format="value(config.spot)"
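To quantify the "spot above the baseline" recommendation in the report below, a blended-cost sketch (prices and pod counts are hypothetical placeholders):

def blended_hourly_cost(total_pods, baseline_pods, ondemand_hr, spot_hr,
                        spot_fraction=0.5):
    """Hourly cost when a fraction of pods above the baseline run on spot."""
    burst = max(0, total_pods - baseline_pods)
    ondemand_pods = baseline_pods + burst * (1 - spot_fraction)
    return ondemand_pods * ondemand_hr + burst * spot_fraction * spot_hr

# 14 pods at peak, 3-pod on-demand baseline, $0.054 vs $0.018 per hour
full  = blended_hourly_cost(14, 3, 0.054, 0.018, spot_fraction=0)
mixed = blended_hourly_cost(14, 3, 0.054, 0.018, spot_fraction=0.5)
print(f"peak hour: ${full:.3f} all on-demand vs ${mixed:.3f} blended "
      f"({(1 - mixed / full) * 100:.0f}% cheaper)")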

Report template

## Autoscaling Cost Projection

**Service:** api-service
**Instance type:** m5.large (2 vCPU, 8GB RAM)
**Region:** us-east-1

### Strategy Comparison (monthly)
| Strategy | Monthly Cost | Savings vs Fixed | Risk |
|----------|-------------|-----------------|------|
| Fixed at peak (20 pods) | $1,576.80 | baseline | None (over-provisioned) |
| Fixed at average (11 pods) | $867.24 | -45.0% | SLO violations at peak |
| Autoscaled (3-20 pods) | $623.88 | -60.4% | 1-2 min lag on spikes |
| Scale-to-zero + autoscale | $412.32 | -73.8% | Cold start latency |

### Recommended: Autoscaled (3-20 pods)
- Estimated savings: $952.92/month ($11,435/year) vs fixed-at-peak
- SLO risk: Minimal (simulation shows 0.3% violation rate)
- Cold start: N/A (min 3 pods always warm)

### Spot instance opportunity
- Current on-demand cost per pod: $0.054/hr
- Current spot price: $0.018/hr (67% discount)
- If 50% of scale-out pods use spot: additional $156/month savings
- Recommendation: Use spot for pods above minReplicas, on-demand for baseline

Version tags

latest: vk9727pgjhsn3asvjm00jfrs3nd85r8jt