{"skill":{"slug":"k8s-cost-optimizer","displayName":"K8s Cost Optimizer","summary":"Find and rank Kubernetes cost-saving opportunities from kubectl, metrics-server, kube-state-metrics, and cloud billing. Identifies overprovisioned CPU/memory...","description":"---\nname: k8s-cost-optimizer\ndescription: Find and rank Kubernetes cost-saving opportunities from kubectl, metrics-server, kube-state-metrics, and cloud billing. Identifies overprovisioned CPU/memory requests and limits, idle namespaces and workloads, oversized PersistentVolumes, unused LoadBalancer services, expensive node types, missing HorizontalPodAutoscalers, and clusters that haven't adopted spot/preemptible/Graviton nodes. Outputs a ranked list of recommendations with $/month savings estimates and ready-to-apply YAML patches. Covers EKS, GKE, and AKS specifics including instance pricing, savings plans, committed-use discounts, and reservation strategies. Use when asked to cut a Kubernetes cloud bill, right-size workloads, plan a spot migration, build a FinOps report, or tune HPA settings. Triggers on \"kubernetes cost\", \"k8s cost\", \"eks cost\", \"gke cost\", \"aks cost\", \"right-size\", \"rightsize\", \"kubecost\", \"opencost\", \"vpa\", \"hpa\", \"spot instances\", \"preemptible\", \"savings plan\", \"node pool\", \"pod requests\", \"finops\".\nmetadata:\n  tags: [\"kubernetes\", \"k8s\", \"eks\", \"gke\", \"aks\", \"finops\", \"cost-optimization\", \"rightsizing\", \"autoscaling\", \"spot-instances\", \"kubecost\", \"opencost\", \"platform-engineering\", \"devops\"]\n---\n\n# Kubernetes Cost Optimizer\n\nAudit a Kubernetes cluster (or fleet) and produce a ranked list of cost-saving actions with concrete dollar estimates. Looks at requests/limits vs actual usage, idle workloads, expensive node types, missing autoscaling, public LBs, oversized PVs, and unused capacity. 
Acts as a senior FinOps engineer who has cut six- and seven-figure cloud bills without breaking workloads.\n\n## Usage\n\nInvoke this skill when a Kubernetes bill is too high, when a quarterly FinOps review is due, or when leadership has asked for \"30% off the cloud.\"\n\n**Basic invocation:**\n> Audit my EKS cluster for cost savings\n> Cut my GKE bill — here's kubectl top + node list\n> What's the highest-ROI optimization I can ship this week?\n\n**With context:**\n> Here's metrics-server data for 30 days, the node list, and the AWS bill\n> I have 14 namespaces — which ones are idle?\n> We're 100% on-demand m5 nodes — what's the spot migration plan?\n\nThe agent produces a ranked recommendation list (ordered by savings, safety, and effort; see Step 3), per-recommendation YAML patches or commands, and a four-week implementation plan that respects production safety.\n\n## How It Works\n\n### Step 1: Data Collection\n\nCost optimization without data is guesswork. The agent collects from four sources and joins them:\n\n| Source | What It Provides | How To Pull |\n|--------|------------------|-------------|\n| **kubectl + metrics-server** | Real CPU/memory usage per pod, per node | `kubectl top pods -A`, `kubectl top nodes` |\n| **kube-state-metrics / Prometheus** | Requests, limits, replicas, deployment-level history | PromQL: `kube_pod_container_resource_requests`, 30-day window |\n| **Cloud billing** | $/node-hour, instance type, region, sustained-use | AWS Cost Explorer, GCP billing export, Azure Cost Management |\n| **Cluster object inventory** | Namespaces, services, PVCs, ingress, jobs, cronjobs | `kubectl get all,pvc,ingress -A -o json` |\n\nData **window** matters. The agent prefers 30 days; 7 days for fast-moving clusters; 90 days for capacity planning. Anything under 7 days is too short: it cannot cover a full weekly cycle, so diurnal and weekday/weekend swings are indistinguishable from noise.\n\nIf Kubecost or OpenCost is installed, the agent uses the cluster's per-namespace cost allocation directly. 
Otherwise it computes allocations from node price × pod-share-of-node.\n\n### Step 2: The Cost Recommendation Catalog\n\nThe agent runs the cluster against a fixed set of recommendation **types**, each with a detection rule and a savings formula.\n\n**C1. Overprovisioned CPU requests**\n\n```\nDetection:\n  for each container,\n    p99(cpu_usage over 30d) < 0.50 * cpu_request\n    AND container has >7 days of data\n    AND deployment is not a known-bursty type (cron, batch, init)\n\nSavings estimate:\n  ($/cpu-hour for the node pool) × (request - p99usage) × 24 × 30 × replicas\n\nAction:\n  patch container.resources.requests.cpu down to ceil(p95 × 1.3)\n```\n\n**C2. Overprovisioned memory requests**\n\n```\nDetection:\n  p99(memory_working_set over 30d) < 0.50 * memory_request\n\nSavings:\n  ($/GiB-hour for the node pool) × (request - p99usage) × 24 × 30 × replicas\n\nAction:\n  patch container.resources.requests.memory down to ceil(p99 × 1.25)\n  NOTE: never set requests below working-set-p99 — OOMKills kill the savings\n```\n\n**C3. Limits == requests (no burst)**\n\n```\nDetection:\n  cpu_limit == cpu_request for stateless workloads\n  (typical anti-pattern: \"treat limits as guaranteed quota\")\n\nSavings:\n  None directly — but C1 dominates after limits are unblocked\n\nAction:\n  raise limits or remove (for cpu); keep limits for memory\n```\n\n**C4. Idle namespace**\n\n```\nDetection:\n  sum(p95 cpu over 30d) across all pods in ns < 0.05 cores\n  AND sum(p95 memory) < 200 MiB\n  AND no recent kubectl apply (last_modified > 30 days)\n\nSavings:\n  All allocated capacity (request × node $)\n\nAction:\n  warn → tag → archive (Helm release deleted, namespace archived)\n```\n\n**C5. 
Idle deployment / statefulset**\n\n```\nDetection:\n  replicas > 0 AND p99(cpu) < 0.02 cores AND request_count == 0 over 30d\n  (request_count from ingress-controller or service mesh)\n\nSavings:\n  replicas × pod_cost / month\n\nAction:\n  scale to zero (KEDA cron, or just `kubectl scale --replicas=0`)\n```\n\n**C6. Oversized PersistentVolume**\n\n```\nDetection:\n  for each PVC, kubelet_volume_stats_used / capacity < 0.3\n  AND age > 30 days\n\nSavings:\n  ($/GB-month for storage class) × (capacity - used × 1.5)\n\nAction:\n  - On EKS gp3: shrink not supported. Migrate via snapshot → smaller PV.\n  - On GKE pd-balanced: same — snapshot migration.\n  - On AKS managed-disks: same. Plan downtime.\n```\n\n**C7. Unused LoadBalancer service**\n\n```\nDetection:\n  Service type=LoadBalancer\n  AND no NetworkPolicy hits\n  AND no ingress traffic in 30d (cloud LB metrics)\n\nSavings:\n  AWS NLB:  ~$22/mo + $0.006/LCU-hr → $25-50/mo typical\n  GCP LB:   ~$18/mo per forwarding rule\n  Azure LB: ~$25/mo standard tier\n\nAction:\n  delete service or convert to ClusterIP behind a shared ingress\n```\n\n**C8. Expensive node type**\n\n```\nDetection:\n  Node pool uses x86 on a workload that's arch-independent\n  AND no GPU/specialized requirement\n  AND newer-gen / Graviton / Tau alternative is cheaper per CPU-hour\n\nSavings:\n  AWS: m5 → m7g (Graviton)  ~20% cheaper, similar perf\n  GCP: n2 → t2d (Tau AMD)   ~28% cheaper, comparable perf\n  Azure: Dsv3 → Dpdsv5 (Arm) ~20% cheaper\n\nAction:\n  add Arm/AMD node pool, taint, set tolerations on workloads,\n  recompile multi-arch images (most public images are already multi-arch)\n```\n\n**C9. 
Missing HorizontalPodAutoscaler**\n\n```\nDetection:\n  Deployment with stable replica count > 3\n  AND p95/p50 cpu ratio > 2.5x (variance)\n  AND no HPA / KEDA / Karpenter scaler attached\n\nSavings:\n  (max_replicas - avg_replicas) × pod_cost\n  typical: 30-60% of deployment's compute\n\nAction:\n  emit HPA YAML targeted at p50 of recent CPU\n```\n\n**C10. No spot / preemptible / Spot VM adoption**\n\n```\nDetection:\n  Cluster is 100% on-demand\n  AND has stateless workloads (Deployments without local volume requirements)\n  AND tolerates interruption (replicas > 1, restart-safe)\n\nSavings:\n  AWS Spot:        60-90% off on-demand\n  GCP Preemptible: 60-91% off (24h max lifetime)\n  GCP Spot VMs:    60-91% off (no time limit, lower preemption rate)\n  Azure Spot:      60-90% off (eviction subject to capacity)\n\nAction:\n  - Tag stateless workloads with affinity for spot pool\n  - Add a managed-on-demand fallback pool sized to baseline\n  - Use Karpenter (AWS) / cluster-autoscaler with multiple ASGs / Azure Spot Priority Mix\n```\n\n**C11. Missing pod disruption budgets / wrong topology spread**\n\n```\nDetection:\n  Workload running on spot but no PDB\n  OR PDB minAvailable >= replicas (always blocks eviction)\n\nSavings:\n  Indirect — wrong PDB blocks spot benefits\n\nAction:\n  set PDB minAvailable = replicas - 1 (or maxUnavailable = 25%)\n  add topologySpreadConstraints across zones\n```\n\n**C12. Stale CronJobs and Jobs**\n\n```\nDetection:\n  Job/CronJob age > 90 days, never succeeded recently\n  OR successfulJobsHistoryLimit unset (default 3 retains 3, no cleanup)\n\nSavings:\n  Small per-cluster but accumulates: PVC retention, image-pull, scheduling churn\n\nAction:\n  set ttlSecondsAfterFinished, prune old job objects, remove dead cronjobs\n```\n\n**C13. 
Image pull cost (GCR / ECR / ACR cross-region)**\n\n```\nDetection:\n  Cluster in region X pulls from registry in region Y\n  → cross-region egress charges\n\nSavings:\n  Egress cost (typically $0.02-0.09/GB) × image-size × pod-restarts\n\nAction:\n  replicate registry into cluster region\n  set imagePullPolicy: IfNotPresent\n  pre-pull common images via DaemonSet\n```\n\n**C14. Reserved capacity / Savings Plans / CUDs not purchased**\n\n```\nDetection:\n  Stable baseline > 70% of cluster running 24/7 for 90+ days\n  AND no savings plans or CUDs in cloud account\n\nSavings:\n  AWS Compute Savings Plan: 30-66% off depending on commitment\n  GCP CUD: 20-57% off (1-year and 3-year)\n  Azure Reservations: 30-72% off\n\nAction:\n  size commitment to baseline (not peak)\n  use Savings Plans (flexible) over RI (rigid) on AWS\n```\n\n**C15. Logs / metrics / traces ingestion cost**\n\n```\nDetection:\n  Cluster sends 100% of logs to managed observability (Datadog, NewRelic, Splunk)\n  AND log volume > 1 TB/month\n  AND no sampling on chatty namespaces (kube-system, ingress-controller)\n\nSavings:\n  Often the largest single line item — Datadog logs at $0.10/GB ingested adds\n  up fast; one chatty pod can cost $5,000/mo\n\nAction:\n  - Drop INFO logs from kube-system, controller-manager\n  - Sample debug logs at 1%\n  - Route to S3 + Athena for cold logs\n```\n\n### Step 3: Ranking The Recommendations\n\nRecommendations are ordered by **expected savings × implementation safety / implementation effort**. 
The agent renders this as a table:\n\n```\nRank  Type  Description                              $/mo saved  Effort  Risk\n1     C1    Right-size payments-api requests         $4,200      2 hrs   Low\n2     C10   Migrate stateless to spot (Karpenter)    $3,800      2 days  Med\n3     C8    Move backend pool to Graviton            $2,100      1 day   Low\n4     C15   Drop kube-system DEBUG logs in Datadog   $1,900      1 hr    Low\n5     C7    Delete 4 unused NLBs                     $190        30 min  Low\n6     C9    HPA on the api-gateway deployment        $850        2 hrs   Low\n7     C6    Shrink 3 oversized PVs                   $310        4 hrs   Med\n... etc\n```\n\n**Effort** is hours-of-engineering. **Risk** is Low/Med/High based on user-facing blast radius.\n\n### Step 4: Right-Sizing Methodology\n\nC1 and C2 dominate most clusters' savings. The agent applies a careful methodology:\n\n```\n1. WINDOW\n   30 days minimum. Cover at least one full release cycle.\n   Exclude windows containing incidents (skewed up) or maintenance\n   windows (skewed down).\n\n2. PERCENTILE\n   Memory: p99 working_set × 1.25 = new request\n   CPU:    p95 × 1.3            = new request\n   Why different: memory OOMKills are catastrophic, CPU throttling is\n   recoverable.\n\n3. FLOOR\n   Never below 50m CPU / 64Mi memory — startup probes need headroom.\n   For Java/JVM: use the JVM's heap-max as memory floor.\n   For Go: 2x the runtime's resident set.\n\n4. LIMITS\n   CPU limits: REMOVE for most workloads (CFS throttling causes more\n              latency than it saves money).\n   Memory limits: KEEP at request × 1.5 — OOM is a controlled failure.\n\n5. STAGED ROLLOUT\n   Apply to a canary deployment (1 of N replicas) for 48h.\n   Monitor: error rate, latency p99, restart count, OOMKill count.\n   If clean, roll out to remaining replicas.\n   If regression, revert; widen the window or raise the percentile.\n\n6. 
VPA RECOMMENDATION MODE\n   For unfamiliar workloads, run VPA with `updateMode: \"Off\"`\n   (recommendation-only) for 7+ days. Use its target as a sanity\n   check on the agent's calculation.\n```\n\n### Step 5: Cloud-Specific Notes\n\n**EKS (AWS):**\n\n- **Karpenter** beats cluster-autoscaler on EKS for cost: it provisions nodes sized to pending pods, mixes instance types, and adopts spot natively. Migrating to Karpenter is often a 15-25% saving on its own.\n- **Graviton (m7g, c7g, r7g)**: ~20% cheaper than equivalent x86. Multi-arch images are required; check `docker manifest inspect`.\n- **Spot via Karpenter**: set `karpenter.sh/capacity-type: spot` in the NodePool; add an on-demand fallback pool.\n- **EKS control-plane** is $0.10/hr per cluster — cluster consolidation is sometimes the right move (one prod, one staging, one dev rather than per-team clusters).\n- **AWS Compute Savings Plan**: covers EC2, Fargate, Lambda; flexible across instance families. Buy at baseline, not peak.\n- **EBS gp3** is cheaper than gp2 at equal IOPS — easy migration via volume modify.\n\n**GKE:**\n\n- **Autopilot** mode: no node management, billed per second for pod resource requests. 
Often cheaper for low-utilization clusters; more expensive for dense clusters.\n- **Spot VMs** preferred over old Preemptible (no 24h cap, lower preemption rate).\n- **CUDs** apply to vCPU and memory committed for 1y or 3y; flexible across instance families with the new \"flex CUD.\"\n- **T2D / Tau**: AMD-based, ~28% cheaper for general-purpose workloads.\n- **GKE control-plane** is free for one zonal cluster per project; regional clusters cost extra.\n- **Cluster Autoscaler** scales existing node pools but does not create new ones unless node auto-provisioning is enabled; otherwise pre-create spot/T2D pools with appropriate taints.\n\n**AKS:**\n\n- **AKS control-plane** is free on the Free tier (the Standard tier, which adds an uptime SLA, is billed per cluster); beyond that, only nodes are billed.\n- **Spot node pools** require explicit provisioning; eviction policy `Deallocate` keeps disks but releases compute.\n- **Reserved Instances** apply at the VM-family level (Dsv5, Esv5); commit to baseline.\n- **Arm-based Dpsv5/Dpdsv5**: ~20% cheaper; multi-arch images required.\n- **Cluster Autoscaler** is the only scaler (no Karpenter equivalent); pool consolidation is more manual.\n\n### Step 6: Safety Constraints\n\nThe agent never proposes changes that risk data loss or sustained outage. 
Specifically:\n\n| Action | Hard Constraint |\n|--------|----------------|\n| Right-size memory | Never set request below historic working-set p99 |\n| Migrate to spot | Never for: databases, message brokers, stateful sets without proper PDB, single-replica deployments |\n| Shrink PV | Never in-place; always snapshot → smaller PV with cutover window |\n| Delete idle namespace | Only after 14-day notice + tag + archive (export manifests before Helm uninstall) |\n| Migrate to Arm | Only after multi-arch image verified with `docker manifest inspect` |\n| Remove HPA | Never; only adjust |\n| Reduce replicas below PDB minAvailable | Never; adjust PDB first or keep replicas |\n\n### Step 7: Implementation Plan (4 Weeks)\n\n```\nWEEK 1 — QUICK WINS (LOW RISK, HIGH ROI)\nMon  Run audit: collect kubectl + Prom + billing data\nTue  Identify C1/C2 candidates with > $500/mo savings each\nWed  Apply right-size patches to canary replicas\nThu  Monitor canary 48h\nFri  Roll out to full deployments; baseline new bill\n\nWEEK 2 — IDLE & UNUSED\nMon  Identify idle namespaces (C4) and idle deployments (C5)\nTue  Tag with `cost-review: pending-deletion`; notify owners\nWed  Scale-to-zero on confirmed idle workloads\nThu  Delete unused LBs (C7); start staged migration of oversized PVs (C6)\nFri  Tally savings; update FinOps dashboard\n\nWEEK 3 — STRUCTURAL (NODE STRATEGY)\nMon  Add Karpenter (EKS) / Spot pool (GKE/AKS)\nTue  Add taints and tolerations on stateless workloads\nWed  Drain a small percentage to spot; observe interruption rate\nThu  Add Graviton/Arm pool; pin one workload as canary\nFri  Roll out to remaining stateless workloads\n\nWEEK 4 — COMMITMENTS & OBSERVABILITY\nMon  Compute baseline post-optimization\nTue  Purchase Savings Plan / CUDs at 70% of baseline\nWed  Tune logs/metrics ingestion (C15)\nThu  Add cost dashboards: per-namespace, per-team showback\nFri  Postmortem + savings report; schedule next quarter audit\n```\n\n### Step 8: FinOps Reporting\n\nThe agent produces a weekly / 
monthly report:\n\n```markdown\n## Cluster: prod-us-east-1\n### Savings achieved this month: $14,820 (-31%)\n\n| Category | $ saved | % of total |\n|----------|--------|-----------|\n| Right-size requests (C1+C2) | $6,420 | 43% |\n| Spot migration (C10)        | $4,100 | 28% |\n| Graviton (C8)               | $2,100 | 14% |\n| Logs ingestion (C15)        | $1,900 | 13% |\n| LB cleanup (C7)             |   $300 |  2% |\n\n### Remaining opportunities: $9,400/mo\n- HPA rollout pending on api-gateway: ~$850/mo\n- Savings Plan not yet purchased: ~$8,000/mo (commit 1-year)\n- Oversized PVs not yet migrated: ~$310/mo\n\n### Risks accepted\n- Spot interruption rate: 3.2%/day (within budget of 5%)\n- Right-sized memory on cart-svc near p99 — monitoring OOMKills\n```\n\n## Worked Examples\n\n### Example 1: Right-Size A Java API\n\n**Input:**\n\n```\nDeployment: payments-api\nReplicas: 12\nContainer: payments\n  resources.requests.cpu: 2\n  resources.requests.memory: 4Gi\n  resources.limits.cpu: 2\n  resources.limits.memory: 4Gi\nMetrics (30d):\n  cpu_usage    p50: 0.18  p95: 0.31  p99: 0.52\n  memory_used  p50: 1.8Gi p95: 2.1Gi p99: 2.4Gi\nNode pool: m5.2xlarge on-demand, ~$0.384/hr/instance\n```\n\n**Recommendation:**\n\n```yaml\nresources:\n  requests:\n    cpu: 500m       # was 2;   p95 × 1.3 ≈ 0.4, rounded up to 500m\n    memory: 3Gi     # was 4Gi; p99 × 1.25 = 3Gi\n  limits:\n    memory: 4608Mi  # was 4Gi; 1.5 × request (3Gi → 4.5Gi)\n    # cpu limit removed\n```\n\n**Savings:** (2 - 0.5 cpu) × $0.0384/cpu-hr × 24 × 30 × 12 replicas = ~$497/mo\nPlus memory: (4 - 3 GiB) × ~$0.0048/GiB-hr × 24 × 30 × 12 = ~$41/mo\n**Total: ~$540/mo just on this deployment.**\n\n### Example 2: Spot Migration For A Stateless Fleet\n\n**Input:** EKS cluster, 60 nodes (m5.xlarge on-demand), 80% of workloads are stateless web/api.\n\n**Plan:**\n\n1. 
Install Karpenter, add a NodePool with `karpenter.sh/capacity-type: spot`, instance types `m5,m5a,m6i,m6a,m7g.xlarge` (mixed for availability).\n2. Keep an on-demand pool sized to the **stateful baseline** (databases, message queues, single-replica services).\n3. Apply node affinity on stateless deployments: prefer spot, fallback on-demand.\n4. Set PDBs `maxUnavailable: 25%` on every spot-eligible deployment.\n5. Roll out workloads to spot in three waves (frontend → mid-tier → workers).\n6. Monitor `karpenter_nodes_terminated{reason=\"interrupted\"}` — interruption rate target < 5%/day.\n\n**Savings:** ~70% of compute on the stateless fleet × $0.192/hr m5.xlarge × 50 nodes × 24 × 30 = **~$4,840/mo savings**.\n\n### Example 3: Idle Namespace Cleanup\n\n**Audit output:**\n\n```\nNamespace        cpu_p95  mem_p95   age   last_apply  recommendation\n---------------  -------  --------  ----  ----------  --------------\ndemo-2024-q3     0.01     45Mi      9mo   8mo ago     archive\nsandbox-alice    0.00     12Mi      6mo   5mo ago     archive\nhackathon        0.02     180Mi     14mo  13mo ago    archive\n```\n\n**Action:** tag, notify owners, archive after 14 days. 
Combined savings: full reclaim of ~6 nodes worth of allocation = ~$830/mo.\n\n## Output\n\nThe agent produces:\n\n- **Ranked recommendation list** — every C-type with $/month estimate, effort, risk\n- **Per-recommendation YAML patch** — kubectl-apply-ready or Helm-values diff\n- **Right-sizing report** — per-deployment requests/limits before/after\n- **Node strategy plan** — Karpenter / spot / Graviton migration steps\n- **Idle inventory** — namespaces, deployments, LBs, PVs flagged for cleanup\n- **FinOps dashboard spec** — Grafana / Datadog panel definitions for ongoing tracking\n- **4-week implementation plan** — daily checklist with safety gates\n- **Cloud-specific notes** — EKS/GKE/AKS particulars relevant to this cluster\n- **Pre-purchase commit plan** — Savings Plans / CUDs / Reservations sized to post-optimization baseline\n\n## Common Scenarios\n\n### \"Our EKS bill jumped 40% this quarter\"\nThe agent diffs current cluster against a baseline 90 days ago: new namespaces, new deployments, replica creep, node pool growth, log ingestion delta. Most jumps are one-off: a new team, a stuck rollout, a logging misconfig. Identify and reverse, then run the standard audit.\n\n### \"We can't move to spot because we have stateful workloads\"\nThe agent splits the cluster: stateful pool stays on-demand (with reserved instances), stateless moves to spot. The mistake to avoid is one node pool serving both; taints and tolerations are mandatory.\n\n### \"Our requests look right but the bill is still high\"\nProbably nodes, not pods. Look at node utilization (`kubectl top nodes`); if average node utilization is below 50%, the cluster is over-provisioned at the node layer (binpacking failure or large pod requests blocking dense placement). Karpenter / cluster-autoscaler tuning beats per-pod right-sizing here.\n\n### \"Should I use VPA in auto mode?\"\nNo, in most production clusters. 
VPA in auto mode evicts and recreates pods whenever it adjusts requests, so every change draws on the workload's disruption budget. Use `updateMode: \"Off\"` (recommendation-only) and apply the targets manually, or reserve auto mode for workloads whose PodDisruptionBudget can absorb the evictions.\n\n### \"Kubecost says one thing, AWS Cost Explorer says another\"\nKubecost allocates *cluster cost to namespaces*; AWS bills *resources*. The deltas: cluster control-plane cost (in AWS but not Kubecost by default), cross-region egress, registry transfer, NAT gateway. Reconcile both before claiming savings.\n\n## Tips For Best Results\n\n- Provide at least 30 days of metrics — anything shorter misses the weekend pattern and over-rightsizes\n- Share both `kubectl get all -A -o json` and a Prometheus snapshot — neither alone is enough\n- State the cloud provider (EKS / GKE / AKS) and region — pricing varies materially\n- Identify any workloads under SLAs or compliance constraints — those get conservative sizing margins\n- Include the previous quarter's bill — flat-rate analysis misses growth-driven savings\n- Mention any committed-spend agreements (EDP, GCP commitment, Azure MCA) — recommendations adjust to maximize commit utilization\n- If running Kubecost or OpenCost, include its allocation export — the agent uses it directly instead of recomputing\n\n## When NOT To Use\n\n- **Brand-new cluster (< 30 days old)** — not enough data; metrics will mislead. 
Wait until the cluster has had a full month including a release cycle.\n- **Single-tenant cluster running one workload at near 100% utilization** — already optimized; further savings are at the cloud-commit layer, not Kubernetes.\n- **Non-Kubernetes workloads (ECS, Cloud Run, App Service)** — different scaling primitives and cost models; use a serverless cost optimizer instead.\n- **Cluster with strict regulatory pinning** (FedRAMP, PCI-DSS regions, sovereign cloud) — instance type and region freedom is constrained; many recommendations don't apply.\n- **Pre-production performance test environments** — these are intentionally over-provisioned; right-sizing them invalidates load-test results.\n- **Clusters where the cost is dwarfed by other line items** (e.g. $2k/mo K8s vs $50k/mo data warehouse) — optimize the bigger line first.\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":371,"installsAllTime":14,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1777851329295,"updatedAt":1778492842520},"latestVersion":{"version":"1.0.0","createdAt":1777851329295,"changelog":"Kubernetes Cost Optimizer 1.0.0 – initial release\n\n- Audits Kubernetes clusters for cost-saving opportunities across EKS, GKE, and AKS.\n- Detects and ranks savings from overprovisioned CPU/memory, idle namespaces and workloads, oversized PersistentVolumes, and unused LoadBalancer services.\n- Analyzes node types, spot/preemptible/Graviton adoption, and missing HorizontalPodAutoscalers.\n- Integrates signals from kubectl, metrics-server, kube-state-metrics, and cloud billing data.\n- Outputs a prioritized recommendation list with estimated monthly savings and ready-to-apply YAML patches.","license":"MIT-0"},"metadata":null,"owner":{"handle":"charlie-morrison","userId":"s17cttbdxry5kkyafjw983mq8s83p4y3","displayName":"charlie-morrison","image":"https://avatars.githubusercontent.com/u/271589886?v=4"},"moderation":null}