Huawei Cloud CCE Cost Optimization Advisor

Huawei Cloud CCE cost optimization analysis skill. Identifies idle resources, oversized CPU/memory requests, low-utilization nodes, 24h/7d utilization trends, HPA recommendations, and node autoscaler policy optimization. Read-only analysis and configuration suggestions only — does not modify HPA, autoscaler, node pools, or workloads without explicit user confirmation. Trigger: user mentions "cost optimization", "成本优化", "cost advisor", "成本顾问", "resource waste", "资源浪费", "cost reduction", "成本降低", "billing analysis", "账单分析", "over-provisioned", "超配", "CCE cost", "idle nodes", "oversized request", "HPA recommendation", "autoscaler policy"

Install

openclaw skills install huawei-cloud-cce-cost-optimization-advisor

Overview

Analyze CCE (Cloud Container Engine) cluster cost optimization opportunities. This skill performs read-only analysis and generates configuration suggestions — it does not directly modify HPA, autoscaler, node pools, or workload requests. All configuration changes require explicit user confirmation.

Analysis scope:

  • 24-hour and 7-day node CPU/memory utilization trends
  • Low-utilization node detection (below cluster average or below 30%)
  • Oversized resource request detection (business workloads only)
  • HPA and node autoscaler status review and recommendations
  • Cost optimization report with execution plan

Architecture: Python SDK v3 → CCE API + AOM PromQL → Inventory + Metrics → Cost Analysis → Report

Security Constraints

Dangerous Operation Confirmation Mechanism

This skill enforces a strict read-only-by-default policy. All write operations require confirm=true.

Operations Requiring Confirmation

| Tool | Operation Type | Risk Level | Description |
| --- | --- | --- | --- |
| huawei_configure_cce_hpa | Create/Update HPA | 🟠 High | Creates or replaces a HorizontalPodAutoscaler |
| Node pool resize/scale-down | Scale | 🟠 High | Reduces node pool capacity |

Write operations without confirm=true return a preview only. The huawei_configure_cce_hpa tool returns a manifest preview and risk warning when called without confirm=true. Only after explicit user approval can it be called with confirm=true to apply the configuration.

Workflow

Step 1: Preview Operation — Call without confirm=true

python3 scripts/huawei-cloud.py huawei_configure_cce_hpa \
  region=cn-north-4 \
  cluster_id=xxx \
  workload_name=my-deploy \
  namespace=default \
  min_replicas=1 \
  max_replicas=3 \
  target_cpu_utilization=60

Returns: HPA manifest preview, risk warning, confirmation hint

Step 2: Confirm Execution — Call with confirm=true after user approval

python3 scripts/huawei-cloud.py huawei_configure_cce_hpa \
  region=cn-north-4 \
  cluster_id=xxx \
  workload_name=my-deploy \
  namespace=default \
  min_replicas=1 \
  max_replicas=3 \
  target_cpu_utilization=60 \
  confirm=true
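
The two-step flow above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: `build_hpa_command` and `apply_hpa_with_review` are hypothetical helper names, and the only things taken from this document are the script path, the tool name, and the key=value argument style. The point it illustrates is that confirm=true is added in exactly one place, after the preview has run and the user has approved.

```python
import subprocess
from typing import Callable, Dict, List

SCRIPT = "scripts/huawei-cloud.py"  # path as used in the commands above


def build_hpa_command(params: Dict[str, object], confirm: bool = False) -> List[str]:
    """Build the argv for huawei_configure_cce_hpa (hypothetical helper).

    Any "confirm" key in params is ignored so that only the explicit
    confirm argument can turn a preview into a write operation.
    """
    argv = ["python3", SCRIPT, "huawei_configure_cce_hpa"]
    argv += [f"{key}={value}" for key, value in params.items() if key != "confirm"]
    if confirm:
        argv.append("confirm=true")
    return argv


def apply_hpa_with_review(params: Dict[str, object], ask_user: Callable[[], bool]) -> bool:
    """Run the preview, then apply only if ask_user() returns True."""
    subprocess.run(build_hpa_command(params), check=True)  # Step 1: preview only
    if not ask_user():
        return False
    subprocess.run(build_hpa_command(params, confirm=True), check=True)  # Step 2: apply
    return True
```

Passing the approval step as a callable keeps the decision after the preview output, which mirrors the order required by the workflow.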

Prohibited Actions

  • No automatic node pool scale-down — never delete nodes or shrink node pools automatically
  • No workload request modification — never change CPU/memory requests directly
  • No automatic HPA installation/update — never apply HPA without explicit user confirmation
  • No autoscaler enable/disable — never toggle autoscaler without user approval

Allowed Actions

  • Read-only queries: nodes, node pools, pods, deployments, metrics, AOM PromQL
  • Generate HPA YAML manifests, autoscaler parameter suggestions, and execution plans
  • huawei_configure_cce_hpa without confirm=true returns preview only

Credential Security

  1. No persistent credential storage — AK/SK exists only during API calls
  2. No credential leakage — never includes AK/SK in logs, responses, or errors
  3. Environment variables preferred: HW_ACCESS_KEY / HW_SECRET_KEY / HW_REGION_NAME

Prerequisites

Python Environment

  • Python 3.8+
  • Install SDKs: pip install huaweicloudsdkcce huaweicloudsdkcore huaweicloudsdkces
  • Optional for HPA operations: pip install kubernetes
  • Optional for dashboard charts: pip install matplotlib numpy

Environment Variables (Recommended)

export HW_ACCESS_KEY="your-access-key-id"
export HW_SECRET_KEY="your-secret-access-key"
export HW_REGION_NAME="cn-north-4"
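
As an illustration of the credential rules above, a wrapper script might read these variables once and never echo the values. This is a sketch under assumptions: `load_credentials` and `mask` are hypothetical helpers, not functions provided by the skill, and only the three variable names come from this document.

```python
import os
from typing import Dict

REQUIRED_VARS = ("HW_ACCESS_KEY", "HW_SECRET_KEY", "HW_REGION_NAME")


def load_credentials() -> Dict[str, str]:
    """Read credentials from the environment; report missing names, never values."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in REQUIRED_VARS}


def mask(secret: str, visible: int = 4) -> str:
    """Return a log-safe form of a secret: a short prefix followed by asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)
```

The error message names only the missing variables, so a failed start cannot leak a partially configured key into logs.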

IAM Permission Policies

Ensure the IAM user has the minimum required permissions:

| Permission | Description |
| --- | --- |
| cce:cluster:list | List clusters |
| cce:cluster:get | Get cluster details |
| cce:node:list | List nodes |
| cce:node:get | Get node details |
| cce:nodepool:list | List node pools |
| cce:nodepool:get | Get node pool details |
| aom:*:get | Read AOM metrics and PromQL data |

Core Commands

Recommended: Combined Analysis

| Tool | Function | Parameters |
| --- | --- | --- |
| huawei_analyze_cce_cost_optimization | One-shot cost optimization analysis: inventory, 24h/7d node utilization, pod usage/request, HPA/autoscaler status, and report output | region, cluster_id, exclude_namespaces, business_namespaces, short_hours, long_hours, top_n, output_dir |

Prefer huawei_analyze_cce_cost_optimization for comprehensive analysis. Only use individual tools below for supplementing details, reviewing specific metrics, or manually generating HPA YAML.

Resource Inventory

| Tool | Function | Parameters |
| --- | --- | --- |
| huawei_list_cce_clusters | List all CCE clusters in region | region |
| huawei_list_cce_nodes | List cluster nodes | region, cluster_id |
| huawei_get_kubernetes_nodes | Get Kubernetes node details (including allocatable resources) | region, cluster_id |
| huawei_list_cce_nodepools | List node pools with autoscaling info | region, cluster_id |
| huawei_get_cce_pods | Get pod list with labels, status, requests | region, cluster_id |
| huawei_get_cce_deployments | Get deployment list | region, cluster_id |
| huawei_list_cce_hpas | List HPA configurations (excludes kube-system by default) | region, cluster_id |

Metrics Analysis

| Tool | Function | Parameters |
| --- | --- | --- |
| huawei_get_cce_node_metrics_topN | Node CPU/memory/disk utilization Top N | region, cluster_id, top_n, hours |
| huawei_get_cce_node_metrics | Single node utilization time series | region, cluster_id, node_ip, hours |
| huawei_get_cce_pod_metrics_topN | Pod CPU/memory utilization Top N (supports custom PromQL) | region, cluster_id, top_n, hours, cpu_query, memory_query |
| huawei_get_cce_pod_metrics | Single pod utilization time series | region, cluster_id, pod_name, namespace, hours |
| huawei_get_aom_metrics | Generic AOM PromQL query | region, aom_instance_id, query, hours |

Elasticity Policy

| Tool | Function | Risk Level | Requires Confirmation |
| --- | --- | --- | --- |
| huawei_generate_cce_hpa_manifest | Generate autoscaling/v2 HPA YAML (no cluster modification) | 🟢 Low | No |
| huawei_configure_cce_hpa | Create or replace HPA in cluster | 🟠 High | Yes (confirm=true) |

HPA configuration workflow:

  1. Use huawei_generate_cce_hpa_manifest or huawei_configure_cce_hpa without confirm=true to generate a preview
  2. Review the manifest with the user
  3. Only after explicit user approval, call huawei_configure_cce_hpa with confirm=true

HPA recommendations must be based on request sizing. If requests are clearly oversized, first recommend calibrating requests, then configure HPA.
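
A short worked example shows why request sizing comes first. A CPU utilization target in an HPA is measured as usage divided by the pod's request, and the replica count follows the formula documented for the Kubernetes HPA controller, desired = ceil(current × currentUtilization / targetUtilization). The sketch below applies that formula while ignoring the controller's tolerance band and stabilization windows; the function names and the sample numbers are illustrative assumptions.

```python
import math


def cpu_utilization_pct(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as the HPA sees it: usage relative to the request."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current: int, current_pct: float, target_pct: float,
                     min_replicas: int, max_replicas: int) -> int:
    """ceil(current * current_pct / target_pct), clamped to [min, max]."""
    raw = math.ceil(current * current_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


# Same real load (450m per pod, 2 replicas, target 60%), two different requests.
oversized = cpu_utilization_pct(450, 2000)   # 22.5% of a 2000m request
calibrated = cpu_utilization_pct(450, 500)   # 90.0% of a 500m request

print(desired_replicas(2, oversized, 60, 1, 3))   # 1: HPA sees a nearly idle workload
print(desired_replicas(2, calibrated, 60, 1, 3))  # 3: HPA scales out as intended
```

With the oversized request the same load never approaches the 60% target, so the HPA settles at min_replicas and gives no protection during a real spike.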

Dashboard

| Tool | Function | Parameters |
| --- | --- | --- |
| huawei_generate_monitor_dashboard | Generate monitoring dashboard chart images | region, cluster_id, metrics_type, hours |

Parameter Reference

Common Parameters

All tools accept these common parameters for authentication and region:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| region | string | Yes |  | Huawei Cloud region code (e.g., cn-north-4) |
| cluster_id | string | Yes* |  | CCE cluster ID; not required for huawei_list_cce_clusters |
| ak | string | No | env HW_ACCESS_KEY | Access Key ID; environment variable preferred |
| sk | string | No | env HW_SECRET_KEY | Secret Access Key; environment variable preferred |
| project_id | string | No | auto | IAM project ID; auto-resolved from region if omitted |

* cluster_id is not required for huawei_list_cce_clusters (lists all clusters in region).

Combined Analysis Parameters (huawei_analyze_cce_cost_optimization)

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| region | string | Yes |  | Huawei Cloud region code |
| cluster_id | string | Yes |  | CCE cluster ID |
| short_hours | int | No | 24 | Short-window metrics duration in hours |
| long_hours | int | No | 168 (7d) | Long-window metrics duration in hours |
| top_n | int | No | 50 | Top N pods/nodes for oversized-request and utilization ranking |
| exclude_namespaces | string | No | kube-system | Comma-separated namespaces to exclude from analysis |
| business_namespaces | string | No |  | Comma-separated namespaces to treat as business workloads; if omitted, all non-excluded namespaces are analyzed |
| output_dir | string | No |  | Directory to write summary JSON and report markdown |
| include_raw | bool | No | false | Include raw metrics data in output |

HPA Parameters (huawei_generate_cce_hpa_manifest / huawei_configure_cce_hpa)

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| workload_name | string | Yes |  | Target Deployment/StatefulSet name |
| namespace | string | Yes |  | Namespace of the target workload |
| min_replicas | int | Yes |  | Minimum replica count for HPA |
| max_replicas | int | Yes |  | Maximum replica count for HPA |
| workload_type | string | No | deployment | Workload kind: deployment or statefulset |
| hpa_name | string | No | auto | HPA object name; defaults to <workload_name>-hpa |
| target_cpu_utilization | int | No | 60 | Target average CPU utilization percentage |
| target_memory_utilization | int | No |  | Target average memory utilization percentage; omit to skip memory metric |
| behavior | object | No |  | HPA behavior policy (scaling rates, stabilization windows) |
| confirm | bool | No | false | huawei_configure_cce_hpa only: must be true to apply changes |
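
To make the parameter mapping concrete, the sketch below builds an autoscaling/v2 HorizontalPodAutoscaler as a Python dict from the parameters listed above. It is an assumption about what a generated manifest could look like, not the actual output of huawei_generate_cce_hpa_manifest; `build_hpa_manifest` is a hypothetical function, while the field names follow the Kubernetes autoscaling/v2 API.

```python
from typing import Any, Dict, Optional

KIND_BY_TYPE = {"deployment": "Deployment", "statefulset": "StatefulSet"}


def build_hpa_manifest(
    workload_name: str,
    namespace: str,
    min_replicas: int,
    max_replicas: int,
    workload_type: str = "deployment",
    hpa_name: Optional[str] = None,
    target_cpu_utilization: int = 60,
    target_memory_utilization: Optional[int] = None,
    behavior: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return an autoscaling/v2 HPA manifest as a plain dict (illustrative only)."""
    if workload_type not in KIND_BY_TYPE:
        raise ValueError("workload_type must be 'deployment' or 'statefulset'")
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("require 1 <= min_replicas <= max_replicas")

    def resource_metric(name: str, pct: int) -> Dict[str, Any]:
        # Utilization targets are percentages of the container's request.
        return {"type": "Resource",
                "resource": {"name": name,
                             "target": {"type": "Utilization", "averageUtilization": pct}}}

    metrics = [resource_metric("cpu", target_cpu_utilization)]
    if target_memory_utilization is not None:  # omitted means no memory metric
        metrics.append(resource_metric("memory", target_memory_utilization))

    spec: Dict[str, Any] = {
        "scaleTargetRef": {"apiVersion": "apps/v1",
                           "kind": KIND_BY_TYPE[workload_type],
                           "name": workload_name},
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "metrics": metrics,
    }
    if behavior:
        spec["behavior"] = behavior
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": hpa_name or f"{workload_name}-hpa", "namespace": namespace},
        "spec": spec,
    }
```

Calling `build_hpa_manifest("my-deploy", "default", 1, 3)` yields a single CPU metric at 60% and the default name my-deploy-hpa, matching the defaults in the table.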

Metrics Parameters (huawei_get_cce_node_metrics_topN / huawei_get_cce_pod_metrics_topN)

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| top_n | int | No | 10 | Number of top nodes/pods to return |
| hours | int | No | 1 | Metrics query time range in hours |
| cpu_query | string | No | auto | Custom PromQL for CPU; defaults to built-in query |
| memory_query | string | No | auto | Custom PromQL for memory; defaults to built-in query |
| node_ip | string | Yes* |  | Required for huawei_get_cce_node_metrics (single node) |
| pod_name | string | Yes* |  | Required for huawei_get_cce_pod_metrics (single pod) |
| namespace | string | Yes* |  | Required for huawei_get_cce_pod_metrics (single pod) |

* Only required for single-entity metrics tools.

Dashboard Parameters (huawei_generate_monitor_dashboard)

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| hours | int | No | 1 | Monitoring data time range in hours |
| top_n | int | No | 10 | Top N pods for dashboard ranking |
| namespace | string | No |  | Filter by namespace |
| label_selector | string | No |  | Filter by label (e.g., app=nginx) |
| output_file | string | No | auto | Output HTML file path |
| title | string | No | auto | Dashboard title |

Analysis Workflow

See references/workflow.md for detailed analysis steps, thresholds, and decision logic.

Quick Summary

  1. Scope: Confirm region, cluster_id, namespace range, and exclusion rules (default: exclude kube-system)
  2. Node utilization: Analyze 24h and 7d windows for CPU/memory usage per node and cluster average
  3. Low-utilization detection: Flag nodes below cluster average by 20 percentage points or below 60% of cluster average; cluster average below 30% signals overall over-provisioning
  4. Oversized requests: Compare business workload request vs actual p95 usage; mark as high (p95 < 33% of request), optimize (p95 < 50%), or observe (short-window only)
  5. Elasticity review: Check node pool autoscaling and HPA status; generate recommendations
  6. Output: Summary, utilization tables, oversized request list, HPA/autoscaler recommendations, risks, and verification steps
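
The thresholds in steps 3 and 4 can be sketched as follows. This is one reading of the rules above, not the skill's implementation: all function names are hypothetical, the percentile uses the simple nearest-rank method, the "ok" and "data_gap" labels are added for completeness, and "observe" is interpreted as "oversized in the short window but not confirmed by the long window".

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def low_utilization_nodes(node_util_pct: Dict[str, float]) -> List[str]:
    """Nodes at least 20 points below the cluster average, or below 60% of it."""
    avg = sum(node_util_pct.values()) / len(node_util_pct)
    return sorted(
        name for name, util in node_util_pct.items()
        if util <= avg - 20.0 or util < 0.6 * avg
    )


def cluster_overprovisioned(node_util_pct: Dict[str, float]) -> bool:
    """A cluster average below 30% signals overall over-provisioning."""
    return sum(node_util_pct.values()) / len(node_util_pct) < 30.0


def classify_request(p95_usage: float, request: float, long_window_confirms: bool) -> str:
    """Grade a workload's request against its p95 usage."""
    if request <= 0:
        return "data_gap"      # missing request: report it, do not infer
    ratio = p95_usage / request
    if ratio >= 0.5:
        return "ok"
    if not long_window_confirms:
        return "observe"       # only the short window looks oversized
    return "high" if ratio < 0.33 else "optimize"
```

For example, with node utilization of 70%, 50%, and 15% the cluster average is 45%, so only the 15% node is flagged, and the cluster as a whole is not considered over-provisioned.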

Risk Rules

See references/risk-rules.md for complete safety boundaries.

Key constraints:

  • Auto-execution is limited to R1 (read-only) queries
  • No automatic scale-down, request modification, or HPA/autoscaler changes
  • Must reference both 24h and 7d windows before recommending scale-down
  • Cost optimization suggestions must include rollback strategy and verification metrics
  • Data gaps (missing metrics, missing requests, invisible HPA) must be flagged in the report
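
The two-window rule and the data-gap rule combine into a small decision gate. The sketch below is illustrative only: `scale_down_verdict` and its verdict labels are hypothetical, and `None` stands for a window whose metrics are missing.

```python
from typing import List, Optional, Tuple


def scale_down_verdict(low_24h: Optional[bool], low_7d: Optional[bool]) -> Tuple[str, List[str]]:
    """Decide whether a node may appear as a scale-down suggestion.

    Returns (verdict, data_gaps). A suggestion is produced only when both
    windows agree; it is still a recommendation for the user to review,
    never an automatic action.
    """
    gaps: List[str] = []
    if low_24h is None:
        gaps.append("missing 24h metrics")
    if low_7d is None:
        gaps.append("missing 7d metrics")
    if gaps:
        return "data_gap", gaps          # flag in the report, draw no conclusion
    if low_24h and low_7d:
        return "suggest_scale_down", []  # both windows agree
    if low_24h or low_7d:
        return "observe", []             # single-window signal is not enough
    return "keep", []
```

Returning the gaps alongside the verdict makes it straightforward to list them in the report instead of silently skipping the node.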

Output Schema

See references/output-schema.md for the complete JSON report structure.

All tools return JSON with:

  • status / success: operation result
  • data: analysis results, metrics, or configuration preview
  • message: human-readable description
  • warning: risk warning for write operations (preview mode only)
  • files: paths to generated summary JSON and report markdown
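
A caller might normalize this envelope before acting on it. The sketch below relies on assumptions that references/output-schema.md would need to confirm: that success is a boolean, that a status value of "success" also indicates a successful call, and that files can be passed through unchanged. `summarize_result` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarize_result(raw: str) -> Dict[str, Any]:
    """Parse a tool's JSON output and pull out the common envelope fields."""
    result = json.loads(raw)
    # Assumption: either success == true or status == "success" marks success.
    ok = result.get("success") is True or result.get("status") == "success"
    return {
        "ok": ok,
        "message": result.get("message", ""),
        "warning": result.get("warning"),   # present in preview mode for writes
        "files": result.get("files", []),   # report paths, passed through as-is
        "data": result.get("data"),
    }


sample = '{"success": true, "message": "preview", "warning": "HPA will be replaced", "data": {}}'
summary = summarize_result(sample)
if summary["warning"]:
    print("Needs user review:", summary["warning"])
```

Surfacing the warning field before anything else keeps the preview step visible to the user, in line with the confirmation mechanism.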

Supported Regions

| Region Code | Region Name |
| --- | --- |
| cn-north-4 | North China-Beijing 4 |
| cn-north-1 | North China-Beijing 1 |
| cn-east-3 | East China-Shanghai 1 |
| cn-south-1 | South China-Guangzhou |
| ap-southeast-1 | Asia Pacific-Hong Kong |
| ap-southeast-2 | Asia Pacific-Bangkok |
| ap-southeast-3 | Asia Pacific-Singapore |

Best Practices

  1. Run the combined analysis first — use huawei_analyze_cce_cost_optimization for a complete picture before drilling into individual tools; avoid piecemeal queries that miss cross-resource dependencies.
  2. Always check both time windows — rely on 7-day data for stable optimization decisions; use 24-hour data only for short-term fluctuation observation, never as the sole basis for scale-down recommendations.
  3. Exclude kube-system by default — system workloads have fixed sizing requirements; analyzing them produces misleading oversized-request signals and wastes analysis capacity.
  4. Calibrate requests before configuring HPA: HPA measures utilization as a percentage of the request, so oversized requests keep measured utilization artificially low and the HPA scales out late or not at all. Fix requests first, then set HPA targets.
  5. Use environment variables for credentials — prefer HW_ACCESS_KEY / HW_SECRET_KEY over passing AK/SK as parameters to avoid credential leakage in command history and logs.
  6. Review HPA preview before confirming — always call huawei_configure_cce_hpa without confirm=true first; inspect the manifest YAML and risk warning with the user before applying.
  7. Include rollback strategy in every recommendation — cost optimization changes can impact availability; every suggestion must specify how to revert and how to verify the change was safe.
  8. Flag data gaps explicitly — if metrics are missing, requests are absent, or HPA status is invisible, report these as data gaps; do not infer optimization decisions from incomplete data.
  9. Set top_n appropriately — use top_n=50 for large clusters (100+ pods) to capture all significant outliers; reduce to top_n=10 for focused analysis of specific namespaces.
  10. Save outputs to a persistent directory — use output_dir to write the summary JSON and report markdown to a known location; this enables later review and comparison across multiple analysis runs.

Common Pitfalls

| Pitfall | Symptom | Quick Fix |
| --- | --- | --- |
| Missing AK/SK credentials | All tools return "success": false with credential error | Set HW_ACCESS_KEY and HW_SECRET_KEY environment variables before running |
| Wrong cluster ID | Empty or error results from cluster-specific tools | Run huawei_list_cce_clusters first to confirm the correct cluster_id for your region |
| Analyzing kube-system workloads | False oversized-request alerts on system DaemonSets | Set exclude_namespaces=kube-system (default) or add other system namespaces |
| Single-window scale-down decision | Node marked low-utilization in 24h only but stable in 7d | Always require both short_hours=24 and long_hours=168 before recommending scale-down |
| HPA on oversized requests | HPA rarely scales out because inflated requests keep measured utilization below the target | First reduce CPU/memory requests to realistic values, then configure HPA with target_cpu_utilization=60 |
| Missing AOM metrics | Empty utilization data, data_gaps flagged in report | Verify IAM has aom:*:get permission and AOM is enabled on the cluster |
| Applying HPA without preview | huawei_configure_cce_hpa called with confirm=true without review | Always call without confirm=true first, review manifest, then re-run with confirm=true |
| kubernetes SDK not installed | HPA tools fail with "Kubernetes SDK not installed" | Install with pip install kubernetes before using HPA listing or configuration tools |
| Large cluster with small top_n | Oversized-request pods missing from report | Increase top_n to 50 or higher for clusters with 100+ business pods |
| No output directory specified | Report files written to temporary location, may be lost | Set output_dir to a persistent path like ./cost-reports |

Output Format

All tools return JSON with status, success, data, message, warning, and files fields. See references/output-schema.md for the complete report structure.

Verification

See Verification Method for step-by-step verification.

Cross-Skill References

| Skill | When to Use |
| --- | --- |
| huawei-cloud-cce-cluster-management | Create/delete/hibernate clusters, manage node pools, manage addons, cordon/uncordon/drain nodes, create/delete individual nodes |

Reference Documents

| Document | Path | Description |
| --- | --- | --- |
| Workflow | references/workflow.md | Detailed analysis workflow, thresholds, and decision logic |
| Risk Rules | references/risk-rules.md | Safety boundaries, prohibited actions, and confirmation requirements |
| Output Schema | references/output-schema.md | Cost optimization report JSON structure |

Notes

  • Ensure AK/SK has correct IAM permissions (CCE read + AOM read)
  • Default analysis excludes kube-system namespace
  • HPA recommendations require request sizing to be reasonable first
  • Node scale-down suggestions require both 24h and 7d data confirmation
  • Cost optimization reports must include rollback strategy
  • Data gaps must be explicitly flagged