Huawei Cloud CCE Cost Optimization Advisor

Huawei Cloud CCE cost optimization analysis skill. Identifies idle resources, oversized CPU/memory requests, low-utilization nodes, 24h/7d utilization trends, HPA recommendations, and node autoscaler policy optimization. Read-only analysis and configuration suggestions only — does not modify HPA, autoscaler, node pools, or workloads without explicit user confirmation. Trigger: user mentions "cost optimization", "成本优化", "cost advisor", "成本顾问", "resource waste", "资源浪费", "cost reduction", "成本降低", "billing analysis", "账单分析", "over-provisioned", "超配", "CCE cost", "idle nodes", "oversized request", "HPA recommendation", "autoscaler policy"

Install

openclaw skills install huawei-cloud-cce-cost-optimization-advisor

Huawei Cloud CCE Cost Optimization Advisor

Overview

Analyze CCE (Cloud Container Engine) cluster cost optimization opportunities. This skill performs read-only analysis and generates configuration suggestions — it does not directly modify HPA, autoscaler, node pools, or workload requests. All configuration changes require explicit user confirmation.

Analysis scope:

24-hour and 7-day node CPU/memory utilization trends
Low-utilization node detection (nodes well below the cluster average; a cluster average below 30% indicates overall over-provisioning)
Oversized resource request detection (business workloads only)
HPA and node autoscaler status review and recommendations
Cost optimization report with execution plan

Architecture: Python SDK v3 → CCE API + AOM PromQL → Inventory + Metrics → Cost Analysis → Report

Security Constraints

Dangerous Operation Confirmation Mechanism

This skill enforces a strict read-only-by-default policy. All write operations require confirm=true.

Operations Requiring Confirmation

Tool	Operation Type	Risk Level	Description
`huawei_configure_cce_hpa`	Create/Update HPA	🟠 High	Creates or replaces a HorizontalPodAutoscaler
Node pool resize/scale-down	Scale	🟠 High	Reduces node pool capacity

Write operations without confirm=true return a preview only. The huawei_configure_cce_hpa tool returns a manifest preview and risk warning when called without confirm=true. Only after explicit user approval can it be called with confirm=true to apply the configuration.

Workflow

Step 1: Preview Operation — Call without confirm=true

python3 scripts/huawei-cloud.py huawei_configure_cce_hpa \
  region=cn-north-4 \
  cluster_id=xxx \
  workload_name=my-deploy \
  namespace=default \
  min_replicas=1 \
  max_replicas=3 \
  target_cpu_utilization=60

Returns: HPA manifest preview, risk warning, confirmation hint

Step 2: Confirm Execution — Call with confirm=true after user approval

python3 scripts/huawei-cloud.py huawei_configure_cce_hpa \
  region=cn-north-4 \
  cluster_id=xxx \
  workload_name=my-deploy \
  namespace=default \
  min_replicas=1 \
  max_replicas=3 \
  target_cpu_utilization=60 \
  confirm=true
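The preview-then-confirm flow above can be wrapped so that `confirm=true` is added only after approval. The following is a minimal sketch, not part of the skill itself: `build_hpa_command` is a hypothetical helper, and it assumes the script path and `key=value` argument style shown in the two commands above.

```python
from typing import Dict, List


def build_hpa_command(params: Dict[str, object], approved: bool = False) -> List[str]:
    """Build the argument list for huawei_configure_cce_hpa (hypothetical helper).

    confirm=true is appended only when the user has explicitly approved.
    A confirm key passed in params is ignored so it cannot bypass the gate.
    """
    args = ["python3", "scripts/huawei-cloud.py", "huawei_configure_cce_hpa"]
    for key, value in params.items():
        if key == "confirm":
            continue  # never trust a caller-supplied confirm flag
        args.append(f"{key}={value}")
    if approved:
        args.append("confirm=true")
    return args


params = {
    "region": "cn-north-4",
    "cluster_id": "xxx",
    "workload_name": "my-deploy",
    "namespace": "default",
    "min_replicas": 1,
    "max_replicas": 3,
    "target_cpu_utilization": 60,
}

preview_cmd = build_hpa_command(params)                # Step 1: preview only
apply_cmd = build_hpa_command(params, approved=True)   # Step 2: after user approval
print(" ".join(preview_cmd))
```

Keeping the approval decision in a separate argument, rather than in the parameter dictionary, makes it harder for an upstream caller to apply a change by accident.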

Prohibited Actions

No automatic node pool scale-down — never delete nodes or shrink node pools automatically
No workload request modification — never change CPU/memory requests directly
No automatic HPA installation/update — never apply HPA without explicit user confirmation
No autoscaler enable/disable — never toggle autoscaler without user approval

Allowed Actions

Read-only queries: nodes, node pools, pods, deployments, metrics, AOM PromQL
Generate HPA YAML manifests, autoscaler parameter suggestions, and execution plans
huawei_configure_cce_hpa without confirm=true returns preview only

Credential Security

No persistent credential storage — AK/SK exists only during API calls
No credential leakage — never includes AK/SK in logs, responses, or errors
Environment variable preferred — HW_ACCESS_KEY / HW_SECRET_KEY / HW_REGION_NAME

Prerequisites

Python Environment

Python 3.8+
Install SDKs: pip install huaweicloudsdkcce huaweicloudsdkcore huaweicloudsdkces
Optional for HPA operations: pip install kubernetes
Optional for dashboard charts: pip install matplotlib numpy

Environment Variables (Recommended)

export HW_ACCESS_KEY="your-access-key-id"
export HW_SECRET_KEY="your-secret-access-key"
export HW_REGION_NAME="cn-north-4"
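As an illustration of the credential rules above, the sketch below reads the three environment variables and reports only the names of missing variables, never their values. `load_credentials` and `redact` are hypothetical helpers, and treating `HW_REGION_NAME` as optional is an assumption.

```python
import os
from typing import Dict, Mapping, Optional


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Read AK/SK (required) and region (assumed optional) from the environment."""
    env = os.environ if env is None else env
    missing = [name for name in ("HW_ACCESS_KEY", "HW_SECRET_KEY") if not env.get(name)]
    if missing:
        # Only variable names appear in the error, never secret values.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "ak": env["HW_ACCESS_KEY"],
        "sk": env["HW_SECRET_KEY"],
        "region": env.get("HW_REGION_NAME"),
    }


def redact(secret: Optional[str]) -> str:
    """Return a log-safe placeholder that reveals nothing about the secret."""
    return "***" if secret else "<unset>"


demo_env = {
    "HW_ACCESS_KEY": "demo-ak",
    "HW_SECRET_KEY": "demo-sk",
    "HW_REGION_NAME": "cn-north-4",
}
creds = load_credentials(demo_env)
print(f"ak={redact(creds['ak'])} sk={redact(creds['sk'])} region={creds['region']}")
```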

IAM Permission Policies

Ensure the IAM user has the minimum required permissions:

Permission	Description
`cce:cluster:list`	List clusters
`cce:cluster:get`	Get cluster details
`cce:node:list`	List nodes
`cce:node:get`	Get node details
`cce:nodepool:list`	List node pools
`cce:nodepool:get`	Get node pool details
`aom:*:get`	Read AOM metrics and PromQL data

Core Commands

Recommended: Combined Analysis

Tool	Function	Parameters
`huawei_analyze_cce_cost_optimization`	One-shot cost optimization analysis — inventory, 24h/7d node utilization, pod usage/request, HPA/autoscaler status, and report output	`region`, `cluster_id`, `exclude_namespaces`, `business_namespaces`, `short_hours`, `long_hours`, `top_n`, `output_dir`

Prefer huawei_analyze_cce_cost_optimization for comprehensive analysis. Only use individual tools below for supplementing details, reviewing specific metrics, or manually generating HPA YAML.

Resource Inventory

Tool	Function	Parameters
`huawei_list_cce_clusters`	List all CCE clusters in region	`region`
`huawei_list_cce_nodes`	List cluster nodes	`region`, `cluster_id`
`huawei_get_kubernetes_nodes`	Get Kubernetes node details (including allocatable resources)	`region`, `cluster_id`
`huawei_list_cce_nodepools`	List node pools with autoscaling info	`region`, `cluster_id`
`huawei_get_cce_pods`	Get pod list with labels, status, requests	`region`, `cluster_id`
`huawei_get_cce_deployments`	Get deployment list	`region`, `cluster_id`
`huawei_list_cce_hpas`	List HPA configurations (excludes kube-system by default)	`region`, `cluster_id`

Metrics Analysis

Tool	Function	Parameters
`huawei_get_cce_node_metrics_topN`	Node CPU/memory/disk utilization Top N	`region`, `cluster_id`, `top_n`, `hours`
`huawei_get_cce_node_metrics`	Single node utilization time series	`region`, `cluster_id`, `node_ip`, `hours`
`huawei_get_cce_pod_metrics_topN`	Pod CPU/memory utilization Top N (supports custom PromQL)	`region`, `cluster_id`, `top_n`, `hours`, `cpu_query`, `memory_query`
`huawei_get_cce_pod_metrics`	Single pod utilization time series	`region`, `cluster_id`, `pod_name`, `namespace`, `hours`
`huawei_get_aom_metrics`	Generic AOM PromQL query	`region`, `aom_instance_id`, `query`, `hours`

Elasticity Policy

Tool	Function	Risk Level	Requires Confirmation
`huawei_generate_cce_hpa_manifest`	Generate `autoscaling/v2` HPA YAML (no cluster modification)	🟢 Low	No
`huawei_configure_cce_hpa`	Create or replace HPA in cluster	🟠 High	Yes (`confirm=true`)

HPA configuration workflow:

Use huawei_generate_cce_hpa_manifest or huawei_configure_cce_hpa without confirm=true to generate a preview
Review the manifest with the user
Only after explicit user approval, call huawei_configure_cce_hpa with confirm=true

HPA recommendations must be based on request sizing. If requests are clearly oversized, first recommend calibrating requests, then configure HPA.
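To see why request sizing comes first, recall that a Kubernetes HPA with a utilization target measures usage as a percentage of the pod's request and computes `desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)`. The sketch below applies that formula in integer arithmetic; it ignores the HPA tolerance band and stabilization windows, and the workload numbers are made up for illustration.

```python
def desired_replicas(
    current_replicas: int,
    usage_millicores: int,
    request_millicores: int,
    target_pct: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Apply the HPA replica formula, then clamp to [min_replicas, max_replicas].

    utilization = usage / request * 100, so the formula becomes
    ceil(current * usage * 100 / (request * target)), computed with integers.
    """
    numerator = current_replicas * usage_millicores * 100
    denominator = request_millicores * target_pct
    desired = -(-numerator // denominator)  # integer ceiling division
    return max(min_replicas, min(max_replicas, desired))


# Same real load (450m per pod, 2 replicas, 60% target), two different requests.
oversized = desired_replicas(2, 450, 2000, 60, 1, 5)   # request 2000m -> 22.5% utilization
calibrated = desired_replicas(2, 450, 500, 60, 1, 5)   # request 500m  -> 90% utilization
print(oversized, calibrated)
```

With the oversized request the reported utilization stays far below the target, so the HPA holds or reduces replicas even though each pod is doing real work; after calibration the same load correctly drives a scale-up.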

Dashboard

Tool	Function	Parameters
`huawei_generate_monitor_dashboard`	Generate monitoring dashboard chart images	`region`, `cluster_id`, `metrics_type`, `hours`

Parameter Reference

Common Parameters

All tools accept these common parameters for authentication and region:

Parameter	Type	Required	Default	Description
`region`	string	Yes	—	Huawei Cloud region code (e.g., `cn-north-4`)
`cluster_id`	string	Yes*	—	CCE cluster ID; not required for `huawei_list_cce_clusters`
`ak`	string	No	env `HW_ACCESS_KEY`	Access Key ID; environment variable preferred
`sk`	string	No	env `HW_SECRET_KEY`	Secret Access Key; environment variable preferred
`project_id`	string	No	auto	IAM project ID; auto-resolved from region if omitted

* cluster_id is not required for huawei_list_cce_clusters (lists all clusters in region).

Combined Analysis Parameters (`huawei_analyze_cce_cost_optimization`)

Parameter	Type	Required	Default	Description
`region`	string	Yes	—	Huawei Cloud region code
`cluster_id`	string	Yes	—	CCE cluster ID
`short_hours`	int	No	`24`	Short-window metrics duration in hours
`long_hours`	int	No	`168` (7d)	Long-window metrics duration in hours
`top_n`	int	No	`50`	Top N pods/nodes for oversized-request and utilization ranking
`exclude_namespaces`	string	No	`kube-system`	Comma-separated namespaces to exclude from analysis
`business_namespaces`	string	No	—	Comma-separated namespaces to treat as business workloads; if omitted, all non-excluded namespaces are analyzed
`output_dir`	string	No	—	Directory to write summary JSON and report markdown
`include_raw`	bool	No	`false`	Include raw metrics data in output
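The interaction between `exclude_namespaces` and `business_namespaces` can be sketched as follows. This is an illustration of the semantics described in the table, not the tool's code: `select_namespaces` is a hypothetical helper, and letting exclusion win when a namespace appears in both lists is an assumption.

```python
from typing import Iterable, List, Optional


def parse_csv(value: Optional[str]) -> List[str]:
    """Split a comma-separated parameter into trimmed, non-empty names."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def select_namespaces(
    all_namespaces: Iterable[str],
    exclude_namespaces: Optional[str] = "kube-system",
    business_namespaces: Optional[str] = None,
) -> List[str]:
    """Return the namespaces to analyze, preserving cluster order."""
    excluded = set(parse_csv(exclude_namespaces))
    business = set(parse_csv(business_namespaces))
    selected = []
    for ns in all_namespaces:
        if ns in excluded:
            continue  # assumption: exclusion takes precedence
        if business and ns not in business:
            continue  # an explicit business list narrows the scope
        selected.append(ns)
    return selected


cluster_namespaces = ["default", "kube-system", "monitoring", "shop"]
print(select_namespaces(cluster_namespaces))
print(select_namespaces(cluster_namespaces, "kube-system, monitoring", "shop"))
```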

HPA Parameters (`huawei_generate_cce_hpa_manifest` / `huawei_configure_cce_hpa`)

Parameter	Type	Required	Default	Description
`workload_name`	string	Yes	—	Target Deployment/StatefulSet name
`namespace`	string	Yes	—	Namespace of the target workload
`min_replicas`	int	Yes	—	Minimum replica count for HPA
`max_replicas`	int	Yes	—	Maximum replica count for HPA
`workload_type`	string	No	`deployment`	Workload kind: `deployment` or `statefulset`
`hpa_name`	string	No	auto	HPA object name; defaults to `<workload_name>-hpa`
`target_cpu_utilization`	int	No	`60`	Target average CPU utilization percentage
`target_memory_utilization`	int	No	—	Target average memory utilization percentage; omit to skip memory metric
`behavior`	object	No	—	HPA behavior policy (scaling rates, stabilization windows)
`confirm`	bool	No	`false`	`huawei_configure_cce_hpa` only: must be `true` to apply changes
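For orientation, the parameters above map onto a standard `autoscaling/v2` HorizontalPodAutoscaler roughly as sketched below. This is not the output of `huawei_generate_cce_hpa_manifest`; `build_hpa_manifest` is a hypothetical function, the `behavior` parameter is omitted, and the validation rules are assumptions.

```python
import json
from typing import Any, Dict, Optional


def build_hpa_manifest(
    workload_name: str,
    namespace: str,
    min_replicas: int,
    max_replicas: int,
    workload_type: str = "deployment",
    hpa_name: Optional[str] = None,
    target_cpu_utilization: int = 60,
    target_memory_utilization: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an autoscaling/v2 HPA manifest as a plain dictionary."""
    kinds = {"deployment": "Deployment", "statefulset": "StatefulSet"}
    if workload_type not in kinds:
        raise ValueError(f"unsupported workload_type: {workload_type}")
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("expected 1 <= min_replicas <= max_replicas")

    def resource_metric(name: str, pct: int) -> Dict[str, Any]:
        return {
            "type": "Resource",
            "resource": {
                "name": name,
                "target": {"type": "Utilization", "averageUtilization": pct},
            },
        }

    metrics = [resource_metric("cpu", target_cpu_utilization)]
    if target_memory_utilization is not None:
        metrics.append(resource_metric("memory", target_memory_utilization))

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": hpa_name or f"{workload_name}-hpa", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": kinds[workload_type],
                "name": workload_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": metrics,
        },
    }


manifest = build_hpa_manifest("my-deploy", "default", 1, 3)
print(json.dumps(manifest, indent=2))
```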

Metrics Parameters (`huawei_get_cce_node_metrics_topN` / `huawei_get_cce_pod_metrics_topN`)

Parameter	Type	Required	Default	Description
`top_n`	int	No	`10`	Number of top nodes/pods to return
`hours`	int	No	`1`	Metrics query time range in hours
`cpu_query`	string	No	auto	Custom PromQL for CPU; defaults to built-in query
`memory_query`	string	No	auto	Custom PromQL for memory; defaults to built-in query
`node_ip`	string	Yes*	—	Required for `huawei_get_cce_node_metrics` (single node)
`pod_name`	string	Yes*	—	Required for `huawei_get_cce_pod_metrics` (single pod)
`namespace`	string	Yes*	—	Required for `huawei_get_cce_pod_metrics` (single pod)

* Only required for single-entity metrics tools.

Dashboard Parameters (`huawei_generate_monitor_dashboard`)

Parameter	Type	Required	Default	Description
`hours`	int	No	`1`	Monitoring data time range in hours
`top_n`	int	No	`10`	Top N pods for dashboard ranking
`namespace`	string	No	—	Filter by namespace
`label_selector`	string	No	—	Filter by label (e.g., `app=nginx`)
`output_file`	string	No	auto	Output HTML file path
`title`	string	No	auto	Dashboard title

Analysis Workflow

See references/workflow.md for detailed analysis steps, thresholds, and decision logic.

Quick Summary

Scope: Confirm region, cluster_id, namespace range, and exclusion rules (default: exclude kube-system)
Node utilization: Analyze 24h and 7d windows for CPU/memory usage per node and cluster average
Low-utilization detection: Flag nodes that are at least 20 percentage points below the cluster average, or below 60% of the cluster average; a cluster average below 30% signals overall over-provisioning
Oversized requests: Compare business workload request vs actual p95 usage; mark as high (p95 < 33% of request), optimize (p95 < 50%), or observe (short-window only)
Elasticity review: Check node pool autoscaling and HPA status; generate recommendations
Output: Summary, utilization tables, oversized request list, HPA/autoscaler recommendations, risks, and verification steps
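The thresholds in the summary above can be expressed as a few small predicates. This is a sketch under stated assumptions rather than the logic in references/workflow.md: the function names are hypothetical, boundary handling (`<=` versus `<`) is a guess, and "observe" is interpreted here as a signal that appears in the short window but not in the long one.

```python
from typing import Optional


def is_low_utilization(node_pct: float, cluster_avg_pct: float) -> bool:
    """Flag a node 20+ points below the cluster average, or below 60% of it."""
    gap_rule = node_pct <= cluster_avg_pct - 20
    ratio_rule = node_pct < 0.6 * cluster_avg_pct
    return gap_rule or ratio_rule


def cluster_over_provisioned(cluster_avg_pct: float) -> bool:
    """A cluster average below 30% signals overall over-provisioning."""
    return cluster_avg_pct < 30


def classify_request(
    request: Optional[int],
    p95_short: Optional[int],
    p95_long: Optional[int],
) -> str:
    """Compare p95 usage with the request (same unit, e.g. millicores)."""
    if not request or request <= 0 or p95_short is None or p95_long is None:
        return "data_gap"  # missing requests or metrics are reported, not guessed

    def below(p95: int, pct: int) -> bool:
        return p95 * 100 < pct * request  # integer form of p95/request < pct%

    if below(p95_short, 33) and below(p95_long, 33):
        return "high"
    if below(p95_short, 50) and below(p95_long, 50):
        return "optimize"
    if below(p95_short, 50):
        return "observe"  # assumption: seen in the short window only
    return "ok"


print(is_low_utilization(25, 50), classify_request(1000, 300, 700))
```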

Risk Rules

See references/risk-rules.md for complete safety boundaries.

Key constraints:

Auto-execution is limited to R1 read-only queries
No automatic scale-down, request modification, or HPA/autoscaler changes
Must reference both 24h and 7d windows before recommending scale-down
Cost optimization suggestions must include rollback strategy and verification metrics
Data gaps (missing metrics, missing requests, invisible HPA) must be flagged in the report
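The constraints above can be read as a gate that a scale-down suggestion must pass before it appears in a report. The sketch below is one possible encoding; `evaluate_scale_down` and its field names are hypothetical and do not reflect the schema in references/risk-rules.md.

```python
from typing import Dict, List, Optional


def evaluate_scale_down(
    low_24h: Optional[bool],
    low_7d: Optional[bool],
    rollback: str,
    verification_metrics: List[str],
) -> Dict[str, object]:
    """Decide whether a scale-down may be recommended (never executed).

    None for a window means its metrics were unavailable.
    """
    data_gaps = []
    if low_24h is None:
        data_gaps.append("24h node metrics missing")
    if low_7d is None:
        data_gaps.append("7d node metrics missing")

    blockers = []
    if data_gaps:
        blockers.append("incomplete data")
    elif not (low_24h and low_7d):
        blockers.append("low utilization not confirmed in both windows")
    if not rollback.strip():
        blockers.append("rollback strategy missing")
    if not verification_metrics:
        blockers.append("verification metrics missing")

    return {
        "recommend_scale_down": not blockers,
        "auto_execute": False,  # scale-down is never applied automatically
        "data_gaps": data_gaps,
        "blockers": blockers,
    }


decision = evaluate_scale_down(
    low_24h=True,
    low_7d=True,
    rollback="restore the previous node count in the node pool",
    verification_metrics=["node CPU utilization", "pending pods"],
)
print(decision)
```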

Output Schema

See references/output-schema.md for the complete JSON report structure.

All tools return JSON with:

status / success: operation result
data: analysis results, metrics, or configuration preview
message: human-readable description
warning: risk warning for write operations (preview mode only)
files: paths to generated summary JSON and report markdown
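A caller might reduce such a response to a few log lines as sketched below. The field names come from the list above, but their exact types are assumptions: `success` is treated as a boolean, `status` is assumed to carry the string `"success"` on success, and `files` is accepted as a string, list, or dictionary of paths. See references/output-schema.md for the authoritative structure.

```python
from typing import Any, Dict, List


def summarize_result(result: Dict[str, Any]) -> List[str]:
    """Turn a tool response into short, human-readable lines."""
    if "success" in result:
        ok = bool(result["success"])
    else:
        ok = result.get("status") == "success"  # assumed status value

    lines = ["OK" if ok else "FAILED"]
    if result.get("message"):
        lines.append(f"message: {result['message']}")
    if result.get("warning"):
        lines.append(f"warning: {result['warning']}")

    files = result.get("files") or []
    if isinstance(files, str):
        files = [files]
    elif isinstance(files, dict):
        files = list(files.values())
    for path in files:
        lines.append(f"file: {path}")
    return lines


example = {
    "success": True,
    "message": "analysis complete",
    "files": ["./cost-reports/summary.json", "./cost-reports/report.md"],
}
for line in summarize_result(example):
    print(line)
```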

Supported Regions

Region Code	Region Name
cn-north-4	North China-Beijing 4
cn-north-1	North China-Beijing 1
cn-east-3	East China-Shanghai 1
cn-south-1	South China-Guangzhou
ap-southeast-1	Asia Pacific-Hong Kong
ap-southeast-2	Asia Pacific-Bangkok
ap-southeast-3	Asia Pacific-Singapore

Best Practices

Run the combined analysis first — use huawei_analyze_cce_cost_optimization for a complete picture before drilling into individual tools; avoid piecemeal queries that miss cross-resource dependencies.
Always check both time windows — rely on 7-day data for stable optimization decisions; use 24-hour data only for short-term fluctuation observation, never as the sole basis for scale-down recommendations.
Exclude kube-system by default — system workloads have fixed sizing requirements; analyzing them produces misleading oversized-request signals and wastes analysis capacity.
Calibrate requests before configuring HPA: HPA utilization targets are percentages of the pod's requests, so if requests are oversized the reported utilization stays artificially low and HPA scales up late or not at all. Fix requests first, then set HPA targets.
Use environment variables for credentials — prefer HW_ACCESS_KEY / HW_SECRET_KEY over passing AK/SK as parameters to avoid credential leakage in command history and logs.
Review HPA preview before confirming — always call huawei_configure_cce_hpa without confirm=true first; inspect the manifest YAML and risk warning with the user before applying.
Include rollback strategy in every recommendation — cost optimization changes can impact availability; every suggestion must specify how to revert and how to verify the change was safe.
Flag data gaps explicitly — if metrics are missing, requests are absent, or HPA status is invisible, report these as data gaps; do not infer optimization decisions from incomplete data.
Set top_n appropriately — use top_n=50 for large clusters (100+ pods) to capture all significant outliers; reduce to top_n=10 for focused analysis of specific namespaces.
Save outputs to a persistent directory — use output_dir to write the summary JSON and report markdown to a known location; this enables later review and comparison across multiple analysis runs.

Common Pitfalls

Pitfall	Symptom	Quick Fix
Missing AK/SK credentials	All tools return `"success": false` with credential error	Set `HW_ACCESS_KEY` and `HW_SECRET_KEY` environment variables before running
Wrong cluster ID	Empty or error results from cluster-specific tools	Run `huawei_list_cce_clusters` first to confirm the correct `cluster_id` for your region
Analyzing kube-system workloads	False oversized-request alerts on system DaemonSets	Set `exclude_namespaces=kube-system` (default) or add other system namespaces
Single-window scale-down decision	Node marked low-utilization in 24h only but stable in 7d	Always require both `short_hours=24` and `long_hours=168` before recommending scale-down
HPA on oversized requests	HPA rarely scales up, or drops to `min_replicas`, because utilization is computed against inflated requests and stays low	First reduce CPU/memory requests to realistic values, then configure HPA with `target_cpu_utilization=60`
Missing AOM metrics	Empty utilization data, `data_gaps` flagged in report	Verify IAM has `aom:*:get` permission and AOM is enabled on the cluster
Applying HPA without preview	`huawei_configure_cce_hpa` called with `confirm=true` without review	Always call without `confirm=true` first, review manifest, then re-run with `confirm=true`
kubernetes SDK not installed	HPA tools fail with `"Kubernetes SDK not installed"`	Install with `pip install kubernetes` before using HPA listing or configuration tools
Large cluster with small `top_n`	Oversized-request pods missing from report	Increase `top_n` to 50 or higher for clusters with 100+ business pods
No output directory specified	Report files written to temporary location, may be lost	Set `output_dir` to a persistent path like `./cost-reports`

Output Format

All tools return JSON with status, success, data, message, warning, and files fields. See references/output-schema.md for the complete report structure.

Verification

See Verification Method for step-by-step verification.

Cross-Skill References

Skill	When to Use
`huawei-cloud-cce-cluster-management`	Create/delete/hibernate clusters, manage node pools, manage addons, cordon/uncordon/drain nodes, create/delete individual nodes

Reference Documents

Document	Path	Description
Workflow	references/workflow.md	Detailed analysis workflow, thresholds, and decision logic
Risk Rules	references/risk-rules.md	Safety boundaries, prohibited actions, and confirmation requirements
Output Schema	references/output-schema.md	Cost optimization report JSON structure

Notes

Ensure AK/SK has correct IAM permissions (CCE read + AOM read)
Default analysis excludes kube-system namespace
HPA recommendations require request sizing to be reasonable first
Node scale-down suggestions require both 24h and 7d data confirmation
Cost optimization reports must include rollback strategy
Data gaps must be explicitly flagged