Install

    openclaw skills install huawei-cloud-cce-auto-remediation-runner

Huawei Cloud CCE auto-remediation runner skill that converts remediation intent into preview-first, confirm-required, post-verify execution plans. Use this skill only when the user asks for a CCE remediation action or a diagnosis result needs a preview-first recovery plan, including Deployment rollback, restart/scale/resize, cordon/drain, reboot, isolation, traffic cutover, vulnerability status change, or cluster hibernate/awake. This skill performs MUTATION actions (drain, cordon, scale, restart, delete, reboot, hibernate) that require a preview+confirm workflow. NEVER auto-add `confirm=true`.

Trigger: "auto remediation", "自动恢复", "remediation action", "恢复动作", "node drain", "节点 drain", "node cordon", "节点 cordon", "scale workload", "扩缩容", "restart pod", "重启 Pod", "remediation preview", "恢复预览", "confirm remediation", "确认恢复"
⚠️ Execution Method (Must Read): This skill executes remediation actions via local Python scripts using the `scripts/huawei-cloud.py` dispatcher. Using hcloud, kubectl, or other CLI tools or direct API calls is prohibited.
- All actions are dispatched through `scripts/huawei-cloud.py` with `--action <action_name>` and `--params <json_params>`
- All scripts and environment check scripts are inside the skill package. You must use `skill action=exec` to execute them; do not run them directly in a shell
- For action names and parameters, see the Core Tools section below
- Do not attempt hcloud, kubectl, curl IAM, or other CLI/API methods. This skill does not depend on these tools
- All paths are relative to the skill directory, which is the directory where this SKILL.md resides
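
As a rough illustration of the `--action` / `--params` convention described above, the sketch below only assembles the argument list for one dispatcher call and prints it for review. It does not run anything, because execution must go through `skill action=exec`. The helper name `build_dispatch_args` is hypothetical and not part of the skill package.

```python
import json
import shlex

# Dispatcher path from this document, relative to the skill directory.
DISPATCHER = "scripts/huawei-cloud.py"


def build_dispatch_args(action, params):
    """Return the argument list for one dispatcher call (hypothetical helper)."""
    # Serialize the parameters as compact JSON so they travel as one argument.
    params_json = json.dumps(params, separators=(",", ":"), sort_keys=True)
    return [DISPATCHER, "--action", action, "--params", params_json]


args = build_dispatch_args(
    "huawei_get_kubernetes_nodes",
    {"region": "cn-north-4", "cluster_id": "cluster-id"},
)

# Quote each argument so the JSON can be shown safely on a single line.
print(" ".join(shlex.quote(a) for a in args))
```

Keeping the parameters in a dictionary until the last moment avoids hand-written JSON, which is a common source of quoting mistakes.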
This skill converts remediation intent into reviewable, confirmable, verifiable execution plans. It operates in preview-first mode by default. Every mutation action follows four steps: a preview call without `confirm=true`, explicit user confirmation of the action, object, and risks, execution with `confirm=true`, and finally read-only verification.
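
The preview-first flow described above can be sketched as a small control loop. This is a minimal illustration under assumptions, not the skill's actual implementation: `dispatch`, `ask_user`, and `verify` are hypothetical callables supplied by the caller, the `aborted` key is invented for the example, and `fake_dispatch` merely imitates the `requires_confirmation` field shown in this document.

```python
def run_with_confirmation(dispatch, action, params, ask_user, verify):
    """Preview first, execute only after explicit confirmation, then verify."""
    # Step 1: preview. Drop any confirm flag so the first call can never mutate.
    preview_params = {k: v for k, v in params.items() if k != "confirm"}
    preview = dispatch(action, preview_params)
    if not preview.get("requires_confirmation"):
        return preview  # nothing to confirm

    # Step 2: the user must approve this specific action, target, and risk.
    if not ask_user(preview):
        return {"success": False, "aborted": True, "preview": preview}

    # Step 3: execute with confirm=true only after approval.
    result = dispatch(action, dict(preview_params, confirm=True))

    # Step 4: read-only verification.
    result["verification"] = verify()
    return result


# Demonstration with a stand-in dispatcher that only records calls.
calls = []


def fake_dispatch(action, params):
    calls.append((action, dict(params)))
    if params.get("confirm"):
        return {"success": True, "requires_confirmation": False}
    return {"success": False, "requires_confirmation": True, "risk_level": "R2"}


target = {"region": "cn-north-4", "cluster_id": "cluster-id", "node_name": "node-1"}
declined = run_with_confirmation(
    fake_dispatch, "huawei_cce_node_cordon", target,
    ask_user=lambda preview: False, verify=lambda: [])
approved = run_with_confirmation(
    fake_dispatch, "huawei_cce_node_cordon", target,
    ask_user=lambda preview: True, verify=lambda: [{"status": "healthy"}])
```

The confirm flag is added in exactly one place, after `ask_user` returns true, so a declined preview never reaches the mutating call.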
This skill is applicable to the following scenarios:
- Running `huawei_auto_remediation_run` for multi-step remediation plans

This skill does NOT handle the following:
- Root cause or domain-specific diagnosis (use `huawei-cloud-cce-root-cause-analyzer` or domain-specific diagnoser skills)

Before using, you must run the environment check script to complete environment validation and dependency installation in one step:
- Bash: `skill action=exec: bash skill://scripts/check_env.sh`
- Windows (PowerShell): `skill action=exec: powershell -ExecutionPolicy Bypass -File skill://scripts/check_env.ps1`

Windows Note: Do not use `&&` to chain commands (PowerShell 5.x does not support it). Use semicolons if you need to change directories first.
The script will check in sequence: Python >= 3.6 → install dependencies → validate SDK → validate credentials → validate service availability. If the environment check fails, fix the issues before continuing with other actions.
Environment Variables:
| Variable | Required | Description |
|---|---|---|
| HW_ACCESS_KEY | Yes | Huawei Cloud AK |
| HW_SECRET_KEY | Yes | Huawei Cloud SK |
| HW_REGION_NAME | No | Default cn-north-4 |
| HW_PROJECT_ID | No | Project ID (automatically obtained via IAM API when not set) |
| HW_SECURITY_TOKEN | No | Required when using temporary AK/SK |
| HW_CLUSTER_ID | No | Default CCE cluster ID (can also be passed per action) |
Security Constraints:
Do not output the values of the above environment variables.
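
One way to respect this constraint when reporting on configuration is to report only whether each variable is set. The sketch below is an illustrative assumption and does not replace `check_env.sh`; the helper name `env_report` is hypothetical, and the variable names come from the table above.

```python
import os

REQUIRED = ("HW_ACCESS_KEY", "HW_SECRET_KEY")
OPTIONAL = ("HW_REGION_NAME", "HW_PROJECT_ID", "HW_SECURITY_TOKEN", "HW_CLUSTER_ID")


def env_report(environ=None):
    """Report which variables are set without ever exposing their values."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in REQUIRED + OPTIONAL:
        # Only the presence of the variable is recorded, never its value.
        report[name] = "set" if environ.get(name) else "missing"
    missing_required = [n for n in REQUIRED if report[n] == "missing"]
    return report, missing_required


# Example with a stand-in environment; the value "example" is a placeholder.
report, missing = env_report({"HW_ACCESS_KEY": "example", "HW_REGION_NAME": "cn-north-4"})
print(missing)
```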

Required Permissions:
| API Action | Permission | Purpose |
|---|---|---|
| cce:cluster:get | Get cluster | View cluster details |
| cce:cluster:list | List clusters | List CCE clusters |
| cce:node:get | Get node | View node details |
| cce:node:list | List nodes | List cluster nodes |
| cce:node:update | Update node | Cordon/uncordon/drain nodes |
| cce:nodepool:update | Update node pool | Resize node pools |
| cce:nodepool:get | Get node pool | View node pool details |
| cce:nodepool:list | List node pools | List node pools |
| aom:*:get | Read AOM | Query AOM metrics and alarms |
| aom:alarmRule:list | List alarm rules | Query alarm rules for validation |
| aom:event:list | List events | Query AOM alarm events |
Permission Failure Handling:

Core Tools:
All actions are dispatched through `scripts/huawei-cloud.py` using `skill action=exec`.

Orchestration:
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_auto_remediation_run` | region, cluster_id, strategy | Orchestrate multi-step remediation plan; strategy determines actions (rollback_previous_revision, scale_out, drain_and_replace, etc.) |

Workload Actions:
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_rollback_cce_workload` | region, cluster_id, namespace, kind, name | Rollback Deployment/StatefulSet/DaemonSet to previous revision |
| `huawei_scale_cce_workload` | region, cluster_id, namespace, kind, name, replicas | Scale workload replicas |
| `huawei_resize_cce_workload` | region, cluster_id, namespace, kind, name | Resize workload resource limits |
| `huawei_delete_cce_workload` | region, cluster_id, namespace, kind, name | Delete a workload |

Node Actions:
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_cce_node_cordon` | region, cluster_id, node_name | Mark node as unschedulable |
| `huawei_cce_node_uncordon` | region, cluster_id, node_name | Mark node as schedulable again |
| `huawei_cce_node_drain` | region, cluster_id, node_name | Evict all pods from node |
| `huawei_reboot_ecs` | region, ecs_id | Reboot the underlying ECS instance |

Cluster and Node Pool Actions:
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_resize_cce_nodepool` | region, cluster_id, nodepool_id, target_count | Resize node pool to target count |
| `huawei_hibernate_cce_cluster` | region, cluster_id | Hibernate (sleep) the CCE cluster |
| `huawei_awake_cce_cluster` | region, cluster_id | Awake (wake) the CCE cluster |
| `huawei_delete_cce_cluster` | region, cluster_id | Delete the CCE cluster |
| `huawei_delete_cce_node` | region, cluster_id, node_name | Delete a node from the cluster |

ECS Actions:
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_start_ecs_instance` | region, ecs_id | Start ECS instance |
| `huawei_stop_ecs_instance` | region, ecs_id | Stop ECS instance |

Autoscaling (HPA):
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_configure_cce_hpa` | region, cluster_id, namespace, kind, name, min_replicas, max_replicas | Configure HPA policy for workload |

Network Actions:
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_bind_cce_cluster_eip` | region, cluster_id, eip_id | Bind EIP to cluster for external access |
| `huawei_unbind_cce_cluster_eip` | region, cluster_id | Unbind EIP from cluster |
| `huawei_network_verify_pod_scheduling` | region, cluster_id, namespace | Verify pod scheduling network connectivity |

Security (HSS):
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_hss_change_vul_status` | region, vul_id, status | Change HSS vulnerability handling status |

Read-Only Diagnosis and Verification:
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_get_cce_pods` | region, cluster_id | List pods in cluster |
| `huawei_get_kubernetes_nodes` | region, cluster_id | List Kubernetes nodes in cluster |
| `huawei_get_cce_events` | region, cluster_id | List Kubernetes Events in cluster |
| `huawei_workload_rollout_diagnose` | region, cluster_id, namespace, kind, name | Diagnose workload rollout status |
| `huawei_root_cause_analyze` | region, cluster_id | Comprehensive root cause analysis (cross-skill: huawei-cloud-cce-root-cause-analyzer) |
| `huawei_dependency_impact_analyze` | region, cluster_id | Dependency impact analysis (cross-skill: huawei-cloud-cce-root-cause-analyzer) |
| `huawei_node_diagnose` | region, cluster_id | Node-level diagnosis |
| `huawei_workload_diagnose` | region, cluster_id | Workload status diagnosis |

Common Parameters:
| Parameter | Required | Description |
|---|---|---|
| region | Yes | Huawei Cloud region, e.g., cn-north-4 |
| cluster_id | Yes* | CCE cluster ID |
| namespace | Yes* | Kubernetes namespace (required for workload actions) |
| kind | Yes* | Workload type: Deployment, StatefulSet, or DaemonSet |
| name | Yes* | Workload name or node name |
| node_name | Yes* | Node name (required for node actions) |
| nodepool_id | Yes* | Node pool ID (required for node pool resize) |
| ecs_id | Yes* | ECS instance ID (required for ECS actions) |
| replicas | Yes* | Target replica count (required for scale) |
| target_count | Yes* | Target node count (required for node pool resize) |
| strategy | Yes* | Remediation strategy (required for auto-remediation) |
| confirm | No | Set to true ONLY after explicit user confirmation |
*Required for specific actions as noted.

Additional Parameters (passed via `--params` JSON):
| Parameter | Description |
|---|---|
| ak | Override AK (uses HW_ACCESS_KEY by default) |
| sk | Override SK (uses HW_SECRET_KEY by default) |
| project_id | Override project ID (auto-obtained via IAM when not set) |
| min_replicas | HPA minimum replicas (required for `huawei_configure_cce_hpa`) |
| max_replicas | HPA maximum replicas (required for `huawei_configure_cce_hpa`) |
| vul_id | HSS vulnerability ID (required for `huawei_hss_change_vul_status`) |
| status | HSS vulnerability handling status (required for `huawei_hss_change_vul_status`) |
| eip_id | EIP ID (required for `huawei_bind_cce_cluster_eip`) |

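
Before building a preview call, it can help to check the parameters against the tables above. The sketch below is an assumption about how such a check might look, not part of the dispatcher. It covers only four actions copied from the Core Tools tables, treats every listed parameter as mandatory, and ignores environment fallbacks such as HW_CLUSTER_ID. The helper name `validate_params` is hypothetical.

```python
# Required parameters for a few actions, copied from the tables in this document.
REQUIRED_PARAMS = {
    "huawei_scale_cce_workload": ["region", "cluster_id", "namespace", "kind", "name", "replicas"],
    "huawei_cce_node_drain": ["region", "cluster_id", "node_name"],
    "huawei_resize_cce_nodepool": ["region", "cluster_id", "nodepool_id", "target_count"],
    "huawei_reboot_ecs": ["region", "ecs_id"],
}

# Workload types listed for the `kind` parameter.
ALLOWED_KINDS = {"Deployment", "StatefulSet", "DaemonSet"}


def validate_params(action, params):
    """Return a list of problems; an empty list means the params look complete."""
    if action not in REQUIRED_PARAMS:
        return ["unknown action: " + action]
    problems = ["missing parameter: " + p
                for p in REQUIRED_PARAMS[action] if p not in params]
    if "kind" in params and params["kind"] not in ALLOWED_KINDS:
        problems.append("unsupported kind: " + str(params["kind"]))
    return problems


print(validate_params("huawei_reboot_ecs", {"region": "cn-north-4"}))
```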

Preview response (called without `confirm=true`):

    {
      "success": false,
      "requires_confirmation": true,
      "remediation_trace_id": "ARR-...",
      "strategy": "rollback_previous_revision",
      "diagnosis": {},
      "action_result": {},
      "preview": {
        "action": "huawei_rollback_cce_workload",
        "target": {
          "region": "cn-north-4",
          "cluster_id": "cluster-id",
          "namespace": "default",
          "kind": "Deployment",
          "name": "app-server"
        },
        "current_state": {},
        "expected_state": {},
        "impact_scope": {},
        "rollback_method": "Re-apply current revision"
      },
      "risk_level": "R2",
      "rollback_notes": [],
      "summary": "Remediation plan preview — requires user confirmation before execution"
    }

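
A preview response like the one above has to be turned into something the user can explicitly approve. The following sketch shows one possible way to summarize the action, target, and risk. The function name `confirmation_prompt` and the prompt wording are assumptions, and only fields that appear in the example response are read.

```python
def confirmation_prompt(preview_response):
    """Summarize a preview response so the user can approve or reject it."""
    preview = preview_response.get("preview", {})
    target = preview.get("target", {})
    target_text = ", ".join("{}={}".format(k, v) for k, v in target.items())
    lines = [
        "Action: " + preview.get("action", "unknown"),
        "Target: " + target_text,
        "Risk level: " + preview_response.get("risk_level", "unknown"),
        "Rollback: " + preview.get("rollback_method", "not provided"),
        "Reply 'confirm' to execute this exact action.",
    ]
    return "\n".join(lines)


# Trimmed copy of the example preview response from this document.
sample = {
    "requires_confirmation": True,
    "risk_level": "R2",
    "preview": {
        "action": "huawei_rollback_cce_workload",
        "target": {"namespace": "default", "kind": "Deployment", "name": "app-server"},
        "rollback_method": "Re-apply current revision",
    },
}
prompt = confirmation_prompt(sample)
print(prompt)
```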

Execution response (called with `confirm=true` after user confirmation):

    {
      "success": true,
      "requires_confirmation": false,
      "confirmation_received": true,
      "remediation_trace_id": "ARR-...",
      "strategy": "rollback_previous_revision",
      "action_result": {},
      "execution": {
        "action": "huawei_rollback_cce_workload",
        "timestamp": "...",
        "result": {}
      },
      "verification": [
        {
          "method": "huawei_get_cce_pods",
          "status": "healthy",
          "details": {}
        }
      ],
      "report_markdown": "# CCE Auto Remediation Execution Report...",
      "report_file": "optional"
    }

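
The execution response above can be reduced to a single pass/fail decision. The sketch below assumes that `"healthy"` is the only passing verification status, since it is the only value shown in this document. The `"degraded"` value used in the example is purely hypothetical, as is the helper name `execution_succeeded`.

```python
def execution_succeeded(response):
    """True only if the action ran after confirmation and every check passed."""
    checks = response.get("verification", [])
    return (
        response.get("success") is True
        and response.get("confirmation_received") is True
        and len(checks) > 0  # an execution without verification is not trusted
        and all(check.get("status") == "healthy" for check in checks)
    )


ok_response = {
    "success": True,
    "confirmation_received": True,
    "verification": [{"method": "huawei_get_cce_pods", "status": "healthy"}],
}
# "degraded" is a made-up status used only to show a failing check.
bad_response = dict(ok_response, verification=[{"status": "degraded"}])

print(execution_succeeded(ok_response), execution_succeeded(bad_response))
```

Treating an empty verification list as a failure keeps the post-verify step mandatory rather than optional.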

Full output structure:

    {
      "success": false,
      "requires_confirmation": true,
      "confirmation_received": false,
      "remediation_trace_id": "ARR-...",
      "strategy": "rollback_previous_revision",
      "diagnosis": {},
      "action_result": {},
      "summary": "remediation plan or execution result",
      "action": "huawei_auto_remediation_run",
      "risk_level": "R2",
      "target": {
        "region": "cn-north-4",
        "cluster_id": "optional",
        "resource": "optional"
      },
      "preview": {},
      "execution": {},
      "verification": [],
      "rollback_notes": [],
      "report_markdown": "# CCE Auto Remediation Execution Report...",
      "report_file": "optional"
    }

- Run `huawei_cce_node_cordon` (without `confirm=true`) on a test node to verify preview mode returns `requires_confirmation: true`
- Re-run with `confirm=true` and verify node status with `huawei_get_kubernetes_nodes`
- Run `huawei_rollback_cce_workload` in preview mode to verify it shows current vs expected state
- After a confirmed rollback, check rollout status with `huawei_workload_rollout_diagnose`
- Run `huawei_auto_remediation_run` in preview mode to verify the multi-step orchestration plan is shown before execution

- Never add `confirm=true` on the first invocation. Always preview without `confirm=true` first
- When the diagnosis comes from `huawei-cloud-cce-root-cause-analyzer` and involves a startup command, CrashLoop, probe, or image problem that makes the new version unavailable, prefer `huawei_auto_remediation_run` with the `rollback_previous_revision` strategy
- See `references/risk-rules.md` for R1/R2/R3 classification; apply appropriate confirmation requirements
- Use `huawei_auto_remediation_run` to produce a complete execution report with diagnosis basis, action results, and verification results
- Diagnosis belongs to `huawei-cloud-cce-root-cause-analyzer`; this skill only executes confirmed remediation actions

References:
- `references/workflow.md`
- `references/risk-rules.md`
- `references/output-schema.md`

- All scripts must be executed via `skill action=exec`; do not run them directly in a shell
- Never auto-add `confirm=true`. User must explicitly confirm the specific action, object, and risks
- Root cause analysis → `huawei-cloud-cce-root-cause-analyzer`; domain-specific diagnosis → `huawei-cloud-cce-pod-failure-diagnoser`, `huawei-cloud-cce-node-failure-diagnoser`, `huawei-cloud-cce-network-failure-diagnoser`
- Run `huawei-cloud-cce-root-cause-analyzer` or a domain diagnoser before remediation. Blind remediation without evidence is prohibited
- Risk classification is defined in `references/risk-rules.md`