Install
openclaw skills install huawei-cloud-cce-network-failure-diagnoser

Huawei Cloud CCE network failure diagnosis skill using a Python SDK dispatcher. Use this skill when the user wants to: (1) diagnose CCE network connectivity issues or Service/Ingress failures, (2) analyze ELB configuration or VPC/subnet issues, (3) diagnose DNS resolution failures, (4) check network policies and security group rules. Trigger: the user mentions "network failure", "网络故障", "Service unreachable", "Service 不通", "Ingress 502", "Ingress 504", "ELB error", "ELB 异常", "DNS failure", "DNS 解析失败", "network diagnosis", "网络诊断", "VPC", "subnet", "子网", "安全组", "网络策略"
⚠️ Execution Method (Must Read): This skill executes diagnosis via local Python scripts using a dispatcher pattern. Using hcloud, openstack, or other CLI tools or direct API calls is prohibited.
- The dispatcher script is `scripts/huawei-cloud.py`, invoked as `python3 scripts/huawei-cloud.py <action> <key=value params>`
- All scripts and environment check scripts are inside the skill package. You must use `skill action=exec` to execute them. Do not run them directly in a shell.
- For action details and parameters, refer to `references/workflow.md`, `references/risk-rules.md`, and `references/output-schema.md`
- Do not attempt hcloud, openstack, curl IAM, or any other CLI/API methods. This skill does not depend on those tools.
- All paths are relative to the skill directory, which is the directory where this SKILL.md is located.
This skill diagnoses CCE (Cloud Container Engine) network failures by performing a layered, read-only diagnosis across the full network stack — from node infrastructure, DNS, Service/EndpointSlice, NetworkPolicy, Ingress to cloud-side ELB/EIP/NAT/VPC security policies. It produces a complete Markdown diagnosis report that must include the investigation process, evidence, conclusions, confidence levels, and verification criteria.
Use this skill when:
This skill does NOT handle:
- Failures covered by other skills (`huawei-cloud-cce-pod-failure-diagnoser`, `huawei-cloud-cce-node-failure-diagnoser`, `huawei-cloud-cce-workload-failure-diagnoser`)

You must run the environment check script first to complete environment validation and dependency installation in one step:

- Linux/macOS: `skill action=exec: bash skill://scripts/check_env.sh`
- Windows: `skill action=exec: powershell -ExecutionPolicy Bypass -File skill://scripts/check_env.ps1`

Windows note: Do not use `&&` to chain commands (PowerShell 5.x does not support it); use semicolons if you need to change directories first.
The script will check in order: Python >= 3.6 → install dependencies → validate SDK → validate credentials → validate service availability. If the environment check fails, fix the issues before proceeding.
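
The ordered, stop-at-first-failure behaviour described above can be illustrated with a small sketch. This is not the content of `check_env.sh`; the `run_checks` helper and the placeholder checks are hypothetical, and only the Python version threshold (3.6) comes from the text.

```python
import sys


def python_version_ok(minimum=(3, 6)):
    """True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def run_checks(checks):
    """Run (name, callable) pairs in order and stop at the first failure.

    Returns (names_that_passed, name_of_failed_check_or_None).
    """
    passed = []
    for name, check in checks:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


if __name__ == "__main__":
    # The later stages are placeholders; real checks would go in their place.
    stages = [
        ("python>=3.6", python_version_ok),
        ("dependencies", lambda: True),
        ("sdk", lambda: True),
    ]
    passed, failed = run_checks(stages)
    print("passed:", passed, "failed:", failed)
```

Stopping at the first failure keeps the error message tied to the earliest broken prerequisite, which is the one to fix before proceeding.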
Environment Variables:
| Variable | Required | Description |
|---|---|---|
| HW_ACCESS_KEY | Yes | Huawei Cloud AK (Access Key) |
| HW_SECRET_KEY | Yes | Huawei Cloud SK (Secret Key) |
| HW_REGION_NAME | No | Default cn-north-4 |
| HW_PROJECT_ID | No | Project ID (automatically obtained via IAM API when not set) |
| HW_SECURITY_TOKEN | No | Required when using temporary AK/SK |
| HW_CCE_CLUSTER_ID | Yes | CCE cluster ID for diagnosis target |
| KUBECONFIG | No | Kubernetes config; auto-obtained from CCE API if not set |
Security Constraints:
Do not output the values of environment variables.
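
As a minimal sketch of how the variables above might be checked without leaking secrets, the following reports only variable names and a set/unset status. The function names are illustrative and not part of the skill; the variable names and their required flags are taken from the table.

```python
import os

# Required variables, per the Environment Variables table.
REQUIRED_VARS = ("HW_ACCESS_KEY", "HW_SECRET_KEY", "HW_CCE_CLUSTER_ID")
OPTIONAL_VARS = ("HW_REGION_NAME", "HW_PROJECT_ID", "HW_SECURITY_TOKEN", "KUBECONFIG")


def missing_required_vars(env=None):
    """Return the names (never the values) of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def describe_environment(env=None):
    """Map each known variable to 'set' or 'unset' without exposing its value."""
    env = os.environ if env is None else env
    return {
        name: ("set" if env.get(name) else "unset")
        for name in REQUIRED_VARS + OPTIONAL_VARS
    }


if __name__ == "__main__":
    missing = missing_required_vars()
    print("missing required variables:", ", ".join(missing) or "none")
```

Passing a plain dictionary as `env` makes the helpers easy to exercise without touching the real process environment.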
| API Action | Permission | Purpose |
|---|---|---|
| cce:cluster:get | Get cluster | View cluster details |
| cce:cluster:list | List clusters | List CCE clusters |
| cce:node:list | List nodes | List cluster nodes |
| vpc:vpc:list | List VPCs | Query VPC details |
| vpc:subnet:list | List subnets | Query subnet details |
| elb:loadbalancer:list | List ELBs | Query ELB details |
| elb:listener:list | List listeners | Query ELB listeners |
| aom:*:get | Read AOM | Query AOM metrics and alarms |
Permission Failure Handling:
All actions are invoked via the Python dispatcher script:
python3 scripts/huawei-cloud.py <action> region=<region> cluster_id=<cluster_id> namespace=<namespace> [other_params...]
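
To make the `<action> key=value` calling convention concrete, here is a small sketch that composes the argument list shown above. `build_dispatcher_argv` is a hypothetical helper, not something shipped with the skill, and it only builds the list; execution still has to go through `skill action=exec`.

```python
def build_dispatcher_argv(action, **params):
    """Compose the dispatcher argument list: python3 <script> <action> key=value ..."""
    if not action or " " in action:
        raise ValueError("action must be a single non-empty token")
    argv = ["python3", "scripts/huawei-cloud.py", action]
    for key, value in params.items():
        if value is None:
            continue  # omit parameters the caller did not supply
        argv.append(f"{key}={value}")
    return argv


if __name__ == "__main__":
    print(build_dispatcher_argv(
        "huawei_get_cce_services",
        region="cn-north-4",
        cluster_id="cluster-id",
        namespace="default",
    ))
```

Building a list rather than a single string avoids quoting problems when a parameter value contains spaces.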
Execution via skill:
- Linux/macOS: `skill action=exec: skill://.venv/bin/python3 skill://scripts/huawei-cloud.py <action> <params>`
- Windows: `skill action=exec: skill://.venv/Scripts/python3.exe skill://scripts/huawei-cloud.py <action> <params>`

| Action | Description |
|---|---|
| `huawei_network_failure_diagnose` | One-shot diagnosis: collects K8s and cloud-side read-only snapshots, returns structured findings + report_markdown |
| Action | Description |
|---|---|
| `huawei_get_cce_services` | List Services in a namespace |
| `huawei_get_cce_ingresses` | List Ingresses in a namespace |
| `huawei_get_cce_pods` | List Pods in a namespace |
| `huawei_get_kubernetes_nodes` | List cluster Nodes |
| `huawei_get_cce_events` | List cluster Events |
| `huawei_get_pod_logs` | Retrieve Pod container logs |
| Action | Description |
|---|---|
| `huawei_get_elb_backend_status` | Read ELB pool/member/health monitor/load balancer status |
| `huawei_get_elb_metrics` | Retrieve ELB monitoring metrics |
| `huawei_list_elb` | List ELB load balancers |
| `huawei_list_elb_listeners` | List ELB listeners |
| `huawei_list_eip` | List EIP addresses |
| `huawei_get_eip_metrics` | Retrieve EIP monitoring metrics |
| `huawei_list_nat` | List NAT gateways |
| `huawei_get_nat_gateway_metrics` | Retrieve NAT gateway metrics |
| `huawei_list_security_groups` | List VPC security groups |
| `huawei_list_vpc_acls` | List VPC ACLs |
| Action | Description |
|---|---|
| `huawei_network_diagnose` | Legacy comprehensive network diagnosis |
| `huawei_network_diagnose_by_alarm` | Diagnosis triggered by alarm correlation |
| `huawei_network_verify_pod_scheduling` | Verify Pod scheduling constraints (read-only) |
| Parameter | Description |
|---|---|
| `region` | Huawei Cloud region, e.g., cn-north-4 |
| `cluster_id` | CCE cluster ID |
| `namespace` | Kubernetes namespace |
| Parameter | Description |
|---|---|
| `failure_symptom` | Symptom description: `domain_unresolvable`, `in_cluster_service_unreachable`, `service_intermittent`, `external_access_failed`, `ingress_502_504` |
| `target_kind` | Resource type: Pod, Service, Ingress, etc. |
| `target_name` | Resource name |
| `service_name` | Target Service name |
| `ingress_name` | Target Ingress name |
| `source_pod` | Source Pod name or label |
| `destination_pod` | Destination Pod name or label |
| `domain` | Domain name for DNS diagnosis |
| `elb_id` | ELB load balancer ID |
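
Because a mistyped `failure_symptom` can send the pipeline to the wrong stage, it may help to validate parameters before calling the dispatcher. The sketch below is an assumption about how a caller might do that; `validate_params` is a hypothetical helper, the symptom values are copied from the table, and only `cluster_id` is treated as mandatory because that is the one requirement the document states.

```python
# Allowed values, copied from the failure_symptom row of the parameter table.
FAILURE_SYMPTOMS = (
    "domain_unresolvable",
    "in_cluster_service_unreachable",
    "service_intermittent",
    "external_access_failed",
    "ingress_502_504",
)


def validate_params(params):
    """Return a list of problems; an empty list means the parameters look usable."""
    problems = []
    if not params.get("cluster_id"):
        problems.append("missing parameter: cluster_id")
    symptom = params.get("failure_symptom")
    if symptom is not None and symptom not in FAILURE_SYMPTOMS:
        problems.append(f"unknown failure_symptom: {symptom}")
    return problems


if __name__ == "__main__":
    print(validate_params({"cluster_id": "cluster-id", "failure_symptom": "ingress_502_504"}))
```

Returning a list of problems instead of raising on the first one lets the caller report everything that needs fixing in a single pass.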
huawei_network_failure_diagnose returns structured JSON with an embedded report_markdown:
{
  "success": true,
  "action": "huawei_network_failure_diagnose",
  "region": "cn-north-4",
  "cluster_id": "cluster-id",
  "namespace": "default",
  "conclusion": "high signal conclusion",
  "confidence": "High",
  "pipeline_pruned": false,
  "findings": [
    {
      "stage": "Stage 3: East-West Routing and Policy Layer",
      "type": "NetworkPolicyBlocked",
      "title": "NetworkPolicy selects target Pod but does not allow source Pod labels or target port",
      "confidence": 1.0,
      "severity": "critical",
      "evidence": [],
      "recommendation": [],
      "prune": false
    }
  ],
  "top_causes": [],
  "snapshot": {
    "inputs": {},
    "nodes": [],
    "pods": [],
    "services": [],
    "ingresses": [],
    "endpoint_slices": [],
    "network_policies": [],
    "events": [],
    "logs": {},
    "cloud": {
      "elb_ids": [],
      "elbs": {},
      "eips": {},
      "nat": {},
      "security_groups": {},
      "vpc_acls": {}
    }
  },
  "report_markdown": "# CCE Network Failure Automated Diagnosis Report\n..."
}
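
A caller will usually want the strongest findings first. The following is a minimal sketch of reading the JSON above and ranking findings by their numeric `confidence`; `summarize_findings` is a hypothetical helper, and it assumes the per-finding `confidence` is a number as in the sample.

```python
import json


def summarize_findings(raw_json, limit=3):
    """Return (type, severity, confidence, title) tuples, highest confidence first."""
    result = json.loads(raw_json)
    if not result.get("success"):
        raise RuntimeError("diagnosis did not report success")
    findings = sorted(
        result.get("findings", []),
        key=lambda finding: finding.get("confidence", 0.0),
        reverse=True,
    )
    return [
        (f.get("type"), f.get("severity"), f.get("confidence"), f.get("title"))
        for f in findings[:limit]
    ]


if __name__ == "__main__":
    sample = json.dumps({
        "success": True,
        "findings": [
            {"type": "NetworkPolicyBlocked", "severity": "critical",
             "confidence": 1.0, "title": "NetworkPolicy blocks source Pod"},
        ],
    })
    for row in summarize_findings(sample):
        print(row)
```

The full `report_markdown` remains the authoritative output; a summary like this is only a convenience for quick triage.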
The report_markdown must contain the following headings:
- […] `huawei-cloud-cce-auto-remediation-runner`

Common type values in findings:
| Type | Description |
|---|---|
| `NodeUnhealthy` | Node Ready=False or Ready=Unknown |
| `NodePressure` | Memory/Disk/PID/Network pressure on node |
| `PodDNSConfigMissing` | Pod dnsPolicy=None with no dnsConfig |
| `KubeDnsNoEndpoint` | kube-dns EndpointSlice has 0 ready endpoints |
| `CoreDNSRestarting` | CoreDNS pods showing OOMKilled/LivenessProbe failures |
| `CoreDNSNxDomain` | CoreDNS logs showing NXDOMAIN responses |
| `CoreDNSUpstreamTimeout` | CoreDNS logs showing upstream i/o timeout |
| `NetworkPolicyBlocked` | NetworkPolicy blocks source Pod traffic (confidence 100%) |
| `ServiceNoReadyEndpoint` | Service has 0 ready endpoints in EndpointSlice |
| `ServiceSelectorMismatch` | Service selector matches no Pods |
| `ReadinessFlapping` | Backend Pod readiness probe flapping |
| `BackendOverloaded` | Application logs show OOM/connection pool exhausted |
| `LoadBalancerProvisioningFailed` | LoadBalancer Ingress status empty with CCM errors |
| `ELBBackendUnhealthy` | ELB member unhealthy while K8s backend Pod is Ready |
| `IngressUpstreamError` | Ingress controller logs show 502/504 |
- Run `huawei_network_failure_diagnose` with a known-healthy cluster and verify the report structure
- Check that the `pipeline_pruned` flag is set correctly when node-level issues prune upper layers
- Check that `confidence` and `severity` values are present in all findings

- Use `failure_symptom` to direct the diagnosis pipeline to the relevant stage (DNS, east-west, or north-south)
- Provide specific parameters (`service_name`, `ingress_name`, `source_pod`, `destination_pod`, `domain`) for more precise diagnosis
- Use `huawei_network_failure_diagnose` for one-shot comprehensive diagnosis; use individual actions only for targeted follow-up queries
- Use `huawei_get_elb_backend_status` and `huawei_list_security_groups` to check cloud-side configuration
- […] `huawei-cloud-cce-node-failure-diagnoser`

- `references/workflow.md`
- `references/risk-rules.md`
- `references/output-schema.md`

- Do not use `kubectl exec`, packet capture, stress testing, or active traffic injection unless the user explicitly requests and acknowledges the risk
- `huawei_network_verify_pod_scheduling` is for verification only; it does not replace scaling actions
- […] `huawei-cloud-cce-auto-remediation-runner` for preview
- Execute all scripts via `skill action=exec`; do not run them directly in a shell

- The `cluster_id` parameter is required for all CCE actions. If the user only provides a cluster name, query `huawei_list_cce_clusters` first to resolve the ID
- An incorrect `failure_symptom` (e.g., `ingress_502_504` for an in-cluster issue) may misdirect the pipeline. Always confirm the symptom type with the user
- […] `huawei_get_elb_backend_status` and `huawei_get_cce_pods` together
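
The structural checks listed above (report structure, the `pipeline_pruned` flag, and `confidence` and `severity` on every finding) could be automated roughly as follows. This is a sketch under assumptions: `check_result_structure` is a hypothetical helper, and the set of required top-level keys is inferred from the sample JSON rather than from `references/output-schema.md`.

```python
# Top-level keys inferred from the sample output; treat this list as an assumption.
REQUIRED_TOP_LEVEL = (
    "success", "action", "conclusion", "confidence",
    "pipeline_pruned", "findings", "snapshot", "report_markdown",
)


def check_result_structure(result):
    """Return a list of structural problems found in a parsed diagnosis result."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in result:
            problems.append(f"missing top-level key: {key}")
    if "pipeline_pruned" in result and not isinstance(result["pipeline_pruned"], bool):
        problems.append("pipeline_pruned is not a boolean")
    for index, finding in enumerate(result.get("findings", [])):
        for key in ("confidence", "severity"):
            if finding.get(key) is None:
                problems.append(f"findings[{index}] lacks {key}")
    return problems


if __name__ == "__main__":
    print(check_result_structure({"success": True, "findings": [{"confidence": 1.0}]}))
```

An empty list means the result passed these checks; it says nothing about whether the diagnosis itself is correct.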