Install
openclaw skills install huawei-cloud-cce-observability-context-builder
Collect and consolidate Huawei Cloud CCE alarms, metrics, logs, and events into a comprehensive observability context package for diagnosis handoff.
⚠️ Execution Method (Must Read): This skill executes actions via local Python scripts using the `scripts/huawei-cloud.py` dispatcher. Using hcloud, kubectl, or other CLI tools or direct API calls is prohibited.
- All actions are dispatched through `scripts/huawei-cloud.py` with `--action <action_name>` and `--params <json_params>`
- All scripts and environment check scripts are inside the skill package. You must use `skill action=exec` to execute them; do not run them directly in a shell
- For action names and parameters, see the Core Tools section below
- Do not attempt hcloud, kubectl, curl IAM, or other CLI/API methods. This skill does not depend on these tools
- All paths are relative to the skill directory, which is the directory where this SKILL.md resides
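The dispatch convention above can be sketched as a small helper that assembles the dispatcher arguments. This is a minimal illustration, assuming `--params` carries a single JSON object string; `build_dispatch_args` is a hypothetical name and not part of the skill package. Nothing is executed here, since the arguments would still be run through `skill action=exec` rather than a shell.

```python
import json

DISPATCHER = "scripts/huawei-cloud.py"  # path relative to the skill directory


def build_dispatch_args(action: str, params: dict) -> list:
    """Return the dispatcher arguments for one action (hypothetical helper).

    Assumes --params takes one JSON object; this function only builds the
    argument list and never runs anything.
    """
    if not action:
        raise ValueError("action name is required")
    return [
        DISPATCHER,
        "--action", action,
        "--params", json.dumps(params, sort_keys=True),
    ]


args = build_dispatch_args(
    "huawei_list_aom_alarms",
    {"region": "cn-north-4", "cluster_id": "<cluster-id>"},
)
print(args)
```

Keeping the parameters as a dictionary until the last step avoids hand-written JSON quoting mistakes.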
This skill consolidates scattered fault signals into a structured, diagnosable context package. It first collects the time window, cluster, namespace, workload, Pod, node, and alarm scope, then gathers evidence by type (alarms, events, metrics, logs), merges signals along a timeline, and identifies gaps and the appropriate next diagnostic skill for hand-off. This skill is strictly read-only — it never executes remediation actions.
Architecture: python3 scripts/huawei-cloud.py dispatcher → Huawei Cloud Python SDK + AOM/LTS API → alarms, metrics, logs, events aggregation
Related Skills:
- `huawei-cloud-cce-pod-failure-diagnoser` - Pod CrashLoopBackOff, ImagePullBackOff, OOMKilled diagnosis
- `huawei-cloud-cce-node-failure-diagnoser` - Node health, resource pressure, NPD events diagnosis
- `huawei-cloud-cce-network-failure-diagnoser` - Network connectivity, DNS, ELB diagnosis
- `huawei-cloud-cce-storage-failure-diagnoser` - PVC/PV mount, storage provisioning diagnosis
- `huawei-cloud-cce-root-cause-analyzer` - Cross-domain root cause analysis and reports
- `huawei-cloud-cce-auto-remediation-runner` - Remediation actions (scale, drain, rollback, etc.)
- `huawei-cloud-cce-alarm-correlation-engine` - Alarm deduplication and correlation
- `huawei-cloud-cce-metric-analyzer` - Deep metric trend analysis
- `huawei-cloud-cce-log-analyzer` - Deep log pattern analysis
Capabilities:
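- AOM alarm collection and deduplication (`huawei_list_aom_alarms`, `huawei_analyze_aom_alarms`)
- Kubernetes Events retrieval (`huawei_get_cce_events`)
- TopN Pod and node metrics (`huawei_get_cce_pod_metrics_topN`, `huawei_get_cce_node_metrics_topN`)
- AOM and LTS log queries (`huawei_query_aom_logs`, `huawei_get_recent_logs`)
- Pod container logs (`huawei_get_pod_logs`)
- AOM metrics queries and AOM Prom instance discovery (`huawei_get_aom_metrics`, `huawei_list_aom_instances`)
- Monitoring dashboard generation (`huawei_generate_monitor_dashboard`)
Typical Use Cases: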
- `huaweicloudsdkcce`, `huaweicloudsdkcore`, `huaweicloudsdkaom`, `huaweicloudsdklts` packages
- `echo $HUAWEI_AK` or `echo $HUAWEI_SK` to check credentials
- `HUAWEI_AK`, `HUAWEI_SK`, `HUAWEI_REGION`
Configuration Method (Environment Variables Only):
export HUAWEI_AK=<your-ak>
export HUAWEI_SK=<your-sk>
export HUAWEI_REGION=cn-north-4
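As a hedged illustration of checking this configuration without exposing secrets, the sketch below reports only the names of variables that are unset. `missing_env_vars` is a hypothetical helper, not something shipped with the skill.

```python
import os

REQUIRED_ENV = ("HUAWEI_AK", "HUAWEI_SK", "HUAWEI_REGION")


def missing_env_vars(env=None) -> list:
    """Return the names of required variables that are unset or empty.

    Only names are reported; secret values are never printed or returned.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


# Example with an explicit mapping so the result does not depend on the host.
example_env = {"HUAWEI_AK": "example-ak", "HUAWEI_REGION": "cn-north-4"}
print(missing_env_vars(example_env))  # ['HUAWEI_SK']
```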
Important Security Notes:
| API Action | Permission | Purpose |
|---|---|---|
| `cce:cluster:get` | Get cluster | View CCE cluster details |
| `cce:cluster:createCert` | Create certificate | Obtain kubeconfig for kubectl access |
| `aom:alarm:list` | List alarms | Query AOM active/history alarms |
| `aom:alarm:analyze` | Analyze alarms | Deduplicate and group alarms |
| `aom:metricsData:get` | Get metrics data | Query Pod/node CPU/memory metrics |
| `aom:instance:list` | List AOM instances | Discover AOM Prom instance |
| `aom:logData:get` | Get log data | Query AOM/LTS log data |
| `lts:log:list` | List LTS logs | Query LTS log streams |
| `cce:event:list` | List events | Query Kubernetes Events |
Permission Failure Handling:
All actions are dispatched through scripts/huawei-cloud.py using skill action=exec.
| Action | Required Parameters | Description |
|---|---|---|
| `huawei_list_aom_alarms` | `region`, `cluster_id` | Collect active + history AOM alarms for the cluster |
| `huawei_analyze_aom_alarms` | `region`, `cluster_id` | Deduplicate alarms and group by severity level |

| Action | Required Parameters | Description |
|---|---|---|
| `huawei_get_cce_events` | `region`, `cluster_id` | Retrieve Kubernetes Events grouped by object and reason |
| `huawei_get_cce_pod_metrics_topN` | `region`, `cluster_id`, `namespace` | TopN Pod metrics (CPU/memory) for anomaly detection |
| `huawei_get_cce_node_metrics_topN` | `region`, `cluster_id` | TopN Node metrics for resource pressure detection |
| `huawei_get_aom_metrics` | `region`, `cluster_id`, `namespace` | Query AOM metrics for specific resources |
| `huawei_list_aom_instances` | `region` | Discover AOM Prom instance for metrics queries |

| Action | Required Parameters | Description |
|---|---|---|
| `huawei_query_aom_logs` | `region`, `cluster_id`, `namespace` | Query AOM structured log data |
| `huawei_get_recent_logs` | `region`, `cluster_id`, `namespace` | Get recent log entries (LTS) |
| `huawei_get_pod_logs` | `region`, `cluster_id`, `pod_name`, `namespace` | Fetch Pod container logs (previous or current) |

| Action | Required Parameters | Description |
|---|---|---|
| `huawei_generate_monitor_dashboard` | `region`, `cluster_id` | Generate monitoring dashboard from collected data |
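The required-parameter columns in the tables above lend themselves to a table-driven pre-flight check. The following is a minimal sketch, not the dispatcher's own validation: the mapping is transcribed from the tables, and `missing_params` is a hypothetical helper name.

```python
# Required parameters per action, transcribed from the tables above.
REQUIRED_PARAMS = {
    "huawei_list_aom_alarms": ("region", "cluster_id"),
    "huawei_analyze_aom_alarms": ("region", "cluster_id"),
    "huawei_get_cce_events": ("region", "cluster_id"),
    "huawei_get_cce_pod_metrics_topN": ("region", "cluster_id", "namespace"),
    "huawei_get_cce_node_metrics_topN": ("region", "cluster_id"),
    "huawei_get_aom_metrics": ("region", "cluster_id", "namespace"),
    "huawei_list_aom_instances": ("region",),
    "huawei_query_aom_logs": ("region", "cluster_id", "namespace"),
    "huawei_get_recent_logs": ("region", "cluster_id", "namespace"),
    "huawei_get_pod_logs": ("region", "cluster_id", "pod_name", "namespace"),
    "huawei_generate_monitor_dashboard": ("region", "cluster_id"),
}


def missing_params(action: str, params: dict) -> list:
    """Return required parameter names that are absent or empty for an action."""
    if action not in REQUIRED_PARAMS:
        raise KeyError(f"unknown action: {action}")
    return [name for name in REQUIRED_PARAMS[action] if not params.get(name)]


print(missing_params("huawei_get_pod_logs", {"region": "cn-north-4", "cluster_id": "c-1"}))
# ['pod_name', 'namespace']
```

Checking parameters before dispatch makes a missing `cluster_id` visible as a named gap rather than as a failed call.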
| Parameter | Required | Description | Default |
|---|---|---|---|
| `region` | Yes | Huawei Cloud region | `HUAWEI_REGION` |
| `cluster_id` | Yes | CCE cluster ID | N/A |
| `namespace` | No | Kubernetes namespace | N/A |
| `ak` | Optional | Override AK | `HUAWEI_AK` |
| `sk` | Optional | Override SK | `HUAWEI_SK` |
| `project_id` | Optional | Project ID | Auto from IAM |

| Parameter | Required | Description | Default |
|---|---|---|---|
| `alarm_id` | No | Specific alarm ID to query | N/A |
| `alarm_level` | No | Alarm severity filter | All |
| `hours` | No | History lookback window (hours) | 1 |

| Parameter | Required | Description | Default |
|---|---|---|---|
| `pod_name` | Yes* | Pod name (for `huawei_get_pod_logs`) | N/A |
| `container` | No | Container name | First |
| `previous` | No | Fetch previous (crashed) logs | false |
| `tail_lines` | No | Number of log tail lines | 100 |

| Parameter | Required | Description | Default |
|---|---|---|---|
| `top_n` | No | Number of top results | 10 |
| `hours` | No | Metrics lookback window (hours) | 1 |

*Required for specific actions as noted.
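The documented defaults can be made explicit on the caller's side. The sketch below is illustrative only: `apply_defaults` is a hypothetical helper, and it fills every numeric and boolean default regardless of action, which assumes the dispatcher ignores keys an action does not use (the source does not say whether it does).

```python
import os

# Defaults transcribed from the parameter tables above.
DEFAULTS = {"hours": 1, "top_n": 10, "tail_lines": 100, "previous": False}


def apply_defaults(params: dict, env=None) -> dict:
    """Return a copy of params with the documented defaults filled in."""
    env = os.environ if env is None else env
    merged = {**DEFAULTS, **params}  # explicit values win over defaults
    # region falls back to the HUAWEI_REGION environment variable
    if not merged.get("region") and env.get("HUAWEI_REGION"):
        merged["region"] = env["HUAWEI_REGION"]
    return merged


resolved = apply_defaults(
    {"cluster_id": "c-1", "top_n": 5},
    env={"HUAWEI_REGION": "cn-north-4"},
)
print(resolved)
```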
- Collect the `region`, `cluster_id`, `namespace`, `workload`, `pod`, `node`, and `alarm_id` provided by the user
- Use `huawei_list_aom_alarms` to collect active + history alarms, then use `huawei_analyze_aom_alarms` for deduplication and severity grouping
- Use `huawei_get_cce_events` to retrieve Kubernetes Events grouped by involved object and reason
- Query structured logs with `huawei_query_aom_logs`, then supplement with Pod-side logs from `huawei_get_recent_logs` or `huawei_get_pod_logs`
For the complete evidence-gathering workflow, see references/workflow.md.
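The evidence-gathering order can be sketched as an ordered call plan. This is a rough illustration under stated assumptions: `build_collection_plan` is a hypothetical helper, the order follows the alarms, events, metrics, logs sequence from the overview, and namespace-scoped actions are skipped when no namespace is supplied. The authoritative sequence is the one in references/workflow.md.

```python
def build_collection_plan(scope: dict) -> list:
    """Return (action, params) pairs in evidence-gathering order.

    Hypothetical helper: order is alarms, events, metrics, logs.
    Namespace-scoped actions are skipped when no namespace is given.
    """
    base = {"region": scope["region"], "cluster_id": scope["cluster_id"]}
    plan = [
        ("huawei_list_aom_alarms", dict(base)),
        ("huawei_analyze_aom_alarms", dict(base)),
        ("huawei_get_cce_events", dict(base)),
        ("huawei_get_cce_node_metrics_topN", dict(base)),
    ]
    namespace = scope.get("namespace")
    if namespace:
        scoped = {**base, "namespace": namespace}
        plan.append(("huawei_get_cce_pod_metrics_topN", dict(scoped)))
        plan.append(("huawei_query_aom_logs", dict(scoped)))
        if scope.get("pod"):
            # A named Pod allows fetching its container logs directly.
            plan.append(("huawei_get_pod_logs", {**scoped, "pod_name": scope["pod"]}))
        else:
            plan.append(("huawei_get_recent_logs", dict(scoped)))
    return plan


plan = build_collection_plan(
    {"region": "cn-north-4", "cluster_id": "c-1", "namespace": "default"}
)
for action, params in plan:
    print(action, params)
```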
See references/output-schema.md for the complete JSON response structure.
Context Package Output:
{
  "summary": "one paragraph context summary",
  "scope": {
    "region": "cn-north-4",
    "cluster_id": "optional",
    "namespace": "optional",
    "workload": "optional",
    "time_window": "optional"
  },
  "signals": {
    "alarms": [],
    "events": [],
    "metrics": [],
    "logs": []
  },
  "timeline": [],
  "gaps": [],
  "next_skill": "huawei-cloud-cce-pod-failure-diagnoser | huawei-cloud-cce-node-failure-diagnoser | huawei-cloud-cce-network-failure-diagnoser | huawei-cloud-cce-root-cause-analyzer"
}
Key output fields:
- `summary`: one paragraph summarizing the collected observability context
- `scope`: region, cluster, namespace, workload, and time window
- `signals`: collected evidence grouped by type (alarms, events, metrics, logs)
- `timeline`: merged signal timeline showing event chronology
- `gaps`: missing data that could improve diagnosis
- `next_skill`: recommended diagnostic skill for hand-off based on signal analysis
This skill is strictly read-only observability; no mutations are allowed.
- No action requires `confirm=true`, because no mutations are allowed
For complete risk classification, see references/risk-rules.md.
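The `timeline` and `gaps` fields can be illustrated with a short sketch. The record shape is an assumption: each signal is taken to carry an ISO 8601 `time` field, which this document does not specify (see references/output-schema.md for the real structure). `merge_timeline` and `find_gaps` are hypothetical helpers, and the sample records are made up for the example.

```python
from datetime import datetime


def merge_timeline(signals: dict) -> list:
    """Merge all signal types into one list sorted by timestamp.

    Assumes each record carries an ISO 8601 "time" field (hypothetical shape).
    """
    merged = []
    for kind, records in signals.items():
        for record in records:
            merged.append({"type": kind, **record})
    return sorted(merged, key=lambda item: datetime.fromisoformat(item["time"]))


def find_gaps(signals: dict) -> list:
    """List signal types for which nothing was collected."""
    return [f"no {kind} collected" for kind, records in signals.items() if not records]


# Made-up sample records, for illustration only.
signals = {
    "alarms": [{"time": "2024-01-01T10:05:00+00:00", "summary": "Pod restart alarm"}],
    "events": [{"time": "2024-01-01T10:03:00+00:00", "summary": "BackOff"}],
    "metrics": [],
    "logs": [{"time": "2024-01-01T10:04:00+00:00", "summary": "OOM message"}],
}
package = {
    "signals": signals,
    "timeline": merge_timeline(signals),
    "gaps": find_gaps(signals),
}
print([item["type"] for item in package["timeline"]])  # ['events', 'logs', 'alarms']
print(package["gaps"])  # ['no metrics collected']
```

Sorting on parsed timestamps rather than raw strings keeps the order correct even if sources report different UTC offsets.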
- Run `python3 scripts/huawei-cloud.py huawei_list_aom_alarms region=cn-north-4 cluster_id=<cluster-id>` to verify alarm query connectivity
- Run `python3 scripts/huawei-cloud.py huawei_get_cce_events region=cn-north-4 cluster_id=<cluster-id> limit=10` to verify that Event queries work
- Run `python3 scripts/huawei-cloud.py huawei_get_cce_pod_metrics_topN region=cn-north-4 cluster_id=<cluster-id> namespace=default top_n=5` to verify metrics TopN
- Check that the output contains the `scope`, `signals`, `timeline`, `gaps`, and `next_skill` fields

- Start with `huawei_list_aom_alarms` and `huawei_analyze_aom_alarms`: alarms provide the most direct fault signals
- `huawei_get_cce_pod_metrics_topN` and `huawei_get_cce_node_metrics_topN` efficiently highlight resource peaks without scanning all resources
- `huawei_query_aom_logs` provides structured log data; use `huawei_get_pod_logs` or `huawei_get_recent_logs` for Pod-side container log details
- Always populate the `gaps` field: this guides the next diagnostic skill on what additional evidence to collect
- Route all mutations to `huawei-cloud-cce-auto-remediation-runner`
- Hand off by fault type: Pod issues → `huawei-cloud-cce-pod-failure-diagnoser`, node issues → `huawei-cloud-cce-node-failure-diagnoser`, network → `huawei-cloud-cce-network-failure-diagnoser`, cross-domain → `huawei-cloud-cce-root-cause-analyzer`

| Document | Description |
|---|---|
| Workflow | Evidence-gathering workflow and step sequence |
| Risk Rules | Safety constraints and risk classification |
| Output Schema | JSON response format for context package |
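The hand-off routing for `next_skill` (Pod, node, network, or cross-domain) can be sketched as a lookup. How signals are classified into domains is not specified in this document, so the domain labels and `choose_next_skill` are hypothetical; unrecognized labels are simply ignored in this sketch.

```python
ROUTES = {
    "pod": "huawei-cloud-cce-pod-failure-diagnoser",
    "node": "huawei-cloud-cce-node-failure-diagnoser",
    "network": "huawei-cloud-cce-network-failure-diagnoser",
}
CROSS_DOMAIN = "huawei-cloud-cce-root-cause-analyzer"


def choose_next_skill(domains):
    """Pick the hand-off skill from the affected domains.

    The domain labels are hypothetical; labels outside ROUTES are ignored.
    """
    known = {d for d in domains if d in ROUTES}
    if not known:
        return None  # nothing to route on; leave the decision open
    if len(known) == 1:
        return ROUTES[next(iter(known))]
    return CROSS_DOMAIN  # more than one domain affected


print(choose_next_skill(["pod"]))
print(choose_next_skill(["pod", "node"]))
```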
- All mutations go to `huawei-cloud-cce-auto-remediation-runner`
- No `confirm=true` is ever needed
- The dispatcher (`scripts/huawei-cloud.py`) is the only execution method; do not use hcloud CLI or direct API calls
- The `next_skill` field in the output uses `huawei-cloud-cce-*` naming for cross-skill hand-off
- Alarm deduplication and correlation belong to `huawei-cloud-cce-alarm-correlation-engine`

| Pitfall | Symptom | Quick Fix |
|---|---|---|
| Missing `cluster_id` | All actions fail immediately | Provide `cluster_id` from cluster list |
| No time window specified | Broad, noisy results | Default to last 1 hour; note assumption in output |
| Skipping alarm collection | Missing critical fault signals | Always start with `huawei_list_aom_alarms` |
| Not merging signals on timeline | Isolated data points, no causal chain | Chronologically merge alarms, events, metrics |
| Suggesting mutation actions | Unsafe recommendations | All mutations → `huawei-cloud-cce-auto-remediation-runner` |
| Not marking data gaps | Diagnosis skill lacks direction | Always populate the `gaps` field |
| Querying all namespaces | Slow response, too many results | Scope with `namespace` and `workload` |
| AOM Prom instance not found | Metrics queries return empty | Verify with `huawei_list_aom_instances` first |
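The "no time window specified" fix in the table above (default to the last hour and note the assumption) can be sketched as follows. `resolve_time_window` is a hypothetical helper, and the wording of the note is an illustration rather than a required format.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_HOURS = 1  # documented default lookback


def resolve_time_window(scope: dict, now=None):
    """Return (start, end, notes); notes records any assumption made."""
    now = now or datetime.now(timezone.utc)
    notes = []
    hours = scope.get("hours")
    if hours is None:
        hours = DEFAULT_HOURS
        notes.append(f"time window not specified; assumed last {DEFAULT_HOURS} hour")
    return now - timedelta(hours=hours), now, notes


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end, notes = resolve_time_window({"cluster_id": "c-1"}, now=fixed_now)
print(start.isoformat(), end.isoformat(), notes)
```

The returned notes can be appended to the output so that the downstream diagnostic skill knows the window was assumed rather than supplied.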