Install
openclaw skills install huawei-cloud-cce-alarm-correlation-engine

Huawei Cloud AOM alarm correlation analysis skill for CCE operations. Use this skill when the user wants to: (1) query AOM active and historical alarms, (2) analyze alarm deduplication, severity grouping, and burst/steady alarm identification, (3) inspect CCE cluster alarm health, (4) manage AOM alarm rules (query, create, update, delete, enable, disable), (5) check AOM action rules and mute rules for notification gaps, (6) create event alarm rules referencing CCE event lists or Prometheus metric alarms. Trigger: user mentions "alarm correlation", "告警关联", "AOM alarm", "AOM 告警", "alarm deduplication", "告警去重", "alarm storm", "告警风暴", "alarm inspection", "告警巡检", "alarm rules", "告警规则"

⚠️ Execution Method (Must Read): This skill executes queries via the local Python dispatcher script. Using hcloud, openstack, or other CLI tools or direct API calls is prohibited.
- The dispatcher script is located at scripts/huawei-cloud.py within the skill directory.
- All scripts and environment check scripts are inside the skill package. You must use skill action=exec to execute them. Do not run them directly in a shell.
- Do not attempt hcloud, openstack, curl IAM, or any other CLI/API methods. This skill does not depend on those tools.
- All paths are relative to the skill directory, which is the directory where this SKILL.md is located.
This skill correlates Huawei Cloud AOM active and historical alarms for CCE, transforming raw alarm event streams into actionable alarm leads. The core principle is to consider both active and history alarms, avoiding omission of resource-type alarms that have already recovered but still impact diagnosis.
This skill has both read-only tools (alarm query, analysis, inspection, rule query) and mutation tools (alarm rule create/update/delete, action rule delete). All mutation operations require a two-step confirmation workflow with confirm=true.
| Skill | Purpose |
|---|---|
| huawei-cloud-cce-pod-failure-diagnoser | Pod-level failure diagnosis (CrashLoopBackOff, ImagePullBackOff, etc.) |
| huawei-cloud-cce-node-failure-diagnoser | Node failure diagnosis (NotReady, resource pressure, NPD events) |
| huawei-cloud-cce-network-failure-diagnoser | Network failure diagnosis (Ingress 502/504, ELB anomalies) |
| huawei-cloud-cce-auto-remediation-runner | Execute remediation actions (scale, reboot, drain) |
| huawei-cloud-cce-root-cause-analyzer | Multi-category alarm root cause analysis |
| huawei-cloud-cce-observability-context-builder | Observability context enrichment |
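- Query active and history alarms, merged and deduplicated (huawei_list_aom_alarms)
- Query current active alarms only (huawei_list_aom_current_alarms)
- Analyze alarms: deduplication, severity grouping, burst/steady identification (huawei_analyze_aom_alarms)
- Query, create, update, delete, enable, and disable alarm rules (mutations require confirm=true)
- Create event alarm rules (huawei_create_aom_event_alarm_rule)
- Query and delete action rules (deletion requires confirm=true)
- Query mute rules (huawei_list_aom_mute_rules)
- Inspect CCE cluster alarm health (huawei_aom_alarm_inspection)

The dispatcher script requires Python >= 3.6 and the following packages:
- huaweicloudsdkcore
- huaweicloudsdkaom
- huaweicloudsdkiam

| Variable | Required | Description |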
|---|---|---|
| HUAWEI_AK | Yes | Huawei Cloud Access Key |
| HUAWEI_SK | Yes | Huawei Cloud Secret Key |
| HUAWEI_REGION | No | Default region (overrides region param if set) |
| HUAWEI_PROJECT_ID | No | Project ID (auto-obtained via IAM API when not set) |
| HUAWEI_SECURITY_TOKEN | No | Required when using temporary AK/SK |
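The variables in the table above can be checked before any dispatcher call. The following is a minimal sketch, not part of the skill package: the helper name `check_credential_env` and its return shape are assumptions made for illustration. It reports variable names only and never returns or prints the secret values.

```python
import os

REQUIRED_VARS = ("HUAWEI_AK", "HUAWEI_SK")
OPTIONAL_VARS = ("HUAWEI_REGION", "HUAWEI_PROJECT_ID", "HUAWEI_SECURITY_TOKEN")


def check_credential_env(environ=None):
    """Report which variables are set, without ever returning their values.

    Hypothetical helper: the real dispatcher may validate differently.
    """
    env = os.environ if environ is None else environ
    # An empty string is treated the same as an unset variable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    present_optional = [name for name in OPTIONAL_VARS if env.get(name)]
    return {
        "ok": not missing,
        "missing_required": missing,
        "present_optional": present_optional,
    }
```

Passing a plain dictionary as `environ` makes the check easy to exercise without touching the real environment.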
🚫 Never expose or log AK/SK values. Credentials exist only in the current request call stack and are released after each invocation. Do not write credentials to files, logs, or responses.
✅ Use environment variables HUAWEI_AK / HUAWEI_SK for authentication. The dispatcher reads them automatically.
Security rules for credentials:
AK/SK may be provided in two ways:
- Environment variables HUAWEI_AK / HUAWEI_SK (recommended)
- Parameters ak and sk (not recommended for production)

| Permission | Description |
|---|---|
| aom:event:list | Query AOM alarm events |
| aom:alarmRule:list | Query AOM alarm rules |
| aom:alarmRule:create | Create AOM alarm rules |
| aom:alarmRule:update | Update AOM alarm rules |
| aom:alarmRule:delete | Delete AOM alarm rules |
| aom:actionRule:list | Query AOM action rules |
| aom:muteRule:list | Query AOM mute rules |
| cce:cluster:list | Get cluster name and auxiliary filter info via cluster ID |
All actions are invoked via the dispatcher script:
python3 scripts/huawei-cloud.py <action> region=<region> [key=value ...]
Mutation operations (create, update, delete, enable, disable alarm rules; delete action rules) require confirm=true to execute. Without confirm, the tool returns a preview and a confirmation prompt only.
Step 1: Preview — call without confirm:
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2
Returns: operation preview, target rule, rule fields, and confirmation example. No real creation is performed.
Step 2: Confirm execution — call again with confirm=true:
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2 \
confirm=true
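The two steps above differ only in the trailing confirm=true token. The sketch below builds both argument lists from one parameter set, so the confirmed call cannot drift from the previewed one. The helper `build_dispatcher_argv` is hypothetical and only constructs the argument list; it executes nothing, since execution is expected to go through skill action=exec.

```python
def build_dispatcher_argv(action, confirm=False, **params):
    """Return the argv list for scripts/huawei-cloud.py; nothing is executed here.

    Hypothetical helper for illustration; it only mirrors the key=value
    calling convention shown in the commands above.
    """
    argv = ["python3", "scripts/huawei-cloud.py", action]
    argv += [f"{key}={value}" for key, value in params.items()]
    if confirm:
        # Appended only after the user has explicitly approved the preview.
        argv.append("confirm=true")
    return argv


rule = dict(
    region="cn-north-4", rule_name="my-rule", metric_name="cpuUsage",
    namespace="PAAS.NODE", comparison_operator=">", threshold=80,
    period=60, evaluation_periods=3, statistic="average", alarm_level=2,
)

# Step 1: preview (no confirm token).
preview_argv = build_dispatcher_argv("huawei_create_aom_alarm_rule", **rule)
# Step 2: identical parameters plus confirm=true.
confirm_argv = build_dispatcher_argv(
    "huawei_create_aom_alarm_rule", confirm=True, **rule
)
```

Because the result is an argument list rather than a shell string, the `>` operator needs no quoting.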
| Tool | Operation | Risk Level | Description |
|---|---|---|---|
| huawei_create_aom_alarm_rule | Create | 🟡 Medium | Create new AOM alarm rule, may introduce new alarm notifications |
| huawei_create_aom_event_alarm_rule | Create | 🟡 Medium | Create AOM event alarm rule, may introduce new event notifications |
| huawei_update_aom_alarm_rule | Update | 🟠 High | Update AOM alarm rule threshold, toggle, notification action, description, etc. |
| huawei_delete_aom_alarm_rule | Delete | 🔴 High | Delete AOM alarm rule, may prevent future alarms from triggering |
| huawei_disable_aom_alarm_rule | Disable | 🔴 High | Disable AOM alarm rule, may stop related alarms from triggering |
| huawei_enable_aom_alarm_rule | Enable | 🟠 High | Enable AOM alarm rule, may restore and trigger alarm notifications |
| huawei_delete_aom_action_rule | Delete | 🔴 High | Delete AOM notification action rule, may prevent alarm notifications |
| Action | Description |
|---|---|
| Create/update action rules | Do not create or update notification action rules |
| Modify mute rules | Do not create, update, or delete mute rules |
| Execute remediation actions | Do not scale, reboot, drain, or delete workloads or nodes |
| Modify cluster resources | Do not change CCE, ECS, ELB, EIP, VPC, security groups, etc. |
If analysis results require scaling, rebooting, draining, vulnerability status changes, or other remediation actions, output recommendations only and hand off to huawei-cloud-cce-auto-remediation-runner for preview, confirmation, and post-verification.
| Action | Description | Cluster Filter | Required Params |
|---|---|---|---|
| huawei_list_aom_alarms | Query active + history alarms, merged and deduplicated | Supports cluster_id | region |
| huawei_list_aom_current_alarms | Query current active alarms only | Supports cluster_id | region |
| huawei_analyze_aom_alarms | Analyze alarms: deduplication, severity grouping, burst/steady identification | Supports cluster_id | region |
# Query active + history alarms in a region
python3 scripts/huawei-cloud.py huawei_list_aom_alarms region=cn-north-4
# Query alarms for a specific cluster
python3 scripts/huawei-cloud.py huawei_list_aom_alarms \
region=cn-north-4 cluster_id=xxx
# Query current active alarms for a cluster
python3 scripts/huawei-cloud.py huawei_list_aom_current_alarms \
region=cn-north-4 cluster_id=xxx
# Analyze alarms for a cluster (burst, attention, steady groups)
python3 scripts/huawei-cloud.py huawei_analyze_aom_alarms \
region=cn-north-4 cluster_id=xxx
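To make the idea behind huawei_analyze_aom_alarms concrete, here is a rough sketch of deduplication, severity grouping, and burst/steady labeling. It is not the skill's implementation. The record fields (`event_name`, `resource_id`, `severity`, `starts_at`), the deduplication key, and the burst rule (five occurrences within 600 seconds) are all assumptions, and the "attention" group is left out because its definition is not given here.

```python
from collections import Counter, defaultdict


def analyze_alarms(alarms, burst_count=5, window_seconds=600):
    """Deduplicate alarms and label each group as 'burst' or 'steady'.

    Assumed record shape (hypothetical): event_name, resource_id,
    severity, starts_at (epoch seconds).
    """
    # Deduplication key: the same event on the same resource is one lead.
    groups = defaultdict(list)
    for alarm in alarms:
        groups[(alarm["event_name"], alarm["resource_id"])].append(alarm)

    deduplicated = []
    for (event_name, resource_id), items in groups.items():
        times = sorted(item["starts_at"] for item in items)
        # Burst: at least `burst_count` occurrences inside one sliding window.
        is_burst = any(
            times[i + burst_count - 1] - times[i] <= window_seconds
            for i in range(len(times) - burst_count + 1)
        )
        deduplicated.append({
            "event_name": event_name,
            "resource_id": resource_id,
            "severity": items[0]["severity"],
            "count": len(items),
            "first_seen": times[0],
            "last_seen": times[-1],
            "pattern": "burst" if is_burst else "steady",
        })

    # Severity grouping is computed over deduplicated leads, not raw events.
    severity_stats = Counter(entry["severity"] for entry in deduplicated)
    return deduplicated, dict(severity_stats)
```

Counting severities after deduplication keeps a single noisy resource from dominating the distribution during an alarm storm.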
Mutation actions in this group require confirm=true.

Event alarm rule constraints:
- When calling huawei_create_aom_event_alarm_rule, the event_name must follow the event list and naming format in references/cce-event-list.md (recommended format: Chinese event description##Event name).
- … (route_group_enable=true).

Metric alarm rule constraints:
- When calling huawei_create_aom_alarm_rule, PromQL/metric thresholds should follow references/cce-prometheus-metric-alarms.md.

| Action | Description | Risk Level | Requires confirm | Required Params |
|---|---|---|---|---|
| huawei_list_aom_alarm_rules | Query AOM alarm rules | 🟢 Low | No | region |
| huawei_create_aom_alarm_rule | Create AOM metric alarm rule | 🟡 Medium | Yes | region, rule_name, metric_name, namespace, comparison_operator, threshold, period, evaluation_periods, statistic, alarm_level |
| huawei_create_aom_event_alarm_rule | Create AOM event alarm rule | 🟡 Medium | Yes | region, rule_name, event_name, namespace |
| huawei_update_aom_alarm_rule | Update AOM alarm rule | 🟠 High | Yes | region, rule_name |
| huawei_delete_aom_alarm_rule | Delete AOM alarm rule | 🔴 High | Yes | region, rule_name |
| huawei_disable_aom_alarm_rule | Disable AOM alarm rule | 🔴 High | Yes | region, rule_id |
| huawei_enable_aom_alarm_rule | Enable AOM alarm rule | 🟠 High | Yes | region, rule_id |
| huawei_list_aom_action_rules | Query AOM action/notification rules | 🟢 Low | No | region |
| huawei_delete_aom_action_rule | Delete AOM notification action rule | 🔴 High | Yes | region, rule_name |
| huawei_list_aom_mute_rules | Query AOM mute rules | 🟢 Low | No | region |
# Query alarm rules
python3 scripts/huawei-cloud.py huawei_list_aom_alarm_rules region=cn-north-4
# Preview create alarm rule (no execution)
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2
# Confirm create alarm rule
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2 \
confirm=true
# Preview update alarm rule
python3 scripts/huawei-cloud.py huawei_update_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule threshold=80
# Confirm update alarm rule
python3 scripts/huawei-cloud.py huawei_update_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule threshold=80 confirm=true
# Preview delete alarm rule
python3 scripts/huawei-cloud.py huawei_delete_aom_alarm_rule \
region=cn-north-4 rule_name=xxx
# Confirm delete alarm rule
python3 scripts/huawei-cloud.py huawei_delete_aom_alarm_rule \
region=cn-north-4 rule_name=xxx confirm=true
# Preview disable alarm rule
python3 scripts/huawei-cloud.py huawei_disable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx
# Confirm disable alarm rule
python3 scripts/huawei-cloud.py huawei_disable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx confirm=true
# Preview enable alarm rule
python3 scripts/huawei-cloud.py huawei_enable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx
# Confirm enable alarm rule
python3 scripts/huawei-cloud.py huawei_enable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx confirm=true
# Query action rules
python3 scripts/huawei-cloud.py huawei_list_aom_action_rules region=cn-north-4
# Preview delete action rule
python3 scripts/huawei-cloud.py huawei_delete_aom_action_rule \
region=cn-north-4 rule_name=xxx
# Confirm delete action rule
python3 scripts/huawei-cloud.py huawei_delete_aom_action_rule \
region=cn-north-4 rule_name=xxx confirm=true
# Query mute rules
python3 scripts/huawei-cloud.py huawei_list_aom_mute_rules region=cn-north-4
| Action | Description | Required Params |
|---|---|---|
| huawei_aom_alarm_inspection | Inspect AOM alarms for a CCE cluster and output risk items | region, cluster_id |
# Inspect alarms for a specific cluster
python3 scripts/huawei-cloud.py huawei_aom_alarm_inspection \
region=cn-north-4 cluster_id=xxx
| Parameter | Required | Description |
|---|---|---|
| region | Yes | Huawei Cloud region (e.g., cn-north-4) |
| cluster_id | No | CCE cluster ID; when provided, only alarms related to this cluster are returned |
| ak | No | Access Key ID; HUAWEI_AK environment variable preferred |
| sk | No | Secret Access Key; HUAWEI_SK environment variable preferred |
| project_id | No | Huawei Cloud project ID; auto-obtained via IAM API when not provided |
| Parameter | Required | Description |
|---|---|---|
| region | Yes | Huawei Cloud region |
| rule_name | Yes (create, update, delete) | Alarm rule name |
| rule_id | Yes (enable, disable) | Alarm rule ID |
| metric_name | Yes (create metric rule) | Metric name (e.g., cpuUsage) |
| namespace | Yes (create) | Metric namespace (e.g., PAAS.NODE) |
| event_name | Yes (create event rule) | Event name; reference references/cce-event-list.md for naming format |
| comparison_operator | Yes (create metric rule) | Threshold comparison operator (e.g., >, <, >=, <=) |
| threshold | Yes (create metric rule) | Alarm threshold value |
| period | Yes (create metric rule) | Statistics period in seconds (recommended: 60) |
| evaluation_periods | Yes (create metric rule) | Consecutive trigger period count (recommended: 3) |
| statistic | Yes (create metric rule) | Statistics method (e.g., average, max, min) |
| alarm_level | Yes (create) | Alarm severity level (1=Critical, 2=Major, 3=Minor, 4=Info) |
| fields | No (create) | Additional JSON fields for rule creation, e.g., {"unit":"%","is_turn_on":true} |
| updates | No (update) | JSON batch update fields, e.g., {"threshold":"80","is_turn_on":true} |
| enterprise_project_id | No (list action rules) | Enterprise project scope; default all_granted_eps |
| confirm | No | Must be explicitly set to true for mutation operations to execute |
| ak | No | Access Key ID |
| sk | No | Secret Access Key |
| project_id | No | Huawei Cloud project ID |
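The `fields` and `updates` parameters in the table above carry JSON, which is easy to get wrong when typed by hand. The sketch below serializes a dictionary into a single `updates=` token. The helper `build_update_argv` is hypothetical, and passing the JSON as one key=value token is an assumption inferred from the dispatcher's calling convention, not confirmed behavior.

```python
import json


def build_update_argv(region, rule_name, updates, confirm=False):
    """Return argv for huawei_update_aom_alarm_rule with a JSON `updates` value.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    argv = [
        "python3", "scripts/huawei-cloud.py", "huawei_update_aom_alarm_rule",
        f"region={region}",
        f"rule_name={rule_name}",
        # Compact separators keep the JSON in a single token without spaces.
        "updates=" + json.dumps(updates, separators=(",", ":")),
    ]
    if confirm:
        argv.append("confirm=true")
    return argv


argv = build_update_argv(
    "cn-north-4", "my-rule", {"threshold": "80", "is_turn_on": True}
)
```

Using `json.dumps` guarantees that Python's `True` is emitted as JSON `true`, matching the example value in the table.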
Output must include:
| Field | Description |
|---|---|
| region | Queried region |
| cluster_id | Cluster ID (if specified by user) |
| total_count | Total alarm count |
| firing_count | Currently firing alarm count |
| resolved_count | Resolved alarm count |
| severity_stats | Severity level distribution |
| type_stats | Alarm type grouping statistics |
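The summary fields above can be derived from a flat alarm list in a few lines. This is an illustrative sketch only: the input record fields (`status`, `severity`, `type`) and the status values `firing` and `resolved` are assumptions, and the authoritative shape is the one in references/output-schema.md.

```python
from collections import Counter


def summarize_alarms(alarms, region, cluster_id=None):
    """Build the summary fields from a list of alarm dicts.

    Assumed record shape (hypothetical): status ('firing' or 'resolved'),
    severity, type.
    """
    firing = sum(1 for alarm in alarms if alarm["status"] == "firing")
    resolved = sum(1 for alarm in alarms if alarm["status"] == "resolved")
    summary = {
        "region": region,
        "total_count": len(alarms),
        "firing_count": firing,
        "resolved_count": resolved,
        "severity_stats": dict(Counter(alarm["severity"] for alarm in alarms)),
        "type_stats": dict(Counter(alarm["type"] for alarm in alarms)),
    }
    if cluster_id is not None:
        # cluster_id is only reported when the caller filtered by cluster.
        summary["cluster_id"] = cluster_id
    return summary
```

Keeping resolved alarms in the totals reflects the principle that recovered alarms still matter for diagnosis.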
For CCE alarms, prioritize the following resource dimensions:
| Field | Description |
|---|---|
| cluster_name | Cluster name |
| namespace | Kubernetes namespace |
| pod_name | Pod name |
| resource_kind | Resource type |
| event_name | Alarm name |
| message | Alarm message |
| Alarm Characteristics | Recommended Skill |
|---|---|
| CrashLoopBackOff, BackOffStart, FailedStart, ImagePullBackOff | huawei-cloud-cce-pod-failure-diagnoser |
| FailedScheduling, Insufficient cpu, Insufficient memory | huawei-cloud-cce-pod-failure-diagnoser or huawei-cloud-cce-node-failure-diagnoser |
| NodeNotReady, node resource pressure, NPD events | huawei-cloud-cce-node-failure-diagnoser |
| Ingress 502/504, Service unreachable, ELB anomalies | huawei-cloud-cce-network-failure-diagnoser |
| Multiple alarm categories impacting business simultaneously | huawei-cloud-cce-root-cause-analyzer |
| Scaling, reboot, drain, or other remediation actions needed | huawei-cloud-cce-auto-remediation-runner |
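The routing table above can be approximated with simple keyword matching. The sketch below is deliberately crude and is not how the skill necessarily decides: the keyword lists are a subset of the table, substring matching is an assumption, and the function name `recommend_skills` is hypothetical. When more than one category matches, it also suggests the root cause analyzer, following the "multiple alarm categories" row.

```python
# (keywords, recommended skills) pairs taken from the routing table.
ROUTES = [
    (("CrashLoopBackOff", "BackOffStart", "FailedStart", "ImagePullBackOff"),
     ["huawei-cloud-cce-pod-failure-diagnoser"]),
    (("FailedScheduling", "Insufficient cpu", "Insufficient memory"),
     ["huawei-cloud-cce-pod-failure-diagnoser",
      "huawei-cloud-cce-node-failure-diagnoser"]),
    (("NodeNotReady",), ["huawei-cloud-cce-node-failure-diagnoser"]),
    (("502", "504", "ELB"), ["huawei-cloud-cce-network-failure-diagnoser"]),
]


def recommend_skills(alarm_texts):
    """Return recommended skill names for a list of alarm names or messages."""
    lowered = [text.lower() for text in alarm_texts]
    matched_routes = [
        index
        for index, (keywords, _skills) in enumerate(ROUTES)
        if any(keyword.lower() in text for text in lowered for keyword in keywords)
    ]
    skills = []
    for index in matched_routes:
        for skill in ROUTES[index][1]:
            if skill not in skills:  # keep order, drop duplicates
                skills.append(skill)
    if len(matched_routes) > 1:
        # Several categories at once: add the cross-category analyzer.
        skills.append("huawei-cloud-cce-root-cause-analyzer")
    return skills
```

Substring matching on short tokens such as "502" can produce false positives, so a real router would more plausibly match on structured fields such as event_name.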
See references/output-schema.md for the full JSON response schema.
python3 scripts/huawei-cloud.py huawei_list_aom_alarms region=cn-north-4 cluster_id=<cluster_id>
- Call huawei_analyze_aom_alarms and verify that burst, attention, and steady groupings are returned correctly
- Verify that huawei_aom_alarm_inspection returns cluster alarm summary with risk items
- Call huawei_create_aom_alarm_rule without confirm and verify it returns a preview only (no actual creation)
- After a mutation with confirm=true, call huawei_list_aom_alarm_rules to verify the rule state change

- Always start with huawei_list_aom_alarms; never assume absence of active alarms means no problems; check history alarms
- Use huawei_analyze_aom_alarms for alarm storm scenarios to deduplicate and identify burst vs. steady alarms
- For mutations, preview first (without confirm), then confirm (with confirm=true) only after explicit user approval
- When creating event alarm rules, ensure event_name follows the format in references/cce-event-list.md
- When notifications are missing, check action rules (huawei_list_aom_action_rules) and mute rules (huawei_list_aom_mute_rules)
- See references/cce-prometheus-metric-alarms.md for recommended PromQL expressions and thresholds

| Document | Description |
|---|---|
| references/workflow.md | Alarm correlation workflow: collection, deduplication, grouping, and diagnosis handoff |
| references/output-schema.md | Output JSON schema for alarm correlation and inspection results |
| references/risk-rules.md | Risk boundary rules: read-only vs. mutation actions, prohibited operations |
| references/cce-event-list.md | CCE event list with naming format for creating event alarm rules |
| references/cce-prometheus-metric-alarms.md | Prometheus metric alarm reference for creating metric alarm rules |
| Huawei Cloud Python SDK Documentation | SDK reference |
| Huawei Cloud API Explorer | API interactive explorer |
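- All mutation operations require the confirm=true two-step confirmation
- Remediation actions are handed off to huawei-cloud-cce-auto-remediation-runner
- All actions are invoked via python3 scripts/huawei-cloud.py <action>; do not use hcloud CLI or direct API calls

| Pitfall | Correct Approach |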
|---|---|
| Only querying active alarms and ignoring history | Always use huawei_list_aom_alarms which merges active + history; history alarms may indicate recurring resource issues |
| Calling mutation tools without preview step | Always call without confirm first to preview; only add confirm=true after explicit user approval |
| Creating event alarm rules with incorrect event_name format | Reference references/cce-event-list.md and use Chinese description##EventName format |
| Creating metric alarm rules with arbitrary thresholds | Reference references/cce-prometheus-metric-alarms.md for recommended PromQL and threshold values |
| Deleting action rules without understanding notification impact | Preview first; verify which alarms depend on the action rule before confirming deletion |
| Executing remediation actions directly from this skill | This skill does not perform remediation; hand off to huawei-cloud-cce-auto-remediation-runner |
| Assuming "no active alarms" means "no problems" | Check history alarms — resolved alarms may indicate ongoing resource issues that flare periodically |
| Not checking mute rules when alarms are missing from notifications | Always query mute rules (huawei_list_aom_mute_rules) alongside action rules to identify suppression |
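The event_name pitfall above is easy to catch before a preview call. The following sketch checks only the `description##EventName` shape. The helper name `split_event_name` is hypothetical, the description `示例描述` ("example description") is a placeholder rather than a real entry, and the authoritative names remain those in references/cce-event-list.md.

```python
def split_event_name(event_name):
    """Split 'description##EventName' into its two parts, or raise ValueError.

    Hypothetical helper: it validates the shape only, not whether the
    event exists in references/cce-event-list.md.
    """
    parts = event_name.split("##")
    if len(parts) != 2 or not all(part.strip() for part in parts):
        raise ValueError(
            f"expected 'description##EventName', got {event_name!r}"
        )
    description, name = (part.strip() for part in parts)
    return description, name


# Placeholder description paired with an event name mentioned in this document.
description, name = split_event_name("示例描述##FailedStart")
```

A shape check like this does not confirm that the description is Chinese or that the pair matches the event list; it only rejects values that are missing the `##` separator or one of the two parts.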