Huawei Cloud CCE Kubernetes Event Analyzer


Query and analyze Kubernetes events in Huawei Cloud CCE clusters using API or LTS logs, apply filters, identify patterns, and hand off to diagnosis skills.

Install

openclaw skills install huawei-cloud-cce-kubernetes-event-analyzer

Kubernetes Event Analyzer

Overview

Analyze Kubernetes events in Huawei Cloud CCE clusters to find warning events, anomalies, and failure patterns. The skill queries events via the K8s API or LTS log streams, applies client-side filtering, groups events into patterns, and hands off to diagnosis skills for remediation.

Architecture: MCP Tool → CCE K8s API / LTS Log Streams → Events → Client-side Filter & Group → Pattern Summary → Diagnosis Handoff

Standard workflow:

1. Identify region, cluster_id, and optional namespace from user query
2. Fetch events using huawei_get_cce_events (K8s API) or huawei_query_k8s_events_from_lts (LTS)
3. Apply client-side filters (type, reason, involved_object, time window)
4. Group and aggregate by reason, namespace, or pattern
5. Summarize top reasons, repeated patterns, and affected resources
6. Hand off to a diagnosis skill if specific failures are identified

Related Skills (handoff targets):

Pod failures -> huawei-cloud-cce-pod-failure-diagnoser
Workload rollout issues -> huawei-cloud-cce-workload-failure-diagnoser
Node issues -> huawei-cloud-cce-node-failure-diagnoser
Storage issues -> huawei-cloud-cce-storage-failure-diagnoser
Service/Network issues -> huawei-cloud-cce-network-failure-diagnoser
Action requested -> huawei-cloud-cce-auto-remediation-runner

Prerequisites

1. Python Dependencies

Python 3.8+ with the huaweicloudsdkcce, huaweicloudsdkcore, and kubernetes packages
Run the environment check before first use (see the Verification section)

2. Credential Configuration

Valid Huawei Cloud credentials (AK/SK mode)
Security Rules:
- 🚫 Never expose AK/SK values in code, conversation, or commands
- 🚫 Never use echo $HUAWEI_AK or echo $HUAWEI_SK to check credentials
- ✅ Use environment variables: HUAWEI_AK, HUAWEI_SK, HUAWEI_REGION
- ✅ Prefer IAM users over root account for cloud operations
- ✅ Enable MFA for sensitive operations

Configuration Method (Environment Variables Only):

export HUAWEI_AK="<your-ak>"
export HUAWEI_SK="<your-sk>"
export HUAWEI_REGION=cn-north-4

⚠️ Important Security Notes:

Never commit credentials to version control
Use IAM users with minimal required permissions
Rotate AK/SK regularly
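
Since the rules above forbid echoing credential values, a presence check is one safe way to confirm the configuration. The following is a minimal sketch; the helper name `missing_credentials` is illustrative and not part of the skill. It reports only variable names, never values.

```python
import os

REQUIRED_VARS = ("HUAWEI_AK", "HUAWEI_SK", "HUAWEI_REGION")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Only variable names are returned; the values are never printed or logged.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` with no arguments inspects the current process environment; an empty list means all three variables are set.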

3. IAM Permission Requirements

API Action	Permission	Purpose
`cce:cluster:get`	Get cluster	View CCE cluster details
`cce:cluster:createCert`	Create certificate	Obtain kubeconfig for kubectl access
`cce:node:list`	List nodes	Query CCE cluster nodes
`lts:logStream:list`	List LTS log streams	Discover LTS log streams for event queries
`lts:logs:search`	Search LTS logs	Query K8s events from LTS log streams

Permission Failure Handling:

1. When any command fails due to IAM permission errors, display the required permission list
2. Guide the user to create a custom policy in the IAM console and grant authorization
3. Pause execution and wait for user confirmation that permissions have been granted
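
One way to render this guidance is sketched below. The function name `permission_guidance` and the assumption that permission failures surface as HTTP 403 are illustrative; this document does not show how the skill actually detects such errors. The permission rows are passed in by the caller, for example from the table above.

```python
def permission_guidance(status_code, required_permissions):
    """Build a user-facing message for an IAM permission failure.

    `required_permissions` is a list of (action, purpose) pairs.
    Returns None when the status code does not indicate a permission
    error, so the caller can handle other failures separately.
    """
    if status_code != 403:  # assumption: permission errors arrive as HTTP 403
        return None
    lines = ["The request was denied. Required IAM permissions:"]
    for action, purpose in required_permissions:
        lines.append(f"  - {action}: {purpose}")
    lines.append(
        "Create a custom policy in the IAM console, grant it, "
        "then confirm to continue."
    )
    return "\n".join(lines)
```

The caller would show the returned message and stop until the user confirms that the permissions have been granted.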

Security Constraints

Read-Only Skill

This skill is strictly read-only. It only queries Kubernetes events and lists related resources. No modifications are made to the cluster.

No write operations: Never modify, delete, or create any Kubernetes resources
Redact sensitive data: Do not expose node names, pod names, or workload names that could identify production systems. Use redacted or fictional examples in summaries
Hand off remediation: If event analysis reveals a clear remediation path, provide evidence and hand off to the appropriate diagnosis or remediation skill instead of executing recovery actions here
Time-bounded queries: Keep event queries time-bounded. Prefer recent windows (1-24 hours) to avoid overwhelming results
Redirect action requests: If the user asks to take action based on event findings, redirect to huawei-cloud-cce-auto-remediation-runner with the evidence summarized
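
The redaction rule above can be applied with a small mapping from real names to generic labels. This is a sketch under assumptions: the class name `NameRedactor` and the `kind-N` label scheme are hypothetical, chosen so the same resource always receives the same label within one summary.

```python
from collections import defaultdict


class NameRedactor:
    """Map real resource names to stable generic labels such as 'pod-1'."""

    def __init__(self):
        self._labels = {}
        self._counters = defaultdict(int)

    def redact(self, kind, name):
        """Return the generic label for (kind, name), creating it on first use."""
        key = (kind, name)
        if key not in self._labels:
            self._counters[kind] += 1
            self._labels[key] = f"{kind.lower()}-{self._counters[kind]}"
        return self._labels[key]
```

Keeping labels stable lets a reader still see that several events refer to the same object without learning its real name.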

Tools

Tool	Purpose	Required Parameters	Optional Parameters
`huawei_get_cce_events`	Query CCE Kubernetes events via K8s API Server	`region`, `cluster_id`	`namespace`, `limit`
`huawei_query_k8s_events_from_lts`	Query K8s events from LTS log streams (Event→LTS LogConfig required)	`region`, `cluster_id`, `start_time`, `end_time`	`keywords`

Scenario Routing

User Intent	Reference Document
Full event query workflow (5-step)	references/workflow.md
Event pattern recognition table	references/workflow.md
Time-window analysis guidance	references/workflow.md
Risk constraints & guardrails	references/risk-rules.md
Output schema (query & analysis)	references/output-schema.md

Core Commands

Step 1: Query Events via K8s API (huawei_get_cce_events)

Fetches raw Kubernetes events from the cluster API Server. All filtering beyond namespace and limit is done client-side after fetching.

# Query all events in a cluster
python3 scripts/huawei-cloud.py huawei_get_cce_events \
  region=cn-north-4 \
  cluster_id=<cluster-id>

# Query events in a specific namespace
python3 scripts/huawei-cloud.py huawei_get_cce_events \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  namespace=default

# Limit event count
python3 scripts/huawei-cloud.py huawei_get_cce_events \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  limit=100

Supported API filters: namespace, limit (default 500)

Unsupported filters (apply client-side): event_type, reason, involved_object_kind, involved_object_name, hours, start_time, end_time
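
When the dispatcher is invoked from Python rather than a shell, the `key=value` arguments shown above can be assembled programmatically. The helper below is a sketch; `build_command` is a hypothetical name, and the output format of the script is not specified in this document, so parsing is left to the caller.

```python
def build_command(tool, **params):
    """Build the argument list for scripts/huawei-cloud.py.

    Parameters set to None are skipped, so optional filters can be
    passed through unconditionally.
    """
    argv = ["python3", "scripts/huawei-cloud.py", tool]
    for key, value in params.items():
        if value is not None:
            argv.append(f"{key}={value}")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)`. Passing a list instead of a shell string avoids quoting problems for values that contain spaces, such as `start_time`.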

Step 2: Query Events via LTS (huawei_query_k8s_events_from_lts)

Queries K8s events collected to LTS via Event→LTS LogConfig. Requires a LogConfig with event collection enabled and pointing to LTS output.

# Query events from LTS in a time window
python3 scripts/huawei-cloud.py huawei_query_k8s_events_from_lts \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  start_time="2026-05-30 06:00:00" \
  end_time="2026-05-30 08:00:00"

# Query with keyword filter
python3 scripts/huawei-cloud.py huawei_query_k8s_events_from_lts \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  start_time="2026-05-30 00:00:00" \
  end_time="2026-05-30 23:59:59" \
  keywords=FailedScheduling

LTS time format: YYYY-MM-DD HH:MM:SS
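
A time window in this format can be generated rather than typed by hand. The sketch below assumes a hypothetical helper `lts_time_window` and enforces the 1-24 hour range that this skill recommends; the document does not state which timezone LTS expects, so the caller is responsible for passing `end` in the right one.

```python
from datetime import datetime, timedelta

LTS_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def lts_time_window(end, hours=1):
    """Return (start_time, end_time) strings covering `hours` up to `end`."""
    if not 1 <= hours <= 24:
        raise ValueError("hours must be between 1 and 24")
    start = end - timedelta(hours=hours)
    return start.strftime(LTS_TIME_FORMAT), end.strftime(LTS_TIME_FORMAT)


start_time, end_time = lts_time_window(datetime(2026, 5, 30, 8, 0, 0), hours=2)
print(start_time, "->", end_time)
```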

Fallback: If no Event→LTS LogConfig with events enabled is found, the tool returns an error. Use huawei_get_cce_events instead.

Step 3: Apply Client-Side Filters

After fetching events, apply filters based on user needs:

type == "Warning" — warning events only
reason — specific patterns (FailedScheduling, ImagePullBackOff, FailedMount, etc.)
involved_object.kind + involved_object.name — specific resources
namespace — namespace-specific analysis
first_timestamp / last_timestamp — time-window analysis
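
The filters above can be sketched as a single function. This assumes each event is a dict using the field names listed above and that `last_timestamp` is a `datetime`; the actual structure returned by the tools may differ, and `filter_events` is a hypothetical name. Timestamps must be either all naive or all timezone-aware for the comparison to work.

```python
from datetime import datetime


def filter_events(events, event_type=None, reason=None, kind=None,
                  name=None, namespace=None, since=None):
    """Return the events that match every filter that is not None."""
    selected = []
    for event in events:
        obj = event.get("involved_object") or {}
        if event_type is not None and event.get("type") != event_type:
            continue
        if reason is not None and event.get("reason") != reason:
            continue
        if kind is not None and obj.get("kind") != kind:
            continue
        if name is not None and obj.get("name") != name:
            continue
        if namespace is not None and event.get("namespace") != namespace:
            continue
        if since is not None:
            last_seen = event.get("last_timestamp")
            # Events without a timestamp are left out of time-window results.
            if last_seen is None or last_seen < since:
                continue
        selected.append(event)
    return selected


sample = [
    {"type": "Warning", "reason": "FailedScheduling", "namespace": "default",
     "involved_object": {"kind": "Pod", "name": "pod-1"},
     "last_timestamp": datetime(2026, 5, 30, 7, 30)},
    {"type": "Normal", "reason": "Pulled", "namespace": "default",
     "involved_object": {"kind": "Pod", "name": "pod-2"},
     "last_timestamp": datetime(2026, 5, 30, 5, 0)},
]
recent_warnings = filter_events(
    sample, event_type="Warning", since=datetime(2026, 5, 30, 6, 0)
)
print([e["reason"] for e in recent_warnings])  # ['FailedScheduling']
```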

Step 4: Group and Aggregate

Group by reason to find top event patterns
Group by namespace to find high-noise namespaces
Flag events with count > 1 as repeated patterns
Calculate warning_count vs normal_count for quick health signal
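
A minimal sketch of this aggregation follows, again assuming events are dicts with `type`, `reason`, `namespace`, and `count` keys. The function name `aggregate_events` is hypothetical. It counts event objects per reason; summing each event's `count` field instead would be an equally reasonable choice.

```python
from collections import Counter


def aggregate_events(events, top_n=5):
    """Summarize events by reason and namespace, and flag repeated ones."""
    by_reason = Counter()
    by_namespace = Counter()
    warning_count = 0
    normal_count = 0
    repeated = []
    for event in events:
        by_reason[event.get("reason", "Unknown")] += 1
        by_namespace[event.get("namespace", "Unknown")] += 1
        if event.get("type") == "Warning":
            warning_count += 1
        elif event.get("type") == "Normal":
            normal_count += 1
        # A missing or null count is treated as a single occurrence.
        if (event.get("count") or 1) > 1:
            repeated.append(event)
    return {
        "top_reasons": by_reason.most_common(top_n),
        "namespace_breakdown": dict(by_namespace),
        "warning_count": warning_count,
        "normal_count": normal_count,
        "repeated_patterns": repeated,
    }
```

A high `warning_count` relative to `normal_count` is a quick signal that the cluster or namespace deserves a closer look.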

Step 5: Summarize and Hand Off

Summarize findings with counts, timestamps, and affected objects. If events point to specific failures, hand off to the appropriate diagnosis skill with evidence.

Parameter Reference

Common Parameters

Parameter	Required/Optional	Description	Default
`region`	Required	Huawei Cloud region	`HUAWEI_REGION`
`cluster_id`	Required	CCE cluster ID	N/A
`namespace`	Optional	Kubernetes namespace filter	N/A (all namespaces)
`ak`	Optional	Override AK	`HUAWEI_AK`
`sk`	Optional	Override SK	`HUAWEI_SK`
`project_id`	Optional	Project ID	Auto from IAM

`huawei_get_cce_events` Parameters

Parameter	Required	Description	Default
`region`	Yes	Huawei Cloud region	`HUAWEI_REGION`
`cluster_id`	Yes	CCE cluster ID	N/A
`namespace`	No	Kubernetes namespace filter	N/A (all namespaces)
`limit`	No	Maximum number of events to return	500

`huawei_query_k8s_events_from_lts` Parameters

Parameter	Required	Description	Default
`region`	Yes	Huawei Cloud region	`HUAWEI_REGION`
`cluster_id`	Yes	CCE cluster ID	N/A
`start_time`	Yes	Query start time (YYYY-MM-DD HH:MM:SS)	N/A
`end_time`	Yes	Query end time (YYYY-MM-DD HH:MM:SS)	N/A
`keywords`	No	Keyword filter for LTS search	N/A

Event Pattern Quick Reference

Pattern	Likely Cause	Handoff Target
`ImagePullBackOff` repeated	Wrong image or pull secret missing	`huawei-cloud-cce-pod-failure-diagnoser`
`FailedScheduling` + `insufficient`	Resource pressure or node not ready	`huawei-cloud-cce-workload-failure-diagnoser`
`FailedMount`	Volume attach or PVC issue	`huawei-cloud-cce-storage-failure-diagnoser`
`Evicted` pods	Budget disruption or node pressure	`huawei-cloud-cce-pod-failure-diagnoser`
`NodeNotReady`	Node agent or network issue	`huawei-cloud-cce-node-failure-diagnoser`
`Unhealthy` + Readiness probe	Application issue or startup failure	`huawei-cloud-cce-pod-failure-diagnoser`
`FailedCreatePodSandBox`	CNI or network issue	`huawei-cloud-cce-network-failure-diagnoser`
`OOMKilled`	Memory limit exceeded	`huawei-cloud-cce-pod-failure-diagnoser`
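
The table above can be expressed as a lookup, sketched here under assumptions. `HANDOFF_RULES` and `handoff_target` are hypothetical names. Each rule lists tokens that must all appear, case-insensitively, in the event's reason or message; matching the message as well is a deliberate choice, since some states (for example `ImagePullBackOff`) may appear in the message text rather than in the reason field.

```python
POD = "huawei-cloud-cce-pod-failure-diagnoser"

# (tokens that must all be present, handoff skill), in table order.
HANDOFF_RULES = [
    (("ImagePullBackOff",), POD),
    (("FailedScheduling", "insufficient"), "huawei-cloud-cce-workload-failure-diagnoser"),
    (("FailedMount",), "huawei-cloud-cce-storage-failure-diagnoser"),
    (("Evicted",), POD),
    (("NodeNotReady",), "huawei-cloud-cce-node-failure-diagnoser"),
    (("Unhealthy", "Readiness probe"), POD),
    (("FailedCreatePodSandBox",), "huawei-cloud-cce-network-failure-diagnoser"),
    (("OOMKilled",), POD),
]


def handoff_target(reason, message=""):
    """Return the first matching handoff skill, or None if no rule matches."""
    text = f"{reason} {message}".lower()
    for tokens, skill in HANDOFF_RULES:
        if all(token.lower() in text for token in tokens):
            return skill
    return None


print(handoff_target("FailedScheduling", "0/3 nodes are available: 3 Insufficient cpu."))
```

Events that match no rule return `None` and stay in the summary without a handoff suggestion.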

Output Format

From huawei_get_cce_events (K8s API)

Field	Description
`region`	Huawei Cloud region
`cluster_id`	CCE cluster ID
`namespace`	Kubernetes namespace filter (if applied)
`total_fetched`	Number of events returned by the API
`events`	Raw event list (apply filters client-side)
`warning_count`	Number of Warning events (calculated)
`top_reasons`	Top event reasons with counts (calculated)
`repeated_patterns`	Events with count > 1 grouped by reason
`namespace_breakdown`	Event counts by namespace
`next_steps`	Suggested follow-up query or diagnosis skill

From huawei_query_k8s_events_from_lts (LTS)

Field	Description
`region`	Huawei Cloud region
`cluster_id`	CCE cluster ID
`log_group_id`	LTS log group ID
`log_stream_id`	LTS log stream ID
`keywords`	Keywords used for filtering
`event_count`	Number of events returned
`events`	Parsed event list with normalized structure
`time_range`	Start/end time of the query
`log_config`	LogConfig info (name, events enabled, etc.)

Verification

1. Run the environment check script
2. Query CCE events with huawei_get_cce_events
3. Verify event filtering and pattern grouping
4. Confirm that handoff to diagnosis skills works correctly

Best Practices

Start with K8s API — use huawei_get_cce_events for quick queries; fall back to LTS only when time-range precision is needed
Filter Warning first — Warning events are the primary signal; filter type == "Warning" before deep analysis
Group by reason — event reason grouping reveals systemic issues faster than per-event analysis
Time-bound queries — prefer recent windows (1-24 hours) to avoid overwhelming results
Hand off, don't remediate — this skill is read-only; always hand off to diagnosis skills with evidence
Redact sensitive names — use generic labels in summaries; do not expose production pod/node/workload names

Common Pitfalls

Pitfall	Symptom	Quick Fix
Missing `cluster_id`	Action fails immediately	Provide `cluster_id` from `huawei_get_cce_clusters`
No Event→LTS LogConfig configured	LTS query returns error	Use `huawei_get_cce_events` (K8s API) instead
Unbounded time query	Overwhelming results, slow response	Always provide `start_time`/`end_time` or limit scope to 1-24 hours
LTS time format mismatch	LTS query fails or returns no results	Use exact format `YYYY-MM-DD HH:MM:SS` for `start_time` and `end_time`
Large namespace scan with no filter	Too many events, hard to analyze	Narrow with `namespace` or client-side filters (`type`, `reason`)
Permission denied on kubeconfig	Cannot access cluster	Verify `cce:cluster:createCert` IAM permission
LTS permission denied	Cannot query LTS log streams	Verify `lts:logStream:list` and `lts:logs:search` IAM permissions
Ignoring `count > 1` events	Missing systemic patterns	Always group by `reason` and flag repeated events first

Notes

This skill does not modify, delete, or create any Kubernetes or LTS resources — all actions are read-only
Event summaries must redact production pod/node/workload names; use generic labels in public outputs
AK/SK must never be hardcoded — use environment variables only
The Python dispatcher script (scripts/huawei-cloud.py) is the only execution method — do not use hcloud CLI or direct API calls for event queries
For LTS queries, the cluster must have an Event→LTS LogConfig configured; otherwise fall back to K8s API
Hand off remediation requests to huawei-cloud-cce-auto-remediation-runner with evidence summarized — this skill never executes recovery actions
All event data is point-in-time — K8s API events have a retention limit; use LTS for historical analysis beyond the API retention window

Reference Documents

Document	Description
workflow.md	Full event query workflow, pattern recognition, time-window analysis, aggregation guidance, LTS vs K8s API selection guide
risk-rules.md	Read-only constraints, data redaction rules, handoff policies, guardrails
output-schema.md	Event query summary, analysis summary, and per-event detail schema