Huawei Cloud CCE Kubernetes Event Analyzer

Data & APIs

Query and analyze Kubernetes events in Huawei Cloud CCE clusters using API or LTS logs, apply filters, identify patterns, and hand off to diagnosis skills.

Install

openclaw skills install huawei-cloud-cce-kubernetes-event-analyzer

Kubernetes Event Analyzer

Overview

Analyze Kubernetes events in Huawei Cloud CCE clusters to find warning events, anomalies, and failure patterns. Queries events via K8s API or LTS log streams, applies client-side filtering, groups patterns, and hands off to diagnosis skills for remediation.

Architecture: MCP Tool → CCE K8s API / LTS Log Streams → Events → Client-side Filter & Group → Pattern Summary → Diagnosis Handoff

Standard workflow:

1. Identify region, cluster_id, and optional namespace from user query
2. Fetch events using huawei_get_cce_events (K8s API) or huawei_query_k8s_events_from_lts (LTS)
3. Apply client-side filters (type, reason, involved_object, time window)
4. Group and aggregate by reason, namespace, or pattern
5. Summarize top reasons, repeated patterns, and affected resources
6. Hand off to diagnosis skill if specific failures identified

Related Skills (handoff targets):

  • Pod failures -> huawei-cloud-cce-pod-failure-diagnoser
  • Workload rollout issues -> huawei-cloud-cce-workload-failure-diagnoser
  • Node issues -> huawei-cloud-cce-node-failure-diagnoser
  • Storage issues -> huawei-cloud-cce-storage-failure-diagnoser
  • Service/Network issues -> huawei-cloud-cce-network-failure-diagnoser
  • Action requested -> huawei-cloud-cce-auto-remediation-runner

Prerequisites

1. Python Dependencies

  • Python 3.8+ with the huaweicloudsdkcce, huaweicloudsdkcore, and kubernetes packages
  • Run environment check before first use (see Verification section)

2. Credential Configuration

  • Valid Huawei Cloud credentials (AK/SK mode)
  • Security Rules:
    • 🚫 Never expose AK/SK values in code, conversation, or commands
    • 🚫 Never use echo $HUAWEI_AK or echo $HUAWEI_SK to check credentials
    • ✅ Use environment variables: HUAWEI_AK, HUAWEI_SK, HUAWEI_REGION
    • ✅ Prefer IAM users over root account for cloud operations
    • ✅ Enable MFA for sensitive operations

Configuration Method (Environment Variables Only):

export HUAWEI_AK=<your-ak>
export HUAWEI_SK=<your-sk>
export HUAWEI_REGION=cn-north-4
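The security rules above forbid using echo to check credentials. A minimal Python sketch can confirm the variables are set by reporting only the names of missing ones, never their values. The function name check_credentials is illustrative and is not part of the skill's scripts.

```python
import os

REQUIRED_VARS = ("HUAWEI_AK", "HUAWEI_SK", "HUAWEI_REGION")


def check_credentials(env=None):
    """Return the names of required variables that are missing or empty.

    Only variable names are returned. Values are never printed, which
    keeps to the "never echo AK/SK" rule.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = check_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("HUAWEI_AK, HUAWEI_SK and HUAWEI_REGION are set")
```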

⚠️ Important Security Notes:

  • Never commit credentials to version control
  • Use IAM users with minimal required permissions
  • Enable MFA for sensitive operations
  • Rotate AK/SK regularly

3. IAM Permission Requirements

API Action | Permission | Purpose
--- | --- | ---
cce:cluster:get | Get cluster | View CCE cluster details
cce:cluster:createCert | Create certificate | Obtain kubeconfig for kubectl access
cce:node:list | List nodes | Query CCE cluster nodes
lte:logStream:list | List LTS log streams | Discover LTS log streams for event queries
lte:logs:search | Search LTS logs | Query K8s events from LTS log streams

Permission Failure Handling:

  1. When any command fails due to IAM permission errors, display the required permission list
  2. Guide the user to create a custom policy in the IAM console and grant authorization
  3. Pause execution and wait for user confirmation that permissions have been granted
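The three handling steps can be sketched as follows, assuming the caller has the HTTP status code and error text from the failed call. The detection rule is a heuristic assumption, not documented Huawei Cloud behavior: it treats HTTP 403, or the words "forbidden" or "permission" in the error text, as an IAM denial. permission_help is a hypothetical helper.

```python
# IAM actions copied from the permission table above.
REQUIRED_PERMISSIONS = [
    ("cce:cluster:get", "View CCE cluster details"),
    ("cce:cluster:createCert", "Obtain kubeconfig for kubectl access"),
    ("cce:node:list", "Query CCE cluster nodes"),
    ("lte:logStream:list", "Discover LTS log streams for event queries"),
    ("lte:logs:search", "Query K8s events from LTS log streams"),
]


def is_permission_error(status_code, message=""):
    """Heuristic (assumption): HTTP 403 or 'forbidden'/'permission' in the text."""
    text = (message or "").lower()
    return status_code == 403 or "forbidden" in text or "permission" in text


def permission_help(status_code, message=""):
    """Return guidance for an IAM failure, or None if the error is something else."""
    if not is_permission_error(status_code, message):
        return None
    lines = ["This operation needs the following IAM permissions:"]
    lines += [f"  - {action}: {purpose}" for action, purpose in REQUIRED_PERMISSIONS]
    lines.append("Create a custom policy in the IAM console and grant it to your IAM user,")
    lines.append("then confirm here before the analysis continues.")
    return "\n".join(lines)
```

Returning text, rather than retrying, matches step 3: execution pauses until the user confirms the grant.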

Security Constraints

Read-Only Skill

This skill is strictly read-only. It only queries Kubernetes events and lists related resources. No modifications are made to the cluster.

  • No write operations: Never modify, delete, or create any Kubernetes resources
  • Redact sensitive data: Do not expose node names, pod names, or workload names that could identify production systems. Use redacted or fictional examples in summaries
  • Hand off remediation: If event analysis reveals a clear remediation path, provide evidence and hand off to the appropriate diagnosis or remediation skill instead of executing recovery actions here
  • Time-bounded queries: Keep event queries time-bounded. Prefer recent windows (1-24 hours) to avoid overwhelming results
  • Redirect action requests: If the user asks to take action based on event findings, redirect to huawei-cloud-cce-auto-remediation-runner with the evidence summarized
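Here is a minimal sketch of the redaction rule. It assumes each event is a dict with an involved_object sub-dict holding kind and name, similar to the snake_case fields of the Python kubernetes client's V1Event. The skill's actual output shape may differ. Real names are swapped for stable labels such as pod-1, and the same substitution is applied to the event message. Other objects named only inside the message, such as a node in a scheduling message, are not caught by this sketch.

```python
def redact_names(events):
    """Return copies of `events` with involved object names replaced by labels.

    The same (kind, name) pair always gets the same label, such as 'pod-1',
    so grouping and repeated-pattern counts still work after redaction.
    """
    labels = {}    # (kind, real name) -> generic label
    counters = {}  # kind -> number of labels issued so far
    redacted = []
    for event in events:
        obj = dict(event.get("involved_object") or {})  # copy; input stays intact
        name = obj.get("name")
        new_event = {**event, "involved_object": obj}
        if name:
            kind = (obj.get("kind") or "object").lower()
            key = (kind, name)
            if key not in labels:
                counters[kind] = counters.get(kind, 0) + 1
                labels[key] = f"{kind}-{counters[kind]}"
            obj["name"] = labels[key]
            message = event.get("message")
            if isinstance(message, str):
                new_event["message"] = message.replace(name, labels[key])
        redacted.append(new_event)
    return redacted
```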

Tools

Tool | Purpose | Required Parameters | Optional Parameters
--- | --- | --- | ---
huawei_get_cce_events | Query CCE Kubernetes events via K8s API Server | region, cluster_id | namespace, limit
huawei_query_k8s_events_from_lts | Query K8s events from LTS log streams (Event→LTS LogConfig required) | region, cluster_id, start_time, end_time | keywords

Scenario Routing

User Intent | Reference Document
--- | ---
Full event query workflow (5-step) | references/workflow.md
Event pattern recognition table | references/workflow.md
Time-window analysis guidance | references/workflow.md
Risk constraints & guardrails | references/risk-rules.md
Output schema (query & analysis) | references/output-schema.md

Core Commands

Step 1: Query Events via K8s API (huawei_get_cce_events)

Fetches raw Kubernetes events from the cluster API Server. All filtering beyond namespace and limit is done client-side after fetching.

# Query all events in a cluster
python3 scripts/huawei-cloud.py huawei_get_cce_events \
  region=cn-north-4 \
  cluster_id=<cluster-id>

# Query events in a specific namespace
python3 scripts/huawei-cloud.py huawei_get_cce_events \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  namespace=default

# Limit event count
python3 scripts/huawei-cloud.py huawei_get_cce_events \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  limit=100

Supported API filters: namespace, limit (default 500)

Unsupported filters (apply client-side): event_type, reason, involved_object_kind, involved_object_name, hours, start_time, end_time
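Because hours is not an API filter, a time window over K8s API results has to be applied after fetching. The sketch below assumes each event carries last_timestamp and/or first_timestamp. Each value is either an RFC 3339 string (e.g. 2026-05-30T07:15:00Z) or a timezone-aware datetime. Events with neither field are dropped. filter_last_hours is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(value):
    """Accept an aware datetime or an RFC 3339 string ending in 'Z' or an offset."""
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_last_hours(events, hours, now=None):
    """Keep events whose most recent timestamp falls within the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    kept = []
    for event in events:
        # Prefer last_timestamp (latest occurrence); fall back to first_timestamp.
        stamp = event.get("last_timestamp") or event.get("first_timestamp")
        if stamp and to_datetime(stamp) >= cutoff:
            kept.append(event)
    return kept
```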

Step 2: Query Events via LTS (huawei_query_k8s_events_from_lts)

Queries K8s events that an Event→LTS LogConfig has collected into LTS. Requires a LogConfig with event collection enabled and LTS configured as its output.

# Query events from LTS in a time window
python3 scripts/huawei-cloud.py huawei_query_k8s_events_from_lts \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  start_time="2026-05-30 06:00:00" \
  end_time="2026-05-30 08:00:00"

# Query with keyword filter
python3 scripts/huawei-cloud.py huawei_query_k8s_events_from_lts \
  region=cn-north-4 \
  cluster_id=<cluster-id> \
  start_time="2026-05-30 00:00:00" \
  end_time="2026-05-30 23:59:59" \
  keywords=FailedScheduling

LTS time format: YYYY-MM-DD HH:MM:SS
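When the user asks for "the last N hours", both LTS parameters can be derived in the format above. The sketch formats local wall-clock time. The source does not say whether the LTS tool reads these strings as local time, UTC, or the region's time zone, so verify the time zone before relying on it.

```python
from datetime import datetime, timedelta

LTS_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS


def lts_window(hours, end=None):
    """Return (start_time, end_time) strings spanning the `hours` hours before `end`."""
    end = end or datetime.now()
    start = end - timedelta(hours=hours)
    return start.strftime(LTS_TIME_FORMAT), end.strftime(LTS_TIME_FORMAT)


start_time, end_time = lts_window(2, end=datetime(2026, 5, 30, 8, 0, 0))
print(f'start_time="{start_time}" end_time="{end_time}"')
# prints: start_time="2026-05-30 06:00:00" end_time="2026-05-30 08:00:00"
```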

Fallback: If no Event→LTS LogConfig with event collection enabled is found, the tool returns an error. Use huawei_get_cce_events instead.
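One way to encode this fallback assumes the LTS call signals the missing LogConfig by raising an exception. NoEventLogConfigError, fetch_events, and the two callables are hypothetical stand-ins for whatever wraps the dispatcher script. The K8s API tool takes no start_time/end_time, so the original time window must be reapplied client-side after falling back.

```python
class NoEventLogConfigError(RuntimeError):
    """Hypothetical error: no Event→LTS LogConfig with event collection enabled."""


def fetch_events(query_lts, query_api):
    """Try the LTS query first and fall back to the K8s API query.

    Returns a (source, result) tuple so the caller knows whether the
    time window still has to be applied client-side.
    """
    try:
        return "lts", query_lts()
    except NoEventLogConfigError as exc:
        print(f"LTS unavailable ({exc}); falling back to huawei_get_cce_events")
        return "k8s_api", query_api()
```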

Step 3: Apply Client-Side Filters

After fetching events, apply filters based on user needs:

  • type == "Warning" — warning events only
  • reason — specific patterns (FailedScheduling, ImagePullBackOff, FailedMount, etc.)
  • involved_object.kind + involved_object.name — specific resources
  • namespace — namespace-specific analysis
  • first_timestamp / last_timestamp — time-window analysis
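The filters above can be combined in one pass. The sketch assumes each event is a dict with type, reason, and an optional namespace key. It also assumes an involved_object dict holding kind, name, and possibly namespace. Any argument left as None is not applied. filter_events is a hypothetical name. Time-window filtering on first_timestamp/last_timestamp is omitted to keep the example short.

```python
def filter_events(events, event_type=None, reasons=None, kind=None, name=None, namespace=None):
    """Return events that match every filter argument that is not None.

    `reasons` is a collection of reason strings, e.g. {"FailedScheduling", "FailedMount"}.
    """
    matched = []
    for event in events:
        obj = event.get("involved_object") or {}
        # Use the event's own namespace, else the involved object's.
        ns = event.get("namespace") or obj.get("namespace")
        if event_type is not None and event.get("type") != event_type:
            continue
        if reasons is not None and event.get("reason") not in reasons:
            continue
        if kind is not None and obj.get("kind") != kind:
            continue
        if name is not None and obj.get("name") != name:
            continue
        if namespace is not None and ns != namespace:
            continue
        matched.append(event)
    return matched
```

For example, filter_events(events, event_type="Warning", reasons={"FailedMount"}) keeps only Warning events with reason FailedMount.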

Step 4: Group and Aggregate

  • Group by reason to find top event patterns
  • Group by namespace to find high-noise namespaces
  • Flag events with count > 1 as repeated patterns
  • Calculate warning_count vs normal_count for quick health signal
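Here is a sketch of the aggregation step, assuming events are dicts with reason, type, namespace, and count keys. Here, repeated_patterns sums the count field of every event whose count exceeds 1, grouped by reason. That is one reasonable reading of "flag repeated patterns", not necessarily how the skill's own output computes it. aggregate is a hypothetical helper.

```python
from collections import Counter


def aggregate(events, top_n=5):
    """Summarize events by reason and namespace, and flag repeated patterns."""
    reasons = Counter(e.get("reason") or "Unknown" for e in events)
    namespaces = Counter(e.get("namespace") or "(none)" for e in events)
    repeated = Counter()
    for e in events:
        # count records how many times an identical event recurred.
        count = e.get("count") or 1
        if count > 1:
            repeated[e.get("reason") or "Unknown"] += count
    warning_count = sum(1 for e in events if e.get("type") == "Warning")
    return {
        "top_reasons": reasons.most_common(top_n),
        "namespace_breakdown": dict(namespaces),
        "repeated_patterns": dict(repeated),
        "warning_count": warning_count,
        "normal_count": len(events) - warning_count,  # anything not Warning
    }
```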

Step 5: Summarize and Hand Off

Summarize findings with counts, timestamps, and affected objects. If events point to specific failures, hand off to the appropriate diagnosis skill with evidence.

Parameter Reference

Common Parameters

Parameter | Required/Optional | Description | Default
--- | --- | --- | ---
region | Required | Huawei Cloud region | HUAWEI_REGION
cluster_id | Required | CCE cluster ID | N/A
namespace | Optional | Kubernetes namespace filter | N/A (all namespaces)
ak | Optional | Override AK | HUAWEI_AK
sk | Optional | Override SK | HUAWEI_SK
project_id | Optional | Project ID | Auto-resolved from IAM

huawei_get_cce_events Parameters

Parameter | Required | Description | Default
--- | --- | --- | ---
region | Yes | Huawei Cloud region | HUAWEI_REGION
cluster_id | Yes | CCE cluster ID | N/A
namespace | No | Kubernetes namespace filter | N/A (all namespaces)
limit | No | Maximum number of events to return | 500

huawei_query_k8s_events_from_lts Parameters

Parameter | Required | Description | Default
--- | --- | --- | ---
region | Yes | Huawei Cloud region | HUAWEI_REGION
cluster_id | Yes | CCE cluster ID | N/A
start_time | Yes | Query start time (YYYY-MM-DD HH:MM:SS) | N/A
end_time | Yes | Query end time (YYYY-MM-DD HH:MM:SS) | N/A
keywords | No | Keyword filter for LTS search | N/A

Event Pattern Quick Reference

Pattern | Likely Cause | Handoff Target
--- | --- | ---
ImagePullBackOff repeated | Wrong image or pull secret missing | huawei-cloud-cce-pod-failure-diagnoser
FailedScheduling + insufficient | Resource pressure or node not ready | huawei-cloud-cce-workload-failure-diagnoser
FailedMount | Volume attach or PVC issue | huawei-cloud-cce-storage-failure-diagnoser
Evicted pods | Budget disruption or node pressure | huawei-cloud-cce-pod-failure-diagnoser
NodeNotReady | Node agent or network issue | huawei-cloud-cce-node-failure-diagnoser
Unhealthy + Readiness probe | Application issue or startup failure | huawei-cloud-cce-pod-failure-diagnoser
FailedCreatePodSandBox | CNI or network issue | huawei-cloud-cce-network-failure-diagnoser
OOMKilled | Memory limit exceeded | huawei-cloud-cce-pod-failure-diagnoser
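The quick-reference table can be encoded as ordered rules. The sketch assumes events are dicts with reason and message keys. Each rule matches the reason string exactly as the table names it. Where the table adds a qualifier, the rule also requires a case-insensitive substring in the message. In real clusters, some patterns appear in the message of a BackOff or Failed event rather than as the reason itself. Image pull back-off is a common example, so the rule list may need extending. handoff_target is a hypothetical helper.

```python
# (reason, optional lowercase message substring, handoff skill), checked in order.
HANDOFF_RULES = [
    ("ImagePullBackOff", None, "huawei-cloud-cce-pod-failure-diagnoser"),
    ("FailedScheduling", "insufficient", "huawei-cloud-cce-workload-failure-diagnoser"),
    ("FailedMount", None, "huawei-cloud-cce-storage-failure-diagnoser"),
    ("Evicted", None, "huawei-cloud-cce-pod-failure-diagnoser"),
    ("NodeNotReady", None, "huawei-cloud-cce-node-failure-diagnoser"),
    ("Unhealthy", "readiness probe", "huawei-cloud-cce-pod-failure-diagnoser"),
    ("FailedCreatePodSandBox", None, "huawei-cloud-cce-network-failure-diagnoser"),
    ("OOMKilled", None, "huawei-cloud-cce-pod-failure-diagnoser"),
]


def handoff_target(event):
    """Return the diagnosis skill for an event, or None if no rule matches."""
    reason = event.get("reason") or ""
    message = (event.get("message") or "").lower()
    for rule_reason, needle, skill in HANDOFF_RULES:
        if reason == rule_reason and (needle is None or needle in message):
            return skill
    return None
```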

Output Format

From huawei_get_cce_events (K8s API)

Field | Description
--- | ---
region | Huawei Cloud region
cluster_id | CCE cluster ID
namespace | Kubernetes namespace filter (if applied)
total_fetched | Number of events returned by the API
events | Raw event list (apply filters client-side)
warning_count | Number of Warning events (calculated)
top_reasons | Top event reasons with counts (calculated)
repeated_patterns | Events with count > 1 grouped by reason
namespace_breakdown | Event counts by namespace
next_steps | Suggested follow-up query or diagnosis skill

From huawei_query_k8s_events_from_lts (LTS)

Field | Description
--- | ---
region | Huawei Cloud region
cluster_id | CCE cluster ID
log_group_id | LTS log group ID
log_stream_id | LTS log stream ID
keywords | Keywords used for filtering
event_count | Number of events returned
events | Parsed event list with normalized structure
time_range | Start/end time of the query
log_config | LogConfig info (name, events enabled, etc.)

Verification

  1. Run environment check script
  2. Query CCE events with huawei_get_cce_events
  3. Verify event filtering and pattern grouping
  4. Confirm handoff to diagnosis skills works correctly

Best Practices

  1. Start with K8s API — use huawei_get_cce_events for quick queries; fall back to LTS only when time-range precision is needed
  2. Filter Warning first — Warning events are the primary signal; filter type == "Warning" before deep analysis
  3. Group by reason — event reason grouping reveals systemic issues faster than per-event analysis
  4. Time-bound queries — prefer recent windows (1-24 hours) to avoid overwhelming results
  5. Hand off, don't remediate — this skill is read-only; always hand off to diagnosis skills with evidence
  6. Redact sensitive names — use generic labels in summaries; do not expose production pod/node/workload names

Common Pitfalls

Pitfall | Symptom | Quick Fix
--- | --- | ---
Missing cluster_id | Action fails immediately | Provide cluster_id from huawei_get_cce_clusters
No Event→LTS LogConfig configured | LTS query returns error | Use huawei_get_cce_events (K8s API) instead
Unbounded time query | Overwhelming results, slow response | Always provide start_time/end_time or limit scope to 1-24 hours
LTS time format mismatch | LTS query fails or returns no results | Use exact format YYYY-MM-DD HH:MM:SS for start_time and end_time
Large namespace scan with no filter | Too many events, hard to analyze | Narrow with namespace or client-side filters (type, reason)
Permission denied on kubeconfig | Cannot access cluster | Verify cce:cluster:createCert IAM permission
LTS permission denied | Cannot query LTS log streams | Verify lte:logStream:list and lte:logs:search IAM permissions
Ignoring count > 1 events | Missing systemic patterns | Always group by reason and flag repeated events first
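Several of these pitfalls can be caught before any call is made. The minimal pre-flight check below assumes parameters arrive as a plain dict of the key=value pairs passed to the dispatcher. preflight is a hypothetical helper. The 24-hour cap mirrors the recommended query window rather than a documented API limit.

```python
from datetime import datetime, timedelta

LTS_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_WINDOW = timedelta(hours=24)  # upper end of the recommended 1-24 hour window


def preflight(params):
    """Return a list of problems found in the tool parameters (empty list = OK)."""
    problems = []
    if not params.get("cluster_id"):
        problems.append("cluster_id is missing (look it up with huawei_get_cce_clusters)")
    start, end = params.get("start_time"), params.get("end_time")
    if start is None and end is None:
        return problems  # K8s API query: no time window to validate
    try:
        start_dt = datetime.strptime(start or "", LTS_TIME_FORMAT)
        end_dt = datetime.strptime(end or "", LTS_TIME_FORMAT)
    except ValueError:
        problems.append("start_time and end_time must both use YYYY-MM-DD HH:MM:SS")
        return problems
    if end_dt <= start_dt:
        problems.append("end_time must be later than start_time")
    elif end_dt - start_dt > MAX_WINDOW:
        problems.append("time window exceeds 24 hours; narrow the query")
    return problems
```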

Notes

  • This skill does not modify, delete, or create any Kubernetes or LTS resources — all actions are read-only
  • Event summaries must redact production pod/node/workload names; use generic labels in public outputs
  • AK/SK must never be hardcoded — use environment variables only
  • The Python dispatcher script (scripts/huawei-cloud.py) is the only execution method — do not use hcloud CLI or direct API calls for event queries
  • For LTS queries, the cluster must have an Event→LTS LogConfig configured; otherwise fall back to K8s API
  • Hand off remediation requests to huawei-cloud-cce-auto-remediation-runner with evidence summarized — this skill never executes recovery actions
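  • All event data is point-in-time: the K8s API keeps events only for a limited time (kube-apiserver's --event-ttl defaults to 1 hour; CCE clusters may be configured differently), so use LTS for historical analysis beyond that retention window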

Reference Documents

Document | Description
--- | ---
workflow.md | Full event query workflow, pattern recognition, time-window analysis, aggregation guidance, LTS vs K8s API selection guide
risk-rules.md | Read-only constraints, data redaction rules, handoff policies, guardrails
output-schema.md | Event query summary, analysis summary, and per-event detail schema