Install
openclaw skills install @zw008/k8s-aiops

Use this skill whenever the user needs to operate a Kubernetes cluster: list or inspect pods, deployments, services, nodes, namespaces, and events; read pod logs; scale and rollout-restart deployments; delete pods or deployments; and cordon or uncordon nodes. Works with any kubeconfig-reachable cluster (standard Kubernetes, k3s, EKS, GKE, AKS). Always use this skill for "list k8s pods", "scale deployment", "kubernetes pod logs", "cordon node", "restart deployment", "k3s", or "kubectl"-style tasks when the context is explicitly Kubernetes or a cluster. Do NOT use when the target is not a Kubernetes cluster (hypervisor VM lifecycle, backup products, or cloud-provider consoles are out of scope). Preview: common Kubernetes operations with a built-in governance harness (audit, policy, token budget, undo, risk tiers).
Disclaimer: This is a community-maintained open-source project and is not affiliated with, endorsed by, or sponsored by the Cloud Native Computing Foundation, the Kubernetes project, or k3s/Rancher. "Kubernetes" and "k3s" are trademarks of their respective owners. Source code is publicly auditable at github.com/AIops-tools/K8s-AIops under the MIT license.
Governed Kubernetes operations — 15 MCP tools, every one wrapped with the bundled @governed_tool harness: a local unified audit log under ~/.k8s-aiops/, policy engine, token/runaway budget guard, undo-token recording, and graduated-autonomy risk tiers. Works with any kubeconfig-reachable cluster (standard Kubernetes, k3s, EKS, GKE, AKS).
Standalone: the governance harness is bundled in the package (k8s_aiops.governance), so k8s-aiops has no external skill-family dependency. Preview: common operations, not yet exhaustive.
| Category | Tools | Count | Read or Write |
|---|---|---|---|
| Pods | list, get, logs, delete | 4 | 3 read / 1 write |
| Deployments | list, get, scale, rollout restart, delete | 5 | 2 read / 3 write |
| Services | list | 1 | 1 read |
| Nodes | list, cordon, uncordon | 3 | 1 read / 2 write |
| Namespaces | list | 1 | 1 read |
| Events | list | 1 | 1 read |
uv tool install k8s-aiops
k8s-aiops doctor # uses your current kube-context out of the box
Do NOT use when the target is not a Kubernetes cluster (hypervisor VM lifecycle, backup products, or cloud-provider consoles are out of scope for this skill).
| If the user wants… | Use |
|---|---|
| Kubernetes pods / deployments / nodes | k8s-aiops (this skill) |
| Hypervisor VM lifecycle (power, snapshot, migrate) | a hypervisor ops skill |
| Backup & restore | a backup ops skill |
1. k8s-aiops pod list -n prod → find the pod with high restarts or a non-Running phase
2. k8s-aiops pod logs <pod> -n prod --tail 200 → read the recent logs for the crash cause
3. k8s-aiops events -n prod → check for FailedScheduling / image-pull events
4. k8s-aiops deployment restart <deploy> -n prod → roll the deployment after fixing the cause

If a step returns 403, the kube context lacks the verb: run kubectl auth can-i get pods -n prod and switch to a context with adequate RBAC. The skill never retries a denied auth.

1. k8s-aiops node list → identify the node and confirm it is Ready/schedulable
2. k8s-aiops node cordon <node> --dry-run → preview, then k8s-aiops node cordon <node> (double confirm), which records an inverse uncordon_node undo descriptor
3. k8s-aiops node uncordon <node> → re-enable scheduling

If doctor shows the cluster unreachable, fix the kubeconfig context (kubectl config get-contexts) before retrying. cordon is never issued against an unauthenticated session.

| Scenario | Recommended | Why |
|---|---|---|
| Local/small models (Ollama, Qwen) | CLI | fewer tokens than MCP |
| Cloud models (Claude, GPT) | Either | MCP gives structured JSON I/O |
| Automated pipelines | MCP | type-safe parameters, audited |
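
For the MCP route, a client talks to the stdio server (started with k8s-aiops mcp) in JSON-RPC 2.0 messages. The sketch below builds the kind of request a client might send to invoke pod_list, assuming the standard MCP `tools/call` method. The argument name `namespace` is a guess at the tool's input schema and is not taken from this skill's documentation; the server's `tools/list` response is the place to confirm the real parameter names. A real session also needs the MCP initialize handshake first, which is omitted here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "namespace" is a hypothetical argument name, used for illustration only.
request_line = build_tool_call(1, "pod_list", {"namespace": "prod"})
```

MCP client libraries normally build this envelope for you; writing it out by hand mainly shows what "structured JSON I/O" means compared with parsing CLI text.
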
| Category | Tools | R/W |
|---|---|---|
| Pods | pod_list, pod_get, pod_logs | Read |
| Pods | delete_pod | Write |
| Deployments | deployment_list, deployment_get | Read |
| Deployments | scale_deployment, rollout_restart_deployment, delete_deployment | Write |
| Services | service_list | Read |
| Nodes | node_list | Read |
| Nodes | cordon_node, uncordon_node | Write |
| Namespaces | namespace_list | Read |
| Events | event_list | Read |
Harness features that light up: write tools with a clean inverse pass an undo= lambda so the harness records an inverse descriptor (with _undo_id) to the undo store — scale_deployment records a scale-back to its returned previous_replicas, and cordon_node ↔ uncordon_node are mutual inverses. delete_pod, delete_deployment, and rollout_restart_deployment declare no undo; delete_deployment is tagged risk_level=high. All 15 tools are audit-logged under ~/.k8s-aiops/ and pass through the policy pre-check + budget/runaway guard + graduated risk-tier gate. Avoid tight poll loops (re-listing pods every second) — the runaway breaker backs this up.
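
To make the undo mechanism concrete, here is a minimal sketch of how a decorator could record an inverse descriptor for scale_deployment. It is not the bundled k8s_aiops.governance code: `governed_tool_sketch`, `UNDO_STORE`, `FAKE_CLUSTER`, and the descriptor layout are all hypothetical stand-ins. Only the ideas of an undo= callable, a previous_replicas return value, and an _undo_id field come from the description above.

```python
import functools
import uuid
from typing import Any, Callable, Optional

UNDO_STORE: list = []  # hypothetical stand-in for the harness undo store


def governed_tool_sketch(undo: Optional[Callable[[dict, Any], dict]] = None):
    """Hypothetical decorator: run the tool, then record an inverse if `undo` is given."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**params):
            result = fn(**params)
            if undo is not None:
                descriptor = undo(params, result)
                descriptor["_undo_id"] = uuid.uuid4().hex
                UNDO_STORE.append(descriptor)
            return result
        return wrapper
    return decorator


FAKE_CLUSTER = {("prod", "web"): 3}  # in-memory replica counts, no real cluster


@governed_tool_sketch(
    undo=lambda params, result: {
        "tool": "scale_deployment",
        # The inverse is a scale back to the count the call replaced.
        "params": {**params, "replicas": result["previous_replicas"]},
    }
)
def scale_deployment(name: str, namespace: str, replicas: int) -> dict:
    key = (namespace, name)
    previous = FAKE_CLUSTER[key]
    FAKE_CLUSTER[key] = replicas
    return {"name": name, "replicas": replicas, "previous_replicas": previous}


scale_deployment(name="web", namespace="prod", replicas=5)
```

After the call, `UNDO_STORE` holds one descriptor that scales web back to 3 replicas. A tool with no clean inverse, such as delete_pod, would simply be decorated without an undo callable.
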
k8s-aiops pod list [-n <ns>] [-t <target>]
k8s-aiops pod get <name> [-n <ns>]
k8s-aiops pod logs <name> [-n <ns>] [--tail 200] [-c <container>]
k8s-aiops pod delete <name> [-n <ns>] [--dry-run] # double confirm
k8s-aiops deployment list [-n <ns>]
k8s-aiops deployment get <name> [-n <ns>]
k8s-aiops deployment scale <name> <replicas> [-n <ns>]
k8s-aiops deployment restart <name> [-n <ns>]
k8s-aiops deployment delete <name> [-n <ns>] [--dry-run] # double confirm
k8s-aiops service list [-n <ns>]
k8s-aiops node list
k8s-aiops node cordon <name> [--dry-run] # double confirm
k8s-aiops node uncordon <name>
k8s-aiops namespace list
k8s-aiops events [-n <ns>]
k8s-aiops doctor
k8s-aiops mcp # start MCP server (stdio)
See references/cli-reference.md for the full command list.
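
For scripted use, the preview-then-apply pattern for destructive commands can be sketched as below. The helper `preview_then_apply` and the node name are hypothetical; the only facts taken from the command list are that pod delete, deployment delete, and node cordon accept --dry-run. The helper only builds argument lists, so nothing touches a cluster until you pass them to `subprocess.run` yourself, and the interactive double confirm still applies to the real command.

```python
# Subcommands that the command list above documents with a --dry-run flag.
DRY_RUN_COMMANDS = {("pod", "delete"), ("deployment", "delete"), ("node", "cordon")}


def preview_then_apply(args: list) -> tuple:
    """Return (dry-run argv, real argv) for a destructive k8s-aiops command."""
    if tuple(args[:2]) not in DRY_RUN_COMMANDS:
        raise ValueError(f"{' '.join(args[:2])!r} has no documented --dry-run flag")
    apply_cmd = ["k8s-aiops", *args]
    preview_cmd = [*apply_cmd, "--dry-run"]
    return preview_cmd, apply_cmd


# "worker-1" is an illustrative node name.
preview_cmd, apply_cmd = preview_then_apply(["node", "cordon", "worker-1"])
```

Run `preview_cmd` first, inspect its output, and only then run `apply_cmd`.
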
The named context does not exist in your kubeconfig. Run kubectl config get-contexts and set the target's context: field to one of the listed names (or omit it to use current-context).
The kube context lacks the RBAC verb for the resource. Check with kubectl auth can-i <verb> <resource> -n <ns> and switch to a context/ServiceAccount with adequate roles. For EKS/GKE/AKS, confirm the exec-plugin (aws/gcloud/az CLI) is installed and logged in.
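
The RBAC check can be automated before attempting a write. A minimal sketch, assuming kubectl is on PATH and that `kubectl auth can-i` prints yes or no and exits non-zero when the answer is no. The helper names are hypothetical and not part of k8s-aiops.

```python
import subprocess
from typing import Optional


def can_i_argv(verb: str, resource: str, namespace: Optional[str] = None) -> list:
    """Build the argv for `kubectl auth can-i <verb> <resource> [-n <ns>]`."""
    argv = ["kubectl", "auth", "can-i", verb, resource]
    if namespace:
        argv += ["-n", namespace]
    return argv


def is_allowed(stdout: str, returncode: int) -> bool:
    """Interpret kubectl's answer: allowed only on exit 0 with a 'yes' reply."""
    return returncode == 0 and stdout.strip().lower() == "yes"


def check_permission(verb: str, resource: str, namespace: Optional[str] = None) -> bool:
    """Run the check against the current kube context (requires kubectl)."""
    proc = subprocess.run(
        can_i_argv(verb, resource, namespace), capture_output=True, text=True
    )
    return is_allowed(proc.stdout, proc.returncode)
```

For example, `check_permission("get", "pods", "prod")` mirrors the manual command shown above for the current context.
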
The pod/deployment/node name or namespace is wrong, or the object was deleted. List the parent collection first (pod list, deployment list, node list) to get a current name. Remember that most commands use the default namespace unless -n is given.
The object changed concurrently (or already exists). Re-read it and retry the write.
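
The re-read-and-retry advice follows from Kubernetes' optimistic concurrency: a write carries the resourceVersion it was based on and is rejected if the object has changed since. The sketch below imitates that with an in-memory object. `FakeStore`, `ConflictError`, and `update_with_retry` are illustrative names, not k8s-aiops or Kubernetes client APIs.

```python
class ConflictError(Exception):
    """Stand-in for the API server rejecting a stale write."""


class FakeStore:
    """In-memory object with a resourceVersion, imitating optimistic concurrency."""

    def __init__(self):
        self.obj = {"replicas": 2, "resourceVersion": 1}

    def read(self) -> dict:
        return dict(self.obj)

    def write(self, updated: dict) -> None:
        if updated["resourceVersion"] != self.obj["resourceVersion"]:
            raise ConflictError("object changed concurrently")
        self.obj = {**updated, "resourceVersion": updated["resourceVersion"] + 1}


def update_with_retry(store, mutate, attempts: int = 3) -> dict:
    """Re-read the object before every attempt so each write uses a fresh version."""
    for _ in range(attempts):
        current = store.read()
        try:
            store.write(mutate(current))
            return store.read()
        except ConflictError:
            continue
    raise ConflictError(f"still conflicting after {attempts} attempts")


store = FakeStore()
calls = {"n": 0}


def set_replicas(obj: dict) -> dict:
    calls["n"] += 1
    if calls["n"] == 1:
        # Simulate another writer changing the object between our read and write.
        store.write({**store.read(), "replicas": 4})
    return {**obj, "replicas": 5}


final = update_with_retry(store, set_replicas)
```

The first attempt fails because the simulated second writer bumped the version; the second attempt re-reads and succeeds. Keeping the attempt count small avoids the tight retry loops that the runaway guard is meant to catch.
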
pod logs returns only the last --tail lines (default 100); raise --tail to see more. For a multi-container pod, pass -c <container>; otherwise the API returns an error naming the available containers.
All operations are automatically audited via the bundled @governed_tool decorator (k8s_aiops.governance):
- ~/.k8s-aiops/audit.db (local SQLite audit DB; relocate with K8S_AIOPS_HOME)
- ~/.k8s-aiops/rules.yaml (deny rules, maintenance windows, risk tiers)

The harness is bundled in the package: no external dependency, no manual setup. See references/setup-guide.md for security details.
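
Because the audit log is a plain SQLite file, it can be inspected with standard tooling. The sketch below resolves the database path and lists its tables without assuming any schema. It assumes that K8S_AIOPS_HOME, when set, names the directory that replaces ~/.k8s-aiops; that reading of "relocate" is an assumption, and both function names are hypothetical.

```python
import os
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Optional


def audit_db_path(env: Optional[dict] = None) -> Path:
    """Resolve audit.db, honouring K8S_AIOPS_HOME (assumed to be the base directory)."""
    env = os.environ if env is None else env
    home = env.get("K8S_AIOPS_HOME")
    base = Path(home) if home else Path.home() / ".k8s-aiops"
    return base / "audit.db"


def list_tables(db_path) -> list:
    """List table names; the file is opened read-only so the audit trail is untouched."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]
```

Calling `list_tables(audit_db_path())` is a reasonable first step before writing any query, since the table and column names are not documented here.
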