Sre Engineer
Suspicious
Audited by ClawScan on May 10, 2026.
Overview
The skill is coherent for SRE work, but its reference examples encourage production-impacting automation, chaos actions, and persistent self-healing without clear approval or containment requirements.
Install only if you want an SRE guidance skill and are prepared to review generated operational commands carefully. Do not allow it to execute runbooks, chaos tests, Kubernetes commands, service restarts, or persistent self-healing without explicit approval, environment checks, dry runs, and rollback plans.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
An agent or user who follows these examples directly could run powerful operational commands that alter databases or Kubernetes services.
The automated runbook pattern executes arbitrary shell commands by default, and the shown example performs database failover and Kubernetes service changes. This is purpose-aligned for SRE but lacks explicit approval, dry-run-by-default, or production-safety boundaries.
def execute(self, dry_run: bool = False) ... subprocess.run(step.command, shell=True, ...); ... success, output = failover_runbook.execute(dry_run=False)
Require explicit user approval before any command execution, make dry-run the default, restrict commands to reviewed allowlists, and clearly separate generated examples from actions the agent should execute.
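One way to act on this recommendation is sketched below. It is a hypothetical illustration, not the skill's code: `Step`, `run_steps`, `ALLOWED_EXECUTABLES`, and the `approve` callback are names assumed for this example. The sketch makes dry-run the default, rejects executables that are not on a reviewed allowlist, and avoids `shell=True`.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical allowlist of reviewed executables; adjust per environment.
ALLOWED_EXECUTABLES = {"kubectl", "echo"}


@dataclass
class Step:
    name: str
    command: str


def run_steps(
    steps: List[Step],
    dry_run: bool = True,  # nothing executes unless the caller opts out
    approve: Callable[[Step], bool] = lambda step: False,  # deny by default
) -> List[Tuple[str, str]]:
    """Return (step name, outcome) pairs for each step."""
    results = []
    for step in steps:
        argv = shlex.split(step.command)
        if not argv or argv[0] not in ALLOWED_EXECUTABLES:
            results.append((step.name, "rejected: not on allowlist"))
            continue
        if dry_run:
            results.append((step.name, "dry-run: " + " ".join(argv)))
            continue
        if not approve(step):
            results.append((step.name, "skipped: not approved"))
            continue
        # argv is passed as a list, so no shell interpretation takes place.
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
        results.append((step.name, f"exit {proc.returncode}"))
    return results
```

With these defaults, a caller has to pass both `dry_run=False` and an `approve` callback that returns `True` before any command reaches `subprocess.run`.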
A poorly scoped chaos experiment could cause outages or customer-facing degradation across services.
The chaos engineering guidance includes experiments affecting production traffic and deleting Kubernetes pods. Although chaos testing is SRE-relevant, the artifact does not require explicit user approval, non-production defaults, or strong containment before disruption.
blast_radius="Single database instance, 50% of production traffic" ... subprocess.run(["kubectl", "delete", pod, "-n", self.namespace])
Default chaos examples to staging or a small canary, require blast-radius approval, define abort thresholds before execution, and never run destructive experiments autonomously.
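The containment checks in this recommendation could be expressed as a preflight gate along the following lines. This is a sketch under stated assumptions: `ChaosPlan`, `preflight`, `should_abort`, the 1% unapproved-traffic ceiling, and the metric names are all hypothetical and do not come from the audited skill.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChaosPlan:
    name: str
    environment: str = "staging"      # non-production by default
    traffic_percent: float = 1.0      # small canary by default
    abort_thresholds: Dict[str, float] = field(default_factory=dict)
    approved_by: str = ""             # empty means nobody has signed off


def preflight(plan: ChaosPlan, max_unapproved_traffic: float = 1.0) -> List[str]:
    """Return the reasons a plan must not start; an empty list means it may."""
    problems = []
    if not plan.abort_thresholds:
        problems.append("no abort thresholds defined")
    if plan.environment == "production" and not plan.approved_by:
        problems.append("production run requires a named approver")
    if plan.traffic_percent > max_unapproved_traffic and not plan.approved_by:
        problems.append("blast radius requires a named approver")
    return problems


def should_abort(plan: ChaosPlan, metrics: Dict[str, float]) -> bool:
    """Abort when any watched metric exceeds its limit.

    A missing metric is treated as a breach, so losing observability
    stops the experiment instead of letting it continue blind.
    """
    return any(
        metrics.get(name, float("inf")) > limit
        for name, limit in plan.abort_thresholds.items()
    )
```

A plan mirroring the quoted example (production, 50% of traffic, no thresholds, no approver) fails all three checks, while the defaults pass once at least one abort threshold is set.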
Persistent automation could keep modifying a host or restarting services after the original request, potentially hiding evidence or worsening an incident.
The self-healing example recommends persistent scheduled execution and includes automatic deletion of log files and service restarts. Persistent remediation is plausible for SRE, but the artifact does not define operator approval, rollback, or ownership boundaries.
# Run as cron job or systemd timer ... cleanup_disk() ... ["find", "/var/log", "-name", "*.log", "-mtime", "+7", "-delete"] ... ["systemctl", "restart", "myservice"]
Require opt-in installation for timers, log every remediation action, scope cleanup paths carefully, and include a clear disable/rollback procedure.
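A scoped, logged, and disableable version of the cleanup step might look like the sketch below. The function name `cleanup_old_logs`, the `allowed_roots` parameter, and the `disable_flag` kill-switch file are assumptions made for illustration; the audited skill defines none of them.

```python
import logging
import time
from pathlib import Path
from typing import Iterable, List, Optional

log = logging.getLogger("remediation")


def cleanup_old_logs(
    root: Path,
    allowed_roots: Iterable[Path],
    max_age_days: int = 7,
    disable_flag: Optional[Path] = None,  # hypothetical kill-switch file
    dry_run: bool = True,                 # report only, unless opted out
) -> List[Path]:
    """Return the *.log files selected for deletion under an approved root."""
    root = root.resolve()
    if disable_flag is not None and disable_flag.exists():
        log.info("remediation disabled by %s; nothing done", disable_flag)
        return []
    approved = [a.resolve() for a in allowed_roots]
    if not any(root == a or a in root.parents for a in approved):
        raise ValueError(f"{root} is outside the approved cleanup paths")
    cutoff = time.time() - max_age_days * 86400
    selected = []
    for path in sorted(root.rglob("*.log")):
        # Symlinks are skipped so cleanup cannot be redirected elsewhere.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            log.info("%s %s", "would delete" if dry_run else "deleting", path)
            if not dry_run:
                path.unlink()
            selected.append(path)
    return selected
```

Creating the kill-switch file gives operators a way to stop the timer's effect during an incident without uninstalling it, and the log lines leave a record of every file touched.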
Commands may run with the user's current cluster or system privileges and could affect the wrong environment if context is misconfigured.
The examples assume access to existing Kubernetes privileges and could operate against whatever current context the user has selected. This is relevant to SRE work, but users should verify scope and identity before use.
command="kubectl patch service postgres -p '{\"spec\":{\"selector\":{\"role\":\"replica\"}}}'"
Use least-privilege credentials, require explicit namespace/context selection, and confirm the target environment before generating or running operational commands.
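Explicit context and namespace selection can be enforced when a command is built, as in this minimal sketch. `build_kubectl_argv` and the `allowed_contexts` parameter are hypothetical names for this example; `--context` and `--namespace` are standard kubectl flags. The function only constructs an argument list and does not run anything.

```python
from typing import Collection, List, Sequence


def build_kubectl_argv(
    args: Sequence[str],
    context: str,
    namespace: str,
    allowed_contexts: Collection[str],
) -> List[str]:
    """Build a kubectl argv that never relies on the ambient current context."""
    if not context or not namespace:
        raise ValueError("context and namespace must be given explicitly")
    if context not in allowed_contexts:
        raise ValueError(f"context {context!r} is not in the approved list")
    # Pinning both flags means a misconfigured kubeconfig default cannot
    # silently redirect the command to another cluster or namespace.
    return ["kubectl", "--context", context, "--namespace", namespace, *args]
```

For example, `build_kubectl_argv(["get", "service", "postgres"], "staging-cluster", "db", {"staging-cluster"})` yields a command pinned to that cluster and namespace, where `staging-cluster` and `db` are placeholder names.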
