Sre Engineer

Suspicious

Audited by ClawScan on May 10, 2026.

Overview

The skill is coherent for SRE work, but its reference examples encourage production-impacting automation, chaos actions, and persistent self-healing without clear approval or containment requirements.

Install only if you want an SRE guidance skill and are prepared to review generated operational commands carefully. Do not allow it to execute runbooks, chaos tests, Kubernetes commands, service restarts, or persistent self-healing without explicit approval, environment checks, dry runs, and rollback plans.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

An agent or user who follows these examples directly could run powerful operational commands that alter databases or Kubernetes services.

Why it was flagged

The automated runbook pattern executes arbitrary shell commands by default, and the shown example performs database failover and Kubernetes service changes. This is purpose-aligned for SRE but lacks explicit approval, dry-run-by-default, or production-safety boundaries.

Skill content
def execute(self, dry_run: bool = False) ... subprocess.run(step.command, shell=True, ...); ... success, output = failover_runbook.execute(dry_run=False)
Recommendation

Require explicit user approval before any command execution, make dry-run the default, restrict commands to reviewed allowlists, and clearly separate generated examples from actions the agent should execute.
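As a rough illustration of this recommendation, the sketch below wraps a single runbook step so that dry-run is the default, the executable must appear on a reviewed allowlist, and an operator callback must approve real execution. The names `run_step`, `ALLOWED_BINARIES`, and `approve` are hypothetical and do not come from the skill; treat this as one possible shape under those assumptions, not the skill's actual implementation.

```python
import shlex
import subprocess
from typing import Callable, Tuple

# Hypothetical allowlist of reviewed executables; adjust per environment.
ALLOWED_BINARIES = {"kubectl", "pg_ctl"}


def run_step(
    command: str,
    approve: Callable[[str], bool],
    dry_run: bool = True,  # dry-run is the default; real execution must be requested
) -> Tuple[bool, str]:
    """Run one runbook step only if it is allowlisted and approved."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_BINARIES:
        name = argv[0] if argv else "<empty>"
        return False, f"blocked: {name} is not allowlisted"
    if dry_run:
        # Report what would happen without touching the system.
        return True, f"dry-run: would execute {argv}"
    if not approve(command):
        return False, "blocked: operator did not approve"
    # Passing a list with the default shell=False avoids shell interpretation.
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0, result.stdout or result.stderr
```

Keeping the approval check as an injected callback means the same function can be driven by an interactive prompt, a ticketing hook, or a test stub.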

Concern (High Confidence)
ASI08: Cascading Failures
What this means

A poorly scoped chaos experiment could cause outages or customer-facing degradation across services.

Why it was flagged

The chaos engineering guidance includes experiments affecting production traffic and deleting Kubernetes pods. Although chaos testing is SRE-relevant, the artifact does not require explicit user approval, non-production defaults, or strong containment before disruption.

Skill content
blast_radius="Single database instance, 50% of production traffic" ... subprocess.run(["kubectl", "delete", pod, "-n", self.namespace])
Recommendation

Default chaos examples to staging or a small canary, require blast-radius approval, define abort thresholds before execution, and never run destructive experiments autonomously.
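One way to express these guardrails is a preflight check that refuses to start an experiment until an abort threshold is defined and any non-staging or large blast radius has a named approver. Everything in this sketch (`ChaosPlan`, `preflight`, `should_abort`, and the 5% canary ceiling) is a hypothetical illustration rather than part of the skill.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ceiling: experiments above this share of traffic need approval.
MAX_UNAPPROVED_TRAFFIC_PERCENT = 5.0


@dataclass
class ChaosPlan:
    environment: str                          # e.g. "staging" or "production"
    traffic_percent: float                    # share of traffic the experiment may affect
    abort_error_rate: Optional[float] = None  # abort threshold, set before execution
    approved_by: Optional[str] = None         # operator who approved the blast radius


def preflight(plan: ChaosPlan) -> List[str]:
    """Return the reasons the plan must not run; an empty list means it may proceed."""
    problems = []
    if plan.abort_error_rate is None:
        problems.append("no abort threshold defined")
    if plan.environment != "staging" and plan.approved_by is None:
        problems.append("non-staging experiment requires explicit approval")
    if (plan.traffic_percent > MAX_UNAPPROVED_TRAFFIC_PERCENT
            and plan.approved_by is None):
        problems.append("blast radius exceeds canary ceiling without approval")
    return problems


def should_abort(plan: ChaosPlan, observed_error_rate: float) -> bool:
    """True once the observed error rate reaches the predefined abort threshold."""
    return (plan.abort_error_rate is not None
            and observed_error_rate >= plan.abort_error_rate)
```

A plan mirroring the flagged excerpt (production, 50% of traffic, no approver, no threshold) fails all three checks, while a small staging experiment with a threshold passes.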

Concern (High Confidence)
ASI10: Rogue Agents
What this means

Persistent automation could keep modifying a host or restarting services after the original request, potentially hiding evidence or worsening an incident.

Why it was flagged

The self-healing example recommends persistent scheduled execution and includes automatic deletion of log files and service restarts. Persistent remediation is plausible for SRE, but the artifact does not define operator approval, rollback, or ownership boundaries.

Skill content
# Run as cron job or systemd timer ... cleanup_disk() ... ["find", "/var/log", "-name", "*.log", "-mtime", "+7", "-delete"] ... ["systemctl", "restart", "myservice"]
Recommendation

Require opt-in installation for timers, log every remediation action, scope cleanup paths carefully, and include a clear disable/rollback procedure.
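The sketch below shows one possible way to scope and audit a log cleanup: it refuses directories outside an allowed root, logs every candidate file, and deletes only when `apply=True` is passed. The function name `cleanup_logs` and its parameters are assumptions made for illustration. Unlike the `find` command in the excerpt, it does not recurse into subdirectories, and its age cutoff is a plain mtime comparison rather than `find`'s whole-day rounding.

```python
import logging
import time
from pathlib import Path
from typing import List, Optional

logger = logging.getLogger("remediation")


def cleanup_logs(
    directory: Path,
    allowed_root: Path,
    max_age_days: int = 7,
    apply: bool = False,          # report-only by default; deletion is opt-in
    now: Optional[float] = None,  # injectable clock for testing
) -> List[Path]:
    """Find *.log files older than max_age_days; delete them only if apply=True."""
    directory = directory.resolve()
    allowed_root = allowed_root.resolve()
    # Refuse to operate outside the explicitly allowed root.
    if directory != allowed_root and allowed_root not in directory.parents:
        raise ValueError(f"{directory} is outside {allowed_root}")
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    stale = sorted(
        p for p in directory.glob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
    for path in stale:
        # Every remediation action is logged, including report-only runs.
        logger.info("%s %s", "deleting" if apply else "would delete", path)
        if apply:
            path.unlink()
    return stale
```

Running it once with the default `apply=False` gives an operator a reviewable list before any file is removed, and the returned paths can be kept as an audit record.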

What this means

Commands may run with the user's current cluster or system privileges and could affect the wrong environment if context is misconfigured.

Why it was flagged

The examples assume access to existing Kubernetes privileges and could operate against whatever current context the user has selected. This is relevant to SRE work, but users should verify scope and identity before use.

Skill content
command="kubectl patch service postgres -p '{\"spec\":{\"selector\":{\"role\":\"replica\"}}}'"
Recommendation

Use least-privilege credentials, require explicit namespace/context selection, and confirm the target environment before generating or running operational commands.
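A minimal sketch of the context check, assuming `kubectl` is on the PATH: it reads the active context with `kubectl config current-context`, refuses to continue if that differs from the context the operator named, and pins both `--context` and `--namespace` on the resulting command. The helper names `current_context` and `build_kubectl_argv` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def current_context() -> str:
    """Ask kubectl which context is currently selected."""
    out = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def build_kubectl_argv(
    args: Sequence[str],
    expected_context: str,
    namespace: str,
    get_context: Callable[[], str] = current_context,  # injectable for testing
) -> List[str]:
    """Return a kubectl argv pinned to an explicit context and namespace."""
    actual = get_context()
    if actual != expected_context:
        raise RuntimeError(
            f"current context is {actual!r}, expected {expected_context!r}"
        )
    # Pin both flags so the command cannot silently follow a changed default.
    return ["kubectl", "--context", expected_context,
            "--namespace", namespace, *args]
```

The function only builds the argument list; whether and when to execute it remains a separate, explicitly approved step.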