AI Safety Rails

Pass. Audited by ClawScan on May 10, 2026.

Overview

This is an instruction-only safety template that appears purpose-aligned, but users should separately review the extra skills it recommends and carefully define any autonomy or approval-channel settings.

This skill appears safe to install as an instruction-only safety template. Before using it, choose conservative trust settings unless you have clear action boundaries, secure the approval channel, and separately review any additional skills it recommends installing.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If the user chooses an aggressive level without precise limits, the agent may take actions the user did not expect.

Why it was flagged

The trust ladder intentionally allows some autonomous actions at higher levels. This is purpose-aligned for a safety-configuration skill, but users must define clear bounds for what counts as low-stakes and reversible.

Skill content
| 3 | Act Within Bounds | Specific pre-approved autonomous actions. |
| 4 | Full Autonomy | Low-stakes, reversible actions only. |
Recommendation

Use Conservative or Moderate settings unless you have clearly listed allowed autonomous actions, reversibility rules, and when the agent must ask first.
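
As an illustration of what "clearly listed" bounds might look like, here is a minimal sketch of an allow-list check. It is not part of the reviewed skill: the `Action` type, the `ALLOWED_AUTONOMOUS` set, and the action names are hypothetical and only show one way to encode "pre-approved and reversible, otherwise ask first".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A proposed agent action (hypothetical shape, for illustration only)."""
    name: str
    reversible: bool


# Hypothetical allow-list of pre-approved autonomous actions.
ALLOWED_AUTONOMOUS = frozenset({"create_draft", "add_calendar_hold"})


def requires_approval(action: Action) -> bool:
    """Return True unless the action is both allow-listed and reversible."""
    return not (action.name in ALLOWED_AUTONOMOUS and action.reversible)
```

With these assumed names, `requires_approval(Action("send_email", reversible=False))` returns `True`, so the agent would have to ask first. An allow-listed action that is marked irreversible is also sent for approval, which keeps the default conservative.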

What this means

Installing the recommended add-ons could introduce additional code, permissions, or instructions not assessed in this review.

Why it was flagged

The skill recommends installing additional skills that are not included in the supplied artifact set and uses a latest-version CLI command. This is disclosed and aligned with the security purpose, but the referenced skills should be reviewed separately.

Skill content
Also installs: ai-sentinel (prompt injection firewall), skill-guard (malware scanner)

```bash
npx clawhub@latest install ai-sentinel
npx clawhub@latest install skill-guard
```
Recommendation

Before running these commands, inspect the ai-sentinel and skill-guard skill pages, permissions, source, and review results.

What this means

Drafts or approval requests could be exposed to the chosen messaging service or to anyone with access to that channel.

Why it was flagged

The approval workflow may route message drafts through a user-selected messaging channel. This is expected for an approval queue, but the artifact does not define channel authentication, retention, or what sensitive content may be posted there.

Skill content
"What's your verified messaging channel? (e.g., Telegram)" ... "All external messages: draft → post to approval channel → user approves → send"
Recommendation

Use a private, secured approval channel, avoid posting secrets or sensitive personal data there, and confirm the agent only trusts messages from verified user identities.
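
The "draft → post to approval channel → user approves → send" flow, combined with the advice to trust only verified identities, could be sketched roughly as follows. This is an assumption-laden illustration, not the skill's implementation: `VERIFIED_APPROVERS`, `Draft`, `approve`, and `send` are hypothetical names, and a real channel would need its own authentication.

```python
from dataclasses import dataclass

# Hypothetical set of sender IDs the user has verified out of band.
VERIFIED_APPROVERS = frozenset({"user-123"})


@dataclass
class Draft:
    """An outbound message waiting in the approval queue (hypothetical)."""
    draft_id: str
    body: str
    approved: bool = False


def approve(draft: Draft, sender_id: str) -> bool:
    """Mark the draft approved only if the approver is a verified identity."""
    if sender_id not in VERIFIED_APPROVERS:
        return False
    draft.approved = True
    return True


def send(draft: Draft) -> str:
    """Refuse to send anything that has not been approved."""
    if not draft.approved:
        raise PermissionError(f"draft {draft.draft_id} has not been approved")
    return f"sent:{draft.draft_id}"
```

In this sketch an approval message from an unverified sender leaves the draft unapproved, so a later `send` call raises `PermissionError` instead of delivering the message.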