AI Safety Rails
Pass. Audited by ClawScan on May 10, 2026.
Overview
This is an instruction-only safety template that appears purpose-aligned, but users should separately review the extra skills it recommends and carefully define any autonomy or approval-channel settings.
This skill appears safe to install as an instruction-only safety template. Before using it, choose conservative trust settings unless you have clear action boundaries, secure the approval channel, and separately review any additional skills it recommends installing.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If the user picks a high-autonomy trust level without precise limits, the agent may take actions the user did not expect.
The trust ladder intentionally allows some autonomous actions at higher levels. This is purpose-aligned for a safety-configuration skill, but users must define clear bounds for what counts as low-stakes and reversible.
| 3 | Act Within Bounds | Specific pre-approved autonomous actions. |
| 4 | Full Autonomy | Low-stakes, reversible actions only. |
Use Conservative or Moderate settings unless you have clearly listed the allowed autonomous actions, the reversibility rules, and the cases in which the agent must ask first.
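One way to turn that recommendation into explicit bounds is a single approval rule the agent consults before every action. The Python below is a minimal sketch, not the skill's own logic: it assumes the levels are cumulative (level 4 keeps everything level 3 allows), treats every level below 3 as a stand-in for the Conservative and Moderate settings, and fails closed on any unrecognized level. Action, PRE_APPROVED, needs_approval, and the example action names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool  # can the effect be fully undone?
    low_stakes: bool  # user-defined, e.g. no payments and no external recipients


# Hypothetical allowlist the user writes down for level 3 ("Act Within Bounds").
PRE_APPROVED: frozenset[str] = frozenset({"create_draft", "label_email"})


def needs_approval(action: Action, level: int) -> bool:
    """Return True if the agent must ask the user before taking the action.

    Assumes levels are cumulative: level 4 keeps everything level 3 allows.
    Levels below 3, and any unrecognized value, fail closed and always ask.
    """
    if level == 3:
        return action.name not in PRE_APPROVED
    if level == 4:
        if action.name in PRE_APPROVED:
            return False
        # Level 4 ("Full Autonomy") still covers only low-stakes, reversible actions.
        return not (action.low_stakes and action.reversible)
    return True
```

Under this rule an unlisted action such as archiving a thread is held for approval at level 3, and at level 4 it proceeds only if the user has classified it as both low-stakes and reversible. Failing closed means a typo in the configured level degrades to "always ask" rather than to extra autonomy.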
Installing the recommended add-ons could introduce additional code, permissions, or instructions not assessed in this review.
The skill recommends installing additional skills that are not included in the supplied artifact set, using an unpinned (@latest) CLI command. This is disclosed and aligned with the security purpose, but the referenced skills should be reviewed separately.
Also installs: ai-sentinel (prompt injection firewall), skill-guard (malware scanner)

```bash
npx clawhub@latest install ai-sentinel
npx clawhub@latest install skill-guard
```
Before running these commands, inspect the ai-sentinel and skill-guard skill pages, permissions, source, and review results.
Drafts or approval requests could be exposed to the chosen messaging service or to anyone with access to that channel.
The approval workflow may route message drafts through a user-selected messaging channel. This is expected for an approval queue, but the artifact does not define channel authentication, retention, or what sensitive content may be posted there.
"What's your verified messaging channel? (e.g., Telegram)" ... "All external messages: draft → post to approval channel → user approves → send"
Use a private, secured approval channel, avoid posting secrets or sensitive personal data there, and confirm the agent only trusts messages from verified user identities.
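The two safeguards in that recommendation, keeping secrets out of the channel and trusting only verified identities, can be sketched as a small approval queue. This Python is a minimal sketch under stated assumptions, not the skill's implementation: ApprovalQueue, SECRET_PATTERNS, and the user-ID format are hypothetical, posting to the real messaging channel (Telegram or otherwise) is left out, and the two regexes only catch obvious cases, so they are no substitute for a dedicated secret scanner.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical screening patterns: a PEM private-key header and "key: value"
# style credentials. A real deployment should use a dedicated secret scanner.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\b(?:api[_-]?key|password|token)\s*[:=]\s*\S+", re.IGNORECASE),
]


@dataclass
class ApprovalQueue:
    # User IDs confirmed out of band; approvals from anyone else are ignored.
    verified_user_ids: frozenset[str]
    pending: dict[int, str] = field(default_factory=dict)
    _next_id: int = field(default=0, init=False)

    def submit(self, draft: str) -> int:
        """Queue a draft for approval, refusing drafts that look like they hold secrets."""
        if any(p.search(draft) for p in SECRET_PATTERNS):
            raise ValueError("draft appears to contain a secret; not posting it")
        self._next_id += 1
        self.pending[self._next_id] = draft
        # A real agent would post the draft to the approval channel here.
        return self._next_id

    def approve(self, draft_id: int, sender_id: str) -> str | None:
        """Return the draft to send if a verified user approved it, else None."""
        if sender_id not in self.verified_user_ids:
            return None  # unverified identity: the draft stays pending
        # pop() ensures each draft can be approved (and sent) at most once.
        return self.pending.pop(draft_id, None)
```

For example, with verified_user_ids=frozenset({"user-123"}), an approval from "someone-else" returns None and leaves the draft pending, while the same call from "user-123" returns the draft text for sending; a second approval of the same draft returns None.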
