Sharpagent Content Safety
Verdict: Pass. Audited by ClawScan on May 11, 2026.
Overview
This appears to be a disclosed, instruction-only content-safety filter, but users should note that it can block outputs and create audit logs, and that it references rule files that were not included for review.
Before installing, confirm that you want policy-based content filtering, review the actual rulesets and jurisdictions that will be loaded, and clarify audit-log handling if sensitive content may be checked.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The agent may refuse, alter, or replace content based on the loaded policy rules.
The skill intentionally authorizes the agent to replace or suppress content when rules match. This is aligned with its content-safety purpose, but it directly affects what the user receives.
| Verdict | Trigger | Behavior |
| --- | --- | --- |
| 🚫 block | High severity match | Block + return alternative content |
Use this skill only when policy enforcement is desired, and review which jurisdictions and rulesets are active.
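To make the enforcement behavior concrete, here is a minimal sketch of how rule-matched blocking with alternative content could work. The function and rule-field names are assumptions for illustration, not the skill's actual API.

```python
# Hypothetical sketch of policy-based content filtering; rule fields
# ("pattern", "severity", "replacement") are assumed names, not the
# skill's real schema.
def apply_policy(content: str, rules: list[dict]) -> dict:
    """Return the content, or a replacement when a high-severity rule matches."""
    for rule in rules:
        if rule["pattern"] in content and rule["severity"] in ("high", "critical"):
            # Block and return alternative content instead of the original.
            return {"verdict": "block", "output": rule["replacement"]}
    return {"verdict": "allow", "output": content}

rules = [{"pattern": "forbidden-topic", "severity": "high",
          "replacement": "[content removed by policy]"}]
print(apply_policy("discussion of forbidden-topic", rules))
# → {'verdict': 'block', 'output': '[content removed by policy]'}
```

The key point for users is the second return path: when no rule matches, the output is untouched, so what you receive depends entirely on which rulesets are loaded.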
Users cannot verify the exact policy rules from the provided artifacts alone.
The documentation references built-in ruleset files, but the supplied manifest contains only SKILL.md. This does not show malicious behavior, but the actual rule content is unavailable for review.
| Ruleset | Scope | File |
| --- | --- | --- |
| `global` | Universal safety (hate speech/PII/privacy) | `rules/global.yaml` |
Inspect or provide the referenced ruleset files before relying on this skill for compliance decisions.
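One way to act on this recommendation before installing is to check that every ruleset file the documentation references is actually present in the supplied artifacts. A sketch, with the file list taken from the documented example and the helper name invented for illustration:

```python
from pathlib import Path

# Files the documentation references; only SKILL.md was supplied for review.
referenced = ["SKILL.md", "rules/global.yaml"]

def missing_rulesets(skill_dir: str, referenced: list[str]) -> list[str]:
    """Return referenced files that are absent from the supplied artifacts."""
    root = Path(skill_dir)
    return [f for f in referenced if not (root / f).exists()]

# A non-empty result means the actual rule content cannot be reviewed,
# and compliance decisions would rest on unverified rules.
```

If `missing_rulesets` returns anything, request those files from the author before relying on the skill.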
Safety logs may reveal what kinds of content were reviewed or blocked.
The skill describes recording audit logs for safety checks. The example does not log full content, but it records policy decisions and context that may be sensitive depending on use.
```json
{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [{"rule": "cn/content-001", "severity": "critical"}],
  "verdict": "block"
}
```

Clarify where logs are stored, how long they are retained, and who can access them before using the skill with sensitive content.
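To gauge how revealing these logs are, a reviewer can enumerate which fields in each entry disclose what was checked. A sketch using the example record above; the helper name and the choice of "revealing" fields are assumptions, not part of the skill:

```python
import json

# The example audit record quoted in this finding.
entry = json.loads(
    '{"event": "safety_check", "jurisdictions": ["cn", "global"], '
    '"rules_matched": [{"rule": "cn/content-001", "severity": "critical"}], '
    '"verdict": "block"}'
)

def sensitive_fields(entry: dict) -> list[str]:
    """List fields that reveal what was reviewed, even without full content."""
    revealing = {"jurisdictions", "rules_matched", "verdict"}
    return sorted(k for k in entry if k in revealing)

print(sensitive_fields(entry))  # → ['jurisdictions', 'rules_matched', 'verdict']
```

Even without logging full content, the matched rule IDs and jurisdictions alone can indicate the topic of what a user submitted.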
Users might over-trust the skill's policy decisions if they assume the self-declared trust level implies independent review.
The skill self-describes a verified trust level, while the registry metadata in the provided artifacts lists an unknown source and no homepage. This may be an internal label, but users should not treat it as external verification.
```yaml
trust_level: verified
```
Treat the trust label as a claim from the skill author unless independent verification is provided.
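This mismatch can be flagged mechanically by comparing the self-declared label against the registry metadata. A sketch with hypothetical field names matching the artifacts described above:

```python
def trust_mismatch(skill_meta: dict, registry_meta: dict) -> bool:
    """True when a skill claims verification the registry does not corroborate."""
    claims_verified = skill_meta.get("trust_level") == "verified"
    # Treat an unknown or missing source as uncorroborated.
    corroborated = registry_meta.get("source") not in (None, "unknown")
    return claims_verified and not corroborated

# Values observed in this audit's artifacts.
skill_meta = {"trust_level": "verified"}
registry_meta = {"source": "unknown", "homepage": None}
print(trust_mismatch(skill_meta, registry_meta))  # → True
```

A `True` result means the "verified" label is, at best, a claim by the skill author.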
