Sharpagent Content Safety
Verdict: Pass. Audited by ClawScan on May 11, 2026.
Overview
This appears to be a disclosed, instruction-only content-safety filter, but users should note that it can block outputs and create audit logs, and that it references rule files that were not included for review.
Before installing, confirm that you want policy-based content filtering, review the actual rulesets and jurisdictions that will be loaded, and clarify audit-log handling if sensitive content may be checked.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The agent may refuse, alter, or replace content based on the loaded policy rules.
The skill intentionally authorizes the agent to replace or suppress content when rules match. This is aligned with its content-safety purpose, but it directly affects what the user receives.
| Verdict | Trigger | Behavior |
| --- | --- | --- |
| 🚫 block | High severity match | Block + return alternative content |
Use this skill only when policy enforcement is desired, and review which jurisdictions and rulesets are active.
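To make the enforcement behavior concrete, here is a minimal sketch of how rule-matched blocking with alternative content could work. The function and rule-field names are assumptions for illustration, not the skill's actual API.

```python
# Hypothetical sketch of policy-based content filtering; rule fields
# ("pattern", "severity", "replacement") are assumed names, not the
# skill's real schema.
def apply_policy(content: str, rules: list[dict]) -> dict:
    """Return the content, or a replacement when a high-severity rule matches."""
    for rule in rules:
        if rule["pattern"] in content and rule["severity"] in ("high", "critical"):
            # Block and return alternative content instead of the original.
            return {"verdict": "block", "output": rule["replacement"]}
    return {"verdict": "allow", "output": content}

rules = [{"pattern": "forbidden-topic", "severity": "high",
          "replacement": "[content removed by policy]"}]
print(apply_policy("discussion of forbidden-topic", rules))
# → {'verdict': 'block', 'output': '[content removed by policy]'}
```

The key point for users is the second return path: when no rule matches, the output is untouched, so what you receive depends entirely on which rulesets are loaded.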
Users cannot verify the exact policy rules from the provided artifacts alone.
The documentation references built-in ruleset files, but the supplied manifest contains only SKILL.md. This does not show malicious behavior, but the actual rule content is unavailable for review.
| Ruleset | Scope | File |
| --- | --- | --- |
| `global` | Universal safety (hate speech/PII/privacy) | `rules/global.yaml` |
Inspect or provide the referenced ruleset files before relying on this skill for compliance decisions.
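One way to act on this recommendation before installing is to check that every ruleset file the documentation references is actually present in the supplied artifacts. A sketch, with the file list taken from the documented example and the helper name invented for illustration:

```python
from pathlib import Path

# Files the documentation references; only SKILL.md was supplied for review.
referenced = ["SKILL.md", "rules/global.yaml"]

def missing_rulesets(skill_dir: str, referenced: list[str]) -> list[str]:
    """Return referenced files that are absent from the supplied artifacts."""
    root = Path(skill_dir)
    return [f for f in referenced if not (root / f).exists()]

# A non-empty result means the actual rule content cannot be reviewed,
# and compliance decisions would rest on unverified rules.
```

If `missing_rulesets` returns anything, request those files from the author before relying on the skill.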
Safety logs may reveal what kinds of content were reviewed or blocked.
The skill describes recording audit logs for safety checks. The example does not log full content, but it records policy decisions and context that may be sensitive depending on use.
```json
{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [{"rule": "cn/content-001", "severity": "critical"}],
  "verdict": "block"
}
```

Clarify where logs are stored, how long they are retained, and who can access them before using the skill with sensitive content.
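To gauge how revealing these logs are, a reviewer can enumerate which fields in each entry disclose what was checked. A sketch using the example record above; the helper name and the choice of "revealing" fields are assumptions, not part of the skill:

```python
import json

# The example audit record quoted in this finding.
entry = json.loads(
    '{"event": "safety_check", "jurisdictions": ["cn", "global"], '
    '"rules_matched": [{"rule": "cn/content-001", "severity": "critical"}], '
    '"verdict": "block"}'
)

def sensitive_fields(entry: dict) -> list[str]:
    """List fields that reveal what was reviewed, even without full content."""
    revealing = {"jurisdictions", "rules_matched", "verdict"}
    return sorted(k for k in entry if k in revealing)

print(sensitive_fields(entry))  # → ['jurisdictions', 'rules_matched', 'verdict']
```

Even without logging full content, the matched rule IDs and jurisdictions alone can indicate the topic of what a user submitted.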
Users might over-trust the skill's policy decisions if they assume the self-declared trust level implies independent review.
The skill self-describes a verified trust level, while the registry metadata in the provided artifacts lists an unknown source and no homepage. This may be an internal label, but users should not treat it as external verification.
```yaml
trust_level: verified
```
Treat the trust label as a claim from the skill author unless independent verification is provided.
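This mismatch can be flagged mechanically by comparing the self-declared label against the registry metadata. A sketch with hypothetical field names matching the artifacts described above:

```python
def trust_mismatch(skill_meta: dict, registry_meta: dict) -> bool:
    """True when a skill claims verification the registry does not corroborate."""
    claims_verified = skill_meta.get("trust_level") == "verified"
    # Treat an unknown or missing source as uncorroborated.
    corroborated = registry_meta.get("source") not in (None, "unknown")
    return claims_verified and not corroborated

# Values observed in this audit's artifacts.
skill_meta = {"trust_level": "verified"}
registry_meta = {"source": "unknown", "homepage": None}
print(trust_mismatch(skill_meta, registry_meta))  # → True
```

A `True` result means the "verified" label is, at best, a claim by the skill author.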
