Reef Prompt Guard
v1.0.0Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation.
⭐ 0· 836·0 current·0 all-time
MIT-0
Download zip
LicenseMIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign
high confidencePurpose & Capability
Name/description match the included artifacts: a Python filter script and a reference doc about attack patterns. No credentials, external downloads, or unrelated binaries are requested — everything present is proportional to a local prompt-filtering tool.
Instruction Scope
SKILL.md stays within scope (scanning/sanitizing untrusted text, sandwich defense, integration examples). One integration example runs the Python script via a shell exec (Node.js execSync with a JSON string embedded), which if used as shown could introduce command-injection risk when untrusted text is interpolated into a shell command. The SKILL.md also intentionally contains injection examples (e.g., “ignore previous instructions”) — this is expected for a detector but was flagged by the pre-scan.
Install Mechanism
No install spec or remote downloads; the skill is instruction + a local Python script. That is low-risk compared with installers that fetch/extract remote archives.
Credentials
No environment variables, credentials, or config paths are requested. The tool does not ask for unrelated secrets and operates on local input only.
Persistence & Privilege
always:false and normal user-invocable/autonomous invocation defaults are used. The skill does not request permanent system presence or attempt to modify other skills or global agent settings.
Scan Findings in Context
[ignore-previous-instructions] expected: The SKILL.md intentionally includes example attack payloads (e.g., 'ignore previous instructions') to demonstrate detection; the pre-scan flag is therefore expected and not itself evidence of malicious intent.
Assessment
This skill appears to be what it claims: a regex-based prompt-injection filter implemented as a local Python script. Before installing/use: 1) Prefer invoking the filter module directly (import/call) rather than interpolating untrusted text into shell commands — the Node.js execSync example in SKILL.md can be unsafe and lead to command injection if input isn't properly escaped. 2) Understand the tool's limitation: it's regex-based and will miss novel/semantic attacks; consider adding a classifier or anomaly/perplexity checks for ambiguous inputs. 3) Review and test the pattern lists in scripts/filter.py and references/attack-patterns.md to ensure no false-positives block legitimate content and to tune context multipliers. 4) Keep the script on a secure path and avoid running it with elevated privileges. If you need stronger guarantees (e.g., in production-facing pipelines or multi-agent systems), perform adversarial testing and consider layered defenses (sandboxed processing, dual-LLM architecture, strict escaping when calling subprocesses).Like a lobster shell, security has layers — review code before you run it.
latestvk97156kqhcyw9vf3t36waw3h4d810b84
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
