Anti-Injection-Skill
v2.0.3Detect prompt injection, jailbreak, role-hijack, and system extraction attempts. Applies multi-layer defense with semantic analysis and penalty scoring.
⭐ 10· 9.3k·19 current·21 all-time
byWesley Armando@georges91560
MIT-0
Download zip
LicenseMIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious
medium confidencePurpose & Capability
The skill's name and description match the files: many blacklist patterns, semantic scoring, multi-lingual detection, and audit/alerting are present and appropriate for a defensive tool. However, the SKILL.md and CONFIGURATION docs describe using external semantic APIs (Claude/OpenAI) and agent Telegram channels/webhooks without declaring any required credentials or environment variables in the registry metadata — an inconsistency. The repo also bundles an install.sh and many docs; that is plausible for a security product but increases surface area and requires provenance checks.
Instruction Scope
Runtime instructions require running the sentinel on EVERY user input and EVERY tool output and 'BEFORE any plan formulation' — effectively intercepting and gating all agent I/O. The skill writes to an AUDIT.md, sends alerts via the agent's Telegram connection, and documents optional external webhooks (which would transmit event data off-agent). Those behaviors are within a defender's remit, but they are broad: they can block/modify normal operation and transmit information to external endpoints, so scope is high and should be explicitly authorized by the operator.
Install Mechanism
Registry metadata lists no install spec (instruction-only), which is lowest risk for automatic installs. But the package includes a non-empty install.sh and multiple docs that instruct cloning the repo, copying files, and installing Python models (sentence-transformers). The docs recommend pip install with flags (--break-system-packages) and downloading models (~400MB). Because there is no canonical trusted install URL in the registry metadata and the install script content isn't shown here, this raises a moderate risk: manual code review of install.sh and any network calls it makes is recommended before running.
Credentials
The skill declares no required env vars or credentials in the registry, yet the docs reference: agent Telegram bot tokens (it uses the agent's existing Telegram connection), optional SEMANTIC_MODE 'api' (which would require OpenAI/Claude keys), SECURITY_AUDIT_LOG path, and an optional external webhook URL. That mismatch (using external APIs and channels without declaring or requiring the associated credentials) is an incoherence and increases risk if operators enable API mode or webhooks without auditing what is sent.
Persistence & Privilege
Metadata does not set always:true, but the documentation repeatedly instructs integrators to enable the skill 'ALWAYS RUN BEFORE ANY OTHER LOGIC' and set priority to 'highest' in agent config. If enabled as recommended this gives the skill effective global enforcement capability over all inputs and tool outputs. Combined with autonomous invocation (default) and the ability to send alerts/webhook payloads, this raises the blast radius if the skill or its install script is malicious or buggy. The registry flags don’t reflect this elevated operational requirement, so the operator must consciously opt in and review.
Scan Findings in Context
[ignore-previous-instructions] expected: The SKILL.md contains a blacklist of common injection phrases (including 'ignore previous instructions') — this is expected for a defensive skill and likely triggered the detector.
[you-are-now] expected: The skill enumerates persona/roleplay jailbreak patterns (e.g., 'you are now DAN'); presence in blacklists is expected and matches the documented detection goals.
[system-prompt-override] expected: System prompt extraction/override is a core detection category for this skill; listing such patterns is consistent with its purpose.
What to consider before installing
Plain-language steps before you install or enable this skill:
1) Verify provenance: try to find and inspect the upstream repository referenced in the docs (announced GitHub links). If you can't corroborate the author and repo, treat as higher risk.
2) Review install.sh and any scripts before running: the package includes an install script and doc instructions to pip-install large models and use flags like --break-system-packages — review for network calls, commands that alter system files, or unexpected downloads.
3) Disable webhook/API modes by default: the skill documents optional external webhooks and an 'API' semantic mode (which would need API keys). Do not enable these until you’ve reviewed exactly what payloads are sent and where.
4) Prefer local semantic mode and sandboxed testing: if you want to test, run semantic analysis locally (no external API keys) on a throwaway agent in an isolated environment to validate false positives/negatives and performance.
5) Audit alert channels: the skill uses the agent's Telegram connection to send alerts. Confirm you are comfortable with the skill using existing channel credentials and review alert contents (ensure sensitive user messages are not leaked).
6) Confirm config/priority explicitly: the skill insists on running before all logic; only enable that if you accept the risk that it will gate and modify all inputs and tool outputs. Consider staged rollout (monitor-only) first.
7) If you lack capacity to audit, ask for an external code/security review or run the skill in a restricted sandbox (no outbound network, limited filesystem access) until you're confident.
Bottom line: the repo and docs are consistent with a real defender tool, but multiple operational/information-flow mismatches and the presence of install scripts and network-configurable webhooks justify caution — review the code and test in isolated conditions before enabling it in production.Like a lobster shell, security has layers — review code before you run it.
latestvk978nv2mvmk9zj1qv9hr3ep36d81ea1c
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
Runtime requirements
🛡️ Clawdis
