Input Safety Guard

v1.0.0

Lightweight two-stage input safety guard for agents. Use this skill when an agent must screen user input before answering, block prompt injection or prompt l...

1· 60·0 current·0 all-time
MIT-0
Download zip
LicenseMIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotalVirusTotal
Pending
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description match the provided code and config files: a deterministic prefilter (prefilter.py) and a semantic stage-2 prompt (pipeline.py) with profile-based rules. No unrelated environment variables, binaries, or install steps are required.
Instruction Scope
SKILL.md specifies intercepting raw user input, running stage-1 prefilter, and then a stage-2 semantic review using the host agent's model and a canonical prompt. This is appropriate for the stated purpose. Two integration points deserve attention: (1) stage-2 depends on the host model performing classification correctly (so integration must ensure the agent actually runs the canonical STAGE2_PROMPT_TEMPLATE and returns only the structured result), and (2) the prompt template itself must be kept internal to avoid becoming a target of 'prompt leakage' tests. The SKILL.md admonition 'do not call an external model just to perform stage 2' is implementation guidance (use the same agent/model pipeline) — it's not contradictory but should be clarified for integrators.
Install Mechanism
No install spec (instruction-only plus included source files). No downloads, external package installs, or archive extraction are used. All code shown is local Python logic and YAML configs.
Credentials
The skill requests no environment variables, no credentials, and no config paths beyond its own config/ directory. The rules and profiles are stored in local config YAML files — proportional to the stated function.
Persistence & Privilege
always is false and the skill is user-invocable; autonomous invocation (model invocation) remains allowed by default which is normal for skills. The skill does not request persistent system-wide privileges or modify other skills' configs.
Assessment
This skill appears coherent and safe in isolation, but pay attention to integration details before installing: 1) Ensure your agent actually intercepts raw user input and cannot be bypassed by other code paths—if an app builds prompts before the guard runs, the guard is ineffective. 2) Stage 2 relies on your host model executing the canonical STAGE2_PROMPT_TEMPLATE and returning only the structured result; test adversarial inputs to confirm the model follows the template reliably. 3) Keep the stage-2 prompt and rule files private to avoid making them targets for prompt-leakage probes. 4) Review and tune config/default_rules*.yaml for your deployment (use strict profile for higher conservatism in production). 5) Verify logging/telemetry in your integration does not send raw user inputs or internal prompts to external services. If you need higher assurance, perform an integration test with known bypass attempts and review the agent's actual runtime trace to confirm the guard ran as intended.

Like a lobster shell, security has layers — review code before you run it.

latestvk9762t1s38bvhk19gqjhz7n9p1841p6e

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Comments