OpenClaw Warden

Verify workspace file integrity and scan for prompt injection patterns in agent identity and memory files. Detects unauthorized modifications to SOUL.md, AGENTS.md, IDENTITY.md, memory files, and installed skills. Free detection layer — upgrade to openclaw-warden-pro for automated countermeasures.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, required binary (python3), and the included script all align: the tool collects workspace files, computes checksums, and scans for prompt-injection patterns. No unrelated credentials, binaries, or external services are requested.
Instruction Scope
Runtime instructions are constrained to workspace scanning and integrity operations (baseline, verify, scan, full, status, accept). The script resolves a workspace via CLI arg, OPENCLAW_WORKSPACE, current dir, or a default; this is consistent with the stated behavior. Note: SKILL.md explicitly lists injection patterns (e.g., 'ignore previous instructions'), which triggered the static pre-scan detector — that is expected because the skill documents those strings as things to flag.
Install Mechanism
No install spec; user copies skill into workspace and runs python3 scripts/integrity.py. This is low-risk compared to remote downloads or package installs. The script writes a local .integrity directory in the workspace for manifests/snapshots, which is expected.
Credentials
No environment variables or credentials are required. The only environment access is optional workspace detection via OPENCLAW_WORKSPACE and reading/writing files inside the workspace directory — proportionate to its purpose.
Persistence & Privilege
The tool will create .integrity snapshots and can quarantine/restore/rollback skills and files (QUARANTINE_PREFIX indicates renaming/moving skills). Those filesystem modifications are in-scope for a countermeasure tool, but they are destructive actions so users should review code and run with care. always:false (not force-installed) and no extra privileges requested.
Scan Findings in Context
[ignore-previous-instructions] expected: The SKILL.md enumerates prompt-injection patterns (including phrases like 'ignore previous instructions') so the static detector flagged those strings. This is a documentation artifact, not evidence of malicious intent.
Assessment
This skill is coherent with its stated purpose and runs entirely locally under python3, but it will create .integrity state in your workspace and can rename/quarantine other skill directories and restore files. Before installing or enabling autonomous invocation: (1) inspect scripts/integrity.py (especially quarantine/restore/protect/rollback implementations) to confirm their behavior matches your expectations, (2) back up your workspace, (3) consider running it manually first rather than granting automated startup hooks or autonomous agent invocation, and (4) verify there are no network calls or hidden endpoints in the remainder of the code if you will run it with sensitive data.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.3

Runtime requirements

🛡️ Clawdis
OS: macOS · Linux · Windows
Bins: python3

SKILL.md

OpenClaw Warden

Monitors your workspace files for unauthorized modifications and prompt injection attacks. Existing security tools scan skills before installation — this tool watches the workspace itself after installation, catching tampering that other tools miss.

Why This Matters

Your agent reads SOUL.md, AGENTS.md, IDENTITY.md, USER.md, and memory files on every session startup and trusts them implicitly. A compromised skill, a malicious heartbeat payload, or an unauthorized process can modify these files to:

  • Inject hidden instructions that alter agent behavior
  • Embed data exfiltration URLs in markdown images
  • Override identity and safety boundaries
  • Plant persistent backdoors in memory files

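This skill detects these changes by comparing file checksums against a stored baseline and by scanning file contents for known injection patterns.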

Commands

Establish Baseline

Create or reset the integrity baseline. Run this after setting up your workspace or after reviewing and accepting all current file states.

python3 {baseDir}/scripts/integrity.py baseline --workspace /path/to/workspace
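
As a rough illustration of what establishing a baseline involves, the sketch below hashes a set of files and stores the result as a JSON manifest. It is not the code in scripts/integrity.py. SHA-256, the manifest.json file name, and the function names are all assumptions made for the example; only the .integrity directory comes from the description above.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(workspace: Path, patterns: list) -> dict:
    """Map each matching file (workspace-relative POSIX path) to its checksum."""
    manifest = {}
    for pattern in patterns:
        for path in sorted(workspace.glob(pattern)):
            if path.is_file():
                manifest[path.relative_to(workspace).as_posix()] = sha256_of(path)
    return manifest


def write_baseline(workspace: Path, manifest: dict) -> Path:
    """Store the manifest as JSON. The file name is hypothetical."""
    target = workspace / ".integrity" / "manifest.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return target
```

Calling build_baseline(Path("/path/to/workspace"), ["SOUL.md", "memory/*.md"]) would return a dictionary keyed by relative path, which write_baseline then persists for later comparison.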

Verify Integrity

Check all monitored files against the stored baseline. Reports modifications, deletions, and new untracked files.

python3 {baseDir}/scripts/integrity.py verify --workspace /path/to/workspace
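
The three kinds of finding reported by verify (modifications, deletions, and new untracked files) can be pictured as a comparison of two path-to-checksum mappings. The following is a minimal sketch under that assumption; compare_manifests and the sample checksums are hypothetical and do not reflect the script's actual data structures.

```python
def compare_manifests(baseline: dict, current: dict) -> dict:
    """Return sorted lists of modified, deleted, and untracked paths."""
    modified = sorted(
        path for path in baseline
        if path in current and baseline[path] != current[path]
    )
    deleted = sorted(path for path in baseline if path not in current)
    untracked = sorted(path for path in current if path not in baseline)
    return {"modified": modified, "deleted": deleted, "untracked": untracked}


# Placeholder checksums, shortened for readability.
baseline = {"SOUL.md": "aaa", "AGENTS.md": "bbb", "USER.md": "ccc"}
current = {"SOUL.md": "aaa", "AGENTS.md": "zzz", "skills/new/SKILL.md": "ddd"}

report = compare_manifests(baseline, current)
print(report)
# {'modified': ['AGENTS.md'], 'deleted': ['USER.md'], 'untracked': ['skills/new/SKILL.md']}
```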

Scan for Injections

Scan workspace files for prompt injection patterns: hidden instructions, base64 payloads, Unicode tricks, markdown image exfiltration, HTML injection, and suspicious system prompt markers.

python3 {baseDir}/scripts/integrity.py scan --workspace /path/to/workspace

Full Check (Verify + Scan)

Run both integrity verification and injection scanning in one pass.

python3 {baseDir}/scripts/integrity.py full --workspace /path/to/workspace

Quick Status

One-line summary of workspace health.

python3 {baseDir}/scripts/integrity.py status --workspace /path/to/workspace

Accept Changes

After reviewing a legitimate change, update the baseline for a specific file.

python3 {baseDir}/scripts/integrity.py accept SOUL.md --workspace /path/to/workspace

Workspace Auto-Detection

If --workspace is omitted, the script tries:

  1. OPENCLAW_WORKSPACE environment variable
  2. Current directory (if AGENTS.md exists)
  3. ~/.openclaw/workspace (default)
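
The fallback order above can be sketched as a small resolver. This is an illustration only: resolve_workspace and its parameters are hypothetical names, and the environment, current directory, and home directory are passed in so the logic can be exercised without touching the real system.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_workspace(cli_arg: Optional[str] = None,
                      environ: Optional[dict] = None,
                      cwd: Optional[Path] = None,
                      home: Optional[Path] = None) -> Path:
    """Pick a workspace: explicit argument, env var, current dir, then default."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    if cli_arg:                                    # --workspace was given
        return Path(cli_arg)
    env_value = environ.get("OPENCLAW_WORKSPACE")  # 1. environment variable
    if env_value:
        return Path(env_value)
    if (cwd / "AGENTS.md").is_file():              # 2. current directory
        return cwd
    return home / ".openclaw" / "workspace"        # 3. default location
```

Note that the default path is returned even if it does not exist; whether the real script checks for that is not stated here.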

What Gets Monitored

| Category | Files | Alert Level on Change |
|----------|-------|-----------------------|
| Critical | SOUL.md, AGENTS.md, IDENTITY.md, USER.md, TOOLS.md, HEARTBEAT.md | WARNING |
| Memory | memory/*.md, MEMORY.md | INFO (expected to change) |
| Config | *.json in workspace root | WARNING |
| Skills | skills/*/SKILL.md | WARNING |

Injection patterns trigger CRITICAL alerts regardless of file category.
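
To make the category rules concrete, here is one way the mapping from path to category and alert level might be expressed. The classify function is a hypothetical helper written for this example. It assumes that memory/*.md and skills/*/SKILL.md match exactly one directory level and that unmonitored files yield None, neither of which is confirmed by the script itself.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

CRITICAL_FILES = {
    "SOUL.md", "AGENTS.md", "IDENTITY.md",
    "USER.md", "TOOLS.md", "HEARTBEAT.md",
}


def classify(relative_path: str):
    """Return (category, alert_level) for a workspace-relative path, or None."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) == 1:
        name = parts[0]
        if name in CRITICAL_FILES:
            return ("Critical", "WARNING")
        if name == "MEMORY.md":
            return ("Memory", "INFO")
        if fnmatchcase(name, "*.json"):  # root-level config files only
            return ("Config", "WARNING")
    elif len(parts) == 2 and parts[0] == "memory" and fnmatchcase(parts[1], "*.md"):
        return ("Memory", "INFO")
    elif len(parts) == 3 and parts[0] == "skills" and parts[2] == "SKILL.md":
        return ("Skills", "WARNING")
    return None  # not monitored under these assumptions
```

Splitting the path into parts avoids a pitfall of plain fnmatch, where `*` also matches path separators and `*.json` would wrongly match files in subdirectories.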

Injection Patterns Detected

  • Instruction override: "ignore previous instructions", "disregard above", "you are now", "new system prompt"
  • Base64 payloads: Suspiciously long base64 strings outside code blocks
  • Unicode manipulation: Zero-width characters, RTL overrides, homoglyphs
  • Markdown exfiltration: Image tags with data-encoding URLs
  • HTML injection: script tags, iframes, hidden elements
  • System prompt markers: <system>, [SYSTEM], <<SYS>> blocks
  • Shell injection: $(...) outside code blocks
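
A few of the pattern families listed above can be approximated with regular expressions, as in the sketch below. The expressions, the category names, and scan_text are illustrative assumptions rather than the rules the script ships with, and this simplified version does not skip code blocks. Broad phrases such as "you are now" will also produce false positives on ordinary prose, so findings need human review.

```python
import re

# Hypothetical rule set covering four of the documented pattern families.
PATTERNS = {
    "instruction-override": re.compile(
        r"ignore previous instructions|disregard above|you are now|new system prompt",
        re.IGNORECASE,
    ),
    # Zero-width characters plus bidirectional embedding/override/isolate controls.
    "unicode-manipulation": re.compile(
        "[\u200b\u200c\u200d\u2060\u202a-\u202e\u2066-\u2069]"
    ),
    "system-prompt-marker": re.compile(r"<system>|\[SYSTEM\]|<<SYS>>", re.IGNORECASE),
    "html-injection": re.compile(r"<\s*(script|iframe)\b", re.IGNORECASE),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) pairs for every matching line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings
```

For example, scan_text("notes\nPlease IGNORE previous instructions") returns [(2, "instruction-override")].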

Exit Codes

  • 0 — Clean, no issues
  • 1 — Modifications detected (review needed)
  • 2 — Injection patterns detected (action needed)
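
Because the result is signalled through the exit code, a calling script or scheduled job can branch on it. The sketch below shows one way a caller might do so. The interpret and run_full_check helpers are hypothetical, and the script path is supplied by the caller since {baseDir} depends on where the skill is installed.

```python
import subprocess
import sys

EXIT_MEANINGS = {
    0: "clean",
    1: "modifications detected (review needed)",
    2: "injection patterns detected (action needed)",
}


def interpret(returncode: int) -> str:
    """Translate a documented exit code into a short message."""
    return EXIT_MEANINGS.get(returncode, f"unexpected exit code {returncode}")


def run_full_check(script_path: str, workspace: str) -> str:
    """Run the 'full' command with the current interpreter and describe the outcome."""
    result = subprocess.run(
        [sys.executable, script_path, "full", "--workspace", workspace]
    )
    return interpret(result.returncode)
```

Treating any undocumented code as unexpected, rather than as clean, keeps a crash in the checker from being mistaken for a healthy workspace.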

No External Dependencies

Python standard library only. No pip install. No network calls. Everything runs locally.

Cross-Platform

Works with OpenClaw, Claude Code, Cursor, and any tool using the Agent Skills specification.
