Agent Firewall

Prompts

Real-time input/output filtering for agent communications. Block prompt injection and unauthorized commands before they reach the model, and stop data exfiltration before responses leave the agent.

Install

openclaw skills install agent-firewall

Agent Firewall — Input/Output Guardian

Architecture

[Channel Input] → [INPUT FILTER] → [Agent/Model] → [OUTPUT FILTER] → [Channel Output]
                        ↓                                  ↓
                  ┌─────────────┐                  ┌──────────────┐
                  │ Block List  │                  │ Secret Scan  │
                  │ Pattern DB  │                  │ PII Redact   │
                  │ Rate Limit  │                  │ Path Scrub   │
                  │ Encoding Det│                  │ URL Checker  │
                  └─────────────┘                  └──────────────┘
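The architecture above can be sketched as two filter chains wrapped around the model call. This is a minimal Python illustration, not the skill's actual implementation: `Verdict`, `run_filters`, `guarded_call`, and the two toy filters are hypothetical names, and the model is assumed to be any function from prompt text to reply text.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    allowed: bool
    text: str
    reasons: List[str] = field(default_factory=list)


# A filter inspects text and returns a Verdict, possibly with rewritten text.
Filter = Callable[[str], Verdict]


def run_filters(text: str, filters: List[Filter]) -> Verdict:
    """Apply filters in order and stop at the first one that blocks."""
    reasons: List[str] = []
    for check in filters:
        verdict = check(text)
        reasons.extend(verdict.reasons)
        if not verdict.allowed:
            return Verdict(False, text, reasons)
        text = verdict.text  # later filters see the sanitized/redacted text
    return Verdict(True, text, reasons)


def guarded_call(message: str, model: Callable[[str], str],
                 input_filters: List[Filter],
                 output_filters: List[Filter]) -> Verdict:
    inbound = run_filters(message, input_filters)
    if not inbound.allowed:
        return inbound  # a blocked message never reaches the model
    reply = model(inbound.text)
    return run_filters(reply, output_filters)


# Toy filters, for illustration only.
def block_ignore(text: str) -> Verdict:
    if "ignore previous instructions" in text.lower():
        return Verdict(False, text, ["injection"])
    return Verdict(True, text)


def scrub_home(text: str) -> Verdict:
    return Verdict(True, text.replace("/home/agent", "[PATH]"))
```

Because each filter sees the text produced by the one before it, sanitizing filters (such as Unicode stripping) should run before pattern matchers in the input chain.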

Input Filters

#  Filter              Description
1  Injection patterns  Regex + heuristic matching for phrases such as "ignore previous" and "you are now", and for role-confusion wording
2  Unicode sanitizer   Strips zero-width characters, control characters, and RTL overrides
3  Encoding detector   Detects Base64-, hex-, and ROT13-encoded payloads in user messages
4  Role confusion      Detects fake system messages and assistant impersonation
5  Rate limiter        Caps messages per user per channel (per-minute and per-hour limits)
6  Size limiter        Rejects inputs that exceed the token budget
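Filters 2 and 3 can be sketched with the Python standard library alone. The character sets, the 16-character minimum run length, and the ROT13 keyword list are assumptions chosen for illustration; in this sketch, anything `decoded_payloads` returns would then be re-checked against the injection patterns.

```python
import base64
import binascii
import codecs
import re
import unicodedata

# Zero-width characters and bidirectional embedding/override/isolate controls.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
BIDI_CONTROLS = {chr(c) for c in range(0x202A, 0x202F)} | {chr(c) for c in range(0x2066, 0x206A)}


def sanitize_unicode(text: str) -> str:
    kept = []
    for ch in text:
        if ch in ZERO_WIDTH or ch in BIDI_CONTROLS:
            continue
        # Drop other control characters but keep ordinary whitespace.
        if unicodedata.category(ch) == "Cc" and ch not in "\n\r\t":
            continue
        kept.append(ch)
    return "".join(kept)


# Assumed heuristics: runs of at least 16 encoded characters.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")
ROT13_HINTS = ("ignore", "previous instructions", "system prompt")  # hypothetical list


def _readable(s: str) -> bool:
    return bool(s) and all(c.isprintable() or c.isspace() for c in s)


def decoded_payloads(text: str) -> list:
    """Return (encoding, decoded_text) pairs worth re-scanning for injections."""
    found = []
    for m in B64_RUN.finditer(text):
        try:
            decoded = base64.b64decode(m.group(), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue
        if _readable(decoded):
            found.append(("base64", decoded))
    for m in HEX_RUN.finditer(text):
        try:
            decoded = bytes.fromhex(m.group()).decode("utf-8")
        except UnicodeDecodeError:
            continue
        if _readable(decoded):
            found.append(("hex", decoded))
    # ROT13 output is always "valid", so only flag it when it contains suspicious words.
    rot = codecs.decode(text, "rot13")
    if any(hint in rot.lower() for hint in ROT13_HINTS):
        found.append(("rot13", rot))
    return found
```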

Output Filters

#  Filter             Description
1  Secret scanner     Flags high-entropy strings and known credential formats (AWS access keys, GitHub tokens)
2  PII redactor       Replaces emails, phone numbers, SSNs, and credit card numbers with [REDACTED]
3  Path scrubber      Removes internal filesystem paths from outputs
4  URL checker        Blocks responses containing known malicious URLs
5  Consistency check  Verifies that the output doesn't contradict system prompt directives
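A minimal sketch of filters 1 to 3, assuming plain regex-based redaction. The AWS pattern matches the example configuration below; the GitHub pattern accepts all five classic token prefixes (ghp_, gho_, ghu_, ghs_, ghr_). The entropy threshold, the US-style SSN and phone formats, the Luhn check on card numbers, and the list of path roots are illustrative assumptions, not part of the skill. The URL and consistency checks are omitted.

```python
import math
import re
from collections import Counter

# Known credential formats (AWS access key IDs, classic GitHub tokens).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"gh[pousr]_[A-Za-z0-9_]{36,}"),
]
TOKEN_RUN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; hypothetical tuning value

# Assumed path roots; extend for your deployment.
PATH_RE = re.compile(r"/(?:home|Users|root|etc|var|opt|srv|tmp)(?:/[\w.\-]+)+|[A-Za-z]:\\[\w.\\\-]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")          # US SSN format
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")      # 13 to 16 digits
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")  # US-style numbers only


def shannon_entropy(s: str) -> float:
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def luhn_ok(number: str) -> bool:
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scrub_output(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    text = PATH_RE.sub("[PATH]", text)
    # Redact long token-like runs only if they look random enough.
    text = TOKEN_RUN.sub(
        lambda m: "[REDACTED]" if shannon_entropy(m.group()) > ENTROPY_THRESHOLD else m.group(),
        text,
    )
    text = EMAIL_RE.sub("[REDACTED]", text)
    text = SSN_RE.sub("[REDACTED]", text)
    # Only redact digit runs that pass the Luhn checksum, to cut false positives.
    text = CARD_RE.sub(lambda m: "[REDACTED]" if luhn_ok(m.group()) else m.group(), text)
    text = PHONE_RE.sub("[REDACTED]", text)
    return text
```

Paths are scrubbed before the entropy pass so that long path segments are replaced as a whole rather than judged as possible tokens.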

Configuration

# .security/firewall-rules.yaml
input:
  injection_patterns:
    - pattern: "ignore (all )?previous instructions"
      action: BLOCK
      severity: CRITICAL
    - pattern: "you are now (?!helping)"
      action: BLOCK
      severity: HIGH
  rate_limit:
    max_per_minute: 30
    max_per_hour: 500
  max_input_tokens: 4096

output:
  secret_patterns:
    - name: aws_key
      pattern: "AKIA[0-9A-Z]{16}"
      action: REDACT
    - name: github_token
      # classic GitHub token prefixes: ghp_, gho_, ghu_, ghs_, ghr_
      pattern: "gh[pousr]_[A-Za-z0-9_]{36,}"
      action: REDACT
  pii_redaction: true
  path_scrubbing: true
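One way to apply the input section of this config, sketched in Python. The dict mirrors the YAML so that the example has no third-party dependency (a real loader might use PyYAML's `yaml.safe_load`). `InputFirewall`, case-insensitive matching, and the whitespace-based token count are assumptions; a real deployment would count tokens with the model's own tokenizer.

```python
import re
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

# Mirrors the input section of .security/firewall-rules.yaml.
RULES = {
    "input": {
        "injection_patterns": [
            {"pattern": "ignore (all )?previous instructions", "action": "BLOCK", "severity": "CRITICAL"},
            {"pattern": "you are now (?!helping)", "action": "BLOCK", "severity": "HIGH"},
        ],
        "rate_limit": {"max_per_minute": 30, "max_per_hour": 500},
        "max_input_tokens": 4096,
    }
}


class InputFirewall:
    def __init__(self, rules: dict, clock: Callable[[], float] = time.monotonic):
        cfg = rules["input"]
        # Case-insensitive matching is an assumption; the YAML sets no flags.
        self.patterns = [(re.compile(r["pattern"], re.IGNORECASE), r)
                         for r in cfg["injection_patterns"]]
        self.per_minute = cfg["rate_limit"]["max_per_minute"]
        self.per_hour = cfg["rate_limit"]["max_per_hour"]
        self.max_tokens = cfg["max_input_tokens"]
        self.clock = clock
        # (user_hash, channel) -> timestamps of messages seen in the last hour
        self.history: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def check(self, user_hash: str, channel: str, text: str) -> Optional[str]:
        """Return None if the message may pass, otherwise a block reason."""
        now = self.clock()
        window = self.history[(user_hash, channel)]
        while window and now - window[0] >= 3600:
            window.popleft()  # forget messages older than an hour
        recent = sum(1 for t in window if now - t < 60)
        if recent >= self.per_minute or len(window) >= self.per_hour:
            return "rate_limit"
        window.append(now)  # content-blocked messages still count toward the limit
        # Whitespace splitting is a crude stand-in for the model's tokenizer.
        if len(text.split()) > self.max_tokens:
            return "size_limit"
        for regex, rule in self.patterns:
            if rule["action"] == "BLOCK" and regex.search(text):
                return "injection:" + rule["severity"]
        return None
```

Python's `re` supports the `(?!helping)` lookahead in the second rule; engines without lookaround, such as RE2 or Go's `regexp`, would reject that pattern, so the rules file is tied to the regex dialect of whatever loads it.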

Guardrails

  • Firewall rules are append-only in production; deleting a rule requires human approval
  • Suspected false positives are logged and alerted on, then passed through with a warning rather than silently dropped
  • Every block is logged with its timestamp, matched rule, full context, channel, and user hash
  • Firewall itself cannot be disabled by agent instructions
  • Rules file is read-only from the agent's perspective
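The logging and append-only guardrails could look like the following sketch. `user_hash`, `block_record`, `AppendOnlyRules`, and the `approved_by` argument are hypothetical; in particular, approval here is only a non-empty string, whereas a real deployment would verify it out of band, in a process the agent cannot write to.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def user_hash(user_id: str, salt: str) -> str:
    """Salted SHA-256, truncated, so logs can correlate users without raw IDs."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def block_record(rule: str, context: str, channel: str,
                 user_id: str, salt: str) -> str:
    """One JSON log line with the fields listed in the guardrails."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rule": rule,
        "context": context,
        "channel": channel,
        "user_hash": user_hash(user_id, salt),
    })


class AppendOnlyRules:
    """Rules can be added freely; removal needs a recorded approver."""

    def __init__(self) -> None:
        self._rules: List[str] = []
        self.audit: List[Tuple[str, str]] = []

    @property
    def rules(self) -> Tuple[str, ...]:
        return tuple(self._rules)  # read-only view for callers

    def add(self, pattern: str) -> None:
        self._rules.append(pattern)

    def remove(self, pattern: str, approved_by: Optional[str] = None) -> None:
        if not approved_by:
            raise PermissionError("deleting a firewall rule requires human approval")
        self._rules.remove(pattern)
        self.audit.append((pattern, approved_by))
```

The salt should stay secret: anyone who knows it could hash a list of candidate user IDs and match them against the log.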