Install
openclaw skills install guardDeep AI safety guardrails workflow—policy definition, input/output filtering, monitoring, escalation, and false-positive handling. Use when reducing harmful outputs, misuse, or policy violations in LLM products.
openclaw skills install guardGuardrails turn product and legal policy into enforced behavior: blocking, rewriting, logging, and human review—with attention to false positives and latency.
Trigger conditions:
Initial offer:
Use six stages: (1) policy scope, (2) threat model, (3) controls stack, (4) implementation patterns, (5) monitoring & review, (6) iteration & appeals). Confirm latency budget and jurisdictions.
Goal: Define prohibited categories (hate, sexual content, violence, self-harm, malware instructions, etc.) and required disclaimers for sensitive domains (medical, legal).
Exit condition: Policy document owned by legal/product; escalation path for gray areas.
Goal: Identify adversaries (prompt injection, data exfiltration, tool abuse) and assets (user data, system prompts, connectors).
Goal: Layer defenses: input screening, model safety APIs, output classifiers, tool sandboxing, allowlists for tools and URLs.
Goal: Structured refusal messages; telemetry on every block; distinguish block vs rewrite vs warn; avoid silent failures.
Goal: Sample borderline cases for human review; dashboards on block rates by category; abuse spike alerts.
Goal: User appeals path where appropriate; version policy changes; measure false positives by locale and use case.