SharpAgent Content Safety

SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports concurrent jurisdictions (global/China/US/EU). Coordinates with the calibration framework and five-factor review. Independent Layer 3 of the SharpAgent four-layer architecture.

Audits

Pass

Install

openclaw skills install sharpagent-content-safety

SharpAgent Content Safety Engine v1.0.0

The last line of defense for content output. The question is not "should we say it" — it's "how should it be said in this jurisdiction." Independent of five-factor review (credibility ≠ compliance). Layer 3 of the four-layer architecture.

Architecture Position

Layer 1: Five-Factor Review     ← Trust verification (global, immutable)
Layer 2: Calibration Framework  ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine  ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output

Why independent? Five-factor review asks "can I trust this?" Safety engine asks "can I say this?" The first is information quality, the second is compliance and safety. Mixing them contaminates both judgments.

Contract

contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent

Core Design

Pluggable Rule Engine

rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personal identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"

  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"

  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"

  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"

  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"

Rule Structure

rule:
  id: "{jurisdiction}/{name}-{seq}"   # Unique identifier
  type: "block" | "flag" | "pass"      # Action
  description: "..."                   # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                 # Match patterns (optional)
  keywords: [string...]                # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                        # Exceptions
    "educational context",
    "news reporting"
  ]
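The rule structure above can be sketched as a small data type. This is a minimal illustration, not the engine's actual implementation; the `Rule` class and `active_in` helper are hypothetical names, and the assumption that `global` rules apply regardless of which jurisdictions are loaded follows the "No jurisdiction configured → load global only" edge case below.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Rule:
    id: str                   # "{jurisdiction}/{name}-{seq}"
    type: str                 # "block" | "flag" | "pass"
    description: str
    severity: str             # "low" | "medium" | "high" | "critical"
    jurisdiction: str = "global"
    patterns: List[str] = field(default_factory=list)   # regex patterns
    keywords: List[str] = field(default_factory=list)
    exemptions: List[str] = field(default_factory=list)

    def active_in(self, jurisdictions: Set[str]) -> bool:
        # Assumption: "global" rules are always in force; others apply
        # only when their jurisdiction has been loaded.
        return self.jurisdiction == "global" or self.jurisdiction in jurisdictions
```

A `cn/` rule would then be active only when the `cn` policy set is loaded, while `global/PII-001` is always checked.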

Jurisdiction Configuration

Runtime selection (multi-select):

safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])

Each loaded jurisdiction stacks its rules. Conflicting rules: strictest wins.

Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
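The "strictest wins" stacking rule can be expressed as a max over two rank orders: action rank (block > flag > pass) and severity rank (critical > high > medium > low). A sketch, assuming matches arrive as `(action, severity)` pairs (the function name `merge_verdicts` is illustrative):

```python
# Strictest-wins merge across all matched rules, all jurisdictions.
ACTION_RANK = {"pass": 0, "flag": 1, "block": 2}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def merge_verdicts(matches):
    """matches: list of (action, severity) pairs from every matched rule.
    Returns the strictest action and the maximum severity seen."""
    if not matches:
        return ("pass", None)
    action = max((m[0] for m in matches), key=ACTION_RANK.__getitem__)
    severity = max((m[1] for m in matches), key=SEVERITY_RANK.__getitem__)
    return (action, severity)
```

With a `us` flag/medium match and a `cn` block/critical match stacked, the merged verdict is `("block", "critical")`.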

Workflow

Step 1: Pre-Flight

  • Content empty?
  • Content too long? Chunk at ≤4096 chars.
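The chunking step, combined with the ±200-char overlap scanning listed under Edge Cases, can be sketched as follows (a simplified illustration; `chunk` is a hypothetical helper, and the real engine's boundary handling may differ):

```python
def chunk(text: str, size: int = 4096, overlap: int = 200):
    """Split text into chunks of at most `size` chars, with `overlap` chars
    shared across each boundary so a sensitive phrase spanning two chunks
    is still seen whole by at least one scan."""
    if len(text) <= size:
        return [text]
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap   # step back by the overlap each time
    return chunks
```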

Step 2: Rule Matching

for each chunk:
    for each loaded rule:
        skip if rule's jurisdiction is not active
        check patterns/keywords
        check exemptions
        record match
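The pattern/keyword check at the heart of the loop might look like this. A minimal sketch: `rule_matches` is a hypothetical helper, rules are shown as plain dicts, and the e-mail regex is a deliberately naive stand-in for a real PII detector.

```python
import re

def rule_matches(rule: dict, chunk: str) -> bool:
    """True if any regex pattern or literal keyword of the rule
    occurs in the chunk."""
    for pattern in rule.get("patterns", []):
        if re.search(pattern, chunk):
            return True
    return any(kw in chunk for kw in rule.get("keywords", []))

pii_rule = {
    "id": "global/PII-001",
    "patterns": [r"\b[\w.]+@[\w.]+\.\w{2,}\b"],   # naive e-mail matcher
    "keywords": ["id_card"],
}
```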

Step 3: Verdict

| Verdict | Meaning | Action |
|---|---|---|
| ✅ pass | No matches | Let through to output |
| ⚠️ flag | Low severity match | Tag + allow + log |
| 🚫 block | High severity match | Block + return alternative content |

Block replacement:

[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.
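Building that replacement text means picking the highest-severity match as `{top_reason}`. A sketch under the assumption that matches carry `(rule_id, severity, reason)`; `render_block_notice` is an illustrative name:

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def render_block_notice(matches):
    """Replacement content returned when the verdict is block.
    The highest-severity match's reason becomes the headline."""
    top = max(matches, key=lambda m: SEVERITY_RANK[m[1]])
    return ("[Content blocked by safety engine]\n"
            f"Reason: {top[2]}\n"
            "Contact administrator for full content.")
```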

Step 4: Logging

{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}

Ruleset Management

Built-in Rulesets

| Ruleset | Coverage | File |
|---|---|---|
| global | Universal safety (hate speech/PII/privacy) | rules/global.yaml |
| cn | China internet content regulations | rules/cn.yaml |
| us | US export control/safe harbor | rules/us.yaml |
| eu | GDPR-related | rules/eu.yaml |

Custom Rules

rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md
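Loading custom rule files has to honor the "corrupt rule file → skip + log, don't crash" edge case below. A dependency-free sketch: `load_rulesets` is a hypothetical helper, and `json.loads` stands in for the YAML loader the real engine would use.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("safety-engine")

def load_rulesets(rule_dir, parse=json.loads, suffix="*.json"):
    """Load every rule file under rule_dir. A corrupt file is skipped
    and logged rather than crashing the engine."""
    rules = []
    for path in sorted(Path(rule_dir).glob(suffix)):
        try:
            rules.extend(parse(path.read_text())["rules"])
        except Exception as exc:          # corrupt file: skip + log
            log.error("skipping %s: %s", path, exc)
    return rules
```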

Edge Cases

| Situation | Action |
|---|---|
| Conflicting jurisdiction rules | Strictest wins (block > flag > pass) |
| Rule false positive | Add exemption, log false positive |
| Cross-chunk sensitive phrase | Overlap scanning (±200 chars) |
| No jurisdiction configured | Load global only |
| Corrupt rule file | Skip + log error; don't crash engine |
| Exemption conditions met | Skip rule, log exemption reason |

Quality Gates

| Check | What | Fail action |
|---|---|---|
| At least 1 ruleset | No rules = nothing blocked | Don't start |
| Verdict unambiguous | pass/flag/block | Default to block |
| Block provides reason | User knows why | Add reason |
| Complete audit log | Every check recorded | Backfill |
| Rules versioned | Updates don't break running checks | Semver rules |

Integration Points

Five-Factor Review

  • Safety engine output (compliance_check: fail) can trigger five-factor review
  • Independent but cooperative

Calibration Framework

  • Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
  • Calibration compliance field maps to safety engine rule selection

Self-Evolving

  • Safety false positives/negatives trigger self-evolving reflection
  • New rules as improvement hypotheses

Layered Memory

  • Safety logs go to L6 archive (legal compliance)

Version History

  • v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.

SharpAgent · MIT-0 · 2026-05-11