sentryskills

SentrySkills - Automatic security guard that runs on EVERY task. Protects against prompt injection, data leaks, unsafe commands, and code vulnerabilities with zero external dependencies.

BiaoJie Zeng (@zengbiaojie)

Install

openclaw skills install @zengbiaojie/sentryskills

SentrySkills - Always-On Security Guard

SentrySkills is designed to run AUTOMATICALLY on every task. It provides three-stage protection (preflight → runtime → output) with predictive risk analysis.

⚠️ Installation ≠ Automatic Activation

After installing SentrySkills, you must configure your AGENTS.md to make it run automatically. See install/first_time_setup.md for details.

How It Works

On every task, SentrySkills:

Preflight - Analyzes user intent and planned actions
Runtime - Monitors behavior during execution
Output - Redacts sensitive data before response
Predictive - Warns about potential risks
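The staged flow above can be pictured with a small sketch. This is not the SentrySkills implementation: the rule contents, the reason codes, the `StageResult` type, and the "most severe decision wins" merge are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed severity order for this sketch: the highest value wins.
SEVERITY = {"allow": 0, "downgrade": 1, "block": 2}


@dataclass
class StageResult:
    stage: str
    decision: str
    reason_codes: list = field(default_factory=list)


def preflight(user_text: str) -> StageResult:
    # Hypothetical rule: flag a well-known prompt-injection phrase.
    if "ignore previous instructions" in user_text.lower():
        return StageResult("preflight", "block", ["PROMPT_INJECTION"])
    return StageResult("preflight", "allow")


def runtime(planned_commands: list) -> StageResult:
    # Hypothetical rule: treat recursive deletes as risky.
    if any(cmd.strip().startswith("rm -rf") for cmd in planned_commands):
        return StageResult("runtime", "downgrade", ["UNSAFE_COMMAND"])
    return StageResult("runtime", "allow")


def output_guard(draft: str) -> StageResult:
    # Hypothetical rule: a draft that echoes an API key needs attention.
    if "api_key=" in draft.lower():
        return StageResult("output", "downgrade", ["SECRET_IN_OUTPUT"])
    return StageResult("output", "allow")


def run_guard(user_text: str, planned_commands: list, draft: str):
    """Run the three stages in order and merge them into one decision."""
    results = [preflight(user_text), runtime(planned_commands), output_guard(draft)]
    worst = max(results, key=lambda r: SEVERITY[r.decision])
    return worst.decision, results
```

Calling `run_guard("hi", ["rm -rf /tmp/x"], "done")` yields a `downgrade` decision in this sketch, because the runtime stage is the most severe of the three.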

Quick Enable (One Command)

Run this command to append the guard instructions to your ~/.codex/AGENTS.md:

```bash
cat >> ~/.codex/AGENTS.md << 'EOF'
# SentrySkills - Security First
Before EVERY response, run:
python ~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py \
  input.json \
  --policy-profile balanced \
  --out result.json
Then check result.json for final_action and follow it.
EOF
```

Restart Codex so the updated AGENTS.md takes effect.
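If you prefer to drive the hook from Python rather than by hand, the following sketch builds the same command line and reads the decision back. The script path and flags are taken from the command above; the assumption that result.json is a JSON object with a top-level `final_action` key, and the helper names, are mine.

```python
import json
import sys
from pathlib import Path

# Script location as used in the AGENTS.md snippet above.
HOOK = Path.home() / ".codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py"


def build_hook_command(input_path, out_path, profile="balanced"):
    """Return the argv list for the runtime hook (nothing is executed here)."""
    return [
        sys.executable, str(HOOK), str(input_path),
        "--policy-profile", profile,
        "--out", str(out_path),
    ]


def read_final_action(out_path):
    """Return final_action from result.json, or None if it cannot be read.

    Assumes result.json is a JSON object with a top-level "final_action" key.
    """
    try:
        data = json.loads(Path(out_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    return data.get("final_action")
```

The command list can be passed to `subprocess.run(cmd, check=False)`. A `None` result from `read_final_action` should be treated as "self-check not completed" rather than as permission to proceed.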

When to Use SentrySkills

Use SentrySkills when you need AI agents to operate safely with:

Sensitive data access - Agents reading credentials, secrets, or private information
System modifications - Agents executing commands, writing files, or changing configurations
External communications - Agents making network requests or calling external APIs
Code generation - Agents producing code that might contain vulnerabilities
Production environments - Any scenario where security cannot be compromised
Multi-turn conversations - Detect subtle manipulation across multiple interactions

Examples:

```text
✅ Use: When an agent needs to read environment variables or config files
✅ Use: When an agent is asked to execute shell commands
✅ Use: When an agent generates database queries or API calls
✅ Use: When an agent modifies system files or configurations
❌ Skip: Simple read-only queries on public documentation
❌ Skip: Basic explanations without system access
```

Skill Package Structure

This is a skill package that orchestrates multiple sub-skills:

using-sentryskills - User-facing entry point
sentryskills-orchestrator - Central coordination
sentryskills-preflight - Pre-execution checks
sentryskills-runtime - Runtime monitoring
sentryskills-output - Output validation & redaction

Each sub-skill has its own SKILL.md with specific requirements.

Execution Requirements

Run the guard checks before each external output.
Process sequence: preflight → runtime → output guard → final decision.
Block: the original response must not be sent; refuse or return a redacted version.
Downgrade: soften the wording of the response and state the uncertainty explicitly.
Explanatory responses must also pass through the output guard.
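A minimal sketch of how an agent wrapper might honor these requirements is shown below. The refusal and uncertainty texts, the function name, and the choice to refuse on any unrecognized action are assumptions, not part of the package.

```python
# Hypothetical wording; the package may prescribe its own texts.
REFUSAL = "I can't share that response because it did not pass the security check."
UNCERTAINTY_NOTE = "Note: this answer is limited and may be incomplete; treat it with caution."


def apply_final_action(action, draft, redacted=None):
    """Turn a guard decision into the text that is actually sent."""
    if action == "allow":
        return draft
    if action == "downgrade":
        # Downgrade: keep the answer but declare uncertainty.
        return f"{draft}\n\n{UNCERTAINTY_NOTE}"
    if action == "block":
        # Block: never send the original draft; redact if possible, else refuse.
        return redacted if redacted is not None else REFUSAL
    # Unknown or missing action: fall back to the conservative option.
    return REFUSAL
```

Refusing on an unrecognized action is a deliberately conservative choice, so that a malformed result never lets the original draft through.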

Recommended Usage

Default (turn_dir layout)

```bash
python shared/scripts/self_guard_runtime_hook_template.py \
  shared/references/input_schema.json \
  --policy shared/references/runtime_policy.balanced.json \
  --policy-profile balanced
```

With summary output

```bash
python shared/scripts/self_guard_runtime_hook_template.py \
  shared/references/input_schema.json \
  --out ./sentry_skill_log/sentryskills_summary.json
```

Legacy event stream

```bash
python shared/scripts/self_guard_runtime_hook_template.py \
  shared/references/input_schema.json \
  --log-layout legacy \
  --events-log ./sentry_skill_log/sentryskills_events.jsonl
```

Mandatory Logging Protocol

Text-only judgment is prohibited: the runtime hook must actually execute on every turn.
The input JSON must include project_path as an absolute path, to avoid path drift.
The final response must provide:
- self_guard_final_action
- self_guard_trace_id
- self_guard_events_log (path to the index or the legacy events file)
If script execution fails, declare "security self-check not completed" and adopt a conservative output strategy.
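The protocol above could be supported by two small helpers like the ones below. Only `project_path` and the three `self_guard_*` names come from this document; the `user_input` field and the `final_action` and `trace_id` keys read from the result are assumptions about the JSON shapes.

```python
from pathlib import Path


def build_input(project_dir, user_text):
    """Build the hook input with an absolute project_path.

    "user_input" is a hypothetical field name used only for this sketch.
    """
    return {
        "project_path": str(Path(project_dir).resolve()),
        "user_input": user_text,
    }


def guard_footer(result, events_log):
    """Format the fields the final response must carry.

    `result` is the parsed result.json (assumed keys: final_action, trace_id),
    or None when the hook failed to run.
    """
    if result is None:
        return "security self-check not completed"
    return "\n".join([
        f"self_guard_final_action: {result.get('final_action')}",
        f"self_guard_trace_id: {result.get('trace_id')}",
        f"self_guard_events_log: {events_log}",
    ])
```

`Path.resolve()` turns a relative directory such as `.` into an absolute path, which is what keeps the logged project location stable if the working directory changes between turns.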

Default Log Layout

Log root: ./sentry_skill_log/

Per-turn directories:

./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<turn_id>/input.json
./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<turn_id>/result.json

Global index:

./sentry_skill_log/index.jsonl

Session state:

./sentry_skill_log/.self_guard_state/
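To locate the files for a given turn, the layout above can be reproduced with `pathlib`. This is a sketch: the helper names are hypothetical, and whether the hook stamps directories in local time or UTC is not stated here, so the caller supplies the timestamp.

```python
from datetime import datetime
from pathlib import Path

LOG_ROOT = Path("sentry_skill_log")


def turn_dir(turn_id, when, root=LOG_ROOT):
    """Return turns/YYYYMMDD_HHMMSS_<turn_id>/ under the log root."""
    return root / "turns" / f"{when.strftime('%Y%m%d_%H%M%S')}_{turn_id}"


def log_paths(turn_id, when, root=LOG_ROOT):
    """Return every path named in the default layout for one turn."""
    d = turn_dir(turn_id, when, root)
    return {
        "input": d / "input.json",
        "result": d / "result.json",
        "index": root / "index.jsonl",
        "state": root / ".self_guard_state",
    }
```

For example, `turn_dir("abc", datetime(2024, 1, 2, 3, 4, 5))` gives `sentry_skill_log/turns/20240102_030405_abc`.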

Policy Profiles

balanced: Standard security (default)
strict: Maximum security
permissive: Minimal interference
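A wrapper script might validate the profile name before calling the hook. The sketch below assumes that each profile has a policy file named after the `runtime_policy.balanced.json` example in this document; the `strict` and `permissive` file names are inferred from that pattern and not confirmed.

```python
from pathlib import Path

PROFILES = ("balanced", "strict", "permissive")


def policy_path(profile="balanced", references_dir="shared/references"):
    """Map a profile name to its assumed policy file path."""
    if profile not in PROFILES:
        raise ValueError(f"unknown policy profile: {profile!r}")
    return Path(references_dir) / f"runtime_policy.{profile}.json"
```

Rejecting unknown names early avoids silently running with a profile the package does not define.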

Detection Coverage

Preflight Stage

Prompt injection patterns
Malicious intent detection
Sensitive topic inference
Action classification

Runtime Stage

Event monitoring
Source tracking
Anomaly detection
Behavioral analysis

Output Stage

Sensitive data redaction
Source disclosure handling
Confidence assessment
Safe response generation
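As an illustration of what rule-based redaction in the output stage can look like, here is a small regex sketch. The two patterns and the replacement labels are examples chosen for this sketch; the actual rules live in the package's detection rule files and may differ.

```python
import re

# Illustrative patterns only, not the package's rule set.
PATTERNS = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY_ID]"),
    # key=value style secrets such as password=..., api_key=..., token=...
    (re.compile(r"(?i)\b(password|api_key|token)\s*=\s*\S+"), r"\1=[REDACTED]"),
]


def redact(text):
    """Return (redacted_text, number_of_replacements)."""
    hits = 0
    for pattern, replacement in PATTERNS:
        text, count = pattern.subn(replacement, text)
        hits += count
    return text, hits
```

Returning the hit count alongside the text lets the caller decide whether the response should also be downgraded or blocked.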

Predictive Analysis

Resource exhaustion prediction
Scope creep detection
Privilege escalation warning
Data exfiltration path analysis
Multi-turn grooming detection

Integration

As Codex Skill

Copy the package to skills/sentryskills/ and reference it in your agent configuration.

Configuration Files

shared/references/runtime_policy.*.json - Security policy profiles
shared/references/detection_rules.json - Detection rule definitions
shared/references/input_schema.json - Input validation schema

Testing

```bash
# Test predictive analysis
python test_predictive_analysis.py

# Test integration
python test_integration.py
```

Event Types

The system emits structured events for:

preflight_result - Pre-execution check outcome
runtime_result - Runtime monitoring outcome
output_guard_result - Output validation outcome
predictive_analysis_result - Risk prediction (if enabled)
final_decision - Overall decision with rationale
hook_end - Completion with duration

Each event includes:

Trace ID for correlation
Decision (block/downgrade/allow/continue)
Reason codes
Matched rules
Metadata
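For auditing, the event stream can be grouped by trace ID with a few lines of Python. The event type names come from the list above, but the JSON key names used here (`trace_id`, `event_type`, `decision`) are assumptions about the log format; adjust them to match the actual files.

```python
import json
from collections import defaultdict


def group_events_by_trace(lines):
    """Group JSONL event lines by their (assumed) trace_id key."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        grouped[event.get("trace_id", "unknown")].append(event)
    return dict(grouped)


def final_decisions(grouped):
    """Return {trace_id: decision} taken from final_decision events."""
    decisions = {}
    for trace_id, events in grouped.items():
        for event in events:
            if event.get("event_type") == "final_decision":
                decisions[trace_id] = event.get("decision")
    return decisions
```

Typical use is `group_events_by_trace(open(path, encoding="utf-8"))` on the legacy events file, followed by `final_decisions(...)` to see which traces were blocked or downgraded.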

Performance

Typical latency: 50-100ms per check
Memory: <50MB baseline
Zero external dependencies (Python stdlib only)

Security Properties

No data exfiltration: All processing is local
No LLM calls: Pure rule-based and heuristic
Audit trail: Complete event log for compliance
Transparent: All decisions include reason codes
