sentryskills
SentrySkills - Automatic security guard that runs on EVERY task. Protects against prompt injection, data leaks, unsafe commands, and code vulnerabilities wit...
SentrySkills - Always-On Security Guard
SentrySkills is designed to run AUTOMATICALLY on every task. It provides three-stage protection (preflight → runtime → output) with predictive risk analysis.
⚠️ Installation ≠ Automatic Activation
After installing SentrySkills, you must configure your AGENTS.md to make it run automatically. See install/first_time_setup.md for details.
How It Works
On every task, SentrySkills:
- Preflight - Analyzes user intent and planned actions
- Runtime - Monitors behavior during execution
- Output - Redacts sensitive data before response
- Predictive - Warns about potential risks
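The staged flow can be pictured as a short pipeline in which each stage returns a decision and the most severe decision wins. This is a minimal sketch under stated assumptions. The stage functions, the `run_pipeline` name, and the severity ordering (which treats `allow` and `continue` alike) are hypothetical, and the predictive stage is left out.

```python
from typing import Callable, List, Tuple

# Assumed severity ordering: "allow" and "continue" both let the turn proceed.
SEVERITY = {"allow": 0, "continue": 0, "downgrade": 1, "block": 2}

Stage = Callable[[dict], str]


def run_pipeline(task: dict, stages: List[Tuple[str, Stage]]) -> Tuple[str, List[str]]:
    """Run each stage in order and keep the most severe decision seen so far."""
    final = "allow"
    trace: List[str] = []
    for name, stage in stages:
        decision = stage(task)
        trace.append(f"{name}:{decision}")
        if SEVERITY[decision] > SEVERITY[final]:
            final = decision
        if final == "block":
            break  # design choice in this sketch: stop as soon as any stage blocks
    return final, trace


# Toy stand-ins for the preflight, runtime, and output stages (hypothetical).
example_stages: List[Tuple[str, Stage]] = [
    ("preflight", lambda t: "block" if "ignore previous instructions" in t["prompt"].lower() else "continue"),
    ("runtime", lambda t: "continue"),
    ("output", lambda t: "downgrade" if t.get("uncertain") else "allow"),
]

if __name__ == "__main__":
    print(run_pipeline({"prompt": "Summarize this file"}, example_stages))
```

Stopping early on `block` keeps later stages from doing work on a turn that will be refused anyway; a real implementation may still run the output guard for logging.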
Quick Enable (One Command)
Add this to your ~/.codex/AGENTS.md:
cat >> ~/.codex/AGENTS.md << 'EOF'
# SentrySkills - Security First
Before EVERY response, run:
python ~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py \
input.json \
--policy-profile balanced \
--out result.json
Then check result.json for final_action and follow it.
EOF
Restart Codex so it loads the updated AGENTS.md; the hook then runs before every response.
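A minimal sketch of the "check result.json for final_action" step, assuming final_action is a top-level string field holding one of the documented decisions (block/downgrade/allow/continue). The helper names and the choice to fall back to `block` when the file is missing or unreadable are assumptions, not part of the skill.

```python
import json
from pathlib import Path

KNOWN_ACTIONS = {"allow", "continue", "downgrade", "block"}
FALLBACK_ACTION = "block"  # assumption: fail closed when the result is unusable


def final_action_from_text(raw: str) -> str:
    """Extract final_action from result.json text, falling back to FALLBACK_ACTION."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return FALLBACK_ACTION
    # Assumption: final_action is a top-level string field.
    action = data.get("final_action") if isinstance(data, dict) else None
    if isinstance(action, str) and action in KNOWN_ACTIONS:
        return action
    return FALLBACK_ACTION


def read_final_action(result_path: str = "result.json") -> str:
    """Read the file written via --out; a missing file counts as a failed check."""
    try:
        raw = Path(result_path).read_text(encoding="utf-8")
    except OSError:
        return FALLBACK_ACTION
    return final_action_from_text(raw)


if __name__ == "__main__":
    print(read_final_action("result.json"))
```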
When to Use SentrySkills
Use SentrySkills when you need AI agents to operate safely with:
- Sensitive data access - Agents reading credentials, secrets, or private information
- System modifications - Agents executing commands, writing files, or changing configurations
- External communications - Agents making network requests or calling external APIs
- Code generation - Agents producing code that might contain vulnerabilities
- Production environments - Any scenario where security cannot be compromised
- Multi-turn conversations - Detect subtle manipulation across multiple interactions
Examples:
✅ Use: When an agent needs to read environment variables or config files
✅ Use: When an agent is asked to execute shell commands
✅ Use: When an agent generates database queries or API calls
✅ Use: When an agent modifies system files or configurations
❌ Skip: Simple read-only queries on public documentation
❌ Skip: Basic explanations without system access
Skill Package Structure
This is a skill package that orchestrates multiple sub-skills:
- using-sentryskills - User-facing entry point
- sentryskills-orchestrator - Central coordination
- sentryskills-preflight - Pre-execution checks
- sentryskills-runtime - Runtime monitoring
- sentryskills-output - Output validation & redaction
Each sub-skill has its own SKILL.md with specific requirements.
Execution Requirements
- Run guard checks before each external output
- Process sequence: preflight → runtime → output guard → final decision
- Block: Prohibit the original response; the agent must refuse or redact
- Downgrade: Downgrade the wording of the response and declare uncertainty
- Explanatory responses must also go through the output guard
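The Block and Downgrade rules can be read as a final filter on the draft response. A hedged sketch follows; `enforce`, the refusal text, and the caveat wording are illustrative placeholders, and this version treats any unrecognized decision as Block.

```python
from typing import Optional

# Hypothetical wording; the skill does not prescribe exact refusal or caveat text.
REFUSAL = "I can't share that response: the security check blocked it."
DOWNGRADE_NOTE = "Note: a security check downgraded this answer; parts of it may be incomplete or uncertain."


def enforce(final_decision: str, draft: str, redacted: Optional[str] = None) -> str:
    """Apply the final decision to a draft response before it is sent."""
    if final_decision in ("allow", "continue"):
        return draft
    if final_decision == "downgrade":
        # Downgrade: keep the answer but state the uncertainty explicitly.
        return f"{DOWNGRADE_NOTE}\n\n{draft}"
    # Block (or any unrecognized decision): refuse, or send only a redacted version.
    return redacted if redacted is not None else REFUSAL
```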
Recommended Usage
Default (turn_dir layout)
python shared/scripts/self_guard_runtime_hook_template.py \
shared/references/input_schema.json \
--policy shared/references/runtime_policy.balanced.json \
--policy-profile balanced
With summary output
python shared/scripts/self_guard_runtime_hook_template.py \
shared/references/input_schema.json \
--out ./sentry_skill_log/sentryskills_summary.json
Legacy event stream
python shared/scripts/self_guard_runtime_hook_template.py \
shared/references/input_schema.json \
--log-layout legacy \
--events-log ./sentry_skill_log/sentryskills_events.jsonl
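If the hook is driven from Python rather than the shell, the commands in this section reduce to building an argument list. This sketch reuses only the flags those commands show; the helper names, the 30-second timeout, and treating a non-zero exit code as a failed run are assumptions.

```python
import subprocess
import sys
from typing import List, Optional

HOOK = "shared/scripts/self_guard_runtime_hook_template.py"


def build_hook_command(
    input_json: str,
    policy_profile: str = "balanced",
    policy: Optional[str] = None,
    out: Optional[str] = None,
    legacy_events_log: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for one hook run from the documented flags."""
    cmd = [sys.executable, HOOK, input_json, "--policy-profile", policy_profile]
    if policy:
        cmd += ["--policy", policy]
    if out:
        cmd += ["--out", out]
    if legacy_events_log:
        cmd += ["--log-layout", "legacy", "--events-log", legacy_events_log]
    return cmd


def run_hook(cmd: List[str], timeout_s: float = 30.0) -> bool:
    """Run the hook; False means the security self-check did not complete."""
    try:
        completed = subprocess.run(cmd, check=False, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a non-zero exit code signals a failed run.
    return completed.returncode == 0
```

Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.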
Mandatory Logging Protocol
- Text-only judgment is prohibited: the runtime hook must execute every round
- Input JSON must include project_path (an absolute path, to avoid drift)
- The final response must provide:
  - self_guard_final_action
  - self_guard_trace_id
  - self_guard_events_log (path to the index or the legacy events log)
- If script execution fails, declare "security self-check not completed" and adopt a conservative output strategy
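A minimal sketch of both ends of this protocol, assuming the hook input is a JSON object and that the three required fields are reported as `name: value` lines. `make_hook_input`, `guard_footer`, the extra `user_prompt` field, and that line format are hypothetical; `input_schema.json` remains the authority on the actual input fields.

```python
import json
import os
from typing import Any, Optional


def make_hook_input(project_path: str, **fields: Any) -> str:
    """Serialize hook input, always storing project_path as an absolute path."""
    payload = dict(fields)  # other field names are hypothetical; see input_schema.json
    payload["project_path"] = os.path.abspath(project_path)
    return json.dumps(payload, indent=2)


def guard_footer(final_action: Optional[str], trace_id: Optional[str], events_log: Optional[str]) -> str:
    """Render the three required fields, or the failure notice if any is missing."""
    if not (final_action and trace_id and events_log):
        return "security self-check not completed"
    return "\n".join(
        [
            f"self_guard_final_action: {final_action}",
            f"self_guard_trace_id: {trace_id}",
            f"self_guard_events_log: {events_log}",
        ]
    )
```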
Default Log Layout
Log root: ./sentry_skill_log/
Per-turn directories:
./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<turn_id>/input.json
./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<turn_id>/result.json
Global index:
./sentry_skill_log/index.jsonl
Session state:
./sentry_skill_log/.self_guard_state/
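The per-turn paths follow mechanically from this layout. This sketch assumes the timestamp is the turn's start time formatted as `%Y%m%d_%H%M%S` and that `turn_id` is an opaque string; the helper names are illustrative.

```python
from datetime import datetime
from pathlib import Path
from typing import Tuple

LOG_ROOT = Path("./sentry_skill_log")
INDEX_FILE = LOG_ROOT / "index.jsonl"
STATE_DIR = LOG_ROOT / ".self_guard_state"


def turn_dir(turn_id: str, started_at: datetime, root: Path = LOG_ROOT) -> Path:
    """Directory for one turn: turns/YYYYMMDD_HHMMSS_<turn_id>."""
    return root / "turns" / f"{started_at.strftime('%Y%m%d_%H%M%S')}_{turn_id}"


def turn_files(turn_id: str, started_at: datetime, root: Path = LOG_ROOT) -> Tuple[Path, Path]:
    """Paths of the per-turn input.json and result.json."""
    d = turn_dir(turn_id, started_at, root)
    return d / "input.json", d / "result.json"
```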
Policy Profiles
- balanced: Standard security (default)
- strict: Maximum security
- permissive: Minimal interference
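Assuming each profile maps to a policy file following the `runtime_policy.<profile>.json` pattern used by `runtime_policy.balanced.json`, selecting the file could look like this sketch (`policy_path` is a hypothetical helper; confirm the strict and permissive file names in your copy).

```python
from pathlib import Path

PROFILES = ("balanced", "strict", "permissive")
REFERENCES = Path("shared/references")


def policy_path(profile: str = "balanced") -> Path:
    """Map a profile name to its assumed policy file, rejecting unknown names."""
    if profile not in PROFILES:
        raise ValueError(f"unknown policy profile {profile!r}; expected one of {PROFILES}")
    return REFERENCES / f"runtime_policy.{profile}.json"
```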
Detection Coverage
Preflight Stage
- Prompt injection patterns
- Malicious intent detection
- Sensitive topic inference
- Action classification
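Since the checks are described as rule-based and heuristic, a preflight injection check can be sketched as a set of named regular expressions. The patterns and rule IDs below are illustrative assumptions only, not the contents of `detection_rules.json`.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Illustrative patterns only; NOT the rules shipped with SentrySkills.
INJECTION_RULES = {
    "PI_IGNORE_PREVIOUS": re.compile(r"\bignore\s+(all\s+)?(previous|prior|above)\s+instructions\b", re.I),
    "PI_REVEAL_SYSTEM_PROMPT": re.compile(r"\b(reveal|print|show)\s+(your\s+)?system\s+prompt\b", re.I),
    "PI_ROLE_OVERRIDE": re.compile(r"\byou\s+are\s+now\s+(in\s+)?(developer|dan|jailbreak)\s+mode\b", re.I),
}


@dataclass
class PreflightResult:
    decision: str
    matched_rules: List[str] = field(default_factory=list)


def preflight(prompt: str) -> PreflightResult:
    """Block when any injection pattern matches; otherwise let the turn continue."""
    matched = [rule_id for rule_id, pattern in INJECTION_RULES.items() if pattern.search(prompt)]
    return PreflightResult("block" if matched else "continue", matched)
```

Returning the matched rule IDs alongside the decision mirrors the reason codes and matched rules that each event is described as carrying.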
Runtime Stage
- Event monitoring
- Source tracking
- Anomaly detection
- Behavioral analysis
Output Stage
- Sensitive data redaction
- Source disclosure handling
- Confidence assessment
- Safe response generation
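Output-stage redaction can be sketched the same way: replace each match with a labeled placeholder and record which rules fired. These three patterns are assumed examples; a real rule set needs far broader coverage and tuning against false positives.

```python
import re
from typing import List, Tuple

# Assumed example patterns, not the package's rule set.
REDACTION_RULES = [
    ("AWS_ACCESS_KEY_ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("PRIVATE_KEY_BLOCK", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S)),
    ("GENERIC_SECRET_ASSIGNMENT", re.compile(r"(?i)\b(password|api[_-]?key|secret|token)\s*[:=]\s*\S+")),
]


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace every match with [REDACTED:<rule>] and return the rules that fired."""
    hits: List[str] = []
    for name, pattern in REDACTION_RULES:
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            hits.append(name)
    return text, hits
```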
Predictive Analysis
- Resource exhaustion prediction
- Scope creep detection
- Privilege escalation warning
- Data exfiltration path analysis
- Multi-turn grooming detection
Integration
As Codex Skill
Copy to skills/sentryskills/ and reference in agent configuration.
Configuration Files
- shared/references/runtime_policy.*.json - Security policy profiles
- shared/references/detection_rules.json - Detection rule definitions
- shared/references/input_schema.json - Input validation schema
Testing
# Test predictive analysis
python test_predictive_analysis.py
# Test integration
python test_integration.py
Event Types
The system emits structured events for:
- preflight_result - Pre-execution check outcome
- runtime_result - Runtime monitoring outcome
- output_guard_result - Output validation outcome
- predictive_analysis_result - Risk prediction (if enabled)
- final_decision - Overall decision with rationale
- hook_end - Completion with duration
Each event includes:
- Trace ID for correlation
- Decision (block/downgrade/allow/continue)
- Reason codes
- Matched rules
- Metadata
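To correlate events by trace ID, the JSONL log can be read line by line and grouped. This sketch assumes each line is one JSON object with `trace_id`, `event_type`, and `decision` keys; those field names are hypothetical, so check them against a real log before relying on this.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def events_by_trace(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group JSONL events by their (assumed) trace_id field, in file order."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written trailing line
        if isinstance(event, dict) and event.get("trace_id") is not None:
            grouped[event["trace_id"]].append(event)
    return dict(grouped)


def final_decision(events: List[dict]) -> Optional[str]:
    """Return the decision of the last final_decision event for one trace, if any."""
    for event in reversed(events):
        if event.get("event_type") == "final_decision":
            return event.get("decision")
    return None
```

For example, `with open("./sentry_skill_log/sentryskills_events.jsonl", encoding="utf-8") as f: grouped = events_by_trace(f)` groups a legacy event stream.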
Performance
- Typical latency: 50-100ms per check
- Memory: <50MB baseline
- Zero external dependencies (Python stdlib only)
Security Properties
- No data exfiltration: All processing is local
- No LLM calls: Pure rule-based and heuristic
- Audit trail: Complete event log for compliance
- Transparent: All decisions include reason codes
