Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

sentryskills

SentrySkills - Automatic security guard that runs on EVERY task. Protects against prompt injection, data leaks, unsafe commands, and code vulnerabilities wit...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 36 · 0 current installs · 0 all-time installs
by BiaoJie Zeng (@zengbiaojie)
Security Scan
VirusTotal
Pending
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name and description (always-on self-guard) match the included code: scripts for the preflight, runtime, and output stages are all present. The package claims 'zero external dependencies', and the code defensively falls back when optional packages (jsonschema, structlog, prometheus_client) are missing — this is coherent. Minor mismatch: the top-level SKILL.md repeatedly says it 'runs on EVERY task', yet the skill metadata has always:false and activation requires a manual AGENTS.md change. This is a policy/activation mismatch, but not necessarily malicious.
Instruction Scope
Runtime instructions require constructing an input JSON that includes an absolute project_path plus planned_actions/candidate_response, running self_guard_runtime_hook_template.py before every output, and writing structured logs (./sentry_skill_log/ by default). The guard therefore expects — and will process — full prompt text, planned actions, and a project path, potentially exposing file-system paths and content. The instructions also say the agent should 'monitor file ops, network calls', but live monitoring depends on environment instrumentation; absent such integration, the script relies on the provided input and on filesystem access. Requiring an absolute project_path and instructing fallback to writable temp dirs increases the chance the skill will read or write files outside the agent sandbox if enabled.
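To make the data-exposure concern concrete, here is a hypothetical input payload for the runtime hook. Only project_path, planned_actions, and candidate_response are named in the skill docs; the authoritative field list is shared/references/input_schema.json, so the exact keys and value shapes here are assumptions.

```python
import json

# Hypothetical input payload for the runtime hook. Only project_path,
# planned_actions, and candidate_response are named in the skill docs;
# the authoritative schema is shared/references/input_schema.json.
payload = {
    "project_path": "/home/user/myproject",  # must be absolute, per the docs
    "planned_actions": ["read_file: config.yaml", "run_command: pytest"],
    "candidate_response": "Here is the config value you asked for...",
}

# Everything in this payload -- paths, planned actions, draft response text --
# is handed to the guard script, which is the exposure surface noted above.
with open("input.json", "w") as f:
    json.dump(payload, f, indent=2)
```

Note that candidate_response is the agent's full draft output, so any secret the agent was about to emit passes through the guard.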
Install Mechanism
No install spec is provided in registry metadata (instruction-only), which is lower risk than arbitrary remote installers. However the package contains many code files (scripts, policies, templates) that will be placed on disk when the skill is installed. The README mentions installing from a GitHub repo or via a third-party CLI (clawhub) — those are manual flows outside the registry and should be examined separately. There is no evidence of downloads from untrusted URLs in the package itself.
Credentials
The skill does not request environment variables, credentials, or special config paths in its metadata. Code references optional env vars for logging/metrics (TRINITYGUARD_ENVIRONMENT, TRINITYGUARD_VERSION) and conditionally uses optional libraries if present. The main proportionality concern is functional: the runtime requires an absolute project_path and candidate responses (which could include secrets) to operate — giving the guard that context is necessary for its function but also increases data exposure surface. No explicit external API keys or unrelated credentials are requested.
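The optional-dependency behavior described above typically looks like the following try/except import fallback — a sketch of the pattern, not the skill's actual code, with structlog standing in for any of the optional libraries.

```python
import logging

# Sketch of the optional-dependency fallback pattern the scan describes:
# use structlog if installed, otherwise fall back to stdlib logging.
# (Illustrative only -- the skill's real fallback logic is in its scripts.)
try:
    import structlog
    logger = structlog.get_logger("sentryskills")
    HAVE_STRUCTLOG = True
except ImportError:
    logger = logging.getLogger("sentryskills")
    HAVE_STRUCTLOG = False

# Either backend exposes the same call surface used here.
logger.warning("guard check complete")
```

This is why the 'zero external dependencies' claim can coexist with references to jsonschema, structlog, and prometheus_client: the code runs on stdlib alone and only upgrades if those packages happen to be installed.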
Persistence & Privilege
The skill metadata does not set always:true and does not demand system-wide privileges. But the package's intended deployment model is to be added into AGENTS.md so it runs before every response; that is a powerful capability because, once enabled, it processes all prompts/responses and writes per-turn logs to disk. If enabled globally, it effectively becomes always-on by configuration (user action required). Autonomous invocation by the agent is permitted (disable-model-invocation:false) — this is normal but, combined with global activation, increases blast radius.
What to consider before installing

- Review the main runtime script: open shared/scripts/self_guard_runtime_hook_template.py (it is large). Confirm it performs only local analysis and does not call remote endpoints or exfiltrate data; search for network sockets, HTTP clients, or hardcoded URLs.
- Understand what you will hand the guard: SKILL.md requires an absolute project_path, planned_actions, and candidate_response. These can include secrets or full file paths, so consider whether you want that data processed by the guard.
- Run it in a sandbox first: install into an isolated account or container and exercise its test scripts (test_integration.py, test_predictive_analysis.py) to observe file writes and log output before enabling it globally.
- Inspect logging and storage: the default log dir is ./sentry_skill_log/, and the code will attempt to write per-turn logs and a .self_guard_state directory. Ensure log locations and retention meet your privacy and policy requirements.
- Check for optional telemetry: the code will integrate with structlog/prometheus/jsonschema if present. If you do not want metrics exported, ensure those libraries are not installed, or check for any HTTP/remote push logic in the code.
- Be careful with global activation: the package shows how to append a command to AGENTS.md so the hook runs before every response, which gives the skill coverage over all agent outputs. Only enable globally if you trust the code and maintainers.
- Source/trust: registry metadata shows no homepage and the owner id is opaque. Prefer packages with a public repository and an author you trust; if you proceed, verify the repository and commit history.
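The "search for network sockets, HTTP clients, or hardcoded URLs" step can be partly automated. This is a hypothetical pre-install audit helper — the pattern list is illustrative and deliberately not exhaustive; it will miss obfuscated calls and is no substitute for reading the script.

```python
import re

# Hypothetical audit helper: flag lines in a script that mention common
# network or subprocess indicators. Patterns are illustrative, not exhaustive.
NETWORK_PATTERNS = [
    r"\bimport\s+(socket|requests|urllib|http\.client)\b",
    r"https?://[^\s'\"]+",   # hardcoded URLs
    r"\bsubprocess\b",       # shelling out
]

def find_suspicious_lines(source: str) -> list:
    """Return (line_number, line) pairs matching any pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(re.search(p, line) for p in NETWORK_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

# Tiny fabricated sample standing in for the real runtime script's contents.
sample = "import socket\nx = 1\nurl = 'https://example.com/upload'\n"
```

Run it over the real file with `find_suspicious_lines(open(path).read())` and review each hit in context.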

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.1
Download zip
latest: vk97f3405qdtv5e9k2x0krkxy8n83kaw0

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

SentrySkills - Always-On Security Guard

SentrySkills is designed to run AUTOMATICALLY on every task. It provides three-stage protection (preflight → runtime → output) with predictive risk analysis.

⚠️ Installation ≠ Automatic Activation

After installing SentrySkills, you must configure your AGENTS.md to make it run automatically. See install/first_time_setup.md for details.

How It Works

On every task, SentrySkills:

  1. Preflight - Analyzes user intent and planned actions
  2. Runtime - Monitors behavior during execution
  3. Output - Redacts sensitive data before response
  4. Predictive - Warns about potential risks
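The staged flow above can be sketched as a pipeline whose stages each return a verdict. The stage functions below are crude stubs, and "strictest verdict wins" is an assumed combination rule, not one documented by the package — the real logic lives in the skill's scripts.

```python
# Minimal sketch of the staged flow above. Stage functions are stubs;
# "strictest verdict wins" is an assumed combination rule.
SEVERITY = {"allow": 0, "downgrade": 1, "block": 2}

def preflight(task: str) -> str:
    # crude prompt-injection pattern check
    return "block" if "ignore previous instructions" in task.lower() else "allow"

def runtime(task: str) -> str:
    return "allow"  # stub: the real stage would watch events during execution

def output_guard(task: str) -> str:
    # crude sensitive-data check standing in for redaction logic
    return "downgrade" if "password" in task.lower() else "allow"

def guard(task: str) -> str:
    verdicts = [preflight(task), runtime(task), output_guard(task)]
    return max(verdicts, key=SEVERITY.__getitem__)
```

With this rule, a single blocking stage overrides any number of allows, which matches the conservative posture the skill advertises.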

Quick Enable (One Command)

Add this to your ~/.codex/AGENTS.md:

cat >> ~/.codex/AGENTS.md << 'EOF'
# SentrySkills - Security First
Before EVERY response, run:
python ~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py \
  input.json \
  --policy-profile balanced \
  --out result.json
Then check result.json for final_action and follow it.
EOF

Restart Codex and you're protected!
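The "check result.json for final_action and follow it" step might look like this. The final_action field name comes from the snippet above; returning "block" when the file is missing or unreadable follows the skill's own conservative-output rule, but the exact default is an assumption.

```python
import json

# Sketch of the "check result.json for final_action" step. Falling back to
# "block" on any failure mirrors the skill's conservative-output rule;
# the exact default is an assumption.
def read_final_action(path: str = "result.json") -> str:
    try:
        with open(path) as f:
            return json.load(f).get("final_action", "block")
    except (OSError, json.JSONDecodeError):
        # "security self-check not completed" -> conservative default
        return "block"
```

Usage: call `read_final_action()` after the hook runs, then allow, downgrade, or refuse the response accordingly.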

When to Use SentrySkills

Use SentrySkills when you need AI agents to operate safely with:

  • Sensitive data access - Agents reading credentials, secrets, or private information
  • System modifications - Agents executing commands, writing files, or changing configurations
  • External communications - Agents making network requests or calling external APIs
  • Code generation - Agents producing code that might contain vulnerabilities
  • Production environments - Any scenario where security cannot be compromised
  • Multi-turn conversations - Detect subtle manipulation across multiple interactions

Examples:

✅ Use: When an agent needs to read environment variables or config files
✅ Use: When an agent is asked to execute shell commands
✅ Use: When an agent generates database queries or API calls
✅ Use: When an agent modifies system files or configurations
❌ Skip: Simple read-only queries on public documentation
❌ Skip: Basic explanations without system access

Skill Package Structure

This is a skill package that orchestrates multiple sub-skills:

  1. using-sentryskills - User-facing entry point
  2. sentryskills-orchestrator - Central coordination
  3. sentryskills-preflight - Pre-execution checks
  4. sentryskills-runtime - Runtime monitoring
  5. sentryskills-output - Output validation & redaction

Each sub-skill has its own SKILL.md with specific requirements.

Execution Requirements

  1. Run guard checks before each external output
  2. Process sequence: preflight → runtime → output guard → final decision
  3. Block: Prohibit original response, must refuse or redact
  4. Downgrade: Must downgrade expression and declare uncertainty
  5. Explanatory responses must also go through output guard

Recommended Usage

Default (turn_dir layout)

python shared/scripts/self_guard_runtime_hook_template.py \
  shared/references/input_schema.json \
  --policy shared/references/runtime_policy.balanced.json \
  --policy-profile balanced

With summary output

python shared/scripts/self_guard_runtime_hook_template.py \
  shared/references/input_schema.json \
  --out ./sentry_skill_log/sentryskills_summary.json

Legacy event stream

python shared/scripts/self_guard_runtime_hook_template.py \
  shared/references/input_schema.json \
  --log-layout legacy \
  --events-log ./sentry_skill_log/sentryskills_events.jsonl

Mandatory Logging Protocol

  1. Text-only judgment is prohibited - Runtime hook must execute each round
  2. Input JSON must include project_path (absolute path to avoid drift)
  3. Final response must provide:
    • self_guard_final_action
    • self_guard_trace_id
    • self_guard_events_log (path to index or legacy events)
  4. If script execution fails, declare "security self-check not completed" and adopt conservative output strategy
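Attaching the three required fields from the protocol above to a final response could be as simple as the helper below. The field names are the ones listed; the values here are placeholders, since real ones come from the hook's result and index files.

```python
# Sketch of attaching the protocol's three required fields to a response.
# Field names come from the protocol above; values here are placeholders.
def annotate_response(text, final_action, trace_id, events_log):
    return {
        "response": text,
        "self_guard_final_action": final_action,
        "self_guard_trace_id": trace_id,
        "self_guard_events_log": events_log,
    }

meta = annotate_response(
    "Here is the summary.",
    final_action="allow",
    trace_id="trace-0001",                       # placeholder trace ID
    events_log="./sentry_skill_log/index.jsonl", # default index path
)
```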

Default Log Layout

Log root: ./sentry_skill_log/

Per-turn directories:

  • ./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<turn_id>/input.json
  • ./sentry_skill_log/turns/YYYYMMDD_HHMMSS_<turn_id>/result.json

Global index:

  • ./sentry_skill_log/index.jsonl

Session state:

  • ./sentry_skill_log/.self_guard_state/
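The per-turn directory layout above can be reproduced with a few lines of stdlib code. The YYYYMMDD_HHMMSS prefix matches the documented layout; how the hook actually generates turn_id is an assumption.

```python
import datetime
from pathlib import Path

# Sketch of the documented per-turn layout. The timestamp prefix matches the
# layout above; turn_id generation is an assumption about the hook's behavior.
def make_turn_dir(log_root: str, turn_id: str) -> Path:
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    turn_dir = Path(log_root) / "turns" / f"{stamp}_{turn_id}"
    turn_dir.mkdir(parents=True, exist_ok=True)  # idempotent per turn
    return turn_dir

d = make_turn_dir("./sentry_skill_log", "t001")
# d now holds e.g. ./sentry_skill_log/turns/<stamp>_t001/, ready to receive
# the turn's input.json and result.json
```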

Policy Profiles

  • balanced: Standard security (default)
  • strict: Maximum security
  • permissive: Minimal interference

Detection Coverage

Preflight Stage

  • Prompt injection patterns
  • Malicious intent detection
  • Sensitive topic inference
  • Action classification

Runtime Stage

  • Event monitoring
  • Source tracking
  • Anomaly detection
  • Behavioral analysis

Output Stage

  • Sensitive data redaction
  • Source disclosure handling
  • Confidence assessment
  • Safe response generation

Predictive Analysis

  • Resource exhaustion prediction
  • Scope creep detection
  • Privilege escalation warning
  • Data exfiltration path analysis
  • Multi-turn grooming detection

Integration

As Codex Skill

Copy to skills/sentryskills/ and reference in agent configuration.

Configuration Files

  • shared/references/runtime_policy.*.json - Security policy profiles
  • shared/references/detection_rules.json - Detection rule definitions
  • shared/references/input_schema.json - Input validation schema

Testing

# Test predictive analysis
python test_predictive_analysis.py

# Test integration
python test_integration.py

Event Types

The system emits structured events for:

  • preflight_result - Pre-execution check outcome
  • runtime_result - Runtime monitoring outcome
  • output_guard_result - Output validation outcome
  • predictive_analysis_result - Risk prediction (if enabled)
  • final_decision - Overall decision with rationale
  • hook_end - Completion with duration

Each event includes:

  • Trace ID for correlation
  • Decision (block/downgrade/allow/continue)
  • Reason codes
  • Matched rules
  • Metadata
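Since the trace ID is meant for correlation, auditing a single turn amounts to filtering the JSONL index by that ID. The field names below (trace_id, type, decision) mirror the lists above but are assumptions about the exact on-disk schema.

```python
import json

# Sketch of correlating events by trace ID in a JSONL log. Field names
# (trace_id, type, decision) are assumptions about the on-disk schema.
def events_for_trace(jsonl_text: str, trace_id: str) -> list:
    events = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("trace_id") == trace_id:
            events.append(event)
    return events

# Fabricated sample standing in for ./sentry_skill_log/index.jsonl contents.
sample = "\n".join([
    json.dumps({"trace_id": "t1", "type": "preflight_result", "decision": "allow"}),
    json.dumps({"trace_id": "t2", "type": "final_decision", "decision": "block"}),
    json.dumps({"trace_id": "t1", "type": "final_decision", "decision": "allow"}),
])
```

For a real audit, read the index file and pass its text in: `events_for_trace(Path("./sentry_skill_log/index.jsonl").read_text(), trace_id)`.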

Performance

  • Typical latency: 50-100ms per check
  • Memory: <50MB baseline
  • Zero external dependencies (Python stdlib only)

Security Properties

  • No data exfiltration: All processing is local
  • No LLM calls: Pure rule-based and heuristic
  • Audit trail: Complete event log for compliance
  • Transparent: All decisions include reason codes

Files

83 total
