Claw Gatekeeper

Security checks across malware telemetry and agentic risk

Overview

This looks like a genuine safety tool, but its promised protections do not fully match its defaults, examples, and installation guidance.

Review the effective configuration before installing. Prefer a pinned ClawHub package over the README's curl-latest flow, use strict or hardened mode if you expect MEDIUM actions to require confirmation, verify CRITICAL operations cannot be persistently allowed in your setup, and understand where session approvals, whitelists, audit logs, backups, and cron cleanup are stored.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (31)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 96%
Finding: The skill advertises only a guard/confirmation role, but the documentation clearly indicates capabilities spanning file read/write, shell, and network operations without any declared permission model. Undeclared powerful capabilities make review, sandboxing, and user consent much weaker, especially for a persistent resident skill intended to intercept sensitive operations.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The documented behavior goes well beyond a simple confirmation gate and includes policy editing, whitelist/blacklist persistence, log export/deletion, hardening scripts, cron/config modification, and sensitive-data scanning. This mismatch is dangerous because operators may trust the skill as a passive safeguard while it actually has broader administrative and persistence-related functionality that can alter the host and reduce auditability.

Intent-Code Divergence

High

Confidence: 96%
Finding: The documented default mode says LOW+MEDIUM are auto-allowed, which contradicts the manifest statement that MEDIUM/HIGH require confirmation. For a security control, policy inconsistency is dangerous because users may deploy the skill with incorrect assumptions, resulting in riskier operations being silently permitted.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The skill claims CRITICAL actions cannot receive persistent approval, yet the documented UI offers 'Always allow' for CRITICAL operations. That contradiction can normalize permanent authorization for the very actions that should require per-action review, enabling destructive commands or sensitive access to bypass future human checks.
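One way to make the stated guarantee enforceable is to reject persistent scopes at the point where an approval is recorded, rather than relying on the UI to hide the option. The following is a minimal sketch, not the skill's actual code: `Scope`, `allowed_scopes`, and `record_approval` are hypothetical names, and the scope rules are taken from the manifest claims quoted in this report (CRITICAL confirmed every time, MEDIUM/HIGH at most session-level).

```python
from enum import Enum


class Scope(Enum):
    # Hypothetical approval scopes; the skill's real identifiers are not shown in the report.
    ONCE = "once"
    SESSION = "session"
    ALWAYS = "always"


def allowed_scopes(risk: str) -> set[Scope]:
    """Return the approval scopes permitted for a risk level under the manifest's claims."""
    if risk == "CRITICAL":
        return {Scope.ONCE}                   # confirmed every time, never cached
    if risk in ("MEDIUM", "HIGH"):
        return {Scope.ONCE, Scope.SESSION}    # at most session-level approval
    return {Scope.ONCE, Scope.SESSION, Scope.ALWAYS}


def record_approval(risk: str, scope: Scope) -> Scope:
    """Accept an approval only if its scope is permitted for the risk level."""
    if scope not in allowed_scopes(risk):
        raise PermissionError(f"{scope.value!r} approval is not permitted for {risk} operations")
    return scope
```

With a check like this, an "Always allow" choice for a CRITICAL operation would fail loudly instead of being stored.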

Intent-Code Divergence

Medium

Confidence: 97%
Finding: Repeated examples show CRITICAL workflows alongside persistent allow/deny semantics, contradicting the stated guarantee that every CRITICAL action is individually confirmed. In a safety-control skill, inconsistent policy documentation is itself hazardous because users may incorrectly assume protections exist while risky approvals are actually available or expected.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The hardened config claims all operations require per-action human confirmation, but the embedded notes still state session approvals are available for MEDIUM/HIGH actions. This inconsistency can cause implementers or downstream components to permit broader approval caching than intended, weakening the safety boundary for risky operations.

Intent-Code Divergence

Low

Confidence: 84%
Finding: The file advertises 'no auto-allow' maximum safety, yet later describes whitelist entries as bypassing detailed analysis. Even if confirmation is still required, this creates ambiguity about whether whitelisted actions receive reduced scrutiny, which can lead to unsafe assumptions or implementation drift.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The guide documents persistent whitelist/blacklist behavior, but the manifest describes session-level approval semantics for MEDIUM/HIGH risk. In a safety-control skill, mismatched documentation is dangerous because users may configure broader or longer-lived trust decisions than the actual security model intends, causing risky operations to be silently allowed beyond the approved scope.

Intent-Code Divergence

High

Confidence: 99%
Finding: The guide suggests users can 'always deny' or otherwise persist decisions for a CRITICAL operation example, while the manifest states CRITICAL actions must be confirmed every time with no session approval. This undermines the core safety guarantee of the skill and could normalize persistent handling of actions that should always require fresh human review.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The documentation introduces multiple operating modes and policy behaviors not described in the manifest's fixed approval model. In a resident security gatekeeper, undocumented or contradictory modes can lead operators to believe protections exist or are relaxed in ways that bypass required confirmations, reducing trust in the control boundary.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The guide describes persistent whitelist/blacklist management capabilities outside the manifest's stated scope. For a permission-enforcement skill, unscoped persistent trust lists can create durable bypasses for dangerous commands or paths, especially if users follow the guide and assume those changes are part of the approved threat model.

Intent-Code Divergence

High

Confidence: 98%
Finding: The troubleshooting advice explicitly tells users how to auto-allow MEDIUM-risk operations in non-interactive environments, contradicting the manifest requirement that MEDIUM/HIGH operations require user confirmation with only session-level approval after confirmation. This creates a direct path to downgrade protections in automation contexts where dangerous actions may execute unattended.

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: The export function can write the full audit history, which may contain sensitive operation details, comments, and metadata, to any caller-supplied path without restriction or redaction. In a resident gatekeeping skill, this increases the chance of local data disclosure, accidental placement in shared locations, or abuse by another component to copy sensitive audit records outside the protected log directory.
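A restricted export could confine output to a designated directory and mask sensitive fields before writing. The sketch below illustrates that idea under stated assumptions: `safe_export`, `redact`, and the field names in `SENSITIVE_FIELDS` are hypothetical, since the report does not show the skill's real audit record schema or export signature.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical field names; the real audit record schema is not shown in the report.
SENSITIVE_FIELDS = {"comment", "metadata"}


def redact(record: dict) -> dict:
    """Return a copy of the record with sensitive fields masked."""
    return {k: ("[REDACTED]" if k in SENSITIVE_FIELDS else v) for k, v in record.items()}


def safe_export(records: list[dict], dest: str, allowed_dir: Path) -> Path:
    """Write redacted records as JSON, refusing destinations outside allowed_dir."""
    base = allowed_dir.resolve()
    target = (base / dest).resolve()          # an absolute or "../" dest resolves outside base
    if not target.is_relative_to(base):       # requires Python 3.9+
        raise ValueError(f"export path escapes {base}: {dest}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps([redact(r) for r in records], indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(safe_export([{"op": "rm -rf build", "comment": "user note"}], "audit.json", Path(d)))
```

Resolving the path before the containment check is what blocks `../` traversal and absolute destinations; checking the raw string would not.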

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The script presents a security report claiming that all operations require confirmation, while its verification logic expects LOW-risk operations to remain auto-allowed. This inconsistency can mislead operators into believing the system is stricter than it actually is, causing unsafe reliance on protections that are not enforced.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The secure deployment script advertises and reports a policy that differs from the skill metadata, changing expected gatekeeper behavior without clear alignment. In a security control tool, policy drift is dangerous because users may make trust decisions based on documentation while the actual enforcement behaves differently.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: When `interactive` mode short-circuits on a pre-approved operation, it always reports the decision as `allow_session`, even for LOW-risk auto-allowed actions. This mislabels the authorization source and can poison downstream audit, policy, or enforcement logic that distinguishes unconditional LOW-risk allowance from session-scoped approval, potentially causing over-broad trust or incorrect operator understanding.
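The mislabeling described above can be avoided by returning a distinct decision value for each authorization source. The following sketch is illustrative only: `allow_session` is the label named in the finding, while `Decision`, `allow_auto`, `ask`, and `precheck` are hypothetical names introduced here.

```python
from enum import Enum


class Decision(str, Enum):
    ALLOW_AUTO = "allow_auto"        # hypothetical label: allowed by policy, no approval involved
    ALLOW_SESSION = "allow_session"  # label named in the finding: approved earlier this session
    ASK = "ask"                      # hypothetical label: prompt the user


def precheck(risk: str, op_key: str, session_approved: set[str], auto_allow: set[str]) -> Decision:
    """Decide how an operation is handled and report the true source of any allowance."""
    if risk in auto_allow:
        return Decision.ALLOW_AUTO
    # Per the manifest claim quoted in this report, CRITICAL never uses a cached approval.
    if risk != "CRITICAL" and op_key in session_approved:
        return Decision.ALLOW_SESSION
    return Decision.ASK
```

Downstream audit or enforcement code can then distinguish policy-level allowance from a session approval without guessing.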

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The default policy sets medium_requires_approval to False, which directly contradicts the skill metadata claiming MEDIUM risk operations require user confirmation. Because this skill is a resident safety control, such a mismatch can silently permit higher-risk actions than operators expect, undermining trust and protection boundaries.
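A reviewer can check an effective policy against the manifest's promises with a few assertions before deployment. In this sketch, `medium_requires_approval` and `low_auto_allow` are the field names mentioned in the findings; the `Policy` class itself, the HIGH and CRITICAL fields, and `manifest_violations` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    low_auto_allow: bool = True
    medium_requires_approval: bool = False   # the default reported in the finding
    high_requires_approval: bool = True      # assumed field name
    critical_requires_approval: bool = True  # assumed field name


def manifest_violations(policy: Policy) -> list[str]:
    """List the ways a policy contradicts the manifest claim that MEDIUM+ needs confirmation."""
    problems = []
    if not policy.medium_requires_approval:
        problems.append("MEDIUM is auto-allowed")
    if not policy.high_requires_approval:
        problems.append("HIGH is auto-allowed")
    if not policy.critical_requires_approval:
        problems.append("CRITICAL is auto-allowed")
    return problems
```

Running such a check against the reported default yields one violation for MEDIUM, and a preset that disables HIGH approval would add a second.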

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The standard mode preset preserves MEDIUM auto-allow behavior instead of requiring approval as the manifest promises. In a guardrail component, insecure-by-default presets are dangerous because users will typically rely on standard mode and assume the documented controls are active.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: Loose mode disables approval for HIGH-risk operations, conflicting with the stated guarantee that HIGH risks require confirmation with at most session-level approval. For a persistent gatekeeper skill, this creates a serious fail-open path where dangerous actions may execute without the human checkpoint users were told to expect.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The summary reports LOW approval status using 'not self.policy.low_auto_allow', which is semantically confusing and likely inverted relative to the displayed approval rules. Misreporting effective policy is dangerous in a security control because operators may make decisions based on incorrect status output and fail to notice permissive settings.

Description-Behavior Mismatch

Medium

Confidence: 98%
Finding: The file sets `requires_approval` only for `HIGH` and `CRITICAL`, while the skill metadata explicitly says `MEDIUM/HIGH` operations must require user confirmation. In a resident safety skill, this policy mismatch creates a fail-open condition where medium-risk operations are silently allowed, undermining the advertised guardrail and potentially enabling destructive or sensitive actions without the intended human checkpoint.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill description promises logging of all `MEDIUM+` operations to `Operate_Audit.log`, but this file only prints JSON and exits without any audit write path. For a security control intended to monitor and gate risky actions, missing audit logging weakens detection, forensic visibility, and accountability, especially if the skill is loaded persistently.
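For comparison, an audit write path for `MEDIUM+` operations can be quite small. This is a sketch under assumptions, not the skill's implementation: the `audit` function, the JSON Lines layout, and the entry fields are hypothetical, and only the log name `Operate_Audit.log` comes from the skill description.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

RISK_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def audit(log_path: Path, risk: str, operation: str, decision: str) -> bool:
    """Append one JSON line for MEDIUM and above; return True if an entry was written."""
    if RISK_ORDER.index(risk) < RISK_ORDER.index("MEDIUM"):
        return False                           # LOW operations are not logged
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "risk": risk,
        "operation": operation,
        "decision": decision,
    }
    with log_path.open("a", encoding="utf-8") as f:   # append mode keeps earlier entries
        f.write(json.dumps(entry) + "\n")
    return True
```

A caller would invoke something like `audit(Path("Operate_Audit.log"), "HIGH", "git push --force", "allow_session")` next to the code that prints the JSON decision.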

Missing User Warnings

Medium

Confidence: 93%
Finding: The release notes instruct users to download a remote skill artifact via curl and install it persistently, but do not include any warning about executing untrusted third-party code or guidance to verify integrity/signatures before installation. In the context of a resident security-control skill, persistent installation increases trust and blast radius if the package or release channel is compromised.
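Integrity verification before installation can be as simple as comparing the downloaded file's SHA-256 digest against a value published through a separate, trusted channel. The sketch below uses only the Python standard library; `sha256_of` and `verify_artifact` are hypothetical helper names, and the report does not say whether the project publishes checksums, so the expected digest is assumed to be available.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file without loading it fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> bool:
    """Return True only if the file's digest matches the expected value."""
    # Normalize whitespace and case so a pasted checksum line still compares correctly.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Installation would proceed only when `verify_artifact` returns True; a checksum fetched from the same location as the artifact gives little protection if that location is compromised.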

Missing User Warnings

Medium

Confidence: 81%
Finding: The release notes advertise one-click hardening and restore commands that change security posture, configuration, and permissions, but they do not clearly warn users that these scripts will modify local state when executed. In a security-control skill intended to run persistently, encouraging copy-paste execution without an explicit modification warning increases the chance of unintended system changes or unsafe assumptions about what the scripts do.

Missing User Warnings

Medium

Confidence: 91%
Finding: Audit records can contain sensitive operational history and user comments, yet export() writes them to arbitrary files with no privacy notice, sanitization, or destination restrictions. In this skill context, the danger is elevated because the logs belong to a security gatekeeper and may reveal attempted high-risk actions, denials, and metadata useful to an attacker or inappropriate for broad sharing.

VirusTotal

63/63 vendors flagged this skill as clean.
