Counterclaw Core

Security checks across malware telemetry and agentic risk

Overview

The core scanner is defensive, but the protected email path and persistent memory logging need review before use.

Install only if you are comfortable with default local logging into OpenClaw memory and optional Gmail sending authority. Before using the email helper, verify that CounterClaw imports correctly, test with a dry run, use a dedicated or least-privilege Gmail account, and do not rely on it to redact sensitive text automatically.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (14)

Intent-Code Divergence

Severity: Medium
Confidence: 94%
Finding
The README makes a broad 'No Network' security claim while also documenting bundled email scripts that send mail through the external `gog` CLI, which necessarily performs network operations. This is a security-relevant contradiction because operators may deploy the skill under an incorrect trust assumption and enable components that exfiltrate data externally.

Context-Inappropriate Capability

Severity: Medium
Confidence: 72%
Finding
The README introduces `TRUSTED_ADMIN_IDS` for 'admin-locked commands' even though the described skill purpose is prompt-injection detection and PII masking, not privileged command execution. Undocumented or unjustified admin-gated functionality expands the attack surface and suggests hidden control paths that could be abused if implemented elsewhere in the skill.

Description-Behavior Mismatch

Severity: Medium
Confidence: 94%
Finding
The skill description and module comments claim 'basic PII masking', but the implementation only detects PII and then warns or blocks; it never redacts or masks sensitive values. This mismatch is security-relevant because integrators may rely on masking behavior that does not exist, leading to accidental disclosure of raw PII in outbound content or logs.
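The difference between flagging and masking can be shown with a minimal Python sketch. The patterns and the `detect_pii` and `mask_pii` names are hypothetical and are not taken from CounterClaw; the sketch only illustrates that a detector reports findings while leaving the text intact, whereas a masker returns a redacted copy.

```python
import re

# Hypothetical patterns for illustration; these are not CounterClaw's regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def detect_pii(text: str) -> list[str]:
    """Return the names of matching PII categories. `text` is not modified."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def mask_pii(text: str) -> str:
    """Return a copy of `text` with every match replaced by a placeholder."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()} REDACTED]", text)
    return text


if __name__ == "__main__":
    message = "Contact jane.doe@example.com, card 4111 1111 1111 1111"
    print(detect_pii(message))  # ['email', 'card']; `message` still holds raw PII
    print(mask_pii(message))    # Contact [EMAIL REDACTED], card [CARD REDACTED]
```

A caller that only receives the result of a detector like `detect_pii` still holds the original string, which is the gap this finding describes.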

Intent-Code Divergence

Severity: Medium
Confidence: 92%
Finding
The docstring states outbound emails are scanned 'before sending', implying a protective control, but the code provides an explicit bypass via allow_unsafe=True. That discrepancy can cause operators or downstream agents to assume sensitive data cannot be sent when in fact the protection is optional and easily disabled.
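A minimal sketch of how an opt-out flag weakens a "scan before sending" guarantee. Only the `allow_unsafe` parameter name comes from the finding; `guarded_send`, the single email pattern, and the injected `send` callable are hypothetical stand-ins. In the reported script, transmission goes through the external `gog` CLI. Here, `send` is a placeholder so the sketch stays self-contained.

```python
import re
from typing import Callable

# Hypothetical single-pattern check standing in for a fuller PII scanner.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def guarded_send(body: str, send: Callable[[str], None], allow_unsafe: bool = False) -> bool:
    """Scan `body` and pass it to `send` only if no PII is found.

    Returns True if the message was handed to `send`, False if it was blocked.
    With allow_unsafe=True the block is skipped, so the scan is advisory
    rather than enforced whenever a caller (or an agent) sets that flag.
    """
    if EMAIL_RE.search(body) and not allow_unsafe:
        return False  # blocked: PII detected and no override given
    send(body)  # reached for clean text and for flagged text with the override
    return True


if __name__ == "__main__":
    outbox: list[str] = []
    print(guarded_send("Reply to bob@example.com", outbox.append))                     # False
    print(guarded_send("Reply to bob@example.com", outbox.append, allow_unsafe=True))  # True
    print(outbox)  # ['Reply to bob@example.com']
```

The protection holds only while every caller keeps the default. A single keyword argument supplied by a prompted agent is enough to send flagged content.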

Description-Behavior Mismatch

Severity: Medium
Confidence: 83%
Finding
The package metadata declares access to TRUSTED_ADMIN_IDS and a persistent memory path even though the stated purpose is prompt-injection defense and basic PII masking. That scope expansion is security-relevant because it gives the skill access to privileged identity data and durable storage that could be abused for policy bypass, unauthorized data retention, or hidden state across runs.

Context-Inappropriate Capability

Severity: Medium
Confidence: 85%
Finding
Introducing trusted-admin and persistent-memory capabilities without a clear defensive justification violates least privilege and creates an unnecessary trust boundary expansion. In the context of an agent skill, this makes the package more dangerous because a supposedly defensive component could accumulate sensitive data or make decisions based on privileged identifiers outside its advertised role.

Description-Behavior Mismatch

Severity: High
Confidence: 97%
Finding
The script performs a real outbound action by sending email through `gog gmail send`, which exceeds the stated defensive purpose of a prompt-injection/PII-protection skill. In an agent context, adding message-sending capability creates a data exfiltration and unauthorized-action path, especially because the content being sent can include material that the scanner just flagged as unsafe.

Context-Inappropriate Capability

Severity: High
Confidence: 96%
Finding
An outbound Gmail capability is not necessary for a defensive scanning component and materially increases risk by enabling external communication from within the skill. In practice, this turns a filter into an active exfiltration mechanism if an agent is induced to send sensitive or manipulated content to attacker-controlled recipients.

Intent-Code Divergence

Severity: Medium
Confidence: 94%
Finding
The script is presented as a protective wrapper, but it includes an explicit bypass allowing transmission after PII detection. That mismatch is dangerous because users and calling agents may trust the wrapper to enforce blocking, when in reality it can be instructed to ignore its own safety check.

Description-Behavior Mismatch

Severity: Medium
Confidence: 94%
Finding
The middleware persists violation data and text samples to a local file even though the skill is presented as a defensive interceptor and basic PII masker. That creates undisclosed data retention of user content, and the masking only covers a few patterns, so sensitive prompts, secrets, or regulated data can still be stored on disk.

Intent-Code Divergence

Severity: Medium
Confidence: 78%
Finding
The code comments assert that writes are within a declared scope, but in the provided context there is no evidence that this file access is actually declared or consented to. That mismatch matters because the middleware creates and appends to a persistent file in the user's home directory, expanding the skill's effective data access and retention surface.

Description-Behavior Mismatch

Severity: Medium
Confidence: 97%
Finding
The skill description claims 'basic PII masking', but scan_output only reports whether PII patterns are present and returns the original text unchanged. In a defensive security component, this mismatch can cause downstream callers to assume sensitive data was redacted when it was merely flagged, leading to unintended disclosure of emails, phone numbers, or card numbers.

Missing User Warnings

Severity: Medium
Confidence: 96%
Finding
The middleware writes violation details and excerpts of user input/output to a persistent local file without any visible runtime disclosure or consent mechanism. This is dangerous because users may assume a defensive scanner inspects transiently, not that it stores content on disk where it may later be read, backed up, or exfiltrated.

Ssd 3

Severity: Medium
Confidence: 97%
Finding
The logged sample includes up to 100 characters of user-provided text and a list of detected violations, but masking covers only email addresses, UK-style phone-like numbers, and simple card-number patterns. Secrets, tokens, health data, legal content, authentication material, and many other sensitive values can remain in plaintext in persistent storage.
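The gap can be reproduced with a small sketch, assuming masking rules of roughly the kind this finding lists: email addresses, UK-style phone numbers, and simple card numbers. The regexes, the `log_sample` helper, and the JSON log shape are all hypothetical. The sketch only shows that a truncated, pattern-masked sample still carries secrets the patterns do not cover.

```python
import json
import re

# Hypothetical rules approximating the three categories named in the finding.
MASKS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b0\d{9,10}\b"), "[PHONE]"),  # crude UK-style number
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
]


def log_sample(text: str, violations: list[str], limit: int = 100) -> str:
    """Build one JSON log line from the first `limit` characters, after masking."""
    sample = text[:limit]
    for pattern, placeholder in MASKS:
        sample = pattern.sub(placeholder, sample)
    return json.dumps({"violations": violations, "sample": sample})


if __name__ == "__main__":
    text = "Email a@b.co, call 07700900123, api key sk-live-123abc, password hunter2"
    print(log_sample(text, ["pii_detected"]))
```

With that input, the email address and phone number are replaced, but the API key and password pass through unchanged into the line that would be written to disk.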

VirusTotal

All 66 of 66 vendors reported this skill as clean.
