Counterclaw Core

Security checks across malware telemetry and agentic risk

Overview

The core scanner is defensive, but the protected email path and persistent memory logging need review before use.

Install only if you are comfortable with default local logging into OpenClaw memory and optional Gmail sending authority. Before using the email helper, verify CounterClaw imports correctly, test with dry-run, use a dedicated or least-privilege Gmail account, and avoid relying on it to redact sensitive text automatically.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (14)

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The README makes a broad 'No Network' security claim while also documenting bundled email scripts that send mail through the external `gog` CLI, which necessarily performs network operations. This is a security-relevant contradiction because operators may deploy the skill under an incorrect trust assumption and enable components that exfiltrate data externally.

Context-Inappropriate Capability

Medium

Confidence: 72%
Finding: The README introduces `TRUSTED_ADMIN_IDS` for 'admin-locked commands' even though the described skill purpose is prompt-injection detection and PII masking, not privileged command execution. Undocumented or unjustified admin-gated functionality expands the attack surface and suggests hidden control paths that could be abused if implemented elsewhere in the skill.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The skill description and module comments claim 'basic PII masking', but the implementation only detects PII and then warns or blocks; it never redacts or masks sensitive values. This mismatch is security-relevant because integrators may rely on masking behavior that does not exist, leading to accidental disclosure of raw PII in outbound content or logs.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The docstring states outbound emails are scanned 'before sending', implying a protective control, but the code provides an explicit bypass via allow_unsafe=True. That discrepancy can cause operators or downstream agents to assume sensitive data cannot be sent when in fact the protection is optional and easily disabled.
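The difference between an optional check and an enforced one can be sketched as follows. This is an illustration of the pattern the finding describes, not the skill's actual code: `send_with_bypass`, `send_enforced`, `contains_pii`, `OutboundBlocked`, and the injected `send_fn` callable are all hypothetical names, and the single email regex stands in for whatever detection the skill performs.

```python
import re
from typing import Callable

# Illustrative pattern only; not the skill's actual detection logic.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class OutboundBlocked(Exception):
    """Raised when outbound content fails the PII check."""


def contains_pii(body: str) -> bool:
    """Return True if the body matches the illustrative PII pattern."""
    return EMAIL_RE.search(body) is not None


def send_with_bypass(body: str, send_fn: Callable[[str], None],
                     allow_unsafe: bool = False) -> None:
    """Optional control: any caller can skip the check by passing a flag."""
    if contains_pii(body) and not allow_unsafe:
        raise OutboundBlocked("PII detected in outbound content")
    send_fn(body)


def send_enforced(body: str, send_fn: Callable[[str], None]) -> None:
    """Enforced control: no parameter exists that disables the check."""
    if contains_pii(body):
        raise OutboundBlocked("PII detected in outbound content")
    send_fn(body)
```

With the first variant, a calling agent that sets `allow_unsafe=True` sends the flagged content unchanged. With the second, the only way to send flagged content is to change the code, which is a more visible and reviewable step.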

Description-Behavior Mismatch

Medium

Confidence: 83%
Finding: The package metadata declares access to TRUSTED_ADMIN_IDS and a persistent memory path even though the stated purpose is prompt-injection defense and basic PII masking. That scope expansion is security-relevant because it gives the skill access to privileged identity data and durable storage that could be abused for policy bypass, unauthorized data retention, or hidden state across runs.

Context-Inappropriate Capability

Medium

Confidence: 85%
Finding: Introducing trusted-admin and persistent-memory capabilities without a clear defensive justification violates least privilege and creates an unnecessary trust boundary expansion. In the context of an agent skill, this makes the package more dangerous because a supposedly defensive component could accumulate sensitive data or make decisions based on privileged identifiers outside its advertised role.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The script performs a real outbound action by sending email through `gog gmail send`, which exceeds the stated defensive purpose of a prompt-injection/PII-protection skill. In an agent context, adding message-sending capability creates a data exfiltration and unauthorized-action path, especially because the content being sent can include material that the scanner just flagged as unsafe.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: An outbound Gmail capability is not necessary for a defensive scanning component and materially increases risk by enabling external communication from within the skill. In practice, this turns a filter into an active exfiltration mechanism if an agent is induced to send sensitive or manipulated content to attacker-controlled recipients.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The script is presented as a protective wrapper, but it includes an explicit bypass allowing transmission after PII detection. That mismatch is dangerous because users and calling agents may trust the wrapper to enforce blocking, when in reality it can be instructed to ignore its own safety check.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The middleware persists violation data and text samples to a local file even though the skill is presented as a defensive interceptor and basic PII masker. That creates undisclosed data retention of user content, and the masking only covers a few patterns, so sensitive prompts, secrets, or regulated data can still be stored on disk.

Intent-Code Divergence

Medium

Confidence: 78%
Finding: The code comments assert that writes are within a declared scope, but in the provided context there is no evidence that this file access is actually declared or consented to. That mismatch matters because the middleware creates and appends to a persistent file in the user's home directory, expanding the skill's effective data access and retention surface.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill description claims 'basic PII masking', but scan_output only reports whether PII patterns are present and returns the original text unchanged. In a defensive security component, this mismatch can cause downstream callers to assume sensitive data was redacted when it was merely flagged, leading to unintended disclosure of emails, phone numbers, or card numbers.
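To make the distinction between detection and masking concrete, here is a minimal sketch. The regexes and the function names `detect_pii` and `mask_pii` are assumptions for illustration; they are not taken from the skill, and real PII detection needs far broader coverage.

```python
import re

# Hypothetical patterns for illustration; not the skill's actual regexes.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def detect_pii(text: str) -> list[str]:
    """Detection only: report which patterns match; the text is untouched."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def mask_pii(text: str) -> str:
    """Masking: return a copy with every match replaced by a placeholder."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()} REDACTED]", text)
    return text


sample = "Contact alice@example.com, card 4111 1111 1111 1111."
print(detect_pii(sample))  # names of matched patterns; sample itself is unchanged
print(mask_pii(sample))    # copy of sample with matches replaced
```

A caller that receives only the result of `detect_pii` still holds the raw text. Unless it explicitly calls something like `mask_pii` or blocks the content, the sensitive values flow downstream as-is.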

Missing User Warnings

Medium

Confidence: 96%
Finding: The middleware writes violation details and excerpts of user input/output to a persistent local file without any visible runtime disclosure or consent mechanism. This is dangerous because users may assume a defensive scanner inspects transiently, not that it stores content on disk where it may later be read, backed up, or exfiltrated.

SSD-3

Medium

Confidence: 97%
Finding: The logged sample includes up to 100 characters of user-provided text and a list of detected violations, while masking only email, UK-style phone-like numbers, and simple card patterns. Secrets, tokens, health data, legal content, authentication material, and many other sensitive values can remain in plaintext in persistent storage.
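One way to reduce this exposure is to log metadata about a violation instead of an excerpt of the text. The sketch below is an assumption about how that could look, not the middleware's actual logging code; `build_log_record` and its field names are hypothetical.

```python
import hashlib
import json


def build_log_record(text: str, violations: list[str]) -> str:
    """Build a JSON log line that carries no excerpt of the scanned text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    record = {
        "violations": sorted(violations),
        "length": len(text),
        # A truncated digest lets repeated inputs be correlated
        # without storing any of the content itself.
        "sha256_prefix": digest[:16],
    }
    return json.dumps(record)
```

Because no sample is stored, values that the masking patterns would miss (tokens, credentials, health or legal details) never reach disk. Note that a digest of short or predictable text can still be guessed by brute force, so even the hash prefix may be worth dropping where that matters.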

VirusTotal

66/66 vendors flagged this skill as clean.
