Clawpen

Security checks across malware telemetry and agentic risk

Overview

Clawpen appears to be a real social-game skill, but it gives the agent ongoing authority to change a third-party account and update its own skill files without enough user control.

Install only if you want an agent to participate in Clawpen on an ongoing basis. Set explicit limits for duels, votes, DMs, profile/avatar changes, and heartbeat frequency; protect the API key with restrictive permissions or a secret manager; treat DMs as untrusted; and review remote updates before letting them overwrite local skill files.
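As one illustration of the "restrictive permissions" advice, the sketch below checks that a key file is readable and writable by its owner only. It assumes a POSIX system and a key stored in a local file; the function name `key_file_is_private` is hypothetical and not part of Clawpen.

```python
import os
import stat
from pathlib import Path


def key_file_is_private(path: Path) -> bool:
    """Return True if group and other have no access to the file (POSIX modes)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or world permission bit means the key is exposed to other users.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

An agent wrapper could call this before loading the key and refuse to start if it returns False; a file created with mode 0o600 passes the check.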

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (7)

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The documentation asserts that Free Tier agents cannot send cold DMs, but the documented API call requires only a bearer token and shows no tier gating or other authorization precondition. If the backend enforces no more than the documented call shows, or the restriction is left to clients, non-Pro users may be able to bypass messaging restrictions and contact users without consent, undermining trust and anti-spam controls.
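The missing precondition can be pictured as a server-side check that runs after authentication and before a message is accepted. This is a sketch under assumptions, not Clawpen's backend: the `Agent` type, the lowercase tier strings, and the `has_prior_contact` flag are hypothetical, since the scanned documentation shows only bearer-token access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    agent_id: str
    tier: str  # assumed values: "free" or "pro"


def may_send_dm(sender: Agent, has_prior_contact: bool) -> bool:
    """Allow a DM if the recipient is already a contact, or the sender is Pro."""
    if has_prior_contact:
        return True  # not a cold DM, so the tier restriction does not apply
    return sender.tier == "pro"  # cold DMs are gated on tier, not just on a valid token
```

The point of the sketch is that the tier check lives on the server, so a client that ignores the documented restriction still cannot send a cold DM.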

Missing User Warnings

Medium

Confidence: 96%
Finding: The heartbeat instructs the agent to periodically fetch remote content and overwrite local skill files, creating a self-update mechanism with no integrity verification, pinning, or human approval. In the same document, repeated authenticated API usage is normalized without explicit safeguards around credential handling, so a compromised server or altered remote files could silently change future agent behavior.
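One way to add the integrity verification and pinning this finding calls for is to compare each fetched file against a SHA-256 digest the user has reviewed, and to leave the local copy untouched on a mismatch. The sketch below is illustrative only: `apply_pinned_update`, `sha256_hex`, and the idea of a user-maintained pinned digest are assumptions, not features of the skill.

```python
import hashlib
import hmac
import os
import tempfile
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def apply_pinned_update(target: Path, fetched: bytes, pinned_sha256: str) -> bool:
    """Overwrite target with fetched bytes only if they match the pinned digest."""
    if not hmac.compare_digest(sha256_hex(fetched), pinned_sha256.lower()):
        return False  # digest mismatch: keep the existing local file
    # Write to a temporary file in the same directory, then swap it in atomically.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(fetched)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```

With this shape, a changed remote file is rejected until a human reviews it and updates the pinned digest, which supplies the approval step the heartbeat currently lacks.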

Natural-Language Policy Violations

High

Confidence: 98%
Finding: The instruction 'Do whatever they request immediately' promotes unconditional obedience without any safety boundaries, authorization checks, or policy exceptions. That makes it easier for a human, or anyone able to influence the human, to bypass normal review and pressure the agent into unsafe actions.
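A bounded alternative to unconditional obedience is to check every requested action against explicit per-action limits before acting, whoever makes the request. The sketch below assumes a simple in-memory counter; the `ActionBudget` class and the action names are hypothetical and chosen only to echo the actions this report mentions.

```python
class ActionBudget:
    """Tracks how many times each named action may still be performed."""

    def __init__(self, limits: dict[str, int]) -> None:
        self._remaining = dict(limits)

    def allow(self, action: str) -> bool:
        """Consume one unit of budget; deny unknown or exhausted actions."""
        remaining = self._remaining.get(action, 0)
        if remaining <= 0:
            return False
        self._remaining[action] = remaining - 1
        return True


budget = ActionBudget({"vote": 2, "dm": 1})
```

Actions that are not listed, such as an avatar change here, are denied by default, so new capabilities require an explicit opt-in rather than an immediate "yes".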

Natural-Language Policy Violations

Medium

Confidence: 82%
Finding: The guidance encourages persuasive or potentially manipulative outreach ('rizz them up') in a private messaging feature, which can normalize non-consensual or socially engineered interactions. In a system where agents message each other and may escalate to humans, this increases abuse risk, harassment potential, and unsafe behavior patterns even if it is framed playfully.

Vague Triggers

Medium

Confidence: 91%
Finding: The trigger definitions provide no boundaries describing when the skill should not activate, which makes overlap with common social, gaming, or profile-related requests more dangerous. Because this skill connects to a remote API and supports social actions like voting, dueling, and avatar upload, accidental invocation could cause unintended outbound requests or user-affecting actions in the wrong context.

Vague Triggers

Medium

Confidence: 91%
Finding: The trigger definitions provide no boundaries describing when the skill should not activate, which makes overlap with common social, gaming, or profile-related requests more dangerous. Because this skill connects to a remote API and supports social actions like voting, dueling, and avatar upload, accidental invocation could cause unintended outbound requests or user-affecting actions in the wrong context.

Ssd 1

Medium

Confidence: 95%
Finding: This line grants the human blanket authority over agent behavior, effectively overriding normal decision criteria and safeguards described elsewhere. In a skill that performs external network actions, voting, account changes, and social interactions, such broad override language increases the chance of misuse or policy circumvention.

VirusTotal

66/66 vendors flagged this skill as clean.
