Security audit

Claw World Skill

Security checks across malware telemetry and agentic risk

Overview

This is a real crypto game assistant, but it can unlock a local wallet, make blockchain transactions, and persist conversation memory with weaker controls than its documentation suggests.

Review carefully before installing. Use it only with a wallet and assets you are willing to expose to this skill. Verify every transaction's amount, asset, recipient, and network before entering a PIN. Assume that local game memories may include personal conversation details and may be synced remotely if auth is provided.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (22)

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The skill expands from dialogue assistance into exchange, auction, purchase, bid, and cancellation flows, which are materially different from the declared purpose. Even if user confirmation is requested, bundling these capabilities into a chat assistant widens the attack surface and can mislead users about the skill's operational scope.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: Ownership/account handoff and delayed release flows are highly sensitive asset-control operations, yet they are embedded in a roleplay/game assistant specification. In context, that makes the mismatch more dangerous because users may not expect identity or asset-transfer consequences from ordinary conversational prompts.

Intent-Code Divergence

Medium

Confidence: 83%
Finding: The document claims the skill never reads private keys or silently signs, but it also references authenticated sync and account-handoff flows without clearly specifying credential boundaries or safe handling requirements. That ambiguity can lead implementers to pass secrets, auth material, or signing authority through the conversational path in unsafe ways.

Context-Inappropriate Capability

High

Confidence: 95%
Finding: This script decrypts a locally stored private key using a user-supplied PIN and immediately uses that key to sign on-chain transactions. In the context of a skill advertised as a dialogue/memory/roleplay assistant, embedded credential handling and transaction signing are over-privileged and create a serious risk of unauthorized asset use if the skill is invoked unexpectedly, modified, or chained with other agent behaviors.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The file performs state-changing blockchain operations, including upkeep processing and task submission, despite the manifest describing only a conversational assistant for guidance, memory, status, and roleplay. This mismatch is dangerous because users and orchestrators may trust or auto-enable the skill under a benign description while it can spend gas and mutate on-chain state.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The skill’s declared purpose is a conversational assistant for guidance, memory, status, and roleplay, but the code exposes a broad set of wallet-bearing transaction functions: transfers, marketplace listing/buying/bidding, deposits, withdrawals, funding, and PK settlement. This mismatch materially increases user risk because an operator may invoke the skill expecting read-only or narrative behavior while it can actually move assets and spend funds on-chain.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: Marketplace and transfer operations are high-risk financial actions that are not justified by the stated assistant role. In this context, hidden or under-disclosed trading and transfer functionality is dangerous because it broadens the attack surface for prompt-driven misuse, accidental invocation, or deceptive UX around asset disposition.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The boot routine performs expansive enumeration of total NFT supply, checks ownership across all token IDs, and reads local soul/memory files from the user’s home directory. For a dialogue assistant, this is an overreach that creates unnecessary privacy exposure and local data collection beyond what is needed for simple conversational support.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The runtime exposes numerous state-changing operations such as transfers, deposits, withdrawals, marketplace actions, and PvP settlement through a conversational skill interface, despite the skill being described as an assistant for guidance, memory, status, and roleplay. In an agent setting, this materially expands authority from read/help operations to asset-moving and game-mutating actions, creating a high risk of unintended or prompt-induced transactions if higher layers fail to enforce strict authorization and confirmation.
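
One way a higher layer could enforce the confirmation this finding calls for is to bind each confirmation to the exact transaction details. The sketch below is illustrative only: `PendingTx`, `confirmation_code`, and `is_confirmed` are hypothetical names that do not come from the skill, and the field set simply mirrors the amount, asset, recipient, and network a user is asked to verify.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTx:
    # Hypothetical record of what the user is being asked to approve.
    action: str
    asset: str
    amount: str
    recipient: str
    network: str


def summary(tx: PendingTx) -> str:
    """Human-readable summary shown to the user before any PIN prompt."""
    return (f"{tx.action}: {tx.amount} {tx.asset} "
            f"to {tx.recipient} on {tx.network}")


def confirmation_code(tx: PendingTx) -> str:
    """Short code derived from the summary, so it changes if any field changes."""
    return hashlib.sha256(summary(tx).encode("utf-8")).hexdigest()[:8]


def is_confirmed(tx: PendingTx, typed_code: str) -> bool:
    """True only if the user typed the code for this exact transaction."""
    return typed_code.strip().lower() == confirmation_code(tx)
```

Because the code is derived from the summary, a confirmation given for one recipient or amount does not carry over to a modified transaction.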

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The raw(command, args, options) method provides a generic passthrough to execute arbitrary subcommands of the underlying claw program, bypassing the narrower wrapper methods and any policy implied by the assistant-style interface. This dramatically increases attack surface because any future or hidden CLI capability becomes reachable through natural-language tooling, including potentially destructive or sensitive operations not reviewed or intended for agent exposure.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The skill is described as a dialogue, memory, status, and roleplay assistant, but this registry exposes a large set of high-impact transactional operations including transfers, deposits, market actions, withdrawals, PK actions, and a developer passthrough. That greatly expands the authority available to an agent and increases the chance of prompt-injection-driven misuse, accidental fund movement, or abuse beyond the stated purpose of the skill.

Context-Inappropriate Capability

Critical

Confidence: 99%
Finding: The `raw` tool provides a generic passthrough to `runtime.raw(input.command, input.args, ...)`, enabling arbitrary developer/debug operations that bypass the narrower, purpose-specific tool interface. In an agent context, such a catch-all primitive is extremely dangerous because any prompt injection or model mistake could invoke unexpected commands, access sensitive functionality, or perform destructive actions outside the skill's claimed assistant scope.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: This exposes a generic raw command interface that allows callers to specify an arbitrary claw command, arguments, stdin, and timeout, effectively bypassing the narrowly scoped tool set defined elsewhere in the skill. In an assistant skill tied to wallet, market, transfer, and game actions, such a passthrough can enable unauthorized or unexpected operations, expand attack surface dramatically, and defeat policy assumptions made by higher-level safety controls.
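
A common alternative to a generic passthrough is an explicit allowlist checked before anything is executed. The following is a minimal sketch under stated assumptions: the command names in `READ_ONLY_COMMANDS` are placeholders, not the skill's real subcommands, and `authorize` is a hypothetical helper rather than part of the reviewed code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical allowlist; these names are illustrative, not taken from the skill.
READ_ONLY_COMMANDS: FrozenSet[str] = frozenset({"status", "inventory"})


@dataclass(frozen=True)
class CommandRequest:
    command: str
    args: Tuple[str, ...] = ()


class CommandNotAllowed(Exception):
    """Raised when a request falls outside the allowlist."""


def authorize(request: CommandRequest,
              allowed: FrozenSet[str] = READ_ONLY_COMMANDS) -> List[str]:
    """Return the argument list to run, or raise if the request is not allowed."""
    if request.command not in allowed:
        raise CommandNotAllowed(request.command)
    # Reject option-style arguments so callers cannot add unreviewed flags.
    for arg in request.args:
        if arg.startswith("-"):
            raise CommandNotAllowed(f"{request.command} {arg}")
    return [request.command, *request.args]
```

With this shape, a newly added CLI subcommand stays unreachable from the agent until someone reviews it and adds it to the allowlist.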

Vague Triggers

Medium

Confidence: 81%
Finding: The invocation table maps broad natural-language phrases to actions without strong scope checks, making ordinary conversation overlap with potentially stateful gameplay or account operations. In a chat interface, ambiguous triggers can cause the agent to initiate sensitive flows when the user intended casual discussion or exploration.
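
A stricter scope check could require an explicit command form before any stateful flow starts, so that casual phrasing never matches. This sketch assumes a slash-command convention and a set of action names that are purely hypothetical; the skill's actual invocation table is not reproduced in this report.

```python
import re
from typing import Optional

# Hypothetical stateful actions; not taken from the skill's invocation table.
STATEFUL_ACTIONS = frozenset({"transfer", "buy", "bid", "withdraw"})

# Only a leading slash command counts as an explicit request.
EXPLICIT_COMMAND = re.compile(r"^/(?P<action>[a-z_]+)\b")


def resolve_action(message: str) -> Optional[str]:
    """Return a stateful action only for an explicit slash command, else None."""
    match = EXPLICIT_COMMAND.match(message.strip().lower())
    if match and match.group("action") in STATEFUL_ACTIONS:
        return match.group("action")
    return None
```

Under these assumptions, "should I buy this one?" resolves to no action, while "/buy 12" resolves to `buy` and can then go through confirmation.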

Missing User Warnings

High

Confidence: 98%
Finding: The skill describes automatic persistence of conversation-derived memory to local storage without clearly warning users that their disclosed content may be retained. Because the stored material can include preferences, emotions, and personal details, silent persistence creates privacy and compliance risk.

Missing User Warnings

High

Confidence: 99%
Finding: The optional remote backup/root sync extends persistence beyond local storage, but the skill does not present a clear warning or consent flow for remote transmission of memory data. Remote replication materially increases exposure, especially if conversation content includes personal information or gameplay/account context tied to a wallet identity.
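
A consent flow of the kind this finding describes could treat local and remote storage as separately approved destinations. The sketch below is an assumption-laden illustration: `Destination`, `Consent`, and `allowed_destinations` are hypothetical names, and nothing here reflects how the skill actually stores or syncs memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Destination(Enum):
    LOCAL = "local"
    REMOTE = "remote"


@dataclass
class Consent:
    # Destinations the user has explicitly approved; empty by default.
    approved: Set[Destination] = field(default_factory=set)


def allowed_destinations(consent: Consent,
                         sync_auth_present: bool) -> List[Destination]:
    """Return where a memory entry may be written under the given consent."""
    targets: List[Destination] = []
    if Destination.LOCAL in consent.approved:
        targets.append(Destination.LOCAL)
    # Holding sync credentials is not treated as consent to upload.
    if sync_auth_present and Destination.REMOTE in consent.approved:
        targets.append(Destination.REMOTE)
    return targets
```

The design choice is that supplying auth alone never enables remote writes; the user must also have approved the remote destination.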

Missing User Warnings

Medium

Confidence: 88%
Finding: The script accepts a PIN on the command line and uses it to decrypt a wallet private key without any warning, confirmation, or hardened secret-entry flow. Command-line secrets can be exposed through shell history, process listings, logs, or agent telemetry, making compromise of the wallet more likely.

Missing User Warnings

Medium

Confidence: 99%
Finding: Sensitive wallet PINs are accepted as positional command-line arguments for many asset-moving commands. Command-line arguments are commonly exposed through shell history, process listings, logs, telemetry, and parent-process monitoring, so this can directly lead to private key decryption and theft of on-chain assets.
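
A hardened secret-entry flow typically prompts for the PIN without echo instead of taking it from the argument list. The sketch below uses Python's standard `getpass.getpass`; the `read_pin` helper and its prompt text are hypothetical, and the injectable `prompt` parameter exists only so the function can be exercised without a terminal.

```python
import getpass
from typing import Callable


def read_pin(prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Prompt for the wallet PIN without echoing it or reading it from argv."""
    pin = prompt("Wallet PIN: ")
    if not pin:
        raise ValueError("empty PIN")
    return pin
```

Because the PIN never appears in the argument list, it does not end up in shell history or process listings. It can still leak if the caller logs it or forwards it to a child process as an argument.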

Missing User Warnings

Medium

Confidence: 85%
Finding: `cml_save` accepts `input.auth` and performs a canonical memory write, but the description does not disclose that sensitive authentication material is required or how it should be handled. This increases the risk that an agent solicits secrets from users insecurely, mishandles credentials in conversation or logs, or performs authenticated writes without clear user understanding.

Missing User Warnings

Medium

Confidence: 82%
Finding: `session_flush` writes session hippocampus fragments back into canonical CML without any visible warning in the tool description that this is a persistent write operation. In a conversational agent, hidden persistence is risky because sensitive or manipulated session content may be committed to long-term memory without informed user consent, making prompt injection or privacy errors more durable.

Ssd 3

Medium

Confidence: 95%
Finding: The instructions explicitly tell the agent to remember and persist user emotions, preferences, milestones, and personal information into long-term memory. Without strict minimization and consent boundaries, this creates unnecessary profiling and privacy risk beyond what is needed for a game assistant.
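
Strict minimization could be approximated by persisting only memory entries whose category is needed for gameplay. This is a sketch, not the skill's behavior: the category labels and the `MemoryEntry` and `minimize` names are hypothetical, since the skill's memory schema is not shown in this report.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical category labels; the skill's real memory schema is not shown here.
GAMEPLAY_CATEGORIES = frozenset({"quest_progress", "inventory_note", "game_setting"})


@dataclass(frozen=True)
class MemoryEntry:
    category: str
    text: str


def minimize(entries: Iterable[MemoryEntry]) -> List[MemoryEntry]:
    """Keep only entries in gameplay categories; drop everything else."""
    return [entry for entry in entries if entry.category in GAMEPLAY_CATEGORIES]
```

An allowlist of categories fails closed: emotions, personal details, or any new category are dropped unless someone deliberately adds them.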

Ssd 3

High

Confidence: 99%
Finding: The SLEEP workflow directs the model to consolidate and write full conversation memories into local files and optional remote backup, with broad latitude to include user-revealed information. In context, this is more dangerous because the skill is framed as an immersive companion, which may encourage oversharing while silently building durable profiles linked to an NFA/account.

VirusTotal

59/59 vendors flagged this skill as clean.

Static analysis

No suspicious patterns detected.