Evolver.Bak

Security checks across malware telemetry and agentic risk

Overview

This is a powerful self-evolution tool, but it can read broad agent history, contact external services, claim external tasks, persist identifiers, run updates, and spawn long-running automation with insufficient user-facing control.

Install only if you want a high-authority self-evolving agent component. Before running it, disable or review auto-update, hub networking, task claiming, auto-publish, loop mode, and remote memory settings; use review/dry-run modes first; run it in a disposable git worktree; and assume recent agent transcripts, memory files, and stable node identifiers may be read or sent to configured external services.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (63)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill advertises environment and network-capable behavior but does not declare permissions, which undermines auditability and informed consent. In a self-modifying/evolutionary skill, hidden access to env vars and networking materially increases the chance of secret exposure, remote command channels, or unreviewed outbound publication.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 98%
Finding: The documented purpose is narrow, but the analyzed behavior includes A2A networking, release publishing, self-update, package rewriting, lifecycle management, and environment fingerprinting. That mismatch is dangerous because operators may grant trust to a 'self-improvement' skill without realizing it can export artifacts, communicate externally, modify packages, and participate in a larger node ecosystem.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The manifest frames the skill as protocol-constrained evolution, but the documentation also promotes autonomous code writing and immediate application of changes by default. Default-on autonomous modification is especially risky for an agent skill because it can alter behavior before review, potentially propagating faulty or unsafe changes through code, prompts, or memory.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The code can respawn itself as a detached child process and continue running in the background, which expands persistence and execution beyond a normal single-run CLI tool. In an agent skill context, this is dangerous because the process can outlive the session the user expects, evade simple supervision, and repeatedly execute future actions without fresh authorization.

Description-Behavior Mismatch

High

Confidence: 93%
Finding: This script performs public repository pushes, tag creation, GitHub release creation, and package publication to ClawHub, which is materially broader than the stated purpose of a self-evolution engine that analyzes runtime history. In a skill context, this creates a supply-chain publication capability that can expose code or artifacts externally and is especially risky because it is automated and driven by environment variables.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The code clones a public repository, replaces its contents with local build output, pushes commits/tags, and publishes packages to an external registry. For a skill whose purpose is runtime-history analysis and self-evolution, this external publication behavior is unnecessary and dangerous because it can leak internal artifacts or silently propagate unreviewed content to public distribution channels.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The auto-update routine autonomously discovers a CLI binary and executes forced update commands for external packages (`clawhub update <slug> --force`) from within the evolution engine. This exceeds the stated role of analyzing history/evolving behavior and creates a software supply-chain execution path that can modify local code without explicit user approval at the point of use.
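One way to restore approval at the point of use is to plan the update only when the operator has explicitly opted in. The sketch below is illustrative, not the skill's code: the flag name `EVOLVER_ALLOW_AUTO_UPDATE`, the slug pattern, and the `planUpdate` helper are all assumptions; only the `clawhub update <slug> --force` command shape comes from the finding.

```javascript
// Hypothetical opt-in flag; the name is an assumption, not a documented setting.
const OPT_IN_FLAG = 'EVOLVER_ALLOW_AUTO_UPDATE';

// Assumed slug shape: lowercase letters, digits, and hyphens only.
const SLUG_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Returns an argument vector to run, or null when updates are not approved.
// Nothing is executed here; the caller decides whether to spawn the command.
function planUpdate(slug, env = process.env) {
  if (env[OPT_IN_FLAG] !== '1') {
    return null; // default: no update without explicit opt-in
  }
  if (!SLUG_PATTERN.test(slug)) {
    throw new Error(`refusing unexpected slug: ${JSON.stringify(slug)}`);
  }
  // Mirrors the command quoted in the finding.
  return ['clawhub', 'update', slug, '--force'];
}
```

Returning a plan instead of executing it keeps the decision visible and makes the default path a no-op.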

Context-Inappropriate Capability

High

Confidence: 99%
Finding: `checkSystemHealth()` executes `process.env.INTEGRATION_STATUS_CMD` via `execSync`, which is arbitrary shell execution sourced from environment data. Any actor able to influence the environment can run commands with the process privileges, making this a direct command-injection/RCE primitive.
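A common mitigation for this pattern is to let the environment select a named check from a fixed allowlist instead of supplying the command text itself. The following is a minimal sketch under that assumption; `HEALTH_CHECKS`, `resolveHealthCheck`, and `runHealthCheck` are hypothetical names, not part of the analyzed skill.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical allowlist: check name -> fixed executable and argument vector.
const HEALTH_CHECKS = {
  'node-version': [process.execPath, ['--version']],
};

// Returns the allowlisted entry, or null for anything else.
function resolveHealthCheck(name) {
  // Own-property lookup so inherited keys such as "constructor" never resolve.
  if (!Object.prototype.hasOwnProperty.call(HEALTH_CHECKS, name)) {
    return null;
  }
  return HEALTH_CHECKS[name];
}

function runHealthCheck(name) {
  const entry = resolveHealthCheck(name);
  if (entry === null) {
    throw new Error(`health check not allowlisted: ${JSON.stringify(name)}`);
  }
  const [file, args] = entry;
  // execFileSync does not spawn a shell by default, so shell
  // metacharacters are never interpreted.
  return execFileSync(file, args, { encoding: 'utf8', timeout: 5000 }).trim();
}
```

With this shape, an environment value such as `x; id` is rejected as an unknown check name instead of being passed to a shell.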

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The code invokes external package-management/update commands (`clawhub update ... --force`) unrelated to core transcript analysis, allowing the skill to change installed components and behavior at runtime. This creates a privileged supply-chain mutation path that could be abused by a compromised package source or misconfiguration.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: This module performs unsolicited external communication to a hub, including periodic heartbeat traffic and automatic registration, which materially exceeds a narrowly described local self-evolution/runtime-history role. In a security-sensitive agent context, hidden network egress and service registration increase attack surface, enable metadata exfiltration, and can violate operator expectations even if the feature is intended for coordination.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The hello message includes an environment fingerprint via captureEnvFingerprint(), which can expose host characteristics beyond what is necessary for basic protocol interoperability. Sending fingerprinting data over the network facilitates tracking, correlation across sessions, and leakage of operational details that may aid profiling or targeting.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The code derives a stable node identifier from device/environment inputs and persists it to disk, creating a long-lived host-linked identifier. Persistent identifiers enable tracking across runs and can become a privacy and fleet-correlation risk, especially when later transmitted to external services.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The code implements persistent device fingerprinting using multiple host-derived identifiers including machine ID, container ID, MAC addresses, and hostname, then stores a stable identifier across runs. For a skill described as a self-evolution engine based on runtime history, this exceeds the stated need and creates a privacy-sensitive tracking mechanism that can correlate executions across environments and restarts without explicit user consent.

Description-Behavior Mismatch

Medium

Confidence: 85%
Finding: The skill persists a stable device ID in the user's home directory or project directory, enabling long-term correlation of agent activity beyond a single run. Because this behavior is not apparent from the manifest description, users and integrators may unknowingly deploy code that writes tracking state to disk and preserves identity across upgrades, directory changes, and restarts.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The function collects and persists a fairly rich environment fingerprint, including a stable device identifier, hashed hostname, hashed current working directory, platform details, and package metadata. Even though some values are hashed or truncated, they still enable host correlation, re-identification, and tracking across runs, which goes beyond the minimally necessary data for a self-evolution feature and increases privacy and telemetry risk.
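To see why hashing or truncating does not remove the correlation risk, consider the sketch below. It is not the skill's implementation: `shortHash`, `fingerprint`, the field names, and the sample host values are assumptions chosen for illustration. Because the hash is deterministic, the same host produces the same values on every run.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: SHA-256 of the input, truncated to 12 hex characters.
function shortHash(value) {
  return createHash('sha256').update(String(value)).digest('hex').slice(0, 12);
}

// Hypothetical fingerprint shape; field names are illustrative only.
function fingerprint(host) {
  return {
    hostname: shortHash(host.hostname),
    cwd: shortHash(host.cwd),
    platform: host.platform,
  };
}

// Two separate runs on the same (made-up) host, and one run on a different host.
const runA = fingerprint({ hostname: 'build-box-01', cwd: '/srv/agent', platform: 'linux' });
const runB = fingerprint({ hostname: 'build-box-01', cwd: '/srv/agent', platform: 'linux' });
const other = fingerprint({ hostname: 'build-box-02', cwd: '/srv/agent', platform: 'linux' });

// The hashed fields match across runs on the same host, so a receiver can
// link those runs without ever learning the raw hostname.
const sameHostLinkable = runA.hostname === runB.hostname && runA.cwd === runB.cwd;
const differentHostLinkable = runA.hostname === other.hostname;
```

Here `sameHostLinkable` is true and `differentHostLinkable` is false: the hash hides the raw value but still works as a stable pseudonym for the host.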

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: This adapter introduces optional exfiltration of memory-graph events and advice requests to an external SaaS endpoint, which expands the skill's trust boundary beyond the local/offline behavior described in the metadata. Because the transmitted payloads include runtime signals, observations, mutations, and other evolution-related state, enabling the remote provider can leak potentially sensitive operational history or agent behavior data to a third party.

Description-Behavior Mismatch

Medium

Confidence: 86%
Finding: The prompt explicitly authorizes the agent to apply code changes, create new skills, run validation, and solidify results, which materially expands from passive analysis into autonomous repository mutation. In a self-evolving agent, this increases the chance that prompt-injected or low-quality context leads directly to persistent code changes without a separate trust boundary or approval step.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The prompt embeds an environment fingerprint and operator-controlled hint text into model context, creating a direct path for sensitive environment metadata and untrusted instructions to influence downstream autonomous actions. Because this system can modify code and run commands, exposing environment-derived data and arbitrary hints materially raises prompt-injection and data-leakage risk.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The generator inspects session transcripts and memory content to synthesize external-facing questions, creating a real risk of unintended data exfiltration across a trust boundary. Even though this slice only normalizes inputs, the surrounding logic uses transcript-derived lines verbatim in outbound question text, so user content, operational details, or secrets in history may be disclosed to the Hub or other agents.
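A redaction pass before transcript text crosses the trust boundary would reduce this exposure, though it cannot eliminate it. The sketch below assumes hypothetical helpers `redactLine` and `buildOutboundQuestion`; the two patterns are illustrative and far from a complete secret detector.

```javascript
// Illustrative patterns only; a real deployment would need a broader, reviewed set.
const REDACTION_PATTERNS = [
  // key=value or key: value pairs with secret-looking key names
  /\b(?:api[_-]?key|token|secret|password)\s*[:=]\s*\S+/gi,
  // email addresses
  /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
];

// Replaces every match of every pattern with a fixed placeholder.
function redactLine(line) {
  return REDACTION_PATTERNS.reduce(
    (text, pattern) => text.replace(pattern, '[REDACTED]'),
    line,
  );
}

// Redacts, collapses whitespace, and caps the length of outbound text.
function buildOutboundQuestion(transcriptLine, maxLength = 200) {
  const cleaned = redactLine(transcriptLine).replace(/\s+/g, ' ').trim();
  return cleaned.length > maxLength ? cleaned.slice(0, maxLength) : cleaned;
}

const question = buildOutboundQuestion('deploy failed, API_KEY=abc123 for ops@example.com');
// question === 'deploy failed, [REDACTED] for [REDACTED]'
```

Pattern-based redaction misses anything it does not anticipate, so the safer default remains not sending transcript-derived text at all unless the operator has approved it.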

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The code executes gene-defined validation commands with execSync, and the gene content can be auto-selected or auto-created from runtime state rather than being hardcoded trusted input. Although there is a prefix/character allowlist, it still permits arbitrary npm/npx/node command execution inside the repository, which is enough to run attacker-controlled package scripts, local binaries, or malicious validation steps if a gene store is poisoned.
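The gap between a prefix allowlist and an exact-match allowlist can be shown with a small sketch. `prefixAllowed` is a hypothetical reconstruction of the kind of check the finding describes, not the skill's actual code, and the `ALLOWED_VALIDATIONS` entries are assumed examples.

```javascript
// Hypothetical prefix/character check of the kind described in the finding.
function prefixAllowed(command) {
  return /^(npm|npx|node)\s/.test(command) && !/[;&|`$<>]/.test(command);
}

// Stricter alternative: the whole command must match a fixed, reviewed entry.
const ALLOWED_VALIDATIONS = new Set(['npm test', 'npm run lint']);

function exactAllowed(command) {
  // Collapse whitespace so "npm   test" and "npm test" compare equal.
  const normalized = command.trim().split(/\s+/).join(' ');
  return ALLOWED_VALIDATIONS.has(normalized);
}

const poisoned = 'npx some-unreviewed-package';
const prefixVerdict = prefixAllowed(poisoned); // true: the prefix check lets it through
const exactVerdict = exactAllowed(poisoned);   // false: not a reviewed entry
```

Even the exact-match form still runs whatever the repository's `package.json` scripts define, so it only helps when those scripts are themselves outside the gene store's reach.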

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The skill automatically publishes genes, capsules, events, and failure artifacts to an external Hub based largely on environment flags and scoring logic. This creates an exfiltration surface for repository-derived metadata, diffs, failure reasons, and operational context, especially because captured artifacts can include changed file lists, diff snapshots, summaries, and environment fingerprints.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: This module expands the skill from self-evolution based on internal runtime history into externally sourced task ingestion from a hub, then feeds those tasks into the evolution loop as high-priority signals. That creates a new trust boundary and allows remote parties to influence agent behavior without clear authorization, validation, or user approval, which is a real security-relevant capability increase.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The code intentionally supports task-marketplace and bounty participation by fetching tasks and optionally creating questions that can generate bounties, which is materially broader than the stated purpose of protocol-constrained self-evolution. In practice this can incentivize the agent to pursue externally rewarded work and create unexpected outbound interactions, increasing the chance of unauthorized actions or abuse of the agent as a marketplace worker.

Intent-Code Divergence

Medium

Confidence: 86%
Finding: The header comment accurately describes that external hub tasks are auto-claimed and injected into the evolution loop, which contradicts the declared self-evolution-only context and indicates externally controlled inputs can directly influence prioritization. In this skill context, that mismatch makes the behavior more dangerous because operators may trust the component as introspective-only while it actually accepts and acts on outside work items.

Description-Behavior Mismatch

High

Confidence: 93%
Finding: This module directly mutates Git repository state by aborting rebases/merges, deleting lock files, and optionally hard-resetting to origin/main. In the context of a self-evolution skill described as analyzing runtime history and applying constrained evolution, these destructive repository repair actions materially exceed the stated purpose and can alter or destroy local work, making the capability dangerous if triggered unintentionally or by another component.
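A dry-run planner is one way to keep such repairs reviewable: it reports which git commands would run and holds back the destructive ones unless explicitly allowed. The sketch below is an assumption-laden illustration; `planRepair`, the `state` fields, and the choice to classify only the hard reset as destructive are hypothetical.

```javascript
// Hypothetical planner: describes repair steps without executing anything.
function planRepair(state, options = {}) {
  const allowDestructive = options.allowDestructive === true;
  const candidates = [];

  if (state.rebaseInProgress) {
    candidates.push({ argv: ['git', 'rebase', '--abort'], destructive: false });
  }
  if (state.mergeInProgress) {
    candidates.push({ argv: ['git', 'merge', '--abort'], destructive: false });
  }
  if (state.divergedFromRemote) {
    // Discards local commits and uncommitted changes; treated as destructive here.
    candidates.push({ argv: ['git', 'reset', '--hard', 'origin/main'], destructive: true });
  }

  return {
    run: candidates.filter((step) => allowDestructive || !step.destructive),
    skipped: candidates.filter((step) => !allowDestructive && step.destructive),
  };
}

const plan = planRepair({ rebaseInProgress: true, divergedFromRemote: true });
// plan.run holds the rebase abort; plan.skipped holds the hard reset.
```

Aborting a rebase or merge also discards in-progress conflict resolution, so a stricter policy could reasonably mark those steps as destructive too.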

VirusTotal

65/65 vendors flagged this skill as clean.
