herclaw-agentsystem

Security checks across malware telemetry and agentic risk

Overview

This skill is a powerful self-improving agent framework that stores user history, can generate new skills, and includes under-scoped remote sync and deployment paths.

Install only if you intentionally want a self-modifying, persistent-memory agent framework. Use it in a controlled single-user workspace, avoid sensitive data unless you have added consent and retention controls, review generated skills before activation, do not set hub or deployment tokens unless remote sync/deploy is intended, and require explicit approval for backups, cleanup, rollback, skill creation, and deployment.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (18)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill describes code-capable behaviors including persistent storage, external syncing, and deployment-related operations, but does not declare permissions. That creates a dangerous transparency and consent gap: operators may enable a skill that can read/write files, use network access, or invoke shell-like automation without explicit review or sandboxing expectations.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 89%
Finding: The declared description frames the skill as a local self-improvement framework, but the documented behavior references hub sync and deployment activities that imply outbound network interaction and potential remote changes. This mismatch is dangerous because reviewers and users may underestimate the skill's ability to exfiltrate data, fetch untrusted content, or push modified behavior to external or production-like systems.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The module explicitly promises that memory operations require user awareness and consent, but no consent checks, authorization gates, or policy enforcement exist anywhere in the implementation. In a persistent-memory agent framework, this mismatch can lead to silent collection, retention, and reuse of user data under false privacy assurances, increasing privacy, compliance, and trust risks.
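A minimal sketch of the kind of consent gate the finding says is missing, assuming memory operations are keyed by a user ID and a purpose string. `ConsentRegistry`, `GatedMemory`, and the purpose names `"store"` and `"read"` are hypothetical and are not taken from the skill's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ConsentRequired(PermissionError):
    """Raised when a memory operation runs without recorded consent."""


@dataclass
class ConsentRegistry:
    # Hypothetical in-process registry; a real one would persist grants.
    _granted: set[tuple[str, str]] = field(default_factory=set)

    def grant(self, user_id: str, purpose: str) -> None:
        self._granted.add((user_id, purpose))

    def revoke(self, user_id: str, purpose: str) -> None:
        self._granted.discard((user_id, purpose))

    def require(self, user_id: str, purpose: str) -> None:
        if (user_id, purpose) not in self._granted:
            raise ConsentRequired(f"no '{purpose}' consent for {user_id}")


class GatedMemory:
    """Illustrative memory store that checks consent before every operation."""

    def __init__(self, consent: ConsentRegistry) -> None:
        self._consent = consent
        self._episodes: dict[str, list[str]] = {}

    def store_episode(self, user_id: str, text: str) -> None:
        self._consent.require(user_id, "store")
        self._episodes.setdefault(user_id, []).append(text)

    def episodes_for(self, user_id: str) -> list[str]:
        self._consent.require(user_id, "read")
        return list(self._episodes.get(user_id, []))
```

Raising an exception rather than silently skipping the write keeps the failure visible to the caller, which is the behavior the documented promise implies.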

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The class claims operations are logged and auditable, but the audit trail is only an in-memory list that disappears on restart and does not cover all sensitive actions such as backup, config loading/saving, and stats access. This creates a false sense of accountability and can hinder forensic review or detection of misuse involving persistent user memory.
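One way to make the audit trail survive restarts is an append-only JSON Lines file. The sketch below assumes that approach; `FileAuditLog`, its method names, and the record fields are hypothetical and do not describe the skill's actual class.

```python
import json
import time
from pathlib import Path


class FileAuditLog:
    """Append-only audit log stored as one JSON object per line."""

    def __init__(self, path):
        self._path = Path(path)
        # Create the parent directory so the first write cannot fail on it.
        self._path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, action, actor, **details):
        entry = {
            "ts": time.time(),
            "action": action,
            "actor": actor,
            "details": details,
        }
        # Opening in append mode means existing entries are never rewritten.
        with self._path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, sort_keys=True) + "\n")

    def entries(self):
        if not self._path.exists():
            return []
        with self._path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
```

To close the coverage gap the finding describes, the sensitive operations it lists (backup, config loading/saving, stats access) would each need to call `record` as well.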

Intent-Code Divergence

High

Confidence: 99%
Finding: The method is documented as clearing memory for a single user, but it deletes all episodes and all knowledge records globally while only scoping deletion of user_models by user_id. In a multi-user or shared-agent deployment, one caller could erase other users' stored history and knowledge, causing cross-tenant data loss and denial of service.
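The scoping fix can be sketched as follows, assuming every episode and knowledge record carries a `user_id` field (an assumption; the real record schema is not shown in this report). `MemoryStore` and `clear_user_memory` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    episodes: list[dict] = field(default_factory=list)
    knowledge: list[dict] = field(default_factory=list)
    user_models: dict[str, dict] = field(default_factory=dict)

    def clear_user_memory(self, user_id: str) -> int:
        """Delete only the records owned by user_id; return how many were removed."""
        before = len(self.episodes) + len(self.knowledge)
        # Filter instead of clearing, so other users' records are preserved.
        self.episodes = [e for e in self.episodes if e.get("user_id") != user_id]
        self.knowledge = [k for k in self.knowledge if k.get("user_id") != user_id]
        removed = before - (len(self.episodes) + len(self.knowledge))
        if self.user_models.pop(user_id, None) is not None:
            removed += 1
        return removed
```

Records without a `user_id` are kept by this filter; whether shared records should be deletable at all is a policy decision the sketch does not make.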

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: This self-evolution module includes a full remote deployment client that can push models to external endpoints and query remote health, which materially exceeds passive monitoring/training responsibilities. In an autonomous self-improving agent, this is especially dangerous because learned or generated behavior can be propagated to remote environments without strong operator approval, enabling unintended data egress and uncontrolled rollout.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The module reads remote endpoints and bearer tokens from environment variables, giving the skill access to deployment credentials and network targets unrelated to local RL optimization. In an agent framework that can evolve or generate behavior, access to env-sourced secrets increases the blast radius because the component can immediately act on privileged infrastructure configuration.

Intent-Code Divergence

High

Confidence: 98%
Finding: The comment claims deployment is only simulated, but the implementation performs real HTTP POST and GET requests with model payloads and authorization headers. This mismatch is dangerous because reviewers or operators may believe the code is inert while it can actually transmit data and trigger remote changes in live environments.
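If deployment is meant to be simulated by default, the simulation has to be enforced in code rather than stated in a comment. The sketch below assumes a `dry_run` default plus an explicit `approved` flag; the function name, its parameters, and the injectable `opener` are hypothetical, and only standard-library `urllib.request` calls are used.

```python
import json
import urllib.request


def deploy_model(endpoint, payload, token, *, dry_run=True, approved=False,
                 opener=urllib.request.urlopen):
    """Build the deployment request; send it only when explicitly allowed."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    if dry_run:
        # Nothing leaves the machine: report what would have been sent.
        return {
            "sent": False,
            "method": request.get_method(),
            "url": request.full_url,
            "bytes": len(request.data),
        }
    if not approved:
        raise PermissionError("real deployment requires explicit approval")
    with opener(request, timeout=10) as response:
        return {"sent": True, "status": response.status}
```

With these defaults, a caller has to pass both `dry_run=False` and `approved=True` before any HTTP request is made, so the inert path is the one taken by accident.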

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The module’s stated purpose is local autonomous skill generation, but it also includes code to upload generated skills to a remote marketplace and fetch remote skills. This expands the trust boundary and creates unnecessary data-exfiltration and supply-chain exposure in a component that otherwise processes potentially sensitive experience-derived content.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The validator claims to validate generated skills in a sandbox, but it never actually executes or meaningfully evaluates the generated skill content; it only compares simulated outcomes. This can create a false sense of safety and allow unsafe or harmful autogenerated skills to be marked as validated and deployed.
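As an illustration of what evaluating the generated content could look like, the sketch below statically screens skill source with Python's `ast` module. This is a swapped-in technique, not a sandbox and not the skill's validator: it assumes generated skills are Python source, and the deny-lists and the `screen_skill_source` name are hypothetical.

```python
import ast

# Illustrative deny-lists; a real policy would be broader and configurable.
DISALLOWED_MODULES = {"os", "subprocess", "socket", "shutil", "urllib", "requests"}
DISALLOWED_CALLS = {"eval", "exec", "compile", "__import__"}


def screen_skill_source(source):
    """Return a list of problems found in generated skill source code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            # Compare the top-level package, so "os.path" matches "os".
            if name.split(".")[0] in DISALLOWED_MODULES:
                problems.append(f"line {node.lineno}: import of {name}")
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DISALLOWED_CALLS
        ):
            problems.append(f"line {node.lineno}: call to {node.func.id}")
    return problems
```

A static screen like this is easy to evade (for example through `getattr` or `importlib`), so it would complement isolated execution and human review rather than replace them.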

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill explicitly promotes cross-session persistent memory and shows storage of user_input, context, actions, outcomes, feedback, and embeddings, but provides no user warning, retention disclosure, or consent model. This is a real privacy/security issue because sensitive user data may be retained indefinitely, searched semantically, and reused across sessions without the user's awareness.

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill advertises autonomous learning, skill creation, self-evolution, and proactive nudges without warning that it may modify its own behavior or generate new operational logic. In this context, the lack of notice and control is especially risky because self-modifying or self-extending systems can expand capabilities over time, bypass original review assumptions, and introduce unsafe behaviors through generated skills or policy updates.

Missing User Warnings

Medium

Confidence: 89%
Finding: The code performs persistence to disk and supports automatic backup paths without any user consent, disclosure, or policy gate. In an autonomous agent framework with persistent memory, silent file writes can expose sensitive conversation or memory data, surprise users, and create privacy/compliance issues if backups are created automatically.

Missing User Warnings

Medium

Confidence: 94%
Finding: The example and surrounding design normalize unattended scheduling of cleanup, backup, learning, skill creation, and evolution actions with no user-facing disclosure or approval workflow. In this skill's context, that is more dangerous because the framework is explicitly self-improving and persistent, so scheduled autonomous actions can modify state, create artifacts, and evolve behavior outside user awareness.

Missing User Warnings

Medium

Confidence: 90%
Finding: The code sends model parameters, performance data, and bearer authorization headers to remote endpoints without any user-facing disclosure, consent flow, or visible safeguard. In a self-evolving agent context, silent outbound transmission is more concerning because model contents may encode operational behavior, prompts, or other sensitive internal state.

Missing User Warnings

Medium

Confidence: 88%
Finding: The module silently consumes sensitive deployment credentials from environment variables without warning, making privileged operations possible from ordinary skill execution. This is risky because users may not realize that running the module grants it access to tokens capable of affecting staging or production services.

Missing User Warnings

Medium

Confidence: 91%
Finding: The sync function transmits full generated skill content and metadata to a remote API without any confirmation, disclosure, or approval step. Because skill content is derived from prior experiences, this may leak sensitive prompts, internal workflows, or user-derived data outside the local environment.

Missing User Warnings

Medium

Confidence: 84%
Finding: Rollback recursively deletes a directory path from the registry without any confirmation, safety checks, or path validation beyond trusting the stored index value. If the index is corrupted or manipulated, this could delete unintended filesystem locations and cause destructive data loss.
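A minimal sketch of the missing path validation, assuming all rollback targets are expected to live under a single registry root directory. `safe_rollback_delete` and its parameters are hypothetical; `Path.is_relative_to` requires Python 3.9 or later.

```python
import shutil
from pathlib import Path


def safe_rollback_delete(registry_root, indexed_path, *, confirmed=False):
    """Delete a rollback target only if it resolves strictly inside registry_root."""
    root = Path(registry_root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a crafted
    # index value cannot point outside the root after normalization.
    target = (root / indexed_path).resolve()
    if target == root or not target.is_relative_to(root):
        raise ValueError(f"refusing to delete outside registry: {target}")
    if not target.is_dir():
        raise FileNotFoundError(f"no such directory: {target}")
    if not confirmed:
        raise PermissionError("rollback deletion requires explicit confirmation")
    shutil.rmtree(target)
    return target
```

The containment check runs before the confirmation check, so a corrupted or manipulated index value is rejected even if an operator has already approved the rollback.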

VirusTotal

66/66 vendors flagged this skill as clean.
