Agent Evolver

Security checks across malware telemetry and agentic risk

Overview

The skill appears purpose-built rather than malicious, but it can automatically collect agent task/error history, persist it locally, and send rich context to external AI APIs without clear consent or redaction controls.

Review before installing in sensitive environments. Use the skill only with non-sensitive task data, or add redaction first. Confirm the API base and API key scope, disable or gate automatic triggers, review learned solutions before applying them, and periodically audit or delete the data under ~/.evolver.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (18)

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The export command dumps up to 10,000 experiences from the repository to a user-specified path with no access control, scoping, confirmation, or redaction. In a self-evolution skill, the experience store may contain sensitive prompts, errors, inputs, or strategy data, so bulk export materially broadens data exfiltration risk beyond routine optimization functionality.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The code sends rich experience contents to an external embeddings endpoint, including error messages, solutions, LLM analysis, keywords, and serialized context. In a self-evolution skill this network use is functionally relevant, but it still creates a real data-exposure risk because arbitrary task context may contain secrets, personal data, source code, or internal prompts that are transmitted off-host without minimization or policy checks.
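
One way to limit what leaves the host is to build the embedding text from an explicit allowlist of fields and cap its length. The following is a minimal sketch under stated assumptions: the field names (`error_message`, `solution`, `keywords`), the length cap, and the `build_embedding_text` helper are hypothetical and do not reflect the skill's actual schema or code.

```python
from typing import Any, Mapping

# Hypothetical allowlist: only these fields may leave the host.
ALLOWED_FIELDS = ("error_message", "solution", "keywords")
MAX_CHARS = 2000  # assumed cap on outbound text


def build_embedding_text(experience: Mapping[str, Any]) -> str:
    """Join allowlisted fields into one string, dropping everything else."""
    parts = []
    for field in ALLOWED_FIELDS:
        value = experience.get(field)
        if value:
            if isinstance(value, (list, tuple)):
                value = ", ".join(str(v) for v in value)
            parts.append(f"{field}: {value}")
    # Truncate so an oversized error trace cannot be sent in full.
    return "\n".join(parts)[:MAX_CHARS]
```

With this shape, serialized context and LLM analysis are never transmitted unless someone deliberately adds them to the allowlist.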

Vague Triggers

Medium

Confidence: 88%
Finding: The trigger keywords are very broad everyday terms like 'improve', 'experience', and 'learning', making accidental invocation likely. For a skill that can analyze errors, persist data, and potentially call external services, overbroad activation increases the chance of unnecessary data collection, unwanted execution, or silent routing into a powerful workflow.
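
A narrower alternative to keyword matching is to activate only on an explicit command. The sketch below assumes a hypothetical `/evolve` command and a hypothetical `should_activate` helper; the skill's real trigger syntax is not shown in this report.

```python
import re

# Hypothetical explicit command; not taken from the skill's configuration.
EXPLICIT_COMMAND = re.compile(r"^\s*/evolve(\s|$)", re.IGNORECASE)


def should_activate(message: str) -> bool:
    """Activate only on an explicit command, never on everyday keywords."""
    return bool(EXPLICIT_COMMAND.match(message))
```

Under this rule a message such as "please improve this paragraph" does not invoke the skill, while "/evolve analyze last failure" does.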

Vague Triggers

Medium

Confidence: 85%
Finding: The activation conditions are vague, such as 'need to learn from experience' or 'user requests performance improvement,' which can match many ordinary interactions. In an autonomous agent setting, ambiguous conditions can cause the skill to run outside intended scope and process historical errors or task data without a clear user expectation.

Vague Triggers

Medium

Confidence: 87%
Finding: The automatic trigger section defines broad scenarios like any failure, optimization request, or learning need, without exclusion boundaries or sensitivity checks. Because the skill is designed to run after failures and use retained experience, broad auto-triggering can amplify exposure of error contents, internal prompts, or operational metadata to local stores and external models.

Missing User Warnings

High

Confidence: 97%
Finding: The configuration documents OpenAI API usage for analysis and embeddings but does not prominently warn that execution results and error messages may be sent to external services. Error traces often contain secrets, file paths, prompts, tokens, or customer data, so undisclosed transmission materially increases confidentiality and compliance risk.

Vague Triggers

Medium

Confidence: 88%
Finding: The keyword list contains very broad, everyday terms such as '优化' (optimize), '学习' (learn), '改进' (improve), and '经验' (experience), which can cause the skill to activate in unrelated conversations. In a self-evolution skill, unintended activation is more dangerous than usual because it may trigger autonomous analysis or behavior-changing workflows without a clearly scoped user intent.

Vague Triggers

Medium

Confidence: 90%
Finding: The user-request patterns are ambiguous phrases like '帮我改进' ("help me improve") and '为什么失败' ("why did it fail") that may appear in many benign contexts unrelated to agent self-modification. Because this skill can initiate 'evolve' actions, loose matching increases the risk of accidental invocation and unintended autonomous changes based on normal troubleshooting requests.

Vague Triggers

Low

Confidence: 76%
Finding: The 'new_task_type' condition lacks a clear definition of what constitutes a new task type, so it may trigger learning behavior too broadly. In a skill designed for autonomous adaptation, vague novelty detection can lead to unnecessary or repeated self-modification attempts on ordinary task variation.

Missing User Warnings

Medium

Confidence: 91%
Finding: The code writes a large exported dataset directly to an arbitrary user-provided file path without validating the destination, checking for existing files, or warning about data sensitivity. This can enable accidental overwrite of local files and makes it easy to save sensitive repository contents to unsafe locations.
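
The destination checks this finding describes as missing could look roughly like the sketch below. It assumes a hypothetical `safe_export` helper and an operator-chosen `allowed_dir`; neither exists in the skill as reviewed. The helper confines writes to one directory and refuses to overwrite an existing file.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Mapping


def safe_export(
    records: Iterable[Mapping[str, Any]], destination: str, allowed_dir: str
) -> Path:
    """Write records as JSON only inside allowed_dir, never over an existing file."""
    base = Path(allowed_dir).resolve()
    target = (base / destination).resolve()
    # Reject paths that escape the allowed directory, e.g. "../secrets.json".
    if base not in target.parents:
        raise ValueError(f"export path escapes {base}")
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of silently overwriting.
    with open(target, "x", encoding="utf-8") as handle:
        json.dump(list(records), handle, ensure_ascii=False)
    return target
```

A sensitivity warning or confirmation prompt would sit in front of this call; the sketch covers only path validation and overwrite protection.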

Missing User Warnings

Medium

Confidence: 94%
Finding: The code sends error details, task input, and context to a remote LLM API without consent, redaction, or data classification checks. In an agent-evolution skill, inputs and execution context may contain secrets, personal data, tokens, prompts, internal file contents, or operational metadata, so this creates a real data-exfiltration/privacy risk.
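
A redaction pass before any remote call is one mitigation for this finding. The sketch below is illustrative only: the patterns and the `redact` helper are assumptions, they cover just a few secret formats, and a real deployment would need a broader, tested rule set.

```python
import re

# Illustrative patterns only; each pair is (compiled regex, replacement).
REDACTION_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{16,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]{8,}"), r"\1 [REDACTED_TOKEN]"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Replace secret-looking substrings before text leaves the host."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Pattern-based redaction reduces exposure but cannot guarantee that no sensitive data is sent, so it complements rather than replaces consent and data classification checks.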

Missing User Warnings

Medium

Confidence: 91%
Finding: The code transmits experience data to a remote service without any in-code notice, consent flow, or visible disclosure boundary. That is dangerous because users or operators may reasonably assume local-only processing while sensitive operational data is actually sent to a third party.
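
A simple disclosure boundary is a fail-closed opt-in check that runs before any remote call. In the sketch below, the `EVOLVER_ALLOW_REMOTE` environment variable, the exception class, and the `require_remote_consent` helper are all hypothetical; the skill does not document such a setting.

```python
import os

# Hypothetical opt-in flag; not a documented setting of the skill.
CONSENT_ENV_VAR = "EVOLVER_ALLOW_REMOTE"


class RemoteTransmissionDenied(RuntimeError):
    """Raised when remote transmission is attempted without opt-in."""


def require_remote_consent(env=None) -> None:
    """Fail closed unless the operator has explicitly opted in."""
    env = os.environ if env is None else env
    if env.get(CONSENT_ENV_VAR, "").strip().lower() not in {"1", "true", "yes"}:
        raise RemoteTransmissionDenied(
            f"set {CONSENT_ENV_VAR}=1 to allow sending experience data off-host"
        )
```

Calling the check at the top of every function that contacts a remote service keeps local-only processing as the default.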

Unpinned Dependencies

Low

Category: Supply Chain
Content:
    # Agent Evolver Skill Dependencies
    # Core
    sqlite3>=3.35.0
    pyyaml>=6.0
    # Vector Search
    chromadb>=0.4.0
Confidence: 91%
Finding: pyyaml>=6.0

Unpinned Dependencies

Low

Category: Supply Chain
Content:
    pyyaml>=6.0
    # Vector Search
    chromadb>=0.4.0
    openai>=1.0.0
    # LLM Integration
Confidence: 86%
Finding: chromadb>=0.4.0

Unpinned Dependencies

Low

Category: Supply Chain
Content:
    # Vector Search
    chromadb>=0.4.0
    openai>=1.0.0
    # LLM Integration
    requests>=2.28.0
Confidence: 88%
Finding: openai>=1.0.0

Unpinned Dependencies

Low

Category: Supply Chain
Content:
    openai>=1.0.0
    # LLM Integration
    requests>=2.28.0
    # Optional: Local embedding models
    # sentence-transformers>=2.2.0
Confidence: 92%
Finding: requests>=2.28.0
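
Unpinned entries like the ones flagged above can be detected with a small check over the requirements file. The sketch below uses a hypothetical `unpinned_requirements` helper and treats any spec without `==` as unpinned; it deliberately ignores environment markers and hash options, so it is a rough screen rather than a full requirements parser.

```python
def unpinned_requirements(lines):
    """Return requirement specs that are not pinned to an exact version."""
    flagged = []
    for line in lines:
        spec = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if spec and "==" not in spec:
            flagged.append(spec)
    return flagged
```

Pinning each dependency to an exact, reviewed version (ideally with hashes) makes installs reproducible and prevents a compromised or breaking upstream release from being pulled in automatically.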

Known Vulnerable Dependency: pyyaml. 8 advisories, including CVE-2019-20477 (Deserialization of Untrusted Data in PyYAML), CVE-2020-1747 (Improper Input Validation in PyYAML), and CVE-2020-14343 (Improper Input Validation in PyYAML), plus 5 more

Critical

Category: Supply Chain
Confidence: 95%
Finding: pyyaml

Known Vulnerable Dependency: requests. 10 advisories, including CVE-2014-1830 (Exposure of Sensitive Information to an Unauthorized Actor in Requests), CVE-2024-47081 (Requests vulnerable to .netrc credentials leak via malicious URLs), and CVE-2024-35195 (Requests `Session` object does not verify requests after making first request wi), plus 7 more

High

Category: Supply Chain
Confidence: 94%
Finding: requests

VirusTotal

66/66 vendors flagged this skill as clean.