Agent Evolver

Security checks across malware telemetry and agentic risk

Overview

The skill appears purpose-built rather than malicious, but it can automatically collect agent task/error history, persist it locally, and send rich context to external AI APIs without clear consent or redaction controls.

Review before installing in sensitive environments. Use it only with non-sensitive task data, or add redaction first. Also confirm the API base and API key scope, disable or gate automatic triggers, review learned solutions before applying them, and periodically audit or delete the data under ~/.evolver.
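As a starting point for the periodic audit, the sketch below lists every file under a data directory with its size. It is a minimal illustration, not part of the skill: the helper name summarize_store is hypothetical, and the internal layout of ~/.evolver is not described in this report, so the code makes no assumption about file names or formats.

```python
from pathlib import Path

def summarize_store(root: str) -> list:
    """Return (relative path, size in bytes) for each file under root, sorted by path."""
    base = Path(root).expanduser()
    if not base.is_dir():
        # Nothing to audit if the directory does not exist.
        return []
    return sorted(
        (str(p.relative_to(base)), p.stat().st_size)
        for p in base.rglob("*")
        if p.is_file()
    )
```

Calling summarize_store("~/.evolver") gives a quick view of what has accumulated before deciding what to keep or delete.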

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (18)

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
The export command dumps up to 10,000 experiences from the repository to a user-specified path with no access control, scoping, confirmation, or redaction. In a self-evolution skill, the experience store may contain sensitive prompts, errors, inputs, or strategy data, so bulk export materially broadens data exfiltration risk beyond routine optimization functionality.

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
The code sends rich experience contents to an external embeddings endpoint, including error messages, solutions, LLM analysis, keywords, and serialized context. In a self-evolution skill this network use is functionally relevant, but it still creates a real data-exposure risk because arbitrary task context may contain secrets, personal data, source code, or internal prompts that are transmitted off-host without minimization or policy checks.

Vague Triggers

Medium
Confidence
88% confidence
Finding
The trigger keywords are very broad everyday terms like 'improve', 'experience', and 'learning', making accidental invocation likely. For a skill that can analyze errors, persist data, and potentially call external services, overbroad activation increases the chance of unnecessary data collection, unwanted execution, or silent routing into a powerful workflow.
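One way to reduce accidental invocation is to require an explicit command rather than matching everyday words. The sketch below is illustrative only: the /evolve prefix and the should_activate helper are hypothetical, since the skill's actual trigger configuration is not reproduced in this report.

```python
import re

# Hypothetical explicit command; not taken from the skill's configuration.
EXPLICIT_TRIGGER = re.compile(r"^\s*/evolve\b", re.IGNORECASE)

def should_activate(message: str) -> bool:
    """Activate only when the message starts with the explicit command."""
    return bool(EXPLICIT_TRIGGER.match(message))
```

With this gate, a message such as "please improve this paragraph" no longer activates the workflow, while "/evolve analyze last failure" does.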

Vague Triggers

Medium
Confidence
85% confidence
Finding
The activation conditions are vague, such as 'need to learn from experience' or 'user requests performance improvement,' which can match many ordinary interactions. In an autonomous agent setting, ambiguous conditions can cause the skill to run outside intended scope and process historical errors or task data without a clear user expectation.

Vague Triggers

Medium
Confidence
87% confidence
Finding
The automatic trigger section defines broad scenarios like any failure, optimization request, or learning need, without exclusion boundaries or sensitivity checks. Because the skill is designed to run after failures and use retained experience, broad auto-triggering can amplify exposure of error contents, internal prompts, or operational metadata to local stores and external models.

Missing User Warnings

High
Confidence
97% confidence
Finding
The configuration documents OpenAI API usage for analysis and embeddings but does not prominently warn that execution results and error messages may be sent to external services. Error traces often contain secrets, file paths, prompts, tokens, or customer data, so undisclosed transmission materially increases confidentiality and compliance risk.

Vague Triggers

Medium
Confidence
88% confidence
Finding
The keyword list contains very broad, everyday terms such as '优化' (optimize), '学习' (learn), '改进' (improve), and '经验' (experience), which can cause the skill to activate in unrelated conversations. In a self-evolution skill, unintended activation is more dangerous than usual because it may trigger autonomous analysis or behavior-changing workflows without a clearly scoped user intent.

Vague Triggers

Medium
Confidence
90% confidence
Finding
The user-request patterns are ambiguous phrases like '帮我改进' (help me improve) and '为什么失败' (why did it fail) that may appear in many benign contexts unrelated to agent self-modification. Because this skill can initiate 'evolve' actions, loose matching increases the risk of accidental invocation and unintended autonomous changes based on normal troubleshooting requests.

Vague Triggers

Low
Confidence
76% confidence
Finding
The 'new_task_type' condition lacks a clear definition of what constitutes a new task type, so it may trigger learning behavior too broadly. In a skill designed for autonomous adaptation, vague novelty detection can lead to unnecessary or repeated self-modification attempts on ordinary task variation.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The code writes a large exported dataset directly to an arbitrary user-provided file path without validating the destination, checking for existing files, or warning about data sensitivity. This can enable accidental overwrite of local files and makes it easy to save sensitive repository contents to unsafe locations.
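A guarded export could address both gaps named in this finding. The sketch below confines the destination to an allowed base directory and refuses to overwrite an existing file. It is a minimal illustration under stated assumptions: safe_export, its parameters, and the JSON output format are hypothetical and do not reflect the skill's actual export code.

```python
import json
from pathlib import Path

def safe_export(records: list, destination: str, base_dir: str) -> Path:
    """Write records as JSON inside base_dir, never overwriting an existing file."""
    base = Path(base_dir).resolve()
    target = (base / destination).resolve()
    try:
        # Rejects absolute paths and "../" escapes that land outside base.
        target.relative_to(base)
    except ValueError:
        raise ValueError(f"destination escapes export directory: {destination}")
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of silently overwriting.
    with open(target, "x", encoding="utf-8") as fh:
        json.dump(records, fh)
    return target
```

A confirmation prompt or a sensitivity warning would sit in front of this call; the sketch covers only the path checks.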

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The code sends error details, task input, and context to a remote LLM API without consent, redaction, or data classification checks. In an agent-evolution skill, inputs and execution context may contain secrets, personal data, tokens, prompts, internal file contents, or operational metadata, so this creates a real data-exfiltration/privacy risk.
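A redaction pass applied before any remote call would reduce this exposure. The sketch below masks a few common secret shapes in free text. The patterns and the redact helper are illustrative assumptions, not the skill's code, and a real deployment would need a broader, tested rule set.

```python
import re

# Illustrative patterns only; they will miss many real-world secret formats.
REDACTION_PATTERNS = [
    # Strings shaped like "sk-..." API keys.
    (re.compile(r"sk-[A-Za-z0-9_-]{16,}"), "[REDACTED_API_KEY]"),
    # key=value or key: value pairs with sensitive-looking names.
    (re.compile(r"(?i)\b(password|token|secret)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
    # Email addresses.
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]

def redact(text: str) -> str:
    """Apply each pattern in order and return the masked text."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running error messages, task input, and serialized context through such a function before transmission limits what leaves the host, though pattern-based masking cannot guarantee that nothing sensitive slips through.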

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The code transmits experience data to a remote service without any in-code notice, consent flow, or visible disclosure boundary. That is dangerous because users or operators may reasonably assume local-only processing while sensitive operational data is actually sent to a third party.
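An explicit opt-in is one way to make the disclosure boundary visible. In the sketch below, remote transmission stays off unless the operator sets an environment variable. The variable name EVOLVER_ALLOW_REMOTE and both helpers are hypothetical; the skill defines no such setting in the material reviewed here.

```python
import os

# Hypothetical setting name; not defined by the skill.
CONSENT_ENV_VAR = "EVOLVER_ALLOW_REMOTE"

def remote_calls_allowed(env=None) -> bool:
    """Return True only when the operator has explicitly opted in."""
    env = os.environ if env is None else env
    return env.get(CONSENT_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}

def send_if_allowed(payload, sender, env=None):
    """Call sender(payload) only after opt-in; otherwise return None."""
    if not remote_calls_allowed(env):
        return None
    return sender(payload)
```

Defaulting to local-only behavior matches what users are likely to assume, and the opt-in step gives a natural place to document what is sent and to whom.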

Unpinned Dependencies

Low
Category
Supply Chain
Content
# Agent Evolver Skill Dependencies
# Core
sqlite3>=3.35.0
pyyaml>=6.0

# Vector Search
chromadb>=0.4.0
Confidence
91% confidence
Finding
pyyaml>=6.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
pyyaml>=6.0

# Vector Search
chromadb>=0.4.0
openai>=1.0.0

# LLM Integration
Confidence
86% confidence
Finding
chromadb>=0.4.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
# Vector Search
chromadb>=0.4.0
openai>=1.0.0

# LLM Integration
requests>=2.28.0
Confidence
88% confidence
Finding
openai>=1.0.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
openai>=1.0.0

# LLM Integration
requests>=2.28.0

# Optional: Local embedding models
# sentence-transformers>=2.2.0
Confidence
92% confidence
Finding
requests>=2.28.0
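The four unpinned-dependency findings above share one pattern: a lower bound (>=) with no exact version. A small check like the sketch below can flag such lines in a requirements file. The helper name unpinned_requirements is hypothetical, the version numbers in the usage note are examples rather than recommendations, and the parser is deliberately simple (it does not handle options such as -r or environment markers).

```python
import re

# A line counts as pinned only if it uses "==" directly after the package name.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+\s*==\s*[^\s;,]+")

def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not pinned to an exact version."""
    findings = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line and not PINNED.match(line):
            findings.append(line)
    return findings
```

For example, given the lines "pyyaml>=6.0" and "requests==2.32.3", only "pyyaml>=6.0" is reported. Pinning exact versions, ideally with hashes in a lock file, makes installs reproducible and prevents a future release from changing behavior silently.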

Known Vulnerable Dependency: pyyaml (8 advisories): CVE-2019-20477 (Deserialization of Untrusted Data in PyYAML); CVE-2020-1747 (Improper Input Validation in PyYAML); CVE-2020-14343 (Improper Input Validation in PyYAML) +5 more

Critical
Category
Supply Chain
Confidence
95% confidence
Finding
pyyaml

Known Vulnerable Dependency: requests (10 advisories): CVE-2014-1830 (Exposure of Sensitive Information to an Unauthorized Actor in Requests); CVE-2024-47081 (Requests vulnerable to .netrc credentials leak via malicious URLs); CVE-2024-35195 (Requests `Session` object does not verify requests after making first request wi) +7 more

High
Category
Supply Chain
Confidence
94% confidence
Finding
requests

VirusTotal

66/66 vendors flagged this skill as clean.
