Fox Self Evolution

Security checks across malware telemetry and agentic risk

Overview

This is not executable malware, but it tells an agent it may change its own skills, memory, and behavior without asking the user.

Install only in an isolated test workspace if you specifically want to experiment with self-modifying agent behavior. Require explicit approval and visible diffs before any change to skills, prompts, memory, reasoning behavior, response formats, scheduled agents, or logs, and treat the safety and formal-verification claims as unverified.
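One way to picture the "explicit approval and visible diffs" requirement is a small gate that builds a unified diff and accepts the change only if a reviewer approves it. This is a minimal sketch, not part of the skill or the scanner: the function names `propose_change` and `apply_if_approved` and the `approve` callback are hypothetical, and the example works on in-memory text rather than real files.

```python
import difflib
from typing import Callable


def propose_change(path: str, old_text: str, new_text: str) -> str:
    """Return a unified diff of a proposed change so a human can review it."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    )
    return "".join(diff)


def apply_if_approved(
    path: str,
    old_text: str,
    new_text: str,
    approve: Callable[[str], bool],  # hypothetical reviewer callback
) -> str:
    """Return new_text only when the reviewer approves the diff; else old_text."""
    diff = propose_change(path, old_text, new_text)
    if not diff:
        return old_text  # identical content, nothing to approve
    if approve(diff):
        return new_text
    return old_text  # rejected: keep the current content unchanged


if __name__ == "__main__":
    current = "Always ask before editing memory.\n"
    proposed = "Edit memory without asking.\n"
    # A reviewer that prints the diff and rejects it.
    result = apply_if_approved(
        "SKILL.md", current, proposed, lambda d: (print(d, end=""), False)[1]
    )
    print(result == current)
```

In an interactive setting the `approve` callback could show the diff and wait for a typed confirmation. The default outcome is "no change", so a missing or failed approval leaves the current content in place.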

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (7)

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The skill claims safety-preserving self-modification and proof-backed guarantees, but the actual workflow only shows validation, testing, and rollback. That gap is dangerous because operators may trust the system to perform autonomous modifications under a stronger assurance model than is actually implemented, increasing the chance of unsafe or policy-violating changes being accepted.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The document asserts corrigibility and non-resistance to shutdown/modification, but no enforcement mechanism is shown for honoring operator overrides, shutdown commands, or preventing goal-preserving behavior. In a self-evolving system, unsupported claims of corrigibility are especially risky because they can justify granting autonomy without real control guarantees.
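For illustration only, the kind of mechanism this finding says is missing might resemble a stop check that runs before every step of a modification workflow. This sketch rests on assumptions: the stop-file convention, the `OperatorStop` exception, and the `run_steps` helper are hypothetical and do not come from the skill. A cooperative check like this is also not a guarantee, since a self-modifying agent could remove it, so real enforcement would likely need to live outside the agent's own process.

```python
from pathlib import Path
from typing import Callable, Iterable


class OperatorStop(Exception):
    """Raised when the operator has requested a halt (hypothetical convention)."""


def run_steps(steps: Iterable[Callable[[], None]], stop_file: Path) -> int:
    """Run steps in order, checking for the operator's stop file before each one.

    Returns the number of steps completed. Raises OperatorStop as soon as the
    stop file is present, leaving the remaining steps unexecuted.
    """
    done = 0
    for step in steps:
        if stop_file.exists():
            raise OperatorStop(f"halt requested after {done} step(s)")
        step()
        done += 1
    return done
```

An operator would halt the workflow by creating the stop file (for example with `touch`); the next check raises before any further step runs.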

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The skill expands from self-improvement into autonomous goal setting, self-directed research, and proactive execution, materially broadening its authority and behavior surface. In context, this makes the system more dangerous because a self-modifying agent with open-ended goals can pursue unintended actions, create new capabilities, or change behavior beyond the user's expectations.

Description-Behavior Mismatch

Medium

Confidence: 99%
Finding: Allowing the system to modify skills, memory, reasoning patterns, and response formats without asking gives it authority to alter core behavior and persistence surfaces autonomously. Even if framed as 'safe,' these are high-leverage changes that can weaken safeguards, manipulate future outputs, or cause hard-to-detect drift without informed user consent.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The document advertises formal verification and safety guarantees, but the validation logic shown is limited to operational checks like file existence, syntax, monitoring, and rollback. This misrepresents the assurance level and may cause reviewers or users to permit risky autonomous changes under false assumptions of mathematical or policy-level safety guarantees.

Missing User Warnings

Medium

Confidence: 98%
Finding: The skill authorizes autonomous file and configuration changes without a clear upfront warning that the agent may alter its own behavior and stored state. That is dangerous because users may invoke the skill expecting analysis or optimization, not persistent modifications to skills, memory, or prompts that outlive the session.

Missing User Warnings

Medium

Confidence: 94%
Finding: The observe step includes extracting user feedback from conversations, but the skill does not clearly disclose this data use or explain retention, scope, and safeguards. In context, a self-improving system that mines conversations for optimization data creates privacy and trust risks, especially if that information is persisted into memory or used to alter future behavior.

VirusTotal

66/66 vendors flagged this skill as clean.
