Safe Self-Improvement

Security checks across malware telemetry and agentic risk

Overview

This skill is mostly local and transparent, but it needs review because it automatically keeps persistent learning logs and includes an undocumented promotion-gate bypass.

Review the scripts before installing, especially `promotion-gate.sh`. Use this only in non-sensitive workspaces, periodically inspect `.learnings`, avoid logging secrets, and approve core-file promotions only after seeing the exact proposed change.
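The periodic inspection of `.learnings` recommended above can be partly automated. The following is a minimal sketch, not part of the skill: the function name `scan_learnings` and the three regular expressions are illustrative assumptions, and a match only means a line deserves a manual look.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumptions, not an exhaustive secret detector).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(?i)\b(?:password|passwd|secret|token|api[_-]?key)\b\s*[:=]\s*\S+"
    ),
}


def scan_learnings(root):
    """Return (path, line_number, pattern_name) for each suspicious line under root."""
    hits = []
    root = Path(root)
    if not root.is_dir():
        # Nothing to scan if the log directory does not exist.
        return hits
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Decode leniently so a stray binary file does not abort the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    hits.append((str(path), line_number, name))
    return hits


if __name__ == "__main__":
    for path, line_number, name in scan_learnings(".learnings"):
        print(f"{path}:{line_number}: possible {name}")
```

Run from the workspace root, this prints one line per suspicious entry. An empty result does not prove the logs are free of sensitive text; it only means none of the sample patterns matched.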

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (4)

Description-Behavior Mismatch

High
Confidence: 98%
Finding
The script advertises a mandatory human-approval gate, but the `force` command bypasses `cmd_check` and still records a promotion as if it were approved. In a self-improvement/promotion workflow, this undermines the core security control by allowing unreviewed learnings to be promoted despite rate limits and cooldowns, making policy enforcement effectively optional.
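To make the failure mode concrete, here is a hedged sketch of a gate in which the forced path still runs the policy check and can never skip human approval. It is written in Python for illustration and is not the code of `promotion-gate.sh`; the class `PromotionGate`, its fields, and the limit values are all hypothetical. Only the idea of a `force` option and a check step comes from the finding.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PromotionGate:
    """Hypothetical gate; limits and field names are illustrative assumptions."""

    max_per_day: int = 3
    cooldown_seconds: float = 3600.0
    history: list = field(default_factory=list)  # one dict per recorded promotion

    def check(self, now):
        """Return a list of policy violations; an empty list means no violation."""
        violations = []
        recent = [r for r in self.history if now - r["time"] < 86400]
        if len(recent) >= self.max_per_day:
            violations.append("rate limit reached")
        if self.history and now - self.history[-1]["time"] < self.cooldown_seconds:
            violations.append("cooldown active")
        return violations

    def promote(self, learning_id, approved_by=None, force=False, now=None):
        now = time.time() if now is None else now
        # Human approval is checked first and cannot be overridden by force.
        if not approved_by:
            raise PermissionError("human approval is required")
        # The policy check always runs, even on the forced path.
        violations = self.check(now)
        if violations and not force:
            raise PermissionError("; ".join(violations))
        record = {
            "id": learning_id,
            "time": now,
            "approved_by": approved_by,
            # A forced promotion is labeled as such, never as a normal approval.
            "status": "forced" if violations else "approved",
            "violations": violations,
        }
        self.history.append(record)
        return record
```

In this sketch an override stays possible, but it requires a named approver and leaves a record that lists which limits were exceeded, so the log cannot show a forced promotion as an ordinary approval.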

Intent-Code Divergence

High
Confidence: 96%
Finding
The header comments state the script enforces human approval and blocks unsafe promotion behavior, but the implementation contains a bypass that skips that enforcement. This documentation/behavior mismatch is dangerous because operators and downstream automation may rely on the stated guarantees while an attacker or careless user can invoke the hidden weaker path.

Vague Triggers

Medium
Confidence: 84%
Finding
The skill is designed to trigger in many routine situations, including ordinary failures, corrections, and missing-feature requests. Broad activation increases the chance of excessive logging of user content, operational context, and potentially sensitive text, especially because the skill is explicitly autonomous between sessions.

Vague Triggers

Medium
Confidence: 90%
Finding
Automatic triggers based on common conversational phrases like corrections and feature requests are prone to false positives in normal dialogue. That can cause silent retention of incidental user statements and create an avoidable privacy risk, especially in a self-improvement skill that accumulates cross-session memory.

VirusTotal

64/64 vendors flagged this skill as clean.
