Safe Self-Improvement

Security checks across malware telemetry and agentic risk

Overview

This skill is mostly local and transparent, but it needs review because it automatically keeps persistent learning logs and includes an undocumented promotion-gate bypass.

Review the scripts before installing, especially promotion-gate.sh. Use this only in non-sensitive workspaces, periodically inspect .learnings, avoid logging secrets, and approve core-file promotions only after seeing the exact proposed change.
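A periodic inspection of .learnings could be partly automated. The following is a minimal sketch, not part of the skill: it assumes the logs are plain-text files under a `.learnings` directory, and the three regular expressions are illustrative assumptions rather than a complete secret-detection rule set.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration only; real secret scanners use far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|token|password|secret)\b\s*[:=]\s*\S{8,}"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_learnings(root=".learnings"):
    """Scan every regular file under root; return {path: hits} for files with hits."""
    results = {}
    base = Path(root)
    if not base.is_dir():
        return results
    for path in sorted(base.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = scan_text(text)
            if hits:
                results[str(path)] = hits
    return results


if __name__ == "__main__":
    for path, hits in scan_learnings().items():
        for lineno, name in hits:
            print(f"{path}:{lineno}: possible {name}")
```

A clean run does not prove the logs are free of sensitive text; it only catches strings that match these few patterns, so manual review is still needed.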

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (4)

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The script advertises a mandatory human-approval gate, but the `force` command bypasses `cmd_check` and still records a promotion as if it were approved. In a self-improvement/promotion workflow, this undermines the core security control by allowing unreviewed learnings to be promoted despite rate limits and cooldowns, making policy enforcement effectively optional.
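For contrast, a gate without such a bypass might look like the sketch below. It is written in Python for illustration and is not the reviewed shell script: `PromotionGate`, `MAX_PROMOTIONS_PER_DAY`, and `COOLDOWN_SECONDS` are hypothetical names and values, and `check` only stands in for the role this finding attributes to `cmd_check`. The point it shows is that a single code path records promotions, and that path always runs the check first.

```python
import time

# Hypothetical limits; the reviewed script's actual values are not shown in this report.
MAX_PROMOTIONS_PER_DAY = 3
COOLDOWN_SECONDS = 3600
DAY_SECONDS = 86400


class PromotionGate:
    def __init__(self, clock=time.time):
        self._clock = clock  # injectable clock so the gate can be tested
        self.history = []    # (timestamp, learning_id, status) for recorded promotions

    def check(self, human_approved):
        """Return (ok, reason) after applying approval, rate-limit, and cooldown rules."""
        now = self._clock()
        if not human_approved:
            return False, "human approval missing"
        recent = [t for t, _, _ in self.history if now - t < DAY_SECONDS]
        if len(recent) >= MAX_PROMOTIONS_PER_DAY:
            return False, "daily rate limit reached"
        if self.history and now - self.history[-1][0] < COOLDOWN_SECONDS:
            return False, "cooldown active"
        return True, "ok"

    def promote(self, learning_id, human_approved):
        """The only method that records a promotion; it has no unchecked 'force' variant."""
        ok, reason = self.check(human_approved)
        if not ok:
            return False, reason
        self.history.append((self._clock(), learning_id, "approved"))
        return True, reason
```

If an emergency override is genuinely required, one option under the same assumptions would be to record it with a distinct status such as "forced" so that audits can tell it apart from an approved promotion.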

Intent-Code Divergence

High

Confidence: 96%
Finding: The header comments state the script enforces human approval and blocks unsafe promotion behavior, but the implementation contains a bypass that skips that enforcement. This documentation/behavior mismatch is dangerous because operators and downstream automation may rely on the stated guarantees while an attacker or careless user can invoke the hidden weaker path.

Vague Triggers

Medium

Confidence: 84%
Finding: The skill is designed to trigger in many routine situations, including ordinary failures, corrections, and missing-feature requests. Broad activation increases the chance of excessive logging of user content, operational context, and potentially sensitive text, especially because the skill is explicitly autonomous between sessions.

Vague Triggers

Medium

Confidence: 90%
Finding: Automatic triggers based on common conversational phrases like corrections and feature requests are prone to false positives in normal dialogue. That can cause silent retention of incidental user statements and create an avoidable privacy risk, especially in a self-improvement skill that accumulates cross-session memory.
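The false-positive risk can be illustrated with a small sketch. The phrase list below is hypothetical, since the skill's actual trigger list is not reproduced in this report, and the explicit `/learn` prefix is an assumed alternative rather than something the skill provides.

```python
# Hypothetical trigger phrases; the skill's actual trigger list is not reproduced here.
BROAD_TRIGGERS = ("no, that's wrong", "actually", "can you also", "i wish")


def broad_trigger(message):
    """Fire whenever any common conversational phrase appears anywhere in the message."""
    text = message.lower()
    return any(phrase in text for phrase in BROAD_TRIGGERS)


def explicit_trigger(message, prefix="/learn"):
    """Fire only when the user opts in with an explicit prefix (an assumed convention)."""
    return message.strip().lower().startswith(prefix)


if __name__ == "__main__":
    incidental = "Actually, my password is on the sticky note"
    print(broad_trigger(incidental))      # the broad matcher would log this message
    print(explicit_trigger(incidental))   # the opt-in matcher would not
```

Substring matching on everyday words captures incidental statements, which is the retention risk the finding describes; an opt-in marker trades convenience for predictability.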

VirusTotal

64/64 vendors flagged this skill as clean.
