Safe Self Improving

Security checks across malware telemetry and agentic risk

Overview

The skill is local and consent-oriented overall, but it needs Review because it provides a path from private learning notes to generated skills that can be installed or published, despite promising that writes stay only in `.learnings/`.

Install only if you are comfortable with a skill that records project learning notes and can synthesize them into new skill drafts. Before using the skill-generation feature, review generated content closely, avoid installing generated skills directly into your active skills directory unless you understand their instructions, and do not publish exports or generated skills if the learning records may contain private project details.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (8)

Description-Behavior Mismatch

Medium

Confidence: 86%
Finding: The changelog documents a trust-boundary expansion from storing data only in `.learnings/` to also generating `skill-drafts/` artifacts and exports, which weakens the original containment guarantee. In a self-improvement skill, additional output locations and export paths increase the chance of unintended persistence, disclosure, or use of derived content beyond the narrowly stated logging purpose.

Context-Inappropriate Capability

Medium

Confidence: 80%
Finding: Auto-generating new skill drafts adds artifact-synthesis capability that goes beyond passive learning capture and review, creating a pathway for the system to produce new operational content from accumulated records. Even with stated confirmation requirements, this broadens the attack surface because sensitive or unsafe patterns could be propagated into generated drafts or used to influence future agent behavior.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The documented guarantee that all data stays in `.learnings/` is contradicted by the later addition of a `skill-drafts/` directory, making the skill's safety and privacy assurances unreliable. Contradictory storage claims are dangerous because users and downstream reviewers may make trust decisions based on incomplete or outdated data-flow expectations.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The skill claims it only writes inside `.learnings/`, but the documented flow explicitly allows saving generated drafts into `skills/` and publishing them to ClawHub. This breaks the stated containment boundary and creates a path from local note-taking to code/skill installation or external distribution, which materially increases risk if generated content is unsafe or prompt-influenced.

Intent-Code Divergence

High

Confidence: 98%
Finding: The hard safety rule says the skill writes only under `.learnings/`, but other sections permit writing to `skills/` and publishing externally. Contradictory safety guarantees are dangerous because users and downstream agents may rely on the stricter promise while the workflow actually enables broader filesystem modification and data egress.
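
The containment rule described in this finding can be sketched as a path check run before any write. This is a minimal illustration, not the skill's actual code: the function name `is_contained`, the `project_root` parameter, and the example paths are assumptions made for the sketch.

```python
from pathlib import Path


def is_contained(target: str, project_root: str, allowed_dir: str = ".learnings") -> bool:
    """Return True only if `target` resolves inside project_root/allowed_dir.

    Hypothetical helper: resolves both paths first so that `..` segments
    cannot be used to step outside the allowed directory.
    """
    root = (Path(project_root) / allowed_dir).resolve()
    candidate = (Path(project_root) / target).resolve()
    return candidate == root or root in candidate.parents


# Example paths are illustrative only.
print(is_contained(".learnings/notes.md", "/tmp/proj"))
print(is_contained("skills/generated/SKILL.md", "/tmp/proj"))
print(is_contained(".learnings/../skills/x.md", "/tmp/proj"))
```

Under a check like this, writes to `skills/` or `skill-drafts/` would be rejected unless the stated guarantee were explicitly widened to list those directories.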

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The skill repeatedly states that all operations require user confirmation, but the export workflow describes creating Markdown and JSON archive files without an approval step before writing them. This can lead to unintended persistence of sensitive summarized data, even if sanitized, and undermines the consent-based safety model advertised by the skill.
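
One way to close the gap described in this finding is to put an approval step in front of every export write. The sketch below is an assumption about how such a gate might look, not the skill's implementation: `export_with_consent` and the `confirm` callback are hypothetical names, and only the JSON export is shown.

```python
import json
from pathlib import Path
from typing import Callable


def export_with_consent(records: list, out_path: Path, confirm: Callable[[str], bool]) -> bool:
    """Write `records` as JSON only after `confirm` approves the exact target.

    `confirm` is a hypothetical callback that shows the prompt to the user
    and returns True on approval. Nothing is created on disk if it declines.
    """
    prompt = f"Write {len(records)} record(s) to {out_path}?"
    if not confirm(prompt):
        return False
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return True
```

Passing the destination path into the prompt means the user approves a specific file, rather than a general "export" action whose output location is unclear.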

Vague Triggers

Medium

Confidence: 85%
Finding: The listed trigger phrases are broad, natural-language expressions such as '怎么改进' ("how to improve") and '打个分' ("give it a score") that could plausibly appear in normal conversation and unintentionally invoke the skill. In an agent environment, ambiguous activation can lead to unintended recording, analysis, export, or draft-generation actions, especially if the runtime binds these phrases directly to tool behavior.

Vague Triggers

Medium

Confidence: 83%
Finding: Several trigger phrases are broad everyday language such as '记下来' ("write this down"), '最佳实践' ("best practices"), or '怎么改进' ("how to improve"), which can be used naturally in conversation without intending to invoke the skill. In a tool-using agent, ambiguous triggers can cause unintended file reads, writes, or analysis actions based on casual phrasing.
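
The difference between broad phrase matching and an explicit invocation can be illustrated with a short sketch. The phrase list is taken from the findings above; the `/learn` command prefix and both function names are hypothetical, chosen only to show a narrower trigger, and how the real runtime binds triggers is not documented here.

```python
# Phrases quoted in the findings above.
BROAD_PHRASES = ("记下来", "最佳实践", "怎么改进", "打个分")

# Hypothetical explicit command prefix, not part of the scanned skill.
COMMAND_PREFIX = "/learn"


def substring_trigger(message: str) -> bool:
    """Broad matching: fires whenever any phrase appears anywhere in the message."""
    return any(phrase in message for phrase in BROAD_PHRASES)


def explicit_trigger(message: str) -> bool:
    """Narrow matching: fires only when the message starts with the command prefix."""
    parts = message.strip().split(maxsplit=1)
    return bool(parts) and parts[0] == COMMAND_PREFIX


casual = "我们怎么改进这个流程?"  # ordinary question: "how do we improve this process?"
print(substring_trigger(casual), explicit_trigger(casual))
```

With the casual question above, the substring matcher fires while the explicit matcher does not, which is the unintended-activation risk these two findings describe.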

VirusTotal

64/64 vendors flagged this skill as clean.
