Self Improving Agent

Security checks across malware telemetry and agentic risk

Overview

This skill is transparent about self-improvement logging, but it gives agents broad long-term memory and cross-session sharing workflows that could retain or spread sensitive information.

Review before installing. Use this only if you want agents to keep persistent learning notes, and configure it so logs are sanitized summaries rather than raw prompts, commands, error output, tokens, personal data, or customer information. Disable cross-session transcript sharing unless explicitly needed and authorized, and require human approval before anything is promoted into future instruction or memory files.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (14)

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The guide expands a narrowly scoped self-improvement skill into writing or promoting content into broader injected prompt files such as AGENTS.md, SOUL.md, and TOOLS.md. Because those files are loaded as session context, this creates a path for persistent prompt-surface modification beyond simple error logging, which can unintentionally amplify bad instructions or poisoned learnings across future sessions.

Context-Inappropriate Capability

Medium

Confidence: 85%
Finding: The documentation introduces cross-session tools such as session listing, transcript access, messaging, and agent spawning even though the skill's stated purpose is self-improvement logging. This broadens the capability surface to include coordination and data movement across sessions, increasing the risk of unintended information sharing, privilege creep, or misuse of the skill as an orchestration channel.

Vague Triggers

Medium

Confidence: 87%
Finding: The automatic triggers match very common conversational phrases and instruct logging of corrections, feature requests, and knowledge gaps. In practice this can cause routine chat content to be persistently recorded without meaningful user intent, increasing the chance that sensitive or proprietary text is stored and later propagated.
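One way to narrow such triggers is to log only on an explicit opt-in phrase rather than on everyday wording. The sketch below is illustrative only: the phrases in `EXPLICIT_TRIGGERS` and the function `should_log` are hypothetical and are not taken from the skill itself.

```python
import re

# Hypothetical explicit opt-in phrases; the skill's real trigger list
# is not reproduced in this report.
EXPLICIT_TRIGGERS = (
    re.compile(r"\blog (this|that) (as a )?(learning|correction)\b", re.IGNORECASE),
    re.compile(r"\bremember this for next time\b", re.IGNORECASE),
)


def should_log(message: str) -> bool:
    """Return True only when the user explicitly asks for a persistent note."""
    return any(pattern.search(message) for pattern in EXPLICIT_TRIGGERS)


if __name__ == "__main__":
    print(should_log("Please log this as a learning: use pnpm here"))
    print(should_log("Actually, that's wrong"))
```

With this shape, a routine correction such as "Actually, that's wrong" is not recorded unless the user also asks for it to be kept.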

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill encourages persistent logging of learnings, errors, corrections, and requests but does not warn against storing secrets, personal data, or confidential project details. Because these logs are durable and designed for later reuse, accidental capture can turn ephemeral sensitive content into a standing disclosure risk.

Missing User Warnings

Medium

Confidence: 95%
Finding: The inter-session communication section promotes reading other sessions' transcripts and forwarding learnings without any privacy or authorization boundary. That creates a direct pathway for sensitive conversation data to move across sessions and agents, increasing exposure well beyond the original context.
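An authorization boundary of the kind this finding says is missing could look like the default-deny check below. This is a minimal sketch under stated assumptions: `SharingPolicy`, its fields, and the session IDs are hypothetical and do not correspond to any API of the skill or its host platform.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SharingPolicy:
    """Hypothetical default-deny policy for reading another session's transcript."""

    # Cross-session sharing stays off unless an operator switches it on.
    enabled: bool = False
    # Maps a requesting session ID to the session IDs it may read.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def allow(self, requester: str, target: str) -> None:
        """Record an explicit grant from an authorized operator."""
        self.grants.setdefault(requester, set()).add(target)

    def can_read(self, requester: str, target: str) -> bool:
        """A session may always read itself; anything else needs a grant."""
        if requester == target:
            return True
        if not self.enabled:
            return False
        return target in self.grants.get(requester, set())
```

The design choice here is that both conditions must hold: the feature is enabled globally and a specific grant exists for the requesting session.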

Missing User Warnings

Medium

Confidence: 96%
Finding: The error logging template explicitly asks for commands attempted, inputs, parameters, and environment details, all of which commonly contain secrets or internal identifiers. Without strong redaction guidance, the template normalizes copying raw operational context into durable markdown files where secrets may be exposed or later committed.
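Redaction guidance of the kind this finding calls for could be enforced with a small filter applied before anything is written to a log file. The patterns below are a rough sketch, not a complete secret scanner: `REDACTION_RULES` and `redact` are hypothetical names, and real deployments would need patterns tuned to their own credential formats.

```python
import re

# Illustrative rules only; each pair is (pattern, replacement).
REDACTION_RULES = [
    # key=value or key: value pairs whose key looks secret-bearing
    (
        re.compile(r"(?i)(api[_-]?key|token|secret|password)\b(\s*[=:]\s*)\S+"),
        r"\1\2[REDACTED]",
    ),
    # HTTP bearer credentials
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # email addresses
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
]


def redact(text: str) -> str:
    """Apply every redaction rule in order and return the sanitized text."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact("API_KEY=sk-12345 deploy --notify bob@example.com"))
```

A filter like this reduces accidental capture but does not replace the report's recommendation to log sanitized summaries instead of raw commands and output.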

Vague Triggers

Medium

Confidence: 87%
Finding: The template asks for a 'concise description' with trigger conditions, but it does not require those triggers to be precise, bounded, or disambiguated from other skills. In an agentic system, vague activation criteria can cause inappropriate skill invocation, making it easier for low-quality or adversarial skill content to be selected in the wrong context and influence behavior unexpectedly.

Vague Triggers

Medium

Confidence: 92%
Finding: The minimal template further weakens activation safety by allowing a description of only 'what this skill does and when to use it' without any required specificity or guardrails. Because this repository is for self-improvement and operational learnings, loosely scoped skills are more likely to be promoted and then over-applied, increasing the chance of harmful automation, prompt injection propagation, or incorrect corrective behavior across sessions.

Vague Triggers

Medium

Confidence: 82%
Finding: The standard detection triggers are broad terms like user corrections, command failures, API errors, and knowledge gaps without precise thresholds or scope boundaries. In a self-improvement skill, vague triggers can cause over-activation, leading to excessive logging, accidental capture of sensitive context, and unreviewed persistence of transient or adversarial inputs.

Vague Triggers

Medium

Confidence: 84%
Finding: The OpenClaw-specific trigger table maps loosely defined events like tool call error, session handoff confusion, and model behavior surprise to writes into persistent prompt or memory locations. Without explicit constraints, benign anomalies or attacker-induced situations could be transformed into durable instructions or notes that affect future behavior.

Ssd 3

Medium

Confidence: 94%
Finding: The skill instructs agents to record user corrections, requests, and operational errors into persistent logs, which naturally captures free-form language that may contain sensitive business, personal, or security-relevant details. Because the content is retained for future processing and promotion, the retention risk is broader than a transient conversation artifact.

Ssd 3

High

Confidence: 97%
Finding: The cross-session sharing features explicitly allow viewing another session's transcript and sending learnings to another session. That creates a clear confidentiality risk because transcripts may contain secrets, user data, or privileged context that should not be broadly accessible or re-shared.

Ssd 3

High

Confidence: 97%
Finding: The learning templates ask for full context, what was wrong, corrective details, related files, and other metadata. This encourages dumping rich conversational and operational context into durable storage, which can easily include secrets, internal architecture details, or sensitive user information.

Ssd 4

High

Confidence: 96%
Finding: The skill not only stores interaction-derived content in learning logs, but also instructs promoting that content into durable agent memory and prompt files such as CLAUDE.md, AGENTS.md, SOUL.md, and TOOLS.md. This creates a gradual leakage pathway where initially local conversation details become embedded into persistent instruction surfaces that influence future sessions and may be more widely exposed.
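The human-approval requirement recommended in the Overview could be expressed as a gate in front of any promotion step. The sketch below assumes a hypothetical `Learning` record and `promote` function; it appends to an in-memory list standing in for a prompt file, and nothing in it reflects how the skill actually writes to CLAUDE.md or AGENTS.md.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Learning:
    """Hypothetical record for one candidate learning."""

    summary: str
    # Name of the human reviewer; None means not yet reviewed.
    approved_by: Optional[str] = None


def promote(learning: Learning, target_lines: List[str]) -> bool:
    """Append the learning to the target only if a human has approved it.

    target_lines stands in for the contents of a prompt or memory file.
    Returns True when the learning was promoted, False when it was held back.
    """
    if not learning.approved_by:
        return False
    target_lines.append(f"- {learning.summary} (approved by {learning.approved_by})")
    return True
```

Unapproved learnings stay in the local log and never reach an instruction surface that future sessions load.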

VirusTotal

64/64 vendors flagged this skill as clean.
