Self Improving Intent Security Agent

Security checks across malware telemetry and agentic risk

Overview

This skill is a local documentation and template toolkit, with the main risk being overstated safety language rather than hidden malicious behavior.

Install only if you want local intent/audit templates and helper scripts. Do not rely on this package alone for real enforcement, rollback, anomaly detection, or self-improvement; those require separate host-runtime controls. Review any optional hook configuration before enabling it, and keep .agent logs out of shared repositories if they may contain sensitive data.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (16)

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The README says the package is not a production runtime enforcement engine, but later describes enforcement, monitoring, rollback, and safety guarantees in a way that can reasonably cause users to overestimate what the skill actually does. This kind of security overclaim is dangerous because operators may rely on nonexistent protections for high-risk workflows, resulting in unsafe deployment or reduced human oversight.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The skill initially frames itself as documentation-only and explicitly says it does not provide automatic runtime enforcement, but later describes automatic validation and rollback in a way that can lead users to overtrust built-in protections. This mismatch is dangerous because operators may deploy the skill assuming active safeguards exist when they may only be templates or optional scripts, creating a false sense of security around high-risk actions.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The documentation claims the skill keeps data local and does not transmit externally, but it also promotes hook-driven command execution, which extends behavior beyond passive local documentation and may invoke scripts with broader side effects. Even if the scripts are intended to be local, the claim is too absolute and can mislead users about the real execution surface and trust boundary.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The top-level description markets the skill as a documentation-first toolkit, but the body expands into runtime interception, hook-based validation, rollback orchestration, anomaly monitoring, and self-improving strategy mechanisms. This scope expansion can cause security reviewers and users to underestimate the operational power of the skill and enable it in environments where active control hooks were not expected.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The introduction initially states the package is documentation-first and does not provide runtime enforcement, but later sections describe blocking, rollback, and policy enforcement as if they are active capabilities. This inconsistency can mislead users into trusting nonexistent protections, causing unsafe deployment or overreliance on documentation/templates instead of real controls.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: These sections describe integrated pillars such as real-time monitoring, automatic adoption of improvements, rollback workflows, and policy enforcement in language that reads like present-tense product behavior. In a security-oriented skill, overstating defensive automation is dangerous because operators may assume high-risk actions are being validated or blocked when the repository only provides scaffolding.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The quick example presents ALLOWED/BLOCKED decisions, auto-triggered rollback, and learning outcomes as if they occur automatically in this package. Because the skill is positioned around security and self-improvement, readers may incorrectly infer that dangerous actions will be prevented in practice, increasing the chance of misuse in sensitive environments.

Intent-Code Divergence

High

Confidence: 98%
Finding: The 'Safety Guarantees' section asserts hard guarantees like intent alignment, permission boundaries, reversibility, and human oversight, directly contradicting the earlier disclaimer that no production runtime engine is provided. In a security tool, false guarantees are especially risky because they can create a dangerous illusion of enforcement, leading teams to expose autonomous agents to production workloads without real safeguards.

Description-Behavior Mismatch

High

Confidence: 94%
Finding: The architecture document describes an autonomous execution system with validation, authorization, runtime monitoring, rollback, and self-improvement, which materially exceeds the skill's declared documentation-first scope. This kind of scope mismatch is dangerous because downstream users, reviewers, or agent runtimes may treat the skill as a higher-trust operational control system than advertised, enabling unreviewed execution-oriented adoption or permissioning.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This section presents concrete live-control behaviors such as authorization, safety guardrails, checkpoint creation, rollback, monitoring, and logging as if the system actively performs them. If consumers rely on these claims, they may delegate risky tasks under the false assumption that strong runtime protections exist, creating a dangerous gap between perceived and actual enforcement.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The self-improvement, pattern extraction, strategy optimization, and A/B testing material implies adaptive behavioral evolution beyond a documentation/prototyping toolkit. That is risky because self-modifying or self-optimizing language can encourage deployment of learning loops without robust safeguards, especially when paired with earlier claims of authorization and execution control.

Vague Triggers

Medium

Confidence: 82%
Finding: Leaving the activation mechanism unspecified creates ambiguous trigger boundaries for a security-sensitive skill centered on intent validation and autonomous workflows. In practice, ambiguous invocation scope can cause the skill to be applied too broadly, too narrowly, or in unintended contexts, weakening enforcement guarantees and increasing the chance of unsafe or unauthorized actions.

Missing User Warnings

Medium

Confidence: 87%
Finding: The publishing guide instructs users to configure long-lived publishing tokens but does not warn them to keep tokens out of plaintext files, screenshots, terminal history, or CI logs. In a document aimed at operational publishing workflows, omission of basic secret-handling guidance increases the chance of accidental credential exposure and subsequent unauthorized package or skill publication.

Missing User Warnings

Medium

Confidence: 96%
Finding: The example `echo "YOUR_CLAWHUB_TOKEN" | npx clawhub login --token` encourages typing a secret directly into a shell pipeline without warning about exposure risks. Depending on shell, environment, and CI usage, the token may be captured in shell history, copied into logs, exposed to other users through process inspection, or retained in build output, enabling credential theft and unauthorized publishing.
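
One way to keep the literal token out of the typed command is to prompt for it without echo and hand it to the CLI on standard input. The following is a minimal Python sketch, not the publishing guide's own method: the helper name `login_with_prompted_token` is hypothetical, and it assumes that `npx clawhub login --token` reads the token from stdin, as the piped example in the finding implies.

```python
import getpass
import subprocess
from typing import Callable, Sequence

# Command taken from the finding. That it reads the token from stdin is an
# assumption based on the piped example, not on confirmed CLI documentation.
LOGIN_COMMAND = ("npx", "clawhub", "login", "--token")


def login_with_prompted_token(
    command: Sequence[str] = LOGIN_COMMAND,
    prompt: Callable[[str], str] = getpass.getpass,
) -> int:
    """Hypothetical helper: prompt for a token without echo and pipe it to `command`."""
    token = prompt("Publishing token: ")
    # The token is passed only on the child's stdin, so it is never part of a
    # typed shell command (history) or of any process argument list.
    result = subprocess.run(
        list(command),
        input=token + "\n",
        text=True,
        stdout=subprocess.DEVNULL,  # keep CLI output out of build logs
        check=False,
    )
    return result.returncode
```

The `command` and `prompt` parameters exist only so the sketch can be exercised without the real CLI or an interactive terminal. In CI, a secret store that injects the token at run time serves the same purpose as the interactive prompt.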

Vague Triggers

Medium

Confidence: 88%
Finding: The auto-application triggers are broad and ambiguous, covering common categories like multi-step tasks, learning opportunities, and high-risk operations without tight scoping. In practice this can cause the skill or surrounding automation to activate in many ordinary workflows, increasing logging, blocking, or intervention in contexts where it was not intentionally enabled.

Vague Triggers

High

Confidence: 98%
Finding: The hook configuration uses an empty matcher for UserPromptSubmit, which effectively causes the intent-capture script to run on every prompt. A universal trigger materially increases attack surface, makes the behavior hard to predict, and can interfere with unrelated tasks or be abused to force persistent interception across all interactions.
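
A reviewer can check for this pattern mechanically before enabling any hooks. The sketch below is illustrative only: the configuration shape (a `hooks` mapping from event name to entries with a `matcher` and nested `hooks` commands), the helper name `find_unscoped_hooks`, and the script paths are all assumptions, not the package's actual files. It also assumes that a missing matcher behaves like an empty one.

```python
from typing import Any, Dict, List, Tuple


def find_unscoped_hooks(config: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (event, command) pairs whose matcher is missing or empty."""
    findings: List[Tuple[str, str]] = []
    for event, entries in config.get("hooks", {}).items():
        for entry in entries:
            matcher = entry.get("matcher")
            # Assumption: a missing or blank matcher applies to every event.
            if matcher is None or matcher.strip() == "":
                for hook in entry.get("hooks", []):
                    findings.append((event, hook.get("command", "")))
    return findings


# Illustrative configuration; the script paths are placeholders.
example_config = {
    "hooks": {
        "UserPromptSubmit": [
            {"matcher": "", "hooks": [{"type": "command", "command": "./example-capture.sh"}]}
        ],
        "PreToolUse": [
            {"matcher": "Bash", "hooks": [{"type": "command", "command": "./example-validate.sh"}]}
        ],
    }
}

for event, command in find_unscoped_hooks(example_config):
    print(f"{event}: {command} has no matcher scope")
```

Here only the `UserPromptSubmit` entry is reported, because the `PreToolUse` entry is scoped to a named matcher.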

VirusTotal

62/62 vendors flagged this skill as clean.
