Agentic Loop Upgrade

Security checks across malware telemetry and agentic risk

Overview

This skill is not clearly malicious, but it changes the agent loop in high-impact ways and its safety controls do not reliably require explicit approval before risky actions proceed.

Install only after reviewing the Mode settings and using it first on a non-production agent. Treat the approval gate as fail-open by default, assume planning and summarization may send task details, tool outputs, and file paths to the configured LLM provider, and avoid enabling it where unattended high-impact actions or sensitive local project data are present.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (55)

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The design adds persistent local session storage under a user home directory and stores full state/history, but the skill metadata does not disclose this behavior. In a multi-tenant or shared-host setting, undocumented persistence increases the chance of sensitive prompts, plans, tool outputs, or identifiers being retained longer than expected and accessed by other processes, backups, or operators.

Intent-Code Divergence

Medium

Confidence: 86%
Finding: The manifest minimizes credential risk by saying the skill does not request, store, or transmit credentials, yet it explicitly operates with inherited host auth and can use any provider/model the host can access. That means the skill can cause actions and data flows under the host's existing privileges, so the safety statement is materially incomplete and could mislead users into underestimating exposure.

Intent-Code Divergence

Medium

Confidence: 80%
Finding: The manifest asserts the wrapper never bypasses or short-circuits the core runner, but a runner wrapper with orchestration, planning, and context injection still sits in a privileged control point that can alter inputs, ordering, retries, and effective execution behavior. Even if the original runner is eventually called, this placement can materially change security outcomes, so the claim is stronger than what the architecture safely guarantees.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The main run() path is presented as the enhanced orchestrator, but it does not actually invoke real tool execution, confidence gates, or recovery logic and instead uses mock placeholders. In an agent skill, this mismatch is dangerous because callers may rely on advertised safeguards and behaviors that never occur, causing silent security-control bypass, incorrect automation assumptions, and unsafe deployment decisions.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The file-level description and exported capability claims state that the loop provides parallel execution, confidence gates, and semantic error recovery, but the primary run() flow does not perform those operations. This is a security-relevant integrity issue because downstream users may trust the component to enforce safety checks that are only available in auxiliary methods and not on the main execution path.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The implementation contradicts the stated approval model: timed-out approval requests are treated as allowed to proceed. That means high-risk or critical operations can execute without explicit human consent if the approver is unavailable, misses the prompt, or an attacker deliberately causes delay.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The retry engine automatically rewrites failed exec commands to run under `sudo`, which changes the privilege boundary of the original action rather than safely retrying it. In an agentic execution loop, this can cause commands that were intended to run unprivileged to be re-executed as root without explicit approval, greatly increasing the blast radius of mistakes, prompt injection, or unsafe tool inputs.
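One way to keep retries inside the original privilege boundary is to reject any proposed retry that adds a privilege wrapper the original command did not have. The following is a minimal sketch, not the skill's code: `PRIVILEGE_WRAPPERS`, `is_privileged`, and `plan_retry` are hypothetical names, and the check only inspects the first token, so it would not catch indirect forms such as `sh -c "sudo ..."`.

```python
import shlex

# Hypothetical list of wrappers that change the effective user.
PRIVILEGE_WRAPPERS = {"sudo", "doas", "su", "pkexec"}


def is_privileged(command: str) -> bool:
    """Return True if the command's first token is a privilege wrapper."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in PRIVILEGE_WRAPPERS


def plan_retry(original: str, proposed: str) -> str:
    """Accept a retry only if it does not add privileges the original lacked."""
    if is_privileged(proposed) and not is_privileged(original):
        raise PermissionError(
            "retry would escalate privileges; explicit approval is required"
        )
    return proposed
```

With this shape, a permission error surfaces to the operator as a decision to make rather than being resolved silently by re-running the command as root.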

Context-Inappropriate Capability

High

Confidence: 99%
Finding: On command-not-found errors, the engine constructs and may execute `apt-get install -y` or `npm install -g` commands automatically based on the failed command name. That turns error recovery into unaudited software installation and shell execution, which is dangerous because it can alter the host environment, install untrusted packages, and be abused via crafted command inputs.
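A safer recovery shape is to turn a missing command into a suggestion that a human must approve, and to reject any name that is not a plain package-like token. This sketch is illustrative only: `SAFE_NAME`, `InstallSuggestion`, and `suggest_install` are hypothetical, and the name pattern is an assumption rather than any package manager's official rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed conservative pattern: lowercase token, no spaces or shell metacharacters.
SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9+._-]{0,63}")


@dataclass(frozen=True)
class InstallSuggestion:
    package: str
    requires_approval: bool = True  # the suggestion is never executed automatically


def suggest_install(missing_command: str) -> Optional[InstallSuggestion]:
    """Turn a missing command name into a suggestion, never an executed command."""
    if not SAFE_NAME.fullmatch(missing_command):
        return None  # crafted input such as "jq; rm -rf /" is rejected
    return InstallSuggestion(package=missing_command)
```

Returning data instead of a shell string keeps the install decision, and the choice of package source, with the operator.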

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The code automatically harvests API credentials from environment variables and multiple local OpenClaw auth files, even when no explicit key is supplied. In an agent skill context, this expands access to secrets beyond what a user may expect and enables silent use of locally stored credentials for outbound model calls.
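The opposite default, sketched below, is to use only a key the caller passes explicitly and to make any environment fallback an opt-in. The function name `resolve_api_key`, the flag `allow_env_fallback`, and the variable `EXAMPLE_LLM_API_KEY` are all hypothetical and do not come from the skill.

```python
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str],
    env: Mapping[str, str],
    allow_env_fallback: bool = False,
    env_var: str = "EXAMPLE_LLM_API_KEY",  # hypothetical variable name
) -> str:
    """Prefer an explicit key; read one named variable only when opted in."""
    if explicit_key:
        return explicit_key
    if allow_env_fallback and env.get(env_var):
        return env[env_var]
    # No scanning of auth files or other variables: fail loudly instead.
    raise LookupError("no API key supplied and fallback discovery is disabled")
```

Passing `os.environ` as `env` is the expected usage; taking it as a parameter keeps the lookup visible at the call site and easy to test.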

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The header comment says risky tool calls 'wait for human response or proceeds after timeout,' which documents a fail-open approval model for risky actions. In this skill context, high-risk and critical tools may execute without explicit approval if no human responds in time, undermining the safety boundary the approval gate is supposed to provide.

Intent-Code Divergence

High

Confidence: 99%
Finding: The comment states critical operations 'require explicit approval,' but the default configuration sets autoDenyCritical to false and treats timed-out requests as approval to proceed. That mismatch is dangerous because operators may believe critical actions cannot run without consent, while in reality they can execute after 10 seconds with no approval.
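For contrast, a fail-closed gate treats a timeout the same as a refusal. The sketch below models the approver as a queue of boolean answers; `Decision` and `wait_for_approval` are hypothetical names, not the skill's API, and the timeout is whatever the caller passes rather than the 10-second default described above.

```python
import queue
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    DENIED = "denied"


def wait_for_approval(responses: "queue.Queue[bool]", timeout_s: float) -> Decision:
    """Block until a human answers; treat silence as a denial (fail-closed)."""
    try:
        answer = responses.get(timeout=timeout_s)
    except queue.Empty:
        # No answer within the window: deny instead of proceeding.
        return Decision.DENIED
    return Decision.APPROVED if answer else Decision.DENIED


# Usage: an empty queue models an approver who never responds.
silent_approver: "queue.Queue[bool]" = queue.Queue()
result = wait_for_approval(silent_approver, timeout_s=0.05)
```

Under this design an unavailable approver, a missed prompt, or a deliberately induced delay all end in a denial, so the only path to execution is an explicit yes.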

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The retry engine automatically rewrites failed exec commands to run under sudo when it detects a permission-related error. That creates an implicit privilege-escalation path where untrusted or mistaken commands can be re-executed with elevated privileges without explicit approval, greatly increasing the blast radius of command execution.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: On command-not-found errors, the engine constructs a new exec command that may invoke apt-get install or npm install -g automatically based on the missing command token. This permits unattended software installation, expands supply-chain risk, and can alter the host environment based on attacker-influenced command strings or accidental failures.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The code automatically harvests API credentials from environment variables and multiple local auth-profile files, broadening its access to secrets beyond what a lightweight caller strictly needs. In an agent skill focused on orchestration/UI, this undisclosed credential discovery increases the blast radius if the skill is invoked in an unexpected context or if message routing/base URLs are influenced elsewhere.

Vague Triggers

Medium

Confidence: 82%
Finding: The intent-detection triggers are broad enough to match ordinary conversational phrases like 'help me', 'I need to', or 'what's the best way', which can activate planning or orchestration unexpectedly. In this skill's context, false activation is more dangerous than in a passive assistant because activation can lead to prompt injection, state persistence, checkpoint creation, and potentially gated tool workflows that alter agent behavior without clear user intent.
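To illustrate why such phrases over-match, the sketch below compares a phrase-based trigger with a dedicated command prefix. It assumes case-insensitive substring matching, which the finding does not confirm, and the `/plan` prefix is hypothetical; only the three phrases are taken from the finding.

```python
# Phrases quoted in the finding; the matching strategy is an assumption.
BROAD_PHRASES = ("help me", "i need to", "what's the best way")


def broad_trigger(message: str) -> bool:
    """Substring matching: fires on ordinary conversation."""
    text = message.lower()
    return any(phrase in text for phrase in BROAD_PHRASES)


def explicit_trigger(message: str, prefix: str = "/plan") -> bool:
    """Fires only when the message starts with a dedicated command."""
    parts = message.strip().split(maxsplit=1)
    return bool(parts) and parts[0] == prefix


casual = "Can you help me spell 'necessary'?"
casual_fires_broad = broad_trigger(casual)        # True
casual_fires_explicit = explicit_trigger(casual)  # False
```

An explicit prefix trades some convenience for a clear signal of user intent before planning, persistence, or gated tool workflows begin.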

Missing User Warnings

Medium

Confidence: 93%
Finding: The recovery logic explicitly proposes state-changing actions such as creating missing files and substituting shell commands (for example changing an install command) without any approval gate, risk classification, or user-visible confirmation. In an autonomous loop, this can cause unintended filesystem changes or execution of more permissive commands based only on error text, increasing the chance of unsafe side effects or abuse through crafted failures.

Missing User Warnings

Medium

Confidence: 89%
Finding: The design sends tool name, arguments, error messages, and relevant context into an LLM diagnosis prompt, but the documentation does not require redaction, minimization, or disclosure that these potentially sensitive values are being transmitted to a model. Tool arguments and errors often contain secrets, file paths, tokens, internal URLs, or user data, so this creates a realistic confidentiality risk, especially in agent frameworks that may use remote LLM providers.
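A redaction pass before the diagnosis prompt is built would reduce this exposure. The rules below are a minimal sketch with assumed patterns (key-value secrets, bearer tokens, and home-directory user names); `REDACTION_RULES` and `redact` are hypothetical, and pattern-based redaction will miss secrets that do not match these shapes.

```python
import re

# Assumed patterns; a real deployment would need a broader, tested rule set.
REDACTION_RULES = [
    # key=value or key: value pairs whose key looks secret-bearing
    (re.compile(r"(?i)(api[_-]?key|token|password|secret)(\s*[=:]\s*)\S+"),
     r"\1\2[REDACTED]"),
    # HTTP bearer credentials
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # user names embedded in home-directory paths
    (re.compile(r"(/(?:home|Users)/)[^/\s]+"), r"\1[USER]"),
]


def redact(text: str) -> str:
    """Apply every redaction rule in order and return the scrubbed text."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Applying `redact` to tool arguments and error messages before they are interpolated into a prompt keeps the diagnosis useful while removing the most obvious secrets.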

Missing User Warnings

Medium

Confidence: 95%
Finding: The persistence example writes the full context object, including metadata and history, to disk with no filtering or disclosure. Since metadata may contain user input, questions, plans, tool errors, or other sensitive operational details, this creates a realistic privacy and data-retention risk, especially on shared machines, in backups, or in multi-tenant deployments.
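A filtered alternative persists only an allowlist of fields and creates the file with owner-only permissions. This is a sketch under assumptions: the field names in `PERSISTED_FIELDS` and the function `persist_context` are hypothetical, the `0o600` mode applies only when the file is created, and it has little effect on Windows.

```python
import json
import os
import tempfile

# Hypothetical allowlist; history and metadata are deliberately excluded.
PERSISTED_FIELDS = ("session_id", "state", "updated_at")


def persist_context(context: dict, path: str) -> None:
    """Write only allowlisted fields to a file readable by the owner only."""
    minimal = {key: context[key] for key in PERSISTED_FIELDS if key in context}
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(minimal, handle)


# Usage: history and metadata are dropped before anything reaches disk.
target = os.path.join(tempfile.mkdtemp(), "session.json")
persist_context(
    {
        "session_id": "s1",
        "state": "planning",
        "history": ["raw user prompt"],
        "metadata": {"question": "contains user input"},
    },
    target,
)
```

An allowlist fails safe when new fields are added to the context object, whereas a denylist would silently start persisting them.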

Missing User Warnings

Medium

Confidence: 94%
Finding: The logging observer serializes and prints arbitrary transition metadata, which is populated from inputs, plans, questions, results, and errors throughout the state machine. This can leak sensitive user-derived or operational data into logs that are commonly centralized, retained, and accessible to support staff or other tenants.

Missing User Warnings

Medium

Confidence: 93%
Finding: The dashboard observer broadcasts session-linked state changes together with arbitrary metadata, potentially exposing user content and internal agent context to frontend clients or other subscribers. If broadcast authorization, tenant isolation, or metadata filtering is weak, this can disclose sensitive data across sessions or to unintended viewers.

Missing User Warnings

Medium

Confidence: 93%
Finding: The code sends raw tool results to an LLM for summarization, and tool outputs can contain sensitive data such as file contents, secrets, tokens, environment details, or proprietary project information. In an agentic loop skill, this is more dangerous because tool output is often broad and automatically collected, so data can be exfiltrated to the model provider without explicit user awareness or consent.

Missing User Warnings

Medium

Confidence: 91%
Finding: The work-session summarization sends task lists, tools used, and affected file paths to an LLM, which may reveal sensitive operational metadata even if full file contents are not included. In this skill's context, that metadata can expose repository structure, internal project names, security-relevant workflows, or confidential work activity, making silent transmission to an external model a real privacy and data-governance issue.
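If the summarizer only needs to know that distinct files of certain types were touched, paths can be replaced with opaque labels before transmission. The sketch below is one possible minimization, not something the skill implements: `minimize_path` is a hypothetical helper, and the salt is assumed to be a per-installation secret kept out of the prompt.

```python
import hashlib
from pathlib import PurePosixPath


def minimize_path(path: str, salt: str) -> str:
    """Replace a file path with a salted, truncated hash plus its extension."""
    digest = hashlib.sha256((salt + path).encode("utf-8")).hexdigest()[:8]
    suffix = PurePosixPath(path).suffix
    return f"file-{digest}{suffix}"


# Usage: the same path always maps to the same label within one installation.
label = minimize_path("/srv/acme/billing/auth.py", salt="local-secret")
```

The label preserves the file type and lets the summary refer to the same file consistently, while directory structure and project names stay local.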

Missing User Warnings

Medium

Confidence: 92%
Finding: The code sends prior conversation content to an LLM summarizer via llmCaller without any visible consent, disclosure, minimization, or trust-boundary checks. Because conversations can contain secrets, personal data, and tool outputs, this creates a real privacy and data-exposure risk whenever the configured LLM is remote or third-party.

Missing User Warnings

Medium

Confidence: 93%
Finding: The code sends raw tool results to an LLM for summarization, which can expose sensitive data from command output, files, secrets, environment details, or proprietary content to an external model provider without any explicit user disclosure or consent gate. In an agentic loop, tool outputs often aggregate high-value data, so silent forwarding meaningfully increases confidentiality risk and expands the trust boundary.

Missing User Warnings

Medium

Confidence: 89%
Finding: The work-session summarizer sends task descriptions, tools used, and affected file paths to an LLM without user-facing notice, which can leak sensitive operational metadata even if full file contents are not included. In this skill, session summaries are part of persistent context management, so the behavior may repeatedly transmit internal workflow details to a third party and create unintended data exposure.

VirusTotal

67/67 vendors flagged this skill as clean.
