Agent Work Visibility

Security checks across malware telemetry and agentic risk

Overview

This appears to be a genuine progress-visibility tool, but it persistently modifies core agent behavior files and imposes broad behavior on future sessions, so review it before installing.

Install only if you want a persistent, system-level behavior change. Review activate.js first, understand that it edits SOUL.md or AGENTS.md and affects later sessions, keep a backup of those files, and use deactivate.js or remove the marked protocol block if you no longer want the behavior. Avoid exposing the snapshot history APIs to untrusted task IDs or output paths.
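One way to keep the recommended backup is sketched below in Node.js. It copies SOUL.md and AGENTS.md, if present, next to the originals before activate.js is run. The function name `backupInstructionFiles`, the `.bak-<timestamp>` suffix, and the assumption that both files sit directly in the workspace directory are illustrative and not taken from the skill.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// File names come from the report; their location in the workspace root is an assumption.
const INSTRUCTION_FILES = ["SOUL.md", "AGENTS.md"];

// Hypothetical helper: copy each existing instruction file to "<name><suffix>".
// COPYFILE_EXCL makes the copy fail rather than overwrite an earlier backup.
function backupInstructionFiles(workspaceDir, suffix = `.bak-${Date.now()}`) {
  const backups = [];
  for (const name of INSTRUCTION_FILES) {
    const source = path.join(workspaceDir, name);
    if (!fs.existsSync(source)) continue; // nothing to back up for this file
    const backup = source + suffix;
    fs.copyFileSync(source, backup, fs.constants.COPYFILE_EXCL);
    backups.push(backup);
  }
  return backups;
}
```

Calling `backupInstructionFiles(process.cwd())` from the workspace root before activation would return the list of backup paths, which can later be compared against the edited files or restored by hand.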

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (20)

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The README explicitly instructs users to run an activation script that injects protocol text into persistent agent identity files such as SOUL.md, and later references removing the same protocol from SOUL.md or AGENTS.md. This creates cross-session behavioral persistence outside normal per-task consent boundaries, which is risky because a seemingly simple visibility feature is implemented by modifying core instruction sources that can silently influence future agent behavior.

Intent-Code Divergence

Low

Confidence: 85%
Finding: The README markets the package as 'not a skill' and 'professional conduct' while simultaneously documenting installation and an activation script that alters agent instruction files. This mismatch reduces informed consent and can mislead users about the scope and persistence of the package's effects, increasing the chance they execute privileged setup they would otherwise scrutinize.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The security section claims the skill does not modify files outside its own directory, but the rest of the document says it writes to workspace SOUL.md/AGENTS.md. This contradictory assurance can cause reviewers to underestimate the write scope and approve a skill that persists changes in higher-trust configuration files.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill advertises a visibility/progress feature, but its activation path persistently rewrites core instruction files so future sessions inherit mandatory behavior changes. This creates instruction persistence and privilege overreach: a cosmetic/UX skill is altering the agent's long-term operating policy rather than providing an opt-in runtime feature.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code targets SOUL.md and AGENTS.md, which are core identity/behavior documents, despite the skill being described as a task-visibility enhancement. Persistently modifying these files can change how the agent behaves across unrelated tasks and sessions, creating a supply-chain style policy injection risk disproportionate to the stated purpose.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The snapshot is internally inconsistent: the file is named and typed as a completed snapshot, the action log says the task is completed, but the structured state still says overall_status is running and the current phase is in progress. This can mislead downstream agents, dashboards, or automation into taking the wrong action, such as continuing execution, suppressing alerts, or misreporting task completion.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: This snapshot is labeled as a completed artifact, but its internal state says the task is still running, the current phase is in progress, a blocker is active, and the log includes both '任务已完成' ('task completed') and ongoing work messages at the same timestamp. Such contradictory state can mislead downstream agents, dashboards, or automation into treating unfinished or blocked work as completed, causing incorrect decisions or skipped recovery steps.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The snapshot is internally inconsistent: it is labeled as a completed snapshot and logs "任务已完成" ("task completed"), while the task state still says overall_status is running, the current phase is in progress, and a blocker is present. Consumers that trust this file for orchestration, dashboards, retries, or audit decisions can make incorrect state transitions, suppress needed recovery actions, or falsely report success.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The snapshot labeled as "completed" contains conflicting state: `overall_status` is `running`, the current phase is still in progress, and the action log says both "任务已完成" ("task completed") and "开始阶段：形成输出" ("start phase: produce output") at the same timestamp. In a visibility/progress skill, downstream agents or users may rely on this state to trigger follow-up actions, suppress oversight, or prematurely trust unfinished work, making the inconsistency a real integrity issue.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The snapshot presents contradictory state: it is labeled as a completed snapshot and logs '任务已完成' ('task completed') while the task is still marked as running, in progress, and waiting for user input. In an agent visibility skill, downstream components may trust this state for automation, notifications, or audit trails, so inconsistent status can mislead users or trigger incorrect workflow decisions.
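The contradictory-state pattern described in the snapshot findings above could be caught by a small consistency check before a snapshot is written or consumed. The sketch below is illustrative only: `overall_status` is the one field name that appears in the findings, while `snapshot_type`, `current_phase.status`, `blockers`, and the literal values `"completed"` and `"in_progress"` are hypothetical stand-ins for whatever the skill's real schema uses.

```javascript
// Hypothetical validator: returns a list of human-readable inconsistencies.
// Only overall_status is named in the report; every other field is assumed.
function findSnapshotInconsistencies(snapshot) {
  const problems = [];
  if (snapshot.snapshot_type !== "completed") {
    return problems; // only snapshots that claim completion are checked here
  }
  if (snapshot.overall_status !== "completed") {
    problems.push(`labeled completed but overall_status is "${snapshot.overall_status}"`);
  }
  if (snapshot.current_phase && snapshot.current_phase.status === "in_progress") {
    problems.push("labeled completed but the current phase is still in progress");
  }
  if (Array.isArray(snapshot.blockers) && snapshot.blockers.length > 0) {
    problems.push(`labeled completed but ${snapshot.blockers.length} blocker(s) remain`);
  }
  return problems;
}
```

A writer could refuse to persist a snapshot when the returned list is non-empty, and a consumer could treat such a file as untrusted rather than acting on its completion label.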

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The script operates on workspace instruction files (AGENTS.md and SOUL.md) and removes a specially marked protocol block, while the skill metadata presents the feature as simple progress and health visibility. That mismatch is security-relevant because it indicates the skill modifies persistent agent-governing instructions rather than only UI/telemetry behavior, which can alter future agent behavior in ways users may not expect.

Intent-Code Divergence

Low

Confidence: 82%
Finding: The comments describe the change as a 'transparency layer protocol,' but the actual marker says 'mandatory protocol,' suggesting stronger control over agent behavior than the prose implies. Even though this file only removes the block, the terminology mismatch is a red flag because it obscures the real authority of the injected content and can mislead users during installation or review.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: exportTaskReport accepts an arbitrary outputPath and writes attacker-influenced task history content there without path restriction or validation. In an agent/plugin context, this can be abused to overwrite files the process can access, especially if task IDs, snapshots, or export paths are influenced by untrusted inputs.
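A common mitigation for this kind of finding is to confine exports to one directory and refuse to overwrite existing files. The sketch below shows that idea under stated assumptions: `resolveInside` and `writeReportSafely` are hypothetical helpers, not part of the skill, and the check is purely lexical, so it does not defend against symlinks that already exist inside the export directory.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: resolve a caller-supplied name inside baseDir, or throw.
function resolveInside(baseDir, candidate) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, candidate);
  const rel = path.relative(base, target);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (rel === "" || escapes) {
    throw new Error(`Refusing to write outside ${base}: ${candidate}`);
  }
  return target;
}

// Hypothetical replacement for an unrestricted export: the "wx" flag makes
// writeFileSync fail with EEXIST instead of overwriting an existing file.
function writeReportSafely(exportDir, reportName, content) {
  const target = resolveInside(exportDir, reportName);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, content, { flag: "wx" });
  return target;
}
```

With this shape, a name such as `../escape.md` is rejected before any file is touched, and a repeated export to the same name fails loudly instead of silently replacing the earlier report.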

Vague Triggers

Medium

Confidence: 92%
Finding: The documented auto-trigger conditions are broad enough to activate on many ordinary tasks, including generic web access, multi-step reasoning, or any work estimated to exceed 10 seconds. Because the skill is described as a system-level layer and its activation injects protocol into SOUL.md, this creates an implicit behavior change across future sessions without precise user consent or clear scoping.

Missing User Warnings

Medium

Confidence: 95%
Finding: The installation flow tells the user to run an activation script that injects content into SOUL.md, but the quick-start section does not prominently warn that this changes core persistent agent identity/configuration. Users may execute the script believing it is a temporary feature toggle, when it actually alters future agent behavior across sessions.

Vague Triggers

Medium

Confidence: 84%
Finding: The auto-trigger conditions are extremely broad, covering external API use, networking, multi-step reasoning, sub-agents, and tasks over 10 seconds, so they can activate on many normal tasks. In the context of a skill that persistently injects behavior into agent policy files, broad triggers increase the chance of unwanted policy enforcement, noisy output, and interference with user intent.

Vague Triggers

Medium

Confidence: 81%
Finding: The requirement that the transparency layer activate unconditionally for detected long tasks lacks safety boundaries, exceptions, and precedence rules. When combined with persistent policy injection, ambiguous unconditional behavior can override user expectations, conflict with other controls, or force agent actions in contexts where progress instrumentation is inappropriate.

Missing User Warnings

Medium

Confidence: 85%
Finding: The manager persists full snapshots and metadata to disk automatically, which can capture sensitive task contents, prompts, or operational state without any consent, filtering, or retention controls. In an agent environment, this creates a confidentiality risk because private user data may be stored longer than expected and exposed to other local users, backups, or later components.
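The filtering control that this finding says is missing might look like the redaction pass sketched below, applied to a snapshot before it is written to disk. The key names in `SENSITIVE_KEYS` and the function `redactSnapshot` are assumptions chosen for illustration, since the report does not describe the snapshot schema. A key-name filter like this will also miss secrets embedded in free-text values.

```javascript
// Hypothetical list of keys whose values should never reach disk.
const SENSITIVE_KEYS = new Set(["prompt", "api_key", "token", "password"]);

// Returns a deep copy of the snapshot with sensitive values replaced.
// The input object is not mutated.
function redactSnapshot(value, sensitiveKeys = SENSITIVE_KEYS) {
  if (Array.isArray(value)) {
    return value.map((item) => redactSnapshot(item, sensitiveKeys));
  }
  if (value !== null && typeof value === "object") {
    const copy = {};
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = sensitiveKeys.has(key.toLowerCase())
        ? "[REDACTED]"
        : redactSnapshot(inner, sensitiveKeys);
    }
    return copy;
  }
  return value; // primitives pass through unchanged
}
```

Retention limits and restrictive file permissions would still need to be handled separately; redaction only reduces what a leaked snapshot can reveal.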

Missing User Warnings

Medium

Confidence: 87%
Finding: Optional report export writes the entire task history to an arbitrary path with no confirmation or destination control, increasing the chance of unintended disclosure or file overwrite. Because reports are human-readable aggregates, they may concentrate sensitive information into a single easy-to-exfiltrate file.

Missing User Warnings

High

Confidence: 91%
Finding: clear(taskId) builds a path from taskId and then performs recursive deletion with fs.rmSync(..., { recursive: true, force: true }) without validating that the resolved path stays within the artifacts directory. If taskId is attacker-controlled, path traversal such as '../../somewhere' could cause deletion of unintended directories accessible to the process.
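A guarded version of the deletion described in this finding could validate the task ID and confirm the resolved path is a direct child of the artifacts directory before calling `fs.rmSync`. The sketch below is not the skill's code: `resolveTaskDir`, `clearTask`, and the allowed ID pattern are hypothetical, and the real `clear(taskId)` may build its path differently.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed ID format: letters, digits, underscore, hyphen. Rejects "..", "/" and "\".
const TASK_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

// Hypothetical helper: map a task ID to its directory, or throw if it could escape.
function resolveTaskDir(artifactsDir, taskId) {
  if (typeof taskId !== "string" || !TASK_ID_PATTERN.test(taskId)) {
    throw new Error(`Invalid task ID: ${String(taskId)}`);
  }
  const base = path.resolve(artifactsDir);
  const target = path.resolve(base, taskId);
  // Defense in depth: the target must be a direct child of the artifacts directory.
  if (path.dirname(target) !== base) {
    throw new Error(`Task directory escapes ${base}`);
  }
  return target;
}

// Recursive delete, reached only after the path has been validated.
function clearTask(artifactsDir, taskId) {
  fs.rmSync(resolveTaskDir(artifactsDir, taskId), { recursive: true, force: true });
}
```

With this guard, a task ID such as `../../somewhere` is rejected by the pattern check before any file system call is made.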

VirusTotal

65/65 vendors flagged this skill as clean.
