Agent Guardian

Security checks across malware telemetry and agentic risk

Overview

This skill is a working status/watchdog helper, but it installs persistent host-level components and changes bot message handling in ways that require careful review.

Install only in a controlled single-user environment after reviewing the scripts. Expect persistent cron/systemd components, root service behavior, /tmp-based state, status messages containing host metrics, and optional bot plugin source changes. Avoid shared hosts or production bots unless you first move state to a private directory, add authenticated admin-only triggers, disable automatic rewriting unless desired, and create a clear rollback/uninstall path.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (20)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill invokes shell scripts, reads and writes state files, and expects persistent OS-level integration, yet no permissions are declared. That creates hidden capability expansion: an operator may install or trust the skill as low-risk while it can execute commands and manipulate local files. In this context, the mismatch is more dangerous because the skill is positioned as a generic UX helper suitable for all channels, which encourages broad deployment.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented behavior understates several sensitive actions: patching channel plugins, installing persistent cron/system services, using /tmp for cross-process signaling, collecting host metrics, and auto-resetting state based on message patterns. This is dangerous because users may consent to a 'status/watchdog' feature without realizing it alters host and bot infrastructure and can expose system information or affect message handling. The broad, cross-channel deployment claim increases risk because these side effects may be applied in many environments without proper review.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The patch introduces shell-based subprocess execution in the message path using `execSync` with user-controlled text interpolated into a shell command. Even though `JSON.stringify` provides some quoting, constructing commands via shell still creates command-injection and reliability risk, and it also grants the bot process the ability to run arbitrary local scripts for routine message handling.
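
One way to avoid the shell entirely is to pass the message text as a single argument-vector element. The sketch below is a Python analogue rather than the skill's actual Node code; `run_helper` and `ECHO_HELPER` are hypothetical names, and the helper is a stand-in that echoes its first argument so the behavior can be observed.

```python
import subprocess
import sys


def run_helper(argv_prefix, user_text, timeout=10.0):
    """Run a helper with user_text as one argv element; no shell parses it."""
    # A list argument with the default shell=False is executed directly,
    # so quotes, semicolons, and $(...) in user_text stay literal text.
    result = subprocess.run(
        [*argv_prefix, user_text],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout


# Hypothetical stand-in for a helper script: write argv[1] back verbatim.
ECHO_HELPER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]

hostile = '"; touch /tmp/pwned; echo "$(id)'
echoed = run_helper(ECHO_HELPER, hostile)
```

The hostile string arrives at the helper unchanged instead of being interpreted. This removes the injection path only; it does not address the separate concern that routine message handling can launch local scripts at all.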

Context-Inappropriate Capability

High

Confidence: 95%
Finding: Any inbound `/new` or `/reset` message causes execution of a local shell script, allowing remote chat users to trigger host-side state-reset actions. This creates an externally reachable command trigger in the bot gateway and can be abused for denial of service, state tampering, or chaining with unsafe reset script behavior.
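
A possible mitigation is to gate the reset on both an exact command match and an allowlisted sender. This is a minimal sketch under stated assumptions: `ADMIN_IDS`, `should_run_reset`, and the sender identifier format are hypothetical and not taken from the skill.

```python
# Hypothetical allowlist; a real deployment would load it from private config.
ADMIN_IDS = frozenset({"admin-001"})
RESET_COMMANDS = frozenset({"/new", "/reset"})


def should_run_reset(sender_id, text, admin_ids=ADMIN_IDS):
    """Return True only for an exact reset command from an allowlisted sender."""
    if text.strip() not in RESET_COMMANDS:
        return False  # the command embedded in a longer message does not count
    return sender_id in admin_ids


allowed = should_run_reset("admin-001", "/reset")
denied_sender = should_run_reset("user-42", "/reset")
denied_text = should_run_reset("admin-001", "please /reset now")
```

Whether the sender identifier can be trusted depends on the channel; if the platform does not authenticate senders, an allowlist alone would not be sufficient.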

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The patch stores cross-message state in predictable global `/tmp` files, including activity timestamps, language state, and status triggers. Global temp files can leak data across users or processes, are susceptible to race conditions and symlink attacks on multi-user systems, and can cause one conversation to influence another in unintended ways.
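
Moving state out of global `/tmp` could look like the following sketch, which writes into an owner-only directory and replaces files atomically. It assumes a POSIX host; `write_state`, the directory name, and the file name are hypothetical, and `name` is assumed to be a fixed identifier rather than user input.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_state(state_dir, name, value):
    """Atomically write a state file inside an owner-only directory."""
    state_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(state_dir, 0o700)  # enforce the mode even if the directory existed
    # mkstemp creates a uniquely named file readable only by the owner.
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, prefix=name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(value)
        target = state_dir / name
        os.replace(tmp_path, target)  # atomic rename within one directory
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target


# Demonstration in a throwaway location; a real install would use a fixed
# private path chosen by the operator.
demo_root = Path(tempfile.mkdtemp())
state_file = write_state(demo_root / "guardian-state", "last_activity", "1700000000")
dir_mode = stat.S_IMODE(os.stat(state_file.parent).st_mode)
file_mode = stat.S_IMODE(os.stat(state_file).st_mode)
```

Because other local users cannot create entries in the directory, the pre-created file and symlink scenarios described above no longer apply to it.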

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The message pipeline invokes external queue-management scripts for every inbound message and on completion, expanding the trusted computing base and creating a frequent execution path tied to untrusted traffic volume. This increases denial-of-service risk, operational fragility, and the blast radius if those helper scripts are buggy or unsafe.
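
One option for decoupling helper execution from traffic volume is a minimum-interval gate in front of the script call. This is an illustrative sketch, not something the skill provides; `MinIntervalGate` and the five-second interval are assumptions, and a fake clock is used so the behavior is deterministic.

```python
import time


class MinIntervalGate:
    """Allow an action at most once per min_interval_s seconds."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_allowed = None

    def allow(self):
        now = self._clock()
        too_soon = (
            self._last_allowed is not None
            and now - self._last_allowed < self._min_interval_s
        )
        if too_soon:
            return False
        self._last_allowed = now
        return True


# Simulate five inbound messages at the given timestamps (seconds).
fake_now = [0.0]
gate = MinIntervalGate(5.0, clock=lambda: fake_now[0])
decisions = []
for t in (0.0, 1.0, 4.9, 5.0, 6.0):
    fake_now[0] = t
    decisions.append(gate.allow())
```

Only the messages at 0.0 and 5.0 seconds would reach the helper here. A gate like this bounds subprocess launches but does not make an unsafe helper script safe.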

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The script rewrites source files in an external QQ Bot plugin and injects runtime hooks that execute shell and Python commands on message-processing paths. This materially expands the trusted computing base and creates a hidden code-execution surface that is not evident from the skill’s stated UX/guarding purpose, making downstream compromise or abuse much easier.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: Incoming chat content can trigger a local reset script via `checkAndResetWorkState`, effectively allowing remote message content to invoke privileged operational actions. Even if intended as a convenience feature, tying session reset behavior directly to inbound text creates an unsafe control channel that can disrupt service, erase state, or be chained with weaknesses in the reset script.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The patch injects multiple `execSync` calls to shell and Python helpers directly into bot runtime processing. Synchronous subprocess execution on untrusted message flow increases attack surface, risks command/argument handling mistakes, and allows user-driven inputs to influence local process execution and filesystem side effects.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The installer performs host-level package installation, writes a systemd unit under /etc/systemd/system, and enables/restarts a persistent root service. For a skill described as an end-user experience guard, these actions exceed the minimally necessary scope and increase blast radius by modifying the host environment in privileged, persistent ways.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The script collects host-level memory and load metrics and sends them to an external messaging target, even though the skill is described as a user-experience/status guardian rather than a host telemetry tool. This leaks infrastructure details that are not necessary for the stated purpose and can aid fingerprinting, capacity inference, or broader environmental reconnaissance if the recipient, channel, or downstream logs are compromised.

Vague Triggers

Medium

Confidence: 82%
Finding: The skill is triggered by broad complaint-like phrases such as 'waited too long' or 'don't know what it's doing,' which can overlap with normal conversation. That can cause unintended activation, leading to unsolicited status reporting, watchdog behaviors, or altered message flow when the user did not actually request this functionality. Given the skill's persistent and message-intercepting design, accidental activation is more impactful than in a purely informational skill.
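
The difference between phrase matching and an explicit command can be shown with a short comparison. The two phrases come from the finding above; the `/guardian-status` command and both function names are hypothetical.

```python
COMPLAINT_PHRASES = ("waited too long", "don't know what it's doing")
EXPLICIT_COMMAND = "/guardian-status"  # hypothetical reserved command


def broad_trigger(text):
    """Fire when any complaint-like phrase appears anywhere in the message."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in COMPLAINT_PHRASES)


def explicit_trigger(text):
    """Fire only when the whole message is the reserved command."""
    return text.strip().lower() == EXPLICIT_COMMAND


chat = "I waited too long at the airport yesterday"
broad_on_chat = broad_trigger(chat)
explicit_on_chat = explicit_trigger(chat)
explicit_on_command = explicit_trigger("/guardian-status")
```

The ordinary sentence activates the phrase-based trigger but not the explicit one, which is the accidental-activation case the finding describes.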

Natural-Language Policy Violations

Medium

Confidence: 88%
Finding: The top-level description states the skill will enforce Chinese language consistency to prevent mixed-language output, but it does not indicate any user consent, configurability, or justification. Automatically constraining output language can alter meaning, suppress legitimate technical terminology, and override user preference or accessibility needs. In a multi-channel assistant used across varied contexts, this policy can silently distort communications.

Natural-Language Policy Violations

High

Confidence: 95%
Finding: This component automatically detects and replaces English phrases in outbound messages without user choice. Automatic semantic rewriting of assistant output can change technical meaning, remove safety-critical wording, or corrupt code/API/error text before it reaches the user. Because it operates as an outbound hook, it can affect every response system-wide and is therefore especially risky.
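
If rewriting is kept at all, a safer shape would make it opt-in and leave code spans untouched. The sketch below assumes backtick-delimited code in outbound text; `rewrite_outside_code`, the `enabled` flag, and the replacement table are hypothetical and do not reflect the skill's actual filter.

```python
import re

# Matches fenced blocks first, then single-backtick inline spans.
CODE_SPAN = re.compile(r"(```.*?```|`[^`\n]*`)", re.DOTALL)


def rewrite_outside_code(text, replacements, enabled):
    """Apply phrase replacements only outside code spans, and only if enabled."""
    if not enabled:
        return text
    # With one capturing group, split() alternates prose (even indexes)
    # and code spans (odd indexes).
    parts = CODE_SPAN.split(text)
    for i in range(0, len(parts), 2):
        for source, target in replacements.items():
            parts[i] = parts[i].replace(source, target)
    return "".join(parts)


message = "Run `status check` for a status check."
table = {"status check": "状态检查"}
rewritten = rewrite_outside_code(message, table, enabled=True)
untouched = rewrite_outside_code(message, table, enabled=False)
```

This protects literal commands and identifiers inside backticks, but error text or API names written as plain prose would still be rewritten, so the integrity concern is reduced rather than eliminated.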

Missing User Warnings

Medium

Confidence: 94%
Finding: The document explicitly instructs implementers to persist user activity time and detected message language into predictable /tmp files without any notice, consent, retention policy, or access controls. In a multi-user system, /tmp is commonly world-accessible or otherwise low-assurance, so this can expose user behavioral metadata and create privacy and cross-session leakage risks.

Missing User Warnings

Low

Confidence: 81%
Finding: Bypassing normal AI handling for the "状态" ("status") keyword changes system behavior in a way users may not expect, which can cause confusion and make responses opaque. While this is not a classic code-execution issue, hidden command interception can leak internal state semantics or create trust and usability problems, especially across channels where users are unaware of reserved keywords.

Natural-Language Policy Violations

Medium

Confidence: 90%
Finding: The mandated language filter alters AI replies to enforce Chinese-language consistency without user consent or locale selection, which can silently modify meaning, technical terminology, or quoted text. In a cross-channel assistant, forced rewriting can degrade integrity of responses and create misleading output for bilingual users or contexts where English terms are required.

Missing User Warnings

Medium

Confidence: 90%
Finding: The script silently modifies plugin source files and creates backups without a confirmation step or prominent warning. In a security-sensitive environment, undisclosed code rewriting increases supply-chain risk, makes review harder, and can trick operators into deploying behavior they did not knowingly approve.
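
An explicit confirmation step with a restorable backup might be structured as follows. This is a sketch only: `patch_with_backup`, `restore_from_backup`, the `.bak` suffix, and the demo file name are hypothetical, and how `confirmed` is obtained from the operator is left open.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def patch_with_backup(target, new_text, confirmed) -> Optional[Path]:
    """Rewrite target only after confirmation; return the backup path."""
    if not confirmed:
        return None  # nothing is touched without explicit approval
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)  # preserves contents and metadata
    target.write_text(new_text, encoding="utf-8")
    return backup


def restore_from_backup(target, backup):
    """Roll the target back to the saved copy."""
    shutil.copy2(backup, target)


# Demonstration on a throwaway file.
demo_dir = Path(tempfile.mkdtemp())
plugin = demo_dir / "plugin.js"
plugin.write_text("original", encoding="utf-8")

skipped = patch_with_backup(plugin, "patched", confirmed=False)
backup = patch_with_backup(plugin, "patched", confirmed=True)
patched_text = plugin.read_text(encoding="utf-8")
restore_from_backup(plugin, backup)
restored_text = plugin.read_text(encoding="utf-8")
```

Returning the backup path gives the operator a concrete rollback handle, which also supports the uninstall path recommended in the overview.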

Missing User Warnings

High

Confidence: 98%
Finding: The injected hooks cause inbound messages to trigger local subprocess execution and writes to /tmp state files, but the script does not clearly disclose that user messages will drive local command execution. This creates a covert behavior change in the bot runtime, increasing risk of abuse, denial of service, privacy leakage, and operational compromise.

Missing User Warnings

Low

Confidence: 91%
Finding: The script stores operational state in a predictable world-accessible location under /tmp, which is commonly shared across users and processes. Even though the file content is not highly sensitive by itself, it may expose task names, timing, and error activity, and the use of a fixed path also creates tampering and symlink risks if another local user can pre-create or replace the file.

VirusTotal

62/62 vendors flagged this skill as clean.
