Anti-Hallucination

Security checks across malware telemetry and agentic risk

Overview

This accuracy skill is not malicious, but it gives agents broad automatic authority to browse, inspect files, run code, and potentially change dependencies without clear user control.

Install only if you are comfortable with an accuracy helper that may prompt your agent to browse the web, inspect local paths, run generated code, and handle dependencies. Prefer a mode that disables automatic execution and network access by default, and require confirmation before URL visits, filesystem checks outside the current task, package installation, or running code.
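The confirmation requirement above can be sketched as a small gate that every tool call passes through. This is only an illustration, not the skill's actual interface: the action names in `RISKY_ACTIONS`, the `ActionGate` class, and the `confirm` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action names; the skill's real identifiers are not shown in this report.
RISKY_ACTIONS = {"visit_url", "read_path_outside_task", "install_package", "run_code"}


@dataclass
class ActionGate:
    """Asks the user before any risky action unless auto-execution is enabled."""

    confirm: Callable[[str, str], bool]  # callback that asks the user, returns True to allow
    auto_execute: bool = False           # automatic execution is off by default
    log: list = field(default_factory=list)

    def allow(self, action: str, target: str) -> bool:
        if action not in RISKY_ACTIONS or self.auto_execute:
            decision = True
        else:
            decision = self.confirm(action, target)
        # Keep an audit trail of every decision, allowed or not.
        self.log.append((action, target, decision))
        return decision


# Usage: a callback that always declines stands in for a real user prompt.
gate = ActionGate(confirm=lambda action, target: False)
print(gate.allow("visit_url", "https://example.com"))  # risky, so the callback decides
print(gate.allow("check_clock", ""))                   # not in the risky set
```

Keeping `auto_execute` off by default means an operator has to opt in to unattended behavior rather than opt out of it.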

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (13)

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The skill explicitly instructs the agent to use web search and fetch external documentation to answer code/API questions. That expands the agent's capabilities from local verification into network access, which can expose the agent to prompt injection, data exfiltration paths, and unintended outbound requests not clearly required by a generic anti-hallucination pre-check.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: Requiring the agent to visit URLs to confirm accessibility forces outbound network activity whenever a source/reference question is detected. This increases attack surface because malicious or user-supplied URLs can trigger unsolicited requests, interact with internal resources, or expose the agent to hostile remote content.
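One way to reduce the risk of requests reaching internal resources is to screen a URL before any fetch. The sketch below is a hypothetical pre-check using only the Python standard library; the function name `is_safe_to_visit` is an assumption, and the check is deliberately incomplete, since a public hostname can still resolve to an internal address.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_to_visit(url: str) -> bool:
    """Reject non-HTTP schemes and literal internal addresses before fetching."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if not host or host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than a literal IP. DNS resolution is NOT checked here,
        # so a real deployment would also need to validate the resolved address.
        return True
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
    )


for candidate in ("https://example.com/docs", "http://169.254.169.254/", "file:///etc/passwd"):
    print(candidate, is_safe_to_visit(candidate))
```

A check like this complements user confirmation and does not replace it: it filters obviously internal targets, while the decision to make an outbound request still belongs to the user.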

Vague Triggers

Medium

Confidence: 85%
Finding: The auto-trigger rules are broad enough to activate on routine mentions of dates, numbers, file paths, APIs, or URLs, which can cause the agent to perform unnecessary lookups or file checks outside the user's intended scope. In an agent environment with tool access, this increases the chance of over-collection, unintended file probing, or excessive external queries triggered by common content rather than explicit user consent.

Vague Triggers

Medium

Confidence: 90%
Finding: Claiming the skill works for any accuracy-sensitive task and all AI agents creates an unbounded activation surface and encourages use in contexts with very different trust, permission, and safety models. This can lead operators to apply the skill where its verification behaviors are inappropriate or risky, especially in agents with filesystem, shell, or network capabilities.

Missing User Warnings

Medium

Confidence: 93%
Finding: The hook explicitly describes automatically extracting and executing detected code in a sandbox as part of normal processing, but it does not require an explicit user warning or consent before running generated code. Even in a sandbox, executing model-produced code can consume resources, access permitted network/filesystem capabilities, or trigger harmful side effects depending on sandbox configuration, making this an unsafe default for a general-purpose agent skill.

Missing User Warnings

Medium

Confidence: 95%
Finding: The configuration includes an automatic response of "install_or_warn" for missing dependencies, which implies the system may modify the environment by installing packages without a clearly documented warning or approval step. Automatic dependency installation is risky because it changes system state, may introduce untrusted packages or vulnerable versions, and can be abused through dependency confusion or malicious package names present in generated code.
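A safer alternative to automatic installation is to propose an install only for pinned, allowlisted packages and only after explicit approval. The sketch below illustrates that policy under stated assumptions: the `PINNED` allowlist, its single example entry, and the `handle_missing_dependency` function are hypothetical and are not taken from the skill's configuration. The function only returns a decision and never runs the install itself.

```python
import sys

# Hypothetical allowlist mapping package names to an exact pinned version.
PINNED = {"requests": "2.32.3"}


def handle_missing_dependency(name: str, approved_by_user: bool = False) -> dict:
    """Decide what to do about a missing package without changing the environment."""
    version = PINNED.get(name)
    if version is None:
        # Unknown names are never installed; this limits typosquatting and
        # dependency-confusion risk from names that appear in generated code.
        return {"action": "warn", "message": f"'{name}' is not on the allowlist; not installing."}
    if not approved_by_user:
        return {"action": "ask", "message": f"Install {name}=={version}? Approval required."}
    # Return the command for the caller to run; nothing is executed here.
    return {
        "action": "install",
        "command": [sys.executable, "-m", "pip", "install", f"{name}=={version}"],
    }


print(handle_missing_dependency("reqeusts")["action"])  # misspelled name
print(handle_missing_dependency("requests")["action"])  # allowlisted, not yet approved
```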

Vague Triggers

High

Confidence: 95%
Finding: Several trigger patterns are overly broad, especially `r'/'` for Unix paths and common relative-date phrases like '今天|昨天|明天' (today, yesterday, tomorrow). In an auto-triggered hook, these patterns can cause near-constant activation on ordinary text. That drives unnecessary verification actions and, in this skill, may cascade into file checks, command use, or network lookups that expand the attack surface and leak context.
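To illustrate how broad the quoted path pattern is, the sketch below compares it with a narrower alternative. `BROAD_PATH` is the pattern quoted in the finding; `NARROW_PATH` is a hypothetical replacement written for this example, not something the skill ships, and it still misses some valid paths (single-segment paths such as `/tmp`, or paths containing spaces).

```python
import re

# Pattern quoted in the finding: any slash at all counts as a "Unix path".
BROAD_PATH = re.compile(r'/')

# Hypothetical narrower pattern: an absolute, ./, ../ or ~/ path with at least
# two segments, not preceded by a word character, slash, or colon (so URLs,
# fractions, and "and/or" are skipped).
NARROW_PATH = re.compile(r'(?<![\w/:])(?:~|\.{1,2})?/(?:[\w.-]+/)+[\w.-]+')

samples = [
    "Use and/or here",
    "About 3/4 of users agreed",
    "See https://example.com/a/b for details",
    "open /etc/nginx/nginx.conf",
]

for text in samples:
    broad = bool(BROAD_PATH.search(text))
    narrow = NARROW_PATH.search(text)
    print(f"{text!r}: broad={broad}, narrow={narrow.group(0) if narrow else None}")
```

The broad pattern fires on all four samples, while the narrower one fires only on the last. Tightening the pattern reduces accidental activation but does not remove the need for user consent before any file check.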

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill explicitly directs web search and URL access during verification without warning about outbound network activity or limiting what data may be sent. In an agent context, this can expose sensitive prompts, code, file paths, proprietary references, or user data to external services and can be abused by adversarial content to induce unwanted network access.
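One mitigation for this kind of leakage is to redact obviously sensitive fragments from a query before it leaves the machine. The sketch below is a rough illustration: the `REDACTIONS` patterns and the `redact_query` function are hypothetical, and they catch only simple cases (email addresses, multi-segment paths, long token-like strings). It is not a complete data-loss-prevention filter.

```python
import re

# Hypothetical redaction rules, applied in order. Each pair is (pattern, placeholder).
REDACTIONS = [
    (re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+'), "<email>"),  # email addresses
    (re.compile(r'(?:/[\w.-]+){2,}'), "<path>"),               # paths with 2+ segments
    (re.compile(r'\b[A-Za-z0-9_-]{32,}\b'), "<token>"),        # long opaque strings
]


def redact_query(query: str) -> str:
    """Replace sensitive-looking fragments before sending a query to a search service."""
    for pattern, placeholder in REDACTIONS:
        query = pattern.sub(placeholder, query)
    return query


print(redact_query("error in /home/alice/app/main.py reported by alice@example.com"))
# prints: error in <path> reported by <email>
```

Redaction of this kind lowers the chance of accidental disclosure but cannot guarantee it, so a warning to the user about outbound network activity is still needed.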

Vague Triggers

Medium

Confidence: 82%
Finding: The date/time trigger words are very broad and likely to activate during normal conversation, causing the agent to run local commands like date or equivalent checks more often than necessary. Overbroad autonomous triggering can lead to unnecessary tool use, privacy leakage about the runtime environment, and reduced predictability of agent behavior.

Vague Triggers

Medium

Confidence: 81%
Finding: The numeric/statistical triggers are generic enough to match many harmless prompts, which can cause unnecessary file reads or script execution. This creates avoidable tool invocation opportunities and broadens the paths through which sensitive local data might be accessed in response to loosely related queries.

Vague Triggers

Medium

Confidence: 84%
Finding: The code/API trigger list is ambiguous and directs the agent to search locally, use web search, and execute validation steps whenever generic technical terms appear. This can unnecessarily invoke both execution and network-capable tools for ordinary discussion, increasing attack surface and creating opportunities for prompt injection or unsafe code execution.

Vague Triggers

Medium

Confidence: 79%
Finding: Generic file/path trigger phrases can cause the agent to probe the filesystem even when the user is speaking abstractly or hypothetically. While lower impact than network-triggering behavior, this still increases unnecessary environment inspection and may reveal local structure or existence of files unintentionally.

Vague Triggers

Low

Confidence: 76%
Finding: The source-validation triggers are broad and pair with instructions to search and verify accessibility, which can lead to external lookups for many ordinary prompts mentioning papers, URLs, or standards. In this skill context, that makes the behavior more dangerous because a general anti-hallucination layer would apply widely across tasks, amplifying unnecessary network exposure.

VirusTotal

66/66 vendors flagged this skill as clean.
