Agent Spectrum

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only agent scoring template with no executable behavior, but it includes optional social sharing prompts and opinionated language fallback rules.

Reasonable to install if you want an agent scoring and visualization template. Treat the X/Telegram prompts as optional, avoid posting sensitive profile details publicly, and verify the intended package version because the registry metadata and file version labels differ.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (10)

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The example output goes beyond the skill's stated purpose of returning a strict Agent Spectrum scoring result by adding calls to action that push the user toward external social engagement. In an agent skill, this expands behavior from analysis/rendering into unsolicited persuasion and can cause downstream agents to promote third-party platforms or communities without user request.

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The file embeds specific X/Twitter and Telegram links, introducing unjustified external community-engagement behavior into a scoring skill. This is dangerous because agents may reproduce these links as endorsed destinations, creating a channel for data leakage, unwanted traffic steering, or covert promotion unrelated to the user's scoring task.
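
A reviewer who wants to locate the embedded links before installing could scan the skill files for social-platform URLs. The following is a minimal sketch, not the scanner's actual method: the host list, the function name `find_social_links`, and the regular expression are all assumptions made for illustration, since the report names only X/Twitter and Telegram.

```python
import re

# Assumed host list for illustration; the report names only X/Twitter and Telegram.
SOCIAL_HOSTS = ("x.com", "twitter.com", "t.me", "telegram.me")

# Captures the host separately so it can be compared exactly, not by substring.
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+)(/\S*)?")


def find_social_links(text):
    """Return (host, url) pairs whose host is, or is a subdomain of, a listed host."""
    hits = []
    for match in URL_RE.finditer(text):
        host = match.group(1).lower().rstrip(".")
        # Drop punctuation that commonly trails a URL at the end of a sentence.
        url = match.group(0).rstrip(".,;:!?)\"'")
        if any(host == h or host.endswith("." + h) for h in SOCIAL_HOSTS):
            hits.append((host, url))
    return hits
```

Matching on the parsed host rather than on a raw substring avoids false positives such as a path that merely contains "x.com".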

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: This localization dictionary goes beyond translation data and embeds promotional/community-engagement messages with external links. In a scoring skill, that creates an unscoped output-shaping channel that can steer users to third-party platforms, expanding the skill's behavior from assessment into unsolicited outreach and creating phishing, trust, and policy-boundary risk if the links or destinations change.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The file effectively grants a scoring skill a social redirection capability by defining ready-to-render prompts to post on X/Twitter and join Telegram. That is dangerous because a low-risk analytical skill can be induced to nudge users toward external services without a separate authorization boundary, increasing exposure to impersonation, spam, data leakage, and social-engineering abuse.

Natural-Language Policy Violations

High

Confidence: 95%
Finding: The skill explicitly overrides user language expectations by coercing unsupported locales such as Japanese and Korean to zh-CN, which can cause users to receive safety-relevant or decision-relevant output in a language they did not request or may not understand. In an agent-evaluation skill, this increases the risk of misinterpretation, reduces informed consent, and can be used to bypass user intent or downstream locale-sensitive controls.
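
One way to avoid silent coercion is to return a "needs confirmation" signal whenever the requested locale is not supported exactly, and let the caller ask the user. The sketch below is an illustration under stated assumptions: the supported set `("en", "zh-CN")` is inferred from the languages this report mentions, and `resolve_locale` is a hypothetical helper, not part of the skill.

```python
# Assumed supported set, inferred from the languages mentioned in this report.
SUPPORTED = ("en", "zh-CN")


def resolve_locale(requested):
    """Return (locale, needs_confirmation) without silently remapping languages.

    locale is None when no supported locale shares the requested language.
    """
    tag = (requested or "").strip().replace("_", "-").lower()
    if not tag:
        return None, True
    by_lower = {loc.lower(): loc for loc in SUPPORTED}
    if tag in by_lower:
        return by_lower[tag], False  # exact match: safe to use directly
    primary = tag.split("-")[0]
    for loc in SUPPORTED:
        if loc.lower().split("-")[0] == primary:
            return loc, True  # same language, different region: propose, then confirm
    return None, True  # e.g. "ja" or "ko": ask the user instead of forcing zh-CN
```

With this shape, a request for "ja" or "ko-KR" yields `(None, True)`, so the caller has to ask which language the user wants rather than defaulting to Chinese.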

Vague Triggers

Medium

Confidence: 96%
Finding: The skill enables implicit invocation without any activation constraints, so the platform may invoke it even when the user did not clearly request this scoring workflow. Because the skill also imposes a strict output format and language-routing behavior, unexpected activation can override user intent, cause prompt hijacking of unrelated tasks, or produce unauthorized transformations of the conversation.
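
An activation constraint can be as simple as requiring an explicit request phrase before the skill runs. The sketch below assumes hypothetical trigger phrases and a hypothetical `should_activate` gate; the skill's real trigger configuration is not shown in this report.

```python
import re

# Hypothetical activation phrases; the skill's actual triggers are not shown here.
ACTIVATION_PATTERNS = (
    r"\bagent spectrum\b",
    r"\bscore (my|this|the) agent\b",
)


def should_activate(message):
    """Return True only when the message explicitly asks for the scoring workflow."""
    return any(re.search(p, message, re.IGNORECASE) for p in ACTIVATION_PATTERNS)
```

Messages that match none of the patterns leave the skill inactive, so unrelated tasks are not forced into its output format or language routing.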

Natural-Language Policy Violations

High

Confidence: 94%
Finding: The default prompt forces a language-routing rule that maps broad categories of users to zh-CN unless they explicitly ask for English, which can override the user's actual language preference and produce incorrect or exclusionary output. In a skill that may be implicitly invoked, this becomes more dangerous because users can be silently funneled into an unintended language and strict response template without informed opt-in.

Missing User Warnings

Medium

Confidence: 95%
Finding: The markdown encourages posting publicly and joining a Telegram group without warning about privacy, identity exposure, or platform-specific risks. In the context of an agent skill, this can nudge users into oversharing profiling results or personal details, which is especially risky when the skill output may be reused automatically by other agents or applications.
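
A possible mitigation is to render share prompts only after explicit opt-in and to place a privacy note ahead of them. This is a sketch under assumptions: the wording of `PRIVACY_NOTE` and the `render_share_prompt` helper are hypothetical and do not come from the skill.

```python
# Hypothetical warning text; adjust to the platform's own privacy guidance.
PRIVACY_NOTE = (
    "Posting is optional. Results shared publicly may reveal details "
    "about you or your agent."
)


def render_share_prompt(prompt_text, user_opted_in):
    """Return the share prompt preceded by a privacy note, or "" without opt-in."""
    if not user_opted_in:
        return ""
    return f"{PRIVACY_NOTE}\n{prompt_text.strip()}"
```

Returning an empty string by default keeps the scoring output free of calls to action unless the user has asked for them.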

Natural-Language Policy Violations

Medium

Confidence: 87%
Finding: Forcing unsupported locales such as Japanese or Korean to zh-CN without user opt-in can misrepresent content and degrade user comprehension. In security- or consent-relevant output, that can cause users to misunderstand results, miss warnings, or accept actions they did not intend, especially when mixed with guidance or external call-to-action text.

Natural-Language Policy Violations

Medium

Confidence: 93%
Finding: The spec explicitly remaps unsupported locales such as Japanese and Korean to zh-CN instead of preserving user intent or asking for confirmation. In a skill that must return strict user-facing results in the user's language, this can mishandle user communications, undermine consent and accessibility expectations, and create misleading outputs for users who did not request Chinese.

VirusTotal

63/63 vendors reported this skill as clean.