Security audit

Citation Injector

Security checks across malware telemetry and agentic risk

Overview

This skill is not malware, but its citation-editing script can add citations outside the stated section scope, and the package includes broader research-pipeline tooling than its narrow description suggests.

Install only if you are comfortable reviewing document diffs carefully. Use it in a backed-up or version-controlled workspace, verify every added citation is actually allowed for the specific section, and avoid relying on the automated PASS report without checking output/DRAFT.md and output/CITATION_INJECTION_REPORT.md.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (7)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 86%
Finding: The skill advertises no explicit permissions while its documented workflow and optional script imply file reads, file writes, and shell/script execution. That mismatch is dangerous because it prevents accurate policy enforcement and user review, allowing broader capabilities than expected in a content-editing skill.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The reported behavior violates the skill's core safety guarantees: it can generate new prose, source citation keys from outside the approved budget, redistribute keys across sections, write appendix-level top-up text, and skip bibliography verification. In a citation-manipulation skill, this directly risks fabricated or out-of-scope citations, misleading attribution, and integrity failures in the draft while presenting itself as a constrained, safe editor.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: This is a true vulnerability because the code explicitly falls back to injecting citations from a pooled/global source when per-H3 budget suggestions are insufficient, and can finally append generic 'broader survey' citations near appendix/global locations. That behavior contradicts the skill's stated guardrail to keep injected citations within each H3's allowed scope, creating a mechanism for scope-violating citation insertion that can silently alter scholarly grounding and misrepresent support for section-specific claims.
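The scope violation described in this finding can be checked for by comparing a draft before and after injection. The following is a minimal sketch, not the skill's own code. It assumes the draft is Markdown with `### ` H3 headings and Pandoc-style `[@key]` citations, and that the reviewer supplies a per-H3 allow-list; the function names and the `allowed` mapping are hypothetical.

```python
import re
from collections import defaultdict

# Assumed conventions (not confirmed by the audit): "### Title" marks an H3,
# and citations look like [@key] or [@key1; @key2].
H3_RE = re.compile(r"^###\s+(.*\S)\s*$")
TOP_HEADING_RE = re.compile(r"^#{1,2}\s")
CITE_RE = re.compile(r"\[@([^\]]+)\]")


def citations_by_h3(markdown_text):
    """Map each H3 title to the set of citation keys found under it.

    Citations outside any H3 (for example in an appendix) are stored
    under the key None.
    """
    sections = defaultdict(set)
    current = None
    for line in markdown_text.splitlines():
        match = H3_RE.match(line)
        if match:
            current = match.group(1)
            continue
        if TOP_HEADING_RE.match(line):
            current = None  # an H1 or H2 ends the current H3 scope
            continue
        for group in CITE_RE.findall(line):
            for key in re.split(r"[;,]\s*", group):
                key = key.strip().lstrip("@")
                if key:
                    sections[current].add(key)
    return sections


def out_of_scope(before, after, allowed):
    """Return {section: sorted keys added in `after` that are not allowed there}."""
    old = citations_by_h3(before)
    new = citations_by_h3(after)
    violations = {}
    for section, keys in new.items():
        added = keys - old.get(section, set())
        bad = added - set(allowed.get(section, ()))
        if bad:
            violations[section] = sorted(bad)
    return violations
```

Any key reported under `None` was added outside every H3, which is where the appended "broader survey" citations described above would appear.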

Description-Behavior Mismatch

High

Confidence: 99%
Finding: This file implements a full research-ideation pipeline—signal extraction, clustering, scoring, ranking, and report generation—which is materially unrelated to a citation-injector skill whose guardrails explicitly require no new facts and scoped citation edits only. In an agent-skill setting, such scope drift is dangerous because it can cause the agent to generate new analytical content and overwrite or create artifacts outside the user’s expected task boundary, bypassing workflow and review assumptions.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The code from this region synthesizes new research directions, rankings, screening rationales, memos, and appendix content, which goes beyond citation insertion and effectively creates new substantive analytical output. That directly conflicts with the skill’s 'NO NEW FACTS' constraint and makes the skill more dangerous in context, because a caller expecting safe citation augmentation could instead receive agent-authored research claims or prioritization artifacts presented as pipeline output.

Scope Creep

Medium

Confidence: 90%
Finding: These helper functions enable writing arbitrary JSON, JSONL, and Markdown artifacts, supporting creation of multiple new outputs well beyond a narrowly scoped citation-editing operation. In this skill context, broad write capability increases the blast radius of the scope mismatch: the skill can persist unexpected analytical artifacts, alter downstream pipeline behavior, and make unintended content appear legitimate because it is written into normal workspace outputs.
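One way to limit this blast radius is to confine writes to the outputs a citation editor is expected to produce. The sketch below is illustrative only: the allow-list contains the two output paths named in this report's overview, and `is_allowed_write` is a hypothetical guard, not a function from the skill.

```python
from pathlib import Path

# Assumption: a narrowly scoped citation editor should only ever write these
# two files, both named in the overview of this report.
ALLOWED_OUTPUTS = {"output/DRAFT.md", "output/CITATION_INJECTION_REPORT.md"}


def is_allowed_write(workspace, target, allowed=ALLOWED_OUTPUTS):
    """Return True only if `target` resolves to an allow-listed file in `workspace`."""
    root = Path(workspace).resolve()
    resolved = (root / target).resolve()
    try:
        relative = resolved.relative_to(root)
    except ValueError:
        # The path escapes the workspace (for example via "..").
        return False
    return relative.as_posix() in allowed
```

A wrapper around the JSON, JSONL, and Markdown helpers could call this guard before opening any file and refuse the write when it returns False.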

Vague Triggers

Medium

Confidence: 91%
Finding: The routing hint list includes very generic phrases like "snapshot", "one-page", and "one page", which can easily appear in ordinary user requests unrelated to this pipeline. In an agentic system that uses hint matching for tool or skill selection, this can cause accidental pipeline activation, misrouting tasks, and unintended execution of downstream research workflow steps.
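The misrouting risk can be illustrated with a small sketch. It assumes hints are matched as case-insensitive substrings, which the audit does not confirm; `naive_route`, `stricter_route`, and the `required_context` terms are hypothetical names chosen for illustration.

```python
# The three generic hints quoted in the finding above.
GENERIC_HINTS = ["snapshot", "one-page", "one page"]


def naive_route(request, hints):
    """Return every hint that appears as a substring of the request."""
    text = request.lower()
    return [hint for hint in hints if hint in text]


def stricter_route(request, hints, required_context):
    """Match hints only when the request also contains a domain-specific term."""
    text = request.lower()
    if not any(term in text for term in required_context):
        return []
    return naive_route(request, hints)


unrelated = "Take a snapshot of the VM before the upgrade"
print(naive_route(unrelated, GENERIC_HINTS))
print(stricter_route(unrelated, GENERIC_HINTS, ["research", "literature"]))
```

Under these assumptions the unrelated VM request matches "snapshot" in the naive matcher but is rejected once a domain term is also required.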

VirusTotal

65/65 vendors flagged this skill as clean.

Static analysis

No suspicious patterns detected.