Agent Survey Corpus

Security checks across malware telemetry and agentic risk

Overview

The main downloader looks benign, but the package also includes broad research-writing pipelines and workflow tooling that exceed the advertised arXiv corpus purpose.

Review before installing. Use it only if you intentionally want a broader academic pipeline toolkit, not just an arXiv corpus downloader. In a sensitive repo, keep it sandboxed, disable or remove the extra pipeline files/routing, and invoke scripts/run.py directly for the advertised ref/agent-surveys workflow.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (12)

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The file defines a large end-to-end survey-authoring pipeline with retrieval, structuring, evidence processing, citation management, and drafting outputs, which materially exceeds the skill’s stated purpose of downloading a small arXiv survey corpus for style learning. This mismatch is dangerous because users or upstream routers may invoke the skill under a narrow, low-risk expectation while it performs broad workspace mutations and autonomous content-generation behavior.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The later stages explicitly generate outlines, claims, section drafts, merged paper text, and polishing/audit artifacts, enabling autonomous paper production rather than simple reference corpus collection. In the context of a skill advertised as a corpus downloader, this hidden expansion of capability increases the chance of unauthorized content generation, excessive resource use, and unintended modification of project outputs.

Scope Creep

High

Confidence: 99%
Finding: The manifest guardrail says downloads should be stored under `ref/`, but the pipeline declares extensive outputs across `papers/`, `outline/`, `sections/`, `citations/`, and `output/`. This violates least surprise and can cause broad workspace contamination, overwrite risks, and persistence of large or generated artifacts outside the advertised safe storage boundary.
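The storage-boundary mismatch described above can be checked mechanically. The following is a minimal sketch, assuming the pipeline's declared outputs are available as a list of workspace-relative paths; the function names, the `workspace` directory, and the sample file names are hypothetical and not taken from the skill.

```python
from pathlib import Path


def is_within(base: Path, candidate: Path) -> bool:
    """Return True if candidate resolves to base itself or a path inside it."""
    base_r = base.resolve()
    cand_r = candidate.resolve()
    return cand_r == base_r or base_r in cand_r.parents


def out_of_scope(outputs, workspace: Path, allowed_dir: str = "ref"):
    """List declared outputs that would land outside workspace/allowed_dir."""
    allowed = workspace / allowed_dir
    return [p for p in outputs if not is_within(allowed, workspace / p)]


# Hypothetical declared outputs, modeled on the directories named in the finding.
declared = [
    "ref/agent-surveys/example.pdf",
    "papers/core_set.csv",
    "output/RUN_ERRORS.md",
    "ref/../sections/intro.md",  # traversal that escapes ref/
]
violations = out_of_scope(declared, Path("workspace"))
```

Comparing resolved path components, rather than string prefixes, means a sibling directory such as `ref-extra/` or a `..` traversal is not mistaken for a location under `ref/`.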

Scope Creep

Medium

Confidence: 81%
Finding: Although the metadata claims only arXiv PDFs should be downloaded, the pipeline does not enforce that restriction and instead references broader retrieval/import behavior and optional metadata enrichment. Without explicit source allowlisting and URL validation, the implementation contract is weaker than the stated guardrail and could permit non-arXiv inputs or network access patterns beyond the user’s expectation.
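One way to express the missing source allowlisting and URL validation is sketched below. It is an illustration only: the host set, the accepted path prefixes, and the function name are assumptions, and the skill's actual download code may be structured differently.

```python
from urllib.parse import urlparse

# Assumed allowlist; adjust to whatever hosts the manifest actually permits.
ALLOWED_HOSTS = {"arxiv.org", "export.arxiv.org"}
ALLOWED_PATH_PREFIXES = ("/pdf/", "/abs/")


def is_allowed_arxiv_url(url: str) -> bool:
    """Accept only HTTPS URLs on allowlisted arXiv hosts and paths."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # Reject embedded credentials such as https://user@host/...
    if parsed.username is not None or parsed.password is not None:
        return False
    host = (parsed.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        return False
    return parsed.path.startswith(ALLOWED_PATH_PREFIXES)
```

Matching the parsed hostname exactly, instead of searching the raw string for `arxiv.org`, rejects lookalike hosts such as `arxiv.org.evil.example`. A real downloader would also need to re-check the host after any redirect.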

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The file content is materially inconsistent with the declared skill metadata: instead of a narrowly scoped arXiv PDF download/extraction skill, it defines a broad graduate-thesis reconstruction pipeline with many unrelated sub-skills and workflows. This mismatch can cause an agent to invoke the wrong capability, violate user expectations and guardrails, and potentially access or transform local thesis materials outside the intended network-only reference-corpus scope.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The implementation materially exceeds the declared skill purpose of downloading arXiv survey PDFs into `ref/`. Instead, it acts as a general workflow executor that reads workspace state, changes unit statuses, and launches an external runner, creating a much broader capability surface than users would expect from the manifest.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: Human-approval and checkpoint-management logic is unrelated to the stated task and indicates the skill can alter workflow control state beyond corpus download/extraction. That mismatch increases the chance that invoking the skill has side effects on the broader agent pipeline, including auto-marking units as approved or done.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: This section builds a command from workspace/unit metadata and executes a generic external runner, enabling behavior far beyond downloading and extracting survey PDFs. In the context of this skill, that is dangerous because a caller expecting limited network/file activity instead gets a general task-execution primitive tied to mutable workspace inputs.
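A narrower alternative to a generic runner is to build the command from a fixed allowlist and validated identifiers. The sketch below assumes that only `scripts/run.py` should ever be launched, as the overview recommends; the `--unit` flag, the identifier pattern, and the function name are hypothetical and are not confirmed options of that script.

```python
import re
import sys

# Only the advertised entry point is permitted (assumption based on the overview).
ALLOWED_SCRIPTS = {"scripts/run.py"}
# Hypothetical shape of a unit identifier: short, alphanumeric, no shell metacharacters.
UNIT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_command(script: str, unit_id: str) -> list[str]:
    """Return an argv list for an allowlisted script, or raise ValueError."""
    if script not in ALLOWED_SCRIPTS:
        raise ValueError(f"script not allowlisted: {script!r}")
    if not UNIT_ID_PATTERN.fullmatch(unit_id):
        raise ValueError(f"invalid unit id: {unit_id!r}")
    # "--unit" is a placeholder flag used for illustration only.
    return [sys.executable, script, "--unit", unit_id]
```

Returning an argument list, to be passed to `subprocess.run` without `shell=True`, keeps workspace-supplied values from being interpreted by a shell.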

Scope Creep

Medium

Confidence: 95%
Finding: The code writes persistent files such as `output/RUN_ERRORS.md`, logs, and status artifacts outside the manifest-declared `ref/` storage area. For a skill whose guardrail says to store downloads under `ref/`, this broader write scope creates unexpected persistence and possible interference with unrelated workspace state.

Vague Triggers

Medium

Confidence: 79%
Finding: The routing hints include generic terms such as `survey`, `review`, and literature-review equivalents, while `routing_default: true` and elevated priority increase the chance this pipeline is selected for broad requests. Because the pipeline actually performs much more than lightweight corpus download, overbroad triggering materially raises the risk of accidental invocation of a powerful workflow.

Vague Triggers

Medium

Confidence: 94%
Finding: The routing hints include very broad natural-language triggers such as 'idea', 'brainstorm', '点子' (Chinese for "idea"), and '找方向' (Chinese for "find a direction"), which can match many unrelated user requests and cause this pipeline to activate unintentionally. In an agent system, misrouting can launch a multi-stage literature workflow, create files, invoke networked or downstream skills, and produce outputs the user did not ask for, increasing both operational risk and the chance of unsafe side effects.

Vague Triggers

Medium

Confidence: 89%
Finding: The routing hint includes the broad generic phrase "peer review", which can match many ordinary user requests and cause this pipeline to be selected outside its intended scope. In an agent system, overly broad routing increases the chance of mis-execution, accidental artifact creation, and unintended processing of sensitive manuscript or review content under the wrong workflow.
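The breadth of the routing hints quoted in the Vague Triggers findings can be probed with a small test harness. This sketch assumes the router matches hints as case-insensitive substrings, which the report does not confirm; the probe requests and the function name are hypothetical.

```python
def broad_triggers(hints, probes):
    """Map each hint to the unrelated probe requests it would match."""
    hits = {}
    for hint in hints:
        matched = [p for p in probes if hint.lower() in p.lower()]
        if matched:
            hits[hint] = matched
    return hits


# Hints quoted in the findings above.
hints = ["survey", "review", "idea", "brainstorm", "peer review"]
# Hypothetical requests that have nothing to do with an arXiv corpus.
probes = [
    "Please review my pull request",
    "Any idea why this test fails?",
    "Export the customer survey results",
]
flagged = broad_triggers(hints, probes)
```

Any hint that appears in `flagged` would route an unrelated request to the pipeline under the assumed matching rule, which is a reasonable signal that the hint should be narrowed or combined with a more specific phrase.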

VirusTotal

65/65 vendors flagged this skill as clean.
