Paper Contribution Helper

Security checks across malware telemetry and agentic risk

Overview

This paper-analysis skill is mostly coherent, but it includes optional paths that can run user-configured local commands and pass sensitive paper content to external or local LLM backends.

Install only if you are comfortable with a research helper that reads and stores paper/review content locally. Avoid using the command or command-file LLM providers unless you fully trust the configured executable and environment, and do not process confidential unpublished work through external APIs without explicit permission.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain

Findings (34)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: prompt_file.write_text(prompt, encoding="utf-8") command = build_command(request, request_json, response_json, prompt_file) proc = subprocess.run( command, capture_output=True, text=True,
Confidence: 96%
Finding: proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en

subprocess module call

Medium

Category: Dangerous Code Execution
Content: "LLM_STAGE": stage, } ) proc = subprocess.run( shlex.split(rendered, posix=(os.name != "nt")), capture_output=True, text=True,
Confidence: 93%
Finding: proc = subprocess.run( shlex.split(rendered, posix=(os.name != "nt")), capture_output=True, text=True, encoding="utf-8", timeout=arg

subprocess module call

Medium

Category: Dangerous Code Execution
Content: if not args.llm_command: raise RuntimeError("--llm-command is required for provider=command") request = {"paper_key": paper_key, "stage": stage, "prompt": prompt} proc = subprocess.run( shlex.split(args.llm_command), input=json.dumps(request, ensure_ascii=False), capture_output=True,
Confidence: 90%
Finding: proc = subprocess.run( shlex.split(args.llm_command), input=json.dumps(request, ensure_ascii=False), capture_output=True, text=True,

subprocess module call

Medium

Category: Dangerous Code Execution
Content: prompt_file.write_text(prompt, encoding="utf-8") command = build_command(request, request_json, response_json, prompt_file) proc = subprocess.run( command, capture_output=True, text=True,
Confidence: 94%
Finding: proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en

subprocess module call

Medium

Category: Dangerous Code Execution
Content: "LLM_STAGE": stage, } ) proc = subprocess.run( shlex.split(rendered, posix=(os.name != "nt")), capture_output=True, text=True,
Confidence: 93%
Finding: proc = subprocess.run( shlex.split(rendered, posix=(os.name != "nt")), capture_output=True, text=True, encoding="utf-8", timeout=arg

subprocess module call

Medium

Category: Dangerous Code Execution
Content: if not args.llm_command: raise RuntimeError("--llm-command is required for provider=command") request = {"paper_key": paper_key, "stage": stage, "prompt": prompt} proc = subprocess.run( shlex.split(args.llm_command), input=json.dumps(request, ensure_ascii=False), capture_output=True,
Confidence: 90%
Finding: proc = subprocess.run( shlex.split(args.llm_command), input=json.dumps(request, ensure_ascii=False), capture_output=True, text=True,

Tainted flow: 'command' from os.environ.get (line 98, credential/environment) → subprocess.run (code execution)

Medium

Category: Data Flow
Content: prompt_file.write_text(prompt, encoding="utf-8") command = build_command(request, request_json, response_json, prompt_file) proc = subprocess.run( command, capture_output=True, text=True,
Confidence: 98%
Finding: proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en

Tainted flow: 'command' from os.environ.get (line 98, credential/environment) → subprocess.run (code execution)

Medium

Category: Data Flow
Content: prompt_file.write_text(prompt, encoding="utf-8") command = build_command(request, request_json, response_json, prompt_file) proc = subprocess.run( command, capture_output=True, text=True,
Confidence: 97%
Finding: proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en
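One way to break the environment-to-execution flow flagged above is to let the environment variable select a backend by name from a fixed table instead of supplying the command itself. The following is a minimal sketch under stated assumptions, not the skill's actual code: the variable name PAPER_HELPER_BACKEND, the BACKENDS table, and resolve_backend are all hypothetical.

```python
import os
import sys

# Hypothetical fixed table of backends. The environment can only pick a key;
# it never contributes argv elements. The single entry is a stand-in that
# echoes stdin so the sketch stays runnable.
BACKENDS = {
    "echo": [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"],
}


def resolve_backend(env=None):
    """Return the argv for the backend named in the environment."""
    env = os.environ if env is None else env
    name = env.get("PAPER_HELPER_BACKEND", "echo")  # hypothetical variable name
    try:
        return list(BACKENDS[name])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
```

With this shape, a hostile value in the environment can at worst select another pre-approved entry or raise an error; it cannot introduce a new executable or argument.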

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The script implements a generic bridge to execute ACP/Node commands configured externally, which is broader than the documented purpose of a paper contribution helper. This mismatch matters because users and reviewers may reasonably expect a domain helper, not a general-purpose process launcher tied to prompt content and environment state. The broader capability increases abuse potential and makes other execution flaws more dangerous in context.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: This is a capability mismatch: the skill claims research-assistance behavior but includes a generic bridge for launching externally configured ACP/Node commands. That broad execution surface materially increases risk because users and reviewers would not expect arbitrary local program execution from this skill context.

Description-Behavior Mismatch

Medium

Confidence: 85%
Finding: The helper is presented as modular and self-contained, but core behavior is delegated to arbitrary ACP CLI commands configured via environment variables. This hidden indirection makes the skill more dangerous because the effective behavior can change at runtime without code changes or user awareness.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The script explicitly supports `--llm-provider=command` and `command-file` with a user-supplied `--llm-command`, then later passes that command into downstream analysis. That creates a built-in arbitrary program execution surface that is unrelated to the core paper-helper function and becomes dangerous if untrusted configs, wrappers, or generated build recipes can influence these arguments.
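If the command provider has to stay, the user-supplied `--llm-command` could be checked against an operator-approved allowlist before anything is launched. The sketch below is illustrative only: ALLOWED_EXECUTABLES and run_llm_command are hypothetical names, and the allowlist holds just the current Python interpreter so the example can run.

```python
import shlex
import subprocess
import sys
from pathlib import Path

# Assumption: the operator maintains this allowlist of resolved executable paths.
# Only the current Python interpreter is listed so the sketch is runnable.
ALLOWED_EXECUTABLES = {str(Path(sys.executable).resolve())}


def run_llm_command(llm_command: str, payload: str, timeout: float = 30.0) -> str:
    """Run an allowlisted backend command, sending payload on stdin."""
    argv = shlex.split(llm_command)
    if not argv:
        raise ValueError("empty --llm-command")
    # Bare names resolve relative to the working directory, not PATH, so in
    # practice the command must name the executable by its path.
    executable = str(Path(argv[0]).resolve())
    if executable not in ALLOWED_EXECUTABLES:
        raise PermissionError(f"executable is not allowlisted: {argv[0]}")
    proc = subprocess.run(
        [executable, *argv[1:]],  # argv list, never shell=True
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"backend exited with status {proc.returncode}")
    return proc.stdout
```

The check limits which program can start, not what that program does with the prompt, so it narrows the execution surface without removing the need to trust the approved backend.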

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: This script does substantially more than 'helper construction': it enumerates accepted ICLR submissions, fetches forum content, scores papers for novelty risk, downloads PDFs, and packages materials to local storage. In the context of a paper-contribution helper, bulk harvesting and structured packaging of papers plus review artifacts increases the chance of unintended large-scale data collection and downstream misuse beyond the user’s likely expectations.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code retrieves and stores official reviews, meta reviews, decisions, reviewer follow-ups, and public comments at scale, then uses them to classify 'novelty risk' and generate candidate flags. Within this skill’s context, that is more dangerous than a generic crawler because it operationalizes reviewer/forum harvesting into a reusable pipeline for profiling papers and packaging discussion content, which can expose sensitive or context-dependent review material and facilitate misuse.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The helper transmits full paper text and potentially review/reply content to external LLM APIs, which may expose unpublished research, reviewer comments, or other sensitive material outside the local environment. In this skill context, the transmission is functionally related to analysis, but it still creates a real confidentiality and compliance risk if users are not explicitly informed and no minimization controls are applied.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The command-mode LLM backend allows the skill to run arbitrary local commands, which is not necessary for ordinary paper contribution analysis and materially increases attack surface. If an attacker can influence configuration or convince a user to enable an unsafe command, this becomes an execution primitive with access to local data and the prompts supplied by the tool.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The file-exchange mode writes prompts to disk, propagates environment variables, formats a command string, and then executes it, creating a broad local execution and data exposure surface. For a paper helper, this is especially risky because it combines sensitive content staging with arbitrary process launch, enabling leakage of prompt data, abuse of inherited environment variables, and unintended filesystem access.
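The exposure described in this finding could be reduced by staging the prompt in a private temporary directory that is deleted after the run, and by passing the child process a small, explicit environment instead of the inherited one. This is a hedged sketch, not the skill's implementation: SAFE_ENV_KEYS, minimal_env, and run_with_prompt_file are hypothetical names, and the set of variables worth keeping depends on the platform and backend.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: these are the only inherited variables the backend needs.
SAFE_ENV_KEYS = ("PATH", "LANG", "LC_ALL", "SYSTEMROOT")


def minimal_env(extra=None):
    """Build a child environment from an explicit allowlist of variable names."""
    env = {key: os.environ[key] for key in SAFE_ENV_KEYS if key in os.environ}
    env.update(extra or {})
    return env


def run_with_prompt_file(argv, prompt, timeout=30.0):
    """Stage the prompt in a private temp dir, run argv with its path appended."""
    # TemporaryDirectory creates a directory readable only by the current user
    # and removes it, including the prompt file, when the block exits.
    with tempfile.TemporaryDirectory(prefix="paper_helper_") as tmp:
        prompt_file = Path(tmp) / "prompt.txt"
        prompt_file.write_text(prompt, encoding="utf-8")
        return subprocess.run(
            [*argv, str(prompt_file)],
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",
            timeout=timeout,
            check=False,
            env=minimal_env(),  # API keys and tokens in os.environ are not passed on
        )
```

A backend that needs a credential would receive it through the `extra` argument, which keeps the decision about what leaves the parent process visible at the call site.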

Natural-Language Policy Violations

Medium

Confidence: 92%
Finding: The template explicitly forces Chinese for diagnosis, reasoning, ranking, and risk output without checking user preference or system policy. This can override expected language behavior, reduce transparency for users who cannot read Chinese, and make security-relevant analysis harder to review or audit.

Natural-Language Policy Violations

Medium

Confidence: 89%
Finding: The file hard-codes Chinese for most diagnostic and planning content regardless of the user's requested language. In an agent skill, this can override user intent, reduce usability, and create safety/compliance issues when users cannot adequately review nuanced claims, risks, or manuscript-ready guidance in their preferred language.

Natural-Language Policy Violations

Medium

Confidence: 93%
Finding: The workflow hard-codes a response language and ordering requirement ('plain Chinese first') regardless of the user's stated preferences. In an agent setting, this can override higher-priority user expectations, reduce usability, and create prompt-conflict behavior where the system follows embedded workflow text instead of the active request.

Missing User Warnings

Medium

Confidence: 83%
Finding: The script writes the full prompt to a sidecar file on disk without any visible access controls, minimization, or disclosure in the code path. If prompts contain unpublished paper text, API data, credentials, or proprietary material, this creates a confidentiality risk through residual local artifacts. In this skill context, users are likely to process sensitive research content, making silent persistence more concerning.

Missing User Warnings

Medium

Confidence: 95%
Finding: The script sends extracted paper text, reviews, replies, and synthesized prompts to an external LLM API. In this skill context, that content may include unpublished research, reviewer commentary, or other sensitive material, and the code provides no consent gate, redaction step, or prominent disclosure before transmission.
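A consent gate and a redaction step of the kind this finding says are missing might look roughly like the following. It is a minimal sketch under assumptions: prepare_external_payload, the allow_external flag, and the email-only redaction rule are hypothetical, and a real deployment would need a broader redaction policy.

```python
import re

# Illustrative redaction rule only: masks email addresses, nothing else.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace email addresses with a fixed placeholder."""
    return EMAIL_RE.sub("[redacted-email]", text)


def prepare_external_payload(text: str, *, allow_external: bool, max_chars: int = 4000) -> str:
    """Refuse to build an outbound payload unless the user has opted in."""
    if not allow_external:
        raise PermissionError("external transmission was not approved by the user")
    # Redact first, then truncate, so a cut never exposes part of an address.
    return redact(text)[:max_chars]
```

Making `allow_external` a required keyword argument means every call site has to state the user's decision explicitly rather than rely on a default.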

Missing User Warnings

Low

Confidence: 87%
Finding: The workflow directs the assistant to write `assistant_deep_read_records.jsonl` and then run an import step that updates project state, but it does not require an explicit user confirmation before modifying files. In an agent setting, implicit file writes can violate user expectations and create integrity or workflow-safety issues, especially if the assistant acts on ambiguous prompts or the wrong run directory.
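An explicit confirmation step before the records file is written could be sketched as below. The function write_records and its confirm callback are hypothetical; only the `assistant_deep_read_records.jsonl` file name comes from the finding.

```python
import json
from pathlib import Path


def write_records(path, records, *, confirm) -> bool:
    """Append records as JSON Lines only if the confirm callback approves."""
    path = Path(path)
    # confirm is any callable that takes a question and returns True or False,
    # for example a terminal prompt or an agent-side approval hook.
    if not confirm(f"Write {len(records)} record(s) to {path}?"):
        return False
    with path.open("a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return True
```

Because the question includes the resolved target path, a user can catch a wrong run directory before anything is modified.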

Natural-Language Policy Violations

Medium

Confidence: 96%
Finding: The contract hard-codes Chinese as the main report language without indicating that this should be conditional on user preference. This can override a user's requested language or reduce usability, transparency, and reviewability for users expecting another language, which is a real policy and safety issue even if not a direct security exploit.

Natural-Language Policy Violations

Medium

Confidence: 94%
Finding: The contract hard-codes a response language ('Use Chinese for diagnosis, reasoning, ranking, risks, anonymous analogies, and revision planning') without user opt-in. This can override user preferences or downstream system expectations, creating prompt steering and reduced usability; in a security-sensitive workflow, forced language can also hinder reviewer visibility and policy auditing if operators cannot easily inspect the output.
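The language findings could be addressed by treating Chinese as a default rather than a fixed requirement. The sketch below assumes a hypothetical report_language_instruction helper that builds the contract sentence from the user's stated preference; the abbreviated instruction wording is illustrative.

```python
DEFAULT_LANGUAGE = "Chinese"  # kept as the fallback, no longer hard-coded into the prompt


def report_language_instruction(user_language=None) -> str:
    """Return the language instruction, honoring the user's preference if given."""
    language = (user_language or "").strip() or DEFAULT_LANGUAGE
    return f"Use {language} for diagnosis, reasoning, ranking, risks, and revision planning."
```

For example, `report_language_instruction("English")` yields an English-language instruction, while an empty or missing preference falls back to the existing Chinese default.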

VirusTotal

VirusTotal findings are pending for this skill version.
