Paper Contribution Helper

Security checks across malware telemetry and agentic risk

Overview

This paper-analysis skill is mostly coherent, but it includes optional paths that can run user-configured local commands and pass sensitive paper content to external or local LLM backends.

Install only if you are comfortable with a research helper that reads and stores paper/review content locally. Avoid using the command or command-file LLM providers unless you fully trust the configured executable and environment, and do not process confidential unpublished work through external APIs without explicit permission.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
Findings (34)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
    prompt_file.write_text(prompt, encoding="utf-8")
    command = build_command(request, request_json, response_json, prompt_file)

    proc = subprocess.run(
        command,
        capture_output=True,
        text=True,
Confidence
96% confidence
Finding
proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en

subprocess module call

Medium
Category
Dangerous Code Execution
Content
                "LLM_STAGE": stage,
            }
        )
        proc = subprocess.run(
            shlex.split(rendered, posix=(os.name != "nt")),
            capture_output=True,
            text=True,
Confidence
93% confidence
Finding
proc = subprocess.run( shlex.split(rendered, posix=(os.name != "nt")), capture_output=True, text=True, encoding="utf-8", timeout=arg

subprocess module call

Medium
Category
Dangerous Code Execution
Content
        if not args.llm_command:
            raise RuntimeError("--llm-command is required for provider=command")
        request = {"paper_key": paper_key, "stage": stage, "prompt": prompt}
        proc = subprocess.run(
            shlex.split(args.llm_command),
            input=json.dumps(request, ensure_ascii=False),
            capture_output=True,
Confidence
90% confidence
Finding
proc = subprocess.run( shlex.split(args.llm_command), input=json.dumps(request, ensure_ascii=False), capture_output=True, text=True,

subprocess module call

Medium
Category
Dangerous Code Execution
Content
    prompt_file.write_text(prompt, encoding="utf-8")
    command = build_command(request, request_json, response_json, prompt_file)

    proc = subprocess.run(
        command,
        capture_output=True,
        text=True,
Confidence
94% confidence
Finding
proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en

subprocess module call

Medium
Category
Dangerous Code Execution
Content
                "LLM_STAGE": stage,
            }
        )
        proc = subprocess.run(
            shlex.split(rendered, posix=(os.name != "nt")),
            capture_output=True,
            text=True,
Confidence
93% confidence
Finding
proc = subprocess.run( shlex.split(rendered, posix=(os.name != "nt")), capture_output=True, text=True, encoding="utf-8", timeout=arg

subprocess module call

Medium
Category
Dangerous Code Execution
Content
        if not args.llm_command:
            raise RuntimeError("--llm-command is required for provider=command")
        request = {"paper_key": paper_key, "stage": stage, "prompt": prompt}
        proc = subprocess.run(
            shlex.split(args.llm_command),
            input=json.dumps(request, ensure_ascii=False),
            capture_output=True,
Confidence
90% confidence
Finding
proc = subprocess.run( shlex.split(args.llm_command), input=json.dumps(request, ensure_ascii=False), capture_output=True, text=True,

Tainted flow: 'command' from os.environ.get (line 98, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
    prompt_file.write_text(prompt, encoding="utf-8")
    command = build_command(request, request_json, response_json, prompt_file)

    proc = subprocess.run(
        command,
        capture_output=True,
        text=True,
Confidence
98% confidence
Finding
proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en

Tainted flow: 'command' from os.environ.get (line 98, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
    prompt_file.write_text(prompt, encoding="utf-8")
    command = build_command(request, request_json, response_json, prompt_file)

    proc = subprocess.run(
        command,
        capture_output=True,
        text=True,
Confidence
97% confidence
Finding
proc = subprocess.run( command, capture_output=True, text=True, encoding="utf-8", errors="replace", timeout=timeout, check=False, en
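
One way to narrow the environment-to-subprocess flow flagged here is to check the executable against an allowlist before the command reaches subprocess.run. The sketch below is an illustration only: ALLOWED_EXECUTABLES and resolve_command are hypothetical names that do not come from the scanned skill, and the allowlist entries are placeholders.

```python
import os
import shlex

# Hypothetical allowlist; these entries are placeholders, not taken from the skill.
ALLOWED_EXECUTABLES = frozenset({"node", "python3"})


def resolve_command(raw, allowed=ALLOWED_EXECUTABLES):
    """Split a configured command string and reject executables outside the allowlist."""
    argv = shlex.split(raw)
    if not argv:
        raise ValueError("empty command")
    executable = argv[0]
    # Require a bare program name so a path such as /tmp/evil/node cannot pass.
    if os.path.basename(executable) != executable or executable not in allowed:
        raise PermissionError(f"executable not allowed: {executable!r}")
    return argv
```

A bare name is still resolved through PATH, so a check like this only helps when PATH itself is trusted.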

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
The script implements a generic bridge to execute ACP/Node commands configured externally, which is broader than the documented purpose of a paper contribution helper. This mismatch matters because users and reviewers may reasonably expect a domain helper, not a general-purpose process launcher tied to prompt content and environment state. The broader capability increases abuse potential and makes other execution flaws more dangerous in context.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
This is a capability mismatch: the skill claims research-assistance behavior but includes a generic bridge for launching externally configured ACP/Node commands. That broad execution surface materially increases risk because users and reviewers would not expect arbitrary local program execution from this skill context.

Description-Behavior Mismatch

Medium
Confidence
85% confidence
Finding
The helper is presented as modular and self-contained, but core behavior is delegated to arbitrary ACP CLI commands configured via environment variables. This hidden indirection makes the skill more dangerous because the effective behavior can change at runtime without code changes or user awareness.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The script explicitly supports `--llm-provider=command` and `command-file` with a user-supplied `--llm-command`, then later passes that command into downstream analysis. That creates a built-in arbitrary program execution surface that is unrelated to the core paper-helper function and becomes dangerous if untrusted configs, wrappers, or generated build recipes can influence these arguments.

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
This script does substantially more than 'helper construction': it enumerates accepted ICLR submissions, fetches forum content, scores papers for novelty risk, downloads PDFs, and packages materials to local storage. In the context of a paper-contribution helper, bulk harvesting and structured packaging of papers plus review artifacts increases the chance of unintended large-scale data collection and downstream misuse beyond the user’s likely expectations.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The code retrieves and stores official reviews, meta reviews, decisions, reviewer follow-ups, and public comments at scale, then uses them to classify 'novelty risk' and generate candidate flags. Within this skill’s context, that is more dangerous than a generic crawler because it operationalizes reviewer/forum harvesting into a reusable pipeline for profiling papers and packaging discussion content, which can expose sensitive or context-dependent review material and facilitate misuse.

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
The helper transmits full paper text and potentially review/reply content to external LLM APIs, which may expose unpublished research, reviewer comments, or other sensitive material outside the local environment. In this skill context, the transmission is functionally related to analysis, but it still creates a real confidentiality and compliance risk if users are not explicitly informed and no minimization controls are applied.
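
A minimization control of the kind this finding says is missing could be as small as redacting obvious identifiers and capping how much text leaves the machine. The following is a rough sketch under assumptions: the function name, the email pattern, and the 4000-character cap are all illustrative and are not drawn from the skill.

```python
import re

# Simple email pattern for illustration; it is not a complete address grammar.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def minimize_for_upload(text, max_chars=4000):
    """Redact email addresses, then truncate to a fixed character budget."""
    redacted = EMAIL_RE.sub("[email]", text)
    return redacted[:max_chars]
```

For example, minimize_for_upload("Contact alice@example.org now") yields "Contact [email] now".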

Context-Inappropriate Capability

High
Confidence
97% confidence
Finding
The command-mode LLM backend allows the skill to run arbitrary local commands, which is not necessary for ordinary paper contribution analysis and materially increases attack surface. If an attacker can influence configuration or convince a user to enable an unsafe command, this becomes an execution primitive with access to local data and the prompts supplied by the tool.
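
One possible mitigation is to make local command execution an explicit opt-in rather than something a configuration value can switch on silently. In the sketch below, --llm-provider and --llm-command are the options named in the findings, while --allow-local-exec is a hypothetical flag invented for this example.

```python
import argparse

# Providers that launch a local process, as named in the findings above.
EXEC_PROVIDERS = {"command", "command-file"}


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--llm-provider")
    parser.add_argument("--llm-command")
    # Hypothetical opt-in flag; not part of the scanned skill.
    parser.add_argument("--allow-local-exec", action="store_true")
    args = parser.parse_args(argv)
    if args.llm_provider in EXEC_PROVIDERS and not args.allow_local_exec:
        # parser.error prints a usage message and exits with status 2.
        parser.error("provider runs a local command; pass --allow-local-exec to confirm")
    return args
```

With this shape, a wrapper or generated recipe that sets only --llm-provider and --llm-command fails fast instead of launching a process.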

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The file-exchange mode writes prompts to disk, propagates environment variables, formats a command string, and then executes it, creating a broad local execution and data exposure surface. For a paper helper, this is especially risky because it combines sensitive content staging with arbitrary process launch, enabling leakage of prompt data, abuse of inherited environment variables, and unintended filesystem access.
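
The inherited-environment part of this finding can be reduced by passing the child process a small, explicit environment instead of a copy of os.environ. This is a minimal sketch: SAFE_ENV_KEYS and build_child_env are hypothetical names, and the set of keys a real backend needs is an assumption. LLM_STAGE is the variable visible in the excerpts above.

```python
import os

# Assumed minimal set of variables a child process needs; adjust per platform.
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG", "SYSTEMROOT")


def build_child_env(extra, parent=None):
    """Return an environment containing only allowlisted keys plus explicit extras."""
    parent = os.environ if parent is None else parent
    env = {key: parent[key] for key in SAFE_ENV_KEYS if key in parent}
    env.update(extra)
    return env
```

The result can be passed as env=build_child_env({"LLM_STAGE": stage}) to subprocess.run, so API keys and tokens in the parent environment are not exposed to the launched command.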

Natural-Language Policy Violations

Medium
Confidence
92% confidence
Finding
The template explicitly forces Chinese for diagnosis, reasoning, ranking, and risk output without checking user preference or system policy. This can override expected language behavior, reduce transparency for users who cannot read Chinese, and make security-relevant analysis harder to review or audit.

Natural-Language Policy Violations

Medium
Confidence
89% confidence
Finding
The file hard-codes Chinese for most diagnostic and planning content regardless of the user's requested language. In an agent skill, this can override user intent, reduce usability, and create safety/compliance issues when users cannot adequately review nuanced claims, risks, or manuscript-ready guidance in their preferred language.

Natural-Language Policy Violations

Medium
Confidence
93% confidence
Finding
The workflow hard-codes a response language and ordering requirement ('plain Chinese first') regardless of the user's stated preferences. In an agent setting, this can override higher-priority user expectations, reduce usability, and create prompt-conflict behavior where the system follows embedded workflow text instead of the active request.

Missing User Warnings

Medium
Confidence
83% confidence
Finding
The script writes the full prompt to a sidecar file on disk without any visible access controls, minimization, or disclosure in the code path. If prompts contain unpublished paper text, API data, credentials, or proprietary material, this creates a confidentiality risk through residual local artifacts. In this skill context, users are likely to process sensitive research content, making silent persistence more concerning.
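
Residual prompt files can be limited by staging the prompt in a private temporary file and deleting it once the backend has finished. The sketch below assumes a context-manager design that the skill does not necessarily use; private_prompt_file is a hypothetical name. On POSIX systems tempfile.mkstemp creates the file readable and writable only by the owning user.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def private_prompt_file(prompt):
    """Write the prompt to an owner-only temp file and remove it on exit."""
    fd, path = tempfile.mkstemp(prefix="prompt_", suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(prompt)
        yield path
    finally:
        # Remove the sidecar even if the caller's block raised.
        if os.path.exists(path):
            os.remove(path)
```

A caller would wrap the subprocess invocation in "with private_prompt_file(prompt) as path:" so the file exists only for the duration of the call.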

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The script sends extracted paper text, reviews, replies, and synthesized prompts to an external LLM API. In this skill context, that content may include unpublished research, reviewer commentary, or other sensitive material, and the code provides no consent gate, redaction step, or prominent disclosure before transmission.
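
A consent gate of the kind this finding describes can be a single prompt placed before the API call. The sketch below is illustrative: confirm_external_send is a hypothetical helper, and the ask parameter exists only so the prompt can be replaced in tests or non-interactive runs.

```python
def confirm_external_send(n_chars, destination, ask=input):
    """Ask the user before sending content off the machine; default to refusing."""
    answer = ask(
        f"Send {n_chars} characters of paper/review text to {destination}? [y/N] "
    )
    return answer.strip().lower() in {"y", "yes"}
```

Anything other than an explicit "y" or "yes", including an empty reply, is treated as a refusal.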

Missing User Warnings

Low
Confidence
87% confidence
Finding
The workflow directs the assistant to write `assistant_deep_read_records.jsonl` and then run an import step that updates project state, but it does not require an explicit user confirmation before modifying files. In an agent setting, implicit file writes can violate user expectations and create integrity or workflow-safety issues, especially if the assistant acts on ambiguous prompts or the wrong run directory.
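
Both concerns raised here, the missing confirmation and the risk of writing to the wrong run directory, can be checked before the write happens. The following is a sketch under assumptions: write_records and its confirm callback are hypothetical, and the skill's actual import step is not shown. Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def write_records(run_dir, filename, lines, confirm):
    """Write JSONL lines only inside run_dir and only after confirm(target) is true."""
    base = Path(run_dir).resolve()
    target = (base / filename).resolve()
    # Reject names such as ../other.jsonl that escape the run directory.
    if not target.is_relative_to(base):
        raise ValueError(f"{filename!r} resolves outside {base}")
    if not confirm(target):
        return None
    target.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return target
```

In an agent setting, confirm would surface the resolved path to the user, for example for assistant_deep_read_records.jsonl, and return their answer.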

Natural-Language Policy Violations

Medium
Confidence
96% confidence
Finding
The contract hard-codes Chinese as the main report language without indicating that this should be conditional on user preference. This can override a user's requested language or reduce usability, transparency, and reviewability for users expecting another language, which is a real policy and safety issue even if not a direct security exploit.

Natural-Language Policy Violations

Medium
Confidence
94% confidence
Finding
The contract hard-codes a response language ('Use Chinese for diagnosis, reasoning, ranking, risks, anonymous analogies, and revision planning') without user opt-in. This can override user preferences or downstream system expectations, creating prompt steering and reduced usability; in a security-sensitive workflow, forced language can also hinder reviewer visibility and policy auditing if operators cannot easily inspect the output.
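
The language findings suggest deriving the instruction from the user's preference instead of fixing it in the template. A minimal sketch follows; LANGUAGE_NAMES and language_rule are hypothetical, and the two-entry table and the English fallback are assumptions made for the example.

```python
# Hypothetical lookup table; extend with whatever languages the skill supports.
LANGUAGE_NAMES = {"en": "English", "zh": "Chinese"}


def language_rule(user_pref=None, default="en"):
    """Build the report-language instruction from a user preference code."""
    code = (user_pref or default).strip().lower()
    # Unknown or blank codes fall back to the default language.
    name = LANGUAGE_NAMES.get(code, LANGUAGE_NAMES[default])
    return f"Use {name} for diagnosis, reasoning, ranking, risks, and revision planning."
```

A template would then interpolate language_rule(user_pref) where the fixed Chinese instruction currently sits.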

VirusTotal

VirusTotal findings are pending for this skill version.
