OpenPaperGraph

Security checks across malware telemetry and agentic risk

Overview

This skill is mostly purpose-aligned, but its interactive server can expose unauthenticated graph-changing actions to the network.

Install only if you are comfortable with a research tool that makes external API calls and can persistently edit graph files. Avoid running serve mode on untrusted networks. Where possible, bind it to localhost only. Keep backups of graph JSON files, and do not enter sensitive API keys or private manuscript data unless you intend to send that content to the selected provider.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (24)

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The server enables CORS for all origins and exposes unauthenticated mutation endpoints such as add, convert, delete, enrich, and save. Because it binds to 0.0.0.0, any reachable host or any website visited by the user can send cross-origin requests that modify or delete the local graph file, which is a real integrity issue rather than a mere design concern.
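One possible mitigation, sketched here with Python's standard-library `http.server`, is to bind to the loopback interface and reject mutating requests whose `Origin` header is not an allowlisted local origin. The route names, port 8000, and allowlist are assumptions for illustration, not the skill's actual server code. A fuller fix would also validate the `Host` header to resist DNS rebinding.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

# Hypothetical allowlist: only pages served by this local server may mutate the graph.
ALLOWED_ORIGINS = {"http://127.0.0.1:8000", "http://localhost:8000"}
# Assumed route names, mirroring the add/convert/delete/enrich/save actions in the finding.
MUTATING_PATHS = {"/add", "/convert", "/delete", "/enrich", "/save"}


def is_allowed_origin(origin):
    """Return True only for an exact match against the allowlist (missing header is rejected)."""
    return origin in ALLOWED_ORIGINS


class GraphHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        path = urlsplit(self.path).path
        if path in MUTATING_PATHS and not is_allowed_origin(self.headers.get("Origin")):
            self.send_error(403, "Cross-origin mutation rejected")
            return
        # ... apply the mutation and persist the graph here ...
        self.send_response(204)
        self.end_headers()


def serve(port=8000):
    # Bind to loopback instead of 0.0.0.0 so other hosts on the network cannot connect.
    ThreadingHTTPServer(("127.0.0.1", port), GraphHandler).serve_forever()
```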

Description-Behavior Mismatch

Medium
Confidence
96% confidence
Finding
This export module goes beyond generating a static visualization and embeds browser-side LLM features that collect API keys, build prompts from paper metadata, and invoke external endpoints. That substantially expands the trust boundary of an HTML export feature and can lead to inadvertent disclosure of research data and user-supplied credentials to third parties.

Context-Inappropriate Capability

Medium
Confidence
98% confidence
Finding
The exported HTML includes UI and code to collect an API key and send paper titles, authors, abstracts, and other metadata directly from the browser to any of a wide range of remote LLM endpoints. In a shared or redistributed HTML file, users may trust the artifact as a local visualization and not realize it can exfiltrate sensitive research content and credentials to external services.

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
This file implements a generic LLM gateway that can send arbitrary prompts to many third-party providers selected by environment configuration. In a literature-analysis skill, prompts may contain paper text, user queries, PDF-derived content, or other research data, so broad external transmission is a real data-exposure risk rather than a purely theoretical concern.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The README states that all graph mutations are saved to the JSON file immediately, but it does not prominently warn users, before they use mutation features, that these actions are destructive and persist to disk in real time. In an agent-skill context, this increases the risk of unintended file modification or data loss when an AI-driven workflow invokes add/convert/remove/enrich operations on behalf of the user.
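Because every mutation goes straight to the JSON file, a cheap safeguard is to keep the previous version and write atomically, so a crash or an unwanted change never leaves the graph half-written or unrecoverable. This sketch assumes the graph is a JSON-serializable dict; the `save_graph` name and the `.bak` convention are hypothetical.

```python
import json
import os
import shutil
import tempfile


def save_graph(path, graph):
    """Persist graph JSON atomically, keeping the previous version as path + '.bak'."""
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")  # one-step undo for accidental mutations
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(graph, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)  # atomic swap: readers see old or new file, never a partial one
    except BaseException:
        os.unlink(tmp_path)
        raise
```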

Vague Triggers

Medium
Confidence
86% confidence
Finding
The description is broad enough that the skill could activate for many generic research-related requests, including ones where users did not intend network calls, file writes, PDF parsing, browser serving, or persistent graph modification. Over-broad triggering increases the chance of unintended tool execution and exposure of local files, uploaded PDFs, or API-backed operations in contexts that only required lightweight advice.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The skill documents persistent graph modification and destructive operations like `remove-seed`, `remove-paper`, and a `serve` mode that writes changes immediately to disk, but it does not prominently require explicit user confirmation before overwriting or deleting data. In an agent setting, this can lead to accidental loss or silent mutation of user research data and generated graph files.
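A confirmation gate for commands like these could look like the following sketch. It refuses by default when no interactive terminal is attached, which suits agent pipelines. The function name and the `assume_yes` override (for example behind a `--yes` flag) are hypothetical, not options the skill is known to provide.

```python
import sys


def confirm_destructive(action, target, assume_yes=False, ask=input):
    """Ask before an immediately persisted, destructive change; True means proceed."""
    if assume_yes:
        return True  # caller explicitly opted out of prompting
    if ask is input and (sys.stdin is None or not sys.stdin.isatty()):
        return False  # no interactive user (e.g. an agent pipeline): refuse rather than guess
    reply = ask(f"{action} {target!r} is written to disk immediately. Continue? [y/N] ")
    return reply.strip().lower() in {"y", "yes"}
```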

Missing User Warnings

Low
Confidence
81% confidence
Finding
The skill instructs users to supply API keys for Zotero, Semantic Scholar, and LLM providers, but it does not clearly warn against exposing secrets in command arguments, logs, generated files, or chat transcripts. In an agent workflow, users may paste credentials directly into commands, increasing the risk of accidental disclosure through shell history, stdout/stderr capture, or persisted notes.
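Where an orchestration layer logs invoked commands, one hedge is to mask credential values before the command line is recorded. This sketch assumes secrets arrive as `--api-key VALUE` or `--api-key=VALUE`; the helper name and flag set are hypothetical.

```python
import shlex

# Flags whose values must never reach logs; extend with any other credential-bearing flags.
SECRET_FLAGS = {"--api-key"}


def redact_argv(argv):
    """Return a shell-quoted command line with the values of secret flags masked."""
    out = []
    hide_next = False
    for arg in argv:
        if hide_next:
            out.append("REDACTED")  # value that followed a secret flag
            hide_next = False
        elif arg in SECRET_FLAGS:
            out.append(arg)
            hide_next = True
        elif any(arg.startswith(flag + "=") for flag in SECRET_FLAGS):
            out.append(arg.split("=", 1)[0] + "=REDACTED")
        else:
            out.append(arg)
    return shlex.join(out)
```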

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The code sends extracted PDF text to an external LLM service for reference extraction without any visible consent, warning, or data-minimization control. Academic PDFs can contain unpublished manuscripts, licensed content, or sensitive notes, so transmitting even truncated text to a third party creates a real confidentiality and compliance risk.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The `zotero` subcommand requires `--api-key` on the command line, which can expose secrets through shell history, process listings, job control tools, audit logs, and telemetry in multi-user or managed environments. In an agent skill context, this is more dangerous because orchestration layers often log invoked commands and arguments, increasing the chance of credential leakage beyond the local terminal session.
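A common alternative is to read the key from an environment variable and fall back to a non-echoing prompt, keeping the flag only as a discouraged fallback. `ZOTERO_API_KEY`, `resolve_api_key`, and the parser below are hypothetical names, not the skill's current interface.

```python
import argparse
import getpass
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="zotero")
    # Optional rather than required, so users are not pushed to put secrets in argv.
    parser.add_argument("--api-key", help="discouraged; prefer the ZOTERO_API_KEY environment variable")
    return parser


def resolve_api_key(cli_value=None, env=None, env_var="ZOTERO_API_KEY", prompt=getpass.getpass):
    """Prefer an explicit value, then the environment, then an interactive hidden prompt."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value  # still accepted for compatibility, but visible in history and process lists
    if env.get(env_var):
        return env[env_var]
    return prompt("Zotero API key: ")  # getpass does not echo input or touch shell history
```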

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The LLM summarization path builds a prompt from paper titles, authors, years, citation counts, and abstract text, then sends it to an external provider via llm_chat without any consent gate, redaction step, or visible warning in this code path. In a literature-analysis skill, papers may include unpublished manuscripts, proprietary corpora descriptions, or sensitive bibliographic datasets, so silent third-party transmission creates a real confidentiality and compliance risk.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The code sends up to 3000 characters of first-page PDF text to an external LLM service in `_llm_extract_metadata` without any consent gate, disclosure, or redaction. In an academic-paper workflow this text can contain unpublished research, author identities, affiliations, emails, or other sensitive content, so silent transmission creates a real privacy and data-governance risk.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The LLM fallback extracts text from uploaded/downloaded PDFs and sends up to 12,000 characters to an external model via `llm_chat` without any consent gate, redaction, or clear runtime disclosure. Academic PDFs can contain unpublished manuscripts, reviewer copies, institutional watermarks, or sensitive content, so this creates a real data exfiltration/privacy risk rather than a purely internal implementation detail.
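The reference-extraction, metadata, and fallback findings describe the same pattern: PDF text (up to 3,000 or 12,000 characters) leaves the machine with no opt-in. A minimal gate that requires explicit consent, masks email addresses, and then truncates might look like this sketch. `allow_external`, `ConsentRequired`, and the email-only redaction are assumptions; reliably redacting unpublished content is much harder than a regex.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class ConsentRequired(RuntimeError):
    """Raised when PDF text would be sent off-machine without explicit opt-in."""


def prepare_pdf_text_for_llm(text, max_chars=12_000, allow_external=False):
    """Return text that may be sent to an external LLM, or raise without consent."""
    if not allow_external:
        raise ConsentRequired("sending PDF text to an external LLM needs explicit opt-in")
    redacted = EMAIL_RE.sub("[email]", text)  # masks addresses only; not a general anonymizer
    return redacted[:max_chars]  # data minimization: cap what leaves the machine
```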

Missing User Warnings

High
Confidence
93% confidence
Finding
The delete endpoint performs destructive mutation and immediately persists the change to disk without any server-side authorization, CSRF protection, or origin restriction. In this skill context, the graph represents user-curated research data, so unauthorized deletion can cause direct data loss and corruption of literature-analysis results.
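One way to add server-side authorization is a random per-session token that the served page sends in a custom header, compared in constant time. A cross-origin page cannot read the token, so it cannot forge the request. The header name `X-Graph-Token` and the per-session design are assumptions for this sketch.

```python
import hmac
import secrets

# Generated once per `serve` session and embedded only in the locally served page (hypothetical design).
SESSION_TOKEN = secrets.token_urlsafe(32)


def is_authorized(headers, token=SESSION_TOKEN):
    """Require the session token in a custom header on every mutating request."""
    supplied = headers.get("X-Graph-Token", "")
    # compare_digest avoids leaking how many leading characters matched via timing.
    return hmac.compare_digest(supplied.encode(), token.encode())
```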

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The 'Copy for LLM' feature exports a structured bundle of paper metadata, URLs, DOIs, abstracts, and citation edges to the clipboard without a clear warning that the content is intended for pasting into external AI services. This creates a real data-leak risk, especially for unpublished, proprietary, or sensitive literature collections where users may not appreciate the scope of copied data.

Missing User Warnings

Medium
Confidence
98% confidence
Finding
The client-side summary flow transmits collected paper data to externally selected LLM providers using a user-entered API key, but only gives a generic note about session storage and possible CORS issues. That is insufficient informed consent for a feature that sends potentially large amounts of research content to third parties and encourages credential entry into a generated HTML artifact.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The llm_chat function sends both system and user prompt content to an external API without any in-function warning, consent gate, or sensitivity screening. Because this skill processes academic content and may summarize uploaded or parsed PDFs, users may not realize their inputs are being transferred off-platform to third parties.

Missing User Warnings

Medium
Confidence
86% confidence
Finding
When GROBID is enabled, the code uploads the full PDF to a local HTTP service over an unauthenticated plaintext endpoint. Even though it targets localhost, this still transfers potentially sensitive paper contents to another process/container without an explicit consent boundary, and any local compromise, port forwarding, misbinding, or hostile local service on that port could receive the document.
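A narrow check the client could make before uploading is that the configured GROBID URL really points at the loopback interface, which catches misconfiguration to a remote host. It does not protect against a hostile process already listening on that port. `is_loopback_url` is a hypothetical helper, and 8070 is used only because it is GROBID's usual default port.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url):
    """True if the URL's host is 'localhost' or a loopback IP literal."""
    host = urlsplit(url).hostname  # lowercased, brackets stripped from IPv6 literals
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname could resolve anywhere, so treat it as non-local
```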

Missing User Warnings

Medium
Confidence
88% confidence
Finding
This code sends bibliographic metadata derived from user-provided references, including title and optionally author names, to the CrossRef API without any evidence in this file of user consent, disclosure, or a privacy gate. While this is expected functionality for a literature-resolution tool, extracted references may come from private or unpublished PDFs, so transmitting them to third-party services can leak sensitive research topics or manuscripts.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
This code transmits reference titles, and sometimes publication year, to the OpenAlex API as part of resolution. In the context of PDF-derived references, this can expose confidential reading lists, draft citations, or unpublished research interests to an external service if users are not clearly warned and given control.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The code automatically downloads and parses remote PDFs as part of normal reference retrieval without any explicit user consent, provenance checks, or restrictions shown here. In an agent context, this creates a data-handling and resource-consumption risk: user-triggered queries can cause unsolicited outbound fetches and local file creation/parsing of untrusted documents, which expands attack surface and may violate user expectations or deployment policy.
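Basic provenance and resource checks for such downloads could include requiring HTTPS (including after redirects), capping the response size, and checking the `%PDF-` header before anything is parsed. The 50 MiB limit and the helper names are assumptions for this sketch.

```python
import urllib.request

MAX_PDF_BYTES = 50 * 1024 * 1024  # hypothetical cap


def looks_like_pdf(data):
    """Cheap magic-byte check before handing bytes to a PDF parser."""
    return data[:5] == b"%PDF-"


def fetch_pdf(url, max_bytes=MAX_PDF_BYTES, timeout=30):
    if not url.lower().startswith("https://"):
        raise ValueError("refusing non-HTTPS PDF URL")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # urlopen follows redirects; re-check where we actually ended up.
        if not resp.url.lower().startswith("https://"):
            raise ValueError("redirected to a non-HTTPS URL")
        data = resp.read(max_bytes + 1)  # read one extra byte to detect oversize bodies
    if len(data) > max_bytes:
        raise ValueError("PDF exceeds size limit")
    if not looks_like_pdf(data):
        raise ValueError("response is not a PDF")
    return data
```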

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The function sends paper titles to Google Scholar for citation lookup without an explicit disclosure or consent boundary in this module. While titles are usually public, transmitting user-supplied or sensitive research topics to a third-party service can expose intent, interests, or confidential project subjects in an agent workflow.

Env Variable Harvesting

High
Category
Data Exfiltration
Content
return None

    # Resolve API key: provider-specific env var, or generic LLM_API_KEY
    api_key = os.environ.get(provider.env_key) or os.environ.get("LLM_API_KEY", "")
    base_url = os.environ.get("LLM_BASE_URL", provider.base_url)
    model = model or os.environ.get("LLM_MODEL", provider.default_model)
Confidence
76% confidence
Finding
os.environ.get(provider.env_key

Env Variable Harvesting

High
Category
Data Exfiltration
Content
return None

    # Resolve API key: provider-specific env var, or generic LLM_API_KEY
    api_key = os.environ.get(provider.env_key) or os.environ.get("LLM_API_KEY", "")
    base_url = os.environ.get("LLM_BASE_URL", provider.base_url)
    model = model or os.environ.get("LLM_MODEL", provider.default_model)
Confidence
76% confidence
Finding
os.environ.get("LLM_API_KEY
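For reference, the matched snippet's resolution order can be reproduced as a self-contained sketch. The `Provider` fields come from the snippet; the dataclass, the function name, and the injectable `env` mapping are assumptions added so it runs in isolation. It reads only four named variables, which looks like targeted configuration lookup rather than bulk harvesting. However, because `LLM_BASE_URL` overrides the provider's URL, whoever controls that variable decides where the key is sent.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    env_key: str
    base_url: str
    default_model: str


def resolve_llm_config(provider, model=None, env=None):
    """Mirror the snippet: provider-specific key, then generic LLM_API_KEY; URL/model overridable."""
    env = os.environ if env is None else env
    api_key = env.get(provider.env_key) or env.get("LLM_API_KEY", "")
    base_url = env.get("LLM_BASE_URL", provider.base_url)  # the key is sent to this URL
    model = model or env.get("LLM_MODEL", provider.default_model)
    return api_key, base_url, model
```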

VirusTotal

51/51 vendors rated this skill as clean.
