
Security audit

Eve Research Supervisor Pro

Security checks across malware telemetry and agentic risk

Overview

The skill is a genuine research assistant, but it needs Review because it can run install-time code, read local AI credentials, send research notes to external LLM endpoints, and operate over SSH with weak controls such as disabled host key verification.

Install only after reviewing the installer and scripts. Use Manual mode for sensitive projects, verify which LLM endpoint and API key will be used, avoid using broad SSH keys or production servers, re-enable host key verification if possible, and periodically inspect or delete the skill's memory directory.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (32)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
ssh += ["-i", os.path.expanduser(key)]
    ssh += [f"{user}@{host}", cmd]
    try:
        r = subprocess.run(ssh, capture_output=True, text=True, timeout=timeout+5)
        return r.stdout.strip()
    except Exception:
        return None
Confidence
93% confidence
Finding
r = subprocess.run(ssh, capture_output=True, text=True, timeout=timeout+5)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
cmd += [f"{user}@{host}", remote_cmd]

    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout+5)
        return result.stdout.strip(), result.stderr.strip()
    except subprocess.TimeoutExpired:
        return None, "SSH timeout"
Confidence
95% confidence
Finding
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout+5)

Context-Inappropriate Capability

Medium
Confidence
85% confidence
Finding
The README presents a research-paper assistant but also advertises SSH-based server monitoring, job watching, and pulling results from a GPU server. Those capabilities expand the trust boundary from local document assistance into remote system access, which is sensitive and potentially dangerous if users do not clearly understand what systems and data the skill may touch.

Context-Inappropriate Capability

Medium
Confidence
87% confidence
Finding
The README states the skill may use a built-in key automatically and also instructs users to export API credentials and a third-party base URL. Credential use and routing model traffic to an external endpoint are security-relevant behaviors; without strong disclosure and controls, users may expose secrets or send sensitive research content to an untrusted service.

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
The skill includes SSH/SLURM monitoring and remote job execution capabilities that expand it from a research assistant into a remote infrastructure operator. That materially increases risk because broad natural-language triggers can cause the agent to inspect servers, submit jobs, or pull remote results, exposing sensitive infrastructure data and enabling unintended remote actions.

Context-Inappropriate Capability

Medium
Confidence
84% confidence
Finding
Real-time experiment alerting and automatic log parsing/updating let the skill ingest operational logs and transform them into persistent project data without a strong consent boundary. This can capture sensitive paths, metrics, errors, dataset names, and other internal details from training logs, increasing data exposure beyond the core research-writing function.

Intent-Code Divergence

Medium
Confidence
80% confidence
Finding
The skill claims it should stop and ask when uncertain or before major actions, but Auto mode directs it to run a multi-step pipeline without pausing. This inconsistency weakens user control and can lead to unreviewed searches, downloads, file writes, and memory updates being performed automatically.

Intent-Code Divergence

Low
Confidence
75% confidence
Finding
The skill says to confirm before major actions, yet onboarding and setup immediately save profile data and create directories once inputs are parsed. While less severe than remote execution, it still results in persistent writes without an explicit final consent checkpoint.

Intent-Code Divergence

Medium
Confidence
79% confidence
Finding
The critical rules say Semi-Manual mode should confirm before running pipeline steps, but the Semi-Auto section says technical steps run automatically without approvals. This mismatch can mislead users about the level of control they retain and permit actions they did not expect to be executed.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The script reads API credentials and endpoint configuration from a local settings file and environment variables, which is a sensitive capability. In context, this appears functionally related to calling an LLM for gap detection rather than credential theft, but it still increases risk because the skill silently accesses secrets and can redirect requests to arbitrary endpoints.

Context-Inappropriate Capability

Medium
Confidence
97% confidence
Finding
The script transmits note content to an external HTTP API for LLM processing. Even if intended for legitimate analysis, this creates a data exfiltration path for potentially sensitive research notes, especially since the endpoint is configurable and not constrained to a trusted provider.
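
One way to constrain a configurable endpoint is an allow-list check before any note content leaves the machine. The sketch below is illustrative only: `TRUSTED_HOSTS`, `is_trusted_endpoint`, and the `api.example.com` host are hypothetical names, not taken from the skill's source.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would list the providers
# the user has explicitly approved.
TRUSTED_HOSTS = {"api.example.com"}


def is_trusted_endpoint(base_url, trusted_hosts=TRUSTED_HOSTS):
    """Return True only for HTTPS URLs whose host is on the allow-list."""
    parts = urlsplit(base_url)
    # Compare the parsed hostname exactly, so look-alike hosts such as
    # "api.example.com.evil.net" or "api.example.com@evil.net" are rejected.
    return parts.scheme == "https" and parts.hostname in trusted_hosts
```

A caller would refuse to send notes, or ask the user first, whenever this returns False for the configured base URL.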

Missing User Warnings

Medium
Confidence
82% confidence
Finding
The README advertises automatic SSH-based server monitoring and result pulling but does not warn users about privacy, command execution scope, host trust, or the possibility of accessing sensitive logs and files. In a skill context, undisclosed remote-system interaction can lead users to authorize actions with broader operational impact than expected.

Missing User Warnings

Medium
Confidence
84% confidence
Finding
The README claims persistent session memory across sessions and weeks but does not explain what research data is stored, for how long, or how users can inspect and delete it. Persistent retention increases privacy and data leakage risk, especially for unpublished research, credentials accidentally included in prompts, or sensitive institutional information.
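
Until the skill documents its retention behavior, a user can inspect what has been stored by listing the memory directory. This is a minimal sketch; `summarize_memory_dir` is a hypothetical helper, and the directory path must be supplied by the user because the audit does not state where the skill keeps its memory.

```python
from pathlib import Path


def summarize_memory_dir(memory_dir):
    """List every stored file with its size in bytes, relative to memory_dir."""
    root = Path(memory_dir)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Forward-slash relative paths keep the output portable.
            entries.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return entries
```

Reviewing this listing before deleting or archiving files makes it easier to spot unpublished drafts or accidentally stored credentials.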

Vague Triggers

Medium
Confidence
88% confidence
Finding
Triggering full Auto mode on broad phrases like 'just do it' or 'run everything' creates a prompt-injection and accidental-activation hazard. Common conversational language could unintentionally launch a large autonomous workflow involving downloads, parsing, writing, and persistence.

Vague Triggers

Medium
Confidence
90% confidence
Finding
Broad natural-language mappings like 'check my server' or 'pull results' overlap with ordinary conversation and can trigger sensitive infrastructure operations. Because these commands touch remote systems and potentially credentials or experiment outputs, accidental or injected activation is materially dangerous.
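
A narrower trigger design would accept only explicit command tokens for infrastructure actions and require a separate confirmation step. The following sketch assumes hypothetical command names (`/server-status`, `/pull-results`) and a hypothetical `resolve_trigger` function; none of these appear in the skill.

```python
# Hypothetical explicit commands; ordinary phrases such as
# "check my server" intentionally do not match.
SENSITIVE_COMMANDS = {"/server-status", "/pull-results"}


def resolve_trigger(message, confirmed=False):
    """Map a message to a sensitive command only on an exact leading token."""
    words = message.strip().split()
    token = words[0] if words else ""
    if token not in SENSITIVE_COMMANDS:
        return None  # plain conversation never launches remote actions
    if not confirmed:
        return "needs_confirmation"  # caller should ask the user first
    return token
```

With this shape, text injected into a document or chat cannot start an SSH operation unless it begins with an exact command and the user has confirmed it.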

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The skill collects major, interests, goals, venue, and compute information and stores them persistently, but the description does not clearly warn the user up front that this personal/project data will be retained on disk across sessions. That undermines informed consent and increases privacy risk.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The server-monitoring and experiment-log features can expose infrastructure metadata, logs, and result files, yet the skill text does not provide a clear privacy or security warning for those capabilities. Users may not realize that operational details and log contents may be accessed, parsed, or persisted.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The skill states it will use built-in API settings and fall back to environment API keys, but it does not clearly warn that it may access local credential material and send data to external APIs. This combination of credential access and external transmission is sensitive and should be explicitly disclosed.

Natural-Language Policy Violations

Medium
Confidence
85% confidence
Finding
The manifest defaults to a third-party Hong Kong-specific API endpoint without documenting why that locale/provider is used or obtaining user opt-in. This can silently route prompts, memory-derived data, and research content to an unexpected external service, creating data residency, privacy, and supply-chain trust concerns.

Vague Triggers

Medium
Confidence
84% confidence
Finding
The manifest describes very broad, high-impact capabilities such as persistent supervision, paper writing, monitoring, and experiment alerts, but provides no activation boundaries, trigger conditions, or scope limitations in the package metadata. In an agent ecosystem, this increases the risk of overbroad invocation and unintended autonomy. Paired with a postinstall script and persistent-agent framing, it can lead users or orchestrators to authorize actions beyond what they expect.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The script sends research notes to an external LLM API without explicit consent, warning, or data-sensitivity checks. If the notes contain unpublished research, proprietary material, personal data, or secrets, this can cause unintended disclosure to a third-party service.
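
A data-sensitivity check of the kind this finding calls for could be as simple as scanning notes for secret-like lines before upload. The sketch below is a rough heuristic under stated assumptions: the patterns and the `find_secret_like_lines` name are hypothetical, and pattern matching will miss many secrets and flag some harmless text.

```python
import re

# Hypothetical, deliberately small pattern set for illustration.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+", re.IGNORECASE),
]


def find_secret_like_lines(text):
    """Return 1-based line numbers that look like they contain a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(lineno)
    return hits
```

A non-empty result would be a reasonable point to pause and ask the user before anything is sent to an external LLM API.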

Missing User Warnings

Medium
Confidence
98% confidence
Finding
The code sends extracted note text to an external LLM API without an explicit user warning or consent prompt. This is dangerous because users may reasonably expect local processing of their notes, while the script silently uploads potentially confidential material.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The script sends prompts containing notes, citation summaries, ideas, topic, and venue to an external LLM endpoint without explicit user consent or a clear disclosure at runtime. This can expose unpublished research, proprietary material, or sensitive academic content to third-party services, especially since the endpoint is configurable and may not be the expected provider.

Missing User Warnings

Medium
Confidence
99% confidence
Finding
SSH is configured with StrictHostKeyChecking=no, which disables server identity verification and allows transparent man-in-the-middle attacks. In this tool's context, that is especially dangerous because it is used to submit jobs, inspect logs, and access research servers, potentially exposing credentials and job data and compromising command integrity.

Missing User Warnings

Medium
Confidence
99% confidence
Finding
SCP is also run with host key checking disabled, so result downloads may be intercepted or redirected to an attacker-controlled host without warning. Because this function imports files into the local environment, a MITM could tamper with experiment outputs or deliver malicious payloads disguised as results.
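
For comparison, an SSH invocation that keeps host key verification enabled might be assembled as follows. This is a sketch rather than the skill's code: `build_ssh_command` and its validation rule are hypothetical, while `StrictHostKeyChecking`, `BatchMode`, and `ConnectTimeout` are standard OpenSSH client options.

```python
import os
import re

# Hypothetical validation: names must start with an alphanumeric character,
# so a value such as "-oProxyCommand=..." cannot be parsed as an ssh option.
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


def build_ssh_command(user, host, remote_cmd, key=None, timeout=10):
    """Build an ssh argument list with host key verification left on."""
    if not _NAME_RE.match(user) or not _NAME_RE.match(host):
        raise ValueError("unexpected characters in user or host")
    cmd = [
        "ssh",
        "-o", "StrictHostKeyChecking=yes",  # refuse unknown or changed host keys
        "-o", "BatchMode=yes",              # fail instead of prompting
        "-o", f"ConnectTimeout={timeout}",
    ]
    if key:
        cmd += ["-i", os.path.expanduser(key)]
    cmd += [f"{user}@{host}", remote_cmd]
    return cmd
```

The returned list can be passed to `subprocess.run` without a shell. `scp` accepts the same `-o` options, so result downloads can be protected in the same way once the server's key is present in `known_hosts`.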

VirusTotal

57/57 vendors flagged this skill as clean.
