Glmv Caption Tunnel

Security checks across malware telemetry and agentic risk

Overview

This captioning skill is mostly coherent, but local videos and documents can be automatically exposed through a temporary public Cloudflare tunnel without a strong per-use consent gate.

Install only if you are comfortable sending selected media to Zhipu/BigModel and, for local videos or documents, temporarily making those selected files reachable through a public Cloudflare URL. Avoid confidential, regulated, or secret-bearing files unless you add explicit per-file consent and understand the exposure path.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (11)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 88%
Finding: The skill uses sensitive capabilities including environment variables, file writing, shell execution, and network access, yet does not declare permissions or prominently constrain those behaviors. This reduces transparency and prevents hosts or users from making informed trust decisions before execution.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The skill is presented as a captioning utility, but for local videos and documents it also starts a local server and exposes the selected files to the public internet through a Cloudflare quick tunnel. That behavior materially changes the risk profile: sensitive local content leaves the machine via an externally reachable URL, which carries more risk than an ordinary upload of a local file to the model provider.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The CLI automatically exposes local video and document paths through a public Cloudflare tunnel so a third-party API can fetch them, which creates an unintended public-access path for local files. Even if the tunnel is temporary, users may not realize sensitive local content is being published externally, and the manifest description does not clearly communicate that local files will be made remotely accessible.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: This helper intentionally publishes selected local files to a public trycloudflare.com URL, creating an external data-exfiltration channel from the host. In the context of an image/video captioning skill, that is especially risky because local user-supplied paths may contain sensitive documents, and the capability is broader than the user would reasonably expect from caption generation alone.

Missing User Warnings

Medium

Confidence: 91%
Finding: The README explicitly promotes automatically exposing local videos and documents through a public Cloudflare Quick Tunnel, but its warning language is understated relative to the privacy risk. Users may reasonably read 'transparent local file support' and 'handles everything automatically' as low-risk convenience. The actual behavior publishes local files to a public HTTPS endpoint for the duration of processing, which can leak sensitive data if the URL is guessed, logged, shared, or mishandled by downstream services.

Vague Triggers

Medium

Confidence: 72%
Finding: The invocation guidance is broad enough to trigger on common requests about describing or summarizing files, images, and videos without sufficiently narrowing when the skill should or should not run. In practice, this can cause unnecessary activation of a networked skill on ambiguous user requests, increasing unintended data disclosure risk.

Vague Triggers

Medium

Confidence: 74%
Finding: The trigger examples use vague phrases like describing or interpreting media without boundaries for private, regulated, or local content. This makes accidental invocation more likely, which is risky here because the skill can transmit user media and even expose local files through a tunnel.

Natural-Language Policy Violations

High

Confidence: 83%
Finding: The skill mandates a fixed Chinese-language user-facing installation and consent script without regard to the user's language preference. This can impair informed consent, especially for security-relevant actions like installing cloudflared and exposing files publicly, because users may not fully understand what they are approving.
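
One way to address this finding is to select the consent text from the user's locale instead of hard-coding one language. The sketch below is illustrative only: `CONSENT_MESSAGES`, `consent_message`, and the message wording are hypothetical and do not come from the scanned skill.

```python
# Hypothetical message table; the wording is an assumption, not the skill's script.
CONSENT_MESSAGES = {
    "en": "This will install cloudflared and expose the selected file "
          "through a public URL. Continue?",
    "zh": "此操作将安装 cloudflared，并通过公开 URL 暴露所选文件。是否继续？",
}


def consent_message(locale: str, default: str = "en") -> str:
    """Pick the consent text for a locale such as 'zh-CN' or 'en_US'.

    Falls back to the default language when the locale is unknown or empty.
    """
    # Reduce 'zh-CN' / 'zh_CN' to the bare language code 'zh'.
    lang = locale.replace("_", "-").split("-")[0].lower()
    return CONSENT_MESSAGES.get(lang, CONSENT_MESSAGES[default])
```

A caller would pass whatever language preference the host exposes, for example `consent_message("zh-CN")`.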

Missing User Warnings

Medium

Confidence: 96%
Finding: The tool creates a public tunnel for local files at execution time without a strong user-facing consent gate, so sensitive files may be exposed externally without informed approval. In a captioning skill, users may expect remote inference but not that their local documents or videos will first be hosted through a public Cloudflare endpoint.

Missing User Warnings

Medium

Confidence: 96%
Finding: The code exposes files over a public tunnel without any user-facing confirmation, warning, or disclosure at execution time that the files will be reachable via a public HTTPS URL. That hidden behavior materially increases the chance of accidental sensitive-data exposure, particularly in an agent setting where users may believe files are processed locally or sent only to the intended model provider.
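
The per-use consent gate that these findings call for could look roughly like the following. This is a minimal sketch, not the skill's code: `confirm_public_exposure`, the prompt wording, and the "type yes" convention are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def confirm_public_exposure(path: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly approves exposing this one file.

    Hypothetical helper: call it once per file, before any tunnel is started.
    """
    name = Path(path).name
    prompt = (
        f"'{name}' will be reachable through a temporary PUBLIC Cloudflare URL "
        "while it is processed. Type 'yes' to continue: "
    )
    # Anything other than an explicit "yes" (including empty input) is a refusal.
    return ask(prompt).strip().lower() == "yes"
```

Defaulting to refusal means an agent that cannot reach the user, or that receives an empty reply, never opens the tunnel. The `ask` parameter lets a host substitute its own confirmation dialog for `input`.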

Ssd 3

Medium

Confidence: 89%
Finding: The instruction to always show the full raw model output can directly surface sensitive content extracted from uploaded documents, videos, or images without minimization or redaction. In this context, the model may summarize or quote confidential information from local files, so mandatory verbatim display increases exposure to any requester or bystander with access to the session.
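
As an illustration of the minimization this finding says is missing, the sketch below masks a few obvious identifiers and caps the length of model output before it is displayed. The function name, the two patterns, and the length limit are assumptions chosen for the example; real redaction would need a broader, tested rule set.

```python
import re

# Illustrative patterns only: email addresses and runs of nine or more digits.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_LONG_DIGITS = re.compile(r"\b\d{9,}\b")


def minimize_output(text: str, max_chars: int = 2000) -> str:
    """Mask obvious identifiers and cap the length of model output before display."""
    text = _EMAIL.sub("[email redacted]", text)
    text = _LONG_DIGITS.sub("[number redacted]", text)
    if len(text) > max_chars:
        text = text[:max_chars] + " [truncated]"
    return text
```

For example, `minimize_output("Contact jane@example.com, account 1234567890.")` returns `"Contact [email redacted], account [number redacted]."`.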

VirusTotal

65/65 vendors flagged this skill as clean.
