OpenClaw YouTube Transcript

Security checks across malware telemetry and agentic risk

Overview

The skill transcribes YouTube videos as advertised, but by default it also contacts the author's server over plain HTTP on each run, exposing the user's IP address.

Review before installing. Use it only if you are comfortable with default-on author telemetry that exposes your IP address and usage timing over HTTP; set DISABLE_TELEMETRY=1 before use if you proceed. The core transcript function otherwise appears scoped to yt-dlp caption retrieval and optional transcript output.
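
One way to apply the opt-out consistently is to set the variable in the environment handed to the skill's process instead of relying on the calling shell. The following is a minimal sketch, assuming the skill honors DISABLE_TELEMETRY=1 as described above; the helper name `env_with_telemetry_disabled` is hypothetical and not part of the skill.

```python
import os


def env_with_telemetry_disabled(base_env=None):
    """Return a copy of an environment mapping with DISABLE_TELEMETRY set to "1".

    Hypothetical helper: it assumes the skill checks DISABLE_TELEMETRY,
    as the report states. The input mapping is copied, never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DISABLE_TELEMETRY"] = "1"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the skill's script, so the opt-out applies even if the parent shell never exported the variable.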

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (7)

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The telemetry statement is misleading because using yt-dlp against a YouTube URL necessarily discloses the requested video URL or identifier to external services such as YouTube, while the skill claims that no URLs are collected. Even if the author only receives the user's IP for analytics, the documentation materially understates network disclosures and can cause users to make privacy decisions based on false assurances.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The script is presented solely as a YouTube transcription tool, but it also performs an unrelated network request for telemetry. This mismatch is security-relevant because it hides data egress behavior from users and reviewers, undermining informed consent and trust boundaries for a tool that processes user-supplied URLs.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The telemetry function is unrelated to the stated task of transcription and introduces unnecessary outbound communication. Even if the request appears minimal, unjustified network behavior expands the attack surface and creates privacy and policy-compliance concerns in environments that expect single-purpose tooling.

Vague Triggers

Medium

Confidence: 82%
Finding: The trigger phrase "what does this video say" is broad enough to activate on generic video-analysis requests without a clear YouTube boundary. Overbroad activation can cause the agent to invoke this skill unexpectedly, leading to unintended network access or processing of user-supplied links when a more specific or safer workflow was intended.

Vague Triggers

Medium

Confidence: 86%
Finding: The trigger "summarize / analyze this YouTube video" expands the skill from transcription into broader content handling without a precise activation boundary. This can cause the agent to over-select the skill for general analysis tasks and automatically fetch remote content before the user has explicitly asked for transcript extraction.
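
A narrower activation boundary for the two trigger findings above could require that the request actually contains a YouTube link before the skill is selected. The following is a rough sketch of such a check, not the skill's own logic; the function name `mentions_youtube_url` and the host allowlist are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed allowlist of YouTube hostnames; adjust to the deployment's needs.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def mentions_youtube_url(message: str) -> bool:
    """Return True only if the message contains a URL whose host is on the allowlist."""
    for candidate in re.findall(r"https?://\S+", message):
        try:
            host = urlparse(candidate).hostname
        except ValueError:
            # Malformed URL (for example a broken IPv6 literal): ignore it.
            continue
        if host in YOUTUBE_HOSTS:
            return True
    return False
```

Matching on the parsed hostname instead of searching for the substring "youtube" avoids activating on lookalike links such as `https://example.com/youtube.com`.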

Missing User Warnings

High

Confidence: 99%
Finding: The script automatically contacts a remote host at startup before performing its main task, without clear disclosure or user consent. Undisclosed automatic egress is especially dangerous in enterprise or sandboxed environments because it can leak usage metadata, bypass user expectations, and violate network restrictions or auditing assumptions.
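
For comparison, a consent-first design would send nothing unless the user has explicitly opted in, and would refuse unencrypted endpoints. The following is a minimal sketch of such a gate under stated assumptions: the `ENABLE_TELEMETRY` opt-in variable and the function name are hypothetical, and only `DISABLE_TELEMETRY` is mentioned in this report.

```python
from urllib.parse import urlparse


def telemetry_permitted(env, endpoint):
    """Decide whether a telemetry request may be sent.

    env      -- mapping of environment variables (for example os.environ)
    endpoint -- telemetry URL the tool would contact
    """
    # An explicit opt-out always wins.
    if env.get("DISABLE_TELEMETRY") == "1":
        return False
    # Hypothetical opt-in flag: the default is to send nothing.
    if env.get("ENABLE_TELEMETRY") != "1":
        return False
    # Refuse plain HTTP so usage metadata is not sent in cleartext.
    return urlparse(endpoint).scheme == "https"
```

With a gate like this evaluated before any network call, a fresh install produces no outbound request until the user opts in.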

Ssd 3

Medium

Confidence: 97%
Finding: The skill explicitly states that it collects and sends the user's IP address to the skill author on each run. This creates a privacy risk because IP addresses are personal data in many jurisdictions and can be used for tracking, correlation, and approximate geolocation, especially when collection is enabled by default.

VirusTotal

66/66 vendors flagged this skill as clean.
