Douyin Video to Text (抖音视频转文字)

Security checks across malware telemetry and agentic risk

Overview

This skill appears to perform the advertised Douyin transcription task, but it needs review because it asks for API keys in chat, persists them locally, sends media to third-party APIs, and builds shell commands from user-controlled input.

Review before installing. Use only non-sensitive videos, assume audio and transcript text may be sent to Groq or OpenAI and saved locally, avoid pasting API keys into chat, and prefer a fixed version that uses a secure secret store plus argument-array process execution instead of shell strings.
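To illustrate the "argument-array process execution" recommendation, here is a minimal sketch. It assumes the skill shells out to ffmpeg to pull the audio track, but the review does not name the command. The function names `build_extract_audio_argv` and `extract_audio`, and the specific ffmpeg flags, are hypothetical. The key point is that each user-controlled path is passed as its own list element with no shell involved.

```python
import shlex
import subprocess
from pathlib import Path


def build_extract_audio_argv(video_path: str, audio_path: str) -> list[str]:
    """Build a (hypothetical) ffmpeg argument list that extracts mono 16 kHz audio.

    Each user-controlled value is its own list element, so text such as
    "; rm -rf ~" or "$(curl ...)" reaches ffmpeg as a literal file name and is
    never parsed by a shell. Paths are made absolute so a name like "-evil.mp4"
    cannot be mistaken for an ffmpeg option.
    """
    src = str(Path(video_path).resolve())
    dst = str(Path(audio_path).resolve())
    return [
        "ffmpeg",
        "-y",           # overwrite the output file if it already exists
        "-i", src,      # input file (user-supplied)
        "-vn",          # drop the video stream
        "-ac", "1",     # downmix to mono
        "-ar", "16000", # resample to 16 kHz
        dst,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg without a shell; raises CalledProcessError on failure."""
    argv = build_extract_audio_argv(video_path, audio_path)
    # shlex.join only formats a readable log line; it is never executed.
    print("running:", shlex.join(argv))
    subprocess.run(argv, check=True, capture_output=True)  # shell=False by default
```

The unsafe pattern this replaces is an f-string such as `f"ffmpeg -i {path} ..."` passed with `shell=True`, where a crafted file name or link becomes extra shell syntax.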

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 89%
Finding: The skill instructs the agent to read and write `.env`, execute shell commands, navigate a browser, and run local scripts, but the metadata shown in the file does not declare those capabilities or permissions. This creates a transparency and consent gap: users and the platform may not realize the skill can access local environment/configuration and modify files before it does so.
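One way to close this gap is to have the skill declare its capabilities up front and let the platform compare them with what the instructions actually do. The sketch below is hypothetical: `SkillManifest`, the capability strings, and the `area.action:target` naming are illustrative, not any real platform's schema. The five capabilities listed are the ones this finding names.

```python
from dataclasses import dataclass, field

# Capabilities the finding says the skill's instructions exercise.
# The "area.action:target" naming is illustrative, not a real platform schema.
USED_BY_INSTRUCTIONS = frozenset({
    "fs.read:.env",
    "fs.write:.env",
    "exec.shell",
    "browser.navigate",
    "exec.script",
})


@dataclass(frozen=True)
class SkillManifest:
    """Hypothetical skill manifest with an explicit capability list."""
    name: str
    declared: frozenset = field(default_factory=frozenset)


def missing_declarations(manifest: SkillManifest, used: frozenset) -> list:
    """Return capabilities the skill uses but does not declare, sorted."""
    return sorted(used - manifest.declared)
```

A manifest that declares nothing, as this finding describes, would report all five capabilities as undeclared, which is the signal a reviewer or installer prompt needs before granting access.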

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The script automatically reads a repository-level .env file and later also consumes an optional cookies file, expanding its access to local secrets beyond the minimum needed for a simple 'transcribe this file/link' workflow. In an agent-skill context, this creates unnecessary secret exposure risk because the skill can ingest credentials from disk without explicit per-use consent or strong scoping.
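A narrower alternative to loading a whole repository-level `.env` is to read only the keys the skill needs, from a path the caller passes explicitly. This is a minimal sketch assuming simple `KEY=VALUE` lines; `parse_env_text` and `read_env_keys` are hypothetical helpers, not part of the skill. The file text is still read into memory, but unrelated entries are never returned or exported into the process environment. The same idea applies to the cookies file: accept it only as an explicit per-run argument.

```python
from pathlib import Path


def parse_env_text(text: str, wanted: set) -> dict:
    """Return only the requested keys from KEY=VALUE lines.

    Minimal parser: skips blanks and comments, accepts an optional "export "
    prefix, strips one pair of matching quotes, and does no variable expansion.
    Only keys listed in `wanted` appear in the result.
    """
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key not in wanted:
            continue  # ignore unrelated secrets in the same file
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        found[key] = value
    return found


def read_env_keys(env_path: Path, wanted: set) -> dict:
    """Read only `wanted` keys from an explicitly passed .env path (no directory search)."""
    if not env_path.is_file():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"), wanted)
```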

Missing User Warnings

Medium

Confidence: 89%
Finding: The README explicitly describes sending extracted audio to Groq for transcription and writing results to Markdown, but it does not warn users that video/audio content may contain sensitive information or that transcripts will be stored locally. In this skill context, the omission is meaningful because the workflow automatically processes third-party media and exports content to an external service and local files, creating privacy and data-retention risk.
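For the local-retention half of this finding, a skill can at least write transcripts with owner-only permissions and tell the user exactly where the file went. This is a minimal sketch; `save_transcript_privately` and `storage_notice` are hypothetical names, and the permission bits only have their full effect on POSIX systems.

```python
import os
from pathlib import Path


def save_transcript_privately(text: str, out_path: Path) -> Path:
    """Write a transcript that only the current user can read (POSIX mode 0o600)."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # The mode argument applies only when the file is newly created...
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    # ...so tighten an existing file too. On Windows, chmod only toggles read-only.
    os.chmod(out_path, 0o600)
    return out_path


def storage_notice(path: Path) -> str:
    """Message to show the user after saving."""
    return (f"Transcript saved locally at {path.resolve()}. It may contain "
            "sensitive speech from the video; delete it when no longer needed.")
```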

Vague Triggers

Medium

Confidence: 76%
Finding: The trigger words include broad everyday terms such as '转文字' ("to text") and '转录' ("transcribe"), which could cause accidental activation in unrelated conversations. This is primarily a safety and UX issue, but it matters more for this skill because activation can lead to browser automation, file writes, and credential-handling workflows.

Missing User Warnings

High

Confidence: 97%
Finding: The skill tells the agent to ask the user to paste a Groq API key into chat and then write it into `.env`. Collecting secrets through chat exposes them to logging, transcript retention, unintended model access, and accidental disclosure, and the skill provides no explicit warning or safer alternative.
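A safer pattern is for the skill to read the key from the environment the user configured themselves and, when it is missing, stop with instructions rather than asking for the key in chat. This sketch assumes the variable is named `GROQ_API_KEY` (the name the Groq SDK conventionally reads); `get_groq_api_key`, `MissingApiKey`, and `mask` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional


class MissingApiKey(RuntimeError):
    """Raised when no key is configured; the agent relays the message as-is."""


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from the environment; never prompt for it in chat."""
    source = os.environ if env is None else env
    key = source.get("GROQ_API_KEY", "").strip()  # assumed variable name
    if not key:
        raise MissingApiKey(
            "GROQ_API_KEY is not set. Set it yourself in your terminal or secret "
            "manager, outside this chat, and then retry. Do not paste the key here."
        )
    return key


def mask(secret: str, visible: int = 4) -> str:
    """Show at most `visible` leading characters, e.g. for a log line."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)
```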

Missing User Warnings

Medium

Confidence: 84%
Finding: The skill directs the agent to copy `.env.example` to `.env`, creating or modifying local files without an explicit up-front warning that local configuration will be written. Silent file changes are risky because users may not expect persistent modifications, especially in environments where `.env` may affect other tooling or contain sensitive settings.
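A minimal sketch of a non-silent setup step follows. It never overwrites an existing `.env`, does nothing until the user has confirmed, and uses exclusive-create mode (`"xb"`) so a file that appears concurrently is not clobbered either. `create_env_from_example` and its `confirmed` flag are hypothetical; the returned string is meant to be relayed to the user.

```python
import shutil
from pathlib import Path


def create_env_from_example(example: Path, target: Path, confirmed: bool) -> str:
    """Copy example -> target only with consent and only if target is absent."""
    if target.exists():
        return f"{target} already exists; leaving it unchanged."
    if not confirmed:
        return f"About to create {target} from {example}. Ask the user before proceeding."
    # "xb" raises FileExistsError instead of overwriting a file created meanwhile.
    with example.open("rb") as src, target.open("xb") as dst:
        shutil.copyfileobj(src, dst)
    return f"Created {target} from {example}."
```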

Vague Triggers

Medium

Confidence: 83%
Finding: The trigger list contains generic terms such as '转录' ("transcribe"), 'transcribe', and '视频转文本' ("video to text"), which can match many unrelated requests and activate the skill when the user did not specifically intend a Douyin-only workflow. Because this skill requests browser and exec capabilities and may process external media, unintended invocation increases the chance of unnecessary external access, command execution, or privacy-impacting processing.
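A narrower trigger could require either a Douyin link or an explicit mention of Douyin alongside a transcription word. This sketch assumes Douyin share links use `douyin.com` (including `v.` and `www.` subdomains) or `iesdouyin.com`; the host list, `should_activate`, and the keyword tuples are assumptions, not the skill's actual trigger logic.

```python
import re

# Assumed Douyin hosts: douyin.com (any subdomain) and iesdouyin.com.
# The trailing lookahead rejects look-alikes such as douyin.com.evil.net.
DOUYIN_URL = re.compile(
    r"https?://(?:[\w-]+\.)*(?:douyin\.com|iesdouyin\.com)(?:[/?#]\S*)?(?![\w.-])",
    re.IGNORECASE,
)
DOUYIN_WORDS = ("抖音", "douyin")
TRANSCRIBE_WORDS = ("转文字", "转录", "视频转文本", "transcribe")


def should_activate(message: str) -> bool:
    """Activate only for a Douyin link, or an explicit Douyin + transcription request."""
    if DOUYIN_URL.search(message):
        return True
    lowered = message.lower()
    mentions_douyin = any(w in lowered for w in DOUYIN_WORDS)
    wants_transcript = any(w in lowered for w in TRANSCRIBE_WORDS)
    return mentions_douyin and wants_transcript
```

With this gate, a request like "帮我把这段会议录音转文字" ("transcribe this meeting recording") no longer activates the skill, while a pasted Douyin share link still does.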

Missing User Warnings

Medium

Confidence: 96%
Finding: The script uploads audio derived from local files or downloaded media to Groq/OpenAI APIs, and may additionally send transcript text to Groq for post-processing, without an explicit upfront warning that content leaves the machine. In a transcription tool, users may provide sensitive recordings, so silent external transmission creates a real privacy and data-handling risk.
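An explicit consent gate before each network call would address this. The sketch below is hypothetical (`UploadPlan`, `ConsentRequired`, `require_consent`); it assumes the agent calls `require_consent` once per outbound transfer, covering both the audio upload to Groq or OpenAI and any transcript text sent back to Groq for post-processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadPlan:
    """Hypothetical description of one outbound transfer."""
    provider: str       # e.g. "Groq" or "OpenAI"
    endpoint_host: str  # e.g. "api.groq.com" or "api.openai.com"
    payload: str        # human-readable description of what leaves the machine


class ConsentRequired(PermissionError):
    """Raised with a disclosure message the agent shows to the user."""


def disclosure(plan: UploadPlan) -> str:
    return (f"This step sends {plan.payload} to {plan.provider} ({plan.endpoint_host}). "
            "Do not continue if the recording is sensitive.")


def require_consent(plan: UploadPlan, user_said_yes: bool) -> None:
    """Block the transfer until the user has explicitly agreed to it."""
    if not user_said_yes:
        raise ConsentRequired(disclosure(plan))
```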

Ssd 3

Medium

Confidence: 98%
Finding: This is a direct secret-handling flaw: the instructions explicitly tell the agent to solicit an API key in chat and persist it into local configuration. Secrets shared in conversational channels are frequently logged and replayable, and writing them automatically increases the chance of exposure through filesystem access, backups, or later debugging output.
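If a user pastes a key anyway, the agent should neither persist nor echo it. This is a minimal sketch of a redaction pass run before any message is logged or written to disk. The key shapes are heuristics and an assumption: Groq keys are assumed to start with `gsk_` and OpenAI keys with `sk-`. `KEY_PATTERN` and `redact_secrets` are hypothetical names.

```python
import re

# Heuristic key shapes (assumption): "gsk_..." for Groq, "sk-..." for OpenAI.
# Lookarounds stop matches from starting or ending inside a longer token.
KEY_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_-])(?:gsk_[A-Za-z0-9]{20,}|sk-[A-Za-z0-9_-]{20,})(?![A-Za-z0-9_-])"
)


def redact_secrets(text: str) -> tuple:
    """Return (redacted_text, found_any) so the caller can refuse to store the key."""
    redacted, count = KEY_PATTERN.subn("[REDACTED API KEY]", text)
    return redacted, count > 0
```

When `found_any` is true, a reasonable response is to store nothing, tell the user the key was not saved, and suggest rotating it and setting the new one locally, since the original has already passed through the chat.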

VirusTotal

All 64 of 64 vendors reported this skill as clean.
