Security audit

抖音视频智能助手 (Douyin Video Smart Assistant)

Security checks across malware telemetry and agentic risk

Overview

This skill mostly does the advertised Douyin transcription job, but it asks for an API key in chat and can send audio/transcripts to third-party services with weak consent and scoping.

Install only if you are comfortable with browser-based Douyin access, local command execution, remote upload of audio/transcripts to Groq or OpenAI, and local transcript storage. Configure API keys through a secure environment or secret store rather than pasting them into chat, and avoid processing sensitive or private videos unless you understand where the resulting audio and text will be sent and saved.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (13)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 82%
Finding: The skill clearly instructs use of environment variables and a local .env file, but the metadata shown does not declare corresponding permissions or capabilities. This creates a transparency and governance gap: users and platforms cannot accurately understand that the skill handles configuration secrets and local execution context.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 89%
Finding: The manifest describes a transcription assistant, but the instructions also cover downloading media, sending content to third-party services, formatting with an external LLM, and persisting outputs locally and potentially to external platforms. That mismatch reduces informed consent and can mislead users about network exfiltration, storage, and side effects.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The skill instructs the agent to ask the user for a Groq API key in chat and then store it in .env. Collecting secrets through normal conversation is unsafe because chat logs may be retained, exposed to the model, or mishandled by tooling, turning a convenience step into credential collection.
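
One way to avoid collecting the key in chat is to read it from the process environment and fail with a clear message when it is missing. The following is a minimal sketch under assumptions: the variable name GROQ_API_KEY and the helpers loadApiKey and maskKey are hypothetical and are not taken from the audited skill.

```javascript
// Sketch only: GROQ_API_KEY, loadApiKey and maskKey are assumed names,
// not identifiers from the audited skill.

// Read the key from an environment-like object instead of asking in chat.
function loadApiKey(env = process.env) {
  const key = (env.GROQ_API_KEY || "").trim();
  if (!key) {
    throw new Error(
      "GROQ_API_KEY is not set. Export it in your shell or secret store; do not paste it into chat."
    );
  }
  return key;
}

// Mask a key for diagnostics so the full value never reaches chat or logs.
function maskKey(key) {
  return key.length <= 8 ? "****" : `${key.slice(0, 4)}...${key.slice(-4)}`;
}
```

With this shape, the agent can tell the user that the key is missing and how to set it, without ever seeing or storing the value in conversation history.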

Context-Inappropriate Capability

Medium

Confidence: 76%
Finding: The skill expands into creating documents in Feishu or Notion, which goes beyond simple transcription and archiving. This broadens the data-sharing surface by potentially sending transcript content to additional third-party services without being part of the core declared scope.

Description-Behavior Mismatch

Low

Confidence: 92%
Finding: The header describes the tool as a Douyin video-to-text script, but the implementation also sends transcript text to a separate external LLM for punctuation and segmentation. That mismatch can mislead users about data flows and privacy exposure, causing them to share audio/text they would not have sent if the external post-processing were clearly disclosed.

Missing User Warnings

Medium

Confidence: 92%
Finding: The README states that extracted audio is sent to Groq for transcription and LLM processing, but it does not clearly warn users that potentially sensitive spoken content will be transmitted to a third-party service. In a transcription skill, this creates a real privacy and data-governance risk because users may reasonably assume local processing unless disclosure is explicit.

Vague Triggers

Medium

Confidence: 71%
Finding: The trigger phrases are broad enough to match ordinary conversation, which can cause the skill to activate when the user did not intend media extraction or third-party transcription. In this skill, accidental invocation is more dangerous because activation can lead to browser automation, downloads, file storage, and external API transmission.

Vague Triggers

Medium

Confidence: 74%
Finding: The intent table uses vague phrases like '帮我看看' ("take a look for me") or '你怎么看' ("what do you think"), which can misclassify user intent and trigger broader processing than expected. Because the default flow automatically transcribes and summarizes first, ambiguous language can still cause data handling and external service use without clear consent.

Missing User Warnings

Medium

Confidence: 87%
Finding: The skill instructs saving uploaded video files under a local temp directory without a clear upfront disclosure, retention policy, or cleanup guarantee. User-uploaded videos may contain sensitive personal or copyrighted content, so silent local persistence increases privacy and data handling risk.
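
A cleanup guarantee of the kind this finding asks for can be sketched with a try/finally wrapper around a private temporary directory. The helper name withTempDir and the directory prefix are hypothetical; the skill's actual storage code is not shown in this report.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: runs `work` with a private temp directory and
// removes that directory afterwards, whether `work` returns or throws.
function withTempDir(work) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "douyin-skill-"));
  try {
    return work(dir);
  } finally {
    // force: true avoids an error if the directory is already gone.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

The callback here is synchronous. An asynchronous pipeline would need an async variant that awaits the work before cleanup, otherwise the directory would be removed while downloads are still running.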

Missing User Warnings

High

Confidence: 95%
Finding: The workflow sends audio and transcript content to Groq for transcription and punctuation, but the skill does not present a clear upfront privacy notice or explicit opt-in for third-party processing. This is especially risky because videos may contain personal data, sensitive speech, or copyrighted material that users may not expect to leave the local environment.
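
An explicit opt-in could take the form of a gate that every upload path must pass before contacting a provider. This is a sketch, not the skill's behavior: the provider identifiers, the consent object shape, and both function names are assumptions made for illustration.

```javascript
// Hypothetical consent gate; none of these names come from the audited skill.
const THIRD_PARTY_PROVIDERS = new Set(["groq", "openai"]);

// Throws unless the user has explicitly opted in to this specific provider.
function assertUploadConsent(provider, consent) {
  if (!THIRD_PARTY_PROVIDERS.has(provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  // Consent is per provider; a missing or non-true entry means "no".
  if (!consent || consent[provider] !== true) {
    throw new Error(
      `Upload to ${provider} blocked: the user has not opted in to third-party processing.`
    );
  }
}

// Builds the notice shown to the user before consent is recorded.
function describeUpload(provider, fileName) {
  return `Audio from "${fileName}" will be sent to ${provider} for transcription. Continue?`;
}
```

Keeping consent per provider means that agreeing to transcription by one service does not silently authorize a second service for formatting or export.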

Vague Triggers

Medium

Confidence: 94%
Finding: The trigger list contains generic activation terms such as "转录" and "transcribe" that are not uniquely tied to Douyin content, which can cause the skill to activate in unrelated conversations. Because this skill requests powerful capabilities including browser, exec, and external tooling, accidental activation increases the chance of unnecessary tool access, unintended processing of user content, and confused-deputy behavior in broader workflows.
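
A narrower trigger could require both a transcription keyword and a link on a Douyin host before activating. The sketch below assumes a host allowlist of douyin.com and iesdouyin.com and uses hypothetical function names; the skill's real trigger mechanism is not shown in this report.

```javascript
// Hypothetical trigger check; the host list is an assumption for illustration.
const DOUYIN_HOSTS = ["douyin.com", "iesdouyin.com"];

// Returns the first URL in the message whose host is a Douyin domain or
// a subdomain of one, or null if there is none.
function extractDouyinUrl(message) {
  const candidates = message.match(/https?:\/\/[^\s]+/g) || [];
  for (const candidate of candidates) {
    let host;
    try {
      host = new URL(candidate).hostname.toLowerCase();
    } catch {
      continue; // not a parseable URL
    }
    if (DOUYIN_HOSTS.some((d) => host === d || host.endsWith(`.${d}`))) {
      return candidate;
    }
  }
  return null;
}

// Activate only when a generic keyword is paired with a Douyin link.
function shouldActivate(message) {
  const hasKeyword = /转录|transcribe/i.test(message);
  return hasKeyword && extractDouyinUrl(message) !== null;
}
```

Matching on the parsed hostname rather than a substring keeps look-alike domains such as evil-douyin.com from satisfying the check.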

Missing User Warnings

Medium

Confidence: 97%
Finding: The script uploads full audio content to Groq/OpenAI for transcription and may also send transcript text to Groq for formatting, but it does not present an explicit user warning or consent step about third-party data sharing. In a transcription skill, users may reasonably expect processing, but the lack of transparent disclosure increases privacy and compliance risk, especially if videos contain personal or sensitive speech.

Ssd 3

Medium

Confidence: 97%
Finding: The skill explicitly directs the agent to solicit a user's Groq API key in chat and write it into .env. That is a direct secret-handling anti-pattern: credentials exposed in chat can be logged, leaked through prompt/context, or accessed by other tools reading the workspace.

VirusTotal

64/64 vendors flagged this skill as clean.


Static analysis

Detected: suspicious.dangerous_exec, suspicious.exposed_secret_literal

Shell command execution detected (child_process).

Critical

Code: suspicious.dangerous_exec
Location: scripts/transcribe.js:32
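
The report does not show how scripts/transcribe.js invokes child_process. If the call builds a shell command string from paths or URLs, a common mitigation is to pass an argument array to execFileSync, which runs the binary without a shell. The sketch below is an assumption-laden illustration: buildFfmpegArgs and runTool are hypothetical helpers, and the use of ffmpeg for audio extraction is assumed rather than confirmed by the source.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical helper: builds an ffmpeg argument vector for extracting
// mono 16 kHz audio. Paths that look like options are rejected.
function buildFfmpegArgs(inputPath, outputPath) {
  for (const p of [inputPath, outputPath]) {
    if (typeof p !== "string" || p.length === 0 || p.startsWith("-")) {
      throw new Error(`Refusing suspicious path argument: ${p}`);
    }
  }
  return ["-i", inputPath, "-vn", "-ac", "1", "-ar", "16000", outputPath];
}

// Runs a binary directly, without a shell, so metacharacters such as
// ";" or "$(...)" inside arguments are passed through as literal text.
function runTool(binary, args) {
  return execFileSync(binary, args, { encoding: "utf8" });
}
```

A call such as runTool("ffmpeg", buildFfmpegArgs(videoPath, audioPath)) would then treat a file name containing shell syntax as an ordinary file name rather than as a command.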

File appears to expose a hardcoded API secret or token.

Critical

Code: suspicious.exposed_secret_literal
Location: scripts/transcribe.js:457