Security audit

Douyin Video Smart Assistant (抖音视频智能助手)

Security checks across malware telemetry and agentic risk

Overview

This skill mostly does the advertised Douyin transcription job, but it asks for an API key in chat and can send audio/transcripts to third-party services with weak consent and scoping.

Install only if you are comfortable with browser-based Douyin access, local command execution, remote upload of audio/transcripts to Groq or OpenAI, and local transcript storage. Configure API keys through a secure environment or secret store rather than pasting them into chat, and avoid processing sensitive or private videos unless you understand where the resulting audio and text will be sent and saved.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (13)

Lp3

Medium
Category
MCP Least Privilege
Confidence
82% confidence
Finding
The skill clearly instructs use of environment variables and a local .env file, but the metadata shown does not declare corresponding permissions or capabilities. This creates a transparency and governance gap: users and platforms cannot accurately understand that the skill handles configuration secrets and local execution context.

Tp4

High
Category
MCP Tool Poisoning
Confidence
89% confidence
Finding
The manifest describes a transcription assistant, but the instructions also cover downloading media, sending content to third-party services, formatting with an external LLM, and persisting outputs locally and potentially to external platforms. That mismatch reduces informed consent and can mislead users about network exfiltration, storage, and side effects.

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
The skill instructs the agent to ask the user for a Groq API key in chat and then store it in .env. Collecting secrets through normal conversation is unsafe because chat logs may be retained, exposed to the model, or mishandled by tooling, turning a convenience step into credential collection.
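
A safer pattern is to read the key from the process environment (populated by a secret store or the host platform) and to fail with a clear message when it is absent, rather than prompting for it in chat. The sketch below is illustrative only: `loadApiKey`, `maskSecret`, and the variable name `GROQ_API_KEY` are assumptions, not names taken from the skill.

```javascript
// Hypothetical helper: read the API key from the environment instead of
// asking the user to paste it into chat. GROQ_API_KEY is an assumed name.
function loadApiKey(env = process.env, name = "GROQ_API_KEY") {
  const value = (env[name] || "").trim();
  if (!value) {
    throw new Error(
      `${name} is not set; configure it in the environment or a secret store`
    );
  }
  return value;
}

// Mask a secret before it is written to logs or echoed back to the user.
function maskSecret(secret) {
  if (secret.length <= 8) return "****";
  return `${secret.slice(0, 4)}...${secret.slice(-4)}`;
}
```

With this shape the agent never sees the raw key in conversation, and any diagnostic output can use `maskSecret` so that only a short prefix and suffix are shown.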

Context-Inappropriate Capability

Medium
Confidence
76% confidence
Finding
The skill expands into creating documents in Feishu or Notion, which goes beyond simple transcription and archiving. This broadens the data-sharing surface by potentially sending transcript content to additional third-party services without being part of the core declared scope.

Description-Behavior Mismatch

Low
Confidence
92% confidence
Finding
The header describes the tool as a Douyin video-to-text script, but the implementation also sends transcript text to a separate external LLM for punctuation and segmentation. That mismatch can mislead users about data flows and privacy exposure, causing them to share audio/text they would not have sent if the external post-processing were clearly disclosed.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The README states that extracted audio is sent to Groq for transcription and LLM processing, but it does not clearly warn users that potentially sensitive spoken content will be transmitted to a third-party service. In a transcription skill, this creates a real privacy and data-governance risk because users may reasonably assume local processing unless disclosure is explicit.

Vague Triggers

Medium
Confidence
71% confidence
Finding
The trigger phrases are broad enough to match ordinary conversation, which can cause the skill to activate when the user did not intend media extraction or third-party transcription. In this skill, accidental invocation is more dangerous because activation can lead to browser automation, downloads, file storage, and external API transmission.

Vague Triggers

Medium
Confidence
74% confidence
Finding
The intent table uses vague phrases like '帮我看看' ("take a look for me") or '你怎么看' ("what do you think"), which can misclassify user intent and trigger broader processing than expected. Because the default flow automatically transcribes and summarizes first, ambiguous language can still cause data handling and external service use without clear consent.

Missing User Warnings

Medium
Confidence
87% confidence
Finding
The skill instructs saving uploaded video files under a local temp directory without a clear upfront disclosure, retention policy, or cleanup guarantee. User-uploaded videos may contain sensitive personal or copyrighted content, so silent local persistence increases privacy and data handling risk.
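
One way to provide a cleanup guarantee is to scope every temporary file to a directory that is always deleted when processing ends. The following is a minimal sketch under that assumption; `withTempDir` and the `douyin-skill-` prefix are hypothetical and do not come from the skill's code.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical wrapper: hand the callback a private temp directory and
// always delete it afterwards, even if the callback throws.
async function withTempDir(work, prefix = "douyin-skill-") {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), prefix));
  try {
    return await work(dir);
  } finally {
    // recursive + force: remove the directory and anything written into it.
    await fs.promises.rm(dir, { recursive: true, force: true });
  }
}
```

A caller would download the video and extract audio inside the callback, so nothing persists once the transcript has been returned. Retention beyond that point would then be an explicit, documented choice instead of a side effect.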

Missing User Warnings

High
Confidence
95% confidence
Finding
The workflow sends audio and transcript content to Groq for transcription and punctuation, but the skill does not present a clear upfront privacy notice or explicit opt-in for third-party processing. This is especially risky because videos may contain personal data, sensitive speech, or copyrighted material that users may not expect to leave the local environment.
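
An explicit opt-in can be modeled as a gate that every upload path must pass before any data leaves the machine. The sketch below is one possible shape, not the skill's implementation; `createConsentGate`, `buildNotice`, and `ConsentRequiredError` are hypothetical names.

```javascript
// Hypothetical consent gate: uploads are refused until the user has
// explicitly opted in to a given third-party provider.
class ConsentRequiredError extends Error {}

// Build the notice that should be shown to the user before any upload.
function buildNotice(provider, dataKinds) {
  return (
    `This step will send ${dataKinds.join(" and ")} to ${provider}, ` +
    `a third-party service. Reply "yes" to continue.`
  );
}

function createConsentGate() {
  const granted = new Set();
  return {
    grant(provider) {
      granted.add(provider);
    },
    revoke(provider) {
      granted.delete(provider);
    },
    // Throws (carrying the notice text) when consent is missing.
    assertAllowed(provider, dataKinds) {
      if (!granted.has(provider)) {
        throw new ConsentRequiredError(buildNotice(provider, dataKinds));
      }
    },
  };
}
```

Calling `gate.assertAllowed("Groq", ["audio", "transcript text"])` immediately before each upload means a missing opt-in surfaces as a notice to the user and not as a silent transfer.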

Vague Triggers

Medium
Confidence
94% confidence
Finding
The trigger list contains generic activation terms such as "转录" ("transcribe") and "transcribe" that are not uniquely tied to Douyin content, which can cause the skill to activate in unrelated conversations. Because this skill requests powerful capabilities including browser, exec, and external tooling, accidental activation increases the chance of unnecessary tool access, unintended processing of user content, and confused-deputy behavior in broader workflows.
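
A narrower trigger could require both an explicit keyword and a link whose host actually belongs to Douyin. The sketch below illustrates that idea; the host list, the keyword list, and the function names are assumptions for illustration and are not taken from the skill's manifest.

```javascript
// Hypothetical trigger check: activate only when the message contains an
// explicit transcription keyword AND a Douyin link.
// Both lists below are assumptions, not values from the skill.
const DOUYIN_HOSTS = ["douyin.com", "iesdouyin.com"];
const KEYWORDS = ["转录", "transcribe"];

function extractDouyinUrl(message) {
  const candidates = message.match(/https?:\/\/[^\s]+/g) || [];
  for (const candidate of candidates) {
    let host;
    try {
      host = new URL(candidate).hostname.toLowerCase();
    } catch {
      continue; // not a parseable URL, ignore it
    }
    // Accept the bare domain or a true subdomain, never a lookalike suffix.
    if (DOUYIN_HOSTS.some((h) => host === h || host.endsWith(`.${h}`))) {
      return candidate;
    }
  }
  return null;
}

function shouldActivate(message) {
  const lower = message.toLowerCase();
  const hasKeyword = KEYWORDS.some((k) => lower.includes(k));
  return hasKeyword && extractDouyinUrl(message) !== null;
}
```

Matching on the parsed hostname instead of a substring keeps lookalike domains from satisfying the check, and requiring the keyword as well keeps a pasted link alone from starting browser automation.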

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The script uploads full audio content to Groq/OpenAI for transcription and may also send transcript text to Groq for formatting, but it does not present an explicit user warning or consent step about third-party data sharing. In a transcription skill, users may reasonably expect processing, but the lack of transparent disclosure increases privacy and compliance risk, especially if videos contain personal or sensitive speech.

Ssd 3

Medium
Confidence
97% confidence
Finding
The skill explicitly directs the agent to solicit a user's Groq API key in chat and write it into .env. That is a direct secret-handling anti-pattern: credentials exposed in chat can be logged, leaked through prompt/context, or accessed by other tools reading the workspace.

VirusTotal

All 64 of 64 vendors reported this skill as clean.

Static analysis

Detected: suspicious.dangerous_exec, suspicious.exposed_secret_literal

Shell command execution detected (child_process).

Critical
Code
suspicious.dangerous_exec
Location
scripts/transcribe.js:32
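
The report does not show what command is executed at this location. Assuming the script shells out to an external media tool such as ffmpeg (an assumption, not something the scan confirms), the usual mitigation is to pass arguments as an array to `execFile`, which does not spawn a shell by default. The function names below are hypothetical.

```javascript
const { execFile } = require("node:child_process");

// Build the argument list as an array so that no user-influenced path is
// ever interpreted by a shell. Using ffmpeg here is an assumption.
function buildExtractAudioArgs(inputPath, outputPath) {
  for (const p of [inputPath, outputPath]) {
    // Reject empty values and values that could be parsed as an option.
    if (typeof p !== "string" || p.length === 0 || p.startsWith("-")) {
      throw new Error(`unsafe path argument: ${JSON.stringify(p)}`);
    }
  }
  return ["-i", inputPath, "-vn", "-acodec", "libmp3lame", outputPath];
}

function extractAudio(inputPath, outputPath) {
  const args = buildExtractAudioArgs(inputPath, outputPath);
  return new Promise((resolve, reject) => {
    // execFile runs the binary directly, so shell metacharacters are inert.
    execFile("ffmpeg", args, (error) =>
      error ? reject(error) : resolve(outputPath)
    );
  });
}
```

Compared with building a command string for `exec`, this keeps a filename like `a; rm -rf ~.mp4` as a single literal argument.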

File appears to expose a hardcoded API secret or token.

Critical
Code
suspicious.exposed_secret_literal
Location
scripts/transcribe.js:457