douyin-analyse-batch

Security checks across malware telemetry and agentic risk

Overview

This skill should be reviewed before installation. It sets up recurring email automation, but it also contains hard-coded recipients and bundles download, proxy, Telegram, MCP, and transcription features that go beyond the advertised daily report.

Install only after reviewing and changing all recipient settings, cron entries, SMTP credentials, and API keys. Treat the bundled downloader, transcription, MCP server, WebUI proxy, and Telegram artifacts as additional capabilities, not just dependencies for a simple daily email report.
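As one way to act on this advice, recipients and SMTP credentials can be read from the environment instead of being hard-coded. The sketch below is illustrative only: the variable names (REPORT_RECIPIENTS, SMTP_HOST, SMTP_USER, SMTP_PASSWORD) and the function load_mail_settings are hypothetical and are not taken from the skill.

```python
import os

# Hypothetical variable names; the skill's real configuration keys may differ.
REQUIRED_KEYS = ("REPORT_RECIPIENTS", "SMTP_HOST", "SMTP_USER", "SMTP_PASSWORD")


def load_mail_settings(env=None):
    """Read mail settings from the environment and fail loudly if any are missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing mail settings: " + ", ".join(missing))
    # Recipients are a comma-separated list; empty items are ignored.
    recipients = [r.strip() for r in env["REPORT_RECIPIENTS"].split(",") if r.strip()]
    if not recipients:
        raise RuntimeError("REPORT_RECIPIENTS contains no addresses")
    return {
        "recipients": recipients,
        "smtp_host": env["SMTP_HOST"],
        "smtp_user": env["SMTP_USER"],
        "smtp_password": env["SMTP_PASSWORD"],
    }
```

Failing on a missing value, rather than falling back to a built-in default recipient, keeps reports from being sent to an address the installer never chose.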

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (65)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
docx_path = None
    try:
        # Run md_to_docx with the venv Python (depends on python-docx)
        p = subprocess.run(
            [VENV_PY, str(SCRIPT_DIR / 'helpers' / 'md_to_docx.py'), str(file_path)],
            capture_output=True, text=True, timeout=60
        )
Confidence
88% confidence
Finding
p = subprocess.run( [VENV_PY, str(SCRIPT_DIR / 'helpers' / 'md_to_docx.py'), str(file_path)], capture_output=True, text=True, timeout=60 )

Tainted flow: 'video_info' from os.getenv (line 359, credential/environment) → requests.get (network output)

Critical
Category
Data Flow
Content
if show_progress:
            print(f"正在下载视频: {video_info['title']}")

        response = requests.get(video_info['url'], headers=HEADERS, stream=True)
        response.raise_for_status()

        # Get the file size
Confidence
88% confidence
Finding
response = requests.get(video_info['url'], headers=HEADERS, stream=True)

Tainted flow: 'files' from open (line 251, file read) → requests.post (network output)

High
Category
Data Flow
Content
}

        try:
            response = requests.post(self.api_base_url, files=files, headers=headers)
            response.raise_for_status()

            result = response.json()
Confidence
97% confidence
Finding
response = requests.post(self.api_base_url, files=files, headers=headers)

Tainted flow: 'share_url' from requests.get (line 69, network input) → requests.get (network output)

Medium
Category
Data Flow
Content
raise ValueError("未找到有效的分享链接")
        
        share_url = urls[0]
        share_response = requests.get(share_url, headers=HEADERS)
        video_id = share_response.url.split("?")[0].strip("/").split("/")[-1]
        share_url = f'https://www.iesdouyin.com/share/video/{video_id}'
Confidence
88% confidence
Finding
share_response = requests.get(share_url, headers=HEADERS)
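A possible mitigation for this flow is to check the share link against a host allowlist before fetching it. The sketch below is a minimal illustration: the function is_allowed_share_url and the allowlist are assumptions, and only www.iesdouyin.com actually appears in the scanned code.

```python
from urllib.parse import urlsplit

# Assumed allowlist; only www.iesdouyin.com appears in the scanned code.
ALLOWED_SHARE_HOSTS = frozenset({"v.douyin.com", "www.douyin.com", "www.iesdouyin.com"})


def is_allowed_share_url(url, allowed_hosts=ALLOWED_SHARE_HOSTS):
    """Return True only for https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are rejected.
        return False
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowed_hosts
```

Because requests follows redirects for GET by default, the same check would probably also need to be applied to share_response.url after the request, or redirects disabled and followed manually.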

Tainted flow: 'video_info' from requests.get (line 314, network input) → requests.get (network output)

Medium
Category
Data Flow
Content
ctx.info(f"正在下载视频: {video_info['title']}")
        
        response = requests.get(video_info['url'], headers=HEADERS, stream=True)
        response.raise_for_status()
        
        # Get the file size
Confidence
83% confidence
Finding
response = requests.get(video_info['url'], headers=HEADERS, stream=True)

Tainted flow: 'cmd' from os.environ.get (line 26, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
def run(cmd, capture=True, check=True):
    if isinstance(cmd, str):
        cmd = cmd.split()
    p = subprocess.run(cmd, capture_output=capture, text=True)
    if check and p.returncode != 0:
        raise RuntimeError(p.stderr.strip() or f"command failed: {cmd}")
    return p.stdout.strip() if capture else ""
Confidence
87% confidence
Finding
p = subprocess.run(cmd, capture_output=capture, text=True)
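One way to narrow this helper is to parse string commands with shlex and refuse any executable that is not on an explicit allowlist. The following is a sketch under assumptions, not the skill's code: run_checked and ALLOWED_EXECUTABLES are hypothetical names, and the allowlist here contains only the current interpreter.

```python
import shlex
import subprocess
import sys

# Hypothetical allowlist; a real deployment would list the exact binaries it needs.
ALLOWED_EXECUTABLES = frozenset({sys.executable})


def run_checked(cmd, allowed=ALLOWED_EXECUTABLES, timeout=60):
    """Run a command without a shell, but only if its executable is allowlisted."""
    # shlex.split honours quoting, unlike str.split().
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    if not args or args[0] not in allowed:
        raise ValueError(f"executable not allowed: {args[:1]}")
    p = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    if p.returncode != 0:
        raise RuntimeError(p.stderr.strip() or f"command failed: {args}")
    return p.stdout.strip()
```

The allowlist check runs before subprocess.run, so a command assembled from an environment variable cannot select an arbitrary binary.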

Tainted flow: 'cmd' from os.environ.get (line 26, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
'-o', str(out_path.with_suffix('.%(ext)s')),
        url
    ]
    p = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
    if p.returncode == 0:
        downloaded = list(out_path.parent.glob(f"{out_path.stem}.*"))
        if downloaded:
Confidence
84% confidence
Finding
p = subprocess.run(cmd, capture_output=True, text=True, timeout=120)

Tainted flow: 'VENV_PY' from os.environ.get (line 18, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
print(text)
_run()
"""
    p = subprocess.run([VENV_PY, '-c', code], capture_output=True, text=True, timeout=300)
    if p.returncode != 0:
        raise RuntimeError(p.stderr.strip() or 'transcription failed')
    return p.stdout.strip()
Confidence
89% confidence
Finding
p = subprocess.run([VENV_PY, '-c', code], capture_output=True, text=True, timeout=300)
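Since VENV_PY comes from the environment, the interpreter path could be validated before it is executed. The sketch below assumes the interpreter is expected to live under a known virtual-environment directory; resolve_venv_python and its venv_root parameter are hypothetical and do not appear in the scanned code. It requires Python 3.9 or later for Path.is_relative_to.

```python
import os
from pathlib import Path


def resolve_venv_python(env_value, venv_root):
    """Accept an interpreter path only if it is a file located under venv_root."""
    # abspath normalises ".." lexically without following symlinks, because a
    # venv interpreter is commonly a symlink to a system Python outside the venv.
    root = Path(os.path.abspath(venv_root))
    candidate = Path(os.path.abspath(env_value))
    if not candidate.is_relative_to(root):
        raise ValueError(f"interpreter is outside the expected venv: {candidate}")
    if not candidate.is_file():
        raise ValueError(f"interpreter does not exist: {candidate}")
    return candidate
```

This does not defend against an attacker who can write inside the venv directory; it only prevents an environment variable from redirecting execution to an unrelated binary.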

Tainted flow: 'VENV_PY' from os.environ.get (line 16, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
print(text)
_run()
"""
    p = subprocess.run([VENV_PY, '-c', code], capture_output=True, text=True, timeout=600)
    if p.returncode != 0:
        raise RuntimeError(p.stderr.strip() or p.stdout.strip() or 'transcription failed')
    return p.stdout.strip()
Confidence
90% confidence
Finding
p = subprocess.run([VENV_PY, '-c', code], capture_output=True, text=True, timeout=600)

Tainted flow: 'VENV_PY' from os.environ.get (line 23, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
docx_path = None
    try:
        # Run md_to_docx with the venv Python (depends on python-docx)
        p = subprocess.run(
            [VENV_PY, str(SCRIPT_DIR / 'helpers' / 'md_to_docx.py'), str(file_path)],
            capture_output=True, text=True, timeout=60
        )
Confidence
98% confidence
Finding
p = subprocess.run( [VENV_PY, str(SCRIPT_DIR / 'helpers' / 'md_to_docx.py'), str(file_path)], capture_output=True, text=True, timeout=60 )

Lp3

Medium
Category
MCP Least Privilege
Confidence
92% confidence
Finding
The skill documentation describes capabilities including shell execution, file read/write, network access, environment-variable handling, cron installation, and email delivery, but no explicit permissions are declared. This creates a transparency and consent problem: users may install or invoke a skill with materially broader powers than they were informed about, increasing the chance of unintended system changes or data exposure.

Tp4

High
Category
MCP Tool Poisoning
Confidence
97% confidence
Finding
The documented purpose is a Douyin daily report and email sender, but the referenced dependency set appears to include materially broader behaviors such as Telegram pushing, local JSON/TXT exports, video parsing/downloading, transcription, MCP server exposure, and a FastAPI WebUI/proxy. This mismatch is dangerous because users may authorize a seemingly narrow reporting skill that actually enables data exfiltration channels, remote interfaces, and media-processing features well beyond the stated use case.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
The installation flow runs a host-level setup script that creates environments, writes configuration, and modifies cron, which goes beyond simple report generation. Documentation that encourages broad system management through a single install command increases the risk of persistent changes, difficult rollback, and abuse if the script is altered or misunderstood.

Context-Inappropriate Capability

Low
Confidence
80% confidence
Finding
The skill includes cron-based persistence for twice-daily execution, which is a host-level capability rather than a simple on-demand analysis action. While plausibly related to a 'daily report' feature, persistence still increases attack surface because the skill continues to run and access credentials/network without an active user request.
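To audit this kind of persistence, the text printed by `crontab -l` can be filtered for entries that mention the skill. The sketch below is a simple illustration: find_cron_entries is a hypothetical helper, and the marker string is whatever path or name the installed entries happen to contain.

```python
def find_cron_entries(crontab_text, marker):
    """Return active (non-comment) crontab lines that contain the marker string."""
    entries = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments; only scheduled commands matter here.
        if not stripped or stripped.startswith("#"):
            continue
        if marker in stripped:
            entries.append(stripped)
    return entries
```

Reviewing the returned lines before and after installation shows exactly which schedules the setup script added.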

Description-Behavior Mismatch

High
Confidence
93% confidence
Finding
The skill metadata promises email-based daily report generation and delivery, but this file is clearly built around Telegram output and even embeds a specific Telegram chat_id. That mismatch is dangerous because users and operators may authorize or deploy the skill expecting email handling while data is actually prepared for a different channel, creating a risk of unintended disclosure, misrouting, and deceptive behavior in an automation context.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The JSON includes Telegram-specific routing metadata (`chat_id`, `channel`) and a fully formatted Telegram message even though the skill description emphasizes Word report generation and email delivery. This creates an unintended cross-channel data exposure risk: downstream components or logs may reveal recipient identifiers or send content to Telegram paths that users did not request or consent to.
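If the JSON is reused for email or written to logs, the routing fields named in this finding could be removed first. This is a minimal sketch: strip_routing_fields is a hypothetical helper, and it assumes the identifiers appear under the keys `chat_id` and `channel` at any nesting depth.

```python
# Keys named in the finding; other routing fields may exist in the real payload.
ROUTING_KEYS = frozenset({"chat_id", "channel"})


def strip_routing_fields(value, routing_keys=ROUTING_KEYS):
    """Return a copy of a JSON-like value with routing keys removed at every level."""
    if isinstance(value, dict):
        return {
            key: strip_routing_fields(item, routing_keys)
            for key, item in value.items()
            if key not in routing_keys
        }
    if isinstance(value, list):
        return [strip_routing_fields(item, routing_keys) for item in value]
    return value
```

The function returns a new structure and leaves the input untouched, so the original payload remains available to any component that legitimately needs it.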

Description-Behavior Mismatch

Medium
Confidence
89% confidence
Finding
The bundled dependency advertises capabilities far beyond the parent skill’s declared purpose, including general short-video downloading and transcription. This scope mismatch increases attack surface and creates a supply-chain trust problem: users may install a skill for benign reporting while inheriting unrelated media-processing features that can later be invoked or abused.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The README explicitly documents watermark-free video downloading, which is unrelated to generating and emailing daily Douyin trend reports. In this skill context, that capability is more dangerous because it enables bulk content acquisition outside the declared use case, raising legal, policy, and misuse risks while expanding what the installed skill can do.

Description-Behavior Mismatch

Medium
Confidence
90% confidence
Finding
The embedded downloader/transcriber functionality exceeds the stated purpose of generating a Douyin daily hot-list report and email summary. This hidden scope expansion increases attack surface, introduces unexpected network and file-processing behavior, and makes data handling less transparent to users and reviewers.

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
The code provides general-purpose arbitrary Douyin video download capability, which is broader than necessary for hot-list report generation. Extra capability increases misuse potential, licensing/compliance risk, and exposure to untrusted media parsing and storage paths without clear necessity.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
The script uses environment-backed API credentials to send audio to a third-party speech-to-text service outside the declared skill scope. In this context, that is more dangerous because users expecting report generation may not realize their media-derived content is being transmitted externally and billed against their credentials.

Description-Behavior Mismatch

Medium
Confidence
91% confidence
Finding
This dependency provides watermark-free video downloading and transcript extraction, which is materially broader than the parent skill's stated purpose of generating daily hot-list reports and sending email summaries. Hidden or unnecessary capabilities expand attack surface and can surprise users by enabling content acquisition and processing that was not disclosed.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The server exposes a tool specifically for obtaining watermark-free download links, a capability unrelated to routine report generation and email automation. Such undisclosed functionality can be abused for unauthorized media retrieval and increases legal, policy, and trust risk for anyone installing the skill.

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
The `/api/video/download` endpoint accepts an arbitrary `url` parameter and server-side fetches it with `requests.get(...)`, then streams the response back to the caller. This is effectively an open fetch/proxy primitive and can be abused for SSRF, access to internal services, and bandwidth/resource abuse; in the context of a Douyin report/transcript UI, this capability is broader than necessary and therefore more suspicious and dangerous.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
This endpoint provides arbitrary external HTTP proxying unrelated to the stated purpose of generating Douyin daily reports and transcript extraction. Because any caller can cause the server to request attacker-chosen URLs and relay the response, it materially expands the attack surface and can be used for SSRF, network pivoting, and unauthorized retrieval of resources reachable from the server but not from the attacker.
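A common partial defence against this kind of SSRF is to resolve the requested host and reject any address that is not globally routable before fetching. The sketch below uses only the standard library; is_public_http_url is a hypothetical helper and is not part of the scanned WebUI.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_http_url(url):
    """Return True only if url is http(s) and every resolved address is global."""
    try:
        parts = urlsplit(url)
        port = parts.port
    except ValueError:
        return False  # malformed host or port
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, port or 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    for info in infos:
        try:
            address = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # Loopback, private, link-local and reserved ranges are all non-global.
        if not address.is_global:
            return False
    return bool(infos)
```

A check like this is not sufficient on its own: the host can resolve differently between the check and the fetch, and redirects can lead to internal addresses, so the endpoint would likely also need authentication, a host allowlist, and redirect handling.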

VirusTotal

65/65 vendors flagged this skill as clean.
