Douyin Scraper V2 (抖音爬虫V2)

Security checks across malware telemetry and agentic risk

Overview

The skill appears purpose-built for Douyin scraping, but it needs review because it can automatically run networked searches, use a Brave API key, install browser tooling, and return sample data without strong source or consent boundaries.

Install only if you are comfortable with a scraper that sends search terms to Douyin and sometimes Brave Search, may use a BRAVE_API_KEY from the environment, installs Playwright/Chromium, and can save output files. Treat returned results cautiously unless the skill clearly identifies whether they came from Douyin, Brave Search, or sample data.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Findings (13)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
query = f"site:douyin.com {keyword}"
            # Try using the built-in web search by calling the CLI
            proc = subprocess.run(
                ["curl", "-s", "https://api.search.brave.com/res/v1/web/search",
                 "-H", "Accept: application/json",
                 "-H", f"X-Subscription-Token: {self._get_brave_key()}",
Confidence
93% confidence
Finding
proc = subprocess.run( ["curl", "-s", "https://api.search.brave.com/res/v1/web/search", "-H", "Accept: application/json", "-H", f"X-Subscr

subprocess module call

Medium
Category
Dangerous Code Execution
Content
try:
            import subprocess
            query = f"site:douyin.com 抖音热榜 {category}" if category else "site:douyin.com 抖音热榜"
            proc = subprocess.run(
                ["curl", "-s", f"https://api.search.brave.com/res/v1/web/search?q={query}&count={min(limit, 10)}",
                 "-H", "Accept: application/json",
                 "-H", f"X-Subscription-Token: {self._get_brave_key()}"],
Confidence
94% confidence
Finding
proc = subprocess.run( ["curl", "-s", f"https://api.search.brave.com/res/v1/web/search?q={query}&count={min(limit, 10)}", "-H", "Accept: application/json",
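
One possible remediation for the two subprocess findings above is to avoid shelling out to curl and to URL-encode the query instead of interpolating it into the URL string. The sketch below is illustrative only: `build_brave_request` is a hypothetical helper name, not something present in the scanned skill, and it only constructs the request without sending it. The endpoint, header names, and the `min(limit, 10)` cap are taken from the excerpts above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as it appears in the scanned excerpts.
BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_brave_request(query: str, limit: int, api_key: str) -> Request:
    """Build (but do not send) a Brave Search request.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of the skill's code.
    """
    # urlencode escapes spaces, '&', and non-ASCII characters, so the
    # keyword cannot inject extra query parameters.
    params = urlencode({"q": query, "count": min(limit, 10)})
    return Request(
        f"{BRAVE_ENDPOINT}?{params}",
        headers={
            "Accept": "application/json",
            "X-Subscription-Token": api_key,
        },
    )
```

The caller would pass the returned object to `urllib.request.urlopen` only after the user has agreed to the external request; keeping construction separate from sending makes that gate easier to enforce.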

Lp3

Medium
Category
MCP Least Privilege
Confidence
89% confidence
Finding
The skill instructs the agent to execute shell commands, access the network, and potentially write files, yet it declares no permissions. This creates a transparency and governance gap: users and the platform cannot accurately assess what capabilities will be used, increasing the chance of unexpected external requests or local side effects.

Tp4

High
Category
MCP Tool Poisoning
Confidence
93% confidence
Finding
The documented behavior says the skill searches Douyin content, but the workflow also falls back to external web search, may return mock/example data, and appears to support file export not disclosed in the description. This mismatch can mislead users about data provenance, privacy exposure, and side effects, causing them to trust results or consent assumptions that are not accurate.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The skill reads `BRAVE_API_KEY` from the environment and uses it to access Brave Search, which is not apparent from the skill description of scraping Douyin content. Secret use plus undisclosed third-party transmission creates a data-governance and least-surprise violation, especially in agent environments where env vars may be sensitive.

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
The search path silently falls back from browser scraping to Brave web search and even returns mock data, which materially differs from the advertised behavior of scraping Douyin videos/copy. In an agent context, this can mislead users, exfiltrate queries to a third party, and produce fabricated results that may be consumed as if they were real.
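
The provenance problem described in this finding could be reduced by tagging every returned item with its origin and surfacing that origin to the user. The following is a minimal sketch under stated assumptions: `Source`, `SearchResult`, and `describe` are hypothetical names introduced here, and the skill's real result structure is not shown in the scan.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where a result came from (the three origins named in this report)."""
    DOUYIN = "douyin"
    BRAVE_SEARCH = "brave_search"
    SAMPLE = "sample"


@dataclass(frozen=True)
class SearchResult:
    # Hypothetical result shape; the skill's real fields are unknown.
    title: str
    url: str
    source: Source


def describe(results: list[SearchResult]) -> str:
    """Return a one-line provenance summary to show alongside results."""
    if not results:
        return "No results."
    sources = sorted({r.source.value for r in results})
    header = "Sources: " + ", ".join(sources)
    if any(r.source is Source.SAMPLE for r in results):
        header += " (WARNING: includes sample data, not real results)"
    return header
```

With a tag like this, an agent consuming the output can refuse to treat sample entries as real data rather than having to guess.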

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
The hot-list feature does not fetch an actual Douyin ranking source; it queries Brave Search and then may fabricate sample entries. This mismatch increases the risk of deceptive output and unintended external sharing of user/category inputs in a skill that users expect to remain within the Douyin context.

Missing User Warnings

Medium
Confidence
83% confidence
Finding
The README explicitly promotes scraping and exporting Douyin video metadata and content descriptions, but it does not adequately address privacy, consent, retention, or lawful handling of third-party data. In a scraping-focused skill, this omission can encourage users to collect and store personal or creator-related data without understanding compliance and privacy risks.

Vague Triggers

Medium
Confidence
80% confidence
Finding
The trigger phrases are broad enough to overlap with normal conversation about searching or Douyin, which can cause the skill to activate when the user did not intend to invoke it. Because activation leads to shell execution and external network requests, a false trigger can leak user queries to third parties or cause unintended actions.

Vague Triggers

Medium
Confidence
84% confidence
Finding
The activation rules describe when to use the skill but do not define when it must not activate, leaving the agent to infer intent from ambiguous natural language. In this context, ambiguity is risky because the skill can invoke scraping scripts and external search tools without a clear user opt-in.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The fallback instructs the agent to send the user's keyword to an external web search service without telling the user first. This is a privacy issue because user queries may contain sensitive interests or business terms, and the skill presents the fallback as mandatory rather than consent-based.
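
A consent-based fallback, as opposed to the mandatory one described here, might look roughly like the sketch below. All names (`search_with_fallback`, `primary`, `fallback`, `confirm`) are hypothetical and stand in for whatever the skill actually uses; the point is only that the third-party call happens after an explicit yes.

```python
from typing import Callable, List


def search_with_fallback(
    keyword: str,
    primary: Callable[[str], List[str]],
    fallback: Callable[[str], List[str]],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Try the primary source; use the fallback only with user consent."""
    results = primary(keyword)
    if results:
        return results
    # Ask before the keyword leaves for a third-party service.
    prompt = f"Douyin returned nothing. Send '{keyword}' to Brave Search?"
    if not confirm(prompt):
        return []
    return fallback(keyword)
```

Returning an empty list on refusal, rather than sample data, keeps the caller from mistaking a declined fallback for real results.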

Missing User Warnings

Medium
Confidence
74% confidence
Finding
The script creates a virtual environment and installs packages and browser binaries with minimal user-facing warning or consent flow. In an agent-skill context, automatically performing environment changes and downloading executable components can surprise users and increase supply-chain and host-modification risk, especially for a scraper that may be triggered as part of setup automation.

Missing User Warnings

Medium
Confidence
86% confidence
Finding
The script writes JSON or CSV output to an arbitrary user-supplied path with fs.writeFileSync and no validation, restriction, or overwrite confirmation. In an agent-skill context, if untrusted natural-language input can influence the output path, this can overwrite local files, clobber application data, or write into sensitive locations accessible to the running user.
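
The flagged script uses Node's fs.writeFileSync; for consistency with the other excerpts in this report, the sketch below shows the same kind of guard in Python. It assumes a dedicated export directory and is not the skill's code: `resolve_output_path` and `write_new_file` are hypothetical helpers. It confines output to that directory, allows only the two extensions the finding mentions, and refuses to overwrite an existing file.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".json", ".csv"}


def resolve_output_path(user_path: str, base_dir: Path) -> Path:
    """Resolve a user-supplied path and reject anything outside base_dir."""
    base = base_dir.resolve()
    # Joining with an absolute user_path discards base, so the
    # containment check below rejects absolute paths as well as "..".
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"output path escapes {base}")
    if target.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError("only .json and .csv output is allowed")
    return target


def write_new_file(target: Path, text: str) -> None:
    """Write text to target, failing if the file already exists."""
    # Mode "x" raises FileExistsError instead of silently overwriting.
    with open(target, "x", encoding="utf-8") as handle:
        handle.write(text)
```

A caller that does want to replace an existing export can catch `FileExistsError` and ask the user for confirmation before retrying with a different mode.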

VirusTotal

VirusTotal findings are pending for this skill version.
