短视频爆款拆解(叙事式) (Viral Short-Video Breakdown, Narrative Style)

Security checks across malware telemetry and agentic risk

Overview

This skill appears to do its stated video-analysis job, but it should be reviewed carefully because it uploads private media to StepFun and includes unsafe credential-handling guidance.

Install only if you are comfortable sending chosen videos, optional extracted speech, and transcript-derived context to StepFun. Do not use the API key shown in the quickstart; use your own scoped key and rotate it if you copied or exposed it. Avoid --keep-upload unless you intentionally want provider-side retention, and avoid sensitive or regulated videos unless StepFun's privacy and retention terms fit your needs.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (17)

Lp3

Medium
Category
MCP Least Privilege
Confidence
91% confidence
Finding
The skill declares no permissions even though its documented behavior requires reading local files, writing outputs, using environment secrets, making network requests, and invoking ffmpeg. This is dangerous because users and orchestrators cannot accurately assess or constrain what the skill can access, which undermines consent and sandboxing expectations.

Tp4

High
Category
MCP Tool Poisoning
Confidence
95% confidence
Finding
The skill is presented as a local video-analysis/report-generation tool, but the documentation reveals that it uploads the video to external StepFun cloud services and may extract audio for ASR. This is dangerous because users may provide sensitive or private videos under the assumption of local processing, resulting in unanticipated third-party data disclosure.

Description-Behavior Mismatch

Medium
Confidence
94% confidence
Finding
The README indicates the skill processes a local MP4 and writes local outputs, but it also discloses that the video is uploaded to StepFun cloud services for analysis and deleted afterward. This is a real security/privacy issue because users may reasonably assume purely local handling, while sensitive video content is actually transmitted to a third party and becomes subject to that party's retention and access controls.

Description-Behavior Mismatch

Medium
Confidence
88% confidence
Finding
The quickstart explicitly shows that user videos and optional ASR-derived transcript content are uploaded to StepFun for processing, while the skill metadata emphasizes local file input and local output artifacts. That mismatch can mislead users about where their data goes, creating a privacy and trust issue, especially when videos may contain faces, voices, or other sensitive content.

Description-Behavior Mismatch

Low
Confidence
85% confidence
Finding
When --with-asr is used, the skill writes a full transcript file to disk even though the skill metadata only declares a markdown report and raw JSON output. This creates an undocumented data artifact that may contain sensitive spoken content, increasing privacy and data-handling risk for users who rely on the manifest to understand what will be stored.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The --keep-upload option allows analyzed videos to remain on the remote StepFun service after processing, which exceeds the stated local-analysis purpose and increases exposure of potentially sensitive user media. If used accidentally or by default in automation, private videos may persist in third-party storage longer than expected, creating confidentiality and compliance risks.
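
One way to keep provider-side retention opt-in is to delete the remote file in a `finally` block, so cleanup also runs when analysis fails. The following is a minimal sketch, not the skill's actual code: `analyze_with_cleanup` and its `upload`, `analyze`, and `delete` callables are hypothetical stand-ins for whatever the skill uses to talk to StepFun, and `keep_upload` mirrors the `--keep-upload` flag.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def analyze_with_cleanup(
    upload: Callable[[], str],
    analyze: Callable[[str], T],
    delete: Callable[[str], None],
    keep_upload: bool = False,
) -> T:
    """Upload, analyze, and delete the remote file unless keep_upload is set."""
    file_id = upload()
    try:
        return analyze(file_id)
    finally:
        # Runs on success and on failure, so an error during analysis
        # does not leave the video on the provider's side.
        if not keep_upload:
            delete(file_id)
```

With this shape, retention happens only when the caller passes `keep_upload=True` explicitly; the default path always attempts deletion.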

Context-Inappropriate Capability

High
Confidence
96% confidence
Finding
The code sends audio derived from a local user-provided video to an external ASR endpoint, but the manifest does not clearly disclose this third-party data transfer. In this skill context, users may expect local analysis of a local mp4 path; undisclosed upload of media content can expose sensitive speech, personal data, or copyrighted material.

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
The code uploads a user-provided local MP4 to StepFun's cloud file API and then analyzes it remotely, which means local media leaves the host boundary. In the stated skill context, users may reasonably expect a local-path input to be processed locally unless cloud transfer is clearly disclosed, so this is a real data-exposure/privacy issue rather than a purely cosmetic implementation detail.

Vague Triggers

Medium
Confidence
72% confidence
Finding
Broad trigger phrases like generic requests to analyze a video can cause accidental invocation in conversations where the user did not intend to run this specific skill. Because this skill uploads media to a cloud API, unintended activation has privacy and cost consequences beyond a simple local utility misfire.

Missing User Warnings

High
Confidence
98% confidence
Finding
The guide includes a real-looking plaintext API key and tells users to export it and persist it in shell startup files without any warning about credential sensitivity. If this key is valid, it could be abused for unauthorized API usage, billing fraud, or access to associated account resources; even if illustrative, normalizing this practice encourages poor secret handling.
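
A safer pattern is to read the key from the environment at run time and fail loudly when it is absent, rather than shipping any key in documentation. The sketch below assumes the `STEP_API_KEY` variable name that the skill's documentation mentions; the helper names `load_api_key` and `mask_secret` are hypothetical and are not part of the skill.

```python
import os


def load_api_key(env_var: str = "STEP_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export your own key for this session "
            "instead of copying one from documentation."
        )
    return key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging only `mask_secret(key)` keeps the full credential out of terminal output and screenshots.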

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The documentation describes uploading user video and ASR transcript content to an external AI service but does not provide an explicit privacy, consent, or data-transmission warning. In this skill context, the risk is heightened because short videos commonly contain biometric, personal, or sensitive conversational data, so users may unknowingly transmit private content to a third party.

Vague Triggers

Medium
Confidence
91% confidence
Finding
The trigger phrases are broad, generic requests such as '分析这条视频' ("analyze this video") and '视频拆解' ("video breakdown"), which can match ordinary conversation and cause the skill to activate when the user did not explicitly intend to invoke it. In an agent environment, overbroad activation increases the chance of unintended file handling or external model/API usage on user content, creating avoidable privacy, cost, and workflow-integrity risks.
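
A narrower activation check could require more than a phrase match. The sketch below is one possible rule, given as an illustration only: it activates when a message contains one of the two trigger phrases quoted in this finding and also names an `.mp4` file. The function `should_activate` and the path requirement are assumptions, not the skill's or the agent's actual dispatch logic.

```python
import re

# Trigger phrases quoted in the finding above.
TRIGGER_PHRASES = ("分析这条视频", "视频拆解")

# Hypothetical extra requirement: the message must also name an .mp4 file.
MP4_PATH = re.compile(r"\S+\.mp4\b", re.IGNORECASE)


def should_activate(message: str) -> bool:
    """Activate only when a trigger phrase and an .mp4 path both appear."""
    has_trigger = any(phrase in message for phrase in TRIGGER_PHRASES)
    has_video_path = MP4_PATH.search(message) is not None
    return has_trigger and has_video_path
```

Under this rule, a passing mention of "视频拆解" in conversation does not start an upload, because no file has been named.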

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The function streams user-derived audio to an external service without any in-code prompt, warning, or consent check. In a skill handling local media files, this increases privacy risk because users may not realize their content leaves the local environment.
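
An explicit confirmation step before any transmission would address this. The following is a minimal sketch of such a gate, assuming an interactive session; `confirm_external_upload` and its `ask` parameter are hypothetical names, and the skill itself is not known to expose anything like them.

```python
from typing import Callable


def confirm_external_upload(
    path: str,
    destination: str,
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask the user to approve sending data derived from `path` to `destination`.

    Returns True only on an explicit "yes"; anything else is a refusal.
    """
    prompt = (
        f"Audio extracted from {path} will be sent to {destination}. "
        "Type 'yes' to continue: "
    )
    return ask(prompt).strip().lower() == "yes"
```

Passing `ask` as a parameter lets an orchestrator substitute its own approval flow, and treating every answer other than "yes" as a refusal keeps the default on the side of not transmitting.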

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The function sends video content and, when present, ASR transcript text to a third-party API without any visible user-facing notice or consent mechanism in this file. Because transcripts can contain sensitive spoken content and the skill is marketed around analyzing a local MP4 path, the lack of disclosure increases privacy and compliance risk.

Ssd 3

High
Confidence
99% confidence
Finding
Publishing a real-looking API key in plaintext is a direct secret exposure issue. Attackers or unintended readers can copy the credential and use it immediately, and the surrounding instruction to persist it increases the chance of further accidental disclosure through shell history, screenshots, dotfile sync, or repository commits.

External Transmission

Medium
Category
Data Exfiltration
Content
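- `manifest.json` declares `entry / inputs / outputs / triggers / models`, so the StepClaw Agent can dispatch it directly
- By default, `STEP_API_KEY` is read from an environment variable or from a `.env` file in the skill's root directory
- The output path `output/{video_stem}-{report.md, analysis.json}` follows a fixed schema that downstream Agents can read directly
- Both the ASR and vision models go through `https://api.stepfun.com/v1`, so no additional endpoint is needed

## Limitations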
Confidence
89% confidence
Finding
https://api.stepfun.com/

External Transmission

Medium
Category
Data Exfiltration
Content
from openai import OpenAI

MODEL = "step-1o-turbo-vision"
BASE_URL = "https://api.stepfun.com/v1"
UPLOAD_TIMEOUT_SEC = 600.0
ANALYZE_TIMEOUT_SEC = 420.0
Confidence
93% confidence
Finding
https://api.stepfun.com/

VirusTotal

63/63 vendors flagged this skill as clean.
