Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Viral Video Replicator

v1.1.0

Video reverse-replication: analyze a reference video (FFmpeg frame extraction + Vision LLM), generate a replication Seedance 2.0 prompt, and apply one of 4 material-replacement modes. Supports single and batch. Use when: '复刻这个视频', '分析爆款视频', 'replicate this video', '视频逆向', '反编译视频', '批量...

MIT-0
Free to use, modify, and redistribute. No attribution required.
Security Scan

VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Functionally, the required pieces (FFmpeg locally; a Vision LLM key; optional ASR token and object storage for audio) are coherent with "analyze video frames + transcribe audio + call vision/ASR APIs": those services are needed to implement the described pipeline. However, the registry metadata lists no required env vars, while the SKILL.md explicitly requires ARK_API_KEY/ARK_VISION_MODEL, ASR_ACCESS_TOKEN, and TOS credentials. That metadata mismatch is an incoherence that should be resolved with the author/platform.
Instruction Scope
The SKILL.md instructs the agent to: extract frames and audio locally, encode/send base64 frame grids to a remote Vision API, upload audio to cloud object storage (TOS) and submit to a remote ASR service. It also contains explicit system-prompt text for the Vision LLM. These are expected for the stated task, but they involve sending potentially sensitive or proprietary video frames and audio off‑platform and require collecting service credentials from the user — higher risk than an offline-only tool. The SKILL.md also directs the agent to ask users for long‑lived secrets (access key + secret).
Install Mechanism
Instruction-only skill with no install spec or downloaded code. This is the lower-risk installation model (nothing written to disk by an installer).
Credentials
The skill requires multiple credentials: a Vision API key and model id (expected), plus conditional ASR_ACCESS_TOKEN and full TOS object‑storage credentials (Access Key + Secret + Bucket + Region). Asking for TOS access key/secret grants broad access to the user's object storage and is high privilege; while technically needed to implement the described upload/presigned‑URL flow, it is sensitive and should be minimized (use short‑lived tokens or ask users to upload and provide presigned URLs instead). Also the registry metadata omits these env requirements which is a discrepancy.
Persistence & Privilege
Skill is user-invocable and not always-enabled. No install scripts or persistent modifications are declared. SKILL.md asks for keys during clarification, but it does not state any mechanism to store them; verify how the platform handles any supplied secrets before providing them.
Scan Findings in Context
[system-prompt-override] expected: The SKILL.md contains explicit 'system prompt' text intended to be sent as the Vision LLM's system role. That is expected for building structured prompts to a remote Vision API. However the same pattern can be used for prompt‑injection if misapplied to the local agent's system prompt; treat the presence as an opportunity to review how your agent platform separates skill-internal prompts from the agent's own system prompt and whether the skill can override agent policies.
What to consider before installing
  • Metadata mismatch: the registry claims no env vars, but the SKILL.md asks for ARK_API_KEY/ARK_VISION_MODEL and (conditionally) ASR and TOS credentials. Ask the publisher/platform to correct the metadata before trusting the skill.
  • Sensitive credentials: avoid handing over a long-lived TOS Access Key + Secret unless you trust both the skill and its operator. Prefer alternatives: (a) upload audio yourself and provide a presigned URL, (b) provide short-lived, limited-scope credentials, or (c) use an account provisioned specifically for this purpose.
  • Data privacy: this skill uploads extracted frames and audio to third-party Vision/ASR services. Do not upload videos containing sensitive personal data, or copyrighted content you do not control, without explicit legal review.
  • Prompt-injection risk: the skill contains system-prompt text for remote LLM calls. Verify the platform prevents skill content from modifying the agent's own system prompt or policies.
  • Storage/retention: ask whether the agent platform or the skill will store any API keys, presigned URLs, or uploaded media, and for how long; request deletion after processing.
  • If you must proceed: test in a low-risk way with non-sensitive sample videos and ephemeral credentials; prefer presigned URLs over raw access keys.
references/vision-analysis.md:11
Prompt-injection style instruction pattern detected.
About static analysis
These patterns were detected by automated regex scanning. They may be normal for skills that integrate with external APIs. Check the VirusTotal and OpenClaw results above for context-aware analysis.

Like a lobster shell, security has layers — review code before you run it.

Version: latest (vk976f1ya8emc17gqdqj9r0e8gx841kef)


SKILL.md

Skill: viral-video-replicator

Overview

Reverse-engineer reference videos (e.g., competitor viral content) into replicable Seedance 2.0 prompts. The pipeline: FFmpeg frame extraction -> contact sheet grids -> audio extraction + ASR transcription -> Vision LLM structured analysis -> Seedance prompt assembly with optional material replacement (face/body/clothing). Supports single and batch modes.

When to Activate

User query contains any of:

  • "视频复刻" (replicate video), "视频逆向" (reverse-engineer video), "反编译视频" (decompile video), "复刻爆款" (replicate a viral hit)
  • "分析这个视频" (analyze this video), "replicate this video", "video analysis"
  • "批量分析" (batch analysis), "批量复刻" (batch replication)
  • "这个视频怎么拍的" (how was this video shot), "帮我分析一下这个爆款" (help me analyze this viral video), "我想拍一个类似的视频" (I want to shoot a similar video)
  • "reverse engineer this video", "analyze this fashion video"

Do NOT activate for:

  • Creating fashion videos from scratch (no reference video) -> use fashion-video-creator
  • "帮我做个穿搭视频" (make me an outfit video), "生成模特图" (generate model images) -> use fashion-video-creator
  • Pure video editing / trimming -> not applicable
  • Non-fashion video analysis -> not applicable

Prerequisites

Local tools (REQUIRED):

# macOS
brew install ffmpeg

# Linux
apt install ffmpeg

# Verify
ffmpeg -version && ffprobe -version

Cloud APIs (collected via clarification):

REQUIRED: ARK_API_KEY + ARK_VISION_MODEL (Vision LLM for frame analysis)
CONDITIONAL: ASR_ACCESS_TOKEN (if video has dialogue)
CONDITIONAL: TOS credentials (if ASR is needed — audio transfers through TOS)

Clarification Flow

Phase 1: API Key Acquisition

Ask IN ORDER. Use plain language — explain WHY each service is needed.

Q1: Vision Analysis (REQUIRED)

"Analyzing video content requires an AI model that can 'understand images'. It will look at screenshots from the video and identify the person's appearance, clothing details, scene layout, and action timeline. Do you have a Volcengine Ark (火山方舟) account and API Key? The vision model's ID is also needed."

If no API key -> STOP. Guide the user to Volcengine Ark (火山方舟). Do NOT proceed.

Q2: Speech Transcription (CONDITIONAL)

"Does anyone speak in the reference video? If there is dialogue, speech-to-text is needed to extract the lines, so the replicated video can carry the full spoken content. Purely visual videos with no dialogue can skip this step."

If yes -> ask for ASR_ACCESS_TOKEN.

Q3: Audio Storage (CONDITIONAL — only if Q2 = yes)

"Speech-to-text transfers the audio file through cloud storage. I need 4 pieces of information for Volcengine object storage (TOS): Access Key, Secret Key, Bucket, Region."

If user has ASR but no TOS -> warn: "Without TOS, ASR cannot work; this is equivalent to having no speech transcription."

Phase 2: Mandatory Recommendations

MUST show. Each item has WHY explanation:

============================================================
API Configuration — Mandatory Recommendations
============================================================

[REQUIRED] Vision model: doubao-seed-1-6-vision-250815 or newer
  WHY: Older models cannot distinguish clothing fabric textures
  (acetate vs chiffon), stitching details (overlocked vs raw edge),
  or fit nuances (slim vs A-line). Analysis quality drops ~60%.

[REQUIRED] If video has dialogue: configure BOTH ASR + TOS
  WHY: Without ASR, all spoken content is lost. The generated prompt
  will only contain visual descriptions. Video fidelity drops from
  ~90% to ~50% because dialogue drives 40%+ of viewer engagement.
  TOS is the audio transfer pipeline — no TOS means no ASR.

[REQUIRED] Video resolution: 720p or higher
  WHY: Frames are extracted at 360x640 thumbnails. Source below 480p
  means thumbnails are upscaled garbage — clothing patterns and
  textures become unrecognizable blobs.

[RECOMMENDED] Exact mode for same-category replacement
  WHY: "exact" does nested structured analysis (10 fields with typed
  subobjects) — precision matters when replacing one dress with another.
  "rewrite" does flat analysis (10 string fields) — better for
  extracting viral logic across different product categories.
============================================================

Phase 3: Mode Selection

"How many videos do you want to analyze? Single or batch?"

Q5: Replicate Mode (per video if batch)

"How do you want to replicate?

  • Exact replication (精确复刻): analyze every detail frame by frame and reproduce it as close to 1:1 as possible
  • Extract and rewrite (提取改写): extract the viral pacing and logic, then re-interpret it in a new way"

Q6: Material Replacement (per video if batch)

"Which elements of the video do you want to replace?

  • None (pure replication)
  • Face/body (upload a model reference image)
  • Clothing (upload a product image)
  • Both (upload a model image + product image)"

Batch-Specific Recommendations

============================================================
Batch Mode — Additional Recommendations
============================================================

[REQUIRED] ALL videos should be 720p+
  WHY: One low-res video doesn't just fail for itself — it wastes
  API costs on a Vision LLM call that returns unusable analysis.

[RECOMMENDED] Pre-sort by replicate mode
  WHY: exact mode takes 2-3 min/video (nested analysis), rewrite
  takes 1-2 min/video (flat analysis). Grouping avoids context switches.

[WARNING] Each video runs the FULL pipeline independently.
  N videos = approximately N * 2-3 minutes. Plan accordingly.
============================================================

Four Replacement Modes

| Mode | What User Uploads | @image Tags in Prompt | What Gets Replaced |
|------|-------------------|-----------------------|--------------------|
| clone | Nothing | None (pure text) | Nothing — exact replication |
| face_swap | Face/body reference | @image1 = face ref | Person replaced, clothing preserved |
| outfit_swap | Garment product image | @image1 = garment | Clothing replaced, person preserved |
| full_swap | Garment + face reference | @image1 = garment, @image2 = face ref | Both replaced |

Mode auto-determination:

has_person_ref AND has_garment_ref -> full_swap
has_garment_ref only -> outfit_swap
has_person_ref only -> face_swap
neither -> clone
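The auto-determination rules above can be sketched as a small helper (the function name `determine_mode` is illustrative, not defined by the skill):

```python
def determine_mode(has_person_ref: bool, has_garment_ref: bool) -> str:
    """Map uploaded reference materials to a replacement mode.

    Mirrors the rules above: garment + person -> full_swap,
    garment only -> outfit_swap, person only -> face_swap,
    neither -> clone.
    """
    if has_person_ref and has_garment_ref:
        return "full_swap"
    if has_garment_ref:
        return "outfit_swap"
    if has_person_ref:
        return "face_swap"
    return "clone"
```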

Core Workflow

Step 0: Environment Check (mandatory, never skip)

ffmpeg -version && ffprobe -version
  • Returns version -> proceed to Step 1
  • command not found -> guide install (brew/apt/choco). Still fails after install -> soft fallback: ask the user, "FFmpeg is unavailable. Can you manually provide video screenshots and the audio file?"
    - If the user provides frames manually -> skip the FFmpeg steps and proceed from Step 4 (Vision analysis) with the user-provided images. Quality warning (MUST show to user): "In manual-screenshot mode, analysis quality drops significantly: no precise timestamps, no uniform 3 fps sampling, and too few frames can make the action timeline inaccurate. Install FFmpeg for best results."
    - If the user cannot provide frames -> STOP. FFmpeg is required for automated extraction.
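A minimal sketch of this check, assuming Python is available (`shutil.which` probes PATH the same way `command -v` would; the function name is illustrative):

```python
import shutil

def check_tools(tools=("ffmpeg", "ffprobe")):
    """Return the subset of required tools missing from PATH."""
    return [t for t in tools if shutil.which(t) is None]

missing = check_tools()
if missing:
    # Guide the user to install (brew/apt/choco), or offer the
    # manual-screenshot fallback described above.
    print(f"Missing tools: {', '.join(missing)}")
```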

Step 0b: Verify API Key (before reaching Step 4)

Validate ARK_API_KEY early to avoid wasting FFmpeg processing time on an invalid key:

If bash/Python available:

resp = httpx.get(f"{ARK_API_BASE}/api/v3/models",
                 headers={"Authorization": f"Bearer {ARK_API_KEY}"}, timeout=10)
  • 200 -> proceed
  • 401/403 -> STOP. Key invalid. Fix before continuing.

If no code execution: Trust user-provided key, validate on first Vision API call.
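A stdlib-only sketch of the same early check (the `httpx` snippet above needs `httpx` installed; the `/api/v3/models` path and `ARK_API_BASE` are taken from that snippet, not independently verified):

```python
import urllib.request
import urllib.error

def classify_status(code: int) -> str:
    """Map an HTTP status to the decision rule above."""
    if code == 200:
        return "proceed"
    if code in (401, 403):
        return "stop"  # key invalid, fix before continuing
    return "report"    # other errors: surface to the user

def validate_key(api_base: str, api_key: str) -> str:
    req = urllib.request.Request(
        f"{api_base}/api/v3/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as e:
        return classify_status(e.code)
```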

Single Mode

Step 1: Collect API keys + mode + replacement materials
Step 2: Extract frame grids (3fps) + extract audio — PARALLEL via asyncio.gather()
        (Both are FFmpeg subprocesses launched concurrently in Python, not LLM-level parallelism)
        Read references/frame-extraction.md for FFmpeg specs
Step 3: Upload audio to TOS -> ASR transcription
        Read references/asr-pipeline.md for protocol
Step 4: Vision LLM analysis (grids + transcript -> structured JSON)
        Read references/vision-analysis.md for exact vs rewrite schemas
Step 5: Determine replacement mode from uploaded materials
Step 6: Assemble Seedance 2.0 prompt
        Read references/reverse-prompt.md for 4-mode assembly
Step 7: Generate mode-specific SOP
Step 8: Validate output (see below)
Step 9: Return: prompt + analysis + transcript + SOP + replacement summary
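The Step 2 parallelism can be sketched as two FFmpeg subprocesses launched concurrently via asyncio (output file names and filter arguments here are illustrative; references/frame-extraction.md holds the real specs):

```python
import asyncio

async def run(cmd: list[str]) -> int:
    """Launch one FFmpeg subprocess and return its exit code."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    return await proc.wait()

def frame_cmd(video: str) -> list[str]:
    # 3 fps sampling into 360x640 thumbnails (illustrative filter args)
    return ["ffmpeg", "-i", video, "-vf", "fps=3,scale=360:640",
            "frames/%04d.jpg"]

def audio_cmd(video: str) -> list[str]:
    # Drop video stream, keep audio for the ASR pipeline
    return ["ffmpeg", "-i", video, "-vn", "audio.mp3"]

async def extract(video: str):
    # Both subprocesses run concurrently: Python-level parallelism,
    # not LLM-level parallelism
    return await asyncio.gather(run(frame_cmd(video)), run(audio_cmd(video)))
```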

Batch Mode

Step 0: Verify FFmpeg
Step 1: Collect API keys + video count + per-video configs
Step 2: For each video (sequential):
  a. Extract frame grids + audio (parallel)
  b. TOS upload -> ASR transcription
  c. Vision LLM analysis
  d. Determine replacement mode
  e. Assemble prompt
  f. Generate SOP
  g. Validate this video's output
  h. Mark: completed / failed
Step 3: Return all results with progress summary

Progress: queued -> processing -> completed/failed
Partial success: batch completes even if some videos fail.
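The partial-success behavior can be sketched as follows (the `pipeline` callable stands in for steps a-g and is an assumption of this sketch):

```python
def run_batch(videos, pipeline):
    """Run the full pipeline per video; one failure never aborts the batch."""
    results = []
    for video in videos:
        entry = {"video": video, "status": "processing"}
        try:
            entry["output"] = pipeline(video)
            entry["status"] = "completed"
        except Exception as e:
            # Mark failed with the error and continue with the rest
            entry["status"] = "failed"
            entry["error"] = str(e)
        results.append(entry)
    done = sum(1 for r in results if r["status"] == "completed")
    print(f"Progress: {done}/{len(videos)} completed")
    return results
```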

Output Validation (mandatory, never skip)

Before delivering results, verify ALL:

  • Analysis JSON is valid and contains all required fields?
  • Prompt correctly uses @image tags matching the replacement mode?
  • If clone mode: prompt has NO @image references (pure text)?
  • If outfit_swap/full_swap: prompt includes "Do not alter clothing pattern, color, texture or style"?
  • If has_speech: dialogue content is present in prompt (not empty)?
  • SOP upload instructions match the number of images for this mode?
  • Replacement summary correctly lists what was preserved vs replaced?

Any NO -> fix before delivering. Do NOT send unvalidated output.
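A few of the checks above can be sketched as a validation helper (parameter names `prompt`, `mode`, `has_speech`, `transcript` are illustrative, not part of the skill's schema):

```python
def validate_output(prompt: str, mode: str, has_speech: bool,
                    transcript: str = "") -> list[str]:
    """Return a list of validation failures; an empty list means deliverable."""
    problems = []
    # clone mode: prompt must be pure text with no @image references
    if mode == "clone" and "@image" in prompt:
        problems.append("clone prompt must not contain @image tags")
    # outfit_swap/full_swap: clothing-preservation instruction required
    if mode in ("outfit_swap", "full_swap") and (
            "Do not alter clothing pattern" not in prompt):
        problems.append("missing clothing-preservation instruction")
    # speech present: dialogue must actually appear in the output
    if has_speech and not transcript.strip():
        problems.append("dialogue missing despite has_speech")
    return problems
```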

Error Handling

| Failure | Detection | Action |
|---------|-----------|--------|
| FFmpeg not installed | command not found | STOP. Provide install command. Do NOT proceed. |
| No API key | ARK_API_KEY empty | STOP. Guide user to Volcengine Ark (火山方舟). Do NOT proceed. |
| Vision model error | 4xx/5xx from API | Report error with model ID used. Suggest checking model availability. |
| Vision returns invalid JSON | JSON parse fails | Retry once with same grids. Still fails -> report raw response for debugging. |
| Frame extraction fails | FFmpeg non-zero exit | Check video format. Try re-encoding. Report if still fails. |
| No audio track | extract_audio returns None | Skip ASR. Proceed with visual-only analysis. Note in output: "No audio detected." |
| TOS upload fails | Upload exception after 2 retries | Skip ASR. Proceed visual-only. Warn: "Audio transcription unavailable — dialogue will be missing." |
| ASR timeout | No result after 120s | Skip transcript. Proceed visual-only. Warn: "Speech transcription timed out." |
| ASR silent audio | Status 20000003 | Normal — video has no speech. Proceed with visual-only. |
| Video too large | >200MB | Reject immediately. Ask user to compress or trim. |
| Batch video fails | Exception during pipeline | Mark failed with error. Continue remaining. Report partial results. |

Degraded Modes (graceful degradation chain)

| Failure Point | Degraded Mode | What User Still Gets | Quality Impact |
|---------------|---------------|----------------------|----------------|
| ASR fails (TOS/timeout) | Visual-only analysis | Prompt with visual descriptions, no dialogue | ~50% fidelity — all spoken content lost |
| Vision exact mode fails | Auto-retry with rewrite mode | Flat analysis (less precise) | ~70% fidelity — loses nested structure (clothing/scene subfields) |
| Vision rewrite also fails | Return raw materials | Frame grids + transcript for manual analysis | ~20% — no automated analysis, user must write prompt manually |
| Seedance prompt assembly fails | Return analysis only | Analysis JSON + transcript | ~30% — user has data but no ready-to-use prompt |
| FFmpeg unavailable (user provides screenshots) | Manual frame mode | Analysis from user-provided images | ~40% — no timestamps, uneven sampling, incomplete frame coverage |

Always prefer delivering partial results over delivering nothing. Every degraded output MUST clearly state: (1) what is missing, (2) why, and (3) the estimated quality impact.
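The chain above can be sketched as an ordered lookup in which every degraded result carries the three required fields (keys and fidelity strings mirror the table; the `degrade` helper is illustrative):

```python
# failure point -> (degraded mode, what is missing, estimated impact)
DEGRADED = {
    "asr_failed": ("visual-only analysis", "all spoken content", "~50% fidelity"),
    "vision_exact_failed": ("rewrite-mode analysis", "nested structure", "~70% fidelity"),
    "vision_rewrite_failed": ("raw materials", "automated analysis", "~20% fidelity"),
    "prompt_assembly_failed": ("analysis only", "ready-to-use prompt", "~30% fidelity"),
    "ffmpeg_unavailable": ("manual frame mode", "timestamps and uniform sampling", "~40% fidelity"),
}

def degrade(failure: str) -> dict:
    """Build the mandatory disclosure for a degraded output."""
    mode, missing, impact = DEGRADED[failure]
    return {"mode": mode, "missing": missing, "impact": impact}
```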

See references/fallbacks.md for detailed recovery procedures per failure case.

Usage Example

Input: "Help me replicate this viral video and swap in my clothing" (帮我复刻这个爆款视频,换成我的衣服) + uploaded video (15s, 720p) + uploaded garment image

Resolved: mode=exact, replacement=outfit_swap (garment_ref provided, no face_ref)

Output 1 — Structured Analysis:

{
  "person": {
    "gender": "female", "age_range": "22-26",
    "face": "oval face, large eyes, double eyelids",
    "skin_tone": "fair", "hair": "long straight black hair, middle part, falling naturally",
    "build": "slim and tall", "makeup": "light makeup, nude lip gloss"
  },
  "clothing": {
    "type": "V-neck floral dress", "color": "cream-white base + pink floral print",
    "material_look": "light, flowing chiffon", "neckline": "V-neck",
    "fit": "A-line with cinched waist", "length": "knee-length",
    "details": "drawstring waist tie, ruffled hem"
  },
  "scene": {"location": "modern apartment living room", "lighting_source": "natural light from floor-to-ceiling window on the right"},
  "actions": "0-2s: faces the camera, smiles and greets; 2-5s: lifts the hem with her right hand to show the fabric; 5-8s: small turn to show the skirt's movement; 8-12s: flips the inside of the skirt with her right hand to show the stitching; 12-15s: pinches the waist with her right hand to show the elasticity",
  "dialogue": "Girls, look at this... (lifts hem with right hand) this fabric is acetate satin... smooth and cool to the touch..."
}

Output 2 — Seedance Prompt (outfit_swap):

A young woman with an oval face, fair skin, long straight black hair with a middle part falling naturally, a slim tall figure, and light makeup, wearing the garment in @image1. In a modern apartment living room, natural light from the floor-to-ceiling window on the right. Her actions: 0-2s: faces the camera, smiles and greets; 2-5s: lifts the hem with her right hand to show the fabric... She says to the camera: "Girls, look at this... this fabric... smooth and cool to the touch... guess how much? Under 200! Makes your legs look super long, a no-brainer buy." Natural, friendly tone, like a video call with a best friend. Do not alter clothing pattern, color, texture or style. Handheld vlog camera feel, vertical 9:16.

Output 3 — Transcript: "Girls, look at this... this fabric is acetate satin..."

Output 4 — SOP: outfit_swap mode, 1 image upload (@image1 = garment)

Output 5 — Replacement Summary: garment_replaced=true, original_preserved=[face, body, scene, actions, dialogue, camera]

Domain Knowledge Role Declaration

The reference files contain FFmpeg specs, ASR protocols, Vision prompts, and prompt assembly templates. Their role is to assist pipeline execution — providing exact API formats, analysis schemas, and assembly rules. They do NOT replace the execution workflow. Never output reference content directly as the final answer. Always execute: extract frames -> transcribe -> analyze -> assemble -> validate -> deliver.

References

| File | Purpose | When to read |
|------|---------|--------------|
| references/frame-extraction.md | FFmpeg filter chain, grid stitching, audio extraction specs | Step 2: extracting frames and audio |
| references/asr-pipeline.md | TOS upload protocol, Seed-ASR-2.0 submit/poll API | Step 3: transcribing audio |
| references/vision-analysis.md | Vision LLM prompts for exact and rewrite modes, output schemas | Step 4: analyzing video |
| references/reverse-prompt.md | 4-mode prompt assembly, clothing generalization map, SOP templates | Steps 6-7: building prompt and SOP |
| references/fallbacks.md | 8 failure cases with recovery procedures and degradation chain | On any error during Steps 2-8 |

Files

7 total
