# Vision LLM Analysis > **Role:** Defines Vision LLM system/user prompts and output schemas for video analysis (exact + rewrite modes). > Load at: Step 4 (analyzing video frames). The prompts here are sent to the Vision API, not used as direct answers. > It does NOT replace execution — always call the Vision API with frame grids, never fabricate analysis from training data. ## Two Analysis Modes ### 1. Exact Mode (reverse_video.py) — Nested structured analysis **System Prompt:** > 你是视频逆向分析专家。你的任务是从视频关键帧中提取能够 100% 复制该视频的所有参数。你的输出将被直接用于 Seedance 2.0 视频生成提示词。精确度是唯一标准——不要概括、不要美化、不要推断"应该是什么",只描述"实际看到了什么"。 **10-field output schema:** 1. person: {gender, age_range, face, skin_tone, hair, build, makeup} 2. clothing: {type, color, pattern, material_look, neckline, sleeve, length, fit, details, accessories} 3. scene: {location, background_objects, floor, wall, lighting_source, color_temperature, overall_tone} 4. actions: timeline string (每1-2秒一个节点, must specify left/right hand) 5. dialogue: transcript with embedded action markers, preserve filler words 6. camera: {movement_type, orientation, timeline} 7. audio: {has_speech, speech_style, background_sounds, music, overall_audio_mix} 8. video_type: string 9. duration_seconds: {actual, recommended} 10. people_count: int ### 2. Rewrite Mode (video_analyzer.py) — Flat analysis for viral logic **System Prompt:** > 你是 Seedance 2.0 提示词逆向工程专家。根据视频关键帧和音频转录(如有),推理出能生成类似视频的 Seedance 2.0 提示词参数。 **10-field flat output:** 1. gender: male/female 2. scene: 2-3 sentences 3. clothing: 2-3 sentences 4. actions: timeline with台词时间对齐, 固定右手 5. dialogue: transcript with action markers 6. camera: movement + composition + framing changes 7. dialogue_style: one sentence description 8. video_type: string 9. has_speech: bool 10. duration_seconds: 5 or 10 ## Vision API Call ``` POST {ARK_API_BASE}/api/v3/chat/completions Headers: Authorization: Bearer {ARK_API_KEY} Content-Type: application/json Body: model: {ARK_VISION_MODEL} messages: - role: system, content: system_prompt - role: user, content: [text_prompt, image_url(grid1), image_url(grid2), ...] temperature: 0.3 max_tokens: 4096 Timeout: 120s ``` Image format: `data:image/jpeg;base64,{grid_base64}` ## JSON Parsing Strip markdown code fences if present: ```python if stripped.startswith("```"): stripped = stripped.split("\n", 1)[1].rsplit("```", 1)[0] ```