Video Script Generation Skill
This skill provides instructions on how to transform a user's storytelling prompt into a valid input-scripts.json entry for the video generator.
1. JSON Structure
Every script must be an object within the main array in input/input-scripts.json.
{
"id": "unique-id",
"title": "Display Title",
"orientation": "landscape" | "portrait",
"voice": "en-US-JennyNeural" | "en-US-GuyNeural",
"script": "The actual narrative content..."
}
2. Director Mode (Manual Visual Cues)
To ensure high-quality, relevant stock footage, use "Director Mode" tags. Place them at the start of the sentence or block they describe.
- Syntax:
[Visual: Descriptive Query]
- Best Practice: Be specific. Instead of
[Visual: nature], use [Visual: green forest sunlight rays].
- Placement: The visual will stay active until the next
[Visual: ] tag appears.
Example:
"[Visual: futuristic city neon night] The city never sleeps. [Visual: robotic arm assembly] High-tech manufacturing is the backbone of the economy."
3. Audio & Voice Settings
You can choose from several high-quality neural voices. Specify these in the voice field of your JSON job.
Available Voices
| Gender | Voice ID | Style/Region |
|---|
| 👨 Male | en-US-GuyNeural | Deep, Authoritative (Recommended) |
| 👨 Male | en-US-ChristopherNeural | Calm, Steady |
| 👨 Male | en-GB-RyanNeural | British Accent |
| 👨 Male | en-IN-PrabhatNeural | Indian Accent |
| 👩 Female | en-US-JennyNeural | Warm, Professional (Recommended) |
| 👩 Female | en-US-AriaNeural | Friendly, Helpful |
| 👩 Female | en-US-SaraNeural | Cheerful, Bright |
| 👩 Female | en-GB-SoniaNeural | British Accent |
4. Job Settings Keys
| Key | Type | Description |
|---|
id | String | Unique slug for the video (used for the folder name). |
title | String | The main title displayed in the video. |
orientation | String | landscape (16:9) or portrait (9:16). |
voice | String | Use one of the Voice IDs from the table above. |
showText | Boolean | (Optional) Set to false to hide captions. |
defaultVideo | String | (Optional) Local filename for fallback (in input-assests/). |
script | String | The content to be spoken, including [Visual: ...] tags. |
5. Script Writing Rules
- Scene Breaks: Aim for scene changes every 5-10 seconds. Longer scenes can get repetitive.
- Voiceovers: The
script text is exactly what will be spoken. Do NOT include instructions like (Scene 1) in the script text, as the TTS will read it.
- Troubleshooting Visuals: If a search is too specific and returns 0 results, the system will automatically try fallbacks (Pixabay -> Generic).
- Tip: Use keywords that describe the action or vibe (e.g., "morning sunrise", "fast car driving").