Video Narrator

v1.0.1

Generate SenseAudio TTS narration tracks for videos, including timestamped segments, style variants, and editor-ready voiceover exports. Use when users need...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for scikkk/video-narrator.

Prompt Preview: Install & Setup
Install the skill "Video Narrator" (scikkk/video-narrator) from ClawHub.
Skill page: https://clawhub.ai/scikkk/video-narrator
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: SENSEAUDIO_API_KEY
Required binaries: python3, ffmpeg
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install video-narrator

ClawHub CLI


npx clawhub@latest install video-narrator
Security Scan

VirusTotal: Benign (report available)
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (video narration, timestamped segments, editor exports) align with the requested items: SENSEAUDIO_API_KEY, python3, ffmpeg, requests, and pydub, all of which are reasonable for producing and assembling TTS audio for video.
Instruction Scope
SKILL.md instructions are scoped to preparing timed scripts, calling the SenseAudio TTS API, decoding returned audio, and optionally assembling clips locally. There are no instructions to read unrelated system files, exfiltrate extra data, or post data to endpoints outside senseaudio.cn.
Install Mechanism
Declared installs are two Python packages (requests, pydub) — typical and proportionate. The installer kind is 'uv' in metadata (unusual label in this manifest) but the packages themselves are standard PyPI libraries; no arbitrary URL downloads or archive extraction are used.
Credentials
Only a single credential is required (SENSEAUDIO_API_KEY) and it is clearly tied to the service the skill integrates with. The SKILL.md explicitly instructs to send the key only in the Authorization header and warns against logging or embedding it.
Persistence & Privilege
Skill is not always-enabled, does not request permanent system presence, and does not instruct modifications to other skills or global agent settings.
Assessment
This skill appears coherent for generating voiceover tracks, but before installing:

  1. Verify the origin and trustworthiness of the SenseAudio service (https://senseaudio.cn) and obtain an API key with least privilege.
  2. Confirm your environment's installer mapping for 'uv', and ensure it will install requests and pydub from official PyPI rather than fetching code from an untrusted host.
  3. Keep the API key out of logs and examples, as the skill recommends.
  4. Because pydub relies on ffmpeg, ensure your ffmpeg binary is the expected trusted system package.
  5. If you need stronger assurance, review any runtime code the skill will actually execute (no code files are bundled here) or run it first in an isolated test environment.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python3, ffmpeg
Env: SENSEAUDIO_API_KEY
Primary env: SENSEAUDIO_API_KEY

Install

uv tool install requests
uv tool install pydub
latest: vk9781at23j0ykt66fh5tknba7182xvje
403 downloads
1 star
2 versions
Updated 1mo ago
v1.0.1
MIT-0

SenseAudio Video Narrator

Create professional narration audio for videos with timing-aware segmentation, natural delivery, and editor-friendly exports.

What This Skill Does

  • Generate narration audio synchronized to script timestamps
  • Match narration style to video genre such as documentary or tutorial
  • Control pacing with official TTS parameters and text break markers
  • Create multiple narration takes with different voices or styles
  • Export audio segments and merged narration tracks for editing workflows

Credential and Dependency Rules

  • Read the API key from SENSEAUDIO_API_KEY.
  • Send auth only as Authorization: Bearer <API_KEY>.
  • Do not place API keys in query parameters, logs, or saved examples.
  • If Python helpers are used, this skill expects python3, requests, and pydub.
  • pydub is used only for optional local audio assembly and mixing.
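A fail-fast check keeps the key out of code while surfacing a clear error when the variable is missing. This is a sketch; the helper name and the injectable `env` mapping are illustrative conveniences, not part of the skill:

```python
import os


def require_api_key(env=os.environ):
    """Return the SenseAudio key from the environment, or fail loudly."""
    key = env.get("SENSEAUDIO_API_KEY")
    if not key:
        raise RuntimeError(
            "SENSEAUDIO_API_KEY is not set; export it before running this skill."
        )
    return key
```

Passing the environment as a parameter makes the check easy to exercise in tests without touching the real process environment.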

Official TTS Constraints

Use the official SenseAudio TTS rules summarized below:

  • HTTP endpoint: POST https://api.senseaudio.cn/v1/t2a_v2
  • Model: SenseAudio-TTS-1.0
  • Max text length per request: 10000 characters
  • voice_setting.voice_id is required
  • voice_setting.speed range: 0.5-2.0
  • voice_setting.pitch range: -12 to 12
  • Optional audio formats: mp3, wav, pcm, flac
  • Optional sample rates: 8000, 16000, 22050, 24000, 32000, 44100
  • Optional MP3 bitrates: 32000, 64000, 128000, 256000
  • Optional channels: 1 or 2
  • extra_info.audio_length returns segment duration in milliseconds
  • Inline break markup such as <break time=500> is supported in text
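The documented ranges can be enforced before a request is sent. The validator below is a sketch built only from the constraints listed above; the function name is an assumption:

```python
def validate_voice_setting(voice_id, speed=1.0, pitch=0):
    """Check a voice_setting against the documented SenseAudio TTS ranges."""
    if not voice_id:
        raise ValueError("voice_setting.voice_id is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed {speed} is outside the 0.5-2.0 range")
    if not -12 <= pitch <= 12:
        raise ValueError(f"pitch {pitch} is outside the -12 to 12 range")
    return {"voice_id": voice_id, "speed": speed, "pitch": pitch}
```

Rejecting out-of-range values locally gives a clearer error than a failed API round trip.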

Recommended Workflow

  1. Prepare the script:
  • Split narration into timestamped segments.
  • Keep each segment comfortably below the 10000 character limit.
  2. Choose a voice and pacing profile:
  • Pick a voice_id and tune speed, pitch, and optional vol.
  • Use shorter segments when timing precision matters.
  3. Generate audio segments:
  • Call the TTS API for each segment.
  • Decode data.audio from hex before saving.
  • Capture extra_info.audio_length for timeline metadata.
  4. Assemble the narration track locally:
  • Use pydub to position clips on a silent master track.
  • Keep per-segment files for easier editor import and retiming.
  5. Validate timing against the video:
  • Leave small gaps when natural pacing is needed.
  • Adjust segment boundaries instead of overusing extreme speed values.
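For the pacing advice in step 5, the inline break markup from the TTS constraints can add pauses without resorting to extreme speed values. This sketch inserts a fixed pause after sentence-ending punctuation; the regex and the 500 ms default are assumptions:

```python
import re


def add_sentence_breaks(text, pause_ms=500):
    """Insert <break time=...> markers after sentence-ending punctuation."""
    marker = f"<break time={pause_ms}>"
    return re.sub(r"([.!?])(\s+)", rf"\1{marker}\2", text)
```

Because the marker is plain text, the segment still counts toward the 10000-character limit, so keep break-heavy segments short.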

Minimal Timed Narration Helper

import binascii
import os
import re

import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def parse_timed_script(script):
    # Match "[HH:MM:SS] text" blocks; each segment runs until the next timestamp.
    pattern = r"\[(\d{2}):(\d{2}):(\d{2})\]\s*(.+?)(?=\n\[|\Z)"
    segments = []
    for match in re.finditer(pattern, script, re.DOTALL):
        hours, minutes, seconds, text = match.groups()
        timestamp_ms = (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000
        segments.append({"timestamp": timestamp_ms, "text": text.strip()})
    return segments


def synthesize_segment(text, voice_id, speed=1.0, pitch=0, vol=1.0):
    # One synchronous TTS request per segment; the response carries hex-encoded audio.
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.0",
            "text": text,
            "stream": False,
            "voice_setting": {
                "voice_id": voice_id,
                "speed": speed,
                "pitch": pitch,
                "vol": vol,
            },
            "audio_setting": {
                "format": "mp3",
                "sample_rate": 32000,
                "bitrate": 128000,
                "channel": 2,
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    return {
        "audio_bytes": binascii.unhexlify(data["data"]["audio"]),
        "duration_ms": data["extra_info"]["audio_length"],
        "trace_id": data.get("trace_id"),
    }
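The helpers above can be tied together with a driver loop like the following. Here `synth` stands in for `synthesize_segment` (injected as a callable so the sketch stays testable without network access), and the `segment_NNN.mp3` filename pattern is an assumption:

```python
from pathlib import Path


def generate_segments(segments, synth, out_dir):
    """Synthesize each timed segment, save it, and collect timing metadata."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for i, seg in enumerate(segments):
        result = synth(seg["text"])
        path = out / f"segment_{i:03d}.mp3"
        path.write_bytes(result["audio_bytes"])
        manifest.append({
            "file": str(path),
            "timestamp": seg["timestamp"],
            "duration_ms": result["duration_ms"],
        })
    return manifest
```

Keeping per-segment files plus a manifest matches the workflow's advice to preserve individual clips for editor import and retiming.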

Local Assembly Pattern

from pydub import AudioSegment


def create_synced_narration(audio_segments, video_duration_ms):
    # Overlay each clip onto a silent master track at its script timestamp.
    narration_track = AudioSegment.silent(duration=video_duration_ms)
    for segment in audio_segments:
        clip = AudioSegment.from_file(segment["file"])
        narration_track = narration_track.overlay(clip, position=segment["timestamp"])
    return narration_track
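Before overlaying, a quick timing check catches segments that would collide or spill past the end of the video. This sketch assumes each segment dict carries the `timestamp` and `duration_ms` fields captured during generation:

```python
def check_timing(segments, video_duration_ms):
    """Return a list of human-readable timing problems (empty if none)."""
    problems = []
    ordered = sorted(segments, key=lambda s: s["timestamp"])
    for i, seg in enumerate(ordered):
        end = seg["timestamp"] + seg["duration_ms"]
        if end > video_duration_ms:
            problems.append(f"segment {i} ends at {end} ms, past the video")
        if i + 1 < len(ordered) and end > ordered[i + 1]["timestamp"]:
            problems.append(f"segment {i} overlaps segment {i + 1}")
    return problems
```

Running this before assembly makes it easier to adjust segment boundaries, as the workflow recommends, instead of discovering overlaps in the mixed track.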

Style Presets

  • Documentary: slower speed such as 0.95, neutral pitch
  • Tutorial: speed near 1.0, slightly warmer pitch
  • Commercial: modestly faster speed, slightly higher pitch

Prefer conservative tuning and script editing over extreme voice parameter changes.
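Keeping the presets in a small table makes alternate takes reproducible. The exact numbers below are illustrative interpretations of the presets above, not official defaults:

```python
STYLE_PRESETS = {
    "documentary": {"speed": 0.95, "pitch": 0},
    "tutorial": {"speed": 1.0, "pitch": 1},
    "commercial": {"speed": 1.1, "pitch": 2},
}


def voice_setting_for(style, voice_id):
    """Merge a style preset into a complete voice_setting payload."""
    preset = STYLE_PRESETS.get(style, {"speed": 1.0, "pitch": 0})
    return {"voice_id": voice_id, **preset}
```

Unknown styles fall back to neutral values, which keeps conservative tuning the default.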

Output Options

  • Per-segment narration clips in mp3 or wav
  • Timing metadata in json
  • Merged narration track for video editors
  • Optional alternate takes with different styles
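The timing-metadata option can be a plain JSON document. This sketch serializes the per-segment fields named in the workflow; the `start_ms` key is an assumption chosen for editor readability:

```python
import json


def timing_metadata_json(segments):
    """Serialize segment timing info to a JSON string for editing tools."""
    return json.dumps(
        [
            {
                "file": seg["file"],
                "start_ms": seg["timestamp"],
                "duration_ms": seg["duration_ms"],
            }
            for seg in segments
        ],
        indent=2,
    )
```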

Safety Notes

  • Do not hardcode credentials.
  • Do not assume local media tooling exists beyond what is declared here.
  • Treat returned trace_id and generated narration assets as potentially sensitive production data.
