Video Narrator

v1.0.1

Generate SenseAudio TTS narration tracks for videos, including timestamped segments, style variants, and editor-ready voiceover exports. Use when users need...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for scikkk/video-narrator.

Prompt Preview: Install & Setup
Install the skill "Video Narrator" (scikkk/video-narrator) from ClawHub.
Skill page: https://clawhub.ai/scikkk/video-narrator
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: SENSEAUDIO_API_KEY
Required binaries: python3, ffmpeg
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install video-narrator

ClawHub CLI


npx clawhub@latest install video-narrator
Security Scan

VirusTotal: Benign (report available)
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (video narration, timestamped segments, editor exports) align with the requested items: SENSEAUDIO_API_KEY, python3, ffmpeg, requests, and pydub, all of which are reasonable for producing and assembling TTS audio for video.
Instruction Scope
SKILL.md instructions are scoped to preparing timed scripts, calling the SenseAudio TTS API, decoding returned audio, and optionally assembling clips locally. There are no instructions to read unrelated system files, exfiltrate extra data, or post data to endpoints outside senseaudio.cn.
Install Mechanism
Declared installs are two Python packages (requests, pydub) — typical and proportionate. The installer kind is 'uv' in metadata (unusual label in this manifest) but the packages themselves are standard PyPI libraries; no arbitrary URL downloads or archive extraction are used.
Credentials
Only a single credential is required (SENSEAUDIO_API_KEY) and it is clearly tied to the service the skill integrates with. The SKILL.md explicitly instructs to send the key only in the Authorization header and warns against logging or embedding it.
Persistence & Privilege
Skill is not always-enabled, does not request permanent system presence, and does not instruct modifications to other skills or global agent settings.
Assessment
This skill appears coherent for generating voiceover tracks, but before installing:

  1. Verify the origin and trustworthiness of the SenseAudio service (https://senseaudio.cn) and obtain an API key with least privilege.
  2. Confirm your environment's installer mapping for 'uv', and ensure it will install requests and pydub from official PyPI rather than fetching code from an untrusted host.
  3. Keep the API key out of logs and examples, as the skill recommends.
  4. Because pydub relies on ffmpeg, ensure your ffmpeg binary is the expected trusted system package.
  5. If you need stronger assurance, review any runtime code the skill will actually execute (no code files are bundled here) or run it first in an isolated test environment.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python3, ffmpeg
Env: SENSEAUDIO_API_KEY
Primary env: SENSEAUDIO_API_KEY

Install

uv tool install requests
uv tool install pydub
latest: vk9781at23j0ykt66fh5tknba7182xvje
403 downloads
1 star
2 versions
Updated 1mo ago
v1.0.1
MIT-0

SenseAudio Video Narrator

Create professional narration audio for videos with timing-aware segmentation, natural delivery, and editor-friendly exports.

What This Skill Does

  • Generate narration audio synchronized to script timestamps
  • Match narration style to video genre such as documentary or tutorial
  • Control pacing with official TTS parameters and text break markers
  • Create multiple narration takes with different voices or styles
  • Export audio segments and merged narration tracks for editing workflows

Credential and Dependency Rules

  • Read the API key from SENSEAUDIO_API_KEY.
  • Send auth only as Authorization: Bearer <API_KEY>.
  • Do not place API keys in query parameters, logs, or saved examples.
  • If Python helpers are used, this skill expects python3, requests, and pydub.
  • pydub is used only for optional local audio assembly and mixing.
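A fail-fast check keeps the key out of code while surfacing a clear error when the variable is missing. This is a sketch; the helper name and the injectable `env` mapping are illustrative conveniences, not part of the skill:

```python
import os


def require_api_key(env=os.environ):
    """Return the SenseAudio key from the environment, or fail loudly."""
    key = env.get("SENSEAUDIO_API_KEY")
    if not key:
        raise RuntimeError(
            "SENSEAUDIO_API_KEY is not set; export it before running this skill."
        )
    return key
```

Passing the environment as a parameter makes the check easy to exercise in tests without touching the real process environment.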

Official TTS Constraints

Use the official SenseAudio TTS rules summarized below:

  • HTTP endpoint: POST https://api.senseaudio.cn/v1/t2a_v2
  • Model: SenseAudio-TTS-1.0
  • Max text length per request: 10000 characters
  • voice_setting.voice_id is required
  • voice_setting.speed range: 0.5-2.0
  • voice_setting.pitch range: -12 to 12
  • Optional audio formats: mp3, wav, pcm, flac
  • Optional sample rates: 8000, 16000, 22050, 24000, 32000, 44100
  • Optional MP3 bitrates: 32000, 64000, 128000, 256000
  • Optional channels: 1 or 2
  • extra_info.audio_length returns segment duration in milliseconds
  • Inline break markup such as <break time=500> is supported in text
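The documented ranges can be enforced before a request is sent. The validator below is a sketch built only from the constraints listed above; the function name is an assumption:

```python
def validate_voice_setting(voice_id, speed=1.0, pitch=0):
    """Check a voice_setting against the documented SenseAudio TTS ranges."""
    if not voice_id:
        raise ValueError("voice_setting.voice_id is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed {speed} is outside the 0.5-2.0 range")
    if not -12 <= pitch <= 12:
        raise ValueError(f"pitch {pitch} is outside the -12 to 12 range")
    return {"voice_id": voice_id, "speed": speed, "pitch": pitch}
```

Rejecting out-of-range values locally gives a clearer error than a failed API round trip.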

Recommended Workflow

  1. Prepare the script:
  • Split narration into timestamped segments.
  • Keep each segment comfortably below the 10000 character limit.
  2. Choose a voice and pacing profile:
  • Pick a voice_id and tune speed, pitch, and optional vol.
  • Use shorter segments when timing precision matters.
  3. Generate audio segments:
  • Call the TTS API for each segment.
  • Decode data.audio from hex before saving.
  • Capture extra_info.audio_length for timeline metadata.
  4. Assemble the narration track locally:
  • Use pydub to position clips on a silent master track.
  • Keep per-segment files for easier editor import and retiming.
  5. Validate timing against the video:
  • Leave small gaps when natural pacing is needed.
  • Adjust segment boundaries instead of overusing extreme speed values.
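For the pacing advice in step 5, the inline break markup from the TTS constraints can add pauses without resorting to extreme speed values. This sketch inserts a fixed pause after sentence-ending punctuation; the regex and the 500 ms default are assumptions:

```python
import re


def add_sentence_breaks(text, pause_ms=500):
    """Insert <break time=...> markers after sentence-ending punctuation."""
    marker = f"<break time={pause_ms}>"
    return re.sub(r"([.!?])(\s+)", rf"\1{marker}\2", text)
```

Because the marker is plain text, the segment still counts toward the 10000-character limit, so keep break-heavy segments short.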

Minimal Timed Narration Helper

import binascii
import os
import re

import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def parse_timed_script(script):
    # Match "[HH:MM:SS] text" blocks; each segment runs until the next timestamp.
    pattern = r"\[(\d{2}):(\d{2}):(\d{2})\]\s*(.+?)(?=\n\[|\Z)"
    segments = []
    for match in re.finditer(pattern, script, re.DOTALL):
        hours, minutes, seconds, text = match.groups()
        timestamp_ms = (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000
        segments.append({"timestamp": timestamp_ms, "text": text.strip()})
    return segments


def synthesize_segment(text, voice_id, speed=1.0, pitch=0, vol=1.0):
    # One synchronous TTS request per segment; the response carries hex-encoded audio.
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.0",
            "text": text,
            "stream": False,
            "voice_setting": {
                "voice_id": voice_id,
                "speed": speed,
                "pitch": pitch,
                "vol": vol,
            },
            "audio_setting": {
                "format": "mp3",
                "sample_rate": 32000,
                "bitrate": 128000,
                "channel": 2,
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    return {
        "audio_bytes": binascii.unhexlify(data["data"]["audio"]),
        "duration_ms": data["extra_info"]["audio_length"],
        "trace_id": data.get("trace_id"),
    }
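The helpers above can be tied together with a driver loop like the following. Here `synth` stands in for `synthesize_segment` (injected as a callable so the sketch stays testable without network access), and the `segment_NNN.mp3` filename pattern is an assumption:

```python
from pathlib import Path


def generate_segments(segments, synth, out_dir):
    """Synthesize each timed segment, save it, and collect timing metadata."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for i, seg in enumerate(segments):
        result = synth(seg["text"])
        path = out / f"segment_{i:03d}.mp3"
        path.write_bytes(result["audio_bytes"])
        manifest.append({
            "file": str(path),
            "timestamp": seg["timestamp"],
            "duration_ms": result["duration_ms"],
        })
    return manifest
```

Keeping per-segment files plus a manifest matches the workflow's advice to preserve individual clips for editor import and retiming.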

Local Assembly Pattern

from pydub import AudioSegment


def create_synced_narration(audio_segments, video_duration_ms):
    # Overlay each clip onto a silent master track at its script timestamp.
    narration_track = AudioSegment.silent(duration=video_duration_ms)
    for segment in audio_segments:
        clip = AudioSegment.from_file(segment["file"])
        narration_track = narration_track.overlay(clip, position=segment["timestamp"])
    return narration_track
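Before overlaying, a quick timing check catches segments that would collide or spill past the end of the video. This sketch assumes each segment dict carries the `timestamp` and `duration_ms` fields captured during generation:

```python
def check_timing(segments, video_duration_ms):
    """Return a list of human-readable timing problems (empty if none)."""
    problems = []
    ordered = sorted(segments, key=lambda s: s["timestamp"])
    for i, seg in enumerate(ordered):
        end = seg["timestamp"] + seg["duration_ms"]
        if end > video_duration_ms:
            problems.append(f"segment {i} ends at {end} ms, past the video")
        if i + 1 < len(ordered) and end > ordered[i + 1]["timestamp"]:
            problems.append(f"segment {i} overlaps segment {i + 1}")
    return problems
```

Running this before assembly makes it easier to adjust segment boundaries, as the workflow recommends, instead of discovering overlaps in the mixed track.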

Style Presets

  • Documentary: slower speed such as 0.95, neutral pitch
  • Tutorial: speed near 1.0, slightly warmer pitch
  • Commercial: modestly faster speed, slightly higher pitch

Prefer conservative tuning and script editing over extreme voice parameter changes.
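Keeping the presets in a small table makes alternate takes reproducible. The exact numbers below are illustrative interpretations of the presets above, not official defaults:

```python
STYLE_PRESETS = {
    "documentary": {"speed": 0.95, "pitch": 0},
    "tutorial": {"speed": 1.0, "pitch": 1},
    "commercial": {"speed": 1.1, "pitch": 2},
}


def voice_setting_for(style, voice_id):
    """Merge a style preset into a complete voice_setting payload."""
    preset = STYLE_PRESETS.get(style, {"speed": 1.0, "pitch": 0})
    return {"voice_id": voice_id, **preset}
```

Unknown styles fall back to neutral values, which keeps conservative tuning the default.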

Output Options

  • Per-segment narration clips in mp3 or wav
  • Timing metadata in json
  • Merged narration track for video editors
  • Optional alternate takes with different styles
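The timing-metadata option can be a plain JSON document. This sketch serializes the per-segment fields named in the workflow; the `start_ms` key is an assumption chosen for editor readability:

```python
import json


def timing_metadata_json(segments):
    """Serialize segment timing info to a JSON string for editing tools."""
    return json.dumps(
        [
            {
                "file": seg["file"],
                "start_ms": seg["timestamp"],
                "duration_ms": seg["duration_ms"],
            }
            for seg in segments
        ],
        indent=2,
    )
```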

Safety Notes

  • Do not hardcode credentials.
  • Do not assume local media tooling exists beyond what is declared here.
  • Treat returned trace_id and generated narration assets as potentially sensitive production data.
