tts

v1.0.3

Use this skill whenever the user wants to convert text into speech, generate audio from text, or produce voiceovers. Triggers include: any mention of 'TTS',...

by kusuriuri (@ksuriuri)
Security Scan
VirusTotal
Benign
OpenClaw
Benign (high confidence)
Purpose & Capability
The name and description (text-to-speech, voice cloning, timeline rendering) match the included scripts and capabilities (Kokoro local backend, Noiz cloud backend, timeline rendering, SRT handling, voice maps). Requesting NOIZ_API_KEY as the primary credential is appropriate for the Noiz cloud features.
Instruction Scope
The runtime instructions and scripts call out network and filesystem operations (downloading reference audio, posting text/audio to Noiz endpoints, reading/writing SRT/audio files, invoking ffmpeg/kokoro-tts). This is expected for a TTS skill, but the code will download any reference_audio URL you provide and will read ~/.noiz_api_key (legacy) for migration; review any URLs and local paths you pass to the tool.
Install Mechanism
There is no install spec (instruction-only), which limits automatic installation risks. However, the scripts depend on external programs/libraries (ffmpeg/ffprobe, kokoro-tts CLI for local backend, and the Python 'requests' package) that are not declared as required binaries in the registry metadata. The code also writes files (temp audio, final outputs).
Credentials
The only cloud credential requested is NOIZ_API_KEY (declared as primaryEnv), which aligns with the cloud backend features. The registry listing shows 'required env vars: none' but metadata/SKILL.md do identify NOIZ_API_KEY as primary credential — this minor mismatch is informational rather than malicious.
Persistence & Privilege
The skill writes the API key to ~/.config/noiz/api_key (0600) and will copy a legacy ~/.noiz_api_key into that location if present (non-destructive copy). It also creates temporary files and output audio under user-specified paths. It does not request always:true or modify other skills; persistence is limited to its own config file.
Assessment
This skill appears to do what it says — convert text to speech using either a local Kokoro CLI or the Noiz cloud API. Before installing, consider the following:

  • Credential handling: If you configure a NOIZ_API_KEY, it will be normalized and saved to ~/.config/noiz/api_key (permissions forced to 0600). If you have an old ~/.noiz_api_key, it will be copied to the new path (not deleted); if you do not trust that legacy file, remove it first.
  • Network activity: The skill will call Noiz endpoints (default base https://noiz.ai/v1) and may download reference audio from URLs you supply (or its default sample URLs hosted on storage.googleapis.com / noiz.ai). Only provide reference_audio URLs you trust.
  • Local dependencies: The timeline/rendering features use ffmpeg/ffprobe and (for the local backend) kokoro-tts; the Python 'requests' package is required for Noiz backends. The registry metadata does not list these required binaries — ensure they are available, or use guest mode / Kokoro as appropriate.
  • Data flow: Uploaded reference audio and generated audio are sent to Noiz when using the cloud backend (authenticated with your API key). To avoid sending data to the cloud, use the Kokoro backend or do not provide an API key.
  • Review defaults: The script points to default reference audio files hosted externally; if you are concerned about unexpected downloads, replace or remove those defaults.

If any of the above is unacceptable, do not install; otherwise the skill is coherent with its stated purpose. For higher assurance, review the included Python files yourself and test guest mode first (no API key required) to validate behavior.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Primary env: NOIZ_API_KEY
Latest: vk97d19fbgkr0m57k2te4f4nn7s835k8m
534 downloads · 1 star · 4 versions
Updated 1mo ago
v1.0.3
MIT-0

tts

Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.

Triggers

  • text to speech / tts / speak / say
  • voice clone / dubbing
  • epub to audio / srt to audio / convert to audio
  • 语音 / 说 / 讲 / 说话 (Chinese: voice / say / speak / talk)

Simple Mode — text to audio

speak is the default — the subcommand can be omitted:

# Basic usage (speak is implicit)
python3 skills/tts/scripts/tts.py -t "Hello world"          # add -o path to save
python3 skills/tts/scripts/tts.py -f article.txt -o out.mp3

# Voice cloning — local file path or URL
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio ./ref.wav
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio https://example.com/my_voice.wav -o clone.wav

# Voice message format
python3 skills/tts/scripts/tts.py -t "Hello" --format opus -o voice.opus
python3 skills/tts/scripts/tts.py -t "Hello" --format ogg -o voice.ogg

Third-party integration (Feishu/Telegram/Discord) is documented in ref_3rd_party.md.

Timeline Mode — SRT to time-aligned audio

For precise per-segment timing (dubbing, subtitles, video narration).

Step 1: Get or create an SRT

If the user doesn't have one, generate from text:

python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt --cps 15 --gap 500

--cps = characters per second (default 4, good for Chinese; ~15 for English). The agent can also write SRT manually.
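As a rough illustration of the timing math behind to-srt (a sketch, not the actual tts.py implementation), each cue's duration is estimated as character count divided by --cps, with a fixed --gap in milliseconds between cues:

```python
def fmt_ts(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def text_to_srt(sentences, cps=4, gap=500):
    """Estimate cue timings: duration = chars / cps, plus a gap between cues."""
    out, t = [], 0
    for i, sent in enumerate(sentences, 1):
        dur = max(500, int(len(sent) / cps * 1000))  # floor at half a second
        out.append(f"{i}\n{fmt_ts(t)} --> {fmt_ts(t + dur)}\n{sent}\n")
        t += dur + gap
    return "\n".join(out)

print(text_to_srt(["你好世界", "Hello world"], cps=4))
```

This also shows why the default cps of 4 suits Chinese: four characters of 你好世界 yield a one-second cue, while English text at the same setting would run far too slowly.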

Step 2: Create a voice map

A JSON file controlling the default and per-segment voice settings. Keys under segments may be a single index ("3") or an inclusive range ("5-8").

Kokoro voice map:

{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}

Noiz voice map (adds emo, reference_audio support). reference_audio can be a local path or a URL (user’s own audio; Noiz only):

{
  "default": { "voice_id": "voice_123", "target_lang": "zh" },
  "segments": {
    "1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
    "2-4": { "reference_audio": "./refs/guest.wav" }
  }
}
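One plausible way segment keys could be resolved (a sketch; the actual tts.py logic may differ) is to merge the default settings with any matching single-index or range override:

```python
def resolve_segment(index, voice_map):
    """Merge the default settings with any per-segment override.
    Keys may be a single index ("3") or an inclusive range ("5-8")."""
    settings = dict(voice_map.get("default", {}))
    for key, override in voice_map.get("segments", {}).items():
        if "-" in key:
            lo, hi = map(int, key.split("-"))
            match = lo <= index <= hi
        else:
            match = index == int(key)
        if match:
            settings.update(override)
    return settings

vm = {
    "default": {"voice": "zf_xiaoni", "lang": "cmn"},
    "segments": {"1": {"voice": "zm_yunxi"},
                 "5-8": {"voice": "af_sarah", "lang": "en-us", "speed": 0.9}},
}
print(resolve_segment(6, vm))  # → {'voice': 'af_sarah', 'lang': 'en-us', 'speed': 0.9}
```

Note that an override only replaces the keys it names; segment 1 above keeps lang "cmn" while swapping the voice.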

Dynamic reference audio slicing: when translating or dubbing a video, each sentence can automatically use the original video's audio at the same timestamp as its reference audio. Pass --ref-audio-track instead of setting reference_audio in the map:

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --ref-audio-track original_video.mp4 -o output.wav
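Under the hood this implies cutting the original track at each cue's timestamps. A sketch of how such an ffmpeg slice command could be assembled (the helper name and exact flag choices are illustrative, not taken from tts.py):

```python
def ffmpeg_slice_cmd(source, start_s, end_s, out_wav):
    """Build an ffmpeg command that extracts [start, end) from the source
    track as mono audio, suitable as a per-segment reference clip."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",   # seek to the cue's start time (seconds)
        "-to", f"{end_s:.3f}",     # stop at the cue's end time
        "-i", source,
        "-vn", "-ac", "1",         # drop video, downmix to mono
        out_wav,
    ]

print(" ".join(ffmpeg_slice_cmd("original_video.mp4", 1.5, 4.25, "seg_001.wav")))
```

Each slice would then be passed to the Noiz backend as that segment's reference_audio.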

See examples/ for full samples.

Step 3: Render

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json -o output.wav
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav

When to Choose Which

Need                                      Recommended
Just read text aloud, no fuss             Kokoro (default)
EPUB/PDF audiobook with chapters          Kokoro (native support)
Voice blending ("v1:60,v2:40")            Kokoro
Voice cloning from reference audio        Noiz
Emotion control (emo param)               Noiz
Exact server-side duration per segment    Noiz

When the user needs emotion control + voice cloning + precise duration together, Noiz is the only backend that supports all three.

Guest Mode (no API key)

When no API key is configured, tts.py automatically falls back to guest mode — a limited Noiz endpoint that requires no authentication. Guest mode only supports --voice-id, --speed, and --format; voice cloning, emotion, duration, and timeline rendering are not available.

# Guest mode (auto-detected when no API key is set)
python3 skills/tts/scripts/tts.py -t "Hello" --voice-id 883b6b7c -o hello.wav

# Explicit backend override to use kokoro instead
python3 skills/tts/scripts/tts.py -t "Hello" --backend kokoro
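The fallback order described above (explicit --backend flag first, then a configured key, then guest mode) could be sketched as follows; detect_backend and the "noiz-guest" label are hypothetical names, not tts.py internals:

```python
import os
from pathlib import Path

def detect_backend(explicit=None):
    """Pick a backend: an explicit --backend flag wins; otherwise use
    authenticated Noiz when a key is found, else fall back to guest mode."""
    if explicit:
        return explicit
    key = os.environ.get("NOIZ_API_KEY")
    if not key:
        cfg = Path.home() / ".config" / "noiz" / "api_key"
        key = cfg.read_text().strip() if cfg.exists() else None
    return "noiz" if key else "noiz-guest"

print(detect_backend("kokoro"))  # → kokoro
```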

Available guest voices (15 built-in):

voice_id  name                                         lang  gender  tone
063a4491  販売員(なおみ) [Salesperson (Naomi)]          ja    F       Joy
4252b9c8  落ち着いた女性 [Calm woman]                   ja    F       Calm
578b4be2  熱血漢(たける) [Hot-blooded (Takeru)]         ja    M       Anger
a9249ce7  安らぎ(みなと) [Tranquility (Minato)]         ja    M       Calm
f00e45a1  旅人(かいと) [Traveler (Kaito)]               ja    M       Calm
b4775100  悦悦|社交分享 [Yueyue | social sharing]       zh    F       Joyful
77e15f2c  婉青|情绪抚慰 [Wanqing | emotional comfort]   zh    F       Calm
ac09aeb4  阿豪|磁性主持 [Ahao | magnetic host]          zh    M       Calm
87cb2405  建国|知识科普 [Jianguo | science explainer]   zh    M       Calm
3b9f1e27  小明|科技达人 [Xiaoming | tech enthusiast]    zh    M       Joyful
95814add  Science Narration                             en    M       Calm
883b6b7c  The Mentor (Alex)                             en    M       Joyful
a845c7de  The Naturalist (Silas)                        en    M       Calm
5a68d66b  The Healer (Serena)                           en    F       Calm
0e4ab6ec  The Mentor (Maya)                             en    F       Calm

Security & data disclosure

This skill performs the following file and network operations at runtime:

  • Credential storage: When you run config --set-api-key, the key is saved to ~/.config/noiz/api_key (permissions 0600). The NOIZ_API_KEY environment variable is also supported as an alternative.
  • Legacy key migration: If ~/.noiz_api_key exists and ~/.config/noiz/api_key does not, the key is copied (not deleted) to the new location. A message is printed; the old file is left untouched for you to remove manually.
  • Network calls (Noiz backend): Text and optional reference audio are uploaded to https://noiz.ai/v1/ for synthesis. No data is sent unless you invoke a Noiz command.
  • Reference audio download: When --ref-audio is a URL, the file is downloaded to a temp file, used for the API call, then deleted. If no voice-id or ref-audio is provided, a default reference audio is downloaded from storage.googleapis.com or noiz.ai.
  • Temp files: Temporary audio/text files may be created during synthesis and are cleaned up after use.
  • ffmpeg: Invoked only in timeline render mode to assemble the final audio.

No files outside the output path and ~/.config/noiz/ are modified. The Kokoro backend runs entirely offline with no network access.
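The non-destructive legacy-key migration described above could be sketched like this (migrate_legacy_key is an illustrative helper, not the skill's actual code):

```python
import shutil
from pathlib import Path

def migrate_legacy_key(home=None):
    """Copy ~/.noiz_api_key to ~/.config/noiz/api_key (0600) only when the
    new location does not exist yet; the legacy file is left in place."""
    home = Path(home) if home else Path.home()
    legacy = home / ".noiz_api_key"
    new = home / ".config" / "noiz" / "api_key"
    if legacy.exists() and not new.exists():
        new.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(legacy, new)   # non-destructive: original stays put
        new.chmod(0o600)
        return True
    return False
```

Because the check requires the new path to be absent, re-running the migration is a no-op, which matches the "copied, not deleted" behavior disclosed above.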

Requirements

  • ffmpeg in PATH (timeline mode only)
  • requests package: uv pip install requests (required for Noiz backend)
  • Get your API key at Noiz Developer, then run python3 skills/tts/scripts/tts.py config --set-api-key YOUR_KEY (guest mode works without a key but has limited features)
  • Kokoro: if already installed, pass --backend kokoro to use the local backend

Noiz API authentication

Send only the base64-encoded API key as the Authorization header value, with no prefix (e.g. no "APIKEY " or "Bearer "). Any prefix causes a 401.
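A minimal sketch of building that header (the /tts endpoint path in the comment is an assumption, not documented here):

```python
def noiz_headers(api_key: str) -> dict:
    """Build request headers for Noiz: the Authorization value is the bare
    base64-encoded key; adding 'Bearer ' or 'APIKEY ' yields a 401."""
    return {"Authorization": api_key.strip()}

# Hypothetical usage with the 'requests' package (endpoint path illustrative):
# resp = requests.post("https://noiz.ai/v1/tts", json={"text": "Hello"},
#                      headers=noiz_headers(key))
```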

For backend details and full argument reference, see reference.md.
