Byted Text To Speech
v1.0.0将文本合成为语音(TTS)。使用火山引擎豆包语音合成 API,支持流式合成、多种音色、语速/音调/音量调节、Markdown 过滤和 LaTeX 公式播报。当用户需要把文字转成语音、生成朗读音频、配音、旁白、播报,或提到「文字转语音」「TTS」「语音合成」「朗读」「配音」时使用本技能。
Security Scan
OpenClaw
Suspicious
high confidencePurpose & Capability
Name/description match a Volcengine TTS integration and the included text_to_speech.py implements that API. However, the skill metadata declared no required environment variables while the runtime and SKILL.md expect MODEL_SPEECH_API_KEY. The code also contains logic to use ARK_SKILL_API_KEY and ARK_SKILL_API_BASE to list/create API keys if MODEL_SPEECH_API_KEY is missing — those additional credentials are not mentioned in the description or metadata and are not obviously required for a simple TTS client.
Instruction Scope
SKILL.md instructs the agent to check MODEL_SPEECH_API_KEY and references local docs; it does not tell the user to provide ARK_SKILL_API_KEY or ARK_SKILL_API_BASE. The actual runtime code (scripts/api_key.py) will, if MODEL_SPEECH_API_KEY is absent, attempt network calls to ARK_SKILL_API_BASE using ARK_SKILL_API_KEY to list or create API keys and then persist the obtained key. That network-and-credential escalation is outside the SKILL.md's explicit instructions and expands scope beyond simply calling the TTS API.
Install Mechanism
No installer that downloads arbitrary code is present. This is an instruction-only skill with bundled Python scripts and a requirements.txt (httpx). No external archives, URL shorteners, or nonstandard install actions are used.
Credentials
The skill relies on MODEL_SPEECH_API_KEY (documented in SKILL.md) but the package metadata did not declare it as a required env var. More importantly, api_key.py will also accept ARK_SKILL_API_KEY and ARK_SKILL_API_BASE to retrieve/create MODEL_SPEECH_API_KEY automatically — these are additional credentials of a different service surface that the skill does not justify in its description. The code will write any retrieved key into a .env file and set os.environ; that persistence and cross-service credential use is disproportionate for a TTS wrapper unless explicitly requested and authorized by the user.
Persistence & Privilege
The skill writes the discovered/created MODEL_SPEECH_API_KEY into a .env file in the scripts directory (and attempts chmod 0600), and also sets it in os.environ. While the skill is not marked 'always: true' and it does not modify other skills, automatic creation/retrieval and storage of credentials in the workspace is a persistent side-effect that elevates its privilege and can persist secrets on disk without explicit user action.
What to consider before installing
This skill otherwise appears to implement Volcengine's TTS API correctly, but there are important mismatches to be aware of: 1) metadata did not declare any required environment variables while the code needs MODEL_SPEECH_API_KEY — you should supply this yourself rather than relying on automatic behavior; 2) the bundled scripts will, if MODEL_SPEECH_API_KEY is missing, attempt to call an ARK API endpoint using ARK_SKILL_API_KEY and ARK_SKILL_API_BASE (environment variables the SKILL.md and metadata don't mention) to list/create an API key and then persist that key into a local .env file. Before installing, consider: - Only provide MODEL_SPEECH_API_KEY; avoid setting ARK_SKILL_API_KEY and ARK_SKILL_API_BASE unless you understand and trust the endpoint they point to. - Review scripts/api_key.py (the key-fetching logic) and remove or disable the auto-create/list behavior if you don't want the skill to call other management APIs or persist secrets. - Be aware the skill will write a .env file in its directory containing the API key if one is fetched; if you don't want secrets on disk, run the tool in an environment where writing is disallowed or modify the script to skip persistence. - If you need stronger assurance, ask the publisher for clarification why ARK credentials are necessary and for the skill metadata to be updated to declare required env vars.Like a lobster shell, security has layers — review code before you run it.
latest
Byted-Text-to-Speech Skill
基于火山引擎豆包语音合成(HTTP Chunked/SSE 单向流式-V3)将文本转为语音并保存为音频文件。
何时使用
当用户有以下需求时,优先使用本 skill:
- 需要把一段文字转成语音、朗读音频
- 需要生成配音、旁白、播报、有声读物片段
- 需要将代码注释、文档、文章等内容转为音频便于收听
- 需要生成多语言语音(中文、英文等)
- 用户提到「文字转语音」「TTS」「语音合成」「朗读」「配音」「念出来」「读给我听」
- 用户没有明确说"语音合成",但任务本质上需要将文本内容转为可播放的音频时
使用前检查
优先检查是否已配置以下凭证:
MODEL_SPEECH_API_KEY
如果缺少凭证,打开 references/setup-guide.md 查看开通、申请和配置方式,并给予用户开通建议
脚本参数
| 参数 | 简写 | 必填 | 说明 |
|---|---|---|---|
--text | -t | 是 | 要合成的文本内容 |
--output | -o | 否 | 输出音频文件路径(默认自动生成) |
--speaker | -s | 否 | 发音人,默认 zh_female_vv_uranus_bigtts,音色列表 |
--format | 否 | 音频格式:mp3(默认)、pcm、ogg_opus | |
--sample-rate | 否 | 采样率,如 16000、24000(默认 24000) | |
--speech-rate | 否 | 语速 [-50, 100],100 代表 2.0 倍速,-50 代表 0.5 倍速,默认 0 | |
--pitch-rate | 否 | 音调 [-12, 12],默认 0 | |
--loudness-rate | 否 | 音量 [-50, 100],100 代表 2.0 倍音量,-50 代表 0.5 倍音量,默认 0 | |
--bit-rate | 否 | 比特率,对 mp3 和 ogg_opus 格式生效(如 64000、128000),默认 64000 | |
--filter-markdown | 否 | 过滤 markdown 语法(如 **你好** 读为"你好"),默认关闭 | |
--enable-latex | 否 | 启用 LaTeX 公式播报(使用 latex_parser v2,自动开启 markdown 过滤),默认关闭 |
返回值说明
脚本输出 JSON,包含:
status:"success"或"error"local_path: 本地音频文件路径format: 音频格式error: 失败时的错误信息
请将 local_path 或可访问的音频 URL 返回给用户,便于播放或下载。
错误处理
- 若报错
PermissionError: MODEL_SPEECH_API_KEY ... 需在环境变量中配置:提示用户在 API Key 管理 获取并配置MODEL_SPEECH_API_KEY,写入 workspace 下的环境变量文件后重试。 - 若返回 4xx/5xx 或业务错误码:根据错误信息提示用户检查文本内容、发音人 ID 及账号是否已开通豆包语音服务。
故障排查
- 缺少凭证:打开
references/setup-guide.md - 需要查 API 参数、字段、错误码:打开
references/docs-index.md - 如果脚本返回权限错误,优先检查服务是否已开通、凭证是否有效,给予用户明确的操作指引
参考资料
按需打开以下文件,不必默认全部加载:
references/setup-guide.md:服务开通、凭证申请、环境变量配置references/docs-index.md:API 文档索引、参数说明、音色列表、错误码速查
示例
# 基本用法
python scripts/text_to_speech.py -t "欢迎使用火山引擎语音合成服务。"
# 指定发音人与输出格式
python scripts/text_to_speech.py -t "这是一段测试语音。" -s zh_female_vv_uranus_bigtts -o output.mp3 --format mp3
# 指定语速与采样率
python scripts/text_to_speech.py -t "语速和音调可调。" --speech-rate 10 --sample-rate 16000
Comments
Loading comments...
