Azure Speech TTS
Azure Speech TTS skill for generating local audio files from text or SSML with Azure Speech. Use when the user asks to use Azure Speech / Azure TTS / Microso...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious
high confidence
Purpose & Capability
The name/description, README, SKILL.md, and the included script all consistently implement Azure Speech Text-to-Speech (text/SSML → local audio file). Network calls target Microsoft Cognitive Services endpoints and the options (voice, format, SSML, output file) match the stated purpose.
Instruction Scope
Runtime instructions and the script operate within the stated scope: they read text/SSML input, optionally read files/stdin, construct SSML, call Azure token and synthesis endpoints, and write local audio and optional SSML files. There are no hidden external endpoints, token exfiltration code, or instructions to read unrelated system files in the provided code and documentation.
Install Mechanism
No install spec is present (instruction-only install). A Python script is included and uses only the standard library (urllib, pathlib, etc.), so there is no package download or archive extraction risk in the manifest.
Credentials
SKILL.md and the script require AZURE_SPEECH_KEY and AZURE_SPEECH_REGION (and optionally AZURE_SPEECH_VOICE / AZURE_SPEECH_FORMAT), but the registry metadata lists no required environment variables and no primary credential. Omitting the required Azure credentials from the declared metadata is a significant inconsistency: the skill needs subscription credentials to function, and that should be declared explicitly.
Persistence & Privilege
The skill is not always-enabled and does not request elevated or persistent platform privileges. It writes output under a local download/ directory only and does not modify other skills or system-wide configuration.
What to consider before installing
This skill appears to be a legitimate Azure Speech TTS helper, but the package metadata fails to declare the required Azure credentials (AZURE_SPEECH_KEY and AZURE_SPEECH_REGION). Before installing or running it: (1) confirm the source/publisher and whether the missing credential declaration is intentional; (2) never paste secrets into config.json; use environment variables as documented; (3) run with --dry-run first to inspect the generated SSML; (4) provide only a limited, rotatable Azure Speech key and a region you control (rotate the key afterwards if you supply an existing secret); (5) if you need stricter assurance, run the script in an isolated environment (container) and inspect the full script: it only uses Microsoft Cognitive Services endpoints, but you should verify that no other network endpoints are present. If you plan to publish or reuse this skill, ask the publisher to update the registry metadata to declare the required env vars / primary credential so the credential requirements are visible up front.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.1
SKILL.md
Azure Speech TTS
Use Azure Speech to turn text or SSML into a local audio file under download/.
What this skill does
- Synthesize plain text into speech
- Synthesize full SSML payloads directly
- Choose voice, output format, rate, pitch, style, and role
- Save the result as a local audio file and print a JSON summary
Configuration
This skill uses a small default config file plus environment variables.
Default config file
File:
config.json
Default values:
- default_voice: zh-CN-Yunqi:DragonHDOmniLatestNeural
- default_format: mp3
- default_output_dir: download
- default_timeout_seconds: 60
Secret values
Set these in the local shell environment:
- AZURE_SPEECH_KEY
- AZURE_SPEECH_REGION
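A caller can check that both secrets are present before invoking the script. The following is a minimal sketch; the helper name `missing_secrets` is hypothetical and is not taken from scripts/azure_tts.py.

```python
import os

# The two secrets the skill expects in the shell environment.
REQUIRED_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")


def missing_secrets(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Set these before running the skill:", ", ".join(missing))
```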
Optional environment overrides
- AZURE_SPEECH_VOICE
- AZURE_SPEECH_FORMAT
Precedence
Use this order:
- CLI flag
- Environment variable
- config.json
- Built-in fallback
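The precedence order above can be sketched as a first-non-empty lookup. This is an illustration only: the function name `resolve` and the assumption that the built-in fallback equals the documented defaults are not confirmed by the script itself.

```python
import os

# Assumed built-in fallbacks, mirroring the documented default values.
BUILTIN_FALLBACK = {"voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural", "format": "mp3"}
# Environment overrides named in this document.
ENV_NAMES = {"voice": "AZURE_SPEECH_VOICE", "format": "AZURE_SPEECH_FORMAT"}


def resolve(name, cli_value=None, config=None, env=None):
    """Return the first non-empty value: CLI flag, env var, config.json, fallback."""
    env = os.environ if env is None else env
    config = config or {}
    candidates = (
        cli_value,                       # 1. CLI flag
        env.get(ENV_NAMES[name]),        # 2. environment variable
        config.get(f"default_{name}"),   # 3. config.json entry
        BUILTIN_FALLBACK[name],          # 4. built-in fallback
    )
    for value in candidates:
        if value:
            return value
    return None
```

For example, `resolve("format", cli_value="wav", env={"AZURE_SPEECH_FORMAT": "ogg"})` picks the CLI value, and dropping `cli_value` lets the environment variable win.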
Quick start
python3 scripts/azure_tts.py \
--text "你好,这是一段测试语音。" \
--voice zh-CN-Yunqi:DragonHDOmniLatestNeural \
--format mp3 \
--output download/test.mp3
For SSML:
python3 scripts/azure_tts.py \
--ssml-file temp/input.ssml \
--format wav \
--output download/test.wav
Workflow
- Decide whether the input is plain text or full SSML.
- Use --text / --text-file for normal narration.
- Use --ssml / --ssml-file only when the payload already contains a complete <speak> document.
- Pick the voice and output format, or let config.json supply the defaults.
- Run scripts/azure_tts.py.
- Return the generated audio path to the user.
Rules
- Prefer plain text unless the user needs pauses, emphasis, multi-voice content, or expressive styling.
- --ssml input must include a full <speak> root element.
- Default voice is zh-CN-Yunqi:DragonHDOmniLatestNeural if nothing else is set.
- Default output folder is download/.
- If the user does not specify format, use the default MP3 output.
- Do not put secrets in config.json.
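The rule that SSML input needs a full <speak> root can be checked before choosing between --text and --ssml. The sketch below uses only the standard library; `has_speak_root` is a hypothetical helper, and the script's own validation may differ.

```python
import xml.etree.ElementTree as ET


def has_speak_root(payload):
    """Return True if payload is well-formed XML whose root element is <speak>."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Plain text or broken markup is not a full SSML document.
        return False
    # ElementTree prefixes namespaced tags, e.g. "{http://...synthesis}speak".
    return root.tag.rsplit("}", 1)[-1] == "speak"
```

A payload that fails this check is a candidate for plain-text mode rather than --ssml.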
Common formats
See references/azure-speech-cheatsheet.md for the format map and examples.
Short aliases supported by the script:
- mp3
- wav
- pcm
- ogg
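An alias map of this kind might look like the sketch below. Only the mp3 entry is taken from this document (it appears in the sample JSON output); the wav, pcm, and ogg entries are assumed Azure output format names, and the authoritative map is the one in references/azure-speech-cheatsheet.md.

```python
# Short alias -> Azure output format name.
FORMAT_ALIASES = {
    "mp3": "audio-24khz-48kbitrate-mono-mp3",  # from the sample output in this document
    "wav": "riff-24khz-16bit-mono-pcm",        # assumption
    "pcm": "raw-24khz-16bit-mono-pcm",         # assumption
    "ogg": "ogg-24khz-16bit-mono-opus",        # assumption
}


def expand_format(name):
    """Expand a short alias; pass any other value through unchanged."""
    return FORMAT_ALIASES.get(name.lower(), name)
```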
Useful options
- --voice: Azure voice name, for example en-US-AriaNeural
- --language: SSML xml:lang for plain-text mode
- --rate: speaking rate, for example +10%
- --pitch: pitch adjustment, for example +2st
- --style: expressive style such as cheerful, sad, chat
- --style-degree: strength of the expressive style
- --role: voice role when supported
- --save-ssml: write the generated SSML to a file for inspection
- --dry-run: print the generated SSML without calling Azure
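To show how these options could map onto SSML in plain-text mode, here is a rough sketch that wraps text in prosody and mstts:express-as elements. The function `build_ssml`, its default language, and the element nesting are assumptions for illustration; running the script with --dry-run shows the SSML it actually generates.

```python
from xml.sax.saxutils import escape, quoteattr


def _attrs(pairs):
    """Render (name, value) pairs as XML attributes, skipping unset values."""
    return "".join(f" {name}={quoteattr(str(value))}" for name, value in pairs if value)


def build_ssml(text, voice, language="en-US", rate=None, pitch=None,
               style=None, style_degree=None, role=None):
    """Hypothetical builder: wrap plain text in a complete <speak> document."""
    body = escape(text)  # escape &, <, > so the text is safe inside XML
    if rate or pitch:
        prosody = _attrs([("rate", rate), ("pitch", pitch)])
        body = f"<prosody{prosody}>{body}</prosody>"
    if style or role:
        express = _attrs([("style", style), ("styledegree", style_degree), ("role", role)])
        body = f"<mstts:express-as{express}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Hello there", "en-US-AriaNeural", rate="+10%", style="cheerful"))
```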
Output
The helper script writes the audio file and prints JSON like:
{
"ok": true,
"output_path": "download/test.mp3",
"format": "audio-24khz-48kbitrate-mono-mp3",
"voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural",
"language": "zh-CN",
"bytes": 123456
}
Use the printed output_path as the deliverable path.
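A caller could extract the deliverable path from the printed summary along these lines. This is a minimal sketch: the helper name is hypothetical, and treating a false "ok" value as failure is an assumption, since the document only shows the success shape.

```python
import json


def audio_path_from_summary(stdout_text):
    """Parse the script's JSON summary and return its output_path."""
    summary = json.loads(stdout_text)
    # Assumption: a failed run reports "ok": false.
    if not summary.get("ok"):
        raise RuntimeError(f"synthesis failed: {summary}")
    return summary["output_path"]


if __name__ == "__main__":
    sample = '{"ok": true, "output_path": "download/test.mp3", "bytes": 123456}'
    print(audio_path_from_summary(sample))
```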
