Voice (Edge TTS)
v1.10.0Convert text to speech using Microsoft Edge TTS with real-time streaming, customizable voice settings, and support for multiple languages including Chinese a...
Security Scan
OpenClaw
Suspicious
high confidencePurpose & Capability
The skill's files and docs align with a Microsoft Edge TTS streaming tool (requires edge-tts and ffmpeg). However, package.json/lock also list an npm 'edge-tts' dependency while the implementation calls the Python CLI (pip edge-tts). This mismatch is odd but plausibly a packaging oversight rather than malicious.
Instruction Scope
SKILL.md repeatedly asserts 'no shell execution' and a strict voice whitelist, but the code contradicts this: index.js builds and runs a concatenated command string via execAsync for general TTS and for installation, and does not apply the voice whitelist for the 'tts' action (whitelist only enforced for 'stream'). The 'play' action calls PowerShell -c with an interpolated file path string, which can be abused if a user-controlled filePath is provided. These inconsistencies increase the chance of command injection or unexpected execution.
Install Mechanism
There is no platform install spec, but the skill contains an installDependencies method that runs 'pip3 install edge-tts' at runtime (network fetch). package-lock.json shows an npm package resolved from a non-default mirror. Runtime installation and mixed packaging (Python CLI expected + npm dependency present) are moderate risk and should be reviewed.
Credentials
The skill requests no environment variables or credentials, which is proportional to a local TTS/playback tool. There are no declared secrets, though runtime pip/network access will occur if install is invoked.
Persistence & Privilege
The skill is not 'always: true' and does not request elevated platform persistence. It does create and clean a local temp directory under a relative path, which is expected behavior for temporary audio files.
Scan Findings in Context
[use_of_exec_with_concatenated_command] unexpected: index.js uses execAsync() with a command string built by joining arguments (edge-tts --text "..." ...). SKILL.md claims command injection protection and use of spawn instead — this is a direct contradiction and increases injection risk.
[unvalidated_user_input_in_cmd] unexpected: The textToSpeech path accepts user text and voice and inserts them into a shell command string without applying the documented voice whitelist (whitelist is only used for 'stream'). User-controlled text/voice could affect the constructed command.
[powershell_command_execution_with_interpolated_string] unexpected: playAudio uses spawn('powershell', ['-c', `(New-Object Media.SoundPlayer "${filePath}").PlaySync();`]) — passing an interpolated string to PowerShell -c can be dangerous if filePath is attacker-controlled (the 'play' action accepts a filePath parameter).
[hardcoded_ffmpeg_path_in_python_script] unexpected: stream_speak.py hardcodes FFMPEG_PATH to 'E:\tools\ffmpeg\bin\ffplay'. Hardcoded absolute paths reduce portability and may mask behavior on systems without that path; it's not necessary for legitimate cross-platform skill behavior.
[runtime_package_install_via_pip] expected: The skill runs 'pip3 install edge-tts' when installDependencies is invoked; network fetch of the edge-tts Python package is expected for this skill but it increases runtime risk and should be done deliberately in a controlled environment.
What to consider before installing
This skill appears to be a legitimate Edge TTS tool but the implementation contradicts its security claims. Before installing or enabling it: 1) Do not run it in a sensitive environment until code is audited. 2) Fix the command-execution issues: replace execAsync(string) with spawn/execFile and consistently apply the voice whitelist for all actions. 3) Sanitize and/or restrict inputs used in any command or PowerShell -c invocation (the 'play' action accepts an arbitrary filePath). 4) Remove or correct the hardcoded ffplay path in stream_speak.py and ensure ffmpeg/ffplay usage is documented and optional. 5) Prefer pre-installing Python deps (pip install edge-tts) in a controlled environment rather than allowing runtime pip installs, and verify the source of any npm/pip packages (the package-lock references a non-default mirror). If you are not comfortable reviewing or changing code, avoid installing this skill or run it in an isolated sandbox.Like a lobster shell, security has layers — review code before you run it.
chineselatestsecuritystreamingttsvoice
Voice Skill (Edge TTS)
Text-to-speech skill using Microsoft Edge TTS engine with real-time streaming playback support.
Features 功能特点
- Edge TTS Engine - High quality text-to-speech using Microsoft Edge
- Streaming Playback - Real-time audio streaming (边生成边播放)
- Multiple Voices - Support for Chinese, English, Japanese, Korean voices
- Customizable - Adjust rate, volume, and pitch
- Secure Implementation - No command injection vulnerabilities
Installation 安装
1. Install Python dependencies
pip install edge-tts
2. Install ffmpeg (required for streaming)
Windows:
Download from: https://github.com/GyanD/codexffmpeg/releases
Extract and add bin folder to PATH
macOS:
brew install ffmpeg
Linux:
sudo apt install ffmpeg
Usage 使用
Streaming Playback (Recommended) 流式播放(推荐)
Real-time audio generation and playback:
// Basic usage
await skill.execute({
action: 'stream',
text: '你好,我是小九'
});
// With custom voice
await skill.execute({
action: 'stream',
text: 'Hello, how are you?',
options: {
voice: 'en-US-Standard-A',
rate: '+10%',
volume: '+0%',
pitch: '+0Hz'
}
});
Text-to-Speech with File 生成语音文件
await skill.execute({
action: 'tts',
text: 'Hello, how are you today?',
options: {
voice: 'zh-CN-XiaoxiaoNeural'
}
});
// Returns: { success: true, media: 'MEDIA: /path/to/file.mp3' }
Direct Speak 直接播放
await skill.execute({
action: 'speak',
text: 'Hello!'
});
List Available Voices 查看可用语音
await skill.execute({
action: 'voices'
});
Available Voices 可用语音
| Language | Voice ID |
|---|---|
| Chinese (Female) | zh-CN-XiaoxiaoNeural |
| Chinese (Male) | zh-CN-YunxiNeural |
| Chinese (Male) | zh-CN-YunyangNeural |
| English (US Female) | en-US-Standard-A |
| English (US Male) | en-US-Standard-D |
| English (UK) | en-GB-Standard-A |
| Japanese | ja-JP-NanamiNeural |
| Korean | ko-KR-SunHiNeural |
Options 参数
| Option | Default | Description |
|---|---|---|
| voice | zh-CN-XiaoxiaoNeural | Voice ID |
| rate | +0% | Speech rate (-50% to +100%) |
| volume | +0% | Volume adjustment (-50% to +50%) |
| pitch | +0Hz | Pitch adjustment |
Security 安全
This skill implements enterprise-grade security best practices:
🛡️ Security Features
| Feature | Implementation |
|---|---|
| Input Validation | Voice parameter whitelist validation - only allowed voices can be used |
| No Shell Execution | Uses spawn() with array arguments instead of shell command concatenation |
| Command Injection Prevention | All user inputs are properly validated and escaped |
| Path Safety | Fixed script path prevents path traversal |
Security Details
// ❌ UNSAFE - Don't use exec with string concatenation
exec(`py script.py "${userText}" --voice ${userVoice}`);
// ✅ SAFE - Use spawn with array arguments
spawn('py', [scriptPath, text, '--voice', voice], { shell: false });
Voice Whitelist
Only these voices are allowed:
const allowedVoices = [
'zh-CN-XiaoxiaoNeural', 'zh-CN-YunxiNeural', 'zh-CN-YunyangNeural',
'zh-CN-YunyouNeural', 'zh-CN-XiaomoNeural',
'en-US-Standard-C', 'en-US-Standard-D', 'en-US-Wavenet-F',
'en-GB-Standard-A', 'en-GB-Wavenet-A',
'ja-JP-NanamiNeural', 'ko-KR-SunHiNeural'
];
Any invalid voice parameter will be rejected and replaced with the default voice.
Changelog 更新日志
v1.10 (2026-02-24)
- Enterprise-grade security - Full command injection protection
- Voice whitelist validation
- Replaced exec with spawn for secure process execution
- Input sanitization for all parameters
v1.1.0
- Add streaming playback support (边生成边播放)
- Add ffmpeg dependency
- Fix command injection vulnerability
- Add voice whitelist validation
v1.0.0
- Initial release with basic TTS support
Comments
Loading comments...
