Voice (Edge TTS)
Suspicious
Audited by ClawScan on May 10, 2026.
Overview
The skill’s text-to-speech purpose is clear, but its code contradicts its safety claims and can run shell commands built from user input.
Do not install or enable autonomous use of this skill until the shell execution path is fixed. If you still test it, use an isolated environment, avoid sensitive text, install dependencies manually, and do not pass custom output paths or untrusted text/options.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A crafted text or option value could cause commands to run on the user’s machine under the agent’s privileges.
The command string includes user-controlled text and options and is executed through a shell. Escaping only the double quotes in text does not stop command substitution, because the shell still expands $(...) and backticks inside double quotes, nor does it prevent injection through the other options.
const { voice = 'zh-CN-XiaoxiaoNeural', output = null, rate = '+0%', volume = '+0%', pitch = '+0Hz' } = options; ... const cmd = [...].join(' '); ... await execAsync(cmd);
Replace execAsync(cmd) with spawn or execFile using an argument array and shell: false; whitelist voice/rate/volume/pitch; and restrict output paths.
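The recommendation above could look roughly like the following. This is a minimal sketch, not the skill's actual code: `buildTtsArgs`, `synthesize`, and the three validation patterns are hypothetical names and rules, and the `edge-tts` flags other than `--write-media` are assumed from the edge-tts command-line tool rather than taken from index.js.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Hypothetical allowlist patterns (assumptions); tighten them to the
// voices and ranges the skill actually intends to support.
const VOICE_RE = /^[a-z]{2,3}-[A-Z]{2}-[A-Za-z]+Neural$/;
const PERCENT_RE = /^[+-]\d{1,3}%$/;
const PITCH_RE = /^[+-]\d{1,3}Hz$/;

// Validate every option and return an argument array. Nothing here is
// ever joined into a single shell command string.
function buildTtsArgs(text, outputFile, options = {}) {
  const {
    voice = 'zh-CN-XiaoxiaoNeural',
    rate = '+0%',
    volume = '+0%',
    pitch = '+0Hz',
  } = options;

  if (typeof text !== 'string' || text.length === 0) {
    throw new TypeError('text must be a non-empty string');
  }
  if (typeof outputFile !== 'string' || outputFile.length === 0) {
    throw new TypeError('outputFile must be a non-empty string');
  }
  if (!VOICE_RE.test(voice)) throw new RangeError(`invalid voice: ${voice}`);
  if (!PERCENT_RE.test(rate)) throw new RangeError(`invalid rate: ${rate}`);
  if (!PERCENT_RE.test(volume)) throw new RangeError(`invalid volume: ${volume}`);
  if (!PITCH_RE.test(pitch)) throw new RangeError(`invalid pitch: ${pitch}`);

  // The --flag=value form keeps a value that starts with "-" from being
  // read as a separate flag by the CLI's argument parser.
  return [
    `--voice=${voice}`,
    `--rate=${rate}`,
    `--volume=${volume}`,
    `--pitch=${pitch}`,
    `--write-media=${outputFile}`,
    `--text=${text}`,
  ];
}

// execFile passes the array straight to the process without a shell, so
// characters such as $(...), backticks, and ";" reach edge-tts as literal text.
async function synthesize(text, outputFile, options) {
  const args = buildTtsArgs(text, outputFile, options);
  await execFileAsync('edge-tts', args, { shell: false });
  return outputFile;
}
```

With this shape, a call such as `buildTtsArgs('hello $(id)', outputFile)` yields a single `--text=hello $(id)` argument that is never interpreted by a shell, and an option like `rate: '+0%; rm -rf ~'` is rejected before any process starts. The output path still needs its own restriction, which this sketch does not cover.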
Users may trust the skill as hardened against command injection when the included code still has an unsafe shell execution path.
These safety claims are contradicted by index.js, where the tts/speak path constructs a shell command from user-provided values and runs it with execAsync.
**No Shell Execution** | Uses `spawn()` with array arguments instead of shell command concatenation | ... **Command Injection Prevention** | All user inputs are properly validated and escaped |
Remove or correct the security claims until the implementation actually uses safe process invocation and complete input validation.
An agent influenced by untrusted input could write audio files outside the intended temp directory or try to play arbitrary local files.
Callers can supply arbitrary output paths and arbitrary file paths for playback, with no path restrictions or confirmation, which is broader than the documented user options.
const outputFileName = output || path.join(tempDir, `tts_${Date.now()}.mp3`); ... '--write-media', outputFileName ... case 'play': ... const playResult = await this.playAudio(params.filePath);
Keep generated files inside a dedicated skill-owned directory, reject absolute or parent-traversal paths, and require explicit user approval for custom local file paths.
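One way to apply the path restriction described above is to resolve every caller-supplied path against a single base directory and reject anything that escapes it. The sketch below is illustrative only: `resolveOutputPath` and `OUTPUT_DIR` are hypothetical names, and the directory location is an assumption rather than something the skill defines.

```javascript
const path = require('node:path');
const os = require('node:os');

// Hypothetical skill-owned directory (assumption); the skill would need
// to create it before writing files.
const OUTPUT_DIR = path.join(os.tmpdir(), 'edge-tts-skill');

// Return a path guaranteed to sit inside baseDir, or throw.
function resolveOutputPath(requested, baseDir = OUTPUT_DIR) {
  const base = path.resolve(baseDir);

  // No custom path: fall back to a generated file name, as the original does.
  if (requested === null || requested === undefined) {
    return path.join(base, `tts_${Date.now()}.mp3`);
  }
  if (typeof requested !== 'string' || requested.length === 0) {
    throw new TypeError('output path must be a non-empty string');
  }
  if (path.isAbsolute(requested)) {
    throw new RangeError('absolute output paths are not allowed');
  }

  const resolved = path.resolve(base, requested);
  const rel = path.relative(base, resolved);

  // rel is empty for the base itself, begins with ".." when the target is
  // outside the base, and can be absolute on Windows across drives.
  const escapes =
    rel === '' ||
    rel === '..' ||
    rel.startsWith('..' + path.sep) ||
    path.isAbsolute(rel);
  if (escapes) {
    throw new RangeError('output path must stay inside the skill directory');
  }
  return resolved;
}
```

The same check could guard the `play` action before `playAudio` receives `params.filePath`. It is a lexical check only, so it does not follow symbolic links, and it does not replace the explicit user approval recommended for custom paths.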
Installing the skill may require modifying the local Python/system environment even though the registry requirements do not fully declare it.
External Python and system dependencies are expected for Edge TTS streaming, but the registry metadata says there is no install spec or required binary.
pip install edge-tts ... Install ffmpeg ... sudo apt install ffmpeg
Install dependencies manually in an isolated environment where possible, pin package versions, and update the registry metadata/install spec to match the actual requirements.
Text submitted for speech generation may leave the local machine, so secrets or private content should not be spoken unless the user accepts the provider’s handling.
The text to be spoken is handed to the Edge TTS library/provider. This is expected for the stated purpose, but the artifacts do not describe privacy handling or retention.
communicate = edge_tts.Communicate(text, voice, rate=rate, volume=volume, pitch=pitch)
Clearly disclose the external TTS data flow and advise users not to send sensitive text unless they trust the provider.
