Voice (Edge TTS)

Suspicious · Audited by ClawScan on May 10, 2026.

Overview

The skill’s text-to-speech purpose is clear, but its code contradicts its safety claims and can run shell commands built from user input.

Do not install or enable autonomous use of this skill until the shell execution path is fixed. If you still test it, use an isolated environment, avoid sensitive text, install dependencies manually, and do not pass custom output paths or untrusted text/options.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern · High Confidence

ASI05: Unexpected Code Execution

What this means

A crafted text or option value could cause commands to run on the user’s machine under the agent’s privileges.

Why it was flagged

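The command string is built from user-controlled text and options and then executed through a shell. Escaping only the double quotes in the text does not stop command substitution, because a POSIX shell still expands $(...) and backticks inside double quotes, and it does not prevent injection through the other options either.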

Skill content

const { voice = 'zh-CN-XiaoxiaoNeural', output = null, rate = '+0%', volume = '+0%', pitch = '+0Hz' } = options; ... const cmd = [...].join(' '); ... await execAsync(cmd);
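To make the escaping problem concrete, here is a small JavaScript sketch of the quoting step. The audit quotes only fragments of index.js, so `quoteTextLikeTheSkill` is a hypothetical reconstruction that assumes the text is wrapped in double quotes after escaping `"` and nothing else. It reports which shell-active characters survive that escaping.

```javascript
// Hypothetical reconstruction (assumption): escape `"` only, then wrap in "...".
function quoteTextLikeTheSkill(text) {
  return `"${text.replace(/"/g, '\\"')}"`;
}

// Characters a POSIX shell still interprets inside double quotes.
const STILL_ACTIVE_IN_DOUBLE_QUOTES = ['$', '`', '\\'];

// Strips the outer quotes and lists which active characters remain inside.
function survivingMetacharacters(quoted) {
  const inner = quoted.slice(1, -1);
  return STILL_ACTIVE_IN_DOUBLE_QUOTES.filter((ch) => inner.includes(ch));
}

const payload = 'hello $(touch /tmp/pwned) `id`';
const quoted = quoteTextLikeTheSkill(payload);
console.log(quoted);
console.log(survivingMetacharacters(quoted));
```

Under the same assumption the escaping can also be bypassed without `$`: an input containing `\"` becomes `\\"`, which the shell reads as a literal backslash followed by a closing quote, so whatever follows runs as a separate command.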

Recommendation

Replace execAsync(cmd) with spawn or execFile using an argument array and shell: false; whitelist voice/rate/volume/pitch; and restrict output paths.
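A minimal sketch of that recommendation, assuming the skill shells out to the `edge-tts` command-line tool (the quoted `--write-media` flag suggests this, but the full command is not shown). The voice allowlist and value formats are illustrative assumptions, `buildEdgeTtsArgs` and `synthesize` are hypothetical names, and `outputFile` is assumed to have been validated separately.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Illustrative allowlist (assumption): list the voices the skill really supports.
const ALLOWED_VOICES = new Set(['zh-CN-XiaoxiaoNeural', 'en-US-AriaNeural']);

// Signed offsets in the same formats as the skill's defaults: "+0%", "+0Hz".
const PERCENT_RE = /^[+-]\d{1,3}%$/;
const HERTZ_RE = /^[+-]\d{1,3}Hz$/;

// Validates every option and returns an argument array; nothing is joined
// into a command string.
function buildEdgeTtsArgs(text, outputFile, options = {}) {
  const {
    voice = 'zh-CN-XiaoxiaoNeural',
    rate = '+0%',
    volume = '+0%',
    pitch = '+0Hz',
  } = options;

  if (typeof text !== 'string' || text.trim() === '') {
    throw new TypeError('text must be a non-empty string');
  }
  if (!ALLOWED_VOICES.has(voice)) throw new RangeError(`voice not allowed: ${voice}`);
  if (!PERCENT_RE.test(rate)) throw new RangeError(`invalid rate: ${rate}`);
  if (!PERCENT_RE.test(volume)) throw new RangeError(`invalid volume: ${volume}`);
  if (!HERTZ_RE.test(pitch)) throw new RangeError(`invalid pitch: ${pitch}`);

  // "--flag=value" keeps values that begin with "-" (e.g. "-10%", or text
  // starting with a dash) from being parsed as separate flags.
  return [
    `--voice=${voice}`,
    `--rate=${rate}`,
    `--volume=${volume}`,
    `--pitch=${pitch}`,
    `--text=${text}`,
    `--write-media=${outputFile}`,
  ];
}

// execFile starts the program directly (shell: false), so "$(...)", backticks,
// ";" and quotes inside `text` reach edge-tts as literal characters.
async function synthesize(text, outputFile, options) {
  const args = buildEdgeTtsArgs(text, outputFile, options);
  await execFileAsync('edge-tts', args, { shell: false });
  return outputFile;
}
```

Validation happens before any process is started, so a rejected voice or pitch value never reaches the command line at all; the argument array is what removes the shell from the path.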

Concern · High Confidence

ASI09: Human-Agent Trust Exploitation

What this means

Users may trust the skill as hardened against command injection when the included code still has an unsafe shell execution path.

Why it was flagged

The skill's documented security claims, quoted below, are contradicted by index.js: its tts/speak path builds a shell command from user-provided values and runs it with execAsync.

Skill content

**No Shell Execution** | Uses `spawn()` with array arguments instead of shell command concatenation | ... **Command Injection Prevention** | All user inputs are properly validated and escaped |

Recommendation

Remove or correct the security claims until the implementation actually uses safe process invocation and complete input validation.

Concern · High Confidence

ASI02: Tool Misuse and Exploitation

What this means

An agent influenced by untrusted input could write audio files outside the intended temp directory or try to play arbitrary local files.

Why it was flagged

Callers can supply arbitrary paths for both the output file and the file passed to the play action, with no path restrictions or confirmation. This is broader than the user options the skill documents.

Skill content

const outputFileName = output || path.join(tempDir, `tts_${Date.now()}.mp3`); ... '--write-media', outputFileName ... case 'play': ... const playResult = await this.playAudio(params.filePath);

Recommendation

Keep generated files inside a dedicated skill-owned directory, reject absolute or parent-traversal paths, and require explicit user approval for custom local file paths.
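A minimal sketch of the path restriction, assuming a skill-owned directory under the OS temp directory and MP3-only files, matching the quoted default `tts_<timestamp>.mp3` name. `SKILL_AUDIO_DIR` and `resolveSkillPath` are hypothetical names. It rejects absolute paths and `..` segments outright, then re-checks containment after resolution as a second layer.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical skill-owned directory for generated and playable audio.
const SKILL_AUDIO_DIR = path.join(os.tmpdir(), 'voice-edge-tts');

// Maps a caller-supplied relative name to a path inside SKILL_AUDIO_DIR,
// or throws if it is absolute, traverses upward, or is not an .mp3 file.
function resolveSkillPath(requested) {
  if (typeof requested !== 'string' || requested.trim() === '') {
    throw new TypeError('path must be a non-empty string');
  }
  if (path.isAbsolute(requested)) {
    throw new Error('absolute paths are not allowed');
  }
  if (requested.split(/[\\/]/).includes('..')) {
    throw new Error('parent-directory segments are not allowed');
  }
  const resolved = path.resolve(SKILL_AUDIO_DIR, requested);
  // Second layer: the resolved path must still sit under the skill directory.
  const rel = path.relative(SKILL_AUDIO_DIR, resolved);
  if (rel === '' || rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error('path escapes the skill directory');
  }
  if (path.extname(resolved).toLowerCase() !== '.mp3') {
    throw new Error('only .mp3 files are allowed');
  }
  return resolved;
}
```

The check is purely lexical and does not follow symlinks; if other processes can write to the directory, comparing `fs.realpathSync` results is a sensible addition. Requests that fail the check are the natural place for the explicit user-approval step.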

Note · High Confidence

ASI04: Agentic Supply Chain Vulnerabilities

What this means

Installing the skill may require modifying the local Python/system environment, even though the registry metadata does not fully declare these requirements.

Why it was flagged

External Python and system dependencies are expected for Edge TTS streaming, but the registry metadata says there is no install spec or required binary.

Skill content

pip install edge-tts ... Install ffmpeg ... sudo apt install ffmpeg

Recommendation

Install dependencies manually in an isolated environment where possible, pin package versions, and update the registry metadata/install spec to match the actual requirements.
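Until the install spec is updated, a preflight check can at least surface the undeclared binaries before the skill runs. This sketch assumes a POSIX-style PATH and that the skill needs `edge-tts` and `ffmpeg` as executables, per the install notes quoted above; `findOnPath` and `missingDependencies` are hypothetical helpers. It does not check versions, so pinning still has to happen at install time.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Returns the first executable file named `bin` on the given PATH string, or null.
function findOnPath(bin, pathEnv = process.env.PATH || '') {
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, bin);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Missing or not executable in this directory; keep looking.
    }
  }
  return null;
}

// Assumed requirement list, taken from the install notes quoted in this finding.
const REQUIRED_BINARIES = ['edge-tts', 'ffmpeg'];

// Lists required binaries that cannot be found on PATH.
function missingDependencies(pathEnv) {
  return REQUIRED_BINARIES.filter((bin) => findOnPath(bin, pathEnv) === null);
}
```

Failing fast with the list of missing binaries tells the user what to install manually, rather than letting the skill attempt an install or fail partway through synthesis.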

Note · Medium Confidence

ASI07: Insecure Inter-Agent Communication

What this means

Text submitted for speech generation may leave the local machine, so secrets or private content should not be spoken unless the user accepts the provider’s handling.

Why it was flagged

The text to be spoken is handed to the Edge TTS library/provider. This is expected for the stated purpose, but the artifacts do not describe privacy handling or retention.

Skill content

communicate = edge_tts.Communicate(text, voice, rate=rate, volume=volume, pitch=pitch)

Recommendation

Clearly disclose the external TTS data flow and advise users not to send sensitive text unless they trust the provider.
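One way to back the disclosure with a safeguard is a gate that refuses to forward text that looks like a secret unless the user has confirmed. The patterns here are illustrative heuristics chosen for this sketch (PEM private-key headers, AWS access key IDs, `password:`-style assignments), not a complete detector, and `sensitiveMatches` and `assertSafeToSend` are hypothetical names.

```javascript
// Illustrative patterns only; a real deployment would need a broader detector.
const SENSITIVE_PATTERNS = [
  { name: 'PEM private key', re: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { name: 'AWS access key ID', re: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: 'password assignment', re: /\bpassword\s*[:=]\s*\S+/i },
];

// Returns the names of all patterns the text matches.
function sensitiveMatches(text) {
  return SENSITIVE_PATTERNS.filter(({ re }) => re.test(text)).map(({ name }) => name);
}

// Gate for the external TTS call: flagged text is refused unless the user
// has explicitly confirmed sending it to the provider.
function assertSafeToSend(text, { userConfirmed = false } = {}) {
  const hits = sensitiveMatches(text);
  if (hits.length > 0 && !userConfirmed) {
    throw new Error(
      `Text looks sensitive (${hits.join(', ')}) and would be sent to the Edge TTS service; ask the user to confirm first.`
    );
  }
  return hits;
}
```

A heuristic like this will miss most private content, so it complements the disclosure rather than replacing it.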