Voice
Security checks across malware telemetry and agentic risk
Overview
This text-to-speech skill matches its stated purpose, but it builds local shell commands from user-supplied text and options, so crafted input could run commands on your computer.
Install only if you trust the publisher and are comfortable with an external pip package and Microsoft Edge TTS processing your text. Avoid passing untrusted text or custom options to this skill until the command execution is changed to a safer argument-array implementation and cleanup behavior is clarified.
VirusTotal
64/64 vendors flagged this skill as clean.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If the agent passes crafted or untrusted text/options to this skill, it may execute unintended local commands with the user's permissions.
The action parameters flow into a single shell command. Several values, including voice, output, rate, volume, and pitch, are not shell-escaped, and double-quoting the text does not fully prevent shell substitution on common shells.
const { voice = 'zh-CN-XiaoxiaoNeural', output = null, rate = '+0%', volume = '+0%', pitch = '+0Hz' } = options; ... const cmd = [...].join(' '); ... await execAsync(cmd);
Replace shell-string execution with execFile or spawn using an argument array, validate allowed voice/rate/volume/pitch values, and restrict output paths to a safe temporary directory.
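One way the recommendation above could look in practice is sketched below. It is a minimal illustration, not the skill's actual code: the function names (buildEdgeTtsArgs, resolveOutputPath, synthesize), the validation patterns, and the edge-tts flag spellings (--text, --voice, --rate, --volume, --pitch, --write-media) are assumptions and should be checked against the installed edge-tts CLI before use.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const { randomUUID } = require('node:crypto');
const os = require('node:os');
const path = require('node:path');

const execFileAsync = promisify(execFile);

// Hypothetical allow-list patterns; tighten them to the values actually supported.
const PATTERNS = {
  voice: /^[A-Za-z]{2,3}(-[A-Za-z0-9]+){1,3}$/,
  rate: /^[+-]\d{1,3}%$/,
  volume: /^[+-]\d{1,3}%$/,
  pitch: /^[+-]\d{1,3}Hz$/,
};

// Resolve the output file inside baseDir and reject anything that escapes it.
function resolveOutputPath(output, baseDir) {
  const name = output ?? `tts-${randomUUID()}.mp3`;
  const resolved = path.resolve(baseDir, name);
  const rel = path.relative(baseDir, resolved);
  if (rel === '' || path.isAbsolute(rel) || rel.split(path.sep)[0] === '..') {
    throw new Error('output path must stay inside the TTS directory');
  }
  return resolved;
}

// Validate every option, then return discrete arguments instead of a shell string.
function buildEdgeTtsArgs(text, options = {}, baseDir = os.tmpdir()) {
  const { voice = 'zh-CN-XiaoxiaoNeural', output = null, rate = '+0%', volume = '+0%', pitch = '+0Hz' } = options;
  if (typeof text !== 'string' || text.length === 0) {
    throw new TypeError('text must be a non-empty string');
  }
  for (const [key, value] of Object.entries({ voice, rate, volume, pitch })) {
    if (typeof value !== 'string' || !PATTERNS[key].test(value)) {
      throw new Error(`invalid ${key} value`);
    }
  }
  const outputPath = resolveOutputPath(output, baseDir);
  // The --flag=value form keeps text that starts with "-" from being parsed as a flag.
  const args = [
    `--text=${text}`,
    `--voice=${voice}`,
    `--rate=${rate}`,
    `--volume=${volume}`,
    `--pitch=${pitch}`,
    `--write-media=${outputPath}`,
  ];
  return { args, outputPath };
}

// execFile starts the program directly, so no shell ever interprets the text.
async function synthesize(text, options, baseDir) {
  const { args, outputPath } = buildEdgeTtsArgs(text, options, baseDir);
  await execFileAsync('edge-tts', args);
  return outputPath;
}
```

With this shape, input such as hello $(id) reaches the program as literal text, and a rate like +0%; rm -rf ~ is rejected before any process is started.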
Installing the skill may add external code to the local Python environment.
The install action downloads and installs an external Python package at runtime without a version pin. This is purpose-aligned for a TTS integration, but users must trust that package source and update behavior.
await execAsync('pip3 install edge-tts');
Pin the dependency version, declare the install requirement in the registry/install metadata, and install only in an isolated environment if possible.
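A pinned, isolated install could be sketched roughly as follows. The helper names (planPinnedInstall, installPinned), the virtual-environment location, and the python3 launcher name are assumptions. The version string is supplied by the caller because the reviewed artifacts do not name a vetted release.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const path = require('node:path');

const execFileAsync = promisify(execFile);

// Accept only an exact version such as "1.2.3"; ranges and "latest" are rejected.
const EXACT_VERSION = /^\d+(\.\d+){1,3}$/;

// Describe the install as discrete steps so they can be reviewed before running.
function planPinnedInstall(venvDir, version, pkg = 'edge-tts') {
  if (typeof version !== 'string' || !EXACT_VERSION.test(version)) {
    throw new Error('version must be an exact pin, for example "1.2.3"');
  }
  // Virtual environments place the interpreter in a platform-specific folder.
  const venvPython = process.platform === 'win32'
    ? path.join(venvDir, 'Scripts', 'python.exe')
    : path.join(venvDir, 'bin', 'python');
  return [
    // Assumes a "python3" launcher is on PATH; adjust for the target system.
    { file: 'python3', args: ['-m', 'venv', venvDir] },
    { file: venvPython, args: ['-m', 'pip', 'install', `${pkg}==${version}`] },
  ];
}

// Run each step without a shell; a failing step rejects and stops the sequence.
async function installPinned(venvDir, version) {
  for (const step of planPinnedInstall(venvDir, version)) {
    await execFileAsync(step.file, step.args);
  }
}
```

The skill would then need to invoke edge-tts from the same virtual environment rather than from the global Python installation.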
Users may believe generated audio files are automatically removed when they can actually remain on disk longer, potentially preserving sensitive spoken text locally.
The documented cleanup claim is broader than the code behavior: generated TTS files are not automatically deleted after one hour unless the cleanup action is invoked; only the direct speak path schedules deletion after 5 seconds.
Files are automatically cleaned up after 1 hour (default)
Either implement automatic cleanup for generated TTS files or clearly document that cleanup is manual except for the direct speak action.
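If the first option is chosen, automatic cleanup might be implemented along these lines. This is a sketch under assumptions: the tts-*.mp3 naming scheme, the function names (sweepExpiredFiles, startAutoCleanup), and the five-minute sweep interval are illustrative and do not come from the skill.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const ONE_HOUR_MS = 60 * 60 * 1000;

// Delete generated audio files whose last modification is older than maxAgeMs.
// Only names matching the assumed tts-*.mp3 scheme are touched.
function sweepExpiredFiles(dir, maxAgeMs = ONE_HOUR_MS, now = Date.now()) {
  const removed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile() || !/^tts-.*\.mp3$/.test(entry.name)) continue;
    const filePath = path.join(dir, entry.name);
    if (now - fs.statSync(filePath).mtimeMs > maxAgeMs) {
      fs.unlinkSync(filePath);
      removed.push(entry.name);
    }
  }
  return removed;
}

// Run the sweep periodically so cleanup does not depend on an explicit action.
function startAutoCleanup(dir, intervalMs = 5 * 60 * 1000) {
  const timer = setInterval(() => {
    try {
      sweepExpiredFiles(dir);
    } catch (err) {
      console.error('TTS cleanup failed:', err.message);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive only for cleanup
  return timer;
}
```

A timer only runs while the process is alive, so files left over from a previous session would still need a sweep at startup.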
Text converted to speech may be processed by an external TTS service, so sensitive content could leave the local environment.
The skill sends text to the edge-tts provider workflow to generate speech. This is expected for the stated TTS purpose, but the artifacts do not provide detailed data-handling or retention guidance for text sent to the provider.
using Microsoft Edge's TTS engine
Document the provider data flow clearly and avoid using the skill for secrets, credentials, private messages, or other sensitive text unless the provider policy is acceptable.
