Voice

Security checks across malware telemetry and agentic risk

Overview

This text-to-speech skill matches its stated purpose, but it builds local shell commands from user-supplied text and options, so crafted input could run commands on your computer.

Install only if you trust the publisher and are comfortable with an external pip package and Microsoft Edge TTS processing your text. Avoid passing untrusted text or custom options to this skill until the command execution is changed to a safer argument-array implementation and cleanup behavior is clarified.

VirusTotal

64 of 64 vendors reported this skill as clean.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

ASI05: Unexpected Code Execution
High
What this means

If the agent passes crafted or untrusted text/options to this skill, it may execute unintended local commands with the user's permissions.

Why it was flagged

The action parameters flow into a single shell command string. Several values, including voice, output, rate, volume, and pitch, are not shell-escaped. Double-quoting the text does not prevent substitution either: POSIX shells still expand $(...), backticks, and $VAR inside double quotes.

Skill content
const { voice = 'zh-CN-XiaoxiaoNeural', output = null, rate = '+0%', volume = '+0%', pitch = '+0Hz' } = options;
...
const cmd = [...].join(' ');
...
await execAsync(cmd);
Recommendation

Replace shell-string execution with execFile or spawn using an argument array, validate allowed voice/rate/volume/pitch values, and restrict output paths to a safe temporary directory.
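
The recommendation above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the helper names (buildTtsArgs, synthesize), the validation patterns, and the choice of the OS temp directory are all assumptions, and the flag names assume the skill invokes the edge-tts command-line tool. Options are passed in --flag=value form so that values beginning with + or - are not mistaken for separate flags.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const crypto = require('node:crypto');
const os = require('node:os');
const path = require('node:path');

const execFileAsync = promisify(execFile);

// Hypothetical allowlist patterns; adjust to the values the skill actually supports.
const VOICE_RE = /^[A-Za-z]+(-[A-Za-z0-9]+)+$/; // e.g. zh-CN-XiaoxiaoNeural
const PERCENT_RE = /^[+-]\d{1,3}%$/;            // e.g. +0%, -50%
const PITCH_RE = /^[+-]\d{1,3}Hz$/;             // e.g. +0Hz
const FILE_RE = /^[A-Za-z0-9_-]+\.mp3$/;        // bare file name, no path separators

// Validate every option and return an argument array plus the output path.
function buildTtsArgs(text, options = {}) {
  const { voice = 'zh-CN-XiaoxiaoNeural', output = null, rate = '+0%', volume = '+0%', pitch = '+0Hz' } = options;

  if (typeof text !== 'string' || text.length === 0) throw new TypeError('text must be a non-empty string');
  if (!VOICE_RE.test(voice)) throw new RangeError('invalid voice');
  if (!PERCENT_RE.test(rate)) throw new RangeError('invalid rate');
  if (!PERCENT_RE.test(volume)) throw new RangeError('invalid volume');
  if (!PITCH_RE.test(pitch)) throw new RangeError('invalid pitch');

  // Only a bare file name is accepted; the directory is always the OS temp dir.
  const fileName = output === null ? `tts-${crypto.randomUUID()}.mp3` : output;
  if (typeof fileName !== 'string' || !FILE_RE.test(fileName)) throw new RangeError('invalid output');
  const outPath = path.join(os.tmpdir(), fileName);

  const args = [
    `--voice=${voice}`,
    `--rate=${rate}`,
    `--volume=${volume}`,
    `--pitch=${pitch}`,
    `--write-media=${outPath}`,
    `--text=${text}`, // passed as one argv entry, never parsed by a shell
  ];
  return { args, outPath };
}

// Run the CLI without a shell: execFile hands the array straight to the process.
async function synthesize(text, options) {
  const { args, outPath } = buildTtsArgs(text, options);
  await execFileAsync('edge-tts', args);
  return outPath;
}
```

With this shape, a text such as 'hello $(id)' reaches the TTS tool as literal characters, and a rate such as '+0%; rm -rf ~' is rejected before any process is started.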

ASI04: Agentic Supply Chain Vulnerabilities
Low
What this means

Installing the skill may add external code to the local Python environment.

Why it was flagged

The install action downloads and installs an external Python package at runtime without a version pin. This is purpose-aligned for a TTS integration, but users must trust that package source and update behavior.

Skill content
await execAsync('pip3 install edge-tts');
Recommendation

Pin the dependency version, declare the install requirement in the registry/install metadata, and install only in an isolated environment if possible.
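
A pinned install might look like the sketch below. The helper names (pipInstallArgs, installPinned) are hypothetical, and the version must be supplied by whoever has reviewed a specific edge-tts release; no particular version is implied here. The python parameter is an assumption that lets the caller point at the interpreter of an isolated virtual environment.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Build pip arguments for an exact-version install (pkg==version).
function pipInstallArgs(pkg, version) {
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(pkg)) throw new RangeError('invalid package name');
  if (!/^\d+(\.\d+)*$/.test(version)) throw new RangeError('invalid version');
  return ['-m', 'pip', 'install', `${pkg}==${version}`];
}

// Install through a chosen interpreter, for example one inside a virtual
// environment, instead of calling a bare `pip3` through a shell string.
async function installPinned(pkg, version, python = 'python3') {
  await execFileAsync(python, pipInstallArgs(pkg, version));
}
```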

ASI09: Human-Agent Trust Exploitation
Low
What this means

Users may believe generated audio files are automatically removed when they can actually remain on disk longer, potentially preserving sensitive spoken text locally.

Why it was flagged

The documented cleanup claim is broader than the code behavior: generated TTS files are not automatically deleted after one hour unless the cleanup action is invoked; only the direct speak path schedules deletion after 5 seconds.

Skill content
Files are automatically cleaned up after 1 hour (default)
Recommendation

Either implement automatic cleanup for generated TTS files or clearly document that cleanup is manual except for the direct speak action.
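
If the first option is chosen, automatic cleanup could be sketched as below. This assumes generated files are .mp3 files kept in one dedicated directory; the function names, the ten-minute sweep interval, and the use of file modification time as the age are illustrative assumptions rather than the skill's real behavior.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const ONE_HOUR_MS = 60 * 60 * 1000;

// Delete .mp3 files in `dir` whose modification time is older than maxAgeMs.
// Returns the names of the files that were removed.
function removeExpiredAudio(dir, maxAgeMs = ONE_HOUR_MS, now = Date.now()) {
  const removed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile() || !entry.name.endsWith('.mp3')) continue;
    const filePath = path.join(dir, entry.name);
    if (now - fs.statSync(filePath).mtimeMs > maxAgeMs) {
      fs.unlinkSync(filePath);
      removed.push(entry.name);
    }
  }
  return removed;
}

// Sweep periodically so cleanup does not depend on a manual action.
function startCleanupTimer(dir, intervalMs = 10 * 60 * 1000) {
  const timer = setInterval(() => removeExpiredAudio(dir), intervalMs);
  timer.unref(); // do not keep the process alive just for cleanup
  return timer;
}
```

A timer only runs while the process is alive, so a sweep at startup would still be needed for files left behind by an earlier session.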

ASI07: Insecure Inter-Agent Communication
Low
What this means

Text converted to speech may be processed by an external TTS service, so sensitive content could leave the local environment.

Why it was flagged

The skill sends text to the edge-tts provider workflow to generate speech. This is expected for the stated TTS purpose, but the artifacts do not provide detailed data-handling or retention guidance for text sent to the provider.

Skill content
using Microsoft Edge's TTS engine
Recommendation

Document the provider data flow clearly and avoid using the skill for secrets, credentials, private messages, or other sensitive text unless the provider policy is acceptable.