Tts Responder

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This skill's behavior is consistent with its stated purpose of text-to-speech Telegram replies, but users should note that it runs local audio tools, uses optional Telegram bot credentials, and sends generated audio plus a short text caption to Telegram.

Before installing, confirm you trust the local Piper/ffmpeg setup, understand that Telegram voice replies send conversation content to Telegram, and configure BOT_TOKEN and CHAT_ID only for the intended bot and chat.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The skill can run local audio conversion commands and create audio files on the machine.

Why it was flagged

The skill invokes local command-line tools to synthesize and convert audio. This is central to the stated TTS purpose and not suspicious by itself, but it means the agent can run those local tools when the skill is used.

Skill content

piper --model "$VOICE" --output_file "$OUTPUT_WAV" ...
ffmpeg -y -i "$OUTPUT_WAV" ... "$OUTPUT_OGG"

Recommendation

Install Piper and ffmpeg only from trusted sources, and use the skill only for text you are comfortable converting to audio.
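Before trusting the local setup, it can help to see which binaries the skill would actually resolve. The following is a minimal sketch, not part of the skill: `missing_tools` is a hypothetical helper name, and it assumes the tools are invoked as `piper` and `ffmpeg` from `PATH`, as in the quoted commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): print the names of any
# tools from the argument list that cannot be found on PATH.
missing_tools() {
  local tool out=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      # Append the missing tool name, space-separated.
      out="${out:+$out }$tool"
    fi
  done
  printf '%s\n' "$out"
}

# Empty output means both tools were found on PATH.
missing_tools piper ffmpeg
```

Running `command -v piper` and `command -v ffmpeg` directly prints the resolved paths, which shows whether they come from the expected install location.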

What this means

If BOT_TOKEN and CHAT_ID are set, the skill can send messages as that Telegram bot to the configured chat.

Why it was flagged

The script uses a Telegram bot token and chat ID if present. That is expected for sending Telegram audio, but the registry metadata lists no required environment variables or primary credential, so this credential use is not declared there.

Skill content

if [[ -n "${BOT_TOKEN:-}" && -n "${CHAT_ID:-}" ]]; then
  curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendVoice"

Recommendation

Use a bot token with only the permissions you need, keep it private, and confirm the CHAT_ID points to the intended chat.
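Because the quoted guard depends only on whether both variables are non-empty, Telegram delivery becomes active as soon as they are exported in the environment. A minimal sketch for checking the current state before running the skill follows; `telegram_mode` is a hypothetical helper that mirrors the quoted condition and is not part of the skill itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): mirror the quoted guard and
# report whether the Telegram upload branch would run in this environment.
telegram_mode() {
  if [[ -n "${BOT_TOKEN:-}" && -n "${CHAT_ID:-}" ]]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

# Prints "enabled" only when both BOT_TOKEN and CHAT_ID are non-empty.
telegram_mode
```

Unsetting either variable (for example `unset BOT_TOKEN`) is enough to make the check report `disabled`.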

What this means

Response text may be transmitted to Telegram as audio and partially as a caption when voice mode is enabled.

Why it was flagged

The generated audio file and a short caption derived from the response text are uploaded to Telegram. This matches the skill description, but it is an external data flow users should understand.

Skill content
-F "voice=@${OUTPUT_OGG}" \
    -F "caption=${TEXT:0:100}..."
Recommendation

Avoid enabling voice replies for sensitive conversations unless you are comfortable sending that content through Telegram.
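To see exactly which part of a reply would travel as the text caption, the quoted expression can be reproduced locally. This is a sketch under two assumptions: `caption_preview` is a hypothetical helper, and the trailing `...` in the quoted line is taken to be literal text in the script rather than an elision in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): reproduce the quoted caption
# expression so you can inspect what would be sent as text.
caption_preview() {
  local TEXT="$1"
  # Bash substring expansion: the first 100 characters of TEXT,
  # followed by a literal "..." (assumed literal, see lead-in).
  printf '%s\n' "${TEXT:0:100}..."
}

caption_preview "Short reply"
```

Any sensitive detail in the first 100 characters of a response would therefore appear in plain text in the chat, in addition to being spoken in the audio.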

What this means

The first use may fetch an external voice model dependency.

Why it was flagged

The skill relies on Piper voice models that are downloaded automatically on first use. This is normal for TTS tooling, but the artifact does not specify the model source or pin a model version.

Skill content

Voice models are downloaded automatically on first use (about 50 MB).

Recommendation

Use trusted Piper model sources and consider pinning or preinstalling the intended voice model in controlled environments.
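One way to pin a preinstalled model is to record its SHA-256 digest once and check it before each use. The sketch below assumes GNU `sha256sum` is available (on macOS, `shasum -a 256` is the usual equivalent); `verify_model`, the model path, and the digest placeholder are all hypothetical, since the artifact does not name a model source or version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): succeed only if the file's
# SHA-256 digest equals the expected value recorded at install time.
verify_model() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example usage with a hypothetical path and a digest you recorded yourself:
#   verify_model "$HOME/voices/model.onnx" "<expected sha256>" \
#     || { echo "voice model digest mismatch" >&2; exit 1; }
```

A missing file yields an empty digest, so the check fails closed rather than passing silently.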