Pocket TTS Complete Documentation

Review

Audited by ClawScan on May 1, 2026.

Overview

This is a coherent documentation-only TTS skill, but running its examples installs an external package and may create reusable voice-cloning files that users should protect.

This skill appears safe as documentation. Before running its examples, verify the Pocket TTS package, keep any web server on localhost unless intentionally exposing it, and protect or delete generated voice-cloning files.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If the user runs the install command, they will execute code from the installed Pocket TTS package and its dependencies.

Why it was flagged

The documentation directs users to install an external package without pinning a version. This is normal for a package documentation skill, but it relies on public package provenance.

Skill content

pip install pocket-tts
# or
uv add pocket-tts

Recommendation

Verify the package source and version before installing, and consider pinning a known-good version in controlled environments.
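One way to act on this recommendation is to build the install requirement from an explicit version, so that an unpinned install fails before pip runs. The following is a minimal sketch for a POSIX shell. `pinned_requirement` and `VETTED_VERSION` are hypothetical names that do not come from the skill, and no particular Pocket TTS version is recommended here.

```shell
# Hypothetical helper (not part of Pocket TTS): print "name==version",
# or fail with a usage message when either argument is missing.
pinned_requirement() {
  name="$1"
  version="$2"
  if [ -z "$name" ] || [ -z "$version" ]; then
    echo "usage: pinned_requirement NAME VERSION" >&2
    return 1
  fi
  printf '%s==%s\n' "$name" "$version"
}

# Usage (not run here); VETTED_VERSION is a variable you set yourself
# after reviewing a release:
#   req="$(pinned_requirement pocket-tts "$VETTED_VERSION")" && pip install "$req"
```

Chaining with `&&` means pip is only invoked when a version was actually supplied.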

What this means

Running the server makes a local TTS web interface/API available; changing the host could expose it beyond the local machine.

Why it was flagged

The skill documents a command that starts a FastAPI web server. The default and the examples bind to localhost, which fits the skill's purpose, but whether the server is exposed depends on the host setting the user chooses.

Skill content

pocket-tts serve --host "localhost" --port 8080

Recommendation

Keep the server bound to localhost unless you intentionally want network access, and stop it when finished.
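The localhost-only recommendation can be enforced with a small guard around the documented `serve` command. This is a sketch under stated assumptions: `is_loopback_host` and `serve_local_only` are hypothetical wrapper functions, not part of Pocket TTS, and the allow-list covers only the three most common loopback spellings.

```shell
# Hypothetical helper: succeed only for hosts that stay on the local machine.
is_loopback_host() {
  case "$1" in
    localhost|127.0.0.1|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: refuse to start the server on a non-loopback host.
# Defaults mirror the documented example (localhost, port 8080).
serve_local_only() {
  host="${1:-localhost}"
  port="${2:-8080}"
  if ! is_loopback_host "$host"; then
    echo "refusing to bind to non-loopback host: $host" >&2
    return 1
  fi
  pocket-tts serve --host "$host" --port "$port"
}
```

Calling `serve_local_only` with no arguments runs the same command the skill documents. Passing a host such as 0.0.0.0 returns an error instead of starting the server. Stop the server with Ctrl-C when finished.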

What this means

Saved voice embeddings may allow future generation in the sampled voice if someone else obtains them.

Why it was flagged

Voice samples and derived embeddings can be reusable personal voice-cloning data. The behavior is central to the skill, but users should treat these files as sensitive.

Skill content

convert an audio file to a voice embedding in safetensors format. The safetensors file can then be loaded very quickly whenever you generate speech.

Recommendation

Use voice samples only with permission, store exported safetensors securely, and delete voice embeddings you no longer need.
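Storing and deleting embeddings safely can be as simple as tightening file permissions and removing files that are no longer needed. The sketch below assumes a POSIX system. `protect_embedding` and `discard_embedding` are hypothetical helpers, and the file path in the usage comment is a placeholder rather than a path the skill defines.

```shell
# Hypothetical helper: make an exported embedding readable and writable
# by its owner only (mode 600).
protect_embedding() {
  [ -f "$1" ] || { echo "not a file: $1" >&2; return 1; }
  chmod 600 "$1"
}

# Hypothetical helper: delete an embedding that is no longer needed.
# Note: rm removes the directory entry; it does not overwrite the data on disk.
discard_embedding() {
  rm -f -- "$1"
}

# Usage (not run here); the path is a placeholder:
#   protect_embedding ./my_voice.safetensors
#   discard_embedding ./my_voice.safetensors
```

On shared machines, keeping embeddings in a directory that only the owner can read adds a second layer of protection.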