Cartesia Speech

Pass. Audited by ClawScan on May 12, 2026.

Overview

This appears to be a straightforward Cartesia text-to-speech plugin. It uses a Cartesia API key, sends text to Cartesia for speech generation, and runs ffmpeg for audio conversion. All three behaviors are disclosed and aligned with the plugin's stated purpose.

Install only if you are comfortable sending TTS text to Cartesia, using a Cartesia API key, and running a trusted local ffmpeg binary for voice notes. Keep the default Cartesia endpoint unless you trust an alternative, store the API key securely, and enable suppressDuplicateText only if you want voice-only replies.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI03: Identity and Privilege Abuse

What this means

The plugin can consume API quota, and incur any associated usage charges, on the configured Cartesia account.

Why it was flagged

The plugin needs a Cartesia API key to use the user's Cartesia account for speech generation. This is expected for the stated provider integration and is disclosed.

Skill content

"authMethods": ["api-key"],
        "envVars": ["CARTESIA_API_KEY"]

Recommendation

Use a revocable Cartesia API key, store it via environment/secret management as documented, and monitor Cartesia usage.
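
As an illustration of the environment-based storage recommended above, here is a minimal sketch of reading the key and failing fast when it is missing. Only the CARTESIA_API_KEY variable name comes from the skill metadata; the helper names readCartesiaApiKey and maskKey are hypothetical and are not part of the plugin.

```javascript
// Hypothetical helper (not from the plugin): read the key from the environment
// and fail fast instead of sending unauthenticated requests.
function readCartesiaApiKey(env = process.env) {
  const key = (env.CARTESIA_API_KEY ?? "").trim();
  if (key === "") {
    throw new Error("CARTESIA_API_KEY is not set");
  }
  return key;
}

// Hypothetical helper: mask all but the last four characters so the key
// never appears in full in logs.
function maskKey(key) {
  if (key.length <= 4) return "****";
  return "*".repeat(key.length - 4) + key.slice(-4);
}
```

Logging only the masked form makes it possible to confirm which key is active without exposing it, which helps when rotating or revoking keys.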

Low

#ASI07: Insecure Inter-Agent Communication

What this means

Assistant text chosen for TTS becomes third-party API traffic to Cartesia or to a user-configured baseUrl.

Why it was flagged

The text to be spoken and the configured voice ID are sent to the Cartesia API endpoint, authenticated with the Cartesia API key.

Skill content

fetch(`${baseUrl}/tts/bytes`, {
      method: "POST",
      headers: { "X-API-Key": apiKey, ... },
      body: JSON.stringify({ ... transcript: text, voice: { mode: "id", id: voiceId }, ... })

Recommendation

Avoid sending confidential text to TTS if your policy disallows it, and only override baseUrl to a trusted HTTPS endpoint.
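
One way to act on the baseUrl advice is to validate any override before use. The sketch below is an assumption-laden example, not plugin code: checkBaseUrl and the allowlist parameter are hypothetical, and the host tts.example.com in the usage note is a placeholder rather than a real Cartesia endpoint.

```javascript
// Hypothetical check (not from the plugin): accept a baseUrl override only if
// it parses as a URL, uses HTTPS, and its host is on a caller-supplied allowlist.
function checkBaseUrl(baseUrl, allowedHosts) {
  let url;
  try {
    url = new URL(baseUrl);
  } catch {
    return { ok: false, reason: "not a valid URL" };
  }
  if (url.protocol !== "https:") {
    return { ok: false, reason: "not HTTPS" };
  }
  if (!allowedHosts.includes(url.hostname)) {
    return { ok: false, reason: "host not in allowlist" };
  }
  return { ok: true };
}
```

For example, checkBaseUrl("http://tts.example.com", ["tts.example.com"]) is rejected because the scheme is not HTTPS, which would otherwise expose both the transcript and the X-API-Key header in transit.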

Low

#ASI05: Unexpected Code Execution

What this means

A local ffmpeg binary on PATH will run during voice-note synthesis.

Why it was flagged

The plugin launches ffmpeg to transcode PCM audio into OGG/Opus voice notes. Arguments are fixed and this behavior is documented as required for voice-note support.

Skill content

const ff = spawn(
      "ffmpeg",
      ["-loglevel", "error", ... "-f", "ogg", "-"],
      { stdio: ["pipe", "pipe", "pipe"] }
    );

Recommendation

Install ffmpeg from a trusted source and ensure PATH does not point to an untrusted replacement binary.
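
To see which binary a bare "ffmpeg" lookup would resolve to, a PATH walk along the following lines can help. This is a rough sketch assuming POSIX-style lookup (no Windows PATHEXT handling); findOnPath is a hypothetical helper and is not part of the plugin.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper (not from the plugin): return the first executable file
// named `name` found in the PATH directories, or null if none is found.
function findOnPath(name, pathVar = process.env.PATH ?? "") {
  for (const dir of pathVar.split(path.delimiter)) {
    if (dir === "") continue; // skip empty PATH entries
    const candidate = path.join(dir, name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Not present or not executable in this directory; keep looking.
    }
  }
  return null;
}

console.log(findOnPath("ffmpeg") ?? "ffmpeg not found on PATH");
```

If the printed path points somewhere unexpected, such as a user-writable directory that precedes the system directories, that is worth fixing before enabling voice notes.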

Low

#ASI02: Tool Misuse and Exploitation

What this means

If enabled, some text replies may be suppressed in favor of voice-only output, and the README notes edge cases for multi-message or out-of-order delivery.

Why it was flagged

When explicitly enabled, the plugin hooks outgoing message sending and can cancel a text reply after a voice note is synthesized to avoid duplicates.

Skill content

if (suppressDuplicateText && typeof api.on === "function") {
  api.on("message_sending", (event) => {
    ...
    if (consumeVoiceTurn(sessionKey)) { return { cancel: true }; }
    ...
  });
}

Recommendation

Leave suppressDuplicateText disabled unless you want voice-only behavior, and test it in the channels where you plan to use it.
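
When testing, it may help to have a mental model of how a one-shot suppression flag behaves. The sketch below is purely illustrative: the excerpt does not show how the plugin implements consumeVoiceTurn, so the Set-based store, markVoiceTurn, and onMessageSending here are assumptions made for the example.

```javascript
// Hypothetical model (the plugin's real implementation is not shown):
// a voice note sets a per-session flag, and the next outgoing text message
// for that session consumes the flag and is cancelled.
const pendingVoiceTurns = new Set();

function markVoiceTurn(sessionKey) {
  pendingVoiceTurns.add(sessionKey);
}

function consumeVoiceTurn(sessionKey) {
  // Set.prototype.delete returns true only if the key was present.
  return pendingVoiceTurns.delete(sessionKey);
}

function onMessageSending(sessionKey) {
  return consumeVoiceTurn(sessionKey) ? { cancel: true } : {};
}
```

Under this model only the first text message after a voice note is cancelled, so a reply split across several messages would still deliver the later parts as text. That is consistent with the kind of multi-message edge case the README is reported to mention, and it is the behavior worth checking per channel.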