Kai Realtime Voice

Pass. Audited by ClawScan on May 1, 2026.

Overview

This appears to be a straightforward MiniMax text-to-speech helper, but it sends text to MiniMax using your API key, and it is implemented as a REST request rather than true WebSocket streaming.

Before installing, confirm you are comfortable sending requested speech text to MiniMax under your API key. Expect a REST-based MP3 generation helper rather than true real-time WebSocket streaming, and ensure curl and xxd are available if you plan to use the included script.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Using the skill may consume MiniMax account quota or incur provider-side charges under the configured API key.

Why it was flagged

The script uses the MiniMax API key as a bearer token for provider requests. This is expected for this integration, but it gives the skill access to a billable provider credential.

Skill content
API_KEY="${MINIMAX_API_KEY}" ... -H "Authorization: Bearer ${API_KEY}"
Recommendation

Use a scoped MiniMax API key where possible and monitor usage/costs for that key.
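As a minimal sketch of handling the credential defensively, the helper below refuses to proceed when MINIMAX_API_KEY is empty, so no request is sent with a blank bearer token. The variable name comes from the skill excerpt above; the function name require_api_key is hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): fail fast when the key is missing.
# MINIMAX_API_KEY is the variable name shown in the skill excerpt.
require_api_key() {
  if [ -z "${MINIMAX_API_KEY:-}" ]; then
    # Report to stderr and never print the key itself.
    echo "MINIMAX_API_KEY is not set; refusing to call the provider" >&2
    return 1
  fi
  return 0
}
```

A script could call `require_api_key || exit 1` before any request to the provider.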

What this means

Text submitted for speech generation may be processed by MiniMax outside the local environment.

Why it was flagged

The provided text is sent to the external MiniMax API for audio generation. This is disclosed and purpose-aligned, but it is still an external data flow.

Skill content
curl -s -X POST "https://api.minimax.io/v1/t2a_v2" ... "text":"$TEXT"
Recommendation

Avoid sending secrets or sensitive private content unless MiniMax's data handling terms are acceptable for that content.

What this means

Users may expect real-time WebSocket streaming but receive a generated MP3 from a REST request instead.

Why it was flagged

The implementation does not currently match the advertised real-time WebSocket behavior, which could mislead users about latency and streaming capability.

Skill content
# For real streaming, would use WebSocket here
# For now, fallback to REST API
Recommendation

Treat the current skill as a REST-based MiniMax TTS helper unless the WebSocket implementation is added and documented.

What this means

The skill may fail or behave differently on systems without these tools installed.

Why it was flagged

The helper depends on the local curl and xxd binaries, while the skill's declared requirements list no binaries. This is an under-declared runtime dependency rather than suspicious behavior.

Skill content
RESP=$(curl -s ...)
...
echo "$HEX" | xxd -r -p > "${WORKSPACE}/realtime_test.mp3"
Recommendation

Document curl and xxd as required binaries, or replace them with declared dependencies.
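Until the dependencies are declared, a preflight check is one way to surface the problem early. The sketch below assumes only that the script needs curl and xxd, as noted in this finding; the function name require_bins is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the skill):
# report every required binary that is not found on PATH.
require_bins() {
  local missing=0 bin
  for bin in "$@"; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing required binary: $bin" >&2
      missing=1
    fi
  done
  # Returns 0 when all binaries were found, 1 otherwise.
  return "$missing"
}
```

Calling `require_bins curl xxd || exit 1` at the top of the script would stop it with a clear message instead of failing partway through.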

What this means

Some normal speech text containing quotes or JSON-like characters may fail or produce unintended request parameters.

Why it was flagged

The text argument is interpolated directly into a JSON payload. This does not lead to shell command execution, but quotes or other special characters in the text can break or alter the API request body.

Skill content
-d "{\"model\":\"speech-02-turbo\",\"text\":\"$TEXT\",\"stream\":false, ...}"
Recommendation

Construct the JSON payload with a proper JSON encoder such as jq, Python, or another escaping-safe method.
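One possible shape for this fix, assuming jq is installed, is to let jq build the request body so that quotes, backslashes, and newlines in the text are escaped correctly. The function name build_payload is hypothetical, and only the fields visible in the excerpt above are included; the fields elided in the excerpt are left out rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): build the request body with jq.
# --arg passes the text as a JSON string value, so it cannot alter the structure.
build_payload() {
  jq -n -c --arg text "$1" \
    '{model: "speech-02-turbo", text: $text, stream: false}'
}
```

The result could then replace the hand-written string in the request, for example `-d "$(build_payload "$TEXT")"`.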