Kai Realtime Voice
Pass
Audited by ClawScan on May 1, 2026.
Overview
This appears to be a straightforward MiniMax text-to-speech helper, but it sends text to MiniMax using your API key and the implementation is REST-based rather than true WebSocket streaming.
Before installing, confirm you are comfortable sending requested speech text to MiniMax under your API key. Expect a REST-based MP3 generation helper rather than true real-time WebSocket streaming, and ensure curl and xxd are available if you plan to use the included script.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Using the skill may consume MiniMax account quota or incur provider-side usage under the configured API key.
The script uses the MiniMax API key as a bearer token for provider requests, which is expected for this integration but gives the skill access to a billable/provider credential.
API_KEY="${MINIMAX_API_KEY}" ... -H "Authorization: Bearer ${API_KEY}"
Use a scoped MiniMax API key where possible and monitor usage/costs for that key.
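Because the key is read straight from the environment, a small guard can keep the script from sending a request with an empty bearer token. The following is a minimal sketch, not code from the audited script; `require_api_key` is a hypothetical helper name, and only the `MINIMAX_API_KEY` variable comes from the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical guard; not part of the audited script.

# Return non-zero if MINIMAX_API_KEY is unset or empty, so the caller
# can stop before any request reaches the provider.
require_api_key() {
  if [ -z "${MINIMAX_API_KEY:-}" ]; then
    echo "MINIMAX_API_KEY is not set; refusing to call the provider" >&2
    return 1
  fi
}
```

A script could call `require_api_key || exit 1` before building the curl command.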
Text submitted for speech generation may be processed by MiniMax outside the local environment.
The provided text is sent to the external MiniMax API for audio generation. This is disclosed and purpose-aligned, but it is still an external data flow.
curl -s -X POST "https://api.minimax.io/v1/t2a_v2" ... "text":"$TEXT"
Avoid sending secrets or sensitive private content unless MiniMax's data handling terms are acceptable for that content.
Users may expect real-time WebSocket streaming but receive a generated MP3 from a REST request instead.
The implementation does not currently match the advertised real-time WebSocket behavior, which could mislead users about latency and streaming capability.
# For real streaming, would use WebSocket here
# For now, fallback to REST API
Treat the current skill as a REST-based MiniMax TTS helper unless the WebSocket implementation is added and documented.
The skill may fail or behave differently on systems without these tools installed.
The helper depends on local curl and xxd binaries, while the provided requirements list no required binaries. This is an under-declared runtime dependency rather than suspicious behavior.
RESP=$(curl -s ...)
...
echo "$HEX" | xxd -r -p > "${WORKSPACE}/realtime_test.mp3"
Document curl and xxd as required binaries, or replace them with declared dependencies.
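Until the dependencies are declared, a preflight check is one way to make a missing tool fail with a clear message instead of a broken pipeline. This is a minimal sketch under that assumption; `require_tools` is a hypothetical helper name and is not part of the audited script.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check; not part of the audited script.

# Check that every named tool is on PATH. Each missing tool is reported
# on stderr, and the function returns 1 if any were missing, 0 otherwise.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this helper, the call would be `require_tools curl xxd || exit 1` near the top of the script.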
Some normal speech text containing quotes or JSON-like characters may fail or produce unintended request parameters.
The text argument is interpolated directly into a JSON payload. This is not shell execution, but quotes or special characters in the text can break or alter the API request body.
-d "{\"model\":\"speech-02-turbo\",\"text\":\"$TEXT\",\"stream\":false, ...}"
Construct the JSON payload with a proper JSON encoder such as jq, Python, or another escaping-safe method.
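One way to apply this recommendation is sketched below. It is a minimal sketch, not the skill's actual code: `json_string` and `build_payload` are hypothetical helper names, the sketch assumes either jq or python3 is installed, and it reproduces only the three fields visible in the excerpt above. The fields elided in the excerpt are left out rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the audited script.

# Print the first argument as a quoted, escaped JSON string literal.
# Assumes jq or python3 is available; returns 1 if neither is found.
json_string() {
  if command -v jq >/dev/null 2>&1; then
    jq -cn --arg s "$1" '$s'
  elif command -v python3 >/dev/null 2>&1; then
    python3 -c 'import json, sys; print(json.dumps(sys.argv[1]))' "$1"
  else
    echo "json_string: need jq or python3" >&2
    return 1
  fi
}

# Build the request body. Only the fields shown in the excerpt are included;
# whatever the original script passes after "stream" is not reproduced here.
build_payload() {
  local text_json
  text_json=$(json_string "$1") || return 1
  printf '{"model":"speech-02-turbo","text":%s,"stream":false}\n' "$text_json"
}
```

The result could then be passed to curl as `-d "$(build_payload "$TEXT")"` in place of the hand-written string, so quotes and backslashes in the text stay inside the `text` value.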
