iMessage Voice Reply

v1.0.3

Send voice message replies in iMessage using local Kokoro-ONNX TTS. Generates native iMessage voice bubbles (CAF/Opus) that play inline with waveform — not f...

by Michael Boland (@bolander72)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description (iMessage voice replies) align with included scripts and instructions: a local Kokoro TTS pipeline, audio encoding (afconvert/ffmpeg), and use of a BlueBubbles channel to send the resulting CAF/Opus payload. No unrelated services, credentials, or binaries are requested.
Instruction Scope
SKILL.md stays on-task: it instructs setup (venv and pip install), model download to ~/.cache/kokoro-onnx, generating audio with the Python script, and sending via BlueBubbles. It does require network access during pip install and model download; the instructions do not attempt to read unrelated system configuration or secrets. The doc correctly warns about shell escaping and recommends --text-file for untrusted input.
Install Mechanism
There is no packaged install spec, but setup.sh runs python3 -m venv and pip installs kokoro-onnx, soundfile, and numpy from PyPI. This is a standard mechanism but does involve pulling code and (indirectly) models from the network; kokoro-onnx appears to fetch models when instantiated. No arbitrary URL downloads or archive extraction from unknown hosts are present in the repo files.
Credentials
The skill requests no environment variables or credentials. It expects an existing BlueBubbles channel configured in OpenClaw to send the message (that is reasonable and proportional). The scripts write to the user's home cache (~/.cache/kokoro-onnx) and /tmp, which is expected for local model storage and temp audio files.
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and only installs a venv under the skill directory and model files under the user's cache. It runs only when invoked by the user/agent (default behavior).
Assessment
This skill appears coherent and matches its description, but review these practical points before installing:

  1) setup.sh will pip install kokoro-onnx (and its dependencies) from PyPI and will cause Kokoro to download ~136MB of models into ~/.cache/kokoro-onnx, so expect network activity and disk usage.
  2) Confirm that you trust the kokoro-onnx PyPI package and its model source (audit upstream if needed).
  3) The script uses tempfile.mktemp, a pattern prone to race conditions and insecure temp paths. This is not an immediate red flag, but it is worth noting if you run in multi-user environments.
  4) On non-macOS systems the setup requires ffmpeg; on macOS it uses afconvert.
  5) The skill does not request credentials, but it uses your configured BlueBubbles channel to send attachments. Ensure BlueBubbles is configured and trusted.

For extra safety, run setup in an isolated environment (container or VM) and inspect and verify the kokoro-onnx package and downloaded model files before use.
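For point 3, here is a minimal sketch of the safer pattern. tempfile.mktemp only returns a name, and the Python docs deprecate it. That leaves a window in which another process can create the path first. tempfile.mkstemp creates and opens the file atomically with owner-only permissions. The helper names below are hypothetical and do not come from the skill's script.

```python
import os
import tempfile


def write_text_to_temp(text: str, suffix: str = ".txt") -> str:
    """Hypothetical helper: write text to a freshly created temp file, return its path.

    mkstemp() creates the file exclusively (O_EXCL) with mode 0600, so no other
    user can pre-create or swap it, unlike mktemp(), which only picks a name.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="voice_")
    # Wrap the raw descriptor so it is closed even if the write fails.
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def reserve_output_path(suffix: str = ".caf") -> str:
    """Hypothetical helper: reserve a unique path for an encoder that writes by filename."""
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="voice_reply_")
    os.close(fd)  # The empty file now exists; the encoder must be allowed to overwrite it.
    return path
```

Because the file already exists, any encoder writing to a reserved path must overwrite it. For example, ffmpeg needs `-y` to do so.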


License: MIT-0 · 575 downloads · 0 stars · 4 versions · updated 1mo ago

iMessage Voice Reply

Generate and send native iMessage voice messages using local Kokoro TTS. Voice messages appear as inline playable bubbles with waveforms — identical to voice messages recorded in Messages.app.

How It Works

Your text response → Kokoro TTS (local) → afconvert (native Apple encoder) → CAF/Opus → BlueBubbles → iMessage voice bubble

Setup

bash ${baseDir}/scripts/setup.sh

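Creates a Python virtual environment in ${baseDir}/.venv and installs kokoro-onnx, soundfile, and numpy into it. It also downloads the Kokoro models (~136MB) to ~/.cache/kokoro-onnx/. On non-macOS systems, ffmpeg must also be installed.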

Requires: BlueBubbles channel configured in OpenClaw (channels.bluebubbles).

Generating and Sending a Voice Reply

Step 1: Generate audio

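Write the response text to a temp file with a quoted heredoc, then pass it via --text-file to avoid shell injection. The quoted 'EOF' delimiter stops the shell from expanding $, backticks, or backslashes in the text:

cat > /tmp/voice_text.txt <<'EOF'
Your response text here
EOF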
${baseDir}/.venv/bin/python ${baseDir}/scripts/generate_voice_reply.py --text-file /tmp/voice_text.txt --output /tmp/voice_reply.caf

Alternatively, pass text directly (ensure proper shell escaping):

${baseDir}/.venv/bin/python ${baseDir}/scripts/generate_voice_reply.py --text "Your response text here" --output /tmp/voice_reply.caf

Options:

  • --voice af_heart — Kokoro voice (default: af_heart)
  • --speed 1.15 — Playback speed (default: 1.15)
  • --lang en-us — Language code (default: en-us)
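The options above map naturally onto an argparse parser. What follows is a hedged reconstruction built only from the documented flags and defaults. The real script is not shown here, so two details are assumptions: that --text and --text-file are mutually exclusive, and that --output is required.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented CLI (a reconstruction, not the real script)."""
    parser = argparse.ArgumentParser(description="Generate an iMessage voice reply.")
    # Assumption: exactly one text source, a file (preferred) or a literal string.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Text to speak (caller must shell-escape it)")
    source.add_argument("--text-file", help="Path to a UTF-8 file containing the text")
    parser.add_argument("--output", required=True, help="Output path, e.g. /tmp/voice_reply.caf")
    parser.add_argument("--voice", default="af_heart", help="Kokoro voice (default: af_heart)")
    parser.add_argument("--speed", type=float, default=1.15, help="Speed (default: 1.15)")
    parser.add_argument("--lang", default="en-us", help="Language code (default: en-us)")
    return parser


def read_text(args: argparse.Namespace) -> str:
    """Hypothetical helper: return the text from whichever source was given."""
    if args.text_file is not None:
        with open(args.text_file, encoding="utf-8") as handle:
            return handle.read().strip()
    return args.text
```

For example, `build_parser().parse_args(["--text", "Hi", "--output", "/tmp/voice_reply.caf"])` yields a namespace in which voice, speed, and lang take their documented defaults.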

Security note: The Python script uses argparse and subprocess.run with list arguments (no shell=True). Input is handled safely within the script. When calling from a shell, prefer --text-file for untrusted input to avoid shell metacharacter issues.
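To call the generator from another Python process with the same no-shell discipline, pass an argument list to subprocess.run. Below is a minimal sketch. base_dir stands in for ${baseDir}, and both helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_generate_argv(base_dir: str, text_file: str, output: str,
                        voice: str = "af_heart", speed: float = 1.15,
                        lang: str = "en-us") -> List[str]:
    """Hypothetical helper: argv for generate_voice_reply.py, one list item per argument."""
    base = Path(base_dir)
    return [
        str(base / ".venv" / "bin" / "python"),
        str(base / "scripts" / "generate_voice_reply.py"),
        "--text-file", text_file,
        "--output", output,
        "--voice", voice,
        "--speed", str(speed),
        "--lang", lang,
    ]


def generate_voice_reply(base_dir: str, text_file: str, output: str) -> None:
    """Hypothetical helper: run the generator without a shell; raises CalledProcessError on failure."""
    argv = build_generate_argv(base_dir, text_file, output)
    # shell=False (the default): no shell ever parses these arguments.
    subprocess.run(argv, check=True)
```

Each value travels to the child process as a single argv entry. Spaces, quotes, or `$(...)` in a path are therefore never interpreted.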

Step 2: Send via BlueBubbles

Use the message tool:

{
  "action": "sendAttachment",
  "channel": "bluebubbles",
  "target": "+1XXXXXXXXXX",
  "path": "/tmp/voice_reply.caf",
  "filename": "Audio Message.caf",
  "contentType": "audio/x-caf",
  "asVoice": true
}

Critical parameters for native voice bubble:

  • filename must be "Audio Message.caf"
  • contentType must be "audio/x-caf"
  • asVoice must be true

All three are required for iMessage to render the message as an inline voice bubble with waveform instead of a file attachment.
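This small sketch builds the sendAttachment payload and checks the three critical fields before the payload is handed to the message tool. The field names and values come from the example above. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List

# The three fields iMessage needs to render an inline voice bubble.
VOICE_BUBBLE_FIELDS: Dict[str, Any] = {
    "filename": "Audio Message.caf",
    "contentType": "audio/x-caf",
    "asVoice": True,
}


def build_voice_payload(target: str, path: str) -> Dict[str, Any]:
    """Hypothetical helper: sendAttachment payload for the BlueBubbles channel."""
    payload: Dict[str, Any] = {
        "action": "sendAttachment",
        "channel": "bluebubbles",
        "target": target,
        "path": path,
    }
    payload.update(VOICE_BUBBLE_FIELDS)
    return payload


def missing_voice_fields(payload: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: names of critical fields that are absent or wrong."""
    return [key for key, value in VOICE_BUBBLE_FIELDS.items()
            if payload.get(key) != value]


if __name__ == "__main__":
    print(json.dumps(build_voice_payload("+1XXXXXXXXXX", "/tmp/voice_reply.caf"), indent=2))
```

An empty list from missing_voice_fields means the payload should render as a voice bubble rather than a file attachment.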

Voice Options

Language   Female        Male
English    af_heart ⭐   am_puck
Spanish    ef_dora       em_alex
French     ff_siwis      (none listed)
Japanese   jf_alpha      jm_beta
Chinese    zf_xiaobei    zm_yunjian
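The voice IDs follow a visible pattern in the table above. The second letter is f for the female column and m for the male column. This sketch encodes the table and picks a voice. The dictionary and functions are hypothetical helpers. The language keys are the table's labels, not the --lang codes the script expects.

```python
from typing import Dict

# Voices from the table above, keyed by language label, then gender.
VOICES: Dict[str, Dict[str, str]] = {
    "English": {"female": "af_heart", "male": "am_puck"},
    "Spanish": {"female": "ef_dora", "male": "em_alex"},
    "French": {"female": "ff_siwis"},
    "Japanese": {"female": "jf_alpha", "male": "jm_beta"},
    "Chinese": {"female": "zf_xiaobei", "male": "zm_yunjian"},
}

DEFAULT_VOICE = "af_heart"  # The documented --voice default.


def pick_voice(language: str, gender: str = "female") -> str:
    """Hypothetical helper: a listed voice, else the other gender, else the default."""
    options = VOICES.get(language, {})
    if gender in options:
        return options[gender]
    # French lists no male voice, so fall back to whatever the language has.
    return next(iter(options.values()), DEFAULT_VOICE)


def voice_gender(voice_id: str) -> str:
    """Hypothetical helper: decode the second letter ('f' or 'm') of a voice ID."""
    return {"f": "female", "m": "male"}[voice_id[1]]
```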

When to Reply with Voice

Reply with a voice message when:

  • The user sent you a voice message (voice-for-voice)
  • The user explicitly asks for an audio/voice response

Always include a text reply alongside the voice message for accessibility.
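The rule above can be written as a tiny planning function, with the text reply always included. This is only an illustration. The two boolean inputs are hypothetical, and the agent would have to derive them from the incoming message.

```python
from dataclasses import dataclass


@dataclass
class ReplyPlan:
    send_text: bool   # Always True: text accompanies voice for accessibility.
    send_voice: bool  # Voice-for-voice, or when the user explicitly asks.


def plan_reply(incoming_was_voice: bool, user_asked_for_voice: bool) -> ReplyPlan:
    """Hypothetical helper: decide which parts of a reply to send."""
    return ReplyPlan(send_text=True,
                     send_voice=incoming_was_voice or user_asked_for_voice)
```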

Audio Format

  • macOS: CAF container, Opus codec, 48kHz mono, 32kbps — encoded by Apple's native afconvert. Identical to what Messages.app produces.
  • Fallback (non-macOS): MP3 via ffmpeg. This works, but it may not render as a native voice bubble on all iMessage versions.
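The two formats might be produced as argument lists ready for subprocess.run. This is a sketch under assumptions. The script's exact encoder flags are not shown here. The afconvert data-format string (opus@48000) and the ffmpeg options below are illustrative choices that match the listed parameters (48 kHz, mono, 32 kbps), not confirmed commands. platform.system() returns "Darwin" on macOS.

```python
import platform
from pathlib import Path
from typing import List, Optional


def build_encode_argv(wav_path: str, out_stem: str,
                      system: Optional[str] = None) -> List[str]:
    """Hypothetical helper: CAF/Opus via afconvert on macOS, else MP3 via ffmpeg."""
    system = system or platform.system()
    if system == "Darwin":
        # -f caff: CAF container; -d opus@48000: Opus at 48 kHz (assumed format string);
        # -c 1: mono; -b 32000: 32 kbps total bit rate.
        out = str(Path(out_stem).with_suffix(".caf"))
        return ["afconvert", "-f", "caff", "-d", "opus@48000", "-c", "1",
                "-b", "32000", wav_path, out]
    # Fallback (illustrative options): mono MP3 via libmp3lame; -y overwrites the output.
    out = str(Path(out_stem).with_suffix(".mp3"))
    return ["ffmpeg", "-y", "-i", wav_path, "-ac", "1",
            "-codec:a", "libmp3lame", "-b:a", "32k", out]
```

Because the output suffix follows the platform, a Linux run never produces an MP3 that is misleadingly named .caf.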

Cost

$0. Kokoro TTS runs entirely locally. No API calls for voice generation.

Troubleshooting

Voice message shows as file attachment — Ensure all three parameters are set: filename="Audio Message.caf", contentType="audio/x-caf", asVoice=true.

First word clipped — The script prepends 150ms silence automatically. If still clipped, increase the silence pad in the script.
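The silence pad is simple arithmetic: pad samples = sample_rate × pad_ms / 1000. At a 24 kHz output rate, for example, 150 ms is 3,600 zero samples. The sketch below uses only the standard library. The skill installs numpy, so the real script likely pads an array instead. The function name is hypothetical.

```python
from typing import List, Sequence


def prepend_silence(samples: Sequence[float], sample_rate: int,
                    pad_ms: int = 150) -> List[float]:
    """Hypothetical helper: return samples with pad_ms of silence (zeros) in front."""
    pad = round(sample_rate * pad_ms / 1000)  # e.g. 24000 Hz * 0.150 s = 3600 samples
    return [0.0] * pad + list(samples)
```

If the first word is still clipped, raising pad_ms (for example to 250) lengthens the lead-in proportionally.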

Kokoro model not found — Run bash ${baseDir}/scripts/setup.sh.

afconvert not found — Only available on macOS. Script falls back to ffmpeg/MP3 on Linux.
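A preflight check can cover the last two items. This sketch tests three things: that the venv interpreter created by setup.sh exists, that ~/.cache/kokoro-onnx contains files, and which encoder is on PATH. The model file names are not listed here, so none are assumed. base_dir stands in for ${baseDir}, and the function is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict, Optional


def preflight(base_dir: str, cache_dir: Optional[str] = None) -> Dict[str, bool]:
    """Hypothetical helper: report which voice-reply prerequisites are present."""
    cache = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "kokoro-onnx"
    return {
        # Created by scripts/setup.sh.
        "venv_python": (Path(base_dir) / ".venv" / "bin" / "python").exists(),
        # Kokoro models (~136MB) should have been downloaded here.
        "models_present": cache.is_dir() and any(cache.iterdir()),
        # macOS encoder, and the Linux fallback.
        "afconvert": shutil.which("afconvert") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
```

If venv_python or models_present is False, rerun setup.sh. If neither encoder is found, install ffmpeg.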
