Walkie-Talkie Mode

Handles voice-to-voice conversations on WhatsApp: automatically transcribes incoming audio and replies with audio generated by local TTS. Use when the user wants to "talk" instead of typing.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 4 · 2.2k · 5 current installs · 5 all-time installs

by Rubén Fernández Boullón (@rubenfb23)

Duplicate of @rubenfb23/walkie-talkie-vigo


Security Scan

VirusTotal: Benign


OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The description (voice-to-voice on WhatsApp using local transcription and TTS) matches the actions described in SKILL.md. However, the skill metadata declares no required binaries or files, while the instructions explicitly reference tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, whisper-cpp, and sherpa-onnx-tts. These are necessary for the stated purpose, yet they are neither included nor declared, which is an inconsistency.

Instruction Scope

The instructions tell the agent to execute local scripts and binaries and to send .ogg files via the message tool. They do not ask for extra environment variables or unrelated files, but they do require executing code at host paths (tools/transcribe_voice.sh, bin/sherpa-onnx-tts). Because those files are not provided, the skill relies on whatever exists at those paths on the host; if the paths are populated, the agent will run that code, whatever it is.

Install Mechanism ✓

This skill is instruction-only (no install spec or code), which reduces the risk of it dropping arbitrary code during installation. However, at runtime it depends on locally installed binaries that the user must provide.

Credentials ✓

The skill requests no environment variables or credentials, which is proportionate to its described local-only operation. There is no unexplained request for unrelated secrets. Be aware that sending messages via the agent's messaging integration still requires whatever platform credentials the agent normally uses, but those are not requested by this skill.

Persistence & Privilege ✓

The skill is not always-enabled and does not request elevated platform privileges or attempt to modify other skills. Autonomous invocation is allowed (platform default), which is normal and not by itself a red flag.

What to consider before installing

This skill's behavior is plausible but inconsistent: SKILL.md requires local scripts and binaries (tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, whisper-cpp) that are neither included nor declared. Before installing or enabling it, verify:
1) where those binaries and scripts will come from, and that the sources are trusted;
2) the exact contents of tools/transcribe_voice.sh, so that it doesn't run unexpected commands;
3) that you are comfortable letting the agent execute local binaries on the host.
If you can't audit or control the referenced scripts and binaries, consider not installing the skill, or ask the author for a clear dependency list and safe installation instructions. Providing the missing files or explicit dependency declarations (ideally with checksums or official sources) would reduce the concern and could change the assessment to benign.
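The first two checks can be partly scripted. The Python sketch below reports where each dependency named in SKILL.md resolves on the host and computes a SHA-256 digest you can compare against the publisher's. It does not judge whether a script is safe, so tools/transcribe_voice.sh still needs a manual read. The tool names are taken as written in SKILL.md; whisper.cpp builds often install the binary under a different name (for example whisper-cli), so treat "whisper-cpp" as an assumption and adjust the list. The helper names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

# Relative paths and tool names as listed in SKILL.md. The actual whisper.cpp
# executable name varies by build, so "whisper-cpp" is an assumption.
SCRIPT_PATHS = ["tools/transcribe_voice.sh", "bin/sherpa-onnx-tts"]
PATH_TOOLS = ["ffmpeg", "whisper-cpp"]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root: str = ".") -> dict:
    """Map each dependency to (resolved path, sha256), or None if it is missing.

    Script paths are resolved relative to `root` (the skill's working
    directory); tool names are looked up on PATH.
    """
    report = {}
    for rel in SCRIPT_PATHS:
        p = Path(root) / rel
        report[rel] = (str(p), sha256_of(p)) if p.is_file() else None
    for name in PATH_TOOLS:
        found = shutil.which(name)
        report[name] = (found, sha256_of(Path(found))) if found else None
    return report
```

Running `audit()` from the directory the agent executes in shows exactly which files would be invoked; a `None` entry means the skill would fail at that step, and any digest that doesn't match the source you expected is a reason to stop.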


Current version: v1.0.0


latest: vk97bzz5tatd84dazgsb34syrq17zzdjn


License terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Walkie-Talkie Mode

This skill automates the voice-to-voice loop on WhatsApp using local transcription and local TTS.

Workflow

Incoming Audio: When a user sends an audio file (OGG/Opus):
- Use tools/transcribe_voice.sh to get the text.
- Process the text as a normal user prompt.
Outgoing Response:
- Rather than sending a text-only reply, generate speech using bin/sherpa-onnx-tts.
- Send the resulting .ogg file back to the user as a voice note.
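The loop above can be sketched in Python as one function per step. This is an illustration under stated assumptions, not the skill's implementation. SKILL.md does not document how tools/transcribe_voice.sh takes its input, so passing the audio path as the only argument and reading the transcript from stdout is assumed. The bin/sherpa-onnx-tts argument order (output path, then text) follows the manual example in SKILL.md. The `answer` callback and the returned `{"text", "filePath"}` payload are hypothetical stand-ins for the agent's prompt handling and its message tool.

```python
import subprocess
from typing import Callable

# Assumption: the script takes the audio path as its only argument and prints
# the transcript to stdout. SKILL.md does not specify this.
TRANSCRIBE = "tools/transcribe_voice.sh"
# Argument order taken from SKILL.md: <output.ogg> <text>.
TTS = "bin/sherpa-onnx-tts"


def transcribe(audio_path: str, run=subprocess.run) -> str:
    """Run the local transcription script and return the cleaned transcript."""
    result = run([TRANSCRIBE, audio_path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def synthesize(text: str, out_path: str, run=subprocess.run) -> str:
    """Render `text` to an .ogg file with the local TTS wrapper."""
    run([TTS, out_path, text], check=True)
    return out_path


def handle_voice_note(audio_path: str, answer: Callable[[str], str],
                      out_path: str = "/tmp/reply.ogg", run=subprocess.run) -> dict:
    """One walkie-talkie turn: audio in, text plus audio out.

    `answer` is a hypothetical stand-in for the agent's normal prompt handling;
    the returned dict is a hypothetical payload for the message tool.
    """
    prompt = transcribe(audio_path, run=run)
    reply = answer(prompt)
    ogg = synthesize(reply, out_path, run=run)
    # SKILL.md constraint: always send both the text and the voice note.
    return {"text": reply, "filePath": ogg}
```

Injecting `run` keeps the orchestration testable without the real binaries: a stub that records each command and returns a `subprocess.CompletedProcess` is enough to check the call order.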

Triggers

- User sends an audio message.
- User says "activa modo walkie-talkie" ("activate walkie-talkie mode") or "hablemos por voz" ("let's talk by voice").
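A trigger check along these lines could decide when to enter walkie-talkie mode. It is a sketch. Treating any `audio/*` media type (such as `audio/ogg; codecs=opus`) as an audio message, and matching the Spanish phrases case- and accent-insensitively anywhere in the message, are assumptions rather than documented behavior. `should_use_voice` is a hypothetical name.

```python
import unicodedata
from typing import Optional

# Trigger phrases as given in SKILL.md (Spanish).
TRIGGER_PHRASES = ("activa modo walkie-talkie", "hablemos por voz")


def _normalize(text: str) -> str:
    # Lowercase, then drop combining marks so accented spellings still match
    # (case/accent-insensitive matching is an assumption, not from SKILL.md).
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def should_use_voice(mime_type: Optional[str], text: Optional[str]) -> bool:
    """Hypothetical helper: True if the message should trigger walkie-talkie mode."""
    # Any audio/* media type counts as an audio message; parameters are ignored.
    if mime_type and mime_type.split(";")[0].strip().lower().startswith("audio/"):
        return True
    # Otherwise look for one of the trigger phrases anywhere in the text.
    if text:
        normalized = _normalize(text)
        return any(phrase in normalized for phrase in TRIGGER_PHRASES)
    return False
```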

Constraints

- Use local tools only (ffmpeg, whisper-cpp, sherpa-onnx-tts).
- Maintain a fast response time: real-time factor (RTF) below 0.5, i.e. processing takes less than half the duration of the audio.
- Always reply with BOTH text (for clarity) and audio.
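The RTF limit is easiest to reason about as arithmetic. Real-time factor is processing time divided by the duration of the audio involved: the incoming clip for transcription, or the generated clip for TTS. So RTF < 0.5 means a 10-second voice note must be transcribed in under 5 seconds. A minimal Python helper, with hypothetical names, that times one step and checks it against the limit:

```python
import time
from typing import Callable

RTF_LIMIT = 0.5  # threshold from SKILL.md


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / duration of the audio that was processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(step: Callable[[], None], audio_seconds: float) -> float:
    """Run one pipeline step and return its RTF, using wall-clock time."""
    start = time.perf_counter()
    step()
    return real_time_factor(time.perf_counter() - start, audio_seconds)
```

Measuring transcription and synthesis separately shows which stage breaks the budget. Because both stages run on every turn, each one needs to stay well under the limit if the whole round trip is to feel fast.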

Manual Execution (Internal)

To respond with voice manually:

bin/sherpa-onnx-tts /tmp/reply.ogg "Your message here"

Then send /tmp/reply.ogg with the message tool, passing that path as filePath.
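Before handing /tmp/reply.ogg to the message tool, it may be worth confirming that the wrapper actually produced an OGG/Opus file, since that is what the skill sends as a voice note. The sketch below relies on two container facts: every Ogg page starts with the capture pattern `OggS` (RFC 3533), and an Opus stream's first packet is the `OpusHead` identification header (RFC 7845). The `speak` function name is hypothetical, and the argument order follows the manual command above.

```python
import subprocess
from pathlib import Path

TTS = "bin/sherpa-onnx-tts"  # wrapper path and argument order from SKILL.md


def looks_like_ogg_opus(head: bytes) -> bool:
    """Cheap header check on the first bytes of a file.

    Ogg pages begin with "OggS"; an Opus stream's first packet is "OpusHead".
    """
    return head[:4] == b"OggS" and b"OpusHead" in head[:512]


def speak(text: str, out_path: str = "/tmp/reply.ogg", run=subprocess.run) -> str:
    """Run the TTS wrapper, then refuse to return a missing or non-Opus file."""
    run([TTS, out_path, text], check=True)
    path = Path(out_path)
    if not path.is_file():
        raise RuntimeError(f"TTS produced no file at {out_path}")
    with path.open("rb") as fh:
        head = fh.read(512)
    if not looks_like_ogg_opus(head):
        raise RuntimeError(f"{out_path} does not look like OGG/Opus")
    return out_path  # pass this as filePath to the message tool
```

If the wrapper turns out to emit WAV instead, ffmpeg can re-encode it, e.g. `ffmpeg -i reply.wav -c:a libopus reply.ogg`, assuming your ffmpeg build includes libopus.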

Files: 1 total

