Walkie-Talkie Mode

Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of typing.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Rubén Fernández Boullón (@rubenfb23)
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The name/description (voice-to-voice WhatsApp) matches the SKILL.md workflow, but the skill metadata declares no required binaries, env vars, or installs while the instructions explicitly require local tooling (ffmpeg, whisper-cpp, sherpa-onnx-tts), a helper script (tools/transcribe_voice.sh), and a local TTS binary (bin/sherpa-onnx-tts). That inconsistency means the skill either omits necessary requirements or assumes access to arbitrary local executables.
Instruction Scope
Runtime instructions tell the agent to run local scripts/binaries and read/write files (e.g., /tmp/reply.ogg) and to use a 'message' tool to send files. These actions are coherent with the stated purpose, but they reference specific local paths and tools not declared in metadata. This grants the skill broad discretion to execute unspecified local programs and rely on local model artifacts.
Install Mechanism
There is no install spec (lowest install risk). That is normally fine for an instruction-only skill, but here it is a problem because the skill expects several local binaries and scripts. Since the skill installs nothing, the operator must supply these dependencies, and the missing install and dependency declarations are an integrity and usability risk.
Credentials
The skill requests no environment variables or credentials (appropriate). However, it implicitly requires access to local filesystem paths and local model binaries; the SKILL.md does not request or document any permissions or configuration for those resources.
Persistence & Privilege
The skill does not request always:true and does not declare persistent/system-wide changes. It appears to be user-invocable only and does not request elevated persistent privileges.
What to consider before installing
This skill's behavior (transcribe incoming audio, produce local TTS, send .ogg back) matches its description, but the SKILL.md depends on local tools and scripts that are not declared anywhere. Before installing or enabling it:

1. Verify that the agent environment actually has the required binaries (ffmpeg, whisper-cpp, sherpa-onnx-tts) and the helper script paths (tools/transcribe_voice.sh, bin/sherpa-onnx-tts).
2. Ask the author to update the metadata to list required binaries, exact paths, and any model files or hardware needs.
3. Confirm that the 'message' tool used to send files is the authorized platform tool (so audio is sent only to the intended chat) and that no unexpected external endpoints are contacted.
4. Review file permissions around /tmp and any model data to avoid exposing unrelated data.
5. Test in a sandboxed agent first. If the required local tools are missing, the skill will fail, or it may run arbitrary local programs if they are created later.

If you cannot verify or supply these dependencies, treat this skill as untrusted.
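The first recommendation above (verifying the binaries and helper paths) can be automated with a small preflight check. The following is a minimal Python sketch under these assumptions:

- The skill directory contains the two helper paths named in SKILL.md.
- The transcription engine is on PATH as `whisper-cpp`. The real executable name depends on how whisper.cpp was built or packaged, so adjust it.
- `missing_dependencies` is a hypothetical name, not part of the skill.

```python
import os
import shutil
from pathlib import Path

# Executables expected on PATH. "whisper-cpp" is an assumed name: depending on
# the build or package, the whisper.cpp CLI may be installed under another name.
# sherpa-onnx-tts is reached through the bin/ wrapper, so it is checked below.
REQUIRED_BINARIES = ["ffmpeg", "whisper-cpp"]

# Helper paths referenced by SKILL.md, relative to the skill directory.
REQUIRED_SCRIPTS = ["tools/transcribe_voice.sh", "bin/sherpa-onnx-tts"]


def missing_dependencies(skill_dir: str) -> list[str]:
    """Return human-readable problems; an empty list means everything was found."""
    problems = []
    for name in REQUIRED_BINARIES:
        if shutil.which(name) is None:
            problems.append(f"binary not on PATH: {name}")
    for rel in REQUIRED_SCRIPTS:
        path = Path(skill_dir) / rel
        if not path.is_file():
            problems.append(f"missing file: {path}")
        elif not os.access(path, os.X_OK):
            problems.append(f"not executable: {path}")
    return problems
```

Run it against the skill directory, for example `missing_dependencies(".")`, and keep the skill disabled while the returned list is non-empty.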


Current version: v1.0.0 (latest, vk97d05bt6w4cddzxdn8gknpvxn7zzcvm)


SKILL.md

Walkie-Talkie Mode

This skill automates the voice-to-voice loop on WhatsApp using local transcription and local TTS.

Workflow

  1. Incoming Audio: When a user sends an audio file (WhatsApp voice notes arrive as audio/ogg with the Opus codec):

    • Use tools/transcribe_voice.sh to get the text.
    • Process the text as a normal user prompt.
  2. Outgoing Response:

    • Alongside the text reply (see Constraints), generate speech using bin/sherpa-onnx-tts.
    • Send the resulting .ogg file back to the user as a voice note.
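The two workflow steps can be sketched as a single "turn" function. This Python sketch rests on these assumptions:

- tools/transcribe_voice.sh takes the audio file as its only argument and prints the transcript on stdout. The skill does not document this.
- bin/sherpa-onnx-tts takes the output path first and the text second, as in the manual command.
- `voice_turn`, `answer` and the injectable `run` callable are hypothetical. `answer` stands in for the agent's normal prompt handling, and `run` lets the loop be exercised without the real binaries.

```python
import subprocess
from typing import Callable

# Hypothetical runner signature: takes argv, returns the command's stdout.
Runner = Callable[[list[str]], str]


def run_command(argv: list[str]) -> str:
    """Default runner: execute argv and return its stdout, raising on failure."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def voice_turn(audio_path: str, answer: Callable[[str], str],
               reply_path: str, run: Runner = run_command) -> tuple[str, str]:
    """One walkie-talkie turn: audio in, (text reply, audio reply path) out."""
    # Step 1: transcribe (assumed: audio path as sole argument, text on stdout).
    transcript = run(["tools/transcribe_voice.sh", audio_path]).strip()
    # Process the transcript as a normal user prompt.
    reply_text = answer(transcript)
    # Step 2: synthesize speech; argument order taken from "Manual Execution".
    run(["bin/sherpa-onnx-tts", reply_path, reply_text])
    # Return both, since the Constraints require text and audio in every reply.
    return reply_text, reply_path
```

The caller would then send `reply_text` as a normal message and `reply_path` as a voice note.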

Triggers

  • User sends an audio message.
  • User says "activa modo walkie-talkie" ("activate walkie-talkie mode") or "hablemos por voz" ("let's talk by voice").
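The trigger rules can be expressed as a small predicate. This is a minimal sketch that makes three assumptions:

- The incoming message exposes a MIME type and an optional text body.
- Matching should ignore case and accents.
- A phrase anywhere in the message counts as a trigger.

The function names are hypothetical. The phrases stay in Spanish because that is what users type.

```python
import unicodedata
from typing import Optional

# Trigger phrases from SKILL.md, kept in the language users actually type.
TRIGGER_PHRASES = ("activa modo walkie-talkie", "hablemos por voz")


def _normalize(text: str) -> str:
    # Casefold and drop combining accent marks so "VOZ" or "vóz" still match.
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip()


def should_enter_voice_mode(mime_type: Optional[str], text: Optional[str]) -> bool:
    """True if the message is audio or contains one of the trigger phrases."""
    # "audio/ogg; codecs=opus" -> "audio/ogg": ignore MIME parameters.
    if mime_type and mime_type.split(";")[0].strip().lower().startswith("audio/"):
        return True
    if text:
        normalized = _normalize(text)
        return any(phrase in normalized for phrase in TRIGGER_PHRASES)
    return False
```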

Constraints

  • Use local tools only (ffmpeg, whisper-cpp, sherpa-onnx-tts).
  • Maintain a fast response time: a real-time factor (RTF, processing time divided by audio duration) below 0.5.
  • Always reply with BOTH text (for clarity) and audio.
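The RTF budget is a simple ratio. For example, transcribing a 5 s voice note in 2 s gives RTF = 2 / 5 = 0.4, which is within the limit, while 3 s gives 0.6, which is not. The skill does not say whether the budget applies to each tool or to the whole turn. This sketch therefore times one step at a time, and its function names are hypothetical.

```python
import time
from typing import Callable

MAX_RTF = 0.5  # budget from the Constraints section


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(step: Callable[[], None], audio_seconds: float) -> float:
    """Run one pipeline step (e.g. transcription) and return its RTF."""
    start = time.perf_counter()
    step()
    return real_time_factor(time.perf_counter() - start, audio_seconds)
```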

Manual Execution (Internal)

To respond with voice manually:

bin/sherpa-onnx-tts /tmp/reply.ogg "Your message here"

Then send /tmp/reply.ogg to the user with the message tool, passing the path as its filePath parameter.
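The fixed /tmp/reply.ogg path is shared and predictable, and the security scan flags /tmp permissions. A fixed name also means two concurrent replies would overwrite each other. One option is to synthesize into a fresh private file instead. This is a sketch, not the skill's own code. It assumes bin/sherpa-onnx-tts uses the argument order shown above and overwrites an existing output file. `synthesize_reply` is a hypothetical name.

```python
import os
import subprocess
import tempfile


def synthesize_reply(text: str, tts: str = "bin/sherpa-onnx-tts") -> str:
    """Write the spoken reply to a fresh private .ogg file and return its path."""
    # mkstemp creates a new, uniquely named file readable and writable only by
    # the current user, unlike a shared name such as /tmp/reply.ogg.
    fd, path = tempfile.mkstemp(prefix="reply-", suffix=".ogg")
    os.close(fd)
    # Same argument order as the manual command: output path, then the text.
    subprocess.run([tts, path, text], check=True)
    return path
```

The returned path is what would go in the message tool's filePath. The caller should delete the file once it has been sent.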
