Walkie-Talkie Mode

Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Rubén Fernández Boullón (@rubenfb23)
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name and description align with SKILL.md: the instructions describe local transcription and local TTS for WhatsApp voice messages. However, the skill references local artifacts (tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, whisper-cpp), yet the registry metadata lists no required binaries or install steps, and no code files are supplied. It also assumes the agent has a 'message' tool capable of sending WhatsApp voice notes; that capability is not documented here. The declared requirements (none) do not match the tools the skill actually references.
Instruction Scope
SKILL.md gives concrete runtime steps (transcribe incoming audio with a local script, produce TTS with a local binary, send the .ogg via the message tool). It does not instruct reading unrelated system files or exfiltrating data. The concern is that the instructions direct execution of unspecified local scripts and binaries, which could run arbitrary code. The RTF < 0.5 constraint (a unitless ratio, not a duration in seconds) may be unrealistic and could lead to aggressive behavior or retries.
Install Mechanism
There is no install spec (instruction-only), which minimizes direct install risk. That said, the skill requires local tools to be present; since none are provided or listed, the runtime will depend on whatever binaries/scripts exist on the host.
Credentials
The skill declares no required environment variables or credentials. This is proportionate to an instruction-only local TTS/transcription flow. Caveat: WhatsApp integration implies the agent/runtime already has messaging credentials or tools; those are external to this skill and not declared here.
Persistence & Privilege
always:false and default invocation rules are set. The skill does not request persistent installation or elevated platform privileges in its metadata.
What to consider before installing
This skill is instruction-only and expects local transcription and TTS tools that are not included or declared. Before installing or enabling it:

  1. Verify the existence and provenance of tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, and whisper-cpp on the host; inspect the contents of any referenced scripts to ensure they don't execute unexpected commands.
  2. Confirm your agent's 'message' tool legitimately has WhatsApp access and that any WhatsApp tokens live in a secure place you control (the skill does not request or document credentials).
  3. Run the skill in a sandbox or isolated environment first to observe behavior (it executes local binaries and writes/reads /tmp/* files).
  4. Beware the strict RTF requirement; it may cause retries or other aggressive actions.

If you cannot review or vet the referenced binaries/scripts, treat this skill as untrusted and avoid enabling it.
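
The existence check in step 1 can be sketched as a small preflight function. This is only an illustration: the function name `missing_tools` is hypothetical, and the convention that names containing a slash are treated as file paths (everything else as a command on PATH) is an assumption of this sketch. The actual binary name for whisper-cpp may differ on a given host.

```shell
# missing_tools NAME...
# Prints each NAME that cannot be found, one per line.
# Names containing "/" are checked as executable files; others are looked up on PATH.
missing_tools() {
  for item in "$@"; do
    case "$item" in
      */*) [ -x "$item" ] || printf '%s\n' "$item" ;;
      *)   command -v "$item" >/dev/null 2>&1 || printf '%s\n' "$item" ;;
    esac
  done
  return 0
}
```

It could be called as `missing_tools ffmpeg whisper-cpp tools/transcribe_voice.sh bin/sherpa-onnx-tts`. An empty result only shows that the tools exist; it says nothing about their provenance, so the scripts still need to be read by hand.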


Current version: v1.0.0
latest: vk972v8x19nb36zjkdjb82st6xh7zyth2


SKILL.md

Walkie-Talkie Mode

This skill automates the voice-to-voice loop on WhatsApp using local transcription and local TTS.

Workflow

  1. Incoming Audio: When a user sends an audio message (Ogg/Opus file):

    • Use tools/transcribe_voice.sh to get the text.
    • Process the text as a normal user prompt.
  2. Outgoing Response:

    • Generate speech from the reply text using bin/sherpa-onnx-tts (per the Constraints section, the text reply is sent as well).
    • Send the resulting .ogg file back to the user as a voice note.
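
The two workflow steps above can be sketched as a single function. Several things here are assumptions, not facts from the skill: that tools/transcribe_voice.sh takes the audio path as its only argument and prints the transcript on stdout, the hypothetical names `voice_turn`, `TRANSCRIBE_CMD`, and `TTS_CMD`, and the `generate_reply` placeholder that stands in for the agent's normal prompt handling. The TTS argument order (output path, then text) follows the manual example in SKILL.md.

```shell
# Hypothetical overridable tool locations; defaults are the paths named in SKILL.md.
TRANSCRIBE_CMD="${TRANSCRIBE_CMD:-tools/transcribe_voice.sh}"
TTS_CMD="${TTS_CMD:-bin/sherpa-onnx-tts}"

# Placeholder for the agent's normal prompt handling (not part of the skill).
generate_reply() {
  printf 'You said: %s' "$1"
}

# voice_turn IN_AUDIO OUT_OGG
# Prints the reply text on stdout and writes the spoken reply to OUT_OGG.
voice_turn() {
  in_audio=$1
  out_ogg=$2
  # Assumption: the script prints the transcript on stdout.
  transcript=$("$TRANSCRIBE_CMD" "$in_audio") || return 1
  [ -n "$transcript" ] || return 1
  reply=$(generate_reply "$transcript")
  # Output path first, then the text, as in the manual example.
  "$TTS_CMD" "$out_ogg" "$reply" >/dev/null || return 1
  printf '%s\n' "$reply"
}
```

Returning the text on stdout while writing the audio to a file keeps both halves of the reply available to the caller, which then sends them with whatever messaging tool the agent provides.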

Triggers

  • User sends an audio message.
  • User says "activa modo walkie-talkie" (Spanish for "activate walkie-talkie mode") or "hablemos por voz" ("let's talk by voice").
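
A minimal sketch of how these triggers might be detected is shown below. The function name `is_walkie_trigger` and its interface (a MIME type plus optional message text) are hypothetical. Treating any `audio/*` MIME type as an audio message and matching the phrases case-insensitively as substrings are assumptions of this sketch.

```shell
# is_walkie_trigger MIME_TYPE [TEXT]
# Succeeds (exit status 0) when the message should start a voice reply.
is_walkie_trigger() {
  mime=$1
  # Lowercase the text so the phrase match is case-insensitive.
  text=$(printf '%s' "${2:-}" | tr '[:upper:]' '[:lower:]')
  case "$mime" in
    audio/*) return 0 ;;
  esac
  case "$text" in
    *"activa modo walkie-talkie"*|*"hablemos por voz"*) return 0 ;;
  esac
  return 1
}
```

For example, `is_walkie_trigger text/plain "Hablemos por voz"` succeeds, while an ordinary text message does not.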

Constraints

  • Use local tools only (ffmpeg, whisper-cpp, sherpa-onnx-tts).
  • Maintain a fast response time (real-time factor, RTF, below 0.5).
  • Always reply with BOTH text (for clarity) and audio.
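
Assuming RTF has its usual meaning (processing time divided by the duration of the audio processed or produced), the speed constraint can be checked with a little arithmetic. The function names `rtf` and `rtf_ok` are hypothetical, and the skill itself does not say how the value should be measured.

```shell
# rtf PROCESSING_SECONDS AUDIO_SECONDS
# Prints processing time divided by audio duration, to three decimals.
rtf() {
  awk -v p="$1" -v a="$2" 'BEGIN { if (a <= 0) exit 1; printf "%.3f\n", p / a }'
}

# rtf_ok PROCESSING_SECONDS AUDIO_SECONDS [LIMIT]
# Succeeds when the RTF is strictly below LIMIT (default 0.5).
rtf_ok() {
  awk -v p="$1" -v a="$2" -v lim="${3:-0.5}" \
    'BEGIN { if (a <= 0) exit 1; exit !((p / a) < lim) }'
}
```

For instance, 1.2 seconds spent on 4 seconds of audio gives an RTF of 0.3, which satisfies the limit. One way to obtain the audio duration is `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 FILE`, where ffprobe is the probing tool that ships with ffmpeg.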

Manual Execution (Internal)

To respond with voice manually:

bin/sherpa-onnx-tts /tmp/reply.ogg "Tu mensaje aquí"

Then send /tmp/reply.ogg via the message tool, passing the path as filePath.
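
Before sending, it may be worth confirming that the TTS step actually produced an Ogg file. The sketch below relies on the fact that Ogg streams begin with the four-byte capture pattern "OggS"; the function name `is_ogg` is hypothetical, and the check says nothing about the codec inside the container.

```shell
# is_ogg FILE
# Succeeds when FILE is non-empty and starts with the Ogg capture pattern "OggS".
is_ogg() {
  [ -s "$1" ] || return 1
  [ "$(head -c 4 "$1")" = "OggS" ]
}
```

A caller could run `is_ogg /tmp/reply.ogg` and fall back to a text-only reply when the check fails.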
