Vocal Chat

Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 12 · 3.1k · 16 current installs · 18 all-time installs

by Rubén Fernández Boullón (@rubenfb23)

duplicate of @rubenfb23/walkie-talkie-vigo

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The description (voice-to-voice on WhatsApp) is plausible, but the manifest declares no required binaries, no install steps, and no WhatsApp integration credentials or endpoints. The SKILL.md explicitly requires local tools (ffmpeg, whisper-cpp, sherpa-onnx-tts) and local executables (tools/transcribe_voice.sh, bin/sherpa-onnx-tts), none of which are declared in the registry metadata. This gap between the manifest and the instructions means the skill may fail or assume access it has not requested.

Instruction Scope

The instructions tell the agent to run local scripts and binaries and to send audio via a `message` tool, but they do not explain how incoming audio is surfaced to the agent, where the scripts come from, or what the `message` tool's required parameters/permissions are. The SKILL.md restricts use to 'local tools only' (no cloud) and asks the agent to always return both text and audio — no steps ask to read unrelated files or environment variables, but the instructions assume filesystem and binary access that aren't guaranteed.

ℹ

Install Mechanism

There is no install spec (instruction-only), which lowers install risk. However, the skill depends on external binaries and scripts that would need to be present on the host. The lack of an install mechanism or references to known release sources means the agent or operator must manually install/verify those dependencies; that operational gap is noteworthy but not inherently malicious.

ℹ

Credentials

The skill declares no environment variables or credentials, which is consistent with its claim to use local-only tools. However, because it targets WhatsApp conversations, the absence of any declared messaging/WhatsApp credential or integration details is suspicious — the skill assumes the agent has access to a messaging tool capable of sending files but doesn't declare what access is required.

✓

Persistence & Privilege

The skill does not request always:true and uses default invocation settings. It does not attempt to modify system-wide settings in the provided instructions. No persistence or elevated platform privileges are requested in the manifest.

What to consider before installing

Before installing or enabling this skill, verify the following:
(1) Confirm which binaries and scripts it requires (ffmpeg, whisper-cpp, sherpa-onnx-tts, tools/transcribe_voice.sh, bin/sherpa-onnx-tts) and install them from trusted sources; the manifest currently lists none.
(2) Ensure your agent actually has a 'message' tool and WhatsApp integration set up, and understand what credentials or API access that requires; the skill does not declare any credentials.
(3) Ask the publisher to update the manifest to list required binaries, install instructions, and any needed credentials.
(4) Consider running the skill in a sandbox or test account first: audio processing can involve sensitive content, and the skill assumes local filesystem access, which could fail or be abused.
(5) Note that the performance constraint (RTF < 0.5) may be unrealistic for local models and could lead to degraded behavior; confirm resource needs.
If the publisher cannot clarify these gaps, treat the skill as untrusted.
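
The first check above can be partly automated. The following is a minimal sketch, assuming the tool names and relative paths quoted from SKILL.md; `missing_deps` is a hypothetical helper that is not part of the skill, and the executable names actually installed on a given host (for example for whisper-cpp) may differ.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not shipped with the skill).
# Prints every dependency that is not available, one per line.
# Arguments containing a slash are treated as paths relative to the
# current directory; everything else is looked up on PATH.
missing_deps() {
  local dep
  for dep in "$@"; do
    if [[ "$dep" == */* ]]; then
      [[ -x "$dep" ]] || printf '%s\n' "$dep"
    else
      command -v "$dep" >/dev/null 2>&1 || printf '%s\n' "$dep"
    fi
  done
}

# Example call with the names listed in SKILL.md (assumed, not verified):
# missing_deps ffmpeg whisper-cpp sherpa-onnx-tts \
#   tools/transcribe_voice.sh bin/sherpa-onnx-tts
```

An empty result only shows that the files exist and are executable; it says nothing about where they came from, so verifying the sources remains a manual step.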

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

latest vk97a4n0phg3403w3zmpsdp1g4s7zz525

License

Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Walkie-Talkie Mode

This skill automates the voice-to-voice loop on WhatsApp using local transcription and local TTS.

Workflow

1. Incoming Audio: When a user sends an Ogg/Opus audio file:
   - Use tools/transcribe_voice.sh to get the text.
   - Process the text as a normal user prompt.
2. Outgoing Response:
   - Generate speech from the reply using bin/sherpa-onnx-tts.
   - Send the resulting .ogg file back to the user as a voice note, together with the text reply (see Constraints).
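
One turn of this workflow can be sketched as below. This is an illustration under stated assumptions, not the skill's actual implementation: SKILL.md does not document the interface of tools/transcribe_voice.sh, so the sketch assumes it takes an audio path and prints the transcript to stdout. The argument order for bin/sherpa-onnx-tts follows the manual example in SKILL.md. `generate_reply` and `voice_turn` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of one voice-to-voice turn. Assumptions (not confirmed by SKILL.md):
#   - $TRANSCRIBE takes an audio file path and prints the transcript to stdout.
#   - $TTS takes an output path and the text, as in the manual example.
#   - generate_reply stands in for the agent's normal text handling.
TRANSCRIBE="${TRANSCRIBE:-tools/transcribe_voice.sh}"
TTS="${TTS:-bin/sherpa-onnx-tts}"

generate_reply() {
  # Placeholder only: a real agent would produce its answer here.
  printf 'You said: %s' "$1"
}

# voice_turn AUDIO_IN AUDIO_OUT
# Writes the spoken reply to AUDIO_OUT and prints the text reply to stdout.
voice_turn() {
  local audio_in="$1" audio_out="$2" transcript reply
  transcript="$("$TRANSCRIBE" "$audio_in")" || return 1
  reply="$(generate_reply "$transcript")" || return 1
  "$TTS" "$audio_out" "$reply" || return 1
  # Print the text so the caller can send it alongside the audio.
  printf '%s\n' "$reply"
}
```

Keeping the two tools behind variables makes it possible to swap in stubs when the real binaries are not installed, which is useful for testing in a sandbox.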

Triggers

- User sends an audio message.
- User says "activa modo walkie-talkie" ("activate walkie-talkie mode") or "hablemos por voz" ("let's talk by voice").
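
A text-based check for the spoken triggers might look like the sketch below. The skill does not say how phrases are matched, so case-insensitive substring matching is an assumption, and `is_voice_trigger` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 when the text contains one of the trigger
# phrases quoted in SKILL.md, and 1 otherwise.
# Matching rule (assumed): case-insensitive substring match.
is_voice_trigger() {
  local text
  # Lowercase via tr so the function also works on older bash versions.
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *"activa modo walkie-talkie"*|*"hablemos por voz"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# is_voice_trigger "Oye, hablemos por voz" && echo "voice mode requested"
```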

Constraints

- Use local tools only (ffmpeg, whisper-cpp, sherpa-onnx-tts).
- Maintain a fast response time: real-time factor (RTF) below 0.5, meaning processing takes less than half the duration of the audio.
- Always reply with BOTH text (for clarity) and audio.
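
The real-time factor is processing time divided by audio duration, so the budget can be checked with a short calculation. The sketch below is illustrative: `rtf` and `within_budget` are hypothetical helpers, and SKILL.md does not say whether the limit applies to transcription, synthesis, or the whole turn.

```shell
#!/usr/bin/env bash
# rtf PROCESSING_SECONDS AUDIO_SECONDS
# Prints processing/audio rounded to three decimals; fails if audio <= 0.
rtf() {
  awk -v p="$1" -v a="$2" 'BEGIN {
    if (a <= 0) { exit 1 }
    printf "%.3f\n", p / a
  }'
}

# within_budget PROCESSING_SECONDS AUDIO_SECONDS [LIMIT]
# Exits 0 when the RTF is strictly below LIMIT (default 0.5).
within_budget() {
  awk -v p="$1" -v a="$2" -v lim="${3:-0.5}" 'BEGIN {
    if (a <= 0) { exit 1 }
    exit !((p / a) < lim)
  }'
}

# Example: 2 s of processing for a 10 s clip.
# rtf 2 10            # prints 0.200
# within_budget 2 10  # exit status 0 (within the 0.5 limit)
```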

Manual Execution (Internal)

To respond with voice manually:

bin/sherpa-onnx-tts /tmp/reply.ogg "Tu mensaje aquí"

Then send /tmp/reply.ogg via the message tool, passing the path in its filePath parameter.

Files

1 total
