Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

whatsappVoiceOpenSkill

Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 1.8k · 5 current installs · 5 all-time installs
by Syed Ateebul Islam @syedateebulislam
MIT-0
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The code and docs implement transcription via a local Whisper model, intent parsing, and handlers, matching the stated purpose. However, there are important mismatches. SKILL.md and the docs claim OGG/Opus (WhatsApp's default format) works with "no FFmpeg", but the Python transcription uses soundfile/libsndfile, and libsndfile typically cannot decode Opus-in-OGG without additional codecs or FFmpeg, so the "no FFmpeg" claim is likely incorrect. Also, transcribe.py calls model.transcribe(..., language="en"), forcing English, even though the skill advertises automatic multi-language detection; the pipeline detects language in JS only after transcription, which contradicts the claim of automatic language detection at the transcription stage.
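A minimal sketch of the fix this finding suggests, assuming the `model` and `data` objects transcribe.py already uses; omitting the `language` kwarg lets Whisper run its own language-identification pass:

```python
# Hedged sketch: build kwargs for model.transcribe() so a language is
# only pinned when explicitly requested; otherwise Whisper auto-detects.
def transcribe_options(language=None):
    opts = {}
    if language:
        opts["language"] = language
    return opts

# In transcribe.py (assumed names):
#   result = model.transcribe(data, **transcribe_options())
#   result["text"], result["language"]  # transcript plus detected code, e.g. "hi"
```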
Instruction Scope
Runtime instructions are narrowly scoped to watching ~/.clawdbot/media/inbound/, saving temp files under TEMP, running the bundled transcribe.py via child_process.execSync, parsing intents, and optionally making outbound HTTP requests (weather handler fetches wttr.in). The daemon prints JSON to stdout for a parent process to handle sending via WhatsApp. The instructions do not request external credentials, nor do they read arbitrary system config files, but they do read/write local files (.voice-processed.log and temp files) and spawn a Python process — both expected for a local transcription pipeline.
Install Mechanism
There is no install spec; SKILL.md and requirements.txt ask users to pip install openai-whisper, soundfile, and numpy. Installing openai-whisper pulls in runtime dependencies and, on first use, downloads models (large downloads, significant CPU/GPU and memory use). This is a normal distribution mechanism, though a non-trivial one. There are no remote downloads of arbitrary archives in an install script.
Credentials
The skill declares no required environment variables or credentials (and none are necessary for the provided handlers). The code uses common env variables for paths (HOME/APPDATA/TEMP) only. Note: example/custom handlers reference external SDKs (drone-sdk, music-api) which, if enabled by a user, would require their own credentials — but those are optional user modifications, not required by the skill.
Persistence & Privilege
The manifest sets always: false, and the skill does not request persistent platform-wide privileges. It writes a local processed log and temporary audio files (expected for a daemon). It does not modify other skills or system-wide settings.
What to consider before installing
  • Audio format support: The README claims OGG/Opus works without FFmpeg, but transcribe.py uses soundfile/libsndfile, which often cannot read Opus-in-OGG. Test with real WhatsApp files; if you see failures, add FFmpeg-based conversion or use a decoder that supports Opus.
  • Language handling: transcribe.py forces language="en" when calling Whisper, which will degrade Hindi and other-language transcripts; the skill detects language only after transcription. If you expect multi-language input, update transcribe.py to let Whisper detect the language (or pass the correct language parameter).
  • Missing/implicit dependencies: package.json lists no dependencies, yet the code uses fetch (not global in Node versions below 18), and the examples require drone-sdk or music-api, which are not provided. Install and pin the dependencies you actually need.
  • Network and device actions: the built-in weather handler makes an HTTP request to wttr.in (expected). Example handlers show controlling drones or smart-home devices; these run only if you wire them in, but added handlers can introduce network/device access and need appropriate credentials and safety checks.
  • Resource & security posture: Whisper models are large and memory- and CPU-intensive; running locally downloads models and consumes roughly 1+ GB of RAM (per the docs). Run in a sandboxed environment first, and don't point the daemon at directories containing sensitive files you wouldn't want processed or uploaded.
  • Testing: run the daemon in a controlled environment, feed it a few sample WhatsApp voice files, and verify transcription and language detection. Inspect .voice-processed.log and confirm the parent process handling the printed JSON will not leak data to unexpected endpoints.
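For the fetch gap on older Node versions, a small guard like the following keeps the weather handler working; the node-fetch fallback is an assumed, separately installed dependency, not something the skill ships:

```javascript
// Use the built-in global fetch on Node >= 18; otherwise fall back to
// the third-party node-fetch package (assumed installed by the user).
const fetchFn = globalThis.fetch
  ? globalThis.fetch.bind(globalThis)
  : (...args) => import('node-fetch').then(({ default: f }) => f(...args));

// Example call against the handler's endpoint:
// fetchFn('https://wttr.in/?format=j1').then((res) => res.json());
```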

Like a lobster shell, security has layers — review code before you run it.

Current version v1.0.0
Download zip
latest vk970k5fb8vbnq8kadfsh69k9hd809n6p

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

WhatsApp Voice Talk

Turn WhatsApp voice messages into real-time conversations. This skill provides a complete pipeline: voice → transcription → intent detection → response generation → text-to-speech.

Perfect for:

  • Voice assistants on WhatsApp
  • Hands-free command interfaces
  • Multi-lingual chatbots
  • IoT voice control (drones, smart home, etc.)

Quick Start

1. Install Dependencies

pip install openai-whisper soundfile numpy

2. Process a Voice Message

const { processVoiceNote } = require('./scripts/voice-processor');
const fs = require('fs');

// Read a voice message (OGG, WAV, MP3, etc.)
const buffer = fs.readFileSync('voice-message.ogg');

// Process it
const result = await processVoiceNote(buffer);

console.log(result);
// {
//   status: 'success',
//   response: "Current weather in Delhi is 19°C, haze. Humidity is 56%.",
//   transcript: "What's the weather today?",
//   intent: 'weather',
//   language: 'en',
//   timestamp: 1769860205186
// }

3. Run Auto-Listener

For automatic processing of incoming WhatsApp voice messages:

node scripts/voice-listener-daemon.js

This watches ~/.clawdbot/media/inbound/ every 5 seconds and processes new voice files.

How It Works

Incoming Voice Message
        ↓
    Transcribe (local Whisper model)
        ↓
  "What's the weather?"
        ↓
  Detect Language & Intent
        ↓
   Match against INTENTS
        ↓
   Execute Handler
        ↓
   Generate Response
        ↓
   Convert to TTS
        ↓
  Send back via WhatsApp
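The "Match against INTENTS" step in the pipeline above amounts to a keyword lookup; here is a hedged sketch (the real voice-processor.js may differ):

```javascript
// Return the name of the first intent whose keywords appear in the
// transcript, or null if nothing matches.
function matchIntent(transcript, intents) {
  const text = transcript.toLowerCase();
  for (const [name, { keywords }] of Object.entries(intents)) {
    if (keywords.some((k) => text.includes(k.toLowerCase()))) return name;
  }
  return null;
}

// matchIntent("What's the weather today?", {
//   weather: { keywords: ['weather', 'मौसम'], handler: 'handleWeather' },
// }); // → 'weather'
```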

Key Features

Zero Setup Complexity - Uses soundfile + Whisper with no complex dependencies. (Note: many libsndfile builds cannot decode WhatsApp's Opus-in-OGG, so keep FFmpeg conversion ready as a fallback.)

Multi-Language - English/Hindi detection, applied after transcription. (Note: transcribe.py currently pins language="en"; remove that argument to let Whisper detect the spoken language.) Extend easily.

Intent-Driven - Define custom intents with keywords and handlers.

Real-Time Processing - 5-10 seconds per message (after first model load).

Customizable - Add weather, status, commands, or anything else.

Production Ready - Built from real usage in Clawdbot.

Common Use Cases

Weather Bot

// User says: "What's the weather in Bangalore?"
// Response: "Current weather in Delhi is 19°C..."

// (Built-in intent, just enable it)

Smart Home Control

// User says: "Turn on the lights"
// Handler: Sends signal to smart home API
// Response: "Lights turned on"

Task Manager

// User says: "Add milk to shopping list"
// Handler: Adds to database
// Response: "Added milk to your list"

Status Checker

// User says: "Is the system running?"
// Handler: Checks system status
// Response: "All systems online"

Customization

Add a Custom Intent

Edit voice-processor.js:

  1. Add to INTENTS map:
const INTENTS = {
  'shopping': {
    keywords: ['shopping', 'list', 'buy', 'खरीद'],
    handler: 'handleShopping'
  }
};
  2. Add handler:
const handlers = {
  async handleShopping(language = 'en') {
    return {
      status: 'success',
      response: language === 'en' 
        ? "What would you like to add to your shopping list?"
        : "आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?"
    };
  }
};

Support More Languages

  1. Update detectLanguage() for your language's Unicode:
const urduChars = /[\u0600-\u06FF]/g; // Add this
  2. Add language code to returns:
return language === 'ur' ? 'Urdu response' : 'English response';
  3. Set language in transcribe.py:
result = model.transcribe(data, language="ur")
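Putting the steps above together, detectLanguage() ends up along these lines; an illustrative version based on Unicode script ranges, not the skill's exact code:

```javascript
// Classify by Unicode script ranges; extend with more ranges as needed.
function detectLanguage(text) {
  if (/[\u0900-\u097F]/.test(text)) return 'hi'; // Devanagari → Hindi
  if (/[\u0600-\u06FF]/.test(text)) return 'ur'; // Arabic script → Urdu
  return 'en'; // default
}
```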

Change Transcription Model

In transcribe.py:

model = whisper.load_model("tiny")    # Fastest, 39MB
model = whisper.load_model("base")    # Default, 140MB  
model = whisper.load_model("small")   # Better, 466MB
model = whisper.load_model("medium")  # Good, 1.5GB

Architecture

Scripts:

  • transcribe.py - Whisper transcription (Python)
  • voice-processor.js - Core logic (intent parsing, handlers)
  • voice-listener-daemon.js - Auto-listener watching for new messages

References:

  • SETUP.md - Installation and configuration
  • API.md - Detailed function documentation

Integration with Clawdbot

If running as a Clawdbot skill, hook into message events:

// In your Clawdbot handler
const { processVoiceNote } = require('skills/whatsapp-voice-talk/scripts/voice-processor');

message.on('voice', async (audioBuffer) => {
  const result = await processVoiceNote(audioBuffer, message.from);
  
  // Send response back
  await message.reply(result.response);
  
  // Or send as voice (requires TTS)
  await sendVoiceMessage(result.response);
});

Performance

  • First run: ~30 seconds (downloads Whisper model, ~140MB)
  • Typical: 5-10 seconds per message
  • Memory: ~1.5GB (base model)
  • Languages: English, Hindi (easily extended)

Supported Audio Formats

OGG (Opus), WAV, FLAC, MP3, CAF, AIFF, and more via libsndfile.

WhatsApp voice notes use Opus-in-OGG by default. Note: many libsndfile builds lack Opus support, so this may not work out of the box; test with real WhatsApp files and fall back to FFmpeg conversion if decoding fails.
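If decoding fails, a hedged fallback is to convert with FFmpeg before transcribing; the 16 kHz mono target matches Whisper's expected input. Function names here are illustrative:

```python
import shutil
import subprocess

def ffmpeg_to_wav_cmd(src, dst):
    """argv converting any FFmpeg-readable file (including Opus-in-OGG)
    to 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

def convert_to_wav(src, dst):
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True)
```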

Troubleshooting

"No module named 'whisper'"

pip install openai-whisper

"No module named 'soundfile'"

pip install soundfile

Voice messages not processing?

  1. Check: clawdbot status (is it running?)
  2. Check: ~/.clawdbot/media/inbound/ (files arriving?)
  3. Run daemon manually: node scripts/voice-listener-daemon.js (see logs)

Slow transcription? Use smaller model: whisper.load_model("base") or "tiny"

Further Reading

  • Setup Guide: See references/SETUP.md for detailed installation and configuration
  • API Reference: See references/API.md for function signatures and examples
  • Examples: Check scripts/ for working code

License

MIT-0 - Use freely, customize, contribute back!


Built for real-world use in Clawdbot. Battle-tested with multiple languages and use cases.

Files

10 total
