{"skill":{"slug":"whatsapp-voice-chat-integration-open-source","displayName":"whatsappVoiceOpenSkill","summary":"Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.","description":"---\nname: whatsapp-voice-talk\ndescription: Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.\n---\n\n# WhatsApp Voice Talk\n\nTurn WhatsApp voice messages into real-time conversations. This skill provides a complete pipeline: **voice → transcription → intent detection → response generation → text-to-speech**.\n\nPerfect for:\n- Voice assistants on WhatsApp\n- Hands-free command interfaces  \n- Multi-lingual chatbots\n- IoT voice control (drones, smart home, etc.)\n\n## Quick Start\n\n### 1. Install Dependencies\n\n```bash\npip install openai-whisper soundfile numpy\n```\n\n### 2. Process a Voice Message\n\n```javascript\nconst { processVoiceNote } = require('./scripts/voice-processor');\nconst fs = require('fs');\n\n// Read a voice message (OGG, WAV, MP3, etc.)\nconst buffer = fs.readFileSync('voice-message.ogg');\n\n// Process it\nconst result = await processVoiceNote(buffer);\n\nconsole.log(result);\n// {\n//   status: 'success',\n//   response: \"Current weather in Delhi is 19°C, haze. Humidity is 56%.\",\n//   transcript: \"What's the weather today?\",\n//   intent: 'weather',\n//   language: 'en',\n//   timestamp: 1769860205186\n// }\n```\n\n### 3. 
Run Auto-Listener\n\nFor automatic processing of incoming WhatsApp voice messages:\n\n```bash\nnode scripts/voice-listener-daemon.js\n```\n\nThis polls `~/.clawdbot/media/inbound/` every 5 seconds and processes new voice files.\n\n## How It Works\n\n```\nIncoming Voice Message\n        ↓\n    Transcribe (local Whisper model)\n        ↓\n  \"What's the weather?\"\n        ↓\n  Detect Language & Intent\n        ↓\n   Match against INTENTS\n        ↓\n   Execute Handler\n        ↓\n   Generate Response\n        ↓\n   Convert to TTS\n        ↓\n  Send back via WhatsApp\n```\n\n## Key Features\n\n✅ **Zero Setup Complexity** - No FFmpeg, no complex dependencies. Uses soundfile + Whisper.\n\n✅ **Multi-Language** - Automatic English/Hindi detection. Extend easily.\n\n✅ **Intent-Driven** - Define custom intents with keywords and handlers.\n\n✅ **Real-Time Processing** - 5-10 seconds per message (after first model load).\n\n✅ **Customizable** - Add weather, status, commands, or anything else.\n\n✅ **Production Ready** - Built from real usage in Clawdbot.\n\n## Common Use Cases\n\n### Weather Bot\n```javascript\n// User says: \"What's the weather in Delhi?\"\n// Response: \"Current weather in Delhi is 19°C...\"\n\n// (Built-in intent, just enable it)\n```\n\n### Smart Home Control\n```javascript\n// User says: \"Turn on the lights\"\n// Handler: Sends signal to smart home API\n// Response: \"Lights turned on\"\n```\n\n### Task Manager\n```javascript\n// User says: \"Add milk to shopping list\"\n// Handler: Adds to database\n// Response: \"Added milk to your list\"\n```\n\n### Status Checker\n```javascript\n// User says: \"Is the system running?\"\n// Handler: Checks system status\n// Response: \"All systems online\"\n```\n\n## Customization\n\n### Add a Custom Intent\n\nEdit `voice-processor.js`:\n\n1. **Add to INTENTS map:**\n```javascript\nconst INTENTS = {\n  'shopping': {\n    keywords: ['shopping', 'list', 'buy', 'खरीद'],\n    handler: 'handleShopping'\n  }\n};\n```\n\n2. 
**Add handler:**\n```javascript\nconst handlers = {\n  async handleShopping(language = 'en') {\n    return {\n      status: 'success',\n      response: language === 'en' \n        ? \"What would you like to add to your shopping list?\"\n        : \"आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?\"\n    };\n  }\n};\n```\n\n### Support More Languages\n\n1. Update `detectLanguage()` for your language's Unicode:\n```javascript\nconst urduChars = /[\\u0600-\\u06FF]/g; // Add this\n```\n\n2. Add language code to returns:\n```javascript\nreturn language === 'ur' ? 'Urdu response' : 'English response';\n```\n\n3. Set language in `transcribe.py`:\n```python\nresult = model.transcribe(data, language=\"ur\")\n```\n\n### Change Transcription Model\n\nIn `transcribe.py`:\n```python\nmodel = whisper.load_model(\"tiny\")    # Fastest, 39MB\nmodel = whisper.load_model(\"base\")    # Default, 140MB  \nmodel = whisper.load_model(\"small\")   # Better, 466MB\nmodel = whisper.load_model(\"medium\")  # Good, 1.5GB\n```\n\n## Architecture\n\n**Scripts:**\n- `transcribe.py` - Whisper transcription (Python)\n- `voice-processor.js` - Core logic (intent parsing, handlers)\n- `voice-listener-daemon.js` - Auto-listener watching for new messages\n\n**References:**\n- `SETUP.md` - Installation and configuration\n- `API.md` - Detailed function documentation\n\n## Integration with Clawdbot\n\nIf running as a Clawdbot skill, hook into message events:\n\n```javascript\n// In your Clawdbot handler\nconst { processVoiceNote } = require('skills/whatsapp-voice-talk/scripts/voice-processor');\n\nmessage.on('voice', async (audioBuffer) => {\n  const result = await processVoiceNote(audioBuffer, message.from);\n  \n  // Send response back\n  await message.reply(result.response);\n  \n  // Or send as voice (requires TTS)\n  await sendVoiceMessage(result.response);\n});\n```\n\n## Performance\n\n- **First run:** ~30 seconds (downloads Whisper model, ~140MB)\n- **Typical:** 5-10 seconds per message\n- 
**Memory:** ~1.5GB (base model)\n- **Languages:** English, Hindi (easily extended)\n\n## Supported Audio Formats\n\nOGG (Opus), WAV, FLAC, MP3, CAF, AIFF, and more via libsndfile.\n\nWhatsApp voice notes use Opus-encoded OGG by default, which works out of the box.\n\n## Troubleshooting\n\n**\"No module named 'whisper'\"**\n```bash\npip install openai-whisper\n```\n\n**\"No module named 'soundfile'\"**\n```bash\npip install soundfile\n```\n\n**Voice messages not processing?**\n1. Check: `clawdbot status` (is it running?)\n2. Check: `~/.clawdbot/media/inbound/` (files arriving?)\n3. Run daemon manually: `node scripts/voice-listener-daemon.js` (see logs)\n\n**Slow transcription?**\nSwitch to a smaller model in `transcribe.py`, for example `whisper.load_model(\"tiny\")`. `\"base\"` is the default, so it only helps if you previously moved to `\"small\"` or `\"medium\"`.\n\n## Further Reading\n\n- **Setup Guide:** See `references/SETUP.md` for detailed installation and configuration\n- **API Reference:** See `references/API.md` for function signatures and examples\n- **Examples:** Check `scripts/` for working code\n\n## License\n\nMIT - Use freely, customize, contribute back!\n\n---\n\nBuilt for real-world use in Clawdbot. Battle-tested with multiple languages and use cases.\n","topics":["WhatsApp","Message","Transcribe","Weather"],"tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":2951,"installsAllTime":111,"installsCurrent":5,"stars":0,"versions":1},"createdAt":1769861237736,"updatedAt":1779076508116},"latestVersion":{"version":"1.0.0","createdAt":1769861237736,"changelog":"opensource skill setup for whatsapp voice chat with your bot","license":null},"metadata":null,"owner":{"handle":"syedateebulislam","userId":"s17b3nwy5pvzwkr77v275t0dtd8850qp","displayName":"Syed Ateebul Islam","image":"https://avatars.githubusercontent.com/u/32341313?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1779918421608}}