Virtual voice builder

v1.0.0

Wires a real microphone through an AI brain (STT → LLM → TTS) and routes the output to a virtual audio cable so apps like Google Meet hear the processed voic...

by Suhas Rudra (@suhas12345685-pro)

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

ℹ

Purpose & Capability

The name/description match the code and scripts: audio capture via ffmpeg, Deepgram STT, an LLM call, sentence chunking, TTS and writing to a virtual cable. Required external services (Deepgram, LLM provider, TTS provider) are appropriate for the stated purpose. Minor mismatch: the REQUIRED list in index.js includes TTS_VOICE_ID, but the top-level registry 'required env' list omits it. The repo references TTS_VOICE_ID in several places (scripts/06_tts_ws.js and the package docs), so the registry metadata is incomplete.

ℹ

Instruction Scope

SKILL.md instructs running the provided scripts, installing ffmpeg/virtual audio driver, and explicitly instructs to write API keys into the project's .env file. The code reads environment vars and opens WebSocket connections to external services (Deepgram, provider LLM endpoints, ElevenLabs/Cartesia). These actions are consistent with the function but the instruction to persist secrets into .env increases risk if users expect keys to remain only in memory. A pre-scan detected a 'system-prompt-override' pattern in SKILL.md — the file contains 'Critical Rules' and runtime guidance that could be interpreted as attempting to influence agent/system prompts; surface this for review.

✓

Install Mechanism

There is no automated install that downloads arbitrary binaries; the package is instruction- and code-based (no external URL downloads or archive extraction). It depends on ffmpeg and the user installing a virtual audio driver; both are expected for this functionality.

Credentials

The skill requires API keys for STT, LLM, and TTS, which is proportionate to its function. However, there are two issues: (1) index.js and several scripts require TTS_VOICE_ID, but the top-level registry 'required env' list omits it (metadata mismatch). (2) scripts/04_llm_stream.js contains a bug in its env-check logic: it filters REQUIRED with an expression that excludes 'LLM_API_KEY', so that key is never validated. The code may therefore attempt to run without the LLM key or behave unexpectedly, an implementation bug that could lead to confusing failures or accidental credential misuse. In addition, SKILL.md explicitly tells the agent to write keys to a .env file, which may not be acceptable to all users.
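For comparison, an env check that validates every key, including LLM_API_KEY and TTS_VOICE_ID, might look like the sketch below. This is not the skill's actual code: the function names `missingEnv` and `assertEnv` are hypothetical, and the list is simply the registry's env list with TTS_VOICE_ID added.

```javascript
// Hypothetical names throughout; this is not the skill's actual code.
const REQUIRED = [
  'DEEPGRAM_API_KEY',
  'LLM_PROVIDER',
  'LLM_API_KEY',
  'LLM_MODEL',
  'TTS_PROVIDER',
  'TTS_API_KEY',
  'TTS_VOICE_ID',
  'VIRTUAL_CABLE_NAME',
];

// Return the names of required variables that are unset or blank.
function missingEnv(env, required = REQUIRED) {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Throw before any network connection is opened if anything is missing.
function assertEnv(env = process.env) {
  const missing = missingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
}
```

Checking the whole list with no per-key exclusions means a missing key fails fast with a clear message instead of surfacing later as a provider error.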

✓

Persistence & Privilege

The skill does not request permanent 'always:true' inclusion, does not modify other skills, and has no install step that alters system-wide agent settings. It exports a start/stop API and spawns child processes (ffmpeg) — appropriate for its role.

Scan Findings in Context

[system-prompt-override] unexpected: SKILL.md contains prescriptive 'Critical Rules' and runtime instructions; the scanner flagged a 'system-prompt-override' pattern. While developer guidance is expected, any content that looks like it tries to override or strongly influence model/system prompts should be reviewed. This is not direct evidence of maliciousness, but it is unexpected for a simple runtime instruction file and warrants human review.

What to consider before installing

This skill appears to implement the described mic → AI → virtual-cable pipeline, but review these before installing:

1) Inspect and confirm required environment variables: ensure TTS_VOICE_ID is provided (code expects it) and update your .env only if you accept storing keys in a file.
2) The LLM env-check in scripts/04_llm_stream.js contains a bug that bypasses LLM_API_KEY validation; either fix that check or be prepared for runtime errors.
3) The SKILL.md was flagged for a 'system-prompt-override' pattern: read the file yourself and ensure it doesn't include instructions that would alter agent/system prompts or perform unexpected actions.
4) Run the scripts in a controlled environment first (not as root), test each stage independently (device listing, capture, STT, LLM stream, TTS), and avoid granting more privileges than necessary.
5) If you will use real API keys, consider using short-lived keys or scoped accounts and remove keys from .env after testing; monitor outbound connections (Deepgram, your LLM provider, and TTS provider endpoints) to confirm they match expectations.

If any of these inconsistencies concern you or you don't want to store API keys in a project file, treat this skill as potentially risky until corrected.
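One way to act on item 5 is to check every outbound URL against an explicit allowlist before connecting. The sketch below is illustrative only: `isExpectedEndpoint` is a hypothetical helper, and the hostnames are assumptions that should be confirmed against each provider's documentation and your own configuration.

```javascript
// Hostnames below are assumptions for illustration; confirm the real
// endpoints against each provider's documentation and your own config.
const EXPECTED_HOSTS = new Set([
  'api.deepgram.com',
  'api.elevenlabs.io',
  'api.cartesia.ai',
]);

// True only when the URL parses, uses TLS (https: or wss:), and its
// hostname is exactly one of the expected hosts.
function isExpectedEndpoint(urlString, hosts = EXPECTED_HOSTS) {
  let url;
  try {
    url = new URL(urlString);
  } catch {
    return false;
  }
  const secure = url.protocol === 'https:' || url.protocol === 'wss:';
  return secure && hosts.has(url.hostname);
}
```

Matching on the exact hostname, not a substring, keeps look-alike domains such as `api.deepgram.com.evil.example` from passing.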

✗

scripts/01_list_device.js:23

Shell command execution detected (child_process).

✗

scripts/02_list.js:23

Shell command execution detected (child_process).

✗

scripts/07_pcm_write.js:60

Shell command execution detected (child_process).

✗

scripts/04_llm_stream.js:12

Environment variable access combined with network send.

Patterns worth reviewing

These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.
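The child_process findings are expected for a pipeline that drives ffmpeg; what matters in review is whether commands are built from an argument array or passed through a shell. As a point of comparison, here is a minimal sketch of device listing without a shell. It assumes Windows DirectShow, and the function names are hypothetical. The parsing also assumes that ffmpeg prints device names in double quotes on stderr, which can vary between ffmpeg versions.

```javascript
const { spawn } = require('node:child_process');

// Arguments for listing DirectShow devices (Windows only).
const LIST_ARGS = ['-hide_banner', '-list_devices', 'true', '-f', 'dshow', '-i', 'dummy'];

// Pull quoted device names out of ffmpeg's stderr text.
// Lines holding "@device..." alternative names are skipped.
function extractDeviceNames(stderrText) {
  const names = [];
  for (const line of stderrText.split(/\r?\n/)) {
    const match = /"([^"]+)"/.exec(line);
    if (match && !match[1].startsWith('@device')) names.push(match[1]);
  }
  return names;
}

// Run ffmpeg with an argument array (no shell) and resolve with device names.
function listDevices() {
  return new Promise((resolve, reject) => {
    const child = spawn('ffmpeg', LIST_ARGS, { stdio: ['ignore', 'ignore', 'pipe'] });
    let stderr = '';
    child.stderr.setEncoding('utf8');
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', reject);
    child.on('close', () => resolve(extractDeviceNames(stderr)));
  });
}
```

Because `spawn` is called without `shell: true`, each array element reaches ffmpeg as a single argument and shell metacharacters in a device name are not interpreted.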

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Env: DEEPGRAM_API_KEY, LLM_PROVIDER, LLM_API_KEY, LLM_MODEL, TTS_PROVIDER, TTS_API_KEY, VIRTUAL_CABLE_NAME

latest vk97ea7cbyy38xfkzyg302d1egn83tb37

89 downloads

0 stars

1 version

Updated 3w ago

v1.0.0

MIT-0

Voice Pipeline Skill

Guide the user through building a real-time audio interception pipeline in Node.js. The pipeline: Mic → ffmpeg resample → Deepgram STT → LLM → Sentence chunker → TTS → PCM decode → VB-Cable.
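The "ffmpeg resample" stage can be sketched as follows, using the 16 kHz mono PCM target stated in the Critical Rules. This is a sketch under assumptions, not the contents of scripts/02_capture_resample.js: the function names are hypothetical, and the input format and device string depend on the OS (for example `dshow` and `audio=<device name>` on Windows).

```javascript
const { spawn } = require('node:child_process');

// Build ffmpeg arguments that read one capture device and emit raw
// 16 kHz mono signed 16-bit little-endian PCM on stdout.
// inputFormat and inputDevice are placeholders supplied by the caller.
function resampleArgs(inputFormat, inputDevice) {
  return [
    '-hide_banner',
    '-loglevel', 'error',
    '-f', inputFormat,
    '-i', inputDevice,
    '-ac', '1',      // mono
    '-ar', '16000',  // 16 kHz
    '-f', 's16le',   // raw PCM, no container
    'pipe:1',
  ];
}

// Spawn ffmpeg without a shell; the caller reads PCM from child.stdout.
function startCapture(inputFormat, inputDevice) {
  return spawn('ffmpeg', resampleArgs(inputFormat, inputDevice), {
    stdio: ['ignore', 'pipe', 'inherit'],
  });
}
```

A caller would read chunks from `startCapture(...).stdout` and forward them to the STT connection.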

Capabilities

OS Prep: Walk through virtual audio driver install and ffmpeg setup.
Device Discovery: Run scripts/01_list_devices.js to find the exact VB-Cable name.
Step-by-step build: Execute scripts 01–07 in order, stress-testing each before proceeding.
Kill switch: Export a stop() function the host app calls to instantly halt the pipeline.
Env wiring: Write required keys to the project's .env file.
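A kill switch of the kind described here could be sketched as below. The Critical Rules require it to be IPC-based, so the example listens for a message from a parent process. Everything in it is an assumption for illustration: the controller shape, the `{ type: 'stop' }` message format, and the names `createPipelineController` and `attachIpcKillSwitch`.

```javascript
// Hypothetical controller: tracks child processes and sockets so stop()
// can tear everything down in one call.
function createPipelineController() {
  const children = new Set();
  const sockets = new Set();
  let running = false;

  return {
    start() { running = true; },
    trackChild(child) { children.add(child); return child; },
    trackSocket(socket) { sockets.add(socket); return socket; },
    isRunning() { return running; },
    // Returns true if the pipeline was running and has now been halted.
    stop() {
      if (!running) return false;
      running = false;
      for (const socket of sockets) socket.close();
      for (const child of children) child.kill();
      sockets.clear();
      children.clear();
      return true;
    },
  };
}

// IPC wiring: when this process was forked by the host app, a
// { type: 'stop' } message triggers the kill switch.
function attachIpcKillSwitch(controller, proc = process) {
  proc.on('message', (msg) => {
    if (msg && msg.type === 'stop') controller.stop();
  });
}
```

With this shape, a host that started the pipeline via `child_process.fork` would call `child.send({ type: 'stop' })`, while a host in the same process would call `stop()` directly.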

Critical Rules

Never use naudiodon or electron — both are incompatible with a portable Node.js skill.
Always resample to 16kHz mono PCM before sending to Deepgram.
Never stream raw LLM tokens to TTS — buffer to sentence boundaries first.
TTS output (MP3) must be decoded to PCM before writing to VB-Cable.
The kill switch must be IPC-based, not globalShortcut (Electron-only).
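The sentence-buffering rule can be illustrated with a small chunker. This is a minimal sketch, not the contents of scripts/05_sentence_chunker.js. The name `createSentenceChunker` is hypothetical, and the boundary rule is an assumption: terminal punctuation followed by whitespace. It is deliberately simple and will also split after abbreviations such as "Dr. ".

```javascript
// Buffers streamed tokens and emits complete sentences.
// A sentence ends at '.', '!' or '?' (plus optional closing quotes or
// brackets) followed by whitespace.
function createSentenceChunker(onSentence) {
  let buffer = '';
  const boundary = /[.!?]+["')\]]*\s+/;

  return {
    // Append one streamed token and emit every sentence now complete.
    push(token) {
      buffer += token;
      let match = boundary.exec(buffer);
      while (match !== null) {
        const end = match.index + match[0].length;
        const sentence = buffer.slice(0, end).trim();
        if (sentence) onSentence(sentence);
        buffer = buffer.slice(end);
        match = boundary.exec(buffer);
      }
    },
    // Call when the LLM stream ends to emit any trailing text.
    flush() {
      const rest = buffer.trim();
      buffer = '';
      if (rest) onSentence(rest);
    },
  };
}
```

Waiting for whitespace after the punctuation means a token ending in "3." is not emitted until the next token shows whether it continues as "3.14".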

Execution Order

Run each script independently to stress-test before wiring together.

Step 1 → OS prep (no code, see references/step_by_step.md)
Step 2 → scripts/package.json + npm install
Step 3 → node scripts/01_list_devices.js
Step 4 → node scripts/02_capture_resample.js
Step 5 → node scripts/03_deepgram_ws.js
Step 6 → node scripts/04_llm_stream.js + scripts/05_sentence_chunker.js
Step 7 → node scripts/06_tts_ws.js + scripts/07_pcm_write.js
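Step 7 depends on decoding the TTS MP3 stream to PCM before anything is written to the virtual cable. The decode half might be sketched as below. The function names are hypothetical, and the 48 kHz stereo default is an assumption that should be matched to the output device. Writing the PCM to VB-Cable itself is platform-specific and is not shown.

```javascript
const { spawn } = require('node:child_process');

// ffmpeg arguments: MP3 on stdin, raw signed 16-bit little-endian PCM on stdout.
// sampleRate and channels are assumptions; match them to the output device.
function decodeArgs(sampleRate = 48000, channels = 2) {
  return [
    '-hide_banner',
    '-loglevel', 'error',
    '-f', 'mp3',
    '-i', 'pipe:0',
    '-f', 's16le',
    '-ar', String(sampleRate),
    '-ac', String(channels),
    'pipe:1',
  ];
}

// Pipe an MP3 readable stream through ffmpeg and return the PCM stream.
function decodeMp3ToPcm(mp3Stream, sampleRate, channels) {
  const ffmpeg = spawn('ffmpeg', decodeArgs(sampleRate, channels), {
    stdio: ['pipe', 'pipe', 'inherit'],
  });
  mp3Stream.pipe(ffmpeg.stdin);
  return ffmpeg.stdout;
}
```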

Progressive Loading

For architecture diagram and signal flow: read references/architecture.md.
For full corrected step-by-step: read references/step_by_step.md.
For all required env vars: read references/env_schema.md.
