Voice Bridge Light

v1.0.1

A lightweight local voice-bridge HTTP service with an OpenAI-compatible interface, providing STT (Whisper) and TTS (Edge TTS/Piper).

0 stars · 185 downloads · 2 current · 2 all-time

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for fangbb-coder/voice-bridge-light.

Prompt Preview: Install & Setup
Install the skill "Voice Bridge Light" (fangbb-coder/voice-bridge-light) from ClawHub.
Skill page: https://clawhub.ai/fangbb-coder/voice-bridge-light
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install voice-bridge-light

ClawHub CLI

Package manager switcher

npx clawhub@latest install voice-bridge-light
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, SKILL.md, requirements.txt and api_server.py are coherent: the code implements Whisper-based STT and Edge/Piper TTS and the listed Python packages match those capabilities. There are no unrelated environment variables, binaries, or credentials requested that would be inconsistent with a local TTS/STT bridge.
Instruction Scope
Runtime instructions stay within the stated purpose (install Python deps, run the Flask server, call /audio/speech and /audio/transcriptions). Notable operational points the user should be aware of: the default host is 0.0.0.0 and port 18790 (exposes the HTTP API to the network unless firewall/binding changed), Edge TTS uses Microsoft online services (network access), and the systemd example runs as root in /root/.openclaw/... (poor practice but not hidden behavior). The code writes uploaded audio to a temporary file and deletes it after transcription (expected for STT).
Install Mechanism
There is no registry-level install script, but the bundle includes requirements.txt and skill.yaml with a pip install -r requirements.txt instruction. Dependencies are standard PyPI packages (no obscure download URLs or extracted archives). This is a normal, low-risk install mechanism for Python packages, though some packages will pull large model artifacts at runtime.
Credentials
No required secrets or credentials are declared. The environment variables documented (host, port, engine selection, model paths, voice selection, model size) are appropriate for configuring a local TTS/STT service. The skill does not request unrelated credentials or access to other services beyond optional Edge TTS network use.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges in its manifest. The SKILL.md includes an optional systemd service example that runs the server as root and in /root/.openclaw/workspace — running as root is unnecessary and increases risk; this is an operational recommendation in the docs rather than hidden behavior in the code.
Assessment
This skill appears to do what it claims, but consider these precautions before installing:

  • Network exposure: by default the server listens on 0.0.0.0:18790. If you don't want the service reachable from other hosts, bind to localhost or restrict access with a firewall.
  • Edge TTS uses Microsoft online services (requires outbound network). If you need fully offline TTS, configure TTS_ENGINE=piper and provide local Piper models.
  • Model downloads and memory usage: faster-whisper and Piper may download large model files and use hundreds of MB of RAM. Ensure you have disk and RAM headroom.
  • Avoid running the service as root. If you use the provided systemd unit, change User to a non-privileged account and set WorkingDirectory appropriately.
  • Verify Python package sources (PyPI) and review package reputations if you require higher assurance.
  • If you need strong confidentiality, review access controls for the HTTP endpoint and consider adding authentication or only binding to localhost.


latest vk97d9gw5pn8y9scaw1z6tft6th8340sb
185 downloads · 0 stars · 2 versions
Updated 1mo ago · v1.0.1 · MIT-0

Voice Bridge Light

Lightweight local voice-bridging service providing an OpenAI-compatible STT/TTS HTTP API.

Features

  • TTS (Text-to-Speech): supports Edge TTS (online) and Piper (local)
  • STT (Speech Recognition): local recognition based on Whisper
  • OpenAI-compatible API: compatible with the OpenAI Audio API
  • Lightweight deployment: minimal dependencies, easy to install

Usage

Installation

pip install -r requirements.txt

Start Service

Default using Edge TTS:

python api_server.py

Using Piper (model required):

TTS_ENGINE=piper PIPER_MODEL=models/piper/zh_CN-huayan-medium.onnx python api_server.py

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check |
| /audio/speech | POST | TTS speech synthesis |
| /audio/transcriptions | POST | STT speech recognition |

Configuration Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| VOICE_BRIDGE_HOST | 0.0.0.0 | Listen address |
| VOICE_BRIDGE_PORT | 18790 | Listen port |
| TTS_ENGINE | edge | TTS engine: edge or piper |
| EDGE_VOICE | zh-CN-XiaoxiaoNeural | Edge TTS voice |
| PIPER_MODEL | models/piper/zh_CN-huayan-medium.onnx | Piper model path |
| STT_MODEL | base | Whisper model size |
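
The server presumably reads these variables with standard environment lookups; a minimal sketch (variable names come from the table above, the parsing logic in api_server.py is assumed and may differ):

```python
import os

# Defaults mirror the configuration table; api_server.py may parse them differently.
HOST = os.environ.get("VOICE_BRIDGE_HOST", "0.0.0.0")
PORT = int(os.environ.get("VOICE_BRIDGE_PORT", "18790"))
TTS_ENGINE = os.environ.get("TTS_ENGINE", "edge")  # "edge" or "piper"
EDGE_VOICE = os.environ.get("EDGE_VOICE", "zh-CN-XiaoxiaoNeural")
PIPER_MODEL = os.environ.get("PIPER_MODEL", "models/piper/zh_CN-huayan-medium.onnx")
STT_MODEL = os.environ.get("STT_MODEL", "base")  # Whisper model size

print(HOST, PORT, TTS_ENGINE)
```

To keep the API off the network, export VOICE_BRIDGE_HOST=127.0.0.1 before starting the server.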

TTS Request Example

curl -X POST http://localhost:18790/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, world!",
    "voice": "zh-CN-XiaoxiaoNeural",
    "response_format": "mp3"
  }' \
  --output speech.mp3
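
The same request can be made from Python with only the standard library. This is a sketch following the field names in the curl example above, not the project's own client code; build_speech_body and synthesize are illustrative names:

```python
import json
import urllib.request

def build_speech_body(text, voice="zh-CN-XiaoxiaoNeural", response_format="mp3"):
    """JSON body for POST /audio/speech, matching the curl example."""
    return json.dumps({
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }).encode("utf-8")

def synthesize(text, url="http://localhost:18790/audio/speech", out_path="speech.mp3"):
    """POST the text and write the returned audio bytes to out_path."""
    req = urllib.request.Request(
        url,
        data=build_speech_body(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

# Example (requires a running server):
# synthesize("Hello, world!")
```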

STT Request Example

curl -X POST http://localhost:18790/audio/transcriptions \
  -F "file=@speech.mp3"

Note: do not set a Content-Type header by hand here; curl's -F option sets multipart/form-data with the correct boundary automatically, and overriding it breaks the upload.
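
An equivalent upload from Python, assuming the third-party requests package is available (a sketch; with files=, requests generates the multipart boundary itself, so no Content-Type header is set by hand):

```python
import mimetypes

def audio_content_type(filename):
    """Best-effort MIME type for the uploaded audio file."""
    return mimetypes.guess_type(filename)[0] or "application/octet-stream"

def transcribe(path, url="http://localhost:18790/audio/transcriptions"):
    """POST an audio file as multipart/form-data and return the JSON result."""
    import requests  # third-party; pip install requests
    with open(path, "rb") as f:
        files = {"file": (path, f, audio_content_type(path))}
        resp = requests.post(url, files=files)
    resp.raise_for_status()
    return resp.json()

# Example (requires a running server):
# print(transcribe("speech.mp3"))
```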

OpenClaw Integration

Configure in openclaw.json:

{
  "tts": {
    "enabled": true,
    "provider": "local-piper",
    "baseUrl": "http://127.0.0.1:18790",
    "apiKey": "local",
    "voice": "zh-CN-XiaoxiaoNeural"
  }
}

Dependencies

  • Python 3.8+
  • edge-tts (Edge TTS)
  • faster-whisper (Whisper STT)
  • soundfile (audio processing)
  • Flask + Flask-CORS (web service)

Service Management

systemd Service (Recommended)

[Unit]
Description=Voice Bridge Light - STT/TTS HTTP API
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/.openclaw/workspace/skills/voice-bridge-light
ExecStart=/usr/bin/python3 api_server.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Enable and start:

systemctl daemon-reload
systemctl enable voice-bridge-light.service
systemctl start voice-bridge-light.service
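
The security assessment earlier on this page recommends against running the server as root. A hardened variant of the unit above might look like this; the user name and working directory are placeholders to adapt to your system:

```ini
[Unit]
Description=Voice Bridge Light - STT/TTS HTTP API
After=network.target

[Service]
Type=simple
# Run as an unprivileged account instead of root (placeholder name).
User=voicebridge
WorkingDirectory=/opt/voice-bridge-light
# Bind to localhost unless the API must be reachable from other hosts.
Environment=VOICE_BRIDGE_HOST=127.0.0.1
ExecStart=/usr/bin/python3 api_server.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```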

Performance

  • TTS latency: < 1s (Edge TTS requires network access)
  • STT latency: depends on audio length; runs in roughly real time on CPU
  • Memory usage: ~300-500 MB (mostly the Whisper model)

Notes

  • Edge TTS requires internet access to Microsoft services
  • Piper requires downloading model files on first use
  • The Whisper model loads slowly on first run; a warm-up request is recommended
  • For production, managing the service with systemd is recommended

License

MIT
