Voice Chat Skill

v1.0.0

A voice-chat integration skill supporting two-way spoken conversation. Uses TTS and STT to implement complete voice dialogue functionality.

Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name/description (two-way voice dialogue using TTS/STT) matches the included Python code (speech_recognition, pyaudio, local TTS hooks). However, the documentation and examples also reference Node-based OpenClaw TTS tooling and cloud APIs (ElevenLabs, OpenAI, Whisper), while the skill's declared requirements list only Python — a partial mismatch. The code hardcodes an OpenClaw path pointing at a Windows npm global folder (C:\Users\41728\AppData\Roaming\npm\node_modules\openclaw), which is unusual but could be an environment-specific convenience rather than malicious intent.
Instruction Scope
SKILL.md and code direct the agent to access the microphone, list devices, create temporary files, and optionally call external services (Google STT, Whisper local model, ElevenLabs TTS). The docs include a subprocess example that invokes `node path/to/openclaw/tts-tool.js` and the ElevenLabs example reads ELEVENLABS_API_KEY from the environment — but the skill metadata declares no required env vars. The README also references install_deps.ps1 and voice_chat_launcher.ps1 that are not present in the package. These gaps expand runtime behavior beyond what's declared and could surprise a user (e.g., network calls, usage of Node, optional API keys).
Install Mechanism
There is no install spec (instruction-only install), so nothing is automatically downloaded or executed by the platform. That lowers install-time risk. The included source is plain Python (no obfuscated code).
Credentials
The skill declares no required environment variables, but SKILL.md/code reference environment keys (e.g., ELEVENLABS_API_KEY via os.environ.get) and mention OpenAI API use. It also shows an example calling a Node tts-tool (which implies Node must be present). These undeclared dependencies/credentials are disproportionate to the declared metadata and should be declared explicitly so a user can decide whether to provide them.
Persistence & Privilege
The skill does not request always:true and is user-invocable only. It does not attempt to modify other skills or system-wide agent settings. No elevated persistence or privilege escalation is requested in the package.
What to consider before installing
This skill appears to implement the advertised voice chat features, but there are several inconsistencies you should address before installing or running it:

- Binaries and runtime: the skill metadata only requires Python, but the docs/examples show calling a Node-based OpenClaw TTS tool. If you plan to use OpenClaw TTS as shown, ensure Node and the referenced tts-tool script exist and come from a trusted source.
- Environment variables / API keys: the code references ELEVENLABS_API_KEY (and mentions OpenAI) but the skill doesn't declare or require any env vars. Do not provide API keys unless you trust the maintainer and have confirmed which keys are actually needed. Consider running the skill without keys to verify local-only behavior first.
- Missing files & hardcoded paths: the README mentions install_deps.ps1 and launcher scripts that are not included, and a Windows-specific openclaw_path is hardcoded. Ask the publisher to clarify, and to remove or parameterize the hardcoded paths.
- Network activity: some modes check network connectivity (urllib to baidu) and the ElevenLabs example makes outbound requests. If you need to keep audio data local for privacy, avoid enabling cloud STT/TTS or supplying cloud keys.
- Safe testing: run the package in a controlled environment (sandbox or VM) first. Inspect or search for any unexpected subprocess invocations or external endpoints (the visible code shows only typical TTS/STT and benign subprocess/play commands, but the node path and example subprocess calls would execute external code if present).

If you want to proceed, request that the maintainer update the skill metadata to list all required binaries (node?), required env vars (ELEVENLABS_API_KEY, OPENAI_API_KEY if used), and provide the missing install/launcher scripts or remove references to them. That will make the skill's behavior transparent and easier to judge.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎤 Clawdis
Bins: python
latest: vk974zgxcs04vt8a4c7s86hw41h823dsr
332 downloads · 0 stars · 1 version
Updated 1mo ago
v1.0.0
MIT-0

Voice Dialogue Skill

Implements complete two-way voice conversation, supporting both voice input and voice output.

Features

✅ Implemented

  1. Text-to-speech (TTS)

    • Uses the OpenClaw built-in tts tool
    • Supports mixed Chinese and English
    • Real-time audio generation
  2. Speech-to-text (STT)

    • Uses the Python speech_recognition library
    • Supports microphone input
    • Multiple engine support (Google, Whisper, etc.)
  3. Conversation management

    • Automatic voice detection
    • Conversation context retention
    • Interrupt handling

🔧 Technical architecture

Voice input → STT conversion → text processing → AI response → TTS conversion → voice output
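The pipeline above can be sketched as plain functions wired together. This is a minimal sketch: `fake_stt`, `generate_response`, `fake_tts`, and `run_pipeline` are placeholder names invented for this example, standing in for the real engines described in the sections below.

```python
# Minimal sketch of the voice pipeline with the audio stages stubbed out.
# The placeholders pass text through as bytes so the flow can be run and
# tested without a microphone or any TTS/STT engine installed.

def fake_stt(audio: bytes) -> str:
    """Stand-in for a real STT engine (Google, Whisper, ...)."""
    return audio.decode("utf-8")  # pretend the "audio" is already text

def generate_response(text: str) -> str:
    """Stand-in for the AI response step."""
    return f"I heard you say: {text}"

def fake_tts(text: str) -> bytes:
    """Stand-in for a real TTS engine (OpenClaw tts, ElevenLabs, ...)."""
    return text.encode("utf-8")

def run_pipeline(audio_in: bytes) -> bytes:
    # voice input -> STT -> text processing -> AI response -> TTS -> voice output
    text = fake_stt(audio_in)
    reply = generate_response(text)
    return fake_tts(reply)

print(run_pipeline(b"hello").decode())  # I heard you say: hello
```

Swapping any stage for a real implementation (e.g. Whisper for `fake_stt`) leaves the rest of the flow unchanged.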

Installation requirements

Required components

  1. Python 3.8+
  2. speech_recognition library
  3. pyaudio library (Windows needs an extra installation step)

Optional components

  1. Whisper - more accurate local STT
  2. ElevenLabs API - high-quality TTS
  3. OpenAI API - cloud STT

Quick start

1. Install dependencies

# Install the Python libraries
pip install SpeechRecognition pyaudio

# Windows pyaudio install (if the above fails)
pip install pipwin
pipwin install pyaudio

2. Basic voice chat script

# voice_chat.py
import speech_recognition as sr
import subprocess  # used by the TTS helpers added in later sections
import tempfile
import os

class VoiceChat:
    def __init__(self):
        self.recognizer = sr.Recognizer()
        self.microphone = sr.Microphone()
        
    def listen(self):
        """Listen for voice input and convert it to text."""
        with self.microphone as source:
            print("🎤 Please speak...")
            audio = self.recognizer.listen(source)
            
        try:
            text = self.recognizer.recognize_google(audio, language='zh-CN')
            print(f"📝 Recognized: {text}")
            return text
        except sr.UnknownValueError:
            return "Could not understand the audio"
        except sr.RequestError:
            return "Speech recognition service is unavailable"
    
    def speak(self, text):
        """Read text aloud with OpenClaw TTS."""
        # Hook point for the OpenClaw tts tool
        print(f"🗣️ Speaking: {text}")
        # Integrate the OpenClaw tts tool here
        
    def conversation_loop(self):
        """Conversation loop."""
        print("🎧 Voice chat started; say \"退出\" or press Ctrl+C to exit")
        while True:
            # Listen for speech
            user_input = self.listen()
            
            if not user_input:
                continue
            if "退出" in user_input:  # "退出" means "exit"
                break
            
            # Generate a response (plug an AI model in here)
            response = f"I heard you say: {user_input}"
            
            # Speak the response
            self.speak(response)

if __name__ == "__main__":
    chat = VoiceChat()
    chat.conversation_loop()

3. Integrate OpenClaw TTS

import json
import os
import subprocess
import tempfile

def openclaw_tts(text, output_file="output.mp3"):
    """Call the OpenClaw TTS tool."""
    # Write the TTS request to a temporary JSON file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
        tts_request = {
            "text": text,
            "channel": "webchat"
        }
        json.dump(tts_request, f)
        request_file = f.name
    
    try:
        # Invoke the tts tool (requires an OpenClaw environment)
        result = subprocess.run([
            "node", "path/to/openclaw/tts-tool.js",
            "--input", request_file,
            "--output", output_file
        ], capture_output=True, text=True)
        
        if result.returncode == 0:
            print(f"✅ Audio file generated: {output_file}")
            # Play the audio ("start" is Windows-only; use "open" on macOS
            # or "xdg-open" on Linux)
            subprocess.run(["start", output_file], shell=True)
        else:
            print(f"❌ TTS failed: {result.stderr}")
    finally:
        os.unlink(request_file)

Advanced configuration

Local STT with Whisper

def whisper_stt(audio_file):
    """Transcribe speech with Whisper."""
    import whisper
    
    model = whisper.load_model("base")
    result = model.transcribe(audio_file, language="zh")
    return result["text"]

High-quality TTS with ElevenLabs

import os

def elevenlabs_tts(text, voice_id="21m00Tcm4TlvDq8ikWAM", api_key=None):
    """Synthesize speech with the ElevenLabs TTS API."""
    import requests
    
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {
        "xi-api-key": api_key or os.environ.get("ELEVENLABS_API_KEY"),
        "Content-Type": "application/json"
    }
    
    data = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.5
        }
    }
    
    response = requests.post(url, json=data, headers=headers)
    if response.status_code == 200:
        with open("output.mp3", "wb") as f:
            f.write(response.content)
        return "output.mp3"
    else:
        raise Exception(f"ElevenLabs TTS failed: {response.text}")

Troubleshooting

Common issues

  1. Microphone not detected

    • Check microphone permissions
    • Try specifying the microphone device index
  2. pyaudio fails to install

    • Windows: use pipwin install pyaudio
    • macOS: brew install portaudio
  3. Low recognition accuracy

    • Reduce ambient noise
    • Use a more accurate model (Whisper large)
    • Add voice training
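For the device-index fix above, a small helper can pick the right index from a list of device names. This is a sketch: `pick_device_index` is a name invented for this example, and the sample list is hard-coded so it runs without audio hardware; in practice you would feed it `sr.Microphone.list_microphone_names()`.

```python
# Pick a microphone device index by matching a keyword against device names.
# In real use, pass sr.Microphone.list_microphone_names() as device_names.

def pick_device_index(device_names, keyword):
    """Return the index of the first device whose name contains keyword,
    or None to fall back to the default device."""
    for index, name in enumerate(device_names):
        if keyword.lower() in name.lower():
            return index
    return None

sample = ["Speakers (Realtek)", "Microphone Array (Intel)", "USB Headset Mic"]
print(pick_device_index(sample, "mic"))  # 1
# Then: mic = sr.Microphone(device_index=pick_device_index(names, "USB"))
```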

Performance optimization

  1. Cache models: preload the Whisper model
  2. Streaming: process audio in real time
  3. Noise reduction: improve input audio quality
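The model-caching tip can be sketched with `functools.lru_cache`, so repeated calls reuse the already-loaded model. The real call would be `whisper.load_model(name)`; `load_model` here is a stand-in so the sketch runs without Whisper installed.

```python
from functools import lru_cache

# Sketch of model caching: the first call "loads" the model, later calls
# with the same name are served from the cache.

LOAD_COUNT = 0  # counts how many times a load actually happens

@lru_cache(maxsize=None)
def load_model(name: str):
    global LOAD_COUNT
    LOAD_COUNT += 1
    return f"model:{name}"  # placeholder for the loaded Whisper model

load_model("base")
load_model("base")  # cache hit, no second load
print(LOAD_COUNT)   # 1
```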

Security notes

  1. API key protection: never hard-code API keys
  2. Privacy: process voice data locally where possible
  3. Permissions: control microphone access
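The API-key advice above can be sketched as a small resolver that reads the key from the environment and fails loudly when it is missing. `require_api_key` is a helper name chosen for this example, not part of the skill; the key set below is for illustration only.

```python
import os

# Resolve an API key from the environment instead of hard-coding it.

def require_api_key(var_name: str) -> str:
    """Return the key stored in var_name, or raise if it is not set."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before enabling cloud TTS/STT"
        )
    return key

os.environ["ELEVENLABS_API_KEY"] = "demo-key"  # for illustration only
print(require_api_key("ELEVENLABS_API_KEY"))   # demo-key
```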

Extensions

Planned features

  1. Multi-language support: automatic language detection
  2. Voice commands: recognition of specific spoken commands
  3. Emotion recognition: detect emotion from speech
  4. Real-time translation: cross-language voice conversation

Skill version: 1.0.0 · Last updated: 2026-02-28
