Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

OmniVoice

v1.0.0

All-in-one voice identity toolkit: speaker identification, voice library management, voice cloning, and speech-to-text. The only OpenClaw skill with speaker...

0 stars · 118 downloads · 0 current · 0 all-time
by Yang Qibin (@yangqibin-caibi)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for yangqibin-caibi/omnivoice.

Prompt Preview: Install & Setup
Install the skill "OmniVoice" (yangqibin-caibi/omnivoice) from ClawHub.
Skill page: https://clawhub.ai/yangqibin-caibi/omnivoice
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install omnivoice

ClawHub CLI


npx clawhub@latest install omnivoice

Security Scan
VirusTotal
Benign
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
Functionality (speaker ID, library management, cloning, Feishu delivery) matches the included scripts. However, the registry metadata claims no required environment variables, while SKILL.md and the scripts require SF_API_KEY for SiliconFlow and FEISHU_APP_ID/FEISHU_APP_SECRET for sending audio. The skill legitimately needs those secrets for cloning and Feishu delivery, but the package metadata does not declare them, an inconsistency between the declared and actual requirements.
Instruction Scope
SKILL.md stays largely within the stated domain (local voice refs, transcribe, identify, clone). Two points to watch: (1) it instructs manual edits to SPEAKER_MAP inside scripts/voice_identify.py to register speakers (i.e., modify the skill's code to add speakers), which is unusual and grants the agent or user permission to change shipped code; (2) voice cloning sends reference audio (possibly private) to an external API (SiliconFlow) which is necessary for cloning but is a privacy/exfiltration risk. The skill also downloads a ~360MB model to /tmp on first run (resource/disk considerations).
Install Mechanism
This is instruction-only (no automated install spec). Dependencies are standard for the tasks (whisper/transformers/librosa/ffmpeg). No installers or external arbitrary downloads beyond model weights from HuggingFace (expected for UniSpeech-SAT).
Credentials
The skill requires SF_API_KEY (SiliconFlow) and Feishu credentials (FEISHU_APP_ID and FEISHU_APP_SECRET) according to SKILL.md and the scripts, but the registry metadata lists no required env vars. Requiring third-party API keys is proportionate to voice cloning and Feishu message sending, but the metadata omission is misleading. Sending reference audio to an external service (SiliconFlow) also means sensitive audio data will leave your environment; evaluate the SF_API_KEY request and the choice of endpoint before use.
Persistence & Privilege
always:false and no OS restrictions — the skill does not request permanent, universal inclusion. It will write files into workspace directories (voice-refs/, TOOLS.md) and may modify its own SPEAKER_MAP if the user follows the instructions; these are local operations and not system-wide privilege escalations. No indication it modifies other skills or global agent config.
What to consider before installing
Key things to consider before installing or using OmniVoice:

  • Metadata mismatch: The registry metadata claims no required env vars, but the skill requires SF_API_KEY for the SiliconFlow cloning API and FEISHU_APP_ID/FEISHU_APP_SECRET to send messages to Feishu. Confirm you are comfortable providing those secrets and update metadata expectations.
  • Privacy risk: Voice cloning sends reference audio (base64 or a remote URL) to https://api.siliconflow.cn. Any audio you provide (including recordings of other people) will be transmitted to that third party. Do not upload recordings you do not have permission to share. Review SiliconFlow's privacy policy and TOS before use.
  • Manual code edits: The documentation instructs you to register speakers by editing SPEAKER_MAP in scripts/voice_identify.py. This means the workflow relies on modifying source files; consider instead keeping references in a separate metadata file to avoid altering shipped code, or be aware that the skill expects write access to its own files.
  • Resource use: The speaker-identification model downloads ~360MB to /tmp on first run and requires CPU/GPU resources; ensure your runtime environment has sufficient disk and compute.
  • Feishu integration: The provided shell script will exchange your FEISHU_APP_ID/SECRET for a tenant token and upload audio. Limit credential scope for the app and confirm you trust the destination Feishu tenant.
  • Operational safety: If you need to evaluate the skill, run it in an isolated/sandboxed environment, inspect network traffic to confirm where audio is uploaded, and avoid giving production credentials until you trust the behavior.

If you want, I can: (1) list the exact environment variables and commands you must run to test the skill safely in a sandbox, (2) suggest a safer workflow that avoids editing code (store speaker metadata in TOOLS.md and read it at runtime), or (3) help craft a minimal wrapper that blocks external uploads for local-only testing.
scripts/voice_identify.py:87
Dynamic code execution detected.
Patterns worth reviewing
These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.


latest: vk97759zv5fww19zcsftrwaqb4x83h1dw
118 downloads · 0 stars · 1 version
Updated 1 month ago
v1.0.0
MIT-0

OmniVoice

Ten operations across four capabilities: identify (认) · manage (存) · transcribe (听) · clone (说).

Dependencies

| Component  | Install                          | Purpose                                |
|------------|----------------------------------|----------------------------------------|
| Whisper    | pip install openai-whisper       | Speech-to-text                         |
| Speaker ID | pip install transformers librosa | Speaker identification (UniSpeech-SAT) |
| CosyVoice2 | SiliconFlow API (SF_API_KEY)     | Voice cloning                          |
| ffmpeg     | System package                   | Audio conversion                       |

Voice references are stored in voice-refs/ at workspace root. Metadata lives in TOOLS.md under a "Voice Library" section. See references/voice-library-format.md for format spec.

Operations

Op 1 · Speaker Identification (声纹查询)

Input: audio → Output: who is speaking (or "unknown")

python3 scripts/voice_identify.py <audio_file> [--threshold 0.75]

Compares audio against all voice-refs/*-ref*.* using UniSpeech-SAT x-vector embeddings. First run downloads model (~360MB) to /tmp/hf_models/.

Accuracy: Reliably separates male/female voices. Same-gender speakers need ≥5s audio for best results. Threshold 0.75 is default; raise to 0.85 for stricter matching.
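The matching step can be sketched as a cosine-similarity ranking over library embeddings. This is a minimal stand-in, not the shipped script: it uses plain lists instead of real UniSpeech-SAT x-vectors, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(query, library, threshold=0.75):
    """Return (best speaker, score), or ('unknown', score) below the threshold.

    `library` maps speaker name -> embedding; the real script builds these
    from voice-refs/*-ref*.* files.
    """
    if not library:
        return "unknown", 0.0
    name, score = max(((n, cosine(query, emb)) for n, emb in library.items()),
                      key=lambda t: t[1])
    return (name, score) if score >= threshold else ("unknown", score)
```

Raising `threshold` (e.g. to 0.85, as suggested above) simply tightens the final acceptance check without changing the ranking.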

Op 2 · Add Voice to Library (声音入库)

Input: audio + speaker name → stores in voice library

  1. Copy audio to voice-refs/<name>-ref1.<ext>
  2. Transcribe to get reference text: whisper <audio> --model small --output_format txt --output_dir /tmp
  3. Add entry to TOOLS.md (see format in references/)
  4. Register speaker in voice_identify.py SPEAKER_MAP

Good reference audio: 10-15s clear speech, minimal noise, natural pace. 5s minimum.
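Step 1's naming convention can be sketched as a small helper that copies audio into the library and picks the next free `<name>-ref<N>.<ext>` slot. The filename pattern follows the convention above; the helper itself is hypothetical and not part of the skill.

```python
import shutil
from pathlib import Path

def add_reference(audio: str, name: str, refs_dir: str = "voice-refs") -> Path:
    """Copy an audio file into the library as <name>-ref<N>.<ext>,
    choosing the next free index so existing references are preserved."""
    src = Path(audio)
    dest_dir = Path(refs_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    n = 1
    while (dest_dir / f"{name}-ref{n}{src.suffix}").exists():
        n += 1
    dest = dest_dir / f"{name}-ref{n}{src.suffix}"
    shutil.copy2(src, dest)
    return dest
```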

Op 3 · Voice Library CRUD (声音库管理)

  • List: Check TOOLS.md voice library section + ls voice-refs/
  • Add: See Op 2
  • Update: Replace file in voice-refs/, update TOOLS.md entry
  • Delete: Remove file from voice-refs/, remove TOOLS.md entry, remove from SPEAKER_MAP
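The Add/Delete steps above require editing SPEAKER_MAP inside scripts/voice_identify.py. A safer alternative, sketched below, keeps registrations in a standalone JSON sidecar so shipped code is never modified; the file name and layout are assumptions, not part of the skill.

```python
import json
from pathlib import Path

def load_speaker_map(path: Path) -> dict:
    """Read the speaker registry from a JSON sidecar, if present."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}

def register_speaker(path: Path, name: str, ref: str) -> dict:
    """Add or update a speaker entry without editing any shipped code."""
    speakers = load_speaker_map(path)
    speakers[name] = {"ref": ref}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(speakers, indent=2), encoding="utf-8")
    return speakers
```

A patched voice_identify.py could read this file at startup instead of the hard-coded SPEAKER_MAP, making Add/Update/Delete pure data operations.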

Op 4 · Voice Clone (声音克隆)

Input: text + library speaker → Output: audio in that speaker's voice

set -a; source <env_file_with_SF_API_KEY>; set +a

python3 scripts/cosyvoice_clone.py \
  --text "Text to speak" \
  --ref voice-refs/<speaker>-ref1.<ext> \
  --ref-text "What is said in reference audio" \
  --output /tmp/clone_output.wav

Long reference (>15s): truncate first with ffmpeg -y -i <ref> -t 15 -ar 24000 -ac 1 /tmp/ref_trimmed.wav.

Op 5 · Transcribe (纯转文字)

Input: audio → Output: text

whisper <audio_file> --model small --output_format txt --output_dir /tmp --language <lang>

Languages: zh (Chinese), en (English), ja (Japanese). Omit for auto-detect.

Op 6 · Transcribe + Identify (转文字+识别)

Input: audio → Output: who said what

Run Op 5 and Op 1 in parallel, report both results together.
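The parallel fan-out can be sketched with a thread pool. The `transcribe` and `identify` callables below are stand-ins for the whisper and voice_identify.py invocations (e.g. subprocess.run wrappers); they are assumptions for illustration, not part of the skill.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_and_identify(audio: str, transcribe, identify) -> dict:
    """Run transcription (Op 5) and speaker ID (Op 1) concurrently,
    then report both results together."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(transcribe, audio)
        speaker_future = pool.submit(identify, audio)
        return {"speaker": speaker_future.result(),
                "text": text_future.result()}
```

Threads suffice here because both steps spend their time waiting on external processes rather than in Python itself.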

Op 7 · Speaker Verification (声纹验证)

Input: two audio files → Output: same person or not

python3 scripts/voice_identify.py <audio_1> --threshold 0.75
python3 scripts/voice_identify.py <audio_2> --threshold 0.75

Compare the top-ranked speaker from both runs. If they match → same person. For direct pairwise comparison without a library, extract embeddings and compute cosine similarity (see voice_identify.py internals).

Op 8 · Voice Swap (声音换皮)

Input: audio + library speaker → Output: same words, different voice

  1. Transcribe input audio (Op 5)
  2. Clone with target speaker's voice (Op 4), using transcribed text

Op 9 · Persona Voice Reply — from Audio (人格化语音回复·语音版)

Input: audio question + library speaker → Output: AI answer in that speaker's voice

  1. Transcribe the question (Op 5)
  2. Generate answer text via LLM
  3. Clone answer with target speaker's voice (Op 4)

Op 10 · Persona Voice Reply — from Text (人格化语音回复·文字版)

Input: text question + library speaker → Output: AI answer in that speaker's voice

  1. Generate answer text via LLM
  2. Clone answer with target speaker's voice (Op 4)

Send Audio (Feishu)

set -a; source <env_file>; set +a
bash scripts/feishu_send_audio.sh <wav_file> <receive_id>

Converts wav → opus, uploads, sends as voice message. Requires FEISHU_APP_ID + FEISHU_APP_SECRET env vars.

Extract Audio from Video

ffmpeg -y -i <video_file> -vn -ar 24000 -ac 1 /tmp/extracted_audio.wav
