Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Voice

v1.0.1

Convert text to speech using Microsoft Edge's TTS engine with customizable voices, direct playback, and automatic temporary file cleanup.

0 stars · 2.8k downloads · 18 current · 19 all-time
by zhaov (@zhaov1976)
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The name, SKILL.md, package.json and code all describe a TTS skill using edge-tts. Requested dependencies and behaviors (generate audio, play files, cleanup) are consistent with the stated purpose.
Instruction Scope
The runtime instructions and code run shell commands (execAsync) to call the edge-tts CLI and to install dependencies. The edge-tts invocation is built as a single shell command string that includes untrusted user text; because exec runs via a shell, constructs like $(...), `...`, or other shell metacharacters inside the text can result in arbitrary command execution (command injection). The skill also spawns system audio players and writes/cleans files under a temp directory two levels above the skill directory, which is surprising and should be reviewed.
Install Mechanism
There is no package install spec in the registry metadata, but the skill's code and SKILL.md instruct users (and provide an 'install' action) to run `pip3 install edge-tts`. Installing via pip is expected for this functionality, but runtime installation (exec of pip3) means the agent will perform network installs and execute whatever the installer does — acceptable for a TTS skill but worth noting.
Credentials
The skill requests no environment variables or credentials. No unrelated secrets are requested. The main risk is filesystem and shell invocation rather than excessive credential access.
Persistence & Privilege
The skill is not always-included and does not request elevated platform privileges. It doesn't modify other skills or global agent config. Its temporary file management and install action affect only local FS and pip.
What to consider before installing
This skill appears to do what it says (edge-tts TTS plus playback), but the implementation builds and executes shell command strings that contain user-provided text. That creates a real command-injection risk: maliciously crafted input could execute arbitrary shell commands on the host. Before installing or enabling this skill in sensitive environments, consider the following:

  • Do not run it on production systems or hosts with sensitive data until it has been reviewed or sandboxed.
  • Inspect and/or modify the code so it no longer calls exec with a concatenated command string. Safer alternatives:
      • Use child_process.spawn or child_process.execFile with an argument array (no shell), so the text is passed as an argument rather than interpolated into a shell command.
      • Or call the Python API (edge-tts package) from a subprocess with structured arguments, or via an RPC/worker, avoiding shell interpolation.
      • Properly escape or validate user text (escaping is easy to get wrong, so prefer avoiding the shell entirely).
  • Consider changing the temp directory to a skill-local, non-shared path and ensure it cannot traverse outside the skill folder. The code currently writes to path.join(__dirname, '..', '..', 'temp'), which may be broader than expected.
  • Avoid running the 'install' action automatically; perform dependency installation manually in a controlled environment.

If you are not able to patch the code, run the skill only in an isolated sandbox or container and avoid giving it access to sensitive files or credentials.
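As an illustration of the shell-free approach recommended above, here is a minimal sketch using Node's child_process.execFile with an argument array. The helper names buildEdgeTtsArgs and synthesize are hypothetical and are not taken from the skill's code; the flags used (--text, --voice, --rate, --volume, --pitch, --write-media) are options of the edge-tts command-line tool.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper (not from the skill's source): builds the argument
// array for the edge-tts CLI. Each array element reaches the process as one
// argument, so shell metacharacters in `text` are never interpreted.
function buildEdgeTtsArgs(text, outputPath, options = {}) {
  const {
    voice = 'zh-CN-XiaoxiaoNeural',
    rate = '+0%',
    volume = '+0%',
    pitch = '+0Hz',
  } = options;
  // The --flag=value form keeps values that start with '-' (for example
  // '-5%') from being mistaken for separate options.
  return [
    `--text=${text}`,
    `--voice=${voice}`,
    `--rate=${rate}`,
    `--volume=${volume}`,
    `--pitch=${pitch}`,
    `--write-media=${outputPath}`,
  ];
}

// Hypothetical wrapper: runs edge-tts directly, without going through a shell.
function synthesize(text, outputPath, options) {
  return new Promise((resolve, reject) => {
    execFile('edge-tts', buildEdgeTtsArgs(text, outputPath, options), (err) => {
      if (err) reject(err);
      else resolve(outputPath);
    });
  });
}
```

With this shape, input such as 'hi $(touch /tmp/pwned)' is handed to edge-tts as literal text to be spoken rather than being expanded by a shell.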

Like a lobster shell, security has layers — review code before you run it.

Tags: audio, edge-tts, latest, text-to-speech, tts, voice
2.8k downloads
0 stars
2 versions
Updated 21h ago
v1.0.1
MIT-0

Voice Skill

The Voice skill provides enhanced text-to-speech functionality using edge-tts, allowing you to convert text to spoken audio with multiple playback options.

Features

  • Text-to-speech conversion using Microsoft Edge's TTS engine
  • Support for various voice options and audio settings
  • Direct playback of generated audio
  • Automatic cleanup of temporary audio files
  • Integration with the MEDIA system for audio playback

Installation

Before using this skill, you need to install the required dependency:

pip3 install edge-tts

Or use the skill's install action:

await skill.execute({ action: 'install' });

Usage

Direct Speaking (Recommended)

Speak text directly without storing to file:

const result = await skill.execute({
  action: 'speak',  // New improved action
  text: 'Hello, how are you today?'
});
// Audio is played directly and temporary file is cleaned up automatically

Text-to-Speech with File Generation

Convert text to speech with default settings:

const result = await skill.execute({
  action: 'tts',
  text: 'Hello, how are you today?'
});
// Returns a MEDIA link to the audio file

With direct playback:

const result = await skill.execute({
  action: 'tts',
  text: 'Hello, how are you today?',
  playImmediately: true  // Plays the audio immediately after generation
});

With custom options:

const result = await skill.execute({
  action: 'tts',
  text: 'This is a sample of voice customization.',
  options: {
    voice: 'zh-CN-XiaoxiaoNeural',
    rate: '+10%',
    volume: '-5%',
    pitch: '+10Hz'
  }
});

Play Existing Audio File

Play an existing audio file:

const result = await skill.execute({
  action: 'play',
  filePath: '/path/to/audio/file.mp3'
});

List Available Voices

Get a list of available voices:

const result = await skill.execute({
  action: 'voices'
});

Cleanup Temporary Files

Clean up temporary audio files older than 1 hour (default):

const result = await skill.execute({
  action: 'cleanup'
});

Or specify a custom age threshold:

const result = await skill.execute({
  action: 'cleanup',
  options: {
    hoursOld: 2  // Clean files older than 2 hours
  }
});
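
The age-based cleanup described above could be implemented along the following lines. This is a sketch under assumptions, not the skill's actual code: the function name cleanupOldFiles and its parameters are hypothetical, and file age is assumed to be measured by modification time.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: deletes regular files in `dir` whose modification time
// is more than `hoursOld` hours before `now`, and returns the removed names.
function cleanupOldFiles(dir, hoursOld = 1, now = Date.now()) {
  const cutoff = now - hoursOld * 60 * 60 * 1000;
  const removed = [];
  for (const name of fs.readdirSync(dir)) {
    const filePath = path.join(dir, name);
    const stats = fs.statSync(filePath);
    // Skip subdirectories and anything newer than the cutoff.
    if (stats.isFile() && stats.mtimeMs < cutoff) {
      fs.unlinkSync(filePath);
      removed.push(name);
    }
  }
  return removed;
}
```

Passing `now` as a parameter keeps the function easy to test without waiting for files to age.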

Options

The following options are available for text-to-speech:

  • voice: The voice to use (default: 'zh-CN-XiaoxiaoNeural')
  • rate: Speech rate adjustment (default: '+0%')
  • volume: Volume adjustment (default: '+0%')
  • pitch: Pitch adjustment (default: '+0Hz')
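
Because these options end up as arguments to an external command, it can help to validate them before use. The sketch below merges caller options with the defaults listed above and rejects values that do not match the expected shape. The function name normalizeOptions and the exact patterns are assumptions inferred from the default values shown here, not the skill's own validation.

```javascript
// Defaults as documented in the Options list above.
const DEFAULT_OPTIONS = {
  voice: 'zh-CN-XiaoxiaoNeural',
  rate: '+0%',
  volume: '+0%',
  pitch: '+0Hz',
};

// Assumed formats: a signed integer followed by '%' or 'Hz', and voice names
// made of letters, digits, and hyphens only.
const OPTION_PATTERNS = {
  voice: /^[A-Za-z0-9-]+$/,
  rate: /^[+-]\d+%$/,
  volume: /^[+-]\d+%$/,
  pitch: /^[+-]\d+Hz$/,
};

// Hypothetical helper: fills in defaults and throws on malformed values.
function normalizeOptions(options = {}) {
  const merged = { ...DEFAULT_OPTIONS, ...options };
  for (const [key, pattern] of Object.entries(OPTION_PATTERNS)) {
    if (typeof merged[key] !== 'string' || !pattern.test(merged[key])) {
      throw new Error(`Invalid ${key}: ${JSON.stringify(merged[key])}`);
    }
  }
  return merged;
}
```

For example, normalizeOptions({ rate: '+10%' }) keeps the other three defaults, while a rate such as '10%; rm -rf /' is rejected before it can reach any command line.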

Supported Voices

Edge-TTS supports many voices in different languages:

  • Chinese: zh-CN-XiaoxiaoNeural, zh-CN-YunxiNeural, zh-CN-YunyangNeural
  • English (US): en-US-AriaNeural, en-US-GuyNeural, en-US-JennyNeural
  • English (UK): en-GB-SoniaNeural, en-GB-RyanNeural
  • Japanese: ja-JP-NanamiNeural
  • Korean: ko-KR-SunHiNeural
  • And many more...

File Management

  • Audio files are stored temporarily in a temp directory (per the security scan, path.join(__dirname, '..', '..', 'temp'))
  • Files are automatically cleaned up after 1 hour (default)
  • Direct speaking option cleans up files after 5 seconds
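
If the temp directory is moved to a skill-local path, as the security scan suggests, file names can also be checked so that they never resolve outside that directory. The following is a minimal sketch; the function name resolveInTemp and the idea of passing the base directory explicitly are assumptions, not part of the skill.

```javascript
const path = require('path');

// Hypothetical helper: resolves `fileName` inside `baseDir` and throws if the
// result would land outside it (for example via '..' segments or an absolute
// path).
function resolveInTemp(baseDir, fileName) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, fileName);
  const relative = path.relative(base, target);
  const escapes =
    relative === '' ||
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes temp directory: ${fileName}`);
  }
  return target;
}
```

A skill-local base could be something like path.join(__dirname, 'temp'), so that resolveInTemp(base, 'speech.mp3') stays inside the skill folder while '../../etc/passwd' is rejected.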

Requirements

  • Python 3.x
  • pip package manager
  • edge-tts library (install via pip3 install edge-tts)
