Xiaoye Voice System (小野语音系统)

Pass

Audited by ClawScan on May 1, 2026.

Overview

This appears to be a coherent text-to-speech skill. The main privacy point is that non-Chinese text is sent to the Edge-TTS cloud service.

Install this if you are comfortable with a TTS skill that runs macOS say and ffmpeg locally, stores generated audio under ~/.openclaw/outputs/xiaoye_voice, and sends non-Chinese text to the Edge-TTS cloud service. For private text, prefer local-only use and delete generated audio files periodically.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI07: Insecure Inter-Agent Communication

What this means

Non-Chinese text converted with this skill may be processed by Microsoft Edge-TTS rather than staying fully local.

Why it was flagged

For non-Chinese text, the implementation passes the user-supplied text to the Edge-TTS library/service to synthesize audio. This is disclosed and purpose-aligned, but it means that text may leave the device.

Skill content

communicate = edge_tts.Communicate(text, voice); await communicate.save(str(output_path))

Recommendation

Avoid sending sensitive non-Chinese text unless you are comfortable with cloud TTS processing; use a fully local TTS path for private content.
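
One way to act on this recommendation is to check, before calling the skill, whether a string is likely to be routed to the cloud. The sketch below rests on an assumption: this report does not show how the skill decides that text is Chinese, so `is_cjk` and `may_leave_device` are hypothetical helpers based on the basic CJK Unified Ideographs range, not the skill's own logic. The check is deliberately conservative and treats anything that is not purely Chinese as potentially cloud-bound.

```python
def is_cjk(ch: str) -> bool:
    """Return True if ch is in the basic CJK Unified Ideographs block.

    Assumption: the skill's real language check is not shown in the report
    and may use a different range or rule.
    """
    return "\u4e00" <= ch <= "\u9fff"


def may_leave_device(text: str) -> bool:
    """Hypothetical pre-check for cloud routing.

    Returns True when the text contains any non-CJK letter, or no CJK
    character at all, so mixed-language input is flagged as well.
    """
    has_cjk = any(is_cjk(ch) for ch in text)
    has_other_letters = any(ch.isalpha() and not is_cjk(ch) for ch in text)
    return has_other_letters or not has_cjk


if __name__ == "__main__":
    for sample in ("你好，世界", "Quarterly revenue figures", "你好 hello"):
        route = "possibly cloud (Edge-TTS)" if may_leave_device(sample) else "likely local"
        print(f"{sample!r}: {route}")
```

A caller could refuse to synthesize private text whenever `may_leave_device` returns True. Confirm the skill's actual routing rule before relying on a check like this.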

Info

#ASI05: Unexpected Code Execution

What this means

The skill depends on local audio tools and will execute them when generating speech.

Why it was flagged

The skill invokes local say and ffmpeg commands for speech generation and audio conversion. The commands use argument lists rather than shell strings and are central to the stated TTS purpose.

Skill content

result = subprocess.run(cmd, capture_output=True, text=True) ... result = subprocess.run(conv_cmd, capture_output=True, text=True)

Recommendation

Install ffmpeg from a trusted source, run on macOS as documented, and avoid debug mode for private text if console logs are collected.
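
The difference between argument lists and shell strings can be illustrated with a minimal sketch. `build_say_command` is a hypothetical helper: the report does not show how the skill assembles `cmd`, and the use of the `-o` flag here is an assumption about one plausible invocation of macOS say. `echo_argument` runs the current Python interpreter instead of say so that the example works on any platform.

```python
import subprocess
import sys


def build_say_command(text: str, output_path: str) -> list[str]:
    """Build an argument list for macOS `say` (hypothetical; not the skill's code).

    Each list element reaches the program as one argument, so no shell
    parses the text.
    """
    return ["say", "-o", output_path, text]


def echo_argument(text: str) -> str:
    """Pass text as a single argument to a child process and return what it saw."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\r\n")


if __name__ == "__main__":
    risky = "hello; echo injected"
    print(build_say_command(risky, "out.aiff"))
    # The child receives the text verbatim; the semicolon is not interpreted.
    print(echo_argument(risky))
```

Because no shell is involved, characters such as `;` or `$(...)` in user text arrive as literal data. Passing the same string through `shell=True` would let the shell interpret them.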

Info

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

Manual installation may fetch a newer Edge-TTS package than the one originally tested by the author.

Why it was flagged

The cloud TTS dependency is declared with a lower-bound version rather than a pinned version. This is common for Python projects but means a future compatible package version could be installed.

Skill content

edge-tts>=6.1.9

Recommendation

Pin dependency versions in production or install from a trusted package index; registry metadata should also declare the macOS and ffmpeg requirements.
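
As a rough illustration of the pinning recommendation, the sketch below flags requirement lines that are not pinned to an exact version. `is_pinned` is a hypothetical helper that handles only simple `name==version` lines and ignores extras, environment markers, and hashes. The version to pin should be one you have tested yourself; `6.1.9` appears below only because it is the lower bound quoted above.

```python
import re

# Matches "name==version" with no wildcard, marker, or additional specifier.
_PINNED = re.compile(r"^\s*[A-Za-z0-9._-]+\s*==\s*[^\s*;,]+\s*$")


def is_pinned(requirement: str) -> bool:
    """Return True if the requirement line pins one exact version."""
    return bool(_PINNED.match(requirement))


if __name__ == "__main__":
    for line in ("edge-tts>=6.1.9", "edge-tts==6.1.9"):
        status = "pinned" if is_pinned(line) else "NOT pinned"
        print(f"{line}: {status}")
```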

Info

#ASI06: Memory and Context Poisoning

What this means

Generated audio files, and a small amount of text in their filenames, may remain on disk until deleted.

Why it was flagged

Generated audio is stored persistently under the user's home directory, and filenames include a short prefix derived from the input text.

Skill content

self.base_dir = Path.home() / ".openclaw" / "outputs" / "xiaoye_voice" ... safe_text = ''.join(c for c in text[:20] if c.isalnum() or c in (' ', '_')) ... output_name = f"xiaoye_{timestamp}_{safe_text}.{self.output_format}"

Recommendation

Delete generated files when no longer needed and avoid converting highly sensitive text on shared machines unless you control the output directory and filenames.
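
To make the filename exposure and the cleanup advice concrete, here is a minimal sketch. `filename_fragment` reproduces the filtering expression quoted above so you can see what part of the input ends up in a filename. `delete_older_than` is a hypothetical cleanup helper that is not part of the skill, and the seven-day threshold in the usage note is an arbitrary example.

```python
import time
from pathlib import Path

# Output directory as quoted in the skill content above.
OUTPUT_DIR = Path.home() / ".openclaw" / "outputs" / "xiaoye_voice"


def filename_fragment(text: str) -> str:
    """Return the part of the input text that would appear in the filename."""
    return "".join(c for c in text[:20] if c.isalnum() or c in (" ", "_"))


def delete_older_than(directory: Path, max_age_days: float) -> list[Path]:
    """Delete regular files in directory older than max_age_days (hypothetical helper).

    Returns the list of removed paths; a missing directory yields an empty list.
    """
    removed: list[Path] = []
    if not directory.is_dir():
        return removed
    cutoff = time.time() - max_age_days * 86400
    for path in directory.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Shows how much of a sentence survives into the filename.
    print(filename_fragment("Hello, world! secret plans"))
```

Calling `delete_older_than(OUTPUT_DIR, 7)` would remove generated files more than a week old. Note that `str.isalnum()` is also true for Chinese characters, so Chinese input text can appear in filenames as well.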