Voice Agent
Pass. Audited by VirusTotal on May 12, 2026.
Overview
Type: OpenClaw Skill
Name: voice-agent
Version: 1.1.0
The `scripts/client.py` file contains critical vulnerabilities that allow arbitrary file read and write operations. The `transcribe` function reads the contents of an arbitrary file path supplied as an argument and sends them to `http://localhost:8000/transcribe`. Similarly, the `synthesize` function writes the generated audio to an arbitrary file path supplied as an argument. Although these flaws are not explicitly malicious in intent, they let an attacker read sensitive local files or write arbitrary content to any location writable by the user running the client, posing a significant risk of data exfiltration or local privilege escalation.
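One common mitigation for the arbitrary-path issue described above is to resolve every user-supplied path and reject anything that escapes an allowed directory. The sketch below assumes Python 3.9+ (for `Path.is_relative_to`); `resolve_within` and the `/srv/voice-agent/audio` directory are hypothetical names, not part of `scripts/client.py`.

```python
from pathlib import Path


def resolve_within(base_dir: str, user_path: str) -> Path:
    """Return the resolved path, or raise ValueError if it leaves base_dir.

    Hypothetical helper; the skill's client.py is not known to include it.
    """
    base = Path(base_dir).resolve()
    # An absolute user_path replaces base entirely when joined, and resolve()
    # collapses ".." segments (following any existing symlinks), so both
    # cases are caught by the containment test below.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


if __name__ == "__main__":
    audio_dir = "/srv/voice-agent/audio"  # hypothetical allowed directory
    print(resolve_within(audio_dir, "clip.wav"))
    try:
        resolve_within(audio_dir, "../../etc/passwd")
    except ValueError as err:
        print("rejected:", err)
```

The same check would apply to both the `transcribe` input path and the `synthesize` output path.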
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A chosen audio file is handed to the local backend, and an output path can be created or overwritten by the synthesis command.
The client reads a provided audio file for transcription and writes synthesized audio to a provided output path. This is expected for the skill, but users should ensure only intended files and output locations are used.
with open(filename, 'rb') as f: data += f.read() ... with open(output_file, 'wb') as f:
Use explicit, non-sensitive audio inputs and safe output paths; avoid pointing the output at existing important files.
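For the "avoid overwriting important files" advice, Python's exclusive-creation mode `'xb'` is a simple guard: `open()` raises `FileExistsError` instead of truncating an existing file. The skill itself opens the output with `'wb'`, as the quoted snippet shows; `write_audio_no_clobber` is a hypothetical name used only for illustration.

```python
import os
import tempfile


def write_audio_no_clobber(output_file: str, audio: bytes) -> None:
    """Write synthesized audio bytes, refusing to replace an existing file.

    Hypothetical alternative to open(output_file, 'wb'). Mode 'x' opens with
    O_CREAT | O_EXCL, so open() raises FileExistsError if the path already
    exists (on POSIX this includes a symlink, even a dangling one).
    """
    with open(output_file, "xb") as f:
        f.write(audio)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "reply.mp3")
    write_audio_no_clobber(out, b"fake-audio")  # placeholder bytes, not real audio
    try:
        write_audio_no_clobber(out, b"other")
    except FileExistsError:
        print("refused to overwrite", out)
```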
The skill’s safety depends partly on the backend service running on localhost:8000, not just on the packaged client script.
The skill is client-only and relies on a separately managed backend and repository docs outside the included package. That dependency is disclosed, but the backend is part of the trust decision.
Requires a running backend API at `http://localhost:8000`. Backend setup instructions are in this repository:
- `README.md`
- `walkthrough.md`
- `DOCKER_README.md`
Install and run the backend only from a trusted source, review its setup instructions, and avoid running an unexpected service on localhost:8000.
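Because the client talks to whatever answers on port 8000, one way to spot an unexpected service is to confirm the port is free before starting the trusted backend. A minimal sketch using only the standard `socket` module; `port_in_use` is a hypothetical helper, it checks IPv4 loopback only, and a successful connection shows only that some process is listening, not which one.

```python
import socket


def port_in_use(host: str = "127.0.0.1", port: int = 8000, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical pre-flight check to run before launching the backend.
    Checks IPv4 only; a service bound solely to ::1 would not be detected.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on refusal.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use():
        print("Something is already listening on 127.0.0.1:8000; identify it first.")
    else:
        print("Port 8000 is free.")
```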
Text sent for speech generation may be handled beyond the local machine by AWS Polly through the backend.
The skill discloses that text-to-speech uses AWS Polly via the backend. That is purpose-aligned, but synthesis text may be processed by an external provider depending on backend configuration.
It uses **local Whisper** for Speech-to-Text transcription and **AWS Polly** for Text-to-Speech generation.
Do not synthesize highly sensitive text unless you are comfortable with the backend and AWS Polly handling it.
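Since synthesis text may reach AWS Polly, a crude pre-flight scan can catch obviously sensitive strings before they are passed to the skill's synthesis command. The patterns below are hypothetical examples (an AWS access key ID prefix, an email address, a PEM private-key header); pattern matching misses most sensitive content, so it supplements the advice above rather than replacing it.

```python
import re

# Hypothetical, deliberately crude patterns; tune them for your own data.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def sensitive_matches(text: str) -> list[str]:
    """Return the names of all patterns that occur anywhere in text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    text = "Your meeting with bob@example.com starts at noon."
    hits = sensitive_matches(text)
    if hits:
        print("Not sending to synthesis; matched:", ", ".join(hits))
```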
