Voice Agent

Advisory

Audited by Static analysis on Apr 30, 2026.

Overview

No suspicious patterns detected.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Note (High Confidence)

ASI02: Tool Misuse and Exploitation

What this means

A chosen audio file is handed to the local backend, and an output path can be created or overwritten by the synthesis command.

Why it was flagged

The client reads a provided audio file for transcription and writes synthesized audio to a provided output path. This is expected for the skill, but users should ensure only intended files and output locations are used.

Skill content

with open(filename, 'rb') as f:
    data += f.read()
...
with open(output_file, 'wb') as f:

Recommendation

Use explicit, non-sensitive audio inputs and safe output paths; avoid pointing the output at existing important files.
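One way to act on this recommendation is to validate paths before they reach the client. The sketch below is illustrative only and is not taken from the skill: the helper names `check_input_audio` and `write_new_file` and the extension allow-list are assumptions. It relies on Python's exclusive-creation mode (`'xb'`), which refuses to open a file that already exists, in contrast to the `'wb'` mode in the excerpt, which truncates it.

```python
from pathlib import Path

# Assumed allow-list of audio extensions; adjust to what the backend accepts.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}


def check_input_audio(path: str) -> Path:
    """Return the resolved path if it is an existing file with an audio extension."""
    resolved = Path(path).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"not a regular file: {resolved}")
    if resolved.suffix.lower() not in AUDIO_EXTENSIONS:
        raise ValueError(f"unexpected extension for audio input: {resolved.suffix}")
    return resolved


def write_new_file(path: str, data: bytes) -> Path:
    """Write data to path, failing instead of overwriting an existing file."""
    target = Path(path)
    # 'xb' = exclusive binary create: raises FileExistsError if the path exists.
    with open(target, "xb") as f:
        f.write(data)
    return target
```

A caller could pass the transcription input through `check_input_audio` first, and hand synthesized bytes to `write_new_file` so that an accidental reuse of an existing path fails loudly rather than replacing the file.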

Note (Medium Confidence)

ASI04: Agentic Supply Chain Vulnerabilities

What this means

The skill’s safety depends partly on the backend service running on localhost:8000, not just on the packaged client script.

Why it was flagged

The skill is client-only and relies on a separately managed backend and repository docs outside the included package. That dependency is disclosed, but the backend is part of the trust decision.

Skill content

Requires a running backend API at `http://localhost:8000`. Backend setup instructions are in this repository:
- `README.md`
- `walkthrough.md`
- `DOCKER_README.md`

Recommendation

Install and run the backend only from a trusted source, review its setup instructions, and avoid running an unexpected service on localhost:8000.
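As a small complement to this recommendation, a wrapper could confirm that the configured backend URL really targets the loopback interface before any audio or text is sent. This is a minimal sketch using only the standard library; the function name `is_loopback_url` is hypothetical, and whether the client lets you configure the URL at all is an assumption. It checks only where the URL points, and says nothing about which service is actually listening on that port.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a loopback IP address."""
    host = urlparse(url).hostname  # lowercased host, without port or brackets
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as non-local.
        return False
```

For example, `is_loopback_url("http://localhost:8000")` is true, while a URL whose hostname merely starts with `localhost.` is rejected because the comparison is exact.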

Note (Medium Confidence)

ASI07: Insecure Inter-Agent Communication

What this means

Text sent for speech generation may be handled beyond the local machine by AWS Polly through the backend.

Why it was flagged

The skill discloses that text-to-speech uses AWS Polly via the backend. That is purpose-aligned, but synthesis text may be processed by an external provider depending on backend configuration.

Skill content

It uses **local Whisper** for Speech-to-Text transcription and **AWS Polly** for Text-to-Speech generation.

Recommendation

Do not synthesize highly sensitive text unless you are comfortable with the backend and AWS Polly handling it.