Jetson CUDA Voice Pipeline

Audited by ClawScan on May 10, 2026.

Overview

This is mostly a coherent Jetson voice assistant, but its management script can turn a crafted TTS test phrase into local Python code execution.

Review and fix the test-tts quoting issue before using the management script with arbitrary text. If you install it, understand that it creates persistent user services, listens locally for a wake word, stores an OpenRouter key in a service file, and sends LLM requests to the configured endpoint.

Findings (6)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern · High Confidence
ASI05: Unexpected Code Execution
What this means

If an agent or user runs the TTS test with untrusted text, that text could execute local commands instead of only being spoken.

Why it was flagged

The second command-line argument is inserted directly into Python source code inside a quoted string. A phrase containing a single quote and Python syntax could escape the string and run arbitrary Python as the current user.

Skill content
TEXT="${2:-Hello, I am your Jetson voice assistant.}" ... python3 -c "... from voice_pipeline import speak, load_piper; load_piper(); speak('$TEXT')"
Recommendation

Do not pass untrusted text to test-tts until this is fixed. Implement the command by passing the phrase via argv, stdin, or a safely encoded value rather than interpolating it into the python3 -c source string.
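As a sketch of the argv approach (the TEXT default is taken from the skill content above; the voice_pipeline calls are shown commented out so the example stays runnable without the skill installed):

```shell
# Safer test-tts sketch: the phrase travels as a Python argv entry, so
# quotes and code fragments in $TEXT stay inert data instead of becoming
# Python source text.
TEXT="${2:-Hello, I am your Jetson voice assistant.}"
python3 -c '
import sys
text = sys.argv[1]
# from voice_pipeline import speak, load_piper   # the real skill calls
# load_piper(); speak(text)
print(text)  # demo stand-in: echo the phrase back verbatim
' "$TEXT"
```

With this shape, a payload such as `'); __import__('os').system('id') #` is simply printed (or spoken) back as-is, because nothing from the phrase is ever parsed by Python.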

Note · High Confidence
ASI10: Rogue Agents
What this means

After setup, the assistant can run in the background on login and keep listening locally for the wake word until stopped or disabled.

Why it was flagged

The setup script installs persistent user-level services that restart automatically. This is expected for a wake-word assistant, but it means the microphone pipeline can keep running after installation.

Skill content
Restart=always ... systemctl --user enable whisper-server voice-pipeline
Recommendation

Install only if you want a persistent background voice service. Use the provided stop command and consider disabling the user services when not needed.
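For reference, stopping and disabling the user services uses the unit names from the snippet above:

```shell
# Stop the pipeline now and keep it from restarting at next login.
systemctl --user stop voice-pipeline whisper-server
systemctl --user disable voice-pipeline whisper-server
# Confirm nothing is still running or listening:
systemctl --user status voice-pipeline whisper-server
```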

What this means

Anyone with access to the user's service files may be able to read or reuse the OpenRouter API key.

Why it was flagged

The OpenRouter API key is written into the generated systemd user service file. This is purpose-aligned for LLM access, but it is plaintext credential persistence.

Skill content
Environment="OPENROUTER_API_KEY=$API_KEY"
Recommendation

Use a limited-scope API key, protect the user account and home directory, and rotate the key if the device is shared or compromised.
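One low-effort hardening step, assuming the generated unit lands in the default user-unit path (the exact filename depends on the setup script), is to tighten filesystem permissions on the file holding the key:

```shell
# Keep the plaintext key out of reach of other local accounts.
chmod 700 ~/.config/systemd/user
chmod 600 ~/.config/systemd/user/voice-pipeline.service
```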

What this means

Transcribed requests may be sent to OpenRouter or another configured OpenAI-compatible endpoint for the LLM response.

Why it was flagged

The skill is framed as offline, but the documentation also discloses that the LLM call uses the internet. Users should not assume all spoken content stays local.

Skill content
Fully offline, GPU-accelerated local voice assistant ... No cloud for STT or TTS — only the LLM call uses the internet
Recommendation

Review the LLM endpoint before use and choose a local OpenAI-compatible endpoint if fully local processing is required.
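If fully local processing is required, a systemd drop-in override pointing at a local OpenAI-compatible server is one option. This sketch assumes the pipeline reads its endpoint from an environment variable; the variable name and port are illustrative, not taken from the skill:

```shell
mkdir -p ~/.config/systemd/user/voice-pipeline.service.d
cat > ~/.config/systemd/user/voice-pipeline.service.d/local-llm.conf <<'EOF'
[Service]
# Hypothetical: route LLM calls to a local OpenAI-compatible server
# (e.g. llama.cpp's llama-server) instead of OpenRouter.
Environment="OPENAI_BASE_URL=http://127.0.0.1:8080/v1"
EOF
systemctl --user daemon-reload
systemctl --user restart voice-pipeline
```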

What this means

Future package or model changes could affect what is installed or run on the device.

Why it was flagged

The setup instructions rely on unpinned package installs and remote model downloads. These are normal for this kind of skill, but provenance and repeatability are not tightly controlled.

Skill content
pip install openwakeword piper-tts numpy requests pyusb ... wget https://huggingface.co/rhasspy/piper-voices/resolve/main/...
Recommendation

Pin package versions, verify model checksums, and install from trusted sources when possible.
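A sketch of both practices follows; the version pins in the comment are placeholders, not tested-known-good values, and the checksum step is demonstrated on a scratch file:

```shell
# Pinning: request exact versions instead of whatever is latest, e.g.
#   pip install 'openwakeword==0.6.0' 'piper-tts==1.2.0'   # placeholder pins
#
# Checksum verification: record a SHA-256 when you first fetch a model
# from a trusted source, then verify it on every later download.
model=$(mktemp)
printf 'pretend-onnx-model-bytes' > "$model"
sha256sum "$model" > "$model.sha256"   # record once, from a trusted copy
sha256sum -c "$model.sha256"           # verify before loading the model
```

`sha256sum -c` exits non-zero on a mismatch, so it can gate the rest of a setup script.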

What this means

Sensitive content spoken in recent turns may remain in the rolling context for the rest of the session and influence later answers.

Why it was flagged

The assistant keeps recent conversation context so later answers can use it. This is a normal assistant feature, but it means prior spoken content may influence later responses and may be included in LLM calls.

Skill content
Conversation history — 20-turn rolling context for natural follow-ups
Recommendation

Avoid speaking secrets to the assistant, restart or clear the service between sensitive sessions, or configure a local/private LLM endpoint.
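Restarting the pipeline drops the in-memory rolling context; assuming the service name from the earlier snippet:

```shell
# New process, empty 20-turn history.
systemctl --user restart voice-pipeline
```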