audioclaw-skills-voice-intake

Security checks across malware telemetry and agentic risk

Overview

This voice transcription skill is mostly aligned with its purpose, but it depends on unreviewed local credential helper code and can expose sensitive audio transcripts and metadata through remote processing and logs.

Install only if you expect chosen voice messages to be uploaded to SenseAudio/AudioClaw ASR and you are comfortable with transcript data appearing in command output or saved JSON. Before enabling broadly, review the missing shared credential helper modules and confirm how SENSEAUDIO_API_KEY and the ~/.audioclaw credential state are handled, especially in regulated or multi-user environments.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (5)

Lp3

Medium
Category
MCP Least Privilege
Confidence
91% confidence
Finding
The skill directs execution of local Python scripts, reads local audio files, writes JSON output, accesses environment variables for API keys, and sends data over the network, yet it declares no permissions. That mismatch is a real security issue because operators and agent frameworks cannot accurately gate or review the skill's capabilities, increasing the risk of unintended data access or exfiltration.
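
The mismatch described in this finding can be illustrated with a small check that compares what a skill declares against what it has been observed to do. This is only a sketch: the capability labels below are hypothetical names derived from the finding text, and no real agent framework's permission schema is implied.

```python
# Hypothetical capability labels inferred from the finding above.
# They do not correspond to any real framework's permission vocabulary.
OBSERVED_CAPABILITIES = {
    "process:exec-python",   # runs local Python scripts
    "fs:read-audio",         # reads local audio files
    "fs:write-json",         # writes JSON output
    "env:read-api-key",      # reads API keys from the environment
    "network:outbound",      # sends data over the network
}


def undeclared_capabilities(declared: set[str], observed: set[str]) -> list[str]:
    """Return observed capabilities that the skill never declared, sorted."""
    return sorted(observed - declared)


if __name__ == "__main__":
    # The finding states the skill declares no permissions at all.
    for capability in undeclared_capabilities(set(), OBSERVED_CAPABILITIES):
        print(f"undeclared: {capability}")
```

With an empty declaration, every observed capability is reported, which is the situation the finding describes.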

Vague Triggers

Medium
Confidence
90% confidence
Finding
The default prompt broadly instructs the agent to use this skill for transcribing voice messages and to hand off to another skill based on ongoing voice-reply mode, but it does not define clear trigger conditions, scope limits, or exclusions. This can cause over-activation, unintended routing, or use of a fixed voice_id without sufficient user confirmation, increasing the risk of incorrect tool invocation or unsafe workflow transitions.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The script emits full transcription results, request metadata, audio path, user/channel identifiers, and raw ASR response data to stdout and optionally to a JSON file. In agent or pipeline contexts, stdout is often logged automatically, which can expose sensitive speech content and metadata to unintended operators, log stores, or downstream tools.
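
One possible mitigation is to redact the result before it reaches stdout, so that automatically captured logs do not hold transcript text or raw identifiers. The following is a minimal sketch, not the skill's actual code: the field names (`text`, `raw_response`, `user_id`, `channel_id`, `audio_path`) are assumptions, since the script's real output schema is not shown in this report.

```python
import hashlib
import json

# Assumed field names; the real output schema is not shown in this report.
TEXT_FIELDS = ("text",)
DROPPED_FIELDS = ("raw_response",)
IDENTIFIER_FIELDS = ("user_id", "channel_id", "audio_path")


def _short_digest(value: str) -> str:
    """Return a truncated SHA-256 digest so values can be correlated, not read."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def redact_for_logging(result: dict) -> dict:
    """Build a copy of the result that is safer to print or log."""
    redacted = {}
    for key, value in result.items():
        if key in DROPPED_FIELDS:
            continue  # raw ASR responses are omitted entirely
        if key in TEXT_FIELDS:
            redacted[key] = f"<redacted {len(str(value))} chars>"
        elif key in IDENTIFIER_FIELDS:
            redacted[key] = f"sha256:{_short_digest(str(value))}"
        else:
            redacted[key] = value
    return redacted


if __name__ == "__main__":
    sample = {"text": "hello world", "user_id": "u-123", "language": "en"}
    print(json.dumps(redact_for_logging(sample), indent=2))
```

The full result can still be written to a file with restricted permissions when an operator explicitly asks for it; the sketch only changes what is emitted by default.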

Missing User Warnings

Medium
Confidence
84% confidence
Finding
This code transmits full audio contents and optional metadata such as language, sentiment, diarization, and timestamps to a third-party endpoint, but the code itself contains no consent, disclosure, or policy gate before sending potentially sensitive voice data off-device. In a voice-intake skill, this increases privacy and compliance risk because user speech often contains personal, confidential, or regulated information.
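
A consent or disclosure gate of the kind this finding asks for could look roughly like the sketch below. It is an illustration under stated assumptions: `ConsentRequired`, `describe_upload`, and `gate_upload` are hypothetical names, and how consent is actually collected (prompt, policy file, operator flag) is left to the caller.

```python
class ConsentRequired(Exception):
    """Raised when audio would leave the device without explicit consent."""


def describe_upload(audio_path: str, endpoint: str, options: dict) -> str:
    """Build a human-readable disclosure of what would be sent and where."""
    enabled = sorted(name for name, on in options.items() if on)
    extras = ", ".join(enabled) if enabled else "none"
    return f"Upload {audio_path} to {endpoint} (optional features: {extras})"


def gate_upload(audio_path: str, endpoint: str, options: dict,
                consent_given: bool) -> str:
    """Return the disclosure if consent was given, otherwise refuse."""
    notice = describe_upload(audio_path, endpoint, options)
    if not consent_given:
        raise ConsentRequired(notice)
    return notice


if __name__ == "__main__":
    try:
        gate_upload(
            "voice.ogg",
            "https://api.senseaudio.cn/v1/audio/transcriptions",
            {"diarization": True, "timestamps": False},
            consent_given=False,
        )
    except ConsentRequired as exc:
        print(f"Blocked: {exc}")
```

Calling the gate immediately before the network request keeps the disclosure and the transmission in one place, which makes the behavior easier to audit.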

External Transmission

Medium
Category
Data Exfiltration
Content
## Runtime model

Official HTTP ASR API:
- Endpoint: `https://api.senseaudio.cn/v1/audio/transcriptions`
- Content type: `multipart/form-data`
- File size limit: `<=10MB`
- Practical local input suffixes accepted by this skill: `.wav`, `.mp3`, `.ogg`, `.opus`, `.flac`, `.aac`, `.m4a`, `.mp4`
Confidence
88% confidence
Finding
The skill transmits audio to an external endpoint: https://api.senseaudio.cn/
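
The limits quoted in the runtime model above can be enforced locally before anything is uploaded. The sketch below is illustrative only: `check_audio_file` is a hypothetical helper, and it assumes "10MB" means 10 × 1024 × 1024 bytes, which the quoted content does not specify.

```python
from pathlib import Path

# Suffixes listed in the skill's runtime model.
ALLOWED_SUFFIXES = {".wav", ".mp3", ".ogg", ".opus", ".flac", ".aac", ".m4a", ".mp4"}

# Assumption: "<=10MB" is read as 10 MiB; the API may define it differently.
MAX_BYTES = 10 * 1024 * 1024


def check_audio_file(path: str, max_bytes: int = MAX_BYTES) -> Path:
    """Validate a local audio file before it is sent to the ASR endpoint."""
    candidate = Path(path)
    if not candidate.is_file():
        raise FileNotFoundError(f"not a regular file: {candidate}")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported suffix: {candidate.suffix!r}")
    size = candidate.stat().st_size
    if size > max_bytes:
        raise ValueError(f"file is {size} bytes, limit is {max_bytes}")
    return candidate
```

Rejecting oversized or unexpected files locally also narrows what can leave the device, since only files that pass the check are ever handed to the upload step.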

VirusTotal

64/64 vendors flagged this skill as clean.
