audioclaw-skills-voice-intake

Security checks across malware telemetry and agentic risk

Overview

This voice transcription skill is mostly aligned with its purpose, but it depends on unreviewed local credential helper code and can expose sensitive audio transcripts and metadata through remote processing and logs.

Install only if you expect chosen voice messages to be uploaded to SenseAudio/AudioClaw ASR and you are comfortable with transcript data appearing in command output or saved JSON. Before enabling broadly, review the missing shared credential helper modules and confirm how SENSEAUDIO_API_KEY and the ~/.audioclaw credential state are handled, especially in regulated or multi-user environments.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (5)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill directs execution of local Python scripts, reads local audio files, writes JSON output, accesses environment variables for API keys, and sends data over the network, yet it declares no permissions. That mismatch is a real security issue because operators and agent frameworks cannot accurately gate or review the skill's capabilities, increasing the risk of unintended data access or exfiltration.

Vague Triggers

Medium

Confidence: 90%
Finding: The default prompt broadly instructs the agent to use this skill for transcribing voice messages and to hand off to another skill based on ongoing voice-reply mode, but it does not define clear trigger conditions, scope limits, or exclusions. This can cause over-activation, unintended routing, or use of a fixed voice_id without sufficient user confirmation, increasing the risk of incorrect tool invocation or unsafe workflow transitions.

Missing User Warnings

Medium

Confidence: 91%
Finding: The script emits full transcription results, request metadata, audio path, user/channel identifiers, and raw ASR response data to stdout and optionally to a JSON file. In agent or pipeline contexts, stdout is often logged automatically, which can expose sensitive speech content and metadata to unintended operators, log stores, or downstream tools.
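One way to reduce this exposure is to redact sensitive fields before anything is printed. The following is a minimal sketch, not the skill's actual code: the key names in SENSITIVE_KEYS and the redact_result helper are hypothetical, since this report does not show the script's real output schema.

```python
import json

# Hypothetical field names (assumption): the skill's real output keys
# are not shown in this report and may differ.
SENSITIVE_KEYS = {"text", "raw_response", "audio_path", "user_id", "channel_id"}


def redact_result(result: dict) -> dict:
    """Return a copy of `result` with sensitive top-level values masked.

    Only top-level keys are inspected; nested structures under a
    non-sensitive key are passed through unchanged.
    """
    return {
        key: "<redacted>" if key in SENSITIVE_KEYS else value
        for key, value in result.items()
    }


if __name__ == "__main__":
    sample = {"text": "call me at 555-0100", "duration_s": 3.2, "user_id": "u-17"}
    # Print only the redacted copy; the full result would go to a
    # restricted location rather than stdout.
    print(json.dumps(redact_result(sample)))
```

The full transcript could then be written only to a file with restricted permissions, so that automatically captured stdout carries no speech content.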

Missing User Warnings

Medium

Confidence: 84%
Finding: This code transmits full audio contents and optional metadata such as language, sentiment, diarization, and timestamps to a third-party endpoint, but the code itself contains no consent, disclosure, or policy gate before sending potentially sensitive voice data off-device. In a voice-intake skill, this increases privacy and compliance risk because user speech often contains personal, confidential, or regulated information.
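A policy gate of the kind this finding describes could look roughly like the sketch below. It assumes an explicit opt-in through an environment variable; the variable name AUDIOCLAW_ALLOW_REMOTE_ASR, the ConsentRequired exception, and the require_upload_consent helper are all hypothetical and are not part of the skill.

```python
import os

# Hypothetical opt-in variable (assumption): the skill defines no such setting.
CONSENT_ENV = "AUDIOCLAW_ALLOW_REMOTE_ASR"


class ConsentRequired(RuntimeError):
    """Raised when remote transcription has not been explicitly enabled."""


def require_upload_consent(env=None) -> None:
    """Raise ConsentRequired unless the operator has opted in to remote ASR.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get(CONSENT_ENV, "").strip().lower()
    if value not in {"1", "true", "yes"}:
        raise ConsentRequired(
            f"Set {CONSENT_ENV}=1 to allow audio to be uploaded to a third-party ASR service."
        )
```

Calling require_upload_consent() immediately before the upload step would make off-device transmission fail closed by default.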

External Transmission

Medium

Category: Data Exfiltration
Content:
## Runtime model
Official HTTP ASR API:
- Endpoint: `https://api.senseaudio.cn/v1/audio/transcriptions`
- Content type: `multipart/form-data`
- File size limit: `<=10MB`
- Practical local input suffixes accepted by this skill: `.wav`, `.mp3`, `.ogg`, `.opus`, `.flac`, `.aac`, `.m4a`, `.mp4`
Confidence: 88%
Finding: The skill sends data to an external endpoint at https://api.senseaudio.cn/
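The limits quoted in the runtime model can be checked locally before any audio leaves the machine. The sketch below is illustrative only: check_audio_file is a hypothetical helper, and it assumes "10MB" means 10,000,000 bytes, which is the more conservative reading because the quoted content does not say whether the limit is decimal or binary.

```python
from pathlib import Path

# Suffixes taken from the runtime model quoted in this finding.
ALLOWED_SUFFIXES = {".wav", ".mp3", ".ogg", ".opus", ".flac", ".aac", ".m4a", ".mp4"}

# Assumption: "10MB" is read as decimal megabytes (the stricter interpretation).
MAX_BYTES = 10 * 1000 * 1000


def check_audio_file(path: str) -> Path:
    """Validate suffix, existence, and size; return the path if it may be uploaded."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported audio suffix: {p.suffix!r}")
    if not p.is_file():
        raise FileNotFoundError(path)
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, above the {MAX_BYTES}-byte limit")
    return p
```

Rejecting unsupported or oversized files up front also narrows what can be transmitted to the audio types the skill is meant to handle.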

VirusTotal

64/64 vendors flagged this skill as clean.
