audioclaw-skills-voice-reply

Security checks across malware telemetry and agentic risk

Overview

The skill matches its voice-reply purpose, but it can send generated audio to Feishu using local credentials and an inferred chat destination, so it needs review before installation.

Install only if you intend this agent to generate voice replies and potentially send them through Feishu. Prefer passing an explicit chat_id, test with --skip-direct-send first, keep Feishu app permissions narrow, avoid sending sensitive text to cloud TTS, and review the local _shared helper modules used for credential and path resolution.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (7)

Tainted flow: 'out_path' from os.getenv (line 1226, credential/environment) → shutil.copyfile (file write)

Medium

Category: Data Flow
Content:
    def output_from_cache(cache_file: Path, out_path: Optional[Path]) -> Path:
        if out_path:
            out_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(cache_file, out_path)
            return out_path
        return cache_file
Confidence: 94%
Finding: shutil.copyfile(cache_file, out_path)
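
One possible mitigation for this flow is to confine the destination to a known directory before copying. The following is a minimal sketch, not the skill's actual code: the function name output_from_cache_checked and the allowed_root parameter are hypothetical, and it assumes Python 3.9+ for Path.is_relative_to.

```python
import shutil
from pathlib import Path
from typing import Optional


def output_from_cache_checked(
    cache_file: Path, out_path: Optional[Path], allowed_root: Path
) -> Path:
    """Copy cache_file to out_path only if out_path stays inside allowed_root.

    Hypothetical variant of output_from_cache; allowed_root is an assumption.
    """
    if out_path is None:
        return cache_file
    # Resolve both paths so ".." segments and symlinks cannot escape the root.
    root = allowed_root.resolve()
    target = out_path.resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"refusing to write outside {root}: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(cache_file, target)
    return target
```

With a check like this, an environment-supplied path such as one containing ".." segments would be rejected instead of being written wherever the variable points.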

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill states that it will upload audio to Feishu and infer the active chat from session logs, but it does not provide a clear user-facing warning or confirmation before sending content to an external service. This is risky because reply text may contain sensitive user data, and chat inference increases the chance of misdelivery to the wrong recipient or conversation.
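
A confirmation gate of the kind this finding asks for could look roughly like the sketch below. It is illustrative only: resolve_destination and its confirm callback are hypothetical names, not part of the skill, and the chat IDs in the usage note are made up.

```python
from typing import Callable, Optional


def resolve_destination(
    explicit_chat_id: Optional[str],
    inferred_chat_id: Optional[str],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Return the chat to send to, or None if nothing should be sent.

    Hypothetical helper: an explicit chat_id is trusted as given, while an
    inferred one is used only after the confirm callback approves it.
    """
    if explicit_chat_id:
        return explicit_chat_id
    if inferred_chat_id and confirm(inferred_chat_id):
        return inferred_chat_id
    return None
```

For example, resolve_destination(None, "oc_inferred", confirm=lambda chat: False) returns None, so an unconfirmed inferred chat never receives audio.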

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill documents automatic use of API credentials from environment variables and local credential files without a user-facing warning or clear scoping controls. This is dangerous because it normalizes silent credential access and substitution, which can lead to unintended use of privileged tokens, account confusion, or secret exposure through downstream tools and logs.

Vague Triggers

Medium

Confidence: 88%
Finding: The default prompt applies the skill broadly whenever voice replies are needed and also embeds behavioral preferences such as always defaulting to a specific clone voice and suppressing normal textual confirmation. Without tighter trigger conditions, an agent may invoke this skill in contexts where the user did not clearly request voice output or where the host channel cannot safely handle the side effects, leading to unintended audio delivery or hidden actions.

Natural-Language Policy Violations

Medium

Confidence: 95%
Finding: The prompt instructs the agent to send a final assistant message in a specific language (Chinese) if a textual completion is needed, regardless of the user's language preference. Fixed-language output can confuse users, miscommunicate important status information, and create unsafe UX failures in multilingual environments, especially when the text is the only visible confirmation that an audio action occurred.

Missing User Warnings

Medium

Confidence: 89%
Finding: The script sends arbitrary request text to an external TTS provider via synthesize() without any built-in disclosure, consent, or data-classification guardrails. In a messaging/voice-reply skill, that can expose sensitive user content or internal data to a third-party service unexpectedly, making the context more privacy-sensitive than a standalone local utility.
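
A simple data-classification guardrail in front of the TTS call might be sketched as follows. Everything here is an assumption for illustration: the pattern list, find_sensitive, and check_before_tts are hypothetical, and a regex screen like this catches only obvious cases.

```python
import re
from typing import List

# Illustrative patterns only; a real deployment would need its own list.
SENSITIVE_PATTERNS = {
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{16,}"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "long_digit_run": re.compile(r"\b\d{13,19}\b"),
}


def find_sensitive(text: str) -> List[str]:
    """Return the names of all patterns that match the text."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]


def check_before_tts(text: str, allow_sensitive: bool = False) -> str:
    """Return text unchanged if it may be sent; raise if it looks sensitive.

    Hypothetical pre-send check to run before any call to a cloud TTS provider.
    """
    hits = find_sensitive(text)
    if hits and not allow_sensitive:
        raise ValueError(
            f"text looks sensitive ({', '.join(hits)}); not sending to cloud TTS"
        )
    return text
```

A caller would pass the reply text through check_before_tts first and set allow_sensitive=True only after the user has explicitly agreed to send it.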

Missing User Warnings

Medium

Confidence: 91%
Finding: This code sends both arbitrary input text and a bearer API key to a third-party TTS provider, which creates a real data exposure and trust-boundary issue if users are unaware that message contents leave the local system. In this skill context, the text may contain chat content or sensitive user data, so undisclosed transmission to an external service increases privacy, compliance, and credential-handling risk even though it appears to be the intended functionality.

VirusTotal

66/66 vendors flagged this skill as clean.
