senseaudio-game-npc-director

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate NPC voice workflow, but by default it sends each generated NPC line to Feishu as a separate message and reads local credentials without adequate opt-in controls.

Review before installing. Use this skill only if users are comfortable sending player speech, dialogue text, generated audio, and related metadata to AudioClaw/SenseAudio services and possibly to Feishu. Recommended mitigations: require explicit per-session opt-in before any Feishu send, confirm the destination chat, disable sticky external delivery by default, and set retention rules for transcripts and generated files.
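One way to implement the opt-in controls recommended above is a per-session consent gate that the pipeline checks before every Feishu send. This is a minimal sketch under stated assumptions: `FeishuDeliveryConsent`, its methods, and the send budget are hypothetical names, not part of the skill or of any Feishu API. It covers three of the mitigations: nothing is sent without an explicit grant, sends go only to the confirmed chat, and consent expires instead of staying sticky for the whole session.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeishuDeliveryConsent:
    """Hypothetical per-session gate for outbound Feishu audio delivery.

    Not part of the skill or the Feishu API. Nothing may be sent until the
    user opts in for this session and confirms the exact destination chat.
    Consent covers a fixed number of sends and then expires.
    """

    opted_in: bool = False
    confirmed_chat_id: Optional[str] = None
    remaining_sends: int = 0

    def grant(self, chat_id: str, max_sends: int = 1) -> None:
        # Record an explicit opt-in tied to one confirmed chat.
        if not chat_id:
            raise ValueError("a destination chat must be confirmed")
        if max_sends < 1:
            raise ValueError("max_sends must be at least 1")
        self.opted_in = True
        self.confirmed_chat_id = chat_id
        self.remaining_sends = max_sends

    def revoke(self) -> None:
        # Clear all consent state so the next send requires a new grant.
        self.opted_in = False
        self.confirmed_chat_id = None
        self.remaining_sends = 0

    def may_send(self, chat_id: str) -> bool:
        # Allow a send only to the confirmed chat while budget remains.
        return (
            self.opted_in
            and chat_id == self.confirmed_chat_id
            and self.remaining_sends > 0
        )

    def record_send(self, chat_id: str) -> None:
        # Refuse unauthorized sends; consume one unit of budget otherwise.
        if not self.may_send(chat_id):
            raise PermissionError("Feishu delivery not authorized for this chat")
        self.remaining_sends -= 1
        if self.remaining_sends == 0:
            # Expire consent instead of letting it persist for the session.
            self.revoke()
```

With this gate, a pipeline step that currently defaults to `--send-feishu-audio` could first call `may_send()` and fall back to a reply that does not leave the platform when the check fails.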

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Rogue Agent: Self-Modification, Session Persistence
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Findings (9)

Lp3

Severity: Medium
Category: MCP Least Privilege
Confidence: 92%
Finding:
The skill describes and instructs use of shell commands, filesystem access, environment-based credential lookup, and networked ASR/TTS/messaging integrations, but it does not declare permissions for those capabilities. This reduces transparency and prevents informed consent or policy enforcement for actions like reading local credential files, invoking external tools, and transmitting data off-platform.

Tp4

Severity: High
Category: MCP Tool Poisoning
Confidence: 95%
Finding:
The skill is presented as an AudioClaw NPC voice workflow, but the instructions reveal materially different behavior: SenseAudio services are used for ASR/TTS, local credential substitution occurs, and outputs may be sent to Feishu after transcoding. This mismatch can mislead users about where their audio/text goes, which credentials are accessed, and which third parties receive content.

Description-Behavior Mismatch

Severity: Medium
Confidence: 87%
Finding:
The pipeline includes an optional path to send generated NPC audio to Feishu, which introduces external data egress beyond local NPC voice generation. In a skill advertised for reusable in-world voice behavior, adding outbound messaging increases the chance that user-derived content, transcripts, or generated audio are transmitted to third-party systems without clear necessity or strong guardrails.

Context-Inappropriate Capability

Severity: Medium
Confidence: 89%
Finding:
The flagged code block operationalizes the Feishu send capability by invoking a separate script with workspace and chat/session targeting parameters. That creates a concrete outbound communications channel not justified by the stated NPC voice-generation use case, raising exfiltration and misuse concerns if transcripts or synthesized content include sensitive data.

Missing User Warnings

Severity: Medium
Confidence: 96%
Finding:
The skill makes Feishu audio delivery a normal or default path for generated NPC responses without a clear warning that user-derived content and synthesized audio will be transmitted to an external messaging platform. In a voice-interaction context, this can expose sensitive speech content, transcripts, or game/session context to a third-party service unexpectedly.

Vague Triggers

Severity: Medium
Confidence: 85%
Finding:
The default prompt uses broad activation language and persistent state instructions that can cause the agent to remain in NPC mode across subsequent turns without a narrowly scoped trigger. In practice, this increases the chance of unintended behavior, including applying the skill to unrelated user requests and continuing voice-output actions longer than the user expects.
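A narrower alternative to a sticky session mode is to enter NPC mode only on an exact command and let it lapse after a bounded number of turns. In this minimal sketch, the `ScopedNpcMode` class and the `/npc start` and `/npc stop` commands are hypothetical; the skill itself defines no such commands, and the turn limit is an illustrative default.

```python
class ScopedNpcMode:
    """Hypothetical narrowly scoped NPC mode (not part of the skill).

    NPC mode turns on only for an exact activation command, turns off on an
    exact exit command, and lapses after a fixed number of turns so that it
    cannot silently persist across unrelated requests.
    """

    def __init__(
        self,
        start_command: str = "/npc start",
        stop_command: str = "/npc stop",
        max_turns: int = 5,
    ) -> None:
        self.start_command = start_command
        self.stop_command = stop_command
        self.max_turns = max_turns
        self.turns_left = 0

    @property
    def active(self) -> bool:
        return self.turns_left > 0

    def handle(self, message: str) -> bool:
        """Return True if this message should be answered in NPC voice mode."""
        text = message.strip()
        if text == self.start_command:
            # Exact match only: loose keywords never activate the mode.
            self.turns_left = self.max_turns
            return False  # the command itself is not dialogue
        if text == self.stop_command:
            self.turns_left = 0
            return False
        if not self.active:
            return False
        # Each NPC turn consumes budget; the mode ends when it runs out.
        self.turns_left -= 1
        return True
```

Bounding the mode by turns rather than by session means an unrelated request arriving later is handled normally even if the user never sends the stop command.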

Missing User Warnings

Severity: Medium
Confidence: 83%
Finding:
The script sends each manifest line's text to an external third-party TTS service, which can expose sensitive dialogue, narrative content, or user-derived material if manifests contain private data. In this skill context, external synthesis is the intended behavior, but the lack of explicit disclosure, consent, or content classification increases privacy and data-governance risk.
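To address the missing disclosure, a pre-flight step could scan the manifest and ask for confirmation before any line is sent to the external TTS service. This sketch rests on several assumptions: the manifest is taken to be JSON Lines with a `text` field per line (the report does not show the real format), `preflight_manifest` and `confirm_external_tts` are hypothetical helpers, and the regex patterns are only a crude stand-in for real content classification.

```python
import json
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical patterns; a real deployment would need a proper classifier.
SENSITIVE_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

Report = List[Tuple[int, str, List[str]]]


def preflight_manifest(lines: Iterable[str]) -> Report:
    """Report (line number, text, matched pattern names) for each entry.

    Assumption: each non-empty manifest line is a JSON object with a "text" field.
    """
    report: Report = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines but keep original line numbering
        text = json.loads(raw).get("text", "")
        hits = [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]
        report.append((number, text, hits))
    return report


def confirm_external_tts(report: Report, approve: Callable[[str], bool]) -> bool:
    """Disclose what will leave the machine and return the caller's decision."""
    flagged = [entry for entry in report if entry[2]]
    summary = (
        f"{len(report)} line(s) will be sent to an external TTS service; "
        f"{len(flagged)} flagged as possibly sensitive."
    )
    return bool(approve(summary))
```

Keeping the approval as a caller-supplied callback leaves the choice of prompt (CLI confirmation, chat message, policy engine) to the host, while the script refuses to proceed without an explicit yes.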

Autonomous Decision Making

Severity: Medium
Category: Excessive Agency
Content:
2. If the input is text, still run `scripts/run_player_voice_npc_pipeline.py --input-text ...` so the reply stays on the same voice pipeline.
3. In ongoing NPC dialogue mode, default to `--send-feishu-audio` so the generated NPC lines are sent one by one as Feishu `audio` messages.
4. Only fall back to text-first replies if the user explicitly asks for text-only output or the channel cannot play voice.
5. If the user says "直接发语音" ("just send voice") or "一条一条发 NPC 语音" ("send the NPC voice lines one by one"), keep the same voice mode and continue sending audio without asking again.

NPC mode should be sticky inside the same session:
Confidence: 94%
Finding: "without asking"

Session Persistence

Severity: Medium
Category: Rogue Agent
Content:
- If you want faster perceived NPC response generation, use stream ASR for the player-input leg.
- Treat cloned voices or exclusive voices as drop-in replacements for the same workflow.
- Official clone support is a two-step chain:
  - create the clone on the AudioClaw platform first
  - then use the prepared clone `voice_id` here

## API key lookup
Confidence: 80%
Finding:
create the clone on the AudioClaw platform first - then use the prepared clone `voice_id` here ## API key lookup For the NPC generation side of this skill: - TTS-oriented scripts now default to `

VirusTotal

52/52 vendors flagged this skill as clean.
