Xiaozhi English Speaking Coach

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent English-speaking coach, but it uses broad auto-activation and persistent learner profiling without enough scoping, retention, or deletion detail.

Install only if you are comfortable with an English coach that may activate on broad English voice interactions and maintain a local long-term learner profile. Before use, confirm how profile memory can be reviewed, disabled, or deleted, especially for students or minors.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (6)

Vague Triggers

Medium

Confidence: 94%
Finding: The skill declares mandatory activation for a wide range of common phrases and even any English voice input, which can cause unintended invocation in ordinary conversation. In a voice assistant context, this may override user intent, unexpectedly engage memory-related behavior, and route users into a specialized workflow without clear consent.

Vague Triggers

Medium

Confidence: 92%
Finding: Treating any English voice opening or generic greetings like 'Good morning' as a trigger is overly ambiguous and likely to fire during unrelated interactions. Because this skill also references persistent learner profiles, accidental activation can expose prior-session context or begin collecting speech-related data when the user did not intend to start coaching.

Natural-Language Policy Violations

Medium

Confidence: 81%
Finding: The rule that the assistant should never use Chinese for content that could be practiced in English removes user choice and can make instructions less understandable for beginners. This is primarily a safety and usability policy issue rather than a direct exploit, but it can still lead to confusion, mis-training, or inaccessible guidance for younger or lower-proficiency users.

Natural-Language Policy Violations

Medium

Confidence: 84%
Finding: Forcing English-only interaction in roleplay without opt-in can push users into a mode they did not agree to and may reduce comprehension during task-critical moments. In an educational skill this is less severe than code execution or data exfiltration, but it still creates a policy and consent problem, especially for novice learners.

Vague Triggers

Medium

Confidence: 86%
Finding: The trigger logic is broad enough to activate on generic phrases like 'Good morning' or voice-mode speech, which can cause the skill to start without clear user intent. In a conversational coaching skill, unintended activation can lead to surprise collection of speech-derived profile data and an experience the user did not explicitly request.
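
To illustrate the difference between a broad trigger and one that requires stated intent, here is a minimal Python sketch. The trigger phrases and function names are hypothetical and chosen only for illustration; the skill's actual trigger list and matching logic are not reproduced in this report.

```python
import re

# Hypothetical phrase lists, not taken from the skill itself.
BROAD_TRIGGERS = ("good morning", "hello")
EXPLICIT_TRIGGERS = ("practice english", "english coaching", "english lesson")


def _normalize(utterance: str) -> str:
    """Lowercase the utterance and keep only words, separated by single spaces."""
    return " ".join(re.findall(r"[a-z']+", utterance.lower()))


def broad_match(utterance: str) -> bool:
    """Mimics the flagged pattern: a generic greeting at the start activates the skill."""
    text = _normalize(utterance) + " "
    return any(text.startswith(t + " ") for t in BROAD_TRIGGERS)


def explicit_match(utterance: str) -> bool:
    """Narrower alternative: activate only when the user asks for coaching."""
    text = _normalize(utterance)
    return any(t in text for t in EXPLICIT_TRIGGERS)


if __name__ == "__main__":
    for phrase in ("Good morning, what's the weather?", "Let's practice English today"):
        print(phrase, "| broad:", broad_match(phrase), "| explicit:", explicit_match(phrase))
```

Under these assumed phrase lists, an ordinary greeting followed by an unrelated question satisfies the broad matcher but not the explicit one, which is the kind of unintended activation the finding describes.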

Missing User Warnings

Medium

Confidence: 95%
Finding: The state machine and persistence model store language-profile data, including pronunciation weaknesses, grammar issues, interests, and session history, but the file does not describe consent, retention limits, deletion, or access controls. Because this is longitudinal user profiling tied to behavior over time, the privacy risk is real and elevated, especially if the users may be students or minors.

VirusTotal

61/61 vendors reported this skill as clean.
