Xiaozhi Teach Math Exam Designer

Security checks across malware telemetry and agentic risk

Overview

This is a coherent math assessment-design skill, but users should be careful with student data: it shares results with other skills, stores them in archives, and contains a few templates that contradict its own rules on student rankings and full exam generation.

Install only if you are comfortable using it for teacher-controlled assessment planning. Avoid entering real student names, remove ranking fields from reports, confirm before writing results to other skills or archives, and keep any stored assessment records pseudonymized with a clear retention policy.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (7)

Intent-Code Divergence

Medium
Confidence
96% confidence
Finding
The skill’s boundary statement says it will not generate full exam papers, but the interface contract exposes `examDesigner.exam` as a write target. This creates a specification conflict that downstream components or prompting logic could exploit to coerce full exam generation, bypassing the declared safety boundary and potentially enabling unauthorized content production such as copyrighted test material.

Intent-Code Divergence

Medium
Confidence
97% confidence
Finding
The skill states that the AI must not rank students and must not disclose student score rankings, yet the student report template includes a `排名` ("ranking") field. This inconsistency can normalize collection and output of comparative student performance data, increasing privacy and fairness risks and making policy bypass easier in real deployments.
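One way to act on this finding is to drop ranking fields from a report before it is shown or stored. The sketch below is illustrative only: the report is assumed to be a nested dict/list structure, and the key names other than `排名` (`rank`, `ranking`) and the function `strip_ranking_fields` are hypothetical, not taken from the skill.

```python
# Hypothetical set of keys treated as ranking data; only "排名" appears in the report template.
RANKING_KEYS = {"排名", "rank", "ranking"}


def strip_ranking_fields(report):
    """Return a copy of `report` with ranking keys removed at every nesting level."""
    if isinstance(report, dict):
        return {
            key: strip_ranking_fields(value)
            for key, value in report.items()
            if key not in RANKING_KEYS
        }
    if isinstance(report, list):
        return [strip_ranking_fields(item) for item in report]
    # Scalars (strings, numbers, None) are returned unchanged.
    return report


if __name__ == "__main__":
    sample = {"student": "S01", "排名": 3, "sections": [{"score": 90, "rank": 1}]}
    print(strip_ranking_fields(sample))
```

The function returns a new structure and leaves the input untouched, so the original record can still be audited separately if needed.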

Intent-Code Divergence

Medium
Confidence
95% confidence
Finding
The behavior rules prohibit outputting complete exams, but the collaboration/write sections still describe exam generation and propagation to other components. This contradiction weakens guardrails because integrators may trust the interface definition over prose restrictions, causing the system to generate or distribute artifacts the policy says should never exist.

Vague Triggers

Medium
Confidence
89% confidence
Finding
The trigger list includes broad natural-language phrases such as '这周考什么' ("what's on the test this week") and '试卷怎么出' ("how should the exam paper be written"), which can plausibly appear in ordinary teacher conversation outside the intended scope. This increases the chance of unintended skill activation, causing context hijacking or routing the user into the wrong workflow, which is a real security and reliability issue for agentic systems even though the content itself is educational.

Vague Triggers

Medium
Confidence
90% confidence
Finding
The trigger phrases include broad everyday expressions such as ‘这周考什么’ ("what's on the test this week") and ‘试卷怎么出’ ("how should the exam paper be written"), which can cause the skill to activate outside the intended context. Overbroad activation increases the chance of unintended data access, inappropriate workflow takeover, or accidental invocation when a user is discussing assessment casually rather than requesting this skill’s full functionality.

Vague Triggers

Medium
Confidence
91% confidence
Finding
The trigger section lists examples but does not define boundaries, exclusions, or disambiguation logic. Without explicit non-trigger conditions, the skill may activate on partial keyword matches and operate on educational data or planning tasks that belong to other skills, increasing misuse and privacy exposure through unnecessary processing.
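The missing boundaries could take the form of an explicit scope requirement plus a list of non-trigger conditions. The following is a minimal sketch under stated assumptions: the two trigger phrases come from the findings above, while `SCOPE_TERMS`, `EXCLUSION_TERMS`, and `should_activate` are hypothetical names and values chosen for illustration, not part of the skill.

```python
# Trigger phrases quoted in the findings.
TRIGGER_PHRASES = ("这周考什么", "试卷怎么出")

# Assumption: the skill should only activate when the message is clearly about math.
SCOPE_TERMS = ("数学", "math")

# Assumption: mentions of other subjects mean the request belongs to another skill.
EXCLUSION_TERMS = ("语文", "英语", "物理")


def should_activate(message: str) -> bool:
    """Activate only on a trigger phrase that is in scope and not excluded."""
    text = message.lower()
    has_trigger = any(phrase in text for phrase in TRIGGER_PHRASES)
    in_scope = any(term in text for term in SCOPE_TERMS)
    excluded = any(term in text for term in EXCLUSION_TERMS)
    return has_trigger and in_scope and not excluded


if __name__ == "__main__":
    print(should_activate("数学这周考什么"))   # trigger phrase plus math scope
    print(should_activate("这周考什么"))       # trigger phrase alone
    print(should_activate("英语试卷怎么出"))   # trigger phrase for another subject
```

Substring matching is deliberately simple here; a real router would likely need intent classification, but even this shape makes the non-trigger conditions explicit and testable.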

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The skill describes writing assessment results back to other skills and storing them in archives, but it does not clearly warn users that data may be shared or persisted across components. In an education context, assessment outcomes can be sensitive student data, so silent cross-skill sharing materially increases privacy, consent, and retention risks.
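A confirmation gate in front of every cross-skill or archive write would address this finding, ideally combined with pseudonymized identifiers. The sketch below assumes a record with a `student` field, a caller-supplied `confirm` callback, and an in-memory list standing in for the destination; `pseudonymize`, `write_result`, and all parameter names are hypothetical and do not describe the skill's actual interface.

```python
import hashlib
import hmac


def pseudonymize(student_name: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a name using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(
        secret_key, student_name.encode("utf-8"), digestmod=hashlib.sha256
    ).hexdigest()
    return "stu_" + digest[:12]


def write_result(record: dict, target: str, confirm, sink: list, secret_key: bytes) -> bool:
    """Write `record` to `sink` only if `confirm(target)` returns True.

    The student name is replaced by a pseudonym before anything is stored.
    Returns True if the record was written, False if the user declined.
    """
    if not confirm(target):
        return False
    safe_record = dict(record)
    safe_record["student"] = pseudonymize(record["student"], secret_key)
    sink.append((target, safe_record))
    return True


if __name__ == "__main__":
    archive = []
    written = write_result(
        {"student": "Alice", "score": 88},
        target="archive",
        confirm=lambda t: input(f"Share this result with '{t}'? [y/N] ").lower() == "y",
        sink=archive,
        secret_key=b"replace-with-a-real-secret",
    )
    print(written, archive)
```

The keyed hash keeps pseudonyms consistent across records while making them hard to reverse without the key; the key itself would need to be stored and rotated under the retention policy.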

VirusTotal

64/64 vendors flagged this skill as clean.
