Axioma Skill Evaluator Strict 90%

Security checks across malware telemetry and agentic risk

Overview

This skill is a local skill-quality checker, but its code can approve skills that score below the strict 90% gate it repeatedly promises.

Treat this as a Review item, not a confirmed attack. Install only if you are comfortable using it as an advisory checker and not as an authoritative publishing or security gate. Do not rely on its APPROVED output until the 70%/90% mismatch, report text, exit behavior, and unimplemented --improve claims are fixed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (4)

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The skill metadata explicitly promises a strict 90% minimum approval gate, but the implementation marks skills as passed at 70% and above. This creates a policy-enforcement mismatch that can allow lower-quality or unsafe skills to be incorrectly approved, undermining downstream trust in the evaluator.
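A minimal sketch of what a consistent strict gate could look like, assuming scores are percentages from 0 to 100. The names `APPROVAL_THRESHOLD_PCT` and `passes_gate` are hypothetical and do not come from the skill's code; the point is that a single constant set to 90 drives the pass/fail decision, so a 70% score cannot pass.

```python
# Hypothetical single source of truth for the advertised strict gate.
APPROVAL_THRESHOLD_PCT = 90.0


def passes_gate(score_pct: float) -> bool:
    """Return True only when the score meets the strict 90% gate."""
    if not 0.0 <= score_pct <= 100.0:
        raise ValueError(f"score out of range: {score_pct}")
    return score_pct >= APPROVAL_THRESHOLD_PCT


if __name__ == "__main__":
    # 70.0 and 89.9 fail; 90.0 and 97.5 pass.
    for score in (70.0, 89.9, 90.0, 97.5):
        print(f"{score:5.1f}% -> {'pass' if passes_gate(score) else 'fail'}")
```

Keeping the threshold in one named constant means the metadata, the decision logic, and any messages can be checked against a single value.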

Intent-Code Divergence

High

Confidence: 98%
Finding: The generated report states that approval occurs at 70%, directly contradicting the advertised strict 90% gate. Misleading output is dangerous because users and automation may rely on the report text as the authoritative compliance result, causing noncompliant skills to be accepted.
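A hedged sketch of deriving the report's verdict text from the same threshold used for the decision, so the report cannot advertise a different number than the one enforced. `verdict_line` and the output wording are hypothetical illustrations, not the skill's actual report format.

```python
# Hypothetical constant shared by the decision and the report text.
APPROVAL_THRESHOLD_PCT = 90.0


def verdict_line(score_pct: float) -> str:
    """Build the report verdict from the enforced threshold, never a hardcoded number."""
    approved = score_pct >= APPROVAL_THRESHOLD_PCT
    status = "APPROVED" if approved else "REJECTED"
    return (
        f"{status}: score {score_pct:.1f}% "
        f"(strict gate: {APPROVAL_THRESHOLD_PCT:.0f}%)"
    )


if __name__ == "__main__":
    print(verdict_line(70.0))  # REJECTED: score 70.0% (strict gate: 90%)
    print(verdict_line(92.0))  # APPROVED: score 92.0% (strict gate: 90%)
```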

Intent-Code Divergence

High

Confidence: 98%
Finding: The CLI success/failure messages advertise a 70% approval rule instead of the documented 90% strict gate. This is risky because operators may make manual approval decisions based on these messages, and scripts parsing stdout may propagate incorrect pass/fail outcomes.
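A minimal sketch, assuming a command-line entry point that receives a percentage score: the process exit status carries the pass/fail result, so scripts can rely on it instead of parsing stdout text. The argument layout and messages are hypothetical; only standard-library `argparse` and `sys` are used.

```python
import argparse
import sys
from typing import Optional, Sequence

# Hypothetical constant matching the documented strict gate.
APPROVAL_THRESHOLD_PCT = 90.0


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical strict-gate check")
    parser.add_argument("score", type=float, help="overall score in percent")
    args = parser.parse_args(argv)

    if args.score >= APPROVAL_THRESHOLD_PCT:
        print(f"APPROVED: {args.score:.1f}% >= {APPROVAL_THRESHOLD_PCT:.0f}%")
        return 0  # success exit status: gate passed
    print(
        f"REJECTED: {args.score:.1f}% < {APPROVAL_THRESHOLD_PCT:.0f}%",
        file=sys.stderr,
    )
    return 1  # non-zero exit status: gate failed


if __name__ == "__main__":
    sys.exit(main())
```

A caller can then gate on the exit status alone (for example, `python check.py 70 || echo rejected` in a shell), which stays correct even if the message wording changes.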

Vague Triggers

Medium

Confidence: 83%
Finding: The trigger list contains broad phrases like 'evaluate skill' and related generic quality terms that may match ordinary conversation. Over-broad activation increases the chance the skill runs unintentionally in unrelated contexts, which is more concerning here because the skill also references shell commands, file operations, hardcoded paths, and evaluation workflows.
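One way to narrow activation, sketched under the assumption that triggers are matched as phrases against the user's message: require the skill's explicit name rather than generic phrases like "evaluate skill". The phrases in `NARROW_TRIGGERS` and the function `should_activate` are hypothetical; the skill's real trigger mechanism is not shown in this report.

```python
import re

# Hypothetical narrow trigger phrases that name the tool explicitly.
NARROW_TRIGGERS = (
    "axioma skill evaluator",
    "run axioma evaluation",
)


def should_activate(message: str) -> bool:
    """Activate only when the message names the evaluator explicitly."""
    # Collapse runs of whitespace and lowercase so spacing and case do not matter.
    normalized = re.sub(r"\s+", " ", message).strip().lower()
    return any(phrase in normalized for phrase in NARROW_TRIGGERS)


if __name__ == "__main__":
    print(should_activate("Can you evaluate skill gaps on my team?"))  # False
    print(should_activate("Run the Axioma   Skill Evaluator on ./my-skill"))  # True
```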

VirusTotal

64/64 vendors reported this skill as clean.
