AB Test Framework

Security checks across malware telemetry and agentic risk

Overview

This skill does not appear malicious, but it is incomplete: it claims to perform A/B model selection yet returns success without comparing any models.

Review carefully before installing. Do not rely on this version for model selection or governance, because it returns success without performing any A/B test. Avoid sensitive prompts until logging and alert behavior are fixed. Prefer a corrected version that implements the comparison, documents external data handling, removes the unused exec and filesystem imports, redacts logs and alerts, and fixes package.json.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (4)

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: The skill can send external alerts on failure even though its stated purpose is only local A/B model comparison. That creates an unnecessary outbound communication path that could leak error details or be abused as an exfiltration/notification channel, especially since the alert includes the raw error message.
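One way to reduce the leak described in this finding is to redact and truncate the error before it reaches any alert channel. The following is a minimal sketch in Python, not the skill's own code: the function name redact_error, the key-name pattern, and the 200-character limit are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern: "key = value" pairs whose key name suggests a credential.
_SECRET_PAIR = re.compile(
    r"\b(api[_-]?key|token|secret|password)\b\s*[=:]\s*\S+",
    re.IGNORECASE,
)


def redact_error(exc: Exception, max_len: int = 200) -> str:
    """Return an alert-safe summary: exception type plus a redacted, truncated message."""
    message = _SECRET_PAIR.sub("[REDACTED]", str(exc))
    return f"{type(exc).__name__}: {message[:max_len]}"


if __name__ == "__main__":
    print(redact_error(ValueError("api_key=sk-12345 failed")))
```

A pattern list like this only catches the obvious cases; an alert that carries just the exception type and a correlation ID would expose even less.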

Intent-Code Divergence

High

Confidence: 92%
Finding: The documentation advertises an A/B testing framework, but the implementation is only a placeholder that returns sanitized input. This mismatch is dangerous because downstream systems or operators may trust the skill to perform security-relevant evaluation or gating when in reality it does nothing, enabling silent policy bypass or incorrect model selection decisions.
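The silent-bypass risk in this finding comes from a placeholder that reports success. A placeholder can fail closed instead, so that callers cannot mistake it for a completed evaluation. The sketch below is in Python and assumes hypothetical names (run_ab_test, approve_model, ComparisonNotImplemented); none of them are taken from the skill.

```python
from __future__ import annotations


class ComparisonNotImplemented(RuntimeError):
    """Raised when the A/B comparison step has not been implemented."""


def run_ab_test(model_a: str, model_b: str, prompts: list[str]) -> dict:
    # Hypothetical entry point: refuse to return a result until real
    # comparison logic exists, rather than returning success.
    raise ComparisonNotImplemented("A/B comparison is not implemented in this version")


def approve_model(model_a: str, model_b: str, prompts: list[str]) -> bool:
    """Gate that treats a missing comparison as a rejection."""
    try:
        result = run_ab_test(model_a, model_b, prompts)
    except ComparisonNotImplemented:
        return False
    return result.get("winner") in ("a", "b")
```

With this shape, a pipeline that calls approve_model rejects by default, and an operator sees an explicit error instead of a misleading success.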

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The manifest and code claim model comparison via A/B testing, but the implementation only validates and sanitizes inputs and never performs any comparison logic. In a selection pipeline, this can cause users or automated orchestration to believe models were evaluated when they were not, undermining governance and potentially approving unsafe or untested models.
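For contrast with the validate-and-return behavior described here, the following Python sketch shows roughly what a minimal comparison step could look like. It is an illustration under assumptions: models are plain callables, the scoring function is supplied by the caller, and the function name compare_models and the result keys are hypothetical. It compares mean scores only and performs no statistical significance test, which a real selection pipeline would likely need.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]            # prompt -> response
Scorer = Callable[[str, str], float]    # (prompt, response) -> score


def compare_models(
    model_a: Model, model_b: Model, prompts: Sequence[str], score: Scorer
) -> dict:
    """Run both models on every prompt and compare their mean scores."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    total_a = total_b = 0.0
    for prompt in prompts:
        total_a += score(prompt, model_a(prompt))
        total_b += score(prompt, model_b(prompt))
    n = len(prompts)
    mean_a, mean_b = total_a / n, total_b / n
    if mean_a == mean_b:
        winner = "tie"
    else:
        winner = "a" if mean_a > mean_b else "b"
    return {"n": n, "mean_a": mean_a, "mean_b": mean_b, "winner": winner}
```

Returning the sample size and both means alongside the winner lets a reviewer check that a comparison actually ran.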

Missing User Warnings

Medium

Confidence: 89%
Finding: This skill compares external models and accepts arbitrary test prompts, but the description does not warn users that those prompts may be transmitted to third-party model providers. That omission can cause unintentional disclosure of sensitive or proprietary data during evaluation, especially because A/B testing commonly forwards the same inputs to multiple external services.
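Beyond a warning in the description, the disclosure risk in this finding could be enforced in code by requiring explicit acknowledgement before any prompt leaves the host. The Python sketch below assumes a hypothetical function name and an acknowledged flag that the caller sets after the user has been told which providers receive the prompts.

```python
from typing import Iterable


def confirm_external_transmission(providers: Iterable[str], acknowledged: bool) -> None:
    """Block the run unless the user has acknowledged external transmission."""
    if not acknowledged:
        names = ", ".join(sorted(providers))
        raise PermissionError(
            f"Prompts would be sent to external providers ({names}); "
            "explicit acknowledgement is required."
        )
```

Calling this check once, before the first model request, keeps the default behavior safe when the flag is omitted.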

VirusTotal

65/65 vendors flagged this skill as clean.
