Acceptance Test

Security checks across malware telemetry and agentic risk

Overview

This skill appears non-destructive, but its test runners can present random demo results as real acceptance validation.

Treat this as a demo or placeholder, not a real QA gate. Do not use its pass/fail results, risk level, approval flag, or report output for release approval, customer acceptance, compliance evidence, or sign-off unless the scripts are replaced with deterministic logic that actually reads and executes your test artifacts.

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (8)

Intent-Code Divergence

High

Confidence: 99%
Finding: The function claims to generate an acceptance report, but its key conclusions such as overall status, risk level, and blockers are produced randomly rather than derived from real validation data. In an acceptance-testing skill, this can mislead users into approving or rejecting deliverables based on fabricated results, undermining business controls and quality gates.
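As a contrast to the randomized report described in this finding, the sketch below derives status, risk level, and blockers only from supplied test results. It is a minimal illustration, not the skill's code: `CaseResult`, `build_report`, the `blocking` flag, and the risk thresholds are all hypothetical names and rules chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """One executed acceptance case (hypothetical structure)."""
    name: str
    passed: bool
    blocking: bool = False  # assumption: a failed blocking case stops release


def build_report(results):
    """Summarize real results deterministically; never guess a status."""
    if not results:
        # With no evidence there is nothing to approve or reject.
        raise ValueError("no test results supplied; refusing to report a status")

    failed = [r for r in results if not r.passed]
    blockers = [r.name for r in failed if r.blocking]

    # Assumed risk rule: any blocker is high, any other failure is medium.
    if blockers:
        risk = "high"
    elif failed:
        risk = "medium"
    else:
        risk = "low"

    return {
        "overall_status": "failed" if failed else "passed",
        "risk_level": risk,
        "blockers": blockers,
        "total": len(results),
        "failed": len(failed),
    }
```

Because the same inputs always give the same report, a reviewer can trace every field back to a specific case result, which is not possible when the fields are drawn at random.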

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill metadata states it supports business requirement validation and acceptance reporting, but the implementation performs no requirement verification and instead returns mostly fixed or randomized fields. This creates deceptive assurance: users may treat the output as authoritative despite there being no actual validation behind it.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The function claims to run acceptance tests but ignores the provided stories input and fabricates pass/fail outcomes with randomness. In a QA or release workflow, this can falsely approve or reject builds, undermining delivery decisions and creating a serious integrity risk even without direct code execution or data theft.
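One way a reviewer might confirm this pattern by hand is a small static check using Python's standard `ast` module. The sketch below flags a function that never references one of its parameters and that accesses the `random` module. It is a rough heuristic under stated assumptions, not how SkillSpector works: `audit_function`, `SAMPLE`, and `run_acceptance_tests` are hypothetical names, and the check only recognizes `random.<attr>` style access.

```python
import ast


def audit_function(source, func_name):
    """Report unused parameters and `random.*` use in one function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            args = node.args
            params = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
            # Names actually referenced anywhere in the function body.
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            uses_random = any(
                isinstance(n, ast.Attribute)
                and isinstance(n.value, ast.Name)
                and n.value.id == "random"
                for n in ast.walk(node)
            )
            return {
                "unused_params": sorted(params - used),
                "uses_random": uses_random,
            }
    raise LookupError(f"function {func_name!r} not found")


# Hypothetical stand-in for the kind of runner this finding describes.
SAMPLE = '''
import random

def run_acceptance_tests(stories):
    return {"passed": random.choice([True, False])}
'''
```

Calling `audit_function(SAMPLE, "run_acceptance_tests")` returns `{'unused_params': ['stories'], 'uses_random': True}`. A hit on both fields is a prompt for manual review, not proof of fabrication, since legitimate code can use randomness or ignore an argument.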

Intent-Code Divergence

High

Confidence: 95%
Finding: The module and docstring state that the code executes acceptance testing, but the implementation only generates synthetic results. This mismatch is dangerous because operators may trust the output as evidence of business requirement validation when no real verification occurred.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The function claims to execute BDD tests using the provided feature and step paths, but it never reads or runs those inputs and instead returns hardcoded scenarios with randomized pass/fail outcomes. In an acceptance-testing skill, this can misrepresent system quality, produce false assurance, and cause unsafe or noncompliant software to be accepted based on fabricated results.
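For comparison, a runner that honors its feature path would at minimum read the file and fail loudly when it is missing or empty. The sketch below does only that first step: it lists the scenario names found in a Gherkin feature file. `load_scenarios` is a hypothetical helper, the keyword matching is deliberately simplistic, and actually executing the steps would still require a real BDD framework.

```python
from pathlib import Path


def load_scenarios(feature_path):
    """Return scenario names from a Gherkin feature file.

    Raises FileNotFoundError if the path does not exist and ValueError if
    the file contains no scenarios, so a caller cannot report results for
    inputs that were never read.
    """
    text = Path(feature_path).read_text(encoding="utf-8")
    names = []
    for line in text.splitlines():
        stripped = line.strip()
        # Check the longer keyword first so outlines are not mislabeled.
        for keyword in ("Scenario Outline:", "Scenario:"):
            if stripped.startswith(keyword):
                names.append(stripped[len(keyword):].strip())
                break
    if not names:
        raise ValueError(f"no scenarios found in {feature_path}")
    return names
```

Any result set whose scenario names do not match this list was not produced from the supplied feature file, which is exactly the gap this finding describes.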

Intent-Code Divergence

High

Confidence: 98%
Finding: The module docstring and function description state that the code executes BDD tests, but the implementation only simulates results. This mismatch is dangerous because downstream users, agents, or automation may trust the output as evidence of real verification, especially given the skill's stated role in acceptance and delivery quality checks.

Vague Triggers

Medium

Confidence: 90%
Finding: The trigger conditions are very broad and cover generic activities like requirement validation, scenario testing, acceptance checks, and report generation without clear exclusions or routing boundaries. This can cause the skill to activate in contexts beyond acceptance testing, leading to inappropriate handling of user requests, confusion with other skills, and increased attack surface for prompt-routing abuse or unintended execution paths.

Natural-Language Policy Violations

Medium

Confidence: 82%
Finding: The skill metadata and content are written entirely in Chinese and appear to assume Chinese-language interaction, without stating whether other languages are supported or allowing user preference. This can cause unsafe misunderstandings in multilingual environments, especially for acceptance criteria, sign-off conditions, or test results where precision matters.

VirusTotal

VirusTotal findings are pending for this skill version.
