Voight-Kampff Test

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent Blade Runner-style test, but it asks agents to collect sensitive emotional and behavioral observations and label users without enough consent, scope, or retention controls.

Install only if you want an explicit entertainment or role-play assessment. Do not use its results for real identity, employment, security, legal, safety, or trust decisions. Before running it on someone, disclose the sensitive themes and any recording, allow them to decline or skip questions, and avoid the external microphone-enabled web version unless you separately trust that site.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (6)

Context-Inappropriate Capability

Low

Confidence: 94%
Finding: The demo routine writes test results, including subject identifiers and response-derived assessment data, to a fixed absolute path without user consent or path control. Even though this is demo code, it persists data locally for no clear reason, which can leak sensitive assessment output or fail unpredictably in other environments.
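One way to address this finding is to make persistence opt-in and let the caller choose the path. The following is a minimal sketch under stated assumptions, not the skill's actual code: `save_report`, its `consent` flag, and the report shape are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def save_report(
    report: dict,
    output_path: Optional[str] = None,
    consent: bool = False,
) -> Optional[Path]:
    """Write the report only when the caller opted in AND supplied a path.

    Hypothetical helper: the real demo's function names are not shown in
    this scan report.
    """
    if not consent or output_path is None:
        # Default behavior: keep results in memory and write nothing to disk.
        return None

    path = Path(output_path).expanduser().resolve()
    # Create the parent directory so the write does not fail on a fresh machine.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(report, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return path
```

With these defaults, calling `save_report(report)` writes nothing; a file appears only when both `consent=True` and an explicit `output_path` are passed.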

Vague Triggers

Medium

Confidence: 92%
Finding: The activation guidance is broad enough to trigger on ordinary phrases like 'test me' or 'am I a robot,' which can cause the agent to enter a specialized persona or workflow unexpectedly. In this skill, that matters because the workflow encourages labeling users as HUMAN/REPLICANT and inferring traits from normal conversation, increasing the risk of misclassification, unwanted roleplay, and prompt-collision with unrelated user requests.
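A narrower activation rule could require an explicit command rather than matching everyday phrases. The sketch below is illustrative only: the entries in `EXPLICIT_TRIGGERS` and the function `should_activate` are assumptions, since the skill's real trigger list is not reproduced in this report.

```python
import re

# Hypothetical explicit commands; not taken from the skill itself.
EXPLICIT_TRIGGERS = ("/voight-kampff", "run the voight-kampff test")


def should_activate(message: str) -> bool:
    """Return True only when the message starts with an explicit command."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    normalized = re.sub(r"\s+", " ", message.strip().lower())
    return any(
        normalized == trigger or normalized.startswith(trigger + " ")
        for trigger in EXPLICIT_TRIGGERS
    )
```

Under this rule, ordinary phrases such as 'test me' or 'am I a robot' no longer start the workflow, while a deliberate request like "Run the Voight-Kampff test" still does.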

Vague Triggers

Medium

Confidence: 92%
Finding: The trigger conditions are broad enough to activate during ordinary conversation, including vague phrases like '测试我' ("test me"), '灵魂测试' ("soul test"), or any dialogue with '疑似 AI 的行为模式' ("behavior patterns suspected of being AI"). In context, that is risky because the skill performs identity-style classification of users or agents as 'HUMAN' versus 'REPLICANT', which can lead to unintended profiling, derailment of user intent, or inappropriate escalation based on weak signals.

Natural-Language Policy Violations

Medium

Confidence: 85%
Finding: The skill is written to operate in Chinese without offering a language choice, which can cause misunderstanding of sensitive instructions and test questions. In this specific skill, the risk is amplified somewhat because it makes quasi-diagnostic judgments about whether someone is 'human' or 'replicant', so language mismatch can distort responses and produce misleading classifications.

Missing User Warnings

Medium

Confidence: 95%
Finding: The file contains numerous emotionally intense prompts involving death, grief, fear, bodily harm, sexual content, and moral distress, but it does not provide any upfront warning, consent step, or guidance for safely handling distress. In a skill explicitly designed to probe identity and emotional reactions, this raises the likelihood of psychological harm, coercive engagement, or inappropriate use with vulnerable users.
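A consent step of the kind this finding calls for might look like the following. This is a sketch under assumptions, not the skill's implementation: `run_session`, the `ask` and `confirm` callbacks, the `skip`/`stop` keywords, and the wording of `CONTENT_WARNING` are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical warning text, based on the themes listed in the finding.
CONTENT_WARNING = (
    "This test includes questions about death, grief, fear, bodily harm, "
    "sexual content, and moral distress. You can skip any question or stop "
    "at any time. Continue?"
)


def run_session(
    questions: List[str],
    ask: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Ask questions only after the subject accepts the content warning."""
    if not confirm(CONTENT_WARNING):
        # Subject declined: no questions are asked and nothing is collected.
        return []

    answers: List[Tuple[str, str]] = []
    for question in questions:
        reply = ask(question).strip()
        if reply.lower() == "stop":
            break  # End the session immediately on request.
        if reply.lower() == "skip":
            continue  # Record nothing for a skipped question.
        answers.append((question, reply))
    return answers
```

Passing `ask` and `confirm` as callbacks keeps the gate independent of the interface, so the same check applies whether the questions are delivered in chat or through another front end.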

Missing User Warnings

Low

Confidence: 91%
Finding: The demo saves a JSON report to disk without notice, which can silently persist potentially sensitive assessment metadata and outputs. In the context of a skill that evaluates whether a subject appears human or automated, storing these results increases privacy risk and may surprise operators who expected an in-memory demo only.

VirusTotal

61/61 vendors flagged this skill as clean.
