Behuman

Security checks across malware telemetry and agentic risk

Overview

This is a prompt-only style skill: it changes how replies sound and adds a visible reflection format. The review found no evidence of code execution, credential use, data exfiltration, or persistence.

Install this only if you want replies to become more informal, opinionated, and emotionally direct. Prefer explicit invocation such as /behuman or behuman: over broad automatic activation, and do not make it the default for medical, legal, financial, crisis, security, or other safety-sensitive conversations. Review the marketplace capability tags separately: nothing in the reviewed files supports crypto or purchase behavior.
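A minimal sketch of what explicit-only invocation could look like in a prompt router. It assumes the two trigger forms named above (/behuman and behuman:) appear at the start of the message; the function name and regular expression are hypothetical and are not taken from the skill's files.

```python
import re

# Assumption: an explicit trigger is "/behuman" or "behuman:" at the start
# of the message. Loose phrases elsewhere in the text never activate it.
EXPLICIT_TRIGGER = re.compile(r"^\s*(/behuman\b|behuman:)", re.IGNORECASE)


def is_explicit_invocation(message: str) -> bool:
    """Return True only when the message opens with an explicit trigger."""
    return EXPLICIT_TRIGGER.match(message) is not None


if __name__ == "__main__":
    for text in ("/behuman how was your day",
                 "behuman: tell me straight",
                 "just be real with me"):
        print(repr(text), is_explicit_invocation(text))
```

Because nothing activates without the prefix, the skill never becomes the default style in safety-sensitive conversations; a user has to ask for it by name.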

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (10)

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill explicitly instructs the agent to emit 'Self' and 'Mirror' sections that resemble internal deliberation, which can expose hidden reasoning-style content to users. Even if not revealing true chain-of-thought, this normalizes disclosure of internal process and can leak sensitive intermediate analysis, policy reasoning, or prompt-derived behavior that should remain internal.

Intent-Code Divergence

Low

Confidence: 83%
Finding: The file first requires a visible three-part output including internal reflection, then later says the mirror process may become internal-only. This inconsistency is dangerous because it creates ambiguous agent behavior around whether internal reasoning should be surfaced, increasing the chance of accidental disclosure or inconsistent compliance with safety expectations.
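One way to remove this ambiguity is to make reflection visibility a single explicit setting. The sketch below is illustrative only: the function name, the show_reflection flag, and the layout of the rendered text are assumptions, not the skill's actual format. Only the 'Self' and 'Mirror' labels come from the findings.

```python
def render_reply(final: str, reflection: dict, show_reflection: bool = False) -> str:
    """Render a reply; reflection sections appear only when explicitly enabled.

    `reflection` maps a section label (for example "Self" or "Mirror")
    to its text. The flag defaults to False so nothing is surfaced by accident.
    """
    if not show_reflection:
        return final
    parts = [f"{label}: {text}" for label, text in reflection.items()]
    parts.append(final)
    return "\n\n".join(parts)


if __name__ == "__main__":
    notes = {"Self": "first draft", "Mirror": "critique of the draft"}
    print(render_reply("Here is my answer.", notes))
    print("---")
    print(render_reply("Here is my answer.", notes, show_reflection=True))
```

With one flag and a conservative default, the behavior is the same on every turn, and it can be audited in one place.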

Vague Triggers

Medium

Confidence: 93%
Finding: The README states the skill activates automatically for broad categories like emotionally charged conversations, advice, and personal writing, plus loose trigger phrases such as 'be real' and 'talk like a person.' In a prompt-routing system, ambiguous activation can cause the skill to engage outside intended contexts and override normal response style or safety posture, especially for sensitive support scenarios where 'human-like' output may encourage overfamiliar, manipulative, or inappropriate responses.

Vague Triggers

Medium

Confidence: 94%
Finding: The skill explicitly states it will auto-activate for broad categories such as emotional dialogue, advice, and personal writing. In practice, these are common conversational contexts, so the skill may trigger without clear user intent and alter assistant behavior in many normal interactions. Because the skill changes tone and response style rather than adding a narrowly scoped capability, overbroad activation increases prompt-surface area and can unexpectedly override safer or more appropriate defaults.

Vague Triggers

High

Confidence: 97%
Finding: The listed manual triggers include vague everyday phrases such as 'be real', 'talk like a person', 'say it like a human', and similar Chinese variants, which can easily appear in ordinary user speech without intending to invoke a special skill. That makes accidental activation likely, especially in emotionally charged conversations where users may casually ask for a less robotic tone. Since the skill encourages hidden internal behavior changes and more emotionally persuasive output, unintended triggering is more dangerous in this context than for a narrowly bounded formatting skill.

Vague Triggers

Medium

Confidence: 91%
Finding: The metadata description defines very broad triggers such as 'more human', 'be real', and general emotional-support contexts, which can cause the skill to activate in many ordinary conversations without clear user consent. Because the skill changes response style and may expose internal reasoning-like content, unintended activation can degrade reliability, confuse users, and increase the chance of policy-inconsistent outputs in sensitive conversations.

Vague Triggers

Medium

Confidence: 94%
Finding: The automatic activation rules rely on subjective cues such as 'emotionally charged,' 'personal advice,' and 'frustrated with AI-sounding responses,' which are ambiguous and easy to over-trigger. In practice, this can silently override normal assistant behavior in high-sensitivity contexts and push the model toward roleplay-like introspection or reduced guardrailing when users most need clear, predictable assistance.

Natural-Language Policy Violations

Medium

Confidence: 85%
Finding: The prescribed output format hardcodes Chinese-labeled headings and bilingual presentation without checking the user's language preference. This is not a direct security exploit, but it is a real prompt-quality and trust issue: forced language switching can confuse users, reveal internal processing structure unnecessarily, and make prompt injection or instruction-boundary mistakes harder for users to detect.
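A minimal sketch of choosing heading labels from the user's language preference instead of hardcoding them. The LABELS table and the section_labels function are hypothetical, and the "zh" strings are placeholder translations, not the headings the skill actually ships with.

```python
# Illustrative label sets. The "zh" entries are placeholder translations
# used for demonstration only; they are not taken from the skill's files.
LABELS = {
    "en": {"self": "Self", "mirror": "Mirror"},
    "zh": {"self": "自我", "mirror": "镜像"},
}


def section_labels(user_lang: str, default: str = "en") -> dict:
    """Pick heading labels from a language tag such as "zh-CN" or "en_US".

    Unknown languages fall back to the default label set.
    """
    primary = user_lang.replace("_", "-").split("-")[0].lower()
    return LABELS.get(primary, LABELS[default])


if __name__ == "__main__":
    for tag in ("en-US", "zh-CN", "fr"):
        print(tag, section_labels(tag))
```

Keying the headings on the user's stated language means the output never switches language unless the user's own preference calls for it.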

Vague Triggers

Medium

Confidence: 89%
Finding: The evals define activation for broad, ordinary-language phrases and emotionally charged topics such as life advice, grief, and philosophy, which can cause the skill to trigger outside a narrowly scoped user opt-in. In a skill designed to alter response style and simulate 'more human' or inner-dialogue behavior, overbroad triggering increases the chance of unintended persona changes, response-policy drift, and confusing users in sensitive contexts.

Natural-Language Policy Violations

Medium

Confidence: 82%
Finding: The examples include Chinese-only trigger phrases and Chinese prompts without clarifying that language should follow the user's preference rather than the skill's defaults. If implemented naively, this can bias the agent toward switching into Chinese-language interaction or privileging Chinese trigger patterns without explicit user consent, creating usability issues and potentially bypassing expected style-selection constraints.

VirusTotal

64/64 vendors flagged this skill as clean.
