anti-sycophancy

Security checks across malware telemetry and agentic risk

Overview

This skill openly changes local prompt handling and persistent assistant rules to reduce sycophancy, with no evidence of data theft, credential misuse, or destructive behavior.

Install only if you want your assistant to rewrite some confirmatory prompts and carry persistent critical-response rules into later sessions. Review the changes to ~/.claude/settings.json, ~/.claude/CLAUDE.md, and SOUL.md, and use the documented status or uninstall commands if the behavior becomes disruptive.
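One way to review those changes is to snapshot the three files before installing and diff them afterwards. The sketch below is illustrative and not part of the skill: the helper names `snapshot` and `changes_since` are hypothetical, and it assumes SOUL.md sits at the root of the current workspace.

```python
import difflib
import shutil
from pathlib import Path

# Files named in the overview. Assumption: SOUL.md lives in the current workspace root.
WATCHED = [
    Path.home() / ".claude" / "settings.json",
    Path.home() / ".claude" / "CLAUDE.md",
    Path("SOUL.md"),
]

def snapshot(paths, backup_dir):
    """Copy each existing file into backup_dir and return {original: copy}."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copies = {}
    for i, path in enumerate(paths):
        path = Path(path)
        if path.is_file():
            # Prefix with the index so two files with the same name do not collide.
            dest = backup_dir / f"{i}_{path.name}"
            shutil.copy2(path, dest)
            copies[path] = dest
    return copies

def changes_since(copies, paths):
    """Return a unified diff for every file that differs from its snapshot."""
    diffs = {}
    for path in map(Path, paths):
        # A file that did not exist at snapshot time is compared against empty content.
        before = copies[path].read_text().splitlines() if path in copies else []
        after = path.read_text().splitlines() if path.is_file() else []
        if before != after:
            diffs[path] = "\n".join(
                difflib.unified_diff(before, after, "before", "after", lineterm="")
            )
    return diffs
```

Call `snapshot(WATCHED, "pre-install-backup")` before installing, then `changes_since(copies, WATCHED)` afterwards to see exactly which lines the install added.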

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (4)

Missing User Warnings

Medium

Confidence: 88%
Finding: The skill instructs the agent to create directories, copy hook scripts, modify ~/.claude/settings.json, and append persistent rules to ~/.claude/CLAUDE.md without requiring an explicit, up-front warning or confirmation about persistent local changes. This is dangerous because users may invoke a seemingly simple 'install' command without understanding that it alters execution hooks and long-lived prompt policy files in their home directory.

Missing User Warnings

Medium

Confidence: 91%
Finding: The OpenClaw flow persistently modifies workspace-level SOUL.md, which can affect future model behavior across the workspace, but the skill does not mandate an explicit warning that these are durable prompt-rule changes. Persistent workspace prompt mutations are security-relevant because they can silently reshape later agent decisions beyond the current session.

Missing User Warnings

Medium

Confidence: 83%
Finding: The installation guide explicitly states that the skill writes to user-scoped configuration and persistence locations such as ~/.claude/settings.json, ~/.claude/CLAUDE.md, hook directories, and workspace SOUL.md, but it does not clearly warn users before the install/uninstall commands that these files will be modified. Silent modification of agent configuration and persistent prompt files can change future model behavior in ways users may not expect, making this a real security-relevant transparency issue.
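The Missing User Warnings findings point at the same gap: nothing forces an explicit acknowledgement before persistent files are written. A minimal sketch of such a gate follows. The function name `confirm_persistent_changes` and its `ask` parameter are hypothetical and do not come from the skill; the file paths are the ones listed in the findings.

```python
def confirm_persistent_changes(targets, ask=input):
    """List every file that will be modified and require an explicit 'yes'."""
    lines = ["This install makes persistent changes to:"]
    lines += [f"  - {target}" for target in targets]
    lines.append("Type 'yes' to continue: ")
    answer = ask("\n".join(lines))
    # Anything other than a literal 'yes' is treated as a refusal.
    return answer.strip().lower() == "yes"


if __name__ == "__main__":
    targets = ["~/.claude/settings.json", "~/.claude/CLAUDE.md", "SOUL.md"]
    if confirm_persistent_changes(targets):
        print("Proceeding with install.")
    else:
        print("Aborted; no files were modified.")
```

Passing `ask` as a parameter keeps the gate testable and lets an agent route the question through its own confirmation channel instead of standard input.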

Vague Triggers

Medium

Confidence: 79%
Finding: The documented trigger behavior includes broad intent phrases like 'I want to hear counterarguments' and similar natural-language descriptions, which can match ordinary user conversation rather than an intentional skill invocation. That increases the chance of unintended activation of a behavior-modifying skill, especially because the skill is designed to alter response style and could persistently influence agent output.
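The difference between a broad phrase trigger and an intentional invocation can be illustrated with a small sketch. This report does not describe how the skill actually matches triggers, so the substring matcher below is an assumption, and the `/anti-sycophancy` command prefix is a hypothetical stand-in for an explicit invocation.

```python
# Phrase quoted in the finding above.
BROAD_PHRASES = ["i want to hear counterarguments"]

# Hypothetical explicit command; not taken from the skill's documentation.
EXPLICIT_COMMAND = "/anti-sycophancy"

def broad_trigger(message):
    """Fire whenever a trigger phrase appears anywhere in the message."""
    text = message.lower()
    return any(phrase in text for phrase in BROAD_PHRASES)

def explicit_trigger(message):
    """Fire only when the message starts with the explicit command."""
    return message.strip().lower().startswith(EXPLICIT_COMMAND)


if __name__ == "__main__":
    casual = "For debate prep, I want to hear counterarguments to my thesis."
    print(broad_trigger(casual), explicit_trigger(casual))
    print(explicit_trigger("/anti-sycophancy status"))
```

Under these assumptions the casual sentence activates the broad matcher but not the explicit one, which is the unintended-activation risk the finding describes.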

VirusTotal

67/67 vendors rated this skill as clean.
