Self Discipline

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed local discipline and validation workflow; it is powerful, but its persistence and validator behavior are purpose-aligned and gated by user approval.

Install this only if you want a persistent local system that records instruction-failure incidents and can add enforcement checks. Review any generated validator script, any proposed AGENTS.md or HEARTBEAT.md change, and any incident details saved under ~/self-discipline/ before approving them.

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access

Findings (4)

Intent-Code Divergence

Medium

Confidence: 92% confidence
Finding: The document states that validators never modify anything, yet the periodic validator's security manifest says it writes to ~/self-discipline/validator-log.md. This inconsistency can mislead operators and reviewers about side effects, weakening trust in the control boundary and making it easier for future validators to justify unintended writes.

Vague Triggers

Medium

Confidence: 87% confidence
Finding: The skill can be triggered by vague emotional signals such as a frustrated user or phrases like 'I told you...,' which are common in normal conversations and not limited to true critical incidents. That broad trigger surface can cause the agent to enter an escalation-and-persistence workflow unnecessarily, leading to overcollection of incident data, creation of enforcement artifacts, or pressure to modify persistent files when the situation does not warrant it.

Vague Triggers

Medium

Confidence: 90% confidence
Finding: The severity matrix uses highly subjective signals like emotional wording and absolute terms as escalation criteria without requiring objective incident characteristics. In this skill's context, that can cause routine interactions to be overclassified, triggering unnecessary root-cause workflows, validators, and process changes based on weak evidence rather than actual risk.

Vague Triggers

Medium

Confidence: 88% confidence
Finding: The automatic CRITICAL trigger 'Production affected' is undefined and can be interpreted broadly, allowing minor or transient issues to be treated as maximum severity. In a compliance-enforcing skill, this ambiguity is more dangerous because CRITICAL classification mandates heavyweight corrective actions, creating operational disruption and incentivizing excessive escalation.

VirusTotal

65/65 vendors flagged this skill as clean.

View on VirusTotal