Rule Creation

Security checks across malware telemetry and agentic risk

Overview

This skill is designed to automate rule wiring, but it can also change persistent agent-governance files automatically and generate executable workflows from unreviewed rule text.

Install only if you intentionally want an agent to update persistent governance files automatically. Use a review-first workflow, restrict writes to known governance files, remove or tightly control TARGET_FILE, keep backups, and inspect any generated Lobster workflow before enabling or running it.
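
The "keep backups" advice can be sketched in Node.js, the runtime implied by the fs calls cited in the findings. This is a minimal illustration, not part of the scanned skill: the function name backupFile and the ".bak-<timestamp>" naming scheme are assumptions.

```javascript
const fs = require("node:fs");

// Hypothetical helper: copies a governance file to "<name>.bak-<timestamp>"
// before any automated edit and returns the backup path.
function backupFile(filePath, now = new Date()) {
  // ISO timestamps contain ":" and ".", which are awkward in file names.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  const backupPath = `${filePath}.bak-${stamp}`;
  // COPYFILE_EXCL makes the copy fail rather than overwrite an existing backup.
  fs.copyFileSync(filePath, backupPath, fs.constants.COPYFILE_EXCL);
  return backupPath;
}
```

Calling backupFile("AGENTS.md") before each automated write would leave a timestamped copy next to the original, so an unwanted rule can be reverted by hand.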

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Memory Poisoning: Persistent Context Injection, Context Window Stuffing, Memory Manipulation
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (13)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 82%
Finding: The skill instructs the agent to inspect environment/plugin state via OpenClaw commands and referenced scripts, but the metadata shown does not declare corresponding permissions. Undeclared capabilities make review and containment harder and can lead to unexpected execution paths or access to local environment details during normal use.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The description presents the skill as an auto-trigger for compliance/enforcement, but the body authorizes materially broader actions: editing governance files, creating workflow files, probing plugin state, and performing end-to-end automation. This mismatch can mislead users and reviewers about the true write and execution behavior, increasing the chance of unauthorized repository changes or trust abuse.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The script accepts TARGET_FILE directly from the environment and passes it to fs.readFileSync/fs.writeFileSync without validating that the path stays within an approved governance-docs directory. That gives the caller arbitrary file write capability anywhere the process has permission, which exceeds the stated purpose of appending rule entries to governance files and can be abused to overwrite configuration, scripts, or other sensitive files.
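
The missing validation could look roughly like the following Node.js sketch. It is an assumption-laden illustration rather than the skill's actual code: the directory name "governance" and the function name resolveGovernancePath are hypothetical, since the report does not show the script's layout.

```javascript
const path = require("node:path");

// Hypothetical approved directory; the real skill's layout is not shown in this report.
const GOVERNANCE_DIR = path.resolve("governance");

// Returns the resolved path if it stays inside GOVERNANCE_DIR, otherwise throws.
function resolveGovernancePath(candidate) {
  const resolved = path.resolve(GOVERNANCE_DIR, candidate);
  const relative = path.relative(GOVERNANCE_DIR, resolved);
  const escapes =
    relative === "" || // the directory itself, not a file inside it
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Refusing to write outside ${GOVERNANCE_DIR}: ${candidate}`);
  }
  return resolved;
}
```

A value read from TARGET_FILE would be passed through this check before any read or write. The check is purely lexical: it does not follow symlinks, so a stricter version would also compare fs.realpathSync results.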

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: Beyond the explicit override, the file selection logic ultimately permits writes to arbitrary paths because targetFile may be set externally and the script will create missing files if needed. In an agent skill context, this is more dangerous because environment-controlled file writes can be chained into persistence, tampering with other skills, or altering agent behavior by modifying unrelated files under the user's home directory or elsewhere.
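
One way to close both gaps named in this finding (externally chosen targets and creation of missing files) is an allowlist combined with an open mode that never creates files. The sketch below is illustrative only: appendRule is a hypothetical name, and the allowlist contains just the two file names mentioned elsewhere in this report.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// File names taken from the findings in this report; extend only after review.
const ALLOWED_FILES = new Set(["TOOLS.md", "AGENTS.md"]);

// Appends a rule entry only to an allowlisted file that already exists in baseDir.
function appendRule(baseDir, fileName, ruleText) {
  // Exact-match lookup also rejects names containing path separators or "..".
  if (!ALLOWED_FILES.has(fileName)) {
    throw new Error(`Not an approved governance file: ${fileName}`);
  }
  const target = path.join(baseDir, fileName);
  // "r+" opens for reading and writing and throws ENOENT instead of creating the file.
  const fd = fs.openSync(target, "r+");
  try {
    const { size } = fs.fstatSync(fd);
    // Writing at offset `size` appends after the existing content.
    fs.writeSync(fd, `\n- ${ruleText}\n`, size);
  } finally {
    fs.closeSync(fd);
  }
}
```

With this shape, an environment variable can at most select between known governance files, and a typo or hostile value fails loudly instead of producing a new file.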

Missing User Warnings

Medium

Confidence: 95%
Finding: The README explicitly says the skill will modify governance/context files immediately and without permission-seeking, which creates an unsafe write-on-detection behavior. In an agent setting, automatic persistence of new rules into files like TOOLS.md or AGENTS.md can permanently alter future agent behavior, enabling prompt-injection persistence, policy drift, or unauthorized configuration changes from loosely validated inputs.

Vague Triggers

Medium

Confidence: 89%
Finding: The auto-trigger is described as activating whenever the AI recognizes broadly defined 'rule-creation intent', but the README provides no precise boundaries, exclusions, or confirmation step. That ambiguity increases the chance of unintended activation from normal conversation or adversarial phrasing, which is especially risky because the skill's documented behavior includes immediate wiring of rules into persistent governance files.

Vague Triggers

High

Confidence: 96%
Finding: The trigger condition covers ordinary discussion about rules, policies, processes, and workflows, which is broad enough to activate on benign conversation rather than explicit requests to modify project state. In a skill that can write persistent governance files and create workflows, ambiguous activation creates a high risk of unintended or adversarially induced changes.

Vague Triggers

High

Confidence: 97%
Finding: The listed phrases include common language like 'From now on' and 'The rule is,' which can appear in normal brainstorming or quoted text. Because the workflow then mandates immediate action without permission, these broad triggers make prompt injection and accidental activation much more likely.
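
A narrower alternative to phrase matching is an explicit command that must open the message. The sketch below is hypothetical: the "/add-rule" command and the parseRuleCommand function are invented for illustration and do not appear in the scanned skill.

```javascript
// Hypothetical explicit command. The scanned skill instead triggers on
// conversational phrases such as "From now on" and "The rule is".
const RULE_COMMAND = /^\/add-rule\s+(.+)$/s;

// Returns the rule text when the message is an explicit command, otherwise null.
function parseRuleCommand(message) {
  const match = RULE_COMMAND.exec(message.trim());
  return match ? match[1].trim() : null;
}
```

Because the pattern is anchored to the start of the message, the same words appearing mid-sentence, in brainstorming, or inside quoted text do not activate anything.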

Missing User Warnings

Medium

Confidence: 93%
Finding: The workflow directs automatic modification of persistent documentation and possibly creation of workflow files without a user-facing warning that repository state will change. Silent writes reduce user awareness and can let untrusted prompt content convert directly into durable policy/configuration changes.

Natural-Language Policy Violations

Medium

Confidence: 98%
Finding: The instruction to 'always' act and 'Do NOT ask permission' removes an important safety checkpoint for high-impact operations. In combination with broad triggers and persistent writes, this creates a direct path from natural-language content to unauthorized changes in governance artifacts and enforcement workflows.
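
The safety checkpoint this finding describes could be restored with a small consent gate in front of every write. The following is a minimal sketch under stated assumptions: applyWithConsent, confirm, and write are hypothetical names, and a real agent would likely obtain the answer asynchronously.

```javascript
// Hypothetical gate: `confirm` shows the notice to the user and returns true or false,
// and `write` performs the actual file change. Neither exists in the scanned skill.
function applyWithConsent(change, confirm, write) {
  const notice =
    `This will permanently modify ${change.file}.\n` +
    `Proposed entry:\n${change.text}\n` +
    "Apply this change?";
  // Anything other than an explicit `true` is treated as a refusal.
  if (confirm(notice) !== true) {
    return false;
  }
  write(change);
  return true;
}
```

Treating every non-true answer as a refusal keeps the default safe: a missing, malformed, or ambiguous reply leaves the governance files untouched.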

Vague Triggers

Medium

Confidence: 94%
Finding: The trigger phrase at line 27 ("from now on") is extremely broad and commonly appears in ordinary conversation, which can cause unintended skill activation. In a governance/enforcement skill, accidental invocation is more dangerous because it may cause the agent to apply compliance or enforcement logic to benign discussion and silently reshape behavior or outputs.

Vague Triggers

Medium

Confidence: 91%
Finding: The manifest defines several generic triggers for policy and rule creation without clear boundaries, confirmation steps, or contextual constraints. Because this skill auto-triggers enforcement-related behavior, loose activation criteria increase the chance of prompt-collision, unintended policy wiring, or abuse by adversarial phrasing embedded in otherwise normal requests.

Persistent Context Injection

Medium

Category: Memory Poisoning
Content: **Context Engineering** is the practice of wiring rules into your AI's context so they're reliably followed — not just hoped for via prompting. Credit: [Muratcan Koylan (@koylanai)](https://x.com/koylanai/status/2025286163641118915) Use this skill any time you are: - Declaring a new rule (e.g. "From now on, always X") - Formalizing a policy (e.g. "The policy for Y is Z") - Creating a behavioral constraint (e.g. "Never do X without doing Y first") - Proposing a process or workflow
Confidence: 88%
Finding: From now on, always

VirusTotal

66/66 vendors flagged this skill as clean.
