Moses Governance Single

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed local governance and audit helper, with some privacy and control caveats but no evidence of exfiltration, destructive behavior, or deceptive install behavior.

Install only if you want a local, instruction-based governance layer that writes persistent audit/state files. Treat ~/.openclaw/audits/moses and ~/.openclaw/governance as sensitive, avoid logging secrets or private details, set MOSES_OPERATOR_SECRET only in a trusted shell, and use unrestricted or offense modes deliberately.
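
One way to act on the advice to treat these directories as sensitive is to restrict them to the owning user. The sketch below is an illustration only, assuming a POSIX system; the helper name lock_down is hypothetical and is not part of the skill.

```python
import os
from pathlib import Path


def lock_down(directory: Path) -> None:
    """Restrict a directory tree to its owner (POSIX permissions only).

    Hypothetical helper for illustration; not shipped with the skill.
    """
    directory.mkdir(parents=True, exist_ok=True)
    os.chmod(directory, 0o700)
    for root, dirs, files in os.walk(directory):
        for name in dirs:
            path = Path(root) / name
            if not path.is_symlink():  # do not chmod link targets
                os.chmod(path, 0o700)
        for name in files:
            path = Path(root) / name
            if not path.is_symlink():
                os.chmod(path, 0o600)
```

It could be called once per directory, for example lock_down(Path.home() / ".openclaw" / "governance"). Permissions limit who can read the files but do not remove secrets that were already logged.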

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (8)

Description-Behavior Mismatch

Severity: Medium
Confidence: 93%
Finding
The script is presented as an append-only audit ledger, but on certain outcomes it also mutates a separate governance progress file by setting recovery flags. This hidden side effect expands the component's authority from logging into state control, which can let any caller that can invoke the audit command influence governance workflow and create integrity or availability issues in downstream automation.
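
The separation this finding asks for can be sketched as follows. This is not the skill's code: the function names, the outcome strings, and the needs_recovery flag are all assumptions made for illustration. The point is that appending to the ledger never touches another file, and the progress file changes only when the caller opts in explicitly.

```python
import json
from pathlib import Path

# Hypothetical outcome strings; the report does not list the real ones.
RECOVERY_OUTCOMES = {"failed", "blocked"}


def append_audit_entry(ledger: Path, entry: dict) -> None:
    """Append one JSON line to the ledger and touch nothing else."""
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")


def record_outcome(ledger: Path, progress: Path, entry: dict,
                   *, allow_state_update: bool = False) -> bool:
    """Log the entry; set a recovery flag only if the caller opts in.

    Returns True when the progress file was modified.
    """
    append_audit_entry(ledger, entry)
    if entry.get("outcome") not in RECOVERY_OUTCOMES or not allow_state_update:
        return False
    state = {}
    if progress.exists():
        state = json.loads(progress.read_text(encoding="utf-8"))
    state["needs_recovery"] = True  # hypothetical flag name
    progress.parent.mkdir(parents=True, exist_ok=True)
    progress.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return True
```

With this shape, a caller that only wants to log cannot change governance state by accident, and the return value makes any state change visible to the caller.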

Vague Triggers

Severity: Medium
Confidence: 83%
Finding
The mode catalog uses broad, overlapping activation guidance such as 'anything where correctness matters' or 'ambiguous situations' without a deterministic selection policy. In a governance skill, this can lead to inconsistent safety posture selection, including accidental downgrade to weaker controls when handling sensitive tasks.
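
A deterministic selection policy could look like the sketch below: an ordered rule list where the first match wins and anything unmatched falls back to the strictest mode. The mode names and task attributes are hypothetical placeholders, not the skill's actual catalog.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    mode: str
    matches: Callable[[dict], bool]


# Hypothetical modes and task attributes, ordered strictest first.
RULES = (
    Rule("high_security", lambda task: task.get("touches_secrets", False)),
    Rule("high_integrity", lambda task: task.get("irreversible", False)),
    Rule("standard", lambda task: task.get("classified", False)),
)

# Fail closed: an unclassified or ambiguous task gets the strictest mode.
DEFAULT_MODE = "high_security"


def select_mode(task: dict) -> str:
    """Return the mode of the first matching rule, else the strict default."""
    for rule in RULES:
        if rule.matches(task):
            return rule.mode
    return DEFAULT_MODE
```

Because the rules are evaluated in a fixed order, the same task description always yields the same mode, and overlapping matches resolve toward the stricter one.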

Natural-Language Policy Violations

Severity: High
Confidence: 97%
Finding
The 'None (Unrestricted)' mode explicitly removes behavioral constraints based on operator acceptance of risk. This creates a built-in safety bypass that can be invoked through prompt manipulation or social engineering, undermining the governance purpose of the skill.

Missing User Warnings

Severity: Medium
Confidence: 91%
Finding
The utility silently modifies ~/.openclaw/governance/progress.json when certain outcome strings are present, without explicit user acknowledgement. In a governance skill, undisclosed state changes are more dangerous because other components may trust this file to trigger recovery behavior, allowing a low-friction path to alter control flow or cause persistent false failure states.

Missing User Warnings

Severity: Medium
Confidence: 83%
Finding
The reset command overwrites persisted governance state immediately and without confirmation, making accidental or scripted destruction of configuration easy. In a governance/security-control skill, this is more dangerous because state resets can silently revert posture, mode, and role to defaults, undermining expected operational safeguards or auditability.
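
A guarded reset might be sketched like this, assuming the state lives in a single JSON file. The function name, the default values, and the backup naming scheme are illustrative assumptions, not the skill's behavior.

```python
import json
import shutil
import time
from pathlib import Path
from typing import Optional

# Hypothetical defaults; the real default state is not given in the report.
DEFAULT_STATE = {"mode": "default", "posture": "default", "role": "default"}


def reset_state(state_file: Path, *, confirm: bool = False) -> Optional[Path]:
    """Reset state to defaults only on explicit confirmation.

    The previous state is copied to a timestamped backup first.
    Returns the backup path, or None if there was nothing to back up.
    """
    if not confirm:
        raise PermissionError("reset requires confirm=True")
    backup = None
    if state_file.exists():
        backup = state_file.with_name(f"{state_file.name}.{time.time_ns()}.bak")
        shutil.copy2(state_file, backup)
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(DEFAULT_STATE, indent=2), encoding="utf-8")
    return backup
```

Requiring the keyword argument makes an accidental or scripted reset fail loudly, and the backup keeps the prior posture, mode, and role recoverable for audit purposes.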

Ssd 1

Severity: High
Confidence: 98%
Finding
By semantically authorizing removal of safety constraints on request, the reference normalizes operator-directed disablement of safeguards. In a single-agent governance framework, that weakens trust boundaries and makes abuse easier because the override is described as legitimate behavior.

Ssd 3

Severity: Medium
Confidence: 90%
Finding
The instruction to 'Log full reasoning chain' can cause storage of sensitive user content, secrets, and internal deliberations in plain language. Detailed chain-of-thought retention increases privacy, compliance, and prompt-extraction risk without being necessary for most audit objectives.
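
One mitigation is to store a redacted, length-limited summary instead of the full chain. The sketch below is an assumption-laden illustration: the patterns, the sk- key prefix, and the 200-character cap are examples chosen here, and pattern-based redaction will miss secrets that do not match.

```python
import re

# Illustrative patterns only; real deployments need a vetted list.
SECRET_PATTERNS = (
    # name=value or name: value pairs whose name ends in a sensitive word
    re.compile(r"(?i)[\w-]*(?:key|token|password|secret)\s*[:=]\s*\S+"),
    # bare tokens with an assumed "sk-" prefix
    re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
)
MAX_SUMMARY_CHARS = 200  # hypothetical cap


def redact(text: str) -> str:
    """Replace anything matching a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def audit_summary(reasoning: str) -> str:
    """Redact first, then truncate, so a cut-off secret cannot slip through."""
    return redact(reasoning)[:MAX_SUMMARY_CHARS]
```

Redacting before truncating matters: truncating first could leave a partial secret that no longer matches any pattern.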

Ssd 3

Severity: Medium
Confidence: 90%
Finding
Self Growth mode directs the system to reflect on prior interactions and maintain a growth log without defining consent, retention, or sensitivity boundaries. This can encourage persistent storage or reuse of user interaction data beyond the immediate session.

VirusTotal

63/63 vendors flagged this skill as clean.
