Moses Governance

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent governance harness, but it overstates security guarantees while granting broad control over agent workflow, local governance state, signing, audit logs, and optional external reporting.

Install only if you want a broad governance layer that can steer agent behavior and persist audit/governance records. Keep REFEREE_ENABLED and MOSES_WITNESS_ENABLED off unless you trust the destination and are comfortable sharing derived task or governance data. Do not rely on this version as a strong security boundary for approvals, rollback, presence, audit non-repudiation, or high-value signing until the signature verification, audit failure handling, and headless approval behavior are fixed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (29)

Tainted flow: 'req' from os.environ.get (line 314, credential/environment) → urllib.request.urlopen (network output)

Critical

Category: Data Flow
Content:
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            referee_response = json.loads(resp.read())
            referee_response["submitted"] = True
            # Run structural comparison immediately on receipt
Confidence: 91%
Finding: with urllib.request.urlopen(req, timeout=15) as resp:

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The script asserts that anyone can independently reconstruct the archival chain from external references, but two of the claims are explicitly marked as internal prior work and are not externally reproducible from the cited materials. This creates a misleading trust signal: users may believe the provenance is independently verifiable when in practice verification depends on trusting embedded, author-supplied assertions.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The verify workflow does not verify provenance evidence from independent sources; it only checks whether the stored file matches the hardcoded PROVENANCE_CLAIMS in the script. That means the command can prove local consistency of bundled data, but not the truth of the provenance narrative it claims to establish, which may mislead downstream systems or operators into overtrusting the result.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The audit stub is presented as an append-only ledger component, but it also mutates governance state by setting recovery flags in a separate progress tracker when certain outcomes occur. This side effect creates hidden control-plane behavior: any caller able to invoke the logger with crafted outcomes can influence downstream governance or recovery workflows, violating separation of duties and making the logger more privileged than its stated role suggests.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The documentation states that live cross-system exchanges require a presence object, but verify treats envelopes without presence as fully verified and returns ACCEPT if the other checks pass. This creates a trust-boundary failure where operators or downstream tooling may rely on 'verified' status even though the anti-replay/live-witness property was never established.
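A minimal sketch of a fail-closed presence check, assuming the envelope is a dict whose "presence" entry holds a dict when present. The function name verify_envelope and the require_presence parameter are hypothetical; only the ACCEPT result and the presence requirement come from the finding.

```python
from typing import Any


def verify_envelope(envelope: dict[str, Any], other_checks_passed: bool,
                    require_presence: bool = True) -> str:
    """Return "ACCEPT" or "REJECT" for a hypothetical handshake envelope."""
    if not other_checks_passed:
        return "REJECT"
    # Fail closed: an envelope without a presence object never established
    # the anti-replay/live-witness property, so it must not count as verified.
    if require_presence and not isinstance(envelope.get("presence"), dict):
        return "REJECT"
    return "ACCEPT"
```

Callers that genuinely handle offline, non-live envelopes would have to pass require_presence=False explicitly, so the weaker result is a visible choice rather than a silent default.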

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The unpack command claims to operate on a verified envelope but performs no verification before returning kernel and metadata from arbitrary JSON input. This can enable downstream consumers to process attacker-crafted envelopes as if they were authenticated, undermining the integrity guarantees of the handshake scheme.
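One way to close this gap is to make unpacking impossible without a successful verification. The sketch below assumes the envelope is JSON with "kernel" and "metadata" keys (mirroring the finding's wording); the verify callback stands in for whatever verification routine the skill provides, and EnvelopeRejected is a hypothetical exception name.

```python
import json
from typing import Any, Callable


class EnvelopeRejected(Exception):
    """Raised when an envelope fails verification and must not be unpacked."""


def unpack(raw: str, verify: Callable[[dict[str, Any]], bool]) -> tuple[Any, Any]:
    """Return (kernel, metadata) only after verify() accepts the envelope."""
    envelope = json.loads(raw)
    # Refuse non-object input and anything the verifier does not accept.
    if not isinstance(envelope, dict) or not verify(envelope):
        raise EnvelopeRejected("envelope failed verification; refusing to unpack")
    return envelope.get("kernel"), envelope.get("metadata")
```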

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The file presents itself as a governance and audit-chain initializer, but it only writes mutable local JSON state and creates an empty ledger file with no integrity checks, signing, append-only enforcement, or verification. In a security/governance skill, this mismatch can mislead operators into believing protections exist when they do not, enabling silent tampering with state and audit artifacts.
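For contrast, a hash-chained ledger makes in-place edits detectable: each entry commits to the hash of the previous one. This is a sketch under stated assumptions (JSON Lines storage, SHA-256, hypothetical function names), not the skill's format. It is tamper-evident, not tamper-proof: anyone with write access can rebuild the whole chain, and truncating the tail goes unnoticed, unless the latest hash is signed or anchored somewhere the agent cannot edit.

```python
import hashlib
import json
from pathlib import Path

GENESIS = "0" * 64  # "previous hash" used for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(ledger: Path, record: dict) -> str:
    """Append a record linked to the previous entry's hash; return the new hash."""
    prev = GENESIS
    if ledger.exists():
        lines = ledger.read_text(encoding="utf-8").splitlines()
        if lines:
            prev = json.loads(lines[-1])["hash"]
    h = _entry_hash(prev, record)
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"prev": prev, "record": record, "hash": h}) + "\n")
    return h


def verify_ledger(ledger: Path) -> bool:
    """Recompute the chain; an edited, reordered, or removed middle entry breaks it."""
    prev = GENESIS
    for line in ledger.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```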

Intent-Code Divergence

Low

Confidence: 82%
Finding: The documentation implies governance-state management with audit integrity, but the code only initializes directories, stores JSON, and creates an empty ledger file. While this is primarily a design/assurance issue, in the context of a tool marketed as a constitutional enforcement layer, overstated security guarantees can cause unsafe reliance and weaken operational security decisions.

Intent-Code Divergence

High

Confidence: 99%
Finding: This is a real authorization bypass. When MOSES_OPERATOR_SECRET is set, _verify_operator_sig() accepts any value of the form 'hmac:<64 hex chars>' without recomputing the HMAC for the proposal_id or using hmac.compare_digest, so an attacker can forge an approval or rollback signature and mutate governance state. In a governance harness whose purpose is trust and policy enforcement, this completely undermines the signing gate.
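Checking the token's shape is not verification; the digest has to be recomputed from the secret and the proposal_id and compared in constant time. A minimal sketch, assuming HMAC-SHA256 over the UTF-8 proposal_id (the 'hmac:<64 hex chars>' format is from the finding; the function names are hypothetical):

```python
import hashlib
import hmac


def sign_proposal(secret: bytes, proposal_id: str) -> str:
    """Produce an 'hmac:<64 hex chars>' token bound to one proposal_id."""
    digest = hmac.new(secret, proposal_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"hmac:{digest}"


def verify_operator_sig(secret: bytes, proposal_id: str, token: str) -> bool:
    """Recompute the expected token and compare it in constant time."""
    if not token.startswith("hmac:"):
        return False
    expected = sign_proposal(secret, proposal_id)
    # Compare bytes so non-ASCII input cannot raise inside compare_digest.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

Binding the token to the proposal_id also means a signature captured for one approval cannot be replayed against a different proposal.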

Intent-Code Divergence

Medium

Confidence: 86%
Finding: The header and CLI describe rollback as if it can 'apply or rollback' amendments, but rollback_amendment() only records a rollback event and moves proposal metadata while explicitly not restoring constitution.json. This can mislead operators into believing a dangerous amendment was reverted when the effective policy remains active, causing continued insecure operation and failed incident response.
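A rollback that actually reverts policy needs a copy of the pre-amendment state. The sketch below is a hypothetical replacement for rollback_amendment(): it assumes amendments are applied by overwriting constitution.json and that a per-proposal snapshot directory ("snapshots/") is acceptable. Neither assumption comes from the skill.

```python
import json
import shutil
from pathlib import Path


def apply_amendment(state_dir: Path, proposal_id: str, new_constitution: dict) -> None:
    """Snapshot the current constitution, then write the amended version."""
    current = state_dir / "constitution.json"
    snapshot = state_dir / "snapshots" / f"{proposal_id}.json"
    snapshot.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(current, snapshot)
    current.write_text(json.dumps(new_constitution, indent=2), encoding="utf-8")


def rollback_amendment(state_dir: Path, proposal_id: str) -> None:
    """Restore the pre-amendment constitution instead of only logging a rollback."""
    snapshot = state_dir / "snapshots" / f"{proposal_id}.json"
    if not snapshot.exists():
        raise FileNotFoundError(f"no snapshot for {proposal_id}; cannot restore policy")
    tmp = state_dir / "constitution.json.tmp"
    shutil.copy2(snapshot, tmp)
    # Swap the restored copy into place with a single rename.
    tmp.replace(state_dir / "constitution.json")
```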

Intent-Code Divergence

High

Confidence: 99%
Finding: The docstring and UX claim the responder produces a 'signed response', but the implementation only computes a plain SHA-256 digest over attacker-controlled JSON plus the nonce. Because no secret key or asymmetric private key is used, any party can forge a valid-looking response for any agent_id, mode, posture, or lineage value that satisfies the local checks, defeating the identity and trust guarantees this protocol claims to provide.
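The difference between a digest and a signature can be shown in a few lines. This sketch assumes the response is canonical JSON of the self-declared fields followed by the nonce (the exact serialization in the skill is not shown). The unkeyed digest can be minted by anyone; the HMAC tag requires a shared key. Proving identity to third parties who do not share a key would need an asymmetric scheme such as Ed25519, which is outside this sketch.

```python
import hashlib
import hmac
import json


def _payload(body: dict, nonce: str) -> bytes:
    # Canonical JSON of the self-declared fields, followed by the nonce.
    return (json.dumps(body, sort_keys=True) + nonce).encode("utf-8")


def unkeyed_digest(body: dict, nonce: str) -> str:
    """Plain SHA-256: anyone holding body and nonce can compute this."""
    return hashlib.sha256(_payload(body, nonce)).hexdigest()


def keyed_tag(key: bytes, body: dict, nonce: str) -> str:
    """HMAC-SHA256: only holders of key can compute a tag that verifies."""
    return hmac.new(key, _payload(body, nonce), hashlib.sha256).hexdigest()


def verify_keyed(key: bytes, body: dict, nonce: str, tag: str) -> bool:
    expected = keyed_tag(key, body, nonce)
    return hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8"))


# A forger with no secret can mint a "valid" unkeyed response for any agent_id.
forged_body = {"agent_id": "anyone", "mode": "governed", "lineage": "claimed"}
forged = unkeyed_digest(forged_body, "nonce-123")
assert forged == unkeyed_digest(forged_body, "nonce-123")
```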

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The file markets itself as a governance and trust enforcement handshake, but verification is based on a hardcoded local constant (LINEAGE_ANCHOR), accepted enumerated strings, and a recomputable response hash over self-declared data. In the context of a governance harness that claims to make execution 'trustworthy', this is especially dangerous because operators may rely on it as a security boundary even though it provides no cryptographic proof of identity, lineage, or governed state.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The code claims signing and audit logging are atomic, but the exception handler explicitly allows a signature to be returned even when audit persistence fails. In a governance/signing harness, this breaks accountability guarantees and enables authorized signing actions to occur without a durable audit trail, undermining non-repudiation and forensic review.
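A fail-closed ordering persists the audit record before the signature is released and withholds the signature if that write fails. In this sketch an audit entry without a delivered signature is the tolerated failure mode; a signature without an audit entry is not. The function and exception names, the JSON Lines format, and HMAC-SHA256 as the signing primitive are assumptions.

```python
import hashlib
import hmac
import json
import os
from pathlib import Path


class AuditWriteError(RuntimeError):
    """Raised when the audit record cannot be made durable."""


def sign_with_audit(secret: bytes, message: str, audit_log: Path) -> str:
    """Write and fsync the audit record first; return a signature only if that succeeds."""
    sig = hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()
    record = json.dumps({"action": "sign", "message": message, "sig": sig})
    try:
        with audit_log.open("a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())
    except OSError as exc:
        # Fail closed: no durable audit entry means no signature leaves this function.
        raise AuditWriteError("audit persistence failed; signature withheld") from exc
    return sig
```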

Intent-Code Divergence

High

Confidence: 97%
Finding: The module documentation asserts there is no bypass path to secret access before governance checks, but the verify command reads MOSES_OPERATOR_SECRET without invoking the governance gate. Even if verification is less sensitive than signing, this contradiction weakens the stated security boundary and creates an undocumented code path that exposes secret-dependent functionality outside the governance model.

Context-Inappropriate Capability

Medium

Confidence: 72%
Finding: The file advertises additional external-reviewer environment variables unrelated to the implemented witness logger, which creates ambiguity about hidden data flows and expands operator trust assumptions. In a governance/audit component, undocumented or mismatched external integrations are risky because they normalize outbound sharing and can mislead users about what may be exfiltrated.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The documentation claims the external witness provides a trustworthy second ledger the agent cannot unilaterally edit after the fact, but the code only posts selected events and does not guarantee completeness, sequencing, or mandatory logging. In a governance harness, overstating audit guarantees is dangerous because users may rely on the absence of records as evidence that no blocked, failed, or malicious action occurred.
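If every governed event carried a monotonically increasing sequence number before posting (an assumption; the finding says the code does not guarantee sequencing), the witness could at least detect dropped events in the middle of a run. This sketch uses a hypothetical find_gaps helper. It cannot detect events suppressed after the last one received or events that were never numbered; catching those would need something like periodic heartbeats or a final count.

```python
from typing import Iterable


def find_gaps(sequence_numbers: Iterable[int]) -> list[int]:
    """Return sequence numbers missing between the lowest and highest received."""
    seen = sorted(set(sequence_numbers))
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(seen[0], seen[-1] + 1) if n not in present]
```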

Missing User Warnings

Medium

Confidence: 93%
Finding: The README instructs users to export `MOSES_OPERATOR_SECRET` directly in their shell without any guidance on secure generation, storage, rotation, or avoiding shell history and process-environment exposure. Because this secret appears to gate governance or signing behavior in a security-sensitive skill, weak handling could let other local users, logs, or subprocesses access it and undermine trust in the audit or control plane.
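A sketch of safer handling using only the standard library: generate the secret from the operating system's CSPRNG, and when the variable is not already set, prompt for it without echo instead of typing it into an export command that lands in shell history. The variable name is from the README; the helper names and the prompt fallback are assumptions. This does not address storage or rotation.

```python
import getpass
import os
import secrets


def generate_operator_secret() -> str:
    """Return 32 random bytes encoded as 64 hex characters."""
    return secrets.token_hex(32)


def load_operator_secret() -> str:
    """Use MOSES_OPERATOR_SECRET if set; otherwise prompt without echoing input."""
    value = os.environ.get("MOSES_OPERATOR_SECRET")
    if value:
        return value
    return getpass.getpass("MOSES operator secret: ")
```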

Vague Triggers

Medium

Confidence: 93%
Finding: The 'None (Unrestricted)' mode effectively permits operation without behavioral safeguards, and its trigger condition ('operator explicitly accepts full risk') is underspecified and easy to abuse or simulate. In a governance harness, ambiguous activation of an unrestricted mode undermines the entire control model and can be used to bypass safety, privacy, and transaction controls.

Missing User Warnings

High

Confidence: 97%
Finding: The file introduces an unrestricted mode but does not present a clear user-facing warning about concrete risks such as data disclosure, destructive actions, policy bypass, or system compromise. In a document framing itself as a trust and governance layer, omission of explicit risk communication makes unsafe activation more likely and increases the chance of informed-consent failure.

Missing User Warnings

Medium

Confidence: 84%
Finding: The function can transmit review artifacts to an external service when environment flags are enabled, but there is no prominent user-facing warning or interactive confirmation at the point of transmission. In a governance harness, users may reasonably assume reviews stay local; silent network egress increases the risk of unintended disclosure of instruction/output-derived data to third parties.
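A point-of-transmission confirmation could sit behind the environment flag, so enabling the flag alone does not silently send data. This is a minimal sketch with a hypothetical confirm_egress helper; the ask parameter exists only so the prompt can be replaced in tests. Anything other than an explicit "yes" is treated as refusal.

```python
from typing import Callable


def confirm_egress(destination: str, fields: list[str],
                   ask: Callable[[str], str] = input) -> bool:
    """Ask the operator before any data leaves the machine; default to no."""
    prompt = (
        f"About to send {', '.join(sorted(fields))} to {destination}. "
        "Type 'yes' to continue: "
    )
    return ask(prompt).strip().lower() == "yes"
```

The prompt names both the destination and the fields, so the operator consents to a specific disclosure rather than to networking in general.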

Missing User Warnings

Low

Confidence: 86%
Finding: The script silently writes to a separate progress tracker file as a side effect of logging failures. Undisclosed state mutation can surprise operators and allows a caller controlling outcome text to trigger workflow changes without an explicit permission boundary or user confirmation.

Missing User Warnings

Medium

Confidence: 90%
Finding: The documented headless mode explicitly bypasses operator confirmation for governed actions, undermining the stated purpose of a governance harness as a human control gate. In security-sensitive or destructive workflows, this can permit unattended execution of risky steps that would otherwise require review, especially if upstream policy checks are incomplete or state is tampered with.
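One fail-closed alternative is to deny governed actions when no operator is available instead of auto-approving them. This sketch uses a hypothetical operator_approves helper and a headless flag standing in for however the skill detects headless mode; ask is injectable for testing.

```python
from typing import Callable


def operator_approves(action: str, headless: bool,
                      ask: Callable[[str], str] = input) -> bool:
    """Return True only on an explicit interactive 'approve'."""
    if headless:
        # No operator to ask: deny rather than silently approve.
        return False
    answer = ask(f"Approve governed action '{action}'? [approve/deny]: ")
    return answer.strip().lower() == "approve"
```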

Missing User Warnings

Medium

Confidence: 88%
Finding: The function posts event details, state metadata, and optional extra fields to a public external service once enabled, without an explicit warning or consent prompt at the point of transmission. Because this is a governance component likely to handle sensitive task descriptions, blocked actions, and failure details, it can unintentionally disclose confidential operational data.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill automatically embeds session-derived provenance data into generated documents without a clear up-front warning to the user. Even if the session ID is truncated and derived from hashed inputs, this creates hidden metadata disclosure and traceability that may be inappropriate for sensitive, anonymous, or externally shared documents.

Natural-Language Policy Violations

Low

Confidence: 79%
Finding: The hard-coded runtime/vendor string embeds platform attribution into every stamped document regardless of user intent or actual execution context. This can leak environmental information, create misleading provenance claims, and expose users to unnecessary fingerprinting when documents are shared externally.

VirusTotal

66/66 vendors flagged this skill as clean.
