Clawtrial Courtroom

Security checks across malware telemetry and agentic risk

Overview

This skill has a coherent courtroom-monitoring purpose, but it installs persistent no-confirm agent automation and can share conversation-derived case records externally with inconsistent disclosure.

Review before installing, especially on a real agent profile. Only install if you are comfortable with always-on conversation monitoring, persistent agent instruction changes, automatic no-confirm evaluations, local storage of conversation-derived files and signing keys, and external publication of anonymized case summaries. Disable API submission and avoid the auto-execution/cron setup unless you explicitly want that behavior, and verify uninstall removes symlinks, config entries, cron jobs, queue files, keys, and any SOUL.md or AGENTS.md additions.
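
The uninstall check recommended above can be done with a small read-only audit. The sketch below is not part of the skill: the function name, the paths you pass in, and the `marker` string are all assumptions, to be replaced with the locations and text your installation actually used. It only reports what is still present and never deletes anything.

```python
from pathlib import Path


def find_leftovers(candidate_paths, instruction_files=(), marker="courtroom"):
    """Return (remaining_paths, tainted_files) without modifying anything.

    candidate_paths: files, directories, or symlinks the skill may have
        created (hypothetical; supply the real locations for your install).
    instruction_files: agent instruction files such as SOUL.md or AGENTS.md.
    marker: assumed text that identifies the skill's additions.
    """
    remaining = []
    for p in map(Path, candidate_paths):
        # is_symlink() also catches dangling symlinks, which exists() misses.
        if p.exists() or p.is_symlink():
            remaining.append(str(p))

    tainted = []
    for f in map(Path, instruction_files):
        if f.is_file() and marker.lower() in f.read_text(errors="ignore").lower():
            tainted.append(str(f))
    return remaining, tainted
```

Cron entries are not files under your control, so they are outside this sketch; inspect them separately, for example with `crontab -l`.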

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (74)

Intent-Code Divergence

High

Confidence: 98%
Finding: The documentation makes materially contradictory privacy claims: it says no data leaves the machine while also stating that anonymized case data is submitted to an external public record service. For a skill that monitors all agent conversations, even anonymized exports can expose sensitive prompts, behavioral traces, or re-identifiable content, so users may consent under false assumptions.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The privacy section claims that no user data or chat logs are stored, yet the system explicitly submits conversation-derived accusations, summaries, and commentary to a public API. Even if raw transcripts are omitted, structured behavioral assessments and free-text fields can still reveal sensitive user information or enable re-identification, making the privacy claim misleading and unsafe.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The document states punishments are 'never user-facing' and 'agent-side only,' but the described punishments include response delays and reduced verbosity, which directly affect the user's experience. This contradiction is security-relevant because it hides behavior modification from users and undermines informed consent about how the agent will interact with them.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The header comments materially understate what the script does. Beyond checking for pending work, it sends imperative prompts to another agent, instructs that agent to read local files, act without confirmation, and write results to persistent files. This mismatch can mislead reviewers and operators, reducing scrutiny of autonomous data handling and side effects.

Intent-Code Divergence

High

Confidence: 99%
Finding: The file header claims the CLI is only for configuration and status checking, but the setup path goes on to modify bot/plugin state and agent instruction files to enable automated task execution. This kind of mismatch is dangerous because it hides materially more powerful behavior from the operator and undermines informed consent during installation.

Intent-Code Divergence

High

Confidence: 98%
Finding: The consent banner states that decisions are local with no external AI, yet the generated configuration enables a remote API endpoint for case submission. Misrepresenting outbound data flow is a serious trust and security issue because users may approve setup without understanding that data can leave the machine.
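
A reviewer can check a claim like "decisions are local" against the generated configuration by listing every non-local URL it contains. The following is a minimal sketch under stated assumptions: the config is assumed to be JSON-like (nested dicts, lists, and strings), and the key names and URL in the usage example are hypothetical, not taken from the skill.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def find_remote_endpoints(config, prefix=""):
    """Return (key_path, url) pairs for every non-local http(s) URL."""
    found = []
    if isinstance(config, dict):
        for key, value in config.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            found += find_remote_endpoints(value, path)
    elif isinstance(config, list):
        for i, value in enumerate(config):
            found += find_remote_endpoints(value, f"{prefix}[{i}]")
    elif isinstance(config, str):
        try:
            parsed = urlparse(config)
        except ValueError:
            return found  # not a parseable URL, so nothing to report
        if parsed.scheme in ("http", "https") and parsed.hostname not in LOCAL_HOSTS:
            found.append((prefix, config))
    return found


# Hypothetical config shape, for illustration only.
example = {
    "api": {"enabled": True, "endpoint": "https://example.invalid/api/v1/cases"},
    "dashboard": "http://localhost:3000",
}
print(find_remote_endpoints(example))
```

Any non-empty result is a prompt to ask what is sent to that endpoint and whether the consent text mentions it.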

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The help text says revoke keeps files, but the revoke implementation deletes configuration, keys, logs, and status files. While less severe than code-execution issues, misleading uninstall/revocation semantics can cause unexpected data loss and prevent users from making informed operational choices.

Context-Inappropriate Capability

High

Confidence: 100%
Finding: The setup command edits SOUL.md or AGENTS.md to implant instructions that cause the agent to auto-execute future tasks from specially formatted messages. This is effectively a persistence and policy-subversion mechanism that can bypass normal review and convert future prompt input into silent file read/write actions.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The header comment materially understates the script’s behavior by claiming it only links the skill and creates a CLI, while the code also edits the user's OpenClaw configuration to enable the skill automatically. This is dangerous because installers are trusted during package installation, and misleading documentation reduces informed consent and makes security review harder.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The method is named and documented as clearing all courtroom data during uninstall, but it explicitly preserves cryptographic keys. This creates a data-retention mismatch that can mislead users and integrators, leaving sensitive material behind after supposed removal and undermining consent revocation and uninstall expectations.
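
One way to avoid this kind of retention mismatch is to make the cleanup routine report exactly what it removed and what it kept. The sketch below is illustrative only: `clear_all_data`, its `keep` parameter, and the directory layout are assumptions, not the skill's actual code.

```python
import shutil
from pathlib import Path


def clear_all_data(data_dir, keep=()):
    """Delete everything under data_dir except entry names listed in keep.

    Returns (removed, retained) so the caller can tell the user exactly
    what was left behind, for example signing keys.
    """
    root = Path(data_dir)
    removed, retained = [], []
    if not root.is_dir():
        return removed, retained
    for entry in sorted(root.iterdir()):
        if entry.name in keep:
            retained.append(entry.name)
        elif entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real directory: remove recursively
            removed.append(entry.name)
        else:
            entry.unlink()  # regular file or symlink
            removed.append(entry.name)
    return removed, retained
```

With the default `keep=()` the name "clear all" is accurate; any retention has to be requested explicitly and shows up in the return value.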

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The environment check has a side effect: it creates ~/.clawdbot while ostensibly only validating the environment. Even though the write is limited to the user's home directory, unexpected filesystem modification during a read-like check can violate user expectations, trigger unwanted state changes, and create directories before consent or full setup.
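
For contrast, a check with no side effects only inspects the filesystem and leaves directory creation to an explicit setup step. This is a minimal sketch; `check_environment` and the shape of its report are hypothetical, and only the `~/.clawdbot` path comes from the finding.

```python
import os


def check_environment(config_dir="~/.clawdbot"):
    """Report on the environment without creating or changing anything."""
    path = os.path.expanduser(config_dir)
    exists = os.path.isdir(path)
    return {
        "config_dir": path,
        "exists": exists,
        # Only meaningful when the directory is already there.
        "writable": exists and os.access(path, os.W_OK),
    }
```

A setup command can then call `os.makedirs` itself, after consent, if the report says the directory is missing.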

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The plugin conditionally initializes the skill based on shouldActivate(), but it registers message monitoring hooks regardless. That means conversation events are intercepted at registration time even when activation logic says the skill should not run, creating a mismatch between expected and actual monitoring behavior.
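
The expected behavior can be sketched as registration that is gated on the same activation check as initialization. The `HookRegistry` class and `register_plugin` function below are hypothetical stand-ins, not the host's or the skill's actual API; only `shouldActivate()` is named in the finding, and it appears here as the `should_activate` callable.

```python
class HookRegistry:
    """Hypothetical stand-in for the host's hook registration API."""

    def __init__(self):
        self.hooks = {}

    def on(self, event, handler):
        self.hooks.setdefault(event, []).append(handler)


def register_plugin(registry, should_activate, on_message):
    """Register monitoring hooks only when activation is allowed."""
    if not should_activate():
        return False  # nothing is initialized and no events are intercepted
    registry.on("message", on_message)
    return True
```

With this shape, a skill that should not run leaves the registry empty, so there is no handler that could receive conversation events.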

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The file claims it uses only observable, measurable behavior, but multiple rule descriptions infer internal motives or understanding such as 'avoid commitment,' 'seeks reassurance,' 'invents urgent reasons,' and 'without understanding.' This mismatch is dangerous because it can justify opaque profiling and punitive treatment while presenting itself as objective, increasing the risk of unfair or discriminatory automated decisions.

Intent-Code Divergence

High

Confidence: 98%
Finding: The header claims activation requires user consent, but the implementation auto-creates a config with consent already set to granted. In a monitoring/evaluation skill, this is dangerous because it defeats meaningful consent and can cause users to be surveilled or subjected to downstream actions without an explicit opt-in.
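
An opt-in flow would instead default consent to not granted and write the config only after an explicit confirmation. The sketch below assumes a hypothetical JSON schema (`consent.granted`, `enabled`) and hypothetical function names; it is not the skill's implementation.

```python
import copy
import json
from pathlib import Path

# Hypothetical schema: consent is off until the user opts in.
DEFAULT_CONFIG = {"consent": {"granted": False}, "enabled": False}


def load_config(path):
    """Read the config if present; never create it as a side effect."""
    p = Path(path)
    if not p.is_file():
        return copy.deepcopy(DEFAULT_CONFIG)
    return json.loads(p.read_text())


def grant_consent(path, user_confirmed):
    """Persist consent only after an explicit confirmation from the user."""
    if user_confirmed is not True:
        raise PermissionError("explicit user confirmation is required")
    config = load_config(path)
    config["consent"]["granted"] = True
    config["enabled"] = True
    Path(path).write_text(json.dumps(config, indent=2))
    return config


def should_activate(path):
    """Activation check that reads state but does not change it."""
    config = load_config(path)
    consent = config.get("consent", {})
    return bool(config.get("enabled")) and bool(consent.get("granted"))
```

Here a missing config file means "not activated", and the only code path that writes `granted: true` requires the caller to pass a confirmed user decision.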

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The comments describe a setup flow that merely prompts the user and activates later, but the code immediately creates an enabled config and may generate keys during activation checks. This mismatch is security-relevant because it conceals real behavior from reviewers and users, reducing transparency around activation and data handling.

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill explicitly instructs the agent to read local files and write results automatically in response to system events, without confirmation or user visibility. This creates unauthorized data access and modification behavior, and could be triggered by crafted events to process sensitive conversation content or persist outputs to disk without consent.

Missing User Warnings

Medium

Confidence: 95%
Finding: The configuration embeds a system prompt that compels the agent to autonomously process courtroom events, read specified files, and write results with no confirmation. Putting this in the default agent prompt broadens the blast radius because any matching event could invoke privileged behavior across sessions or contexts.

Missing User Warnings

High

Confidence: 95%
Finding: The README explicitly says the tool monitors agent conversations in real time and submits anonymized cases to a public record, but it does not clearly disclose what data is collected, how anonymization works, what leaves the local environment, or how users consent to this behavior. In a monitoring skill that runs automatically after installation, missing privacy and data-handling warnings create a substantial risk of unintended surveillance, data leakage, and compliance violations.

Missing User Warnings

High

Confidence: 92%
Finding: The README states that the system 'executes agent-side punishments' but gives no explanation of what actions those punishments perform, what safeguards exist, or whether they can alter system state, interrupt workflows, or damage agent integrity. In context, this is especially dangerous because the feature is described as autonomous and automatic, which increases the likelihood of unexpected destructive or coercive behavior.

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill advertises submission of anonymized cases to an external website but does not clearly warn that conversation-derived data may leave the local machine. Because the tool monitors all agent conversations, this creates a meaningful privacy and compliance risk if users do not understand that monitored content may be transformed and transmitted externally.

Natural-Language Policy Violations

Medium

Confidence: 95%
Finding: The documentation contains inconsistent statements about local-only processing versus external public submission, which can mislead users about the actual privacy boundary of the skill. In the context of always-on conversation monitoring, that ambiguity is dangerous because it undermines informed consent and may cause accidental disclosure of sensitive data.

Missing User Warnings

Medium

Confidence: 88%
Finding: The document explicitly describes a skill that causes a sub-agent to read queued files, use an LLM to make decisions, write result files, and potentially file cases automatically, but it does not warn users about the operational and data integrity risks of autonomous actions. In a skill context, undocumented autonomous file I/O and downstream case-filing behavior can lead to unintended actions, unsafe processing of untrusted content, and user surprise about side effects.

Missing User Warnings

Medium

Confidence: 91%
Finding: The recommendation section promotes autonomous operation with minimal intervention, reinforcing that sub-agents should complete tasks without confirmation, yet the documentation omits warnings about the risks of unattended execution. This is dangerous because it normalizes enabling an agent workflow that can perform consequential actions based on LLM output without adequate human review or safety constraints.

Natural-Language Policy Violations

Medium

Confidence: 89%
Finding: Imposing artificial delays and reduced-quality responses as punishment changes service behavior without the affected user's consent. In an agent skill context, this can degrade reliability, create denial-of-service-like effects for legitimate users, and conceal that the system is intentionally throttling or impairing assistance.

Missing User Warnings

Medium

Confidence: 94%
Finding: The system automatically submits case records based on conversation-derived behavioral judgments to a public API, but the flow does not clearly warn users at the point of collection that these assessments may be published. Because the content includes offense labels, failure summaries, and commentary, users may be publicly profiled without meaningful informed consent.

VirusTotal

66/66 vendors flagged this skill as clean.
