Checkmate

Security checks across malware telemetry and agentic risk

Overview

Checkmate is transparent about its purpose, but it can run long-lived autonomous agents with shell, network, connected-account, messaging, and session-control access.

Install only if you intentionally want to delegate trusted, well-scoped work to a high-privilege autonomous loop. Prefer interactive mode, avoid --no-interactive except in isolated environments, disable unrelated OAuth-backed skills, use a disposable workspace/profile when possible, verify recipient/channel values, and do not paste secrets or untrusted third-party instructions into checkpoint replies.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (12)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The skill description materially understates behavior: it frames the skill as a quality-enforcement loop, while the body also performs session injection, outbound messaging, and interactive control flow, and consumes user replies to drive execution. In a high-privilege skill where workers inherit exec, OAuth-backed tools, and all installed skills, this mismatch can mislead users into authorizing a much broader and riskier orchestration capability than expected.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The worker prompt explicitly authorizes use of the full agent runtime, including powerful tools such as exec, web access, and browser automation, even though the skill's stated purpose is iterative task completion and judging. This unnecessarily enlarges the attack surface: any adversarial task content or feedback injected into the loop could cause the worker to perform networked actions, execute commands, or access sensitive context beyond what is needed.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill can send direct outbound messages to arbitrary targets and channels based on runtime parameters, which exceeds a narrow orchestration role and creates a data-exfiltration and abuse surface. In this context, worker/judge outputs and user content may be forwarded externally without strong recipient allowlisting or approval enforcement, making the capability materially risky.
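
The recipient allowlisting that this finding says is missing could look roughly like the sketch below. It is illustrative only and is not taken from the skill's code: `OutboundMessage`, `ALLOWED_TARGETS`, `send_if_allowed`, and the example channel and recipient values are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

# Hypothetical operator-approved (channel, recipient) pairs.
# In a real deployment these would come from reviewed configuration,
# not from runtime parameters supplied by the task or the worker.
ALLOWED_TARGETS: FrozenSet[Tuple[str, str]] = frozenset({
    ("slack", "#build-status"),
    ("email", "ops@example.com"),
})


@dataclass(frozen=True)
class OutboundMessage:
    channel: str
    recipient: str
    body: str


def is_allowed(msg: OutboundMessage,
               allowed: FrozenSet[Tuple[str, str]] = ALLOWED_TARGETS) -> bool:
    """Return True only if the exact (channel, recipient) pair is allowlisted."""
    return (msg.channel, msg.recipient) in allowed


def send_if_allowed(msg: OutboundMessage,
                    send: Callable[[OutboundMessage], None],
                    allowed: FrozenSet[Tuple[str, str]] = ALLOWED_TARGETS) -> bool:
    """Call `send` only for allowlisted targets; report whether it was sent."""
    if not is_allowed(msg, allowed):
        return False  # fail closed: unknown targets are dropped, not forwarded
    send(msg)
    return True
```

Matching on the exact pair, rather than on the channel alone, means a runtime parameter cannot redirect an approved channel to an unapproved recipient.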

Context-Inappropriate Capability

High

Confidence: 99%
Finding: This code injects arbitrary messages into a live agent session that the comments state has full tool access, all skills, and OAuth-backed auth. Because the injected content explicitly instructs the agent to relay messages and bridge future user replies to a file, this creates a powerful privilege-escalation channel where untrusted orchestration text can steer a highly privileged agent to take external actions.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The skill description promises strict completion gating, but the implementation explicitly proceeds with best-effort criteria after failed intake and can later emit final output after max iterations without a PASS. In a security-sensitive automation context, this mismatch is dangerous because operators may trust the skill to block incomplete work when it actually degrades into autonomous completion.
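
For contrast, a loop that actually enforces strict completion gating might be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: `run_gated_loop`, the `work` and `judge` callables, and the `"PASS"` verdict string as used here are hypothetical stand-ins.

```python
from typing import Callable, Optional, Tuple


def run_gated_loop(work: Callable[[int, str], str],
                   judge: Callable[[str], Tuple[str, str]],
                   max_iterations: int) -> Optional[str]:
    """Iterate worker/judge rounds; return output only on an explicit PASS.

    `work(iteration, feedback)` produces a candidate output.
    `judge(output)` returns (verdict, feedback).
    If no round passes within `max_iterations`, return None instead of
    emitting the last best-effort output.
    """
    feedback = ""
    for iteration in range(max_iterations):
        output = work(iteration, feedback)
        verdict, feedback = judge(output)
        if verdict == "PASS":
            return output
    return None  # gate stays closed: no PASS, no final output
```

The design choice is that exhausting the iteration budget yields `None`, so a caller has to handle the failure explicitly rather than receive unvetted output that looks like a result.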

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The code comments say user approval is the real gate, but request_user_input defaults can auto-advance on timeout using values like 'proceed' or 'go'. That means the system can continue significant actions without an actual human response, undermining the intended checkpoint control and increasing the chance of unwanted execution or data sharing.
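
A checkpoint that fails closed on timeout, rather than auto-advancing with a default such as 'proceed', could be sketched like this. The helper names `wait_for_reply` and `checkpoint_approved`, the `APPROVE_WORDS` set, and the polling interface are assumptions made for illustration; the real `request_user_input` signature is not shown in the report.

```python
import time
from typing import Callable, Optional

# Hypothetical set of replies that count as explicit approval.
APPROVE_WORDS = {"proceed", "go", "approve"}


def wait_for_reply(poll: Callable[[], Optional[str]],
                   timeout_s: float,
                   interval_s: float = 0.5,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll for a human reply until the deadline; return None on timeout.

    No default value is substituted when the deadline passes.
    """
    deadline = clock() + timeout_s
    while True:
        reply = poll()
        if reply is not None:
            return reply
        if clock() >= deadline:
            return None
        sleep(interval_s)


def checkpoint_approved(reply: Optional[str]) -> bool:
    """Approve only on an explicit approval word; a timeout (None) is a denial."""
    return reply is not None and reply.strip().lower() in APPROVE_WORDS
```

With this shape, a timeout and an explicit refusal are both treated as "do not continue", so the loop can only advance after a real human response.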

Vague Triggers

Medium

Confidence: 93%
Finding: The trigger phrases include very generic language such as 'until it passes', 'keep iterating until done', and 'quality loop', which can match ordinary user conversation and cause this high-privilege skill to activate unexpectedly. Because the skill can launch autonomous workers with shell, network, OAuth-backed skills, and sub-agent spawning, accidental invocation materially increases the chance of unintended privileged actions.

Vague Triggers

Medium

Confidence: 91%
Finding: The trigger phrases include common conversational language such as 'don't stop until done', 'until it passes', and 'keep going until done', which can activate the skill unintentionally during normal dialogue. Because activation launches a high-privilege orchestrator that may spawn agents with exec and OAuth access, accidental invocation meaningfully increases the chance of unintended autonomous actions.
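
The difference between phrase-based triggering and explicit invocation can be illustrated with a small sketch. The phrases are the ones quoted in the trigger findings; the `/checkmate` prefix and both function names are hypothetical and are not known to exist in the skill or its host runtime.

```python
# Phrases quoted in the findings as overly generic triggers.
BROAD_PHRASES = (
    "until it passes",
    "keep iterating until done",
    "keep going until done",
    "don't stop until done",
    "quality loop",
)

# Hypothetical explicit command prefix, used here only for illustration.
EXPLICIT_PREFIX = "/checkmate"


def broad_trigger(message: str) -> bool:
    """Substring matching: fires on ordinary conversation containing a phrase."""
    text = message.lower()
    return any(phrase in text for phrase in BROAD_PHRASES)


def explicit_trigger(message: str) -> bool:
    """Fires only when the first token of the message is the explicit prefix."""
    parts = message.strip().split(maxsplit=1)
    return bool(parts) and parts[0].lower() == EXPLICIT_PREFIX
```

A sentence such as "Leave the oven on until it passes 200 degrees" matches `broad_trigger` but not `explicit_trigger`, which is the accidental-activation risk these findings describe.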

Missing User Warnings

Medium

Confidence: 95%
Finding: The prompt instructs the agent to use the full runtime and write output files directly, but provides no user-facing warning that the skill can modify the filesystem or create multiple artifacts. In a skill that loops on judge feedback, this is riskier because adversarial or overly broad tasks can repeatedly drive file writes and artifact creation without transparent user awareness or scoped path restrictions.

Missing User Warnings

Medium

Confidence: 84%
Finding: User replies are intentionally written verbatim to a workspace file, and this file becomes part of the orchestration state that later workers and agents may read. In this skill's context, that creates a privacy and prompt-injection risk because sensitive or adversarial user content is persisted and reused without minimization, sanitization, or clear disclosure in the implementation shown.

Ssd 3

Medium

Confidence: 90%
Finding: The bridge instruction explicitly tells another agent to capture the user's next reply verbatim and write it to disk, where it can later be consumed or retransmitted. That design creates a direct data-handling pipeline for potentially sensitive or malicious content, increasing both privacy exposure and downstream prompt-injection risk within an agentic system.

Autonomous Decision Making

Medium

Category: Excessive Agency
Content: This means **the task description you provide directly controls what the worker does** — treat it like code you're about to run, not a message you're about to send. **Batch mode (`--no-interactive`) removes all human gates.** In interactive mode (default), you approve criteria and each checkpoint before the loop continues. In batch mode, criteria are auto-approved and the loop runs to completion autonomously — only use this for tasks and environments you fully trust. **User-input bridging writes arbitrary content to disk.** When you reply to a checkpoint, the main agent writes your reply verbatim to `user-input.md` in the workspace. The orchestrator reads it and acts on it. Don't relay untrusted third-party content as checkpoint replies.
Confidence: 93%
Finding: auto-approve

VirusTotal

65/65 vendors flagged this skill as clean.
