LLM as Judge

Pass

Audited by VirusTotal on May 11, 2026.

Overview

Type: OpenClaw Skill
Name: llm-as-judge
Version: 1.2.0

The 'llm-as-judge' skill implements a standard cross-model verification pattern designed to improve output quality and catch errors in complex tasks like architecture design and security reviews. The files (SKILL.md and references/judge-prompts.md) contain legitimate workflow instructions and prompt templates for peer review without any evidence of data exfiltration, malicious execution, or prompt injection attacks.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Plans, code, architecture details, or security/financial system context may be shared with another model provider during review.

Why it was flagged

The skill explicitly routes review content to a separate model/provider, which is central to its purpose but creates a data-sharing boundary users should understand.

Skill content

Use a different provider than the executor to avoid shared blind spots: ... Executor: Claude → Judge: `kimi` or `grok` or `gemini-pro`

Recommendation

Use this only where cross-provider review is acceptable, avoid including secrets or unnecessary proprietary details, and consider provider data-retention policies.
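One way to act on this recommendation is to scrub obvious secrets from review content before it crosses the provider boundary. The sketch below is illustrative only and is not part of the skill: `redact_for_judge`, the placeholder string, and the regular expressions are all assumptions made for this example, and the patterns are far from exhaustive. A vetted secret scanner would be the safer choice in practice.

```python
import re

PLACEHOLDER = "[REDACTED]"

# Hypothetical, non-exhaustive patterns chosen for illustration.
WHOLE_MATCH_PATTERNS = [
    # Strings shaped like an AWS access key ID.
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # PEM-style private key blocks.
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
]

# "name = value" or "name: value" pairs whose name suggests a credential.
KEY_VALUE_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|token|password|secret)(\s*[:=]\s*)(\S+)"
)


def redact_for_judge(text: str) -> str:
    """Return a copy of text with likely secrets replaced by a placeholder."""
    for pattern in WHOLE_MATCH_PATTERNS:
        text = pattern.sub(PLACEHOLDER, text)
    # Keep the key name and separator so the judge still sees the structure.
    return KEY_VALUE_PATTERN.sub(
        lambda m: m.group(1) + m.group(2) + PLACEHOLDER, text
    )
```

Passing the plan or code through such a filter before it is sent to the judge reduces accidental exposure, but it does not address proprietary design details or the other provider's retention policy, which still need a human decision.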

What this means

Users may treat the judge verdict as stronger assurance than it really is.

Why it was flagged

The skill presents precise effectiveness claims without supporting evidence in the artifacts, which could lead users to overestimate the assurance provided by the judge model.

Skill content

Cross-model review catches ~85% of issues vs ~60% for self-reflection.

Recommendation

Treat judge output as advisory and continue using normal testing, security review, and human judgment for high-stakes work.
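To make "advisory" concrete, a release gate can be written so that the judge verdict never substitutes for tests or human sign-off. The following is a minimal sketch under stated assumptions: `ReviewState`, its fields, the `"pass"` verdict string, and `release_decision` are hypothetical names invented here, not anything defined by the skill.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    judge_verdict: str    # assumed values: "pass" or "fail"
    tests_passed: bool    # result of the normal test suite
    human_approved: bool  # explicit human sign-off
    high_stakes: bool     # e.g. security or financial changes


def release_decision(state: ReviewState) -> tuple[bool, list[str]]:
    """Return (may_release, notes). The judge verdict only adds notes."""
    notes: list[str] = []
    may_release = True

    if not state.tests_passed:
        may_release = False
        notes.append("blocked: tests failed")
    if state.high_stakes and not state.human_approved:
        may_release = False
        notes.append("blocked: human approval required for high-stakes work")
    # Advisory only: a judge "pass" cannot unblock, a "fail" is surfaced.
    if state.judge_verdict != "pass":
        notes.append("advisory: judge raised concerns, review its feedback")

    return may_release, notes
```

In this arrangement a judge "pass" on a change with failing tests is still blocked, and a judge "fail" is reported for a person to weigh instead of being treated as a final ruling.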