LLM as Judge
Pass. Audited by ClawScan on May 1, 2026.
Overview
This instruction-only skill is coherent and purpose-aligned, but users should be aware that it may share task details with a different model provider for review.
This skill appears safe to install as an instruction-only cross-model review workflow. Before using it on confidential code, security designs, trading systems, or long-term planning documents, confirm that you are comfortable sending the relevant context to another model provider, and do not include secrets unless necessary.
Findings (2)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Plans, code, architecture details, or security/financial system context may be shared with another model provider during review.
The skill explicitly routes review content to a separate model/provider, which is central to its purpose but creates a data-sharing boundary users should understand.
Use a different provider than the executor to avoid shared blind spots: ... Executor: Claude → Judge: `kimi` or `grok` or `gemini-pro`
Use this only where cross-provider review is acceptable, avoid including secrets or unnecessary proprietary details, and consider provider data-retention policies.
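The mitigation above has two mechanical parts that can be sketched in code: picking a judge from a provider other than the executor, and stripping obvious secrets before any context is sent for review. The provider names come from the skill's own example routing; the regex patterns and function names below are illustrative assumptions, not part of the skill, and a best-effort redaction pass is no substitute for keeping secrets out of the context in the first place.

```python
import re

# Judge pool taken from the skill's example routing (kimi, grok, gemini-pro).
JUDGE_POOL = ["kimi", "grok", "gemini-pro"]

# Illustrative patterns only; real secret scanning needs a fuller rule set.
SECRET_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
    re.compile(r"(?i)password\s*[:=]\s*\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
]

def pick_judge(executor: str) -> str:
    """Pick a judge from a different provider than the executor."""
    for judge in JUDGE_POOL:
        if judge != executor:
            return judge
    raise ValueError("no cross-provider judge available")

def redact(text: str) -> str:
    """Best-effort removal of obvious secrets before cross-provider review."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Routing through `pick_judge` keeps the executor and judge on different providers so they are less likely to share blind spots; `redact` narrows, but does not eliminate, the data-sharing boundary the finding describes.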
Users may treat the judge verdict as stronger assurance than it really is.
The skill presents precise effectiveness claims without supporting evidence in the artifacts, which could lead users to overestimate the assurance provided by the judge model.
Cross-model review catches ~85% of issues vs ~60% for self-reflection.
Treat judge output as advisory and continue using normal testing, security review, and human judgment for high-stakes work.
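One way to make "advisory, not authoritative" concrete is a release gate where the judge verdict can escalate but never substitute for tests or human review. This is a minimal sketch under assumed names (`ReviewResult`, `release_allowed` are hypothetical, not part of the skill):

```python
from dataclasses import dataclass

@dataclass
class ReviewResult:
    judge_verdict: str    # "pass" or "fail" from the judge model
    tests_passed: bool    # result of the normal test suite
    human_approved: bool  # explicit human sign-off

def release_allowed(r: ReviewResult) -> bool:
    """Judge output is advisory: it can escalate, but never replace tests."""
    if not r.tests_passed:
        return False  # a judge "pass" never substitutes for passing tests
    if r.judge_verdict != "pass":
        return r.human_approved  # a judge "fail" escalates to a human
    return True
```

Under this policy a judge "pass" with failing tests still blocks release, which directly counters the overconfidence risk described in the finding.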
