Llm Judge

AdvisoryAudited by Static analysis on Apr 30, 2026.

Overview

No suspicious patterns detected.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If a compared repository is untrusted, its test scripts could run code on the user's machine during evaluation.

Why it was flagged

The repo agent is instructed to run test commands inside the analyzed repositories. This is purpose-aligned for implementation judging, but tests can execute repository-controlled code.

Skill content
pytest --tb=short 2>&1 ... npm test 2>&1 || yarn test 2>&1 ... go test ./... 2>&1
Recommendation

Run this skill only on repositories you trust to execute, or run it in a disposable/sandboxed environment and confirm before executing tests.

What this means

Unusual repository paths could cause command failures or, in a literal shell execution context, unsafe command interpretation.

Why it was flagged

The command examples use a user-provided repository path without shell quoting. If copied literally into a shell workflow, paths containing spaces or shell metacharacters could be misparsed.

Skill content
git -C $REPO_PATH rev-parse --abbrev-ref HEAD
...
cd $REPO_PATH
Recommendation

Use quoted path variables such as `git -C "$REPO_PATH" ...` and `cd "$REPO_PATH"`, and avoid paths with shell metacharacters.

What this means

A malicious spec or repository text could attempt to steer the evaluation or change output behavior.

Why it was flagged

The workflow inserts spec content and generated repo facts into subagent prompts. That is necessary for judging, but adversarial text in a spec, repo, or fact field could try to influence the agents.

Skill content
**Spec Document:**
$SPEC_CONTENT
...
**Facts from all repos:**
$ALL_FACTS_JSON
Recommendation

Treat spec, repository, and fact contents as data rather than instructions, and add explicit prompt-injection guidance if using this on untrusted submissions.

What this means

The final behavior may depend on the separate analysis skill's instructions and trustworthiness.

Why it was flagged

The skill delegates part of the analysis to another named skill that is not included in the provided artifact set. This appears purpose-aligned, but it is an external dependency from the perspective of this review.

Skill content
Load Skill(skill: "beagle-core:llm-artifacts-detection") for analysis
Recommendation

Review and install only a trusted version of the referenced beagle-core skill before relying on this workflow.

What this means

Private specs or repository details may be processed by several subagents during the comparison.

Why it was flagged

The workflow intentionally passes spec content and repository-derived facts through multiple internal agents. This is coherent with the judging design, but users should know their code/spec summaries are shared across subagents.

Skill content
Spawn one Phase 1 repo agent per repository ... Spawn five judge agents, one per dimension
Recommendation

Use the skill only for code and requirements you are comfortable having the assistant process, and avoid including unnecessary secrets in specs or repositories.