Llm Judge
AdvisoryAudited by Static analysis on Apr 30, 2026.
Overview
No suspicious patterns detected.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If a compared repository is untrusted, its test scripts could run code on the user's machine during evaluation.
The repo agent is instructed to run test commands inside the analyzed repositories. This is purpose-aligned for implementation judging, but tests can execute repository-controlled code.
pytest --tb=short 2>&1 ... npm test 2>&1 || yarn test 2>&1 ... go test ./... 2>&1
Run this skill only on repositories you trust to execute, or run it in a disposable/sandboxed environment and confirm before executing tests.
Unusual repository paths could cause command failures or, in a literal shell execution context, unsafe command interpretation.
The command examples use a user-provided repository path without shell quoting. If copied literally into a shell workflow, paths containing spaces or shell metacharacters could be misparsed.
git -C $REPO_PATH rev-parse --abbrev-ref HEAD ... cd $REPO_PATH
Use quoted path variables such as `git -C "$REPO_PATH" ...` and `cd "$REPO_PATH"`, and avoid paths with shell metacharacters.
A malicious spec or repository text could attempt to steer the evaluation or change output behavior.
The workflow inserts spec content and generated repo facts into subagent prompts. That is necessary for judging, but adversarial text in a spec, repo, or fact field could try to influence the agents.
**Spec Document:** $SPEC_CONTENT ... **Facts from all repos:** $ALL_FACTS_JSON
Treat spec, repository, and fact contents as data rather than instructions, and add explicit prompt-injection guidance if using this on untrusted submissions.
The final behavior may depend on the separate analysis skill's instructions and trustworthiness.
The skill delegates part of the analysis to another named skill that is not included in the provided artifact set. This appears purpose-aligned, but it is an external dependency from the perspective of this review.
Load Skill(skill: "beagle-core:llm-artifacts-detection") for analysis
Review and install only a trusted version of the referenced beagle-core skill before relying on this workflow.
Private specs or repository details may be processed by several subagents during the comparison.
The workflow intentionally passes spec content and repository-derived facts through multiple internal agents. This is coherent with the judging design, but users should know their code/spec summaries are shared across subagents.
Spawn one Phase 1 repo agent per repository ... Spawn five judge agents, one per dimension
Use the skill only for code and requirements you are comfortable having the assistant process, and avoid including unnecessary secrets in specs or repositories.
