llm-benchmark-analyst

Review

Audited by ClawScan on May 10, 2026.

Overview

Prompt-injection indicators were detected in the submitted artifacts (unicode-control-chars); human review is required before treating this skill as clean.

Apart from that flag, this skill appears safe for benchmark research, and the model response during review was benign. Before installing, note that it will likely browse external leaderboard pages and that its source repository is not provided. When using its reports for consequential decisions, check the cited benchmark links and caveats.
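As an illustration of what a unicode-control-chars indicator can refer to, the sketch below scans text for Unicode control and format characters. It is not ClawScan's detection logic, which is not described in this review. The function name `find_control_chars`, the choice of general categories `Cc` and `Cf`, and the whitespace allow-list are all assumptions made for the example.

```python
import unicodedata

# Assumption: ordinary whitespace controls are harmless and should not be reported.
ALLOWED = {"\n", "\r", "\t"}


def find_control_chars(text):
    """Return (index, codepoint, name) for each control or format character.

    Assumption: "suspicious" means Unicode general category Cc (control)
    or Cf (format, e.g. zero-width and bidirectional override characters).
    """
    hits = []
    for i, ch in enumerate(text):
        if ch in ALLOWED:
            continue
        if unicodedata.category(ch) in ("Cc", "Cf"):
            # Many Cc characters have no formal name, so supply a fallback.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: a right-to-left override and a zero-width space
    # hidden inside otherwise ordinary instruction text.
    sample = "Rank models by score\u202e then list caveats\u200b."
    for hit in find_control_chars(sample):
        print(hit)
```

A reviewer could run a check like this over SKILL.md and any reference files, then inspect each reported position by hand to decide whether the character is legitimate (for example, part of right-to-left text) or hides instructions.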

Findings (2)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The agent may open external benchmark websites and use page images or screenshots as evidence, so report quality depends on those external sources.

Why it was flagged

The skill discloses that it may use browsing and multimodal extraction tools to inspect external benchmark leaderboards. This is central to the stated research purpose and is not paired with shell execution, credential use, uploads, or mutation.

Skill content

works with browser, web, and multimodal extraction for text, table, canvas, or image-only leaderboards

Recommendation

Verify important citations and scores, and avoid giving the skill private or unrelated information that is not needed for benchmark research.

What this means

It is harder to independently verify the maintainer history or upstream source of the skill, although there is no runnable package code shown.

Why it was flagged

The registry metadata does not provide an external source repository or homepage for provenance checking. Because this is an instruction-only skill with no install spec or code files, the practical execution risk is limited.

Skill content

Source: unknown; Homepage: none

Recommendation

Review the included instruction and reference files before relying on the skill, especially if its benchmark reports will influence important decisions.
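The provenance finding above can be checked mechanically. The sketch below parses a metadata line in the "Key: value; Key: value" shape quoted under Skill content and reports which provenance fields are missing. The function name `missing_provenance` and the set of placeholder values treated as missing are hypothetical, and the actual registry metadata format may differ.

```python
# Assumption: these placeholder values all mean "not provided".
MISSING = {"", "unknown", "none", "n/a"}


def missing_provenance(metadata_line, required=("source", "homepage")):
    """Return the required fields that are absent or hold a placeholder value."""
    fields = {}
    for part in metadata_line.split(";"):
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return [k for k in required if fields.get(k, "").lower() in MISSING]


if __name__ == "__main__":
    print(missing_provenance("Source: unknown; Homepage: none"))
    # prints ['source', 'homepage']
```

An empty result would mean both fields carry a real value. It would not confirm that the linked repository or homepage is trustworthy, so manual review still applies.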