ExpertPack Eval

Audited by ClawScan on May 1, 2026.

Overview

The skill appears to perform the ExpertPack evaluation it advertises, but it does so using your OpenRouter account and sends pack and evaluation content to external services.

Before installing or running this skill, confirm that the pack contents, eval questions, expected answers, and agent responses are acceptable to send to OpenRouter and any endpoint you provide. Use a limited OpenRouter key if possible, monitor cost/quota usage, install Python dependencies from trusted sources, and manually review eval results when prompt-injection-like responses are possible.

Findings (4)

This is an artifact-based, informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Finding 1

What this means

Running the evaluator can consume OpenRouter quota or incur OpenRouter costs using your local credential.

Why it was flagged

The script checks OPENROUTER_API_KEY and, if it is unset, automatically reads a locally stored OpenClaw OpenRouter API key. This is expected for OpenRouter-based evaluation, but it means a stored account credential is used.

Skill content
auth_path = Path.home() / ".openclaw" / "agents" / "main" / "agent" / "auth-profiles.json" ... data.get("profiles", {}).get("openrouter:default", {}).get("key", "")
Recommendation

Use a dedicated or limited OpenRouter key, set spending limits where possible, and ensure the credential expectation is clearly declared before running the skill.
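A minimal sketch of the safer pattern this recommendation describes, assuming the auth-profile layout shown in the excerpt (the file path and JSON field names are taken from the flagged snippet; the function name is hypothetical): an explicitly supplied, limited key always takes precedence over the stored account credential.

```python
import json
import os
from pathlib import Path


def resolve_openrouter_key(auth_path=None):
    """Prefer an explicitly provided (ideally limited) key from the
    environment; fall back to the stored OpenClaw credential only if
    OPENROUTER_API_KEY is unset."""
    env_key = os.environ.get("OPENROUTER_API_KEY", "")
    if env_key:
        return env_key
    if auth_path is None:
        auth_path = (Path.home() / ".openclaw" / "agents" / "main"
                     / "agent" / "auth-profiles.json")
    auth_path = Path(auth_path)
    if not auth_path.is_file():
        return ""
    # Field names mirror the flagged excerpt.
    data = json.loads(auth_path.read_text())
    return data.get("profiles", {}).get("openrouter:default", {}).get("key", "")
```

Exporting a dedicated, spend-limited key before running the evaluator keeps the stored account credential out of the request path entirely.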

Finding 2

What this means

Pack content, evaluation answers, and agent responses may be shared with OpenRouter or sent to the endpoint you choose.

Why it was flagged

The judge prompt includes eval questions, expected answers, and agent responses that are later submitted to OpenRouter for scoring.

Skill content
QUESTION: {question['question']} ... EXPECTED ANSWER: {question['expected_answer']} ... ACTUAL RESPONSE FROM THE AGENT:\n{response}
Recommendation

Only run this on packs and eval sets whose contents you are willing to share with those services, and use trusted HTTP/WebSocket endpoints.
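The data flow behind this finding can be illustrated with a sketch of the prompt assembly (the field names and labels come from the flagged excerpt; the exact surrounding wording in the skill is not known). Everything in the returned string is submitted to OpenRouter for scoring:

```python
def build_judge_prompt(question, response):
    """Assemble the judge prompt sent to OpenRouter.

    `question` is a dict with 'question' and 'expected_answer' keys,
    as in the flagged excerpt; `response` is the evaluated agent's
    output. All three values leave your machine."""
    return (
        f"QUESTION: {question['question']}\n"
        f"EXPECTED ANSWER: {question['expected_answer']}\n"
        f"ACTUAL RESPONSE FROM THE AGENT:\n{response}"
    )
```

If any of these three fields contains proprietary or sensitive material, that material is shared with the scoring service.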

Finding 3

What this means

An evaluated agent could manipulate its score or produce unreliable evaluation results.

Why it was flagged

The evaluated agent's response is inserted directly into the LLM judge prompt before the scoring instructions, so a malicious or prompt-injection-style response could try to influence the judge.

Skill content
ACTUAL RESPONSE FROM THE AGENT:\n{response} ... Score the response. Return ONLY valid JSON
Recommendation

Treat automated scores as advisory, review suspicious responses manually, and consider hardening the judge prompt with stronger delimiters and instructions to treat responses only as data.

Finding 4

What this means

You may need to install missing Python packages manually, which carries the usual dependency-provenance considerations.

Why it was flagged

Only python3 is declared, but the included scripts reference third-party Python modules such as requests, yaml/pyyaml, httpx, and websockets, leaving any package installation to the user environment.

Skill content
Install specifications: No install spec — this is an instruction-only skill. ... Required binaries (all must exist): python3
Recommendation

Install dependencies from trusted package sources and prefer pinned versions in a controlled Python environment.
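A preflight check along these lines can report which dependencies are missing before the scripts run. The module-to-package mapping is reconstructed from this finding (e.g. the yaml module is provided by the pyyaml distribution); pin exact versions in your own environment.

```python
import importlib.util

# Module name -> pip distribution that provides it.
# List reconstructed from the finding; extend as needed.
REQUIRED = {
    "requests": "requests",
    "yaml": "pyyaml",
    "httpx": "httpx",
    "websockets": "websockets",
}


def missing_dependencies():
    """Return the pip package names for any required modules
    that cannot be imported in the current environment."""
    return sorted(
        pkg for mod, pkg in REQUIRED.items()
        if importlib.util.find_spec(mod) is None
    )
```

Running the check before the evaluator lets you install only what is missing, from a trusted index and at pinned versions, rather than discovering ImportErrors mid-run.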