ExpertPack Eval
ReviewAudited by ClawScan on May 1, 2026.
Overview
It appears to do the ExpertPack evaluation it advertises, but it uses your OpenRouter account and sends pack/evaluation content to external services.
Before installing or running this skill, confirm that the pack contents, eval questions, expected answers, and agent responses are acceptable to send to OpenRouter and any endpoint you provide. Use a limited OpenRouter key if possible, monitor cost/quota usage, install Python dependencies from trusted sources, and manually review eval results when prompt-injection-like responses are possible.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running the evaluator can consume OpenRouter quota or incur OpenRouter costs using your local credential.
The script can auto-read a locally stored OpenClaw OpenRouter API key after checking OPENROUTER_API_KEY. This is expected for OpenRouter-based evaluation, but it uses a stored account credential.
auth_path = Path.home() / ".openclaw" / "agents" / "main" / "agent" / "auth-profiles.json" ... data.get("profiles", {}).get("openrouter:default", {}).get("key", "")Use a dedicated or limited OpenRouter key, set spending limits where possible, and ensure the credential expectation is clearly declared before running the skill.
Pack content, evaluation answers, and agent responses may be shared with OpenRouter or sent to the endpoint you choose.
The judge prompt includes eval questions, expected answers, and agent responses that are later submitted to OpenRouter for scoring.
QUESTION: {question['question']} ... EXPECTED ANSWER: {question['expected_answer']} ... ACTUAL RESPONSE FROM THE AGENT:\n{response}Only run this on packs/eval sets whose contents may be shared with those services, and use trusted HTTP/WebSocket endpoints.
An evaluated agent could potentially manipulate its score or produce unreliable eval results.
The evaluated agent's response is inserted directly into the LLM judge prompt before the scoring instructions, so a malicious or prompt-injection-style response could try to influence the judge.
ACTUAL RESPONSE FROM THE AGENT:\n{response} ... Score the response. Return ONLY valid JSONTreat automated scores as advisory, review suspicious responses manually, and consider hardening the judge prompt with stronger delimiters and instructions to treat responses only as data.
You may need to install missing Python packages manually, which introduces normal dependency-provenance considerations.
Only python3 is declared, but the included scripts reference third-party Python modules such as requests, yaml/pyyaml, httpx, and websockets, leaving any package installation to the user environment.
Install specifications: No install spec — this is an instruction-only skill. ... Required binaries (all must exist): python3
Install dependencies from trusted package sources and prefer pinned versions in a controlled Python environment.
