Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using it.
LLM Eval Router
v1.2.2
Shadow-test local Ollama models against a cloud baseline with a multi-judge ensemble. Automatically promotes models when statistically proven equivalent — re...
⭐ 0 · 453 · 2 current · 2 all-time
by Nissan Dookeran (@nissan)
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
The name and description call for local Ollama inference plus cloud judges and ground truth; the declared binaries (ollama, python3) and env vars (ANTHROPIC_API_KEY, OPENAI_API_KEY) are consistent with that purpose.
Instruction Scope
SKILL.md instructs the agent to perform local inference, run validators, call Anthropic/OpenAI/Gemini for sampled judging, and write scored-run JSON to data/scores/*.json. That scope matches the description, but note: sending prompts to cloud providers is required for ground truth/judging and may expose task prompts to those providers (the skill's text claims 'no telemetry' but that does not prevent the cloud providers from logging requests).
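The scan notes that scored runs are written to data/scores/*.json, but SKILL.md (per this scan) specifies only the location, not the record schema. As a minimal sketch of what reviewing that output path might involve, the function below writes one hypothetical scored-run record; every field name here is an assumption for illustration, not the skill's actual format.

```python
import json
import pathlib
import time

def write_scored_run(run_id, model, judge_scores, out_dir="data/scores"):
    """Persist one scored run as JSON under out_dir.

    Field names are hypothetical: the skill only declares the
    data/scores/*.json location, not a schema.
    """
    record = {
        "run_id": run_id,
        "model": model,
        "judge_scores": judge_scores,  # per-judge scores, e.g. {"claude": 1.0}
        "mean_score": sum(judge_scores.values()) / len(judge_scores),
        "timestamp": time.time(),
    }
    path = pathlib.Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    out = path / f"{run_id}.json"
    out.write_text(json.dumps(record, indent=2))
    return out
```

Inspecting a few such files after a trial run is a quick way to confirm exactly what prompt or score data the skill persists locally before deciding on a retention policy.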
Install Mechanism
No install spec (instruction-only). This is lowest-risk from an install perspective — nothing will be downloaded or written by an installer step beyond whatever the agent runtime does when following SKILL.md.
Credentials
Requesting Anthropic and OpenAI API keys is proportional for ground-truth and judge calls. Minor inconsistency: SKILL.md references Gemini as an optional tiebreaker and Langfuse for observability, but Gemini credentials (or Google auth) and Langfuse connection details are not listed in requires.env — enabling those features will require additional credentials the skill doesn't declare up-front.
Persistence & Privilege
The always flag is false, no config paths are requested, and the skill does not request permanent agent-wide privileges. It stores score data locally (data/scores/*.json) as part of normal operation.
Assessment
This skill is coherent with its stated purpose, but before installing:
(1) Accept that sampled prompts and judge calls will be sent to Anthropic/OpenAI (and Gemini if you enable it); those providers may log requests, so avoid sending sensitive data.
(2) Be prepared to supply additional credentials if you enable Gemini or Langfuse (both are mentioned but not declared as required env vars).
(3) Review and control the local storage path (data/scores/*.json) for retention and access.
(4) Confirm billing and rate limits on your Anthropic/OpenAI accounts, since sampled evaluation will incur API charges.
(5) Run on a trusted machine, since local model inference runs there and score files are stored locally.
Like a lobster shell, security has layers — review code before you run it.
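Points (2) and (5) above amount to a preflight check on the machine that will run the skill. The sketch below checks the declared binaries and env vars; the optional-env names (GEMINI_API_KEY, LANGFUSE_PUBLIC_KEY) are guesses, since the skill does not declare them.

```python
import os
import shutil

def preflight(bins=("ollama", "python3"),
              required_env=("ANTHROPIC_API_KEY", "OPENAI_API_KEY"),
              optional_env=("GEMINI_API_KEY", "LANGFUSE_PUBLIC_KEY")):
    """Report missing prerequisites before running the skill.

    required_env matches the skill's declared requires.env;
    optional_env names are assumptions, since Gemini/Langfuse
    credentials are mentioned but not declared.
    """
    missing_bins = [b for b in bins if shutil.which(b) is None]
    missing_env = [v for v in required_env if not os.environ.get(v)]
    optional_unset = [v for v in optional_env if not os.environ.get(v)]
    return {
        "missing_bins": missing_bins,
        "missing_env": missing_env,
        "optional_unset": optional_unset,
    }
```

Running this once before install surfaces exactly which credentials you would still need to supply if you later enable the Gemini tiebreaker or Langfuse observability.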
latest: vk97bv6p2bx20bq73s8nve7fpnh83rqth
Runtime requirements
🧪 Clawdis
Bins: ollama, python3
Env: ANTHROPIC_API_KEY, OPENAI_API_KEY
Primary env: ANTHROPIC_API_KEY
