Benchmark Model Provider

v1.0.5

Benchmark and rank AI providers/models against a user-specific prompt suite derived from the user's purpose, domain, and usage frequency. Use when users ask...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, required binary (python3), required env (BENCHMARK_API_KEY), example specs, and scripts all align with a benchmarking tool that calls OpenAI‑compatible endpoints. The listed optional publishing helpers (Vercel/Netlify) are consistent with the report-publishing feature.
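For context, a benchmark spec is roughly a config file of this shape. The field names below are illustrative assumptions, not the skill's documented schema; what matters is that `base_url` is the endpoint your prompts and BENCHMARK_API_KEY are sent to.

```yaml
# Illustrative benchmark spec — field names are assumptions, not the
# skill's actual schema. base_url receives prompts and the API key.
name: my-suite
base_url: https://api.openai.com/v1   # must be an endpoint you trust
models:
  - gpt-4o-mini
prompts:
  - id: summarize-email
    text: "Summarize this email in two sentences."
```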
Instruction Scope
SKILL.md and scripts explicitly perform network I/O to the base_url from a benchmark spec and use the BENCHMARK_API_KEY for auth. This is expected for the stated purpose, but means prompts, model outputs, and the API key will be sent to whichever endpoint the user configures — the skill warns about this. The instructions do not ask for unrelated secrets or arbitrary system files.
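To make the data flow concrete, here is a minimal sketch (not the skill's actual code) of what a request to an OpenAI-compatible endpoint looks like: the prompt goes in the body, and BENCHMARK_API_KEY goes in the Authorization header, both to whatever host `base_url` names.

```python
import json
import os

def build_request(base_url: str, model: str, prompt: str) -> dict:
    """Build the HTTP request a benchmark run would send to base_url.

    Everything returned here — the prompt text in the body and the
    BENCHMARK_API_KEY in the Authorization header — is transmitted to
    the host that base_url points at.
    """
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {os.environ.get('BENCHMARK_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }
```

This is why the trustworthiness of `base_url` is the central security question for this skill: the key and the prompts leave your machine on every run.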
Install Mechanism
There is no platform install spec (no remote downloads). The repo includes Python scripts and a small requirements.txt (PyYAML, reportlab). This is low risk; packages are standard and the code is shipped with the skill. Users should still install dependencies in an isolated environment before running.
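One way to follow the isolation advice is a throwaway virtual environment (paths here are conventional, not mandated by the skill):

```shell
# Create a disposable virtual environment and install the skill's
# dependencies there; requirements.txt ships with the skill.
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt   # installs PyYAML and reportlab
```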
Credentials
Only BENCHMARK_API_KEY is required (declared as primary). References mention an optional VERCEL_TOKEN for non-interactive publishing, but that is not required by default. No unrelated credentials or excessive env requests are present.
Persistence & Privilege
The skill does not request always:true or system-wide privileges. It stores run artifacts (raw outputs, metrics, reports) locally for audit/reranking — consistent with its purpose. Publishing to web hosts is explicit and documented; it only occurs when the user chooses that step.
Scan Findings in Context
`pre-scan-injection-signals-none` (expected): the static pre-scan reported no injection signals. The skill intentionally performs network I/O to user-configured endpoints, which is expected for a benchmarking tool.
Assessment
This skill appears coherent for model benchmarking, but it will send prompts, outputs, and the BENCHMARK_API_KEY to whatever base_url you configure in a benchmark spec. Before running:

1. Verify the base_url is a trusted OpenAI-compatible endpoint.
2. Test with non-sensitive prompts first.
3. Run in an isolated environment and install PyYAML/reportlab from requirements.txt.
4. Only provide Vercel/Netlify/GitHub tokens if you explicitly want automatic publishing; the skill documents publishing as a separate, opt-in step.

If you need tighter safeguards, review run_benchmark.py and publish_report.py to confirm how credentials and artifacts are used and stored.
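A tighter safeguard than eyeballing each spec is to gate runs on an explicit allowlist. This sketch is an assumption about how you might wrap the skill, not something it ships with; the hosts listed are placeholders for endpoints you have personally vetted.

```python
from urllib.parse import urlparse

# Hosts you have personally vetted — this allowlist is an assumption,
# not part of the skill. Populate it with your own trusted endpoints.
TRUSTED_HOSTS = {"api.openai.com"}

def check_base_url(base_url: str) -> None:
    """Raise before a benchmark run if the spec's endpoint is untrusted.

    Rejects non-HTTPS URLs (the API key would travel in cleartext) and
    any host not on the allowlist.
    """
    parts = urlparse(base_url)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {base_url}")
    if parts.hostname not in TRUSTED_HOSTS:
        raise ValueError(f"untrusted benchmark endpoint: {parts.hostname}")
```

Calling this on every spec's base_url before invoking run_benchmark.py turns the "verify the endpoint" step into a hard failure instead of a manual check.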

Like a lobster shell, security has layers — review code before you run it.

Latest version: vk977j6k9w9cnkpccgktz4pq46n842k06


Runtime requirements

Bins: python3
Env: BENCHMARK_API_KEY
Primary env: BENCHMARK_API_KEY
